Click here to Skip to main content
15,568,681 members
Articles / Programming Languages / C#
Tip/Trick
Posted 24 Feb 2020

Stats

23.1K views
145 downloads
12 bookmarked

Efficiently Using Lists as Dictionary Keys in C#

Rate me:
Please Sign up or sign in to vote.
5.00/5 (8 votes)
24 Feb 2020MIT4 min read
A relatively safe, simple, yet high performance technique for using lists as dictionary keys.
Using collections as dictionary keys is sometimes necessary, but it can be a performance killer and unsafe. Here's how to make it faster and safer.

Introduction

When doing functional style programming in C#, you'll probably eventually run into a situation where you need to key a dictionary by an ordered list of items. There are good ways and not so good ways to do this. This article aims to present you with an efficient and relatively safe way to use lists as dictionary keys.

Conceptualizing this Mess

Dictionary keys must be comparable for equality, must present a proper hash code, and must be immutable. These requirements make it a little bit involved to use lists as dictionary keys since lists don't typically provide value semantics - in this case, item by item comparison and a proper hash code. Perhaps worse, lists allow you to modify them at any time.

In summary, there are two major issues here, and one is comparing lists item by item isn't very efficient, and using a standard IList<T> implementation as a dictionary key is fundamentally unsafe since again it can be modified at any time.

We can solve the latter more or less by requiring the use of IReadOnlyList<T> which provides list-like access without the methods that modify the list. We're going to take a liberty here though when we declare our KeyList<T> class - we'll be adding an Add() method to it. The reason for this is to allow you to fill the list in the first place. If we wanted to be extra safe, we'd nix the Add() method and force you to pass the items as an array into the constructor of the KeyList<T>. The reason this was not done was for performance. KeyList<T>'s primary purpose is performance, and passing in the data to the constructor would require a copy. Copies aren't horribly slow necessarily, but we don't need it here - we're okay here since anything that takes IReadOnlyList<T> will not see our Add() method anyway and this way, we avoid spurious copies.

The former requires a little more work. We have to implement value equality semantics on our KeyList<T>. Here, we pull an important trick though: We recompute the hash code as each item is added to the list, that way we don't have to compute it later. We then use that hash code as part of our equality check to short circuit if they aren't equal. Let's take a look at the code for the KeyList<T> class next.

Coding this Mess

C#
sealed class KeyList<T> : IReadOnlyList<T>, IEquatable<KeyList<T>>
{
    int _hashCode;
    List<T> _items;
    public KeyList()
    {
        _items = new List<T>();
        _hashCode = 0;
    }
    public KeyList(int capacity)
    {
        _items = new List<T>(capacity);
        _hashCode = 0;
    }
    public T this[int index] => _items[index];

    public int Count => _items.Count;

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    public void Add(T item)
    {
        _items.Add(item);
        if (null != item)
            _hashCode ^= item.GetHashCode();
    }
    public override int GetHashCode()
    {
        return _hashCode;
    }
    public bool Equals(KeyList<T> rhs)
    {
        if (ReferenceEquals(this, rhs))
            return true;
        if (ReferenceEquals(rhs, null))
            return false;
        if (rhs._hashCode != _hashCode)
            return false;
        var ic = _items.Count;
        if (ic != rhs._items.Count)
            return false;
        for(var i = 0;i<ic;++i)
            if (!Equals(_items[i], rhs._items[i]))
                return false;
        return true;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as KeyList<T>);
    }
}

The important methods here are Add() and the main Equals() method. You can see in Add() we're modifying the _hashCode field for each item added. You can see in the Equals() method we short circuit if the hash codes are not equal, before finally doing an item by item comparison. This can vastly improve performance. The other performance gain is in the GetHashCode() method which doesn't have to compute anything. Since these methods are typically called by the IDictionary<TKey,TValue> implementation, we want them to be as quick as possible, especially the GetHashCode() method which is called a lot by Dictionary<TKey, TValue>.

If you run the code in Release mode, you'll get accurate performance metrics. In Debug not so much, so the program will warn you if you're running in Debug. You can see we work the dictionaries, one filled with standard lists, the other filled with our special KeyList<T> instances. You should see a 5x+ performance increase with the KeyList<T> compared to using List<T> and since it only implements IReadOnlyList<T> for list access it's safer, too. Now both of our initial problems are solved. Enjoy.

Caveat

It has been pointed out that this is not correct if the T type is mutable. This is true. However, the same is true of Dictionary<TKey, TValue> and HashSet<T> so the same caveat applies there as here - do not change the values of items that are added to these lists, just like you should not change key values added to dictionaries and hashsets.

Using Other Containers

It is possible to adapt this technique to other collections if you need to. You can adapt it to a HashSet<T> if you need (relatively) efficient unordered comparisons. I've done this in some of my projects. Just make sure to use SetEquals() in your Equals() method to do the item by item comparison, because it's faster.

History

  • 24th February, 2020 - Initial submission

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States United States
Just a shiny lil monster. Casts spells in C++. Mostly harmless.

Comments and Discussions

 
QuestionA Suggestion Pin
George Swan25-Feb-20 3:22
George Swan25-Feb-20 3:22 
AnswerRe: A Suggestion Pin
honey the codewitch25-Feb-20 3:45
mvahoney the codewitch25-Feb-20 3:45 
AnswerUpdate Pin
honey the codewitch25-Feb-20 3:48
mvahoney the codewitch25-Feb-20 3:48 
GeneralRe: Update Pin
George Swan25-Feb-20 4:05
George Swan25-Feb-20 4:05 
GeneralRe: Update Pin
honey the codewitch25-Feb-20 4:15
mvahoney the codewitch25-Feb-20 4:15 
GeneralRe: Update Pin
George Swan25-Feb-20 5:42
George Swan25-Feb-20 5:42 
GeneralRe: Update Pin
honey the codewitch25-Feb-20 7:22
mvahoney the codewitch25-Feb-20 7:22 
GeneralRe: Update Pin
George Swan25-Feb-20 22:04
George Swan25-Feb-20 22:04 
GeneralRe: Update Pin
honey the codewitch25-Feb-20 22:35
mvahoney the codewitch25-Feb-20 22:35 
GeneralRe: Update Pin
George Swan28-Feb-20 21:28
George Swan28-Feb-20 21:28 
GeneralRe: Update Pin
honey the codewitch29-Feb-20 2:35
mvahoney the codewitch29-Feb-20 2:35 
GeneralRe: Update Pin
George Swan29-Feb-20 8:31
George Swan29-Feb-20 8:31 
GeneralRe: Update Pin
honey the codewitch29-Feb-20 11:36
mvahoney the codewitch29-Feb-20 11:36 
GeneralRe: Update Pin
George Swan29-Feb-20 23:22
George Swan29-Feb-20 23:22 
GeneralRe: Update Pin
honey the codewitch1-Mar-20 4:07
mvahoney the codewitch1-Mar-20 4:07 
GeneralCorrectness Pin
Dmitry Mukalov25-Feb-20 0:54
Dmitry Mukalov25-Feb-20 0:54 
GeneralRe: Correctness Pin
honey the codewitch25-Feb-20 1:21
mvahoney the codewitch25-Feb-20 1:21 
GeneralRe: Correctness Pin
Dmitry Mukalov25-Feb-20 3:14
Dmitry Mukalov25-Feb-20 3:14 
Roughly yes, but please consider the significant semantic difference between the scenarios around the keys and scenarios around the values. Not trying to justify any design decision made inside the Dictionary implementation, the frequency of scenarios which would imply mutation of a collection values is simply incomparable with frequency of scenarios when you need to mutate collection keys (if such scenarios exist at all). So being presented without any specific use case where such collection can be useful it seems to be claimed as a multipurpose collection with just additional specific properties. But it's not a multipurpose by its design as implicitly restricts the entire bunch of scenarios which are supposed to mutate the collection values. This indicates that either the collection should be more specialized (you can constrain the element type to be a struct to enforce an item copy when it's being added or require them to be cloneable in any way) or the problem can be solved in ad-hoc manner when some types which need this type of functionality can implement that ensuring all safety considerations without making such functionality generalized.
GeneralRe: Correctness Pin
honey the codewitch25-Feb-20 3:41
mvahoney the codewitch25-Feb-20 3:41 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.