Fast Dictionary and struct generic arguments
One of the most common issues that come up with performance tuning is that dictionaries are expensive. It isn’t so much that a single dictionary lookup is expensive, it is the sheer number of them. Dictionaries are used everywhere, and they are often used in very hot codepaths (as caching).
Numerous times we have dealt with that with trying to avoid the dictionary access (often favoring an array based lookup if we can get away with it). But at some point we have decided to implement our own dictionary. Here is how it looks like:
The actual dictionary impl is very close to the standard one, but that isn’t what make it fast. Note the generic argument? If we pass a struct implementing IEqualityComparer generic argument, then in most cases, the compiler and the JIT are going to generate code that is able to eliminate all virtual calls. And if there is a trivial equality comparison, that means that you can eliminate all calls and inline the whole thing inside that generic dictionary implementation.
In other words, we eliminate a minimum of two virtual calls per key lookup, and in some cases, we can eliminate even the method calls themselves, and that turn out to be quite important when the number of key lookups is in the billions.
And this is from midway through the optimizations.
Comments
Nice optimization technique. Does it work with every JITter or maybe there are some cases when it fails to inline these calls? Also, have you seen the recent work on interface devirtualization https://github.com/dotnet/coreclr/pull/10192 ?
Is that the same as passing an IEqualityComparer to the Dictionary ctor? It seems that I'm unable to find that class in Raven's source... are you using that there?
@Scooletz the ability of the JIT to optimize structs code has been around for a long time already (not only on CoreCLR). There are cases where it cannot inline, but still avoids the virtual calls. And yes, we have been following and supporting with code samples the work of Andy on devirtualization; that will have a big performance impact for us.
@CheloXL no. Given that Dictionary holds a reference to IEqualityComparer<T> then all calls are dispatched via virtual calls, so essentially what we avoid is exactly that.
This is the dictionary code: https://github.com/ravendb/ravendb/blob/v4.0/src/Sparrow/Collections/FastDictionary.cs#L75 And this is how it is used: https://github.com/Corvalius/ravendb/commit/75672387e8addd28302aeb7704f8b1edcc60bf08
Is there benchmark anywhere against the Dictionary.TryGet? Better with BenchmarkDotNet.
@Maksim why a micro-benchmark when you can get the real deal? ;)
We updated the blog-post with an actual example from an imperfectly inlined case. We couldn't inline CompareInline because the inliner refuse to do so on method size grounds, but we know the distribution of the keys so we can use an alternative more compact hashing function that will bring down the total size (and probably be even faster for this case). But this is a apples to apples comparison.
Interesting code. Does Ravendb's licensing prevent me from using this in another project? I did notice a typo here. Pretty sure it's supposed to be supported. At least multiple was spelled correctly unlike AllowMultipuleAsyncOperations ;-)
@Philippe you can use it, we are going to release on nuget the most common data structure under MIT License anyways. The typo fix will be included in my next commit.
I did a similar thing and found that the standard dictionary could be optimized by 6x for certain insert-only workloads. You can delete all kinds of code under that constraint. Also, get rid of the expensive modulo operator which is a historic accident.
@Federico Sweet, make sure to make a blog post about it here when it's available!
Thanks for the idea! I took S.C.G.Dictionary from CoreCLR, added
where TKey : IEquatable<TKey>
and replaced comparer with constrained calls toTKey.Equals
, the speedup for<int,int>
is c.70%, with just a few lines changed. Interesting to compare with your implementation when it is released as a standalone package!For custom types, GetHashCode() and Equality comparison make difference less visible, though.
source code
So how will we be notified when you publish the library with fast dictionary implementation?
Dejan, When that will happen, I'll post about it in this blog.
Interesting, that I've found the same idea in the post dated back to 2004: https://www.codeproject.com/Articles/8531/Using-generics-for-calculations
Anyway, a cool technique that I'am already applying for
HashMap
implementation in ImTools.Comment preview