One of the most common issues that come up with performance tuning is that dictionaries are expensive. It isn’t so much that a single dictionary lookup is expensive, it is the sheer number of them. Dictionaries are used everywhere, and they are often used in very hot codepaths (as caching).
Numerous times we have dealt with that with trying to avoid the dictionary access (often favoring an array based lookup if we can get away with it). But at some point we have decided to implement our own dictionary. Here is how it looks like:
The actual dictionary impl is very close to the standard one, but that isn’t what make it fast. Note the generic argument? If we pass a struct implementing IEqualityComparer generic argument, then in most cases, the compiler and the JIT are going to generate code that is able to eliminate all virtual calls. And if there is a trivial equality comparison, that means that you can eliminate all calls and inline the whole thing inside that generic dictionary implementation.
In other words, we eliminate a minimum of two virtual calls per key lookup, and in some cases, we can eliminate even the method calls themselves, and that turn out to be quite important when the number of key lookups is in the billions.
And this is from midway through the optimizations.