Blind optimizations: Making MurmurHash3 faster
I needed to use Bloom Filters, and I didn’t want to use the implementation we already have in RavenDB. That one is too tied up in our infrastructure to be easily used. So I found the Maybe.NET project. it is nice, but it doesn’t have a CoreCLR Nuget package. This meant that I had to go into the code and look at what is going on. And I started going, “nope, that isn’t the way I want it done”.
Now, to be clear, the code in the project is great. It is clear, obvious and idiomatic C#. It is also raising every red flag I have for inefficient code detection that I had built over the past few years of making RavenDB faster. Because this is such as small sample that I thought it would make a good blog post, because I can explain what the code is doing and what changes I’m doing there, and why. Before I can get to the Bloom Filter implementation, I need to use the hash function, and that was just a full stop for me. Let me show you what I mean. The key parts are below, and you can find the full code here.
This is a hash method, it produced several hashes for the purpose of the bloom filter. I underlined in red every time that this code allocates. As you can see, this allocates a lot.
There is also the fact that this accepts a generic object as a parameter and serialize that to a byte. I’m ignoring the allocations in that part of the code, but I’m assuming that they are significant. So let’s talk about how we can optimize this function?
Well, to start with, I’m going to decide that accepting an object is too high level. This is a hash function, the caller should give us bytes. Now, let’s see what impact that has on us, shall we?
Now this is much better. We don’t allocate anything in the ComputeHashes method and we give the compiler the chance to build really efficient code here. We can probably require that the maxHashValue be a power of two and avoid the mod operation in favor of bit shifting, but I’m not writing RavenDB here and worrying about every little thing. I’ll leave that part as an exercise for the reader.
Now, let’s look at the actual Hash function, shall we?
There is quite a bit going on, but essentially, I’m using the fixed to get the pointer from the span, then compute the hash in 4 bytes at once, then handle the remainder. There is not allocations and this has far fewer instructions that actually need to run. Note that this would be a great place to stop and run unit tests to verify that I didn’t break something, I’m going to assume that I got it write and close this post, I still want to talk about the optimizations that are available for the bloom filter.