Optimizing read transaction startup timeUnicode ate my perf and all I got was
As an aside, I wonder how much stuff will break because the title of this post has in it.
The topic of this post is the following profiler output. GetSliceFromKey takes over 6% of our performance, or about 24.5 seconds out of the total run. That kinda sucks.
What is the purpose of this function? Well, RavenDB’s document ids are case insensitive, so we need to convert the key to lower case and then do a search on our internal index. That has quite a big cost associated with it.
And yes, we are aware of the pain. Luckily, we are already working with highly optimized codebase, so we aren’t seeing this function dominate our costs, but still…
Here is the breakdown of this code:
As you can see, over 60% of this function is spent in just converting to lower case, which sucks. Now, we have some additional knowledge about this function. For the vast majority of cases, we know that this function will handle only ASCII characters, and that Unicode document ids are possible, but relatively rare. We can utilize this knowledge to optimize this method. Here is what this will look like, in code:
Basically, we scan through the string, and if there is a character whose value is over 127 we fall to the slower method (slower being relative, that is still a string conversion in less than 25 μs).
Then we just find if a character is in the upper case range and convert it to lower case (ASCII bit analysis is funny, it was intentionally designed to be used with bit masking, and all sort of tricks are possible there) and store it in the buffer, or just store the original value.
This method went from taking 5.89% to taking just 0.6%, and we saved over 22 seconds(!) in the run. Again, this is under the profiler, and the code is heavily multi threaded. In practice, this means that the run took 3 seconds less.
Either way, we are still doing pretty good, and we don’t have that pile of poo, so I we’re good.