GoTo-based optimizations


One of the things that we did recently was go over our internal data structures in RavenDB and see if we can optimize them. Some of those changes are pretty strange if you aren’t following what is actually going on. Here is an example:

[Image: the code, Before and After]

 
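Here is a minimal sketch of this kind of change. It assumes a hypothetical array-backed FastStack<T>. PopBefore calls the throw helper from inside the if. PopAfter uses goto to jump to a label at the end of the method, so the common path reads straight through to return item. Only ThrowForEmptyStack is named in the post. The class, its fields, and the other method names are assumptions, and the real RavenDB code may differ.

```csharp
using System;

// Hypothetical array-backed stack used only to illustrate the pattern.
// ThrowForEmptyStack is the only name taken from the post; the rest is assumed.
public class FastStack<T>
{
    private T[] _array = new T[16];
    private int _size;

    public void Push(T item)
    {
        if (_size == _array.Length)
            Array.Resize(ref _array, _array.Length * 2); // grow when full
        _array[_size++] = item;
    }

    // "Before": the error check calls the throw helper in the middle of the method.
    public T PopBefore()
    {
        if (_size == 0)
            ThrowForEmptyStack(); // return value ignored; this call always throws

        T item = _array[--_size];
        _array[_size] = default(T); // clear the slot so the GC can collect references
        return item;
    }

    // "After": the rare error path is moved past the final return with goto,
    // so the common (non-empty) path runs top to bottom without the call in between.
    public T PopAfter()
    {
        if (_size == 0)
            goto ErrorEmptyStack;

        T item = _array[--_size];
        _array[_size] = default(T);
        return item;

    ErrorEmptyStack:
        // Because the helper returns T, the error path is just a return of the call.
        return ThrowForEmptyStack();
    }

    // Declared to return T only so it can appear in a return statement; it never returns normally.
    private static T ThrowForEmptyStack() =>
        throw new InvalidOperationException("Stack is empty.");
}
```

Both methods behave the same. The only difference is where the error path sits. The post credits that layout for the shorter assembly, but whether a given JIT version emits it that way should be confirmed in the disassembly.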

What is the point in this kind of change? Well, let us look at the actual assembly generated by this, shall we?

[Image: the generated assembly, Before and After]

As you can see, the second version is much shorter, and in the common case (the stack is not empty) it involves no jump at all. This ends up being extremely efficient. Note that because ThrowForEmptyStack returns a value, the generated assembly is extremely short, since we can rely on the caller to clean us up.

This was run in release mode, CoreCLR, x64. I got the assembly from the debugger, so it is possible that some optimizations haven’t been applied because the debugger is attached, but I think it is fairly close to what would happen for real. Note that ThrowForEmptyStack is inlined, even though it is an exception-only method. Marking it with [MethodImpl(MethodImplOptions.NoInlining)] prevents that, but the goto version still generates better code.
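Here is a minimal sketch of where that attribute goes. It assumes a hypothetical fixed-capacity BoundedStack<T>. Only ThrowForEmptyStack and [MethodImpl(MethodImplOptions.NoInlining)] come from the post. The attribute sits on the exception-only helper, and Pop keeps the goto layout, since the goto version still came out ahead with inlining disabled. The assembly a particular JIT emits for this is worth checking in the disassembly rather than assuming from the sketch.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical fixed-capacity stack; names other than ThrowForEmptyStack are illustrative.
public class BoundedStack<T>
{
    private readonly T[] _items;
    private int _size;

    public BoundedStack(int capacity) => _items = new T[capacity];

    // No growth, for brevity: pushing past capacity throws IndexOutOfRangeException.
    public void Push(T item) => _items[_size++] = item;

    public T Pop()
    {
        if (_size == 0)
            goto ErrorEmptyStack; // rare case: jump to the cold path at the end

        T item = _items[--_size];
        _items[_size] = default(T);
        return item;

    ErrorEmptyStack:
        return ThrowForEmptyStack();
    }

    // Exception-only helper. NoInlining tells the JIT not to inline it into Pop,
    // so the exception construction and throw stay out of Pop's own code.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static T ThrowForEmptyStack() =>
        throw new InvalidOperationException("Stack is empty.");
}
```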

The end result is that we are running much less code, and that makes me happy. In general, a good rule of thumb when reading assembly is that shorter == faster; and if you are reading assembly, you are very likely in optimization mode or debugging the compiler.

I’m pretty sure that the 2.0 release of CoreCLR already fixes this kind of issue, by the way, which should allow us to write more idiomatic code that still generates very tight machine code.