Excerpts from the RavenDB Performance team report: Optimizing Memory Comparisons
Note: this post was written by Federico. Where I had notes or things to add, I explicitly marked them as such. In the previous post, after inspecting the decompiled source using ILSpy, we were able to uncover potential improvements.
Getting rid of unwanted type conversions may seem like a small win, but let's work through an example. Say we are comparing two memory arrays of 16 bytes each, and they are equal (our worst-case scenario).
For the sake of simplification, assume the memory is aligned, so the first two of the three potential sources of unwanted conversions do not apply. That leaves the main loop body as the only source of unwanted conversions.
This loop moves our pointer 4 bytes at a time and causes 2 conversions per iteration. For a 16-byte array (a pretty average size), that is 4 iterations, for a grand total of 8 conversions. Assuming our idealized processor at 0.33 ns per conversion instruction, that is 2.64 ns, or roughly 3% of the total time of an average call. Getting rid of that is easy, as the size of an unsigned int is a constant:
private const int sizeOfUint = sizeof(uint);
Therefore the final executable code will be:
Here we have 2 interesting side effects:
- We no longer have the conversion, and the constant is emitted directly instead of an indirection through a stack variable.
- Almost every comparison against a constant that is a power of two can be converted into a shift operation.
If the JIT is smart enough, this check can be compiled into a shift by 2 places followed by a test that the result is greater than zero, squeezing 4 instructions into 2 per while-loop iteration.
You guessed right, the JIT is.
More posts in "Excerpts from the RavenDB Performance team report" series:
- (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
- (18 Feb 2015) JSON & Structs in Voron
- (13 Feb 2015) Facets of information, Part II
- (12 Feb 2015) Facets of information, Part I
- (06 Feb 2015) Do you copy that?
- (05 Feb 2015) Optimizing Compare – Conclusions
- (04 Feb 2015) Comparing Branch Tables
- (03 Feb 2015) Optimizers, Assemble!
- (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
- (29 Jan 2015) Optimizing Memory Comparisons, size does matter
- (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
- (27 Jan 2015) Optimizing Memory Comparisons
- (26 Jan 2015) Optimizing Memory Compare/Copy Costs
- (23 Jan 2015) Expensive headers, and cache effects
- (22 Jan 2015) The long tale of a lambda
- (21 Jan 2015) Dates take a lot of time
- (20 Jan 2015) Etags and evil code, part II
- (19 Jan 2015) Etags and evil code, Part I
- (16 Jan 2015) Voron vs. Esent
- (15 Jan 2015) Routing
Comments
I'm no expert here, but it seems to me like this kind of optimization is difficult enough when you compile straight down to machine language, and you only have to worry about hardware differences (and maybe compilers, if you allow others to build your code). In the managed world, it seems like throwing a jitter in the middle would really compound the problem. How are you guys handling those issues?
Chris, We can rely on the JIT to do the best work it can. If we send it code that is easy to optimize, it will likely do the right thing.
A bit off topic: why use a const instead of a readonly var?
Edward, A readonly variable still needs to be read. A const can be embedded in the machine code, and for small constants there are optimized instructions that already embed the constant inside them.
What exactly is that conversion instruction you mentioned - how does it translate to machine code?
@Chris as Ayende said, the trick to optimizing at a finer level with a JIT is to help the compiler emit optimization-friendly code. The main tradeoff a JIT faces is between the cost of analysis and execution time (usually leaning toward the latter). That means that if the JIT would have to do deep analysis to find the optimal code, you will get a suboptimal result.
@Edward, your question is very on-topic. The semantics of a readonly variable are very different from those of a const. A const must be known at compile time, so the compiler can emit the proper code. With readonly, the JIT would have to support inter-method optimization unless you are optimizing the constructor itself. Therefore, substitutions that would be safe for the compiler to make are no longer safe, because they are context dependent.
@Rafal instructions like CWD to upcast, or NEG if you have an unsigned value and need to perform an operation on a signed operand or vice versa (and there is no dual opcode that can mix and match). The biggest offenders are typically at the signed/unsigned boundary when you are working at a very fine level. So the general rule is: if you can, force everything to be the same type.
@Federico Lois Thanks, you are quite right.