Note, this post was written by Federico. Where I had notes or stuff to extend, I explicitly marked it as such. In the previous post after inspecting the decompiled source using ILSpy we were able to uncover potential things we could do.
Getting rid of unwanted type conversion may seem as an small cost, but let’s make an example. Let’s say we are comparing 2 memory arrays of 16 bytes and they are equal (our worst case scenario).
Just for the sake of simplification from the 3 potential causes the memory is aligned so there is no need to the first 2 unwanted conversions. That leaves us with the main body as the only source of unwanted conversions.
Now this loops moves our pointer 4 bytes each time and causes 2 conversions. Therefore for a 16 bytes array (a pretty average size) we are performing 8 conversions, that is grand total of 8 conversions. Assuming our idealized processor, at 0.33 ns per conversion instruction we have 2.64 ns or roughly 3% of the total time per average call. Getting rid of that is easy, as the size of an unsigned int is a constant.
private const int sizeOfUint = sizeof(uint);
Therefore the final executable code will be:
Here we have 2 interesting side effects:
- We no longer have the conversion but also the constant got put instead of the indirection to an stack variable.
- Almost every comparison you do over a constant that is 2^n based can be converted to a shift operation.
If the JIT is smart enough, this check can be compiled into a shift of 2 places and asking if the result is bigger than 0. Squeezing 4 instructions into 2 per each while cycle.
You guessed right, the JIT is.
More posts in "Excerpts from the RavenDB Performance team report" series:
- (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
- (18 Feb 2015) JSON & Structs in Voron
- (13 Feb 2015) Facets of information, Part II
- (12 Feb 2015) Facets of information, Part I
- (06 Feb 2015) Do you copy that?
- (05 Feb 2015) Optimizing Compare – Conclusions
- (04 Feb 2015) Comparing Branch Tables
- (03 Feb 2015) Optimizers, Assemble!
- (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
- (29 Jan 2015) Optimizing Memory Comparisons, size does matter
- (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
- (27 Jan 2015) Optimizing Memory Comparisons
- (26 Jan 2015) Optimizing Memory Compare/Copy Costs
- (23 Jan 2015) Expensive headers, and cache effects
- (22 Jan 2015) The long tale of a lambda
- (21 Jan 2015) Dates take a lot of time
- (20 Jan 2015) Etags and evil code, part II
- (19 Jan 2015) Etags and evil code, Part I
- (16 Jan 2015) Voron vs. Esent
- (15 Jan 2015) Routing