Excerpts from the RavenDB Performance team reportOptimizers, Assemble!

time to read 3 min | 444 words

Note, this post was written by Federico. In the previous post on the topic after inspecting the decompiled source using ILSpy  we were able to uncover potential things we could do.

Do you remember the loop from a couple of posts behind?


This loop was the responsible to check for the last bytes; either because the words compare found a difference or we are at the end of the memory block and the last block is not 8 bytes wide. You may remember it easier because of the awful code it generated.


After optimization work that loop looked sensibly different:


And we can double-check that the optimization did create a packed MSIL.


Now let’s look at the different operations involved. In red, blue and orange we have unavoidable instructions, as we need to do the comparison to be able to return the result.

13 instructions to do the work and 14 for the rest. Half of the operations are housekeeping in order to prepare for the next iteration.

The experienced developer would notice that if we would have done this on the JIT output each one of the increments and decrements can be implemented with a single assembler operation using specific addressing mode. However, we shouldn’t underestimate the impact of memory loads and stores.

How would the following loop look in pseudo-assembler?

Load address of lhs into register
Load address of R into raddr-inregister
Move value of lhs into R
Load byte into a 32 bits register (lhs-inregister)
Substract rhs-inregister, [raddr-inregister],
Load int32 from R into r-inregister
Jump :WEAREDONE if non zero r-inregister,
Load address of lhs into lhsaddr-register
Add 4 into [lhsaddr-register] (immediate-mode)
Load address of rhs into rhsaddr-register
Add 4 into [rhsaddr-register]   (immediate-mode)
Load address of n into naddr-register
Increment [naddr-register]
Load content of n into n-register
Jump :START if bigger than zero n-register
Push 0
Push r-inregister

As you can see the housekeeping keeps being a lot of operations. J

Armed with this knowledge. How would you optimize this loop?

More posts in "Excerpts from the RavenDB Performance team report" series:

  1. (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
  2. (18 Feb 2015) JSON & Structs in Voron
  3. (13 Feb 2015) Facets of information, Part II
  4. (12 Feb 2015) Facets of information, Part I
  5. (06 Feb 2015) Do you copy that?
  6. (05 Feb 2015) Optimizing Compare – Conclusions
  7. (04 Feb 2015) Comparing Branch Tables
  8. (03 Feb 2015) Optimizers, Assemble!
  9. (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
  10. (29 Jan 2015) Optimizing Memory Comparisons, size does matter
  11. (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
  12. (27 Jan 2015) Optimizing Memory Comparisons
  13. (26 Jan 2015) Optimizing Memory Compare/Copy Costs
  14. (23 Jan 2015) Expensive headers, and cache effects
  15. (22 Jan 2015) The long tale of a lambda
  16. (21 Jan 2015) Dates take a lot of time
  17. (20 Jan 2015) Etags and evil code, part II
  18. (19 Jan 2015) Etags and evil code, Part I
  19. (16 Jan 2015) Voron vs. Esent
  20. (15 Jan 2015) Routing