Do you remember the loop from a couple of posts behind?
This loop was the responsible to check for the last bytes; either because the words compare found a difference or we are at the end of the memory block and the last block is not 8 bytes wide. You may remember it easier because of the awful code it generated.
After optimization work that loop looked sensibly different:
And we can double-check that the optimization did create a packed MSIL.
Now let’s look at the different operations involved. In red, blue and orange we have unavoidable instructions, as we need to do the comparison to be able to return the result.
13 instructions to do the work and 14 for the rest. Half of the operations are housekeeping in order to prepare for the next iteration.
The experienced developer would notice that if we would have done this on the JIT output each one of the increments and decrements can be implemented with a single assembler operation using specific addressing mode. However, we shouldn’t underestimate the impact of memory loads and stores.
How would the following loop look in pseudo-assembler?
:START Load address of lhs into register Load address of R into raddr-inregister Move value of lhs into R Load byte into a 32 bits register (lhs-inregister) Substract rhs-inregister, [raddr-inregister], Load int32 from R into r-inregister Jump :WEAREDONE if non zero r-inregister, Load address of lhs into lhsaddr-register Add 4 into [lhsaddr-register] (immediate-mode) Load address of rhs into rhsaddr-register Add 4 into [rhsaddr-register] (immediate-mode) Load address of n into naddr-register Increment [naddr-register] Load content of n into n-register Jump :START if bigger than zero n-register Push 0 Return :WEAREDONE Push r-inregister Return
As you can see the housekeeping keeps being a lot of operations. J
Armed with this knowledge. How would you optimize this loop?
More posts in "Excerpts from the RavenDB Performance team report" series:
- (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
- (18 Feb 2015) JSON & Structs in Voron
- (13 Feb 2015) Facets of information, Part II
- (12 Feb 2015) Facets of information, Part I
- (06 Feb 2015) Do you copy that?
- (05 Feb 2015) Optimizing Compare – Conclusions
- (04 Feb 2015) Comparing Branch Tables
- (03 Feb 2015) Optimizers, Assemble!
- (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
- (29 Jan 2015) Optimizing Memory Comparisons, size does matter
- (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
- (27 Jan 2015) Optimizing Memory Comparisons
- (26 Jan 2015) Optimizing Memory Compare/Copy Costs
- (23 Jan 2015) Expensive headers, and cache effects
- (22 Jan 2015) The long tale of a lambda
- (21 Jan 2015) Dates take a lot of time
- (20 Jan 2015) Etags and evil code, part II
- (19 Jan 2015) Etags and evil code, Part I
- (16 Jan 2015) Voron vs. Esent
- (15 Jan 2015) Routing