Excerpts from the RavenDB Performance team report: The long tale of a lambda
This nugget was discovered by Tal, who was measuring write throughput and noticed that a lot of the time wasn't being spent in the proper code path, but in something on the side that seemed… off.
if (prefetchingQueue.Aggregate(0, (x, c) => x + SelectSerializedSizeOnDiskIfNotNull(c)) > context.Configuration.AvailableMemoryForRaisingBatchSizeLimit)
This piece of code is responsible for heuristics deep inside the bowels of RavenDB. In fact, it is the piece that decides whether we have enough memory to index directly from memory, or whether we need to start dumping stuff to disk and pay the I/O cost for it later.
As a result, this piece of code is called once for every document save. It was also wildly inefficient. The Aggregate implementation was pretty much what you would imagine, and it took three times as long as actually processing the rest of the request. The underlying reason was that we kept doing a foreach and a delegate invocation on each and every call. The more documents we had coming in, the more work we had to do. Shlemiel the painter at his best.
We first saw a major improvement just by replacing the Aggregate() call with a purpose-built function that summed all those sizes for us directly. Next, we changed things so that instead of doing O(N) work per request, we do O(1) work, by maintaining the sum incrementally: whenever we add or remove something from the prefetching queue, we also make sure to add / remove its size from a global tally. Both stages are sketched below.
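To make the two stages concrete, here is a minimal sketch. All the type and member names here (JsonDocument, SerializedSizeOnDisk, LoadedSize, and so on) are illustrative stand-ins, not the actual RavenDB types:

    using System.Collections.Generic;
    using System.Threading;

    public class JsonDocument
    {
        public long SerializedSizeOnDisk;
    }

    public class PrefetchingQueue
    {
        private readonly Queue<JsonDocument> queue = new Queue<JsonDocument>();

        // Stage 2: a running tally, maintained on every add / remove,
        // which turns the per-save memory check into an O(1) field read.
        private long loadedSize;

        public long LoadedSize
        {
            get { return Interlocked.Read(ref loadedSize); }
        }

        public void Enqueue(JsonDocument doc)
        {
            queue.Enqueue(doc);
            Interlocked.Add(ref loadedSize, doc.SerializedSizeOnDisk);
        }

        public JsonDocument Dequeue()
        {
            var doc = queue.Dequeue();
            Interlocked.Add(ref loadedSize, -doc.SerializedSizeOnDisk);
            return doc;
        }

        // Stage 1: the intermediate fix, a purpose-built sum replacing
        // Aggregate(). Still O(N), but a plain loop with no delegate
        // invocation per element.
        public long SumSerializedSizesOnDisk()
        {
            long sum = 0;
            foreach (var doc in queue)
            {
                if (doc != null)
                    sum += doc.SerializedSizeOnDisk;
            }
            return sum;
        }
    }

With the running tally in place, the check from the top of the post collapses into a single comparison: prefetchingQueue.LoadedSize > context.Configuration.AvailableMemoryForRaisingBatchSizeLimit.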
Once that was done, we saw an almost 18% improvement in high-write scenarios, because we were no longer spending all that time counting how much stuff we had in memory just to figure out whether we could put more things in memory.
I can't emphasize enough how important it is that this work was done throughout with a profiler (in our case, the excellent dotTrace), because if dotTrace hadn't pointed a big finger at this line of code, we would never have considered it to be problematic.
More posts in the "Excerpts from the RavenDB Performance team report" series:
- (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
- (18 Feb 2015) JSON & Structs in Voron
- (13 Feb 2015) Facets of information, Part II
- (12 Feb 2015) Facets of information, Part I
- (06 Feb 2015) Do you copy that?
- (05 Feb 2015) Optimizing Compare – Conclusions
- (04 Feb 2015) Comparing Branch Tables
- (03 Feb 2015) Optimizers, Assemble!
- (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
- (29 Jan 2015) Optimizing Memory Comparisons, size does matter
- (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
- (27 Jan 2015) Optimizing Memory Comparisons
- (26 Jan 2015) Optimizing Memory Compare/Copy Costs
- (23 Jan 2015) Expensive headers, and cache effects
- (22 Jan 2015) The long tale of a lambda
- (21 Jan 2015) Dates take a lot of time
- (20 Jan 2015) Etags and evil code, part II
- (19 Jan 2015) Etags and evil code, Part I
- (16 Jan 2015) Voron vs. Esent
- (15 Jan 2015) Routing
Comments
That Aggregate is really just code obfuscation. It is a sum, nothing more.
A simple rule of optimization: it is never, ever, what you think it is.
In fact, if I were ever to attempt to optimize without access to a profiler, I would make every reasonable deduction about what the issue is and what COULD NOT BE the issue.
Then I would do the exact opposite, and work on what could NOT be the issue first.
So the real optimization was moving away from O(N), and not the lambda itself?
This is a bit like the rule they have in the Roslyn code base, which states that as part of "avoiding allocations in hot paths", you should "Avoid LINQ"! See http://mattwarren.org/2014/06/10/roslyn-code-base-performance-lessons-part-2/ and https://roslyn.codeplex.com/wikipage?title=How%20to%20Contribute&referringTitle=Documentation
LINQ is great, but it really does do a lot of allocations behind the scenes. Joe Duffy has a great quiz on this, see "A different, better-performing approach" on the page http://joeduffyblog.com/2010/09/06/the-premature-optimization-is-evil-myth/
Stig, No, actually. We ended up there. But just moving from LINQ's Aggregate to a dedicated method reduced the runtime in a MAJOR way. The cost was still O(N), but the constant factor was so much smaller.
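A standalone snippet (not the RavenDB code) that makes that constant-factor difference easy to see for yourself:

    using System;
    using System.Diagnostics;
    using System.Linq;

    class AggregateVsLoop
    {
        static void Main()
        {
            var items = Enumerable.Range(0, 10000000).ToArray();

            // LINQ version: a delegate invocation per element, plus the
            // enumerator machinery inside Aggregate.
            var sw = Stopwatch.StartNew();
            long viaAggregate = items.Aggregate(0L, (acc, x) => acc + x);
            Console.WriteLine("Aggregate: {0} in {1} ms", viaAggregate, sw.ElapsedMilliseconds);

            // Dedicated loop: the same O(N) sum, much smaller constant.
            sw.Restart();
            long viaLoop = 0;
            foreach (var x in items)
                viaLoop += x;
            Console.WriteLine("Loop:      {0} in {1} ms", viaLoop, sw.ElapsedMilliseconds);
        }
    }

Both versions perform the same O(N) additions; the difference is purely the per-element overhead.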