Reducing the cost of occasionally needed information
Consider the case when you need to call a function, and based on the result of the function, and some other information, do some work. That work require additional information, that can only be computed by calling the function.
Sounds complex? Let us look at the actual code, I tried to make it simple, but keep the same spirit.
This code has three different code paths. We are inserting a value that is already in the tree, so we update it and return. We insert a new value, but the page has enough space, so we add it and return.
Or, the interesting case for us, we need to split the page, which requires modifying the parent page (which may require split that page, etc). So the question is, how do we get the parent page?
This is important because we already found the parent page, we had to go through it to find the actual page we returned in FindPageFor. So ideally we’ll just return the list of parent pages whenever we call FindPageFor.
The problem is that in all read scenarios, which by far outweigh the write operations, never need this value. What is more, even write operations only need it occasionally. On the other hand, when we do need it, we already did all the work needed to just give it to you, and it would be a waste to do it all over again.
We started the process (this particular story spans 4 years or so) with adding an optional parameter. If you passed a not null value, we’ll fill it with the relevant details. So far, so good, reads will send it null, but all the writes had to send it, and only a few of them had to actually use it.
The next step was to move the code to use an out lambda. In other words, when you called us, we’ll also give you a delegate that you can call, which will give you the details you need. This way, you’ll only need to pay the price when you actually needed it.
It looked like this:
However, that let to a different problem, while we only paid the price of calling the lambda when we needed it, we still paid the price of allocating the labmda itself.
The next step was to create two overloads, one which is used only for reads, and one for writes. The writes one allows us to send a struct out parameter. The key here is that because it is a struct there is no allocation, and the method is written so we put the values that were previously captured on the lambda on the struct. Then if we need to actually generate the value, we do that, but have no additional allocations.
And this post is now about six times longer than the actual optimization itself.
Comments
C# 7 allows you to go from:
to:
A more sneaky but performant way is to to put the current page into a ThreadLocal storage value which can access if you need to. You need to set the value in FindPageFor all times but it should have simlar costs although it creates some hidden dependencies. ThreadLocal<T> is much slower than the [ThreadStatic] attribute just in case you try it.
@Alois In this particular scenario, even trying to access a ThreadLocal is much slower than the actual allocation. This code-path executes in the hundreds of nanoseconds scale.
Comment preview