Using the .NET JIT to reduce abstraction overhead
I ran into this recently and thought the technique would make a great post. We use it extensively inside RavenDB to reduce the overhead of abstractions without limiting our capabilities. It is probably best to start with an example. We need to perform some action that must be specialized by the caller.
For example, let’s imagine that we want to aggregate the result of calling multiple services for a certain task. Consider the following code:
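A minimal sketch of what such a method might look like. The names here (LambdaAggregator, ExecuteAsync, SendAsync) are hypothetical, and SendAsync is a stand-in for a real network call so that the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LambdaAggregator
{
    // Assumption: stand-in for a real network call. It just returns a number
    // derived from its inputs so the example is deterministic.
    private static Task<int> SendAsync(string node, string request)
        => Task.FromResult(node.Length + request.Length);

    // Sends one request per node, waits for all replies, then merges them.
    // Request creation and merging are injected by the caller as delegates.
    public static async Task<TResult> ExecuteAsync<TResult>(
        IReadOnlyList<string> nodes,
        Func<string, string> createRequest,
        Func<int[], TResult> merge)
    {
        var tasks = new Task<int>[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            tasks[i] = SendAsync(nodes[i], createRequest(nodes[i]));
        }

        int[] responses = await Task.WhenAll(tasks);
        return merge(responses);
    }
}
```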
As you can see, the code above sends a single request to multiple locations and aggregates the results. The point is that we can separate the creation of the request (and all that this entails) from the actual logic for aggregating the results.
Here is a typical usage for this sort of code:
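A sketch of such a call site, under the same assumptions as before: LambdaAggregator, ExecuteAsync and SendAsync are hypothetical names, and the helper is restated in compact form so that the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LambdaAggregator
{
    // Assumption: stand-in for a real network call.
    private static Task<int> SendAsync(string node, string request)
        => Task.FromResult(node.Length + request.Length);

    public static async Task<TResult> ExecuteAsync<TResult>(
        IReadOnlyList<string> nodes,
        Func<string, string> createRequest,
        Func<int[], TResult> merge)
    {
        var tasks = new Task<int>[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
            tasks[i] = SendAsync(nodes[i], createRequest(nodes[i]));
        return merge(await Task.WhenAll(tasks));
    }
}

public static class UsageDemo
{
    public static Task<int> CountAsync(string query)
    {
        var nodes = new[] { "node-a", "node-b" };

        return LambdaAggregator.ExecuteAsync(
            nodes,
            // This lambda captures 'query', so a closure object and a
            // delegate are allocated on every call to CountAsync.
            node => "count:" + query,
            // This lambda captures nothing, but it is still invoked
            // through a delegate.
            responses =>
            {
                int total = 0;
                foreach (int r in responses) total += r;
                return total;
            });
    }
}
```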
Notice that the code is fairly simple and uses lambdas to inject the specialized behavior into the process.
That leads to a bunch of problems:
- Delegate / lambda invocation is more expensive than a direct method call.
- Lambdas need to be allocated.
- They capture state (and may capture more and for a lot longer than you would expect).
In short, when I look at this, I see performance issues down the road. But it turns out that I can write very similar code, without any of those issues, like this:
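A minimal sketch of the struct-constrained variant. The comments mention an IMergedOperation interface and a TMerger merger parameter, but the members are not shown, so the interface shape, the SumLengths struct and the SendAsync stand-in below are all assumptions:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shape: the real interface members are not shown in the post.
public interface IMergedOperation<TResult>
{
    string CreateRequest(string node);
    TResult Merge(int[] responses);
}

// The state the operation needs is an explicit field, not a hidden closure.
public readonly struct SumLengths : IMergedOperation<int>
{
    private readonly string _payload;

    public SumLengths(string payload) => _payload = payload;

    public string CreateRequest(string node) => _payload;

    public int Merge(int[] responses)
    {
        int total = 0;
        foreach (int r in responses) total += r;
        return total;
    }
}

public static class StructAggregator
{
    // Assumption: stand-in for a real network call.
    private static Task<int> SendAsync(string node, string request)
        => Task.FromResult(node.Length + request.Length);

    // The 'struct' constraint means TMerger is always a value type, so the
    // JIT compiles this method separately for each TMerger and can call
    // (or inline) CreateRequest and Merge directly.
    public static async Task<TResult> ExecuteAsync<TMerger, TResult>(
        IReadOnlyList<string> nodes, TMerger merger)
        where TMerger : struct, IMergedOperation<TResult>
    {
        var tasks = new Task<int>[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
            tasks[i] = SendAsync(nodes[i], merger.CreateRequest(nodes[i]));

        int[] responses = await Task.WhenAll(tasks);
        return merger.Merge(responses);
    }
}
```

A call then looks like StructAggregator.ExecuteAsync<SumLengths, int>(nodes, new SumLengths("ping")). The type arguments have to be spelled out because C# cannot infer TResult from the constraint alone.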
Here, instead of passing lambdas, we pass an interface. On its own, an interface call has the same cost as a lambda. However, in this case we also specify that the interface must be implemented by a struct (value type). That leads to really interesting behavior: the JIT compiles a separate version of a generic method for each value type argument, so at JIT time the system knows the exact type and that there is no abstraction here. It can then apply optimizations such as inlining, or call the method directly (with no abstraction overhead). It also means that any state we capture is captured explicitly (and we won’t be tainted by other lambdas in the method).
We still have a good separation between the process we run and the way we specialize it, but without any runtime overhead. The code itself is a bit more verbose, but not too onerous.
Do you have any benchmarks for comparison between the two approaches?
@Kuan: this is not true, I am afraid. The tasks array might have more elements than required, and the extra ones will be null (in the better case) or even some random value (the documentation https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1.rent?view=net-6.0 states that "The array returned by this method may not be zero-initialized.").
Hi, I think that the parameter (TMerger merger) is not necessary.
A local variable declared in the method:
var merger = default(TMerger);
should do the same trick? And it will not be accidentally boxed.
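A small sketch of that suggestion, using hypothetical types (IMerger, AddMerger, MaxMerger, Reducer) that are not from the post. It only works when the struct needs no state, since default(TMerger) leaves every field zeroed:

```csharp
using System;

public interface IMerger
{
    int Merge(int left, int right);
}

public struct AddMerger : IMerger
{
    public int Merge(int left, int right) => left + right;
}

public struct MaxMerger : IMerger
{
    public int Merge(int left, int right) => left > right ? left : right;
}

public static class Reducer
{
    // The operation is chosen purely by the type argument; no instance is
    // passed in, so there is nothing that could be boxed by accident.
    public static int Reduce<TMerger>(int[] values)
        where TMerger : struct, IMerger
    {
        if (values.Length == 0)
            throw new ArgumentException("values must not be empty", nameof(values));

        var merger = default(TMerger);
        int accumulator = values[0];
        for (int i = 1; i < values.Length; i++)
        {
            accumulator = merger.Merge(accumulator, values[i]);
        }
        return accumulator;
    }
}
```

The trade-off is that a merger which needs per-call state (a key, a buffer, a counter) still has to be passed in as a parameter.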
Depending on exactly what is happening elsewhere, this can eliminate virtual calls and usually saves 3 - 5%. The reduction in allocations is also important.
The idea is that you can use the structure to pass arguments, explicitly controlling what is captured, unlike a lambda. And it will likely always be boxed, since this is
Oren, thanks for the answer. I don't see the implementation of IMergedOperation. Maybe the "capture the required state in the struct, which will be boxed" approach could be replaced by explicit arguments in the method signature. I am aware that this change is most likely not possible in your scenario, because the required arguments vary so much between call sites?
I use similar code. 1) I want to select the required operation at compile time (the poor C# developer's equivalent of compile-time polymorphism from C++). In the sample below I have replaced the virtual calls/delegates for the encrypt/decrypt operation with the TAeadOperation generic parameter. 2) TAeadOperation is a struct, a simple wrapper around the Encrypt/Decrypt operation. 3) So TAeadOperation is never boxed and has no state, and on many supported .NET platforms the overhead of the "aggressively inlined" method should be reduced to a direct call of the underlying API.
and non-virtual Transform method looks like this.
Erratum: use var aeadOperation = default(TAeadOperation); a new TAeadOperation() expression otherwise compiles, on many (all?) platforms, to an Activator.CreateInstance call.
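A rough sketch of the pattern described in this comment. The commenter's actual sample is not shown, so every name below (IAeadOperation, EncryptOperation, DecryptOperation, AeadTransformer) is hypothetical, and the byte arithmetic is a deliberately trivial stand-in for the underlying API, not real cryptography:

```csharp
using System;
using System.Runtime.CompilerServices;

public interface IAeadOperation
{
    void Apply(byte key, ReadOnlySpan<byte> input, Span<byte> output);
}

// Stateless wrapper: in real code this would forward to the encrypt API.
public struct EncryptOperation : IAeadOperation
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Apply(byte key, ReadOnlySpan<byte> input, Span<byte> output)
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = unchecked((byte)(input[i] + key));
    }
}

// Stateless wrapper: in real code this would forward to the decrypt API.
public struct DecryptOperation : IAeadOperation
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Apply(byte key, ReadOnlySpan<byte> input, Span<byte> output)
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = unchecked((byte)(input[i] - key));
    }
}

public static class AeadTransformer
{
    // Non-virtual: the operation is selected by the type argument.
    public static byte[] Transform<TAeadOperation>(byte key, byte[] input)
        where TAeadOperation : struct, IAeadOperation
    {
        // default(T) rather than new T(), as the erratum suggests.
        var aeadOperation = default(TAeadOperation);
        var output = new byte[input.Length];
        aeadOperation.Apply(key, input, output);
        return output;
    }
}
```

Whether the JIT actually inlines Apply depends on the runtime and method size; the attribute is a request, not a guarantee.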
Yes, that would be a great way to do that. Now, let's assume that you want to specialize the operation. So you have the key, but you want to have a separate key for userb. You need a way to pass that value in. That is not something that is generic to all values, that is a particular aead implementation. So you pass the value with the context key and can use that. There will not be any boxing involved unless you are using