High performance field clobbering
So we are using a particular library in a not-so-standard way, and in order to gain a 10x performance benefit we have to reuse a particular class from this library. This class isn’t meant to be reused, but looking at its code, it is clear that it is perfectly possible to do so. All we need is to set the _started field to false, and the instance can be reused.
So far, so good. Except that the field is private. Now, we can’t just implement our own copy of this, because this is a field in the base class that we are extending to plug our extension into the system. We could try submitting a patch for this, but this is a popular library, and tying ourselves to a particular version would suck. This code also hasn’t changed since January 2012, so it is pretty stable. And yes, we are aware of the risks in doing this: it is unsupported, etc.
Now that we decided to do it, the question is how. I created the following epic class:
And here is the simplest option to handle it:
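A minimal sketch of what the class and the simplest approach might look like. It assumes the class is the PrivateRyan mentioned in the comments and that it holds a private bool _started field; the IsStarted property and the UncachedReflection helper are hypothetical names added for illustration.

```csharp
using System;
using System.Reflection;

// Stand-in for the library class. Only the PrivateRyan name and the private
// _started field come from the post; everything else here is assumed.
public class PrivateRyan
{
    private bool _started = true;

    // Hypothetical accessor, only here so the result can be observed.
    public bool IsStarted => _started;
}

public static class UncachedReflection
{
    public static void SetStarted(PrivateRyan instance, bool value)
    {
        // Looks the private field up by name on every call, then writes it
        // through reflection. Simple, but the lookup is repeated each time.
        FieldInfo field = typeof(PrivateRyan).GetField(
            "_started", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(instance, value);
    }
}
```

Calling UncachedReflection.SetStarted(instance, false) pays for both the field lookup and the boxed reflection write on each call.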
Can we do better?
What happens if we cache the field lookup?
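A sketch of the cached variant, under the same assumptions as before (a PrivateRyan class with a private bool _started field; IsStarted and CachedReflection are hypothetical names). The FieldInfo is resolved once and kept in a static field, so each call only pays for SetValue.

```csharp
using System;
using System.Reflection;

// Assumed stand-in for the library class, as described in the post.
public class PrivateRyan
{
    private bool _started = true;
    public bool IsStarted => _started; // hypothetical, for observing the result
}

public static class CachedReflection
{
    // Resolved once, when the type is first used.
    private static readonly FieldInfo StartedField = typeof(PrivateRyan).GetField(
        "_started", BindingFlags.Instance | BindingFlags.NonPublic);

    public static void SetStarted(PrivateRyan instance, bool value)
    {
        // Still a reflection call (the value is boxed), but no name lookup.
        StartedField.SetValue(instance, value);
    }
}
```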
That is a significant improvement, right?
But that is still quite high for me. Can we do better still?
Let us try some dynamic code generation. In this case, we can’t use the much easier Expression class to do it, and have to go with direct IL generation, which gives:
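A sketch of how the IL generation could go, assuming the same PrivateRyan stand-in with a private bool _started field (IsStarted and EmittedSetter are hypothetical names). It uses System.Reflection.Emit.DynamicMethod with skipVisibility set to true, so the emitted stfld may touch the private field.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Assumed stand-in for the library class, as described in the post.
public class PrivateRyan
{
    private bool _started = true;
    public bool IsStarted => _started; // hypothetical, for observing the result
}

public static class EmittedSetter
{
    // Built once; invoking the delegate afterwards involves no reflection.
    public static readonly Action<PrivateRyan, bool> SetStarted = Build();

    private static Action<PrivateRyan, bool> Build()
    {
        FieldInfo field = typeof(PrivateRyan).GetField(
            "_started", BindingFlags.Instance | BindingFlags.NonPublic);

        var method = new DynamicMethod(
            "SetStarted",
            typeof(void),
            new[] { typeof(PrivateRyan), typeof(bool) },
            typeof(PrivateRyan), // owner type the method is associated with
            true);               // skipVisibility: allow access to private members

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);      // push the instance
        il.Emit(OpCodes.Ldarg_1);      // push the new value
        il.Emit(OpCodes.Stfld, field); // instance._started = value
        il.Emit(OpCodes.Ret);

        return (Action<PrivateRyan, bool>)method.CreateDelegate(
            typeof(Action<PrivateRyan, bool>));
    }
}
```

Usage is then just EmittedSetter.SetStarted(instance, false).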
And the benchmark result?
That is pretty awesome. For comparison purposes, I also did a static delegate and direct set, to compare the costs.
And those give me:
But I think that 2.5 ns is fast enough for me here.
Comments
Is the famous library named Lucene? If so, I know there are some efforts to get 4.8 out. Thanks a lot for sharing, this is very useful.
Is there a performance advantage in using DynamicMethod, vs. building a simple Expression tree and generating the lambda via Expression.Lambda<Action<PrivateRyan, bool>>(...).Compile()? (Not talking about the initial cost of generating the thing, but calling the resulting lambda at runtime.)
I usually use Expression Trees for dynamic code, because they are simpler to maintain than having to understand the equivalent IL code.
Christophe, you can do that, but it has limits. In particular, you typically cannot bypass visibility, and just setting a field cannot be done in an expression.
Are you sure? I use them all the time to access non-public fields, methods, ctors, and even to deserialize structs/classes with auto-properties where I need to access the private backing fields.
For example, here is one of my helper methods, specifically to build "setters" for Fields (ignore the fact that the action takes object for the instance and params, it would work the same for typed args)
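A sketch of what such a helper might look like; this is not the commenter's actual code. It assumes the PrivateRyan stand-in with a private bool _started field, and the ExpressionSetters.BuildFieldSetter name is made up. It builds an Action<object, object> from Expression.Assign over Expression.Field, and it is meant for fields declared on classes.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Assumed stand-in for the library class, as described in the post.
public class PrivateRyan
{
    private bool _started = true;
    public bool IsStarted => _started; // hypothetical, for observing the result
}

public static class ExpressionSetters
{
    // Builds (instance, value) => ((T)instance).field = (TField)value
    public static Action<object, object> BuildFieldSetter(FieldInfo field)
    {
        ParameterExpression instance = Expression.Parameter(typeof(object), "instance");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // Cast the untyped arguments, then assign to the (possibly private) field.
        BinaryExpression assign = Expression.Assign(
            Expression.Field(Expression.Convert(instance, field.DeclaringType), field),
            Expression.Convert(value, field.FieldType));

        return Expression.Lambda<Action<object, object>>(assign, instance, value).Compile();
    }
}
```

A setter for the field in the post would be created once with ExpressionSetters.BuildFieldSetter(typeof(PrivateRyan).GetField("_started", BindingFlags.Instance | BindingFlags.NonPublic)) and then reused.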
Christophe, You are correct, thanks! Funnily enough, your version is faster:
Not quite sure why, though.
Indeed, that is weird, because I thought that Compile() was just using an ExpressionTree Visitor that would Emit(..) the appropriate IL? Maybe the JIT is doing magic under the covers?
I also tried using Roslyn for codegen at runtime, because sometimes generating a string with the C# code that does what I want is a bit easier than cobbling together expression trees, especially when you need to build larger methods (like ones that deserialize all the fields of a struct in one go, instead of calling lots of smaller methods for each field). The startup cost of Roslyn is a lot higher, but if you are already using it for other things (like Razor views), it becomes "free". The only downside is that you can't do fancy stuff like calling into private/internal stuff.
A bit late, but I can explain why his code is faster. You see, Christophe's code is closed over an object (in this case the empty closure object). You can get the same result by creating a dynamic method that closes over an empty object. Delegate invocations are modelled as instance calls, so if you aren't closing over anything, the runtime has to go through a thunk that shuffles your parameters.
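A sketch of the closed-over variant described here, assuming the same PrivateRyan stand-in (IsStarted and ClosedEmittedSetter are hypothetical names). The dynamic method takes an extra leading object parameter, and DynamicMethod.CreateDelegate(Type, object) binds a dummy object to it, so the resulting delegate has a target. Whether this actually closes the gap would need to be measured.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Assumed stand-in for the library class, as described in the post.
public class PrivateRyan
{
    private bool _started = true;
    public bool IsStarted => _started; // hypothetical, for observing the result
}

public static class ClosedEmittedSetter
{
    public static readonly Action<PrivateRyan, bool> SetStarted = Build();

    private static Action<PrivateRyan, bool> Build()
    {
        FieldInfo field = typeof(PrivateRyan).GetField(
            "_started", BindingFlags.Instance | BindingFlags.NonPublic);

        var method = new DynamicMethod(
            "SetStartedClosed",
            typeof(void),
            // The first parameter is the bound target and is never used.
            new[] { typeof(object), typeof(PrivateRyan), typeof(bool) },
            typeof(PrivateRyan),
            true); // skipVisibility

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1);      // the instance (argument 0 is the dummy target)
        il.Emit(OpCodes.Ldarg_2);      // the new value
        il.Emit(OpCodes.Stfld, field); // instance._started = value
        il.Emit(OpCodes.Ret);

        // Binding a target makes this a closed delegate.
        return (Action<PrivateRyan, bool>)method.CreateDelegate(
            typeof(Action<PrivateRyan, bool>), new object());
    }
}
```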