Stupid Micro Benchmarking: Proxy Performance
Let us take a look at this class:
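```csharp
using System.Runtime.CompilerServices;

public class Trivial
{
    // Non-virtual, empty: a candidate for JIT inlining.
    public void EmptyStandard() { }

    // Virtual: can be overridden by a subclass, which is what a class proxy relies on.
    public virtual void EmptyVirtual() { }

    // NoInlining forbids the JIT from inlining this method, so a real call is made.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void EmptyNoInline() { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public virtual void EmptyVirtualNoInline() { }
}
```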
Now let us see what effect using Dynamic Proxy has on performance. Here is the test rig:
```csharp
using System;
using System.Diagnostics;
using Castle.DynamicProxy;

class Program
{
    static void Main()
    {
        int count = 100000000;
        var trivial = new Trivial();

        // Direct calls on a plain Trivial instance.
        Stopwatch sp = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            trivial.EmptyStandard();
        Console.WriteLine("EmptyStandard: " + sp.ElapsedMilliseconds);

        sp.Reset(); sp.Start();
        for (int i = 0; i < count; i++)
            trivial.EmptyVirtual();
        Console.WriteLine("EmptyVirtual: " + sp.ElapsedMilliseconds);

        sp.Reset(); sp.Start();
        for (int i = 0; i < count; i++)
            trivial.EmptyNoInline();
        Console.WriteLine("EmptyNoInline: " + sp.ElapsedMilliseconds);

        sp.Reset(); sp.Start();
        for (int i = 0; i < count; i++)
            trivial.EmptyVirtualNoInline();
        Console.WriteLine("EmptyVirtualNoInline: " + sp.ElapsedMilliseconds);

        // The same virtual calls, now routed through a Dynamic Proxy subclass
        // (no interceptors passed). Proxy creation itself is not timed.
        trivial = (Trivial)new ProxyGenerator().CreateClassProxy(typeof(Trivial));

        sp.Reset(); sp.Start();
        for (int i = 0; i < count; i++)
            trivial.EmptyVirtual();
        Console.WriteLine("Proxy EmptyVirtual: " + sp.ElapsedMilliseconds);

        sp.Reset(); sp.Start();
        for (int i = 0; i < count; i++)
            trivial.EmptyVirtualNoInline();
        Console.WriteLine("Proxy EmptyVirtualNoInline: " + sp.ElapsedMilliseconds);
    }
}
```
The results (elapsed milliseconds) are:
EmptyStandard: 382
EmptyVirtual: 397
EmptyNoInline: 557
EmptyVirtualNoInline: 520
Proxy EmptyVirtual: 6628
Proxy EmptyVirtualNoInline: 6372
At first glance, it is horrible: using a proxy carries a perf penalty of more than 10x. But notice just how many times I am running the code: a hundred million times, just to get anything observable.
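To put the numbers in per-call terms, here is the plain arithmetic on the results above, assuming the loop overhead is about the same in every run so that it roughly cancels out:

```latex
\begin{aligned}
\text{Proxy EmptyVirtual:}\quad & \frac{6628\ \text{ms}}{10^{8}\ \text{calls}} \approx 66.3\ \text{ns/call} \\
\text{EmptyVirtual:}\quad & \frac{397\ \text{ms}}{10^{8}\ \text{calls}} \approx 4.0\ \text{ns/call} \\
\text{Overhead:}\quad & 66.3 - 4.0 \approx 62\ \text{ns/call}
  \;\Rightarrow\; \frac{1\ \text{ms}}{62\ \text{ns}} \approx 16{,}000\ \text{proxied calls per extra millisecond}
\end{aligned}
```

In other words, code would have to make on the order of sixteen thousand proxied calls before the interception overhead adds up to a single millisecond.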
As usual, this micro benchmark basically means that I don't really care about such things :-)
Comments
While this is a non-issue for some developers, such as yourself, most can actually benefit from this post. This may be the first time they have encountered the concept of premature/micro optimization. For us (yes, me included) this is actually mind-bending information.
I used to believe that reflection was evil and couldn't understand why anyone would use it. Now I see there is a great benefit and little cost, or at least a much lower cost than I expected. This came from the belief that faster is better, and from just not understanding the tools of my trade.
So thank you for another great nugget of programming.
@Jason:
Saying that reflection has great benefit at little cost is not 100% correct. The reflection used in Dynamic Proxy is heavily optimized. Using reflection carelessly can still be quite costly. To continue the example above, try this naive chunk of reflection code and see how long it takes:
```csharp
using System.Reflection; // for BindingFlags

trivial.GetType().GetMethod("EmptyStandard", BindingFlags.Instance | BindingFlags.Public).Invoke(trivial, null);
```
Knowing how your code performs is always a good thing. Perhaps in the future you will need that extra millisecond, perhaps you will never need it. But at least now you know.
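A minimal sketch, using only the BCL, of why the naive line is slow and how the usual mitigations change the cost: looking the method up by name on every call, caching the `MethodInfo` once, or binding a delegate once with `Delegate.CreateDelegate`. `Trivial` here is a cut-down copy of the class above, and `ReflectionCost` and its method names are made up for illustration. No timings are claimed; they depend on the runtime and hardware, so run it yourself.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Cut-down copy of the Trivial class from the post.
public class Trivial
{
    public void EmptyStandard() { }
}

public static class ReflectionCost
{
    const BindingFlags Flags = BindingFlags.Instance | BindingFlags.Public;

    // Naive: look the method up by name on every call, then invoke it late-bound.
    public static long NaiveMs(Trivial target, int count)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            target.GetType().GetMethod("EmptyStandard", Flags).Invoke(target, null);
        return sw.ElapsedMilliseconds;
    }

    // Cached: do the lookup once, but still pay for MethodInfo.Invoke on every call.
    public static long CachedMethodInfoMs(Trivial target, int count)
    {
        MethodInfo method = typeof(Trivial).GetMethod("EmptyStandard", Flags);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            method.Invoke(target, null);
        return sw.ElapsedMilliseconds;
    }

    // Delegate: bind once; every call afterwards is an ordinary delegate invocation.
    public static long DelegateMs(Trivial target, int count)
    {
        MethodInfo method = typeof(Trivial).GetMethod("EmptyStandard", Flags);
        Action call = (Action)Delegate.CreateDelegate(typeof(Action), target, method);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            call();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int count = 1000000;
        var trivial = new Trivial();
        Console.WriteLine("Naive reflection: " + NaiveMs(trivial, count));
        Console.WriteLine("Cached MethodInfo: " + CachedMethodInfoMs(trivial, count));
        Console.WriteLine("Delegate: " + DelegateMs(trivial, count));
    }
}
```

The design point is that the expensive parts (the name lookup and the late-bound `Invoke`) can be paid once up front rather than per call, which is broadly the idea behind tools that generate code instead of reflecting on every invocation.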
Computer cycles cheap.
Programmer cycles expensive.
@Nathan I think that is the quote of the day!
Is this on 3.5? Just curious, because the CLR normally won't inline virtual calls, so the time difference is, umm, curious. Are you running in 32-bit or 64-bit? In 32-bit ...
when disassembled in release produces:
```
00DF0075 E8A21FB1FF call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)
00DF007A 8BC8 mov ecx,eax
00DF007C 8B01 mov eax,dword ptr [ecx]
00DF007E FF5038 call dword ptr [eax+38h]
00DF0081 E862755F78 call 793E75E8 (System.Console.get_In(), mdToken: 0600076e)
00DF0086 8BC8 mov ecx,eax
00DF0088 8B01 mov eax,dword ptr [ecx]
00DF008A FF5064 call dword ptr [eax+64h]
00DF008D C3 ret
```
00DF007C-00DF007E is the call to the virtual method (no inlining): load the method table from the object, then call through its slot at +38h.
Cheers,
Greg
This is on 3.5, 32-bit, no SP1.
Take into account that the JIT may rewrite the code if it notices a common path.
The runtime performance seems good, but to make the picture complete you should also measure first cold-start and first warm-start performance, to see how much infrastructure has to be loaded from disk before anything visible happens.
It really depends how stringent your startup performance goals are.
Yours,
Alois Kraus
Computer cycles cheap.
Programmer cycles expensive.
Ten thousand paying users who have to wait a second are more expensive.
It might be worth checking how parameters affect the results, although very few parameter payloads are that large...
What is the reason for the performance difference?
I thought the proxy just inherits from the given class.
The actual performance PITA I have run into with DP is the speed of creating proxies, especially under the debugger.
Under the debugger, the assembly load is intercepted and extra work is done there, which is why it is slow.
The major difference is that in the DP case, we need to create an invocation, pass it to the interceptor, etc.
There is actual work going on, whereas the non-proxy case is just a nop (the method body is empty).
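To illustrate the kind of work described above, here is a hand-written sketch of what a class proxy does on each call. The names `ISketchInterceptor`, `SketchInvocation`, `TrivialProxySketch` and `CountingInterceptor` are hypothetical and deliberately not Castle's real API; the code Dynamic Proxy actually generates differs in its details. The point is only that the proxy does inherit from the class, but its override allocates an invocation object and walks the interceptor chain before reaching the (empty) base method.

```csharp
using System;

public class Trivial
{
    public virtual void EmptyVirtual() { }
}

// Hypothetical, simplified stand-ins for an interception API (not Castle's types).
public interface ISketchInterceptor
{
    void Intercept(SketchInvocation invocation);
}

public sealed class SketchInvocation
{
    private readonly ISketchInterceptor[] interceptors;
    private readonly Action target;
    private int index = -1;

    public SketchInvocation(ISketchInterceptor[] interceptors, Action target)
    {
        this.interceptors = interceptors;
        this.target = target;
    }

    // Hands control to the next interceptor, or to the real method once the chain is exhausted.
    public void Proceed()
    {
        index++;
        if (index < interceptors.Length)
            interceptors[index].Intercept(this);
        else
            target();
    }
}

// Hand-written stand-in for a generated class proxy: it inherits from Trivial and
// overrides the virtual method, but every call allocates an invocation and a delegate.
public class TrivialProxySketch : Trivial
{
    private readonly ISketchInterceptor[] interceptors;

    public TrivialProxySketch(params ISketchInterceptor[] interceptors)
    {
        this.interceptors = interceptors;
    }

    public override void EmptyVirtual()
    {
        // base.EmptyVirtual binds non-virtually to Trivial's implementation.
        var invocation = new SketchInvocation(interceptors, base.EmptyVirtual);
        invocation.Proceed();
    }
}

// Example interceptor that counts calls and passes them on.
public sealed class CountingInterceptor : ISketchInterceptor
{
    public int Calls;

    public void Intercept(SketchInvocation invocation)
    {
        Calls++;
        invocation.Proceed();
    }
}

public static class Program
{
    public static void Main()
    {
        var counter = new CountingInterceptor();
        Trivial trivial = new TrivialProxySketch(counter);
        for (int i = 0; i < 3; i++)
            trivial.EmptyVirtual();
        Console.WriteLine(counter.Calls); // prints 3
    }
}
```

Even with no interceptors at all, each call still pays for two allocations and a delegate call, which is the sort of per-call cost the benchmark is measuring.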