The cost of abstraction
Sasha commented on my perf post, and he mentioned the following:
But what if you're designing a messaging infrastructure for intra-application communication? It is no longer fine to hide it behind a mockable IQueue interface because that interface call is more expensive than the enqueue operation!
Now, I know that an interface call is more expensive than a non-virtual call. The machine needs to do more work. But exactly how much more work is there to do? I decided to find out.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    public static void Main(string[] args)
    {
        // warm up
        PerformOp(new List<string>(101));

        List<long> times = new List<long>();
        Stopwatch stopwatch = new Stopwatch();
        for (int i = 0; i < 50; i++)
        {
            stopwatch.Start();
            PerformOp(new List<string>(101));
            times.Add(stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();
        }
        Console.WriteLine(times.Average());
    }

    private static void PerformOp(IList<string> strings)
    {
        for (int i = 0; i < 100000000; i++)
        {
            strings.Add("item");
            if (strings.Count > 100)
                strings.Clear();
        }
    }
}
I ran this program first with PerformOp(List<string>) and then with PerformOp(IList<string>). In the first case, we are making a non-virtual method call, and in the second, we are going through the interface.
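For reference, the only change between the two runs is the parameter type of PerformOp. A sketch of the non-virtual variant (same body, concrete List<string> parameter):

private static void PerformOp(List<string> strings)
{
    for (int i = 0; i < 100000000; i++)
    {
        strings.Add("item");
        if (strings.Count > 100)
            strings.Clear();
    }
}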
Running this has produced the following results:
- IList<T> - 2786 milliseconds
- List<T> - 2380 milliseconds
Well, that is significant, right?
Not really. Remember that we ran a tight loop of a hundred million iterations.
Let us see individual call times:
- IList<T> - 0.0000278600 milliseconds
- List<T> - 0.0000238000 milliseconds
The difference between the two is 0.00000406 milliseconds. In other words, it is an insignificant fraction of a microsecond.
The difference between a non-virtual call and an interface call is significantly less than a microsecond.
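To make the arithmetic explicit, here is a back-of-the-envelope version of the same calculation (the variable names are illustrative only):

// 2786 ms - 2380 ms = 406 ms of extra time over the whole run
double totalDifferenceMs = 2786 - 2380;

// Spread over 100,000,000 calls: ~0.00000406 ms, roughly 4 nanoseconds per call
double perCallDifferenceMs = totalDifferenceMs / 100000000;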
I don't think that I am going to worry about that, thank you very much.
Comments
Oren, you have measured ONE trivial example and started making conclusions. Even in your example, if you change the code so that the actual interface implementation changes over time, the performance difference goes up to about 40% (60 vs 100 in my measurements).
Now assume you have a non-functional requirement saying that you must support 1,000,000 messages per second inserted into this queue. Would you still disregard the fact that using an interface is a BAD decision?
I guess what bothers me the most is that you're saying that since it takes less than a microsecond, it doesn't bother you at all. Any application has billions of operations taking just less than a microsecond. If ALL these operations were designed so carelessly, the impact would be substantial. That's the point I am trying (and apparently failing) to convey.
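For what it's worth, a rough sketch of the kind of change Sasha describes (my approximation, not his actual code) would be to vary the concrete implementation behind the interface at a single call site, so the runtime cannot keep dispatching to one known target:

// Requires System.Collections.Generic and System.Collections.ObjectModel.
// Two different IList<string> implementations behind the same call site.
IList<string>[] lists = { new List<string>(101), new Collection<string>() };

for (int i = 0; i < 100000000; i++)
{
    IList<string> strings = lists[i & 1];   // alternate implementations every iteration
    strings.Add("item");
    if (strings.Count > 100)
        strings.Clear();
}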
Sasha,
1,000,000 messages a second, assuming in memory and no locks being held, is _easy_.
For that matter, in such a scenario it would be memory allocation, not method calls, that would be the problem.
You are failing to convey this because you are assuming that all operations are made equal.
If I had to design a high scale system that never did any IO and performed everything in memory, and I ran into perf issues, I would say that you have a point. Even then, the hotspots would be where the money is, not spread across the system.
If you allow the 4/1000 of a microsecond benchmark to affect your thinking, you will end up with a single method for everything, because method calls also cost some non-zero amount of time.
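To illustrate the allocation point with a sketch (the Message type and the numbers are hypothetical, not a real messaging benchmark): pushing a million small reference-type messages per second means a million heap allocations per second, and that cost tends to dwarf any interface-dispatch overhead.

using System.Collections.Generic;

class Message
{
    public string Body;
}

class Producer
{
    // Enqueue a million messages; each new Message is a heap allocation, and that
    // allocation (plus the resulting GC pressure) is the dominant cost, not whether
    // the queue is reached through an interface.
    public static void Produce(Queue<Message> queue)
    {
        for (int i = 0; i < 1000000; i++)
        {
            queue.Enqueue(new Message { Body = "payload" });
            if (queue.Count > 1000)
                queue.Dequeue();
        }
    }
}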
Agreed. If spending some CPU cycles (we are getting more and more of them anyway) can save us some development effort, then they should be spent.
I used to worry a lot about delegates vs virtual methods vs unsafe vs interfaces when "training" NN+GA forecasting models (it almost made me learn C++ and asm), but even there it was not worth the effort in the long run.
BTW, I was amazed to hear that some people have even started setting up separate container scopes per request in web applications. This should have some performance penalty.
Sasha,
Please, do not forget about the 80/20 distribution applied to code performance: only 20% of code takes 80% of the execution time.
So it is not worth the effort to squeeze milliseconds out of the other 80%.
Rinat,
A container scope is very cheap; it is a hashtable somewhere for locating existing instances.
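A minimal sketch of what that hashtable amounts to (an illustration of the idea only, not the implementation of any particular container):

using System;
using System.Collections.Generic;

// Hypothetical per-request scope: resolving a service is a dictionary lookup,
// with a fallback to the parent container on the first miss.
class RequestScope
{
    private readonly Dictionary<Type, object> instances = new Dictionary<Type, object>();
    private readonly Func<Type, object> resolveFromParent;

    public RequestScope(Func<Type, object> resolveFromParent)
    {
        this.resolveFromParent = resolveFromParent;
    }

    public object Resolve(Type serviceType)
    {
        object instance;
        if (instances.TryGetValue(serviceType, out instance))
            return instance;                    // already created during this request

        instance = resolveFromParent(serviceType);
        instances[serviceType] = instance;      // cache it for the rest of the request
        return instance;
    }
}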
Hi,
IMHO the root argument of this conversation - perf vs correctness - is pretty flawed, because in my opinion there is no such thing as fast - just fast enough. But there is such a thing as correct. Why do I bring that up on this issue? Because using this example one can see that though the correctness will not change over time, the perf issue will!
What do I mean by that? Well, whenever I run into some perf issue early on that is related to CPU power (NOT IO/latency issues), I remind myself of a simple fact: my laptop, which I would not even consider fast by today's standards, packs more CPU power than the fastest supercomputer on earth did 8 to 10 years ago. So if I am working on a piece of software that will see production in half a year, chances are that money will bail me out of that particular perf issue. So I would rather strive to be correct and maintainable today and worry about performance issues later. Is that lazy and maybe dangerous? Sure - but wouldn't it hurt the overall ROI more if I spent days and weeks optimizing something that the future production system might not even see as a perf issue?
That's why I abandoned anticipatory CPU optimizations in favour of working harder on correctness and maintainability.
Now - remoting and tier design. That's a whole different issue, because while Moore's law applies to CPUs, it has never applied to bandwidth and latency. But I find that those issues are usually solved with better algorithms rather than micro-level optimizations (e.g. List vs IList).
Just me thinking out loud. ;)
Tobi
P.S.: Oren, your validation of URLs does not allow for the "~" character. But http://saftsack.fs.uni-bayreuth.de/~dun3/ (my blog’s URL) should be valid. ;-)
Lots of Russians here. Neat :)
Tobias,
There is just one issue with your approach: during development you may be working on an underpowered laptop, but in production, the amount of data and the scale you have to handle are far greater than the difference in speed between the servers.
I have to say that I really couldn't care less if the interface call is twice as costly as the non-interface call.
When talking about performance, developers tend to optimize things like loops and making a particular method call faster. Those things are mostly worthless (unless you are doing something stupid or building a latency-sensitive system--say, an airplane engine).
A 5% optimization is worthless unless we are talking about running that code on, say, 10 servers, where the 5% optimization makes a 50% difference.
The other stupid mistake people make with optimization is thinking in serial mode. When it comes down to it, don't optimize for the loop, optimize so you can run in parallel. Unless you are running in an environment where resources are greatly constrained (embedded systems), adding hardware is often cheaper than, say, draining two weeks of labor in performance optimizations (do the per hour $$ math against the cost of a server).
Don't make it faster, make it parallel--unless you have the size of economy working for you (ie..5% over 10 servers).
I just have to point out regarding my last comment--hardware resources are a commodity, developer time is not. Optimize appropriately.
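As a sketch of the "make it parallel" point (Parallel.For lives in System.Threading.Tasks; workItems and ProcessItem are placeholders, not anyone's actual code):

// Instead of shaving nanoseconds off each call, run the independent pieces
// of work across all available cores.
Parallel.For(0, workItems.Length, i =>
{
    ProcessItem(workItems[i]);   // placeholder for the real per-item work
});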
Whenever discussions of premature optimization come up, those that favor early optimization always seem to think that those who don't favor it do so out of a complete lack of caring for performance and/or laziness. On the contrary!
Many people are familiar with the concept of the "Last Responsible Moment". For those that aren't, a simple way to think of it is this: one would always prefer to make decisions with all possible decision-relevant information available at the time the decision is to be made. The way to do this is to wait to make that decision for as long as you can without further waiting being irresponsible. You don't sit around in the meantime, however; you simply move forward and work on things for which you have the most information available and can make the best decisions (and, of course, those things that simply must happen first due to sequencing issues).
The strange thing, however, is that while early optimization advocates are usually quite understanding of the Last Responsible Moment principle, they almost always selectively ignore LRM when discussing optimization. LRM applies to optimization just like it applies to everything else. You can't optimize until you know what truly needs to be optimized. Obviously, you shouldn't be blatantly wasteful in the meantime, but there are almost without exception far better places to apply early effort than to optimization.
This isn't to suggest that optimization is any less important, or that we know everything there is to know about the things we do sooner like build for correctness and maintainability. We simply have available more of the information related to correctness and maintainability decisions than we have information regarding performance decisions.
Hi Oren,
absolutely true.
Maybe I am wrong about that, but I tend to treat the issue you are talking about in your response (scaling up in production, going from 10000 to 1000000) as an algorithm problem (e.g. incorrect IO pattern, big O problems) rather than a performance problem (O(n) optimization).
But most likely I am wrong doing so. ;-)
Oren, I would not oversimplify this to a simple hash.
You still need to register newly resolved instances back in the request container (with the container/factory ownership), emit all the events, do some reflection, dispose all container-owned registrations at the end of the request container's life etc.
Additionally, having container per request adds one more level in the container lookup chain.
Rinat,
I am not sure what container you are talking about, but that is certainly not any that I know of.
Is the difference insignificant? No, actually in my case it is very significant. I run Monte Carlo simulations, and the "little" difference you talk about can actually add or remove hours or even days from my simulation runs.
Of course, what I am talking about is a very specific situation, and it begs the question: why not C++? Answer: if you code C# properly, you are almost at the speed of C++ without having to deal with the issues of C++.
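One way to keep both sides of this happy, as a hedged sketch (ISampler, UniformSampler, and SimulateCore are made-up names, not Christian's code): keep the interface at the API boundary, but make the hot inner loop generic over the concrete type, so that for struct implementations the JIT specializes the loop and calls the method directly instead of through interface dispatch.

interface ISampler
{
    double Next();
}

// A struct implementation; the JIT generates a specialized SimulateCore<UniformSampler>,
// so Next() becomes a direct (and potentially inlined) call.
struct UniformSampler : ISampler
{
    private int seed;

    public double Next()
    {
        // placeholder pseudo-random step, not a real generator
        seed = unchecked(seed * 1103515245 + 12345);
        return (seed & 0x7fffffff) / (double)int.MaxValue;
    }
}

static class Simulation
{
    // The inner loop only sees TSampler; for struct samplers there is no interface dispatch.
    public static double SimulateCore<TSampler>(TSampler sampler, int iterations)
        where TSampler : ISampler
    {
        double sum = 0;
        for (int i = 0; i < iterations; i++)
            sum += sampler.Next();
        return sum;
    }
}

Called as Simulation.SimulateCore(new UniformSampler(), 100000000), the mockable ISampler interface is still there for tests, while the hot loop avoids the per-call interface overhead.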
Christian,
In how many places do you have this hotspot?
In most cases, those are very few.
More importantly, no one is denying that Christian's example involves a hotspot. Also, I doubt anyone would deny that, given previous experience implementing similar things (e.g. Monte Carlo), one can always apply a degree of that experience so as to avoid generating too much re-work during optimization. However, at the end of the day the fact of the matter is that in the vast majority of situations the vast majority of developers will not be able to predict the vast majority of hotspot locations.
To use the Monte Carlo example for a moment, unless you wrote Monte Carlo before and are writing it again, to believe that you can predict the location of hotspots (aside from fundamental issues, e.g. building distributed applications using chatty interfaces) in a given application is grandstanding at best and sheer delusion at worst. ;)