Performance optimizations, managed code and leaky abstractions
I ran into this post from Jeff Atwood, talking about the performance difference between managed and unmanaged code:
> There were a lot of optimizations for this along the way, but the C++ version has soundly beaten the C# version. As expected, right?
>
> Well, yes, but with extenuating circumstances.
>
> So am I ashamed by my crushing defeat? Hardly. The managed code achieved a very good result for hardly any effort. To defeat the managed version, Raymond had to:
>
> - Write his own file/io stuff
> - Write his own string class
> - Write his own allocator
> - Write his own international mapping
>
> Of course he used available lower level libraries to do this, but that's still a lot of work. Can you call what's left an STL program? I don't think so, I think he kept the std::vector class which ultimately was never a problem and he kept the find function. Pretty much everything else is gone.
>
> So, yup, you can definitely beat the CLR. I think Raymond can make his program go even faster.
I find this interesting, because it isn’t really specific to C++. In my recent performance sprint for the profiler, I had to:
- Write my own paging system
- Write my own string parsing routines
- Write my own allocator
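To make “write my own allocator” concrete, here is a minimal bump (arena) allocator sketch. This is my illustration, not the profiler’s actual code; the names `Arena` and `allocate` are hypothetical. The appeal is that each allocation is just a pointer bump, and everything is released at once when the arena goes away:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal bump allocator sketch (hypothetical, for illustration only).
// Allocation is a pointer bump; all memory is freed together when the
// arena is destroyed, so there is no per-object free bookkeeping.
class Arena {
    std::vector<char> buffer;
    std::size_t used = 0;
public:
    explicit Arena(std::size_t capacity) : buffer(capacity) {}

    void* allocate(std::size_t size) {
        if (used + size > buffer.size())
            return nullptr; // out of space; a real arena would chain blocks
        void* p = &buffer[used];
        used += size;
        return p;
    }

    std::size_t bytes_used() const { return used; }
};
```

A real implementation would also handle alignment and grow by chaining blocks, but even this toy version shows why a purpose-built allocator can beat a general-purpose one for allocation-heavy workloads.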
For the most part, performance optimizations fall into four categories:
- Inefficient algorithms – bad big-O complexity, O(N^2) where O(N) would do, etc.
- Inefficient execution – not applying caching, doing too much work upfront, doing unneeded work.
- I/O Bound – the execution waits for a file, database, socket, etc.
- CPU Bound – it just takes a lot of calculations to get the result.
I can think of very few problems that are really CPU bound; they tend to be very specific and small, and those are just about the only ones that gain any real benefit from faster code. Of course, in pure math scenarios, which is pretty much where most CPU bound code resides, there isn’t much of a difference between the languages you can choose (assuming the language isn’t interpreted, at least, and can run directly on the CPU using native instructions). But as I said, those are pretty rare.
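As a toy illustration of what CPU bound code looks like, here is modular exponentiation by squaring, a pure arithmetic kernel (my example, not from the post). Once this compiles to native instructions, the choice of language has little left to add or remove; the cost is the multiplications in the loop:

```cpp
#include <cassert>
#include <cstdint>

// CPU bound toy example: compute (base ^ exp) % mod by repeated
// squaring. The work is dominated by the arithmetic in the loop,
// not by allocation, I/O, or library overhead.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp,
                      std::uint64_t mod) {
    std::uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)                      // odd bit set: multiply in
            result = (result * base) % mod;
        base = (base * base) % mod;       // square for the next bit
        exp >>= 1;
    }
    return result;
}
```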
In nearly all cases, you’ll find that the #1 cause of perf issues is I/O. Good I/O strategies (buffering, pre-loading, lazy loading, etc.) are usually specific to a scenario, but they are the ones that make the world of difference between poorly performing code and highly performing code. Caching can also make a huge difference, as can deferring work to when it is actually needed.
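One of those strategies, deferring work until it is actually needed, can be sketched with a tiny `Lazy<T>` wrapper (a hypothetical helper of my own, just to show the shape of the idea): the expensive factory runs only on first access, and the result is cached for every access after that.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Sketch of deferred work: the factory runs at most once, on first
// access, and the result is cached afterwards. Hypothetical helper,
// for illustration only.
template <typename T>
class Lazy {
    std::function<T()> factory;
    bool computed = false;
    T value{};
public:
    explicit Lazy(std::function<T()> f) : factory(std::move(f)) {}

    const T& get() {
        if (!computed) {        // the expensive work happens here, once
            value = factory();
            computed = true;
        }
        return value;
    }
};
```

The same pattern scales up from a single value to lazily loading whole files or database rows: pay the I/O cost only on the code paths that actually need the data.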
I intentionally kept “optimize the algorithm” for last, because while it can make a drastic performance difference, it is also the easiest to do, since there is so much information about it – assuming that you didn’t accidentally get yourself into O(N^2) or worse.
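A classic example of that accidental O(N^2) is duplicate detection with a nested scan, where a single pass over a hash set does the same job in (on average) O(N). This is my illustration, not from the post:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// The accidental O(N^2) version: for each item, scan all the rest.
bool has_duplicate_quadratic(const std::vector<int>& items) {
    for (std::size_t i = 0; i < items.size(); ++i)
        for (std::size_t j = i + 1; j < items.size(); ++j)
            if (items[i] == items[j])
                return true;
    return false;
}

// The textbook fix: one pass with a hash set, O(N) on average.
bool has_duplicate_linear(const std::vector<int>& items) {
    std::unordered_set<int> seen;
    for (int item : items)
        if (!seen.insert(item).second)  // insert failed => already seen
            return true;
    return false;
}
```

Both functions agree on every input; only the amount of work differs, and at a few million items the difference is the one your users notice.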