Performance optimizations, managed code and leaky abstractions
I ran into this post from Jeff Atwood, talking about the performance difference between managed and unmanaged code:
There were a lot of optimizations for this along the way, but the C++ version has soundly beaten the C# version. As expected, right?
Well, yes, but with extenuating circumstances.
So am I ashamed by my crushing defeat? Hardly. The managed code achieved a very good result for hardly any effort. To defeat the managed version, Raymond had to:
- Write his own file/io stuff
- Write his own string class
- Write his own allocator
- Write his own international mapping
Of course he used available lower level libraries to do this, but that's still a lot of work. Can you call what's left an STL program? I don't think so, I think he kept the std::vector class which ultimately was never a problem and he kept the find function. Pretty much everything else is gone.
So, yup, you can definitely beat the CLR. I think Raymond can make his program go even faster.
I find this interesting, because it isn’t really specific to C++. In my recent performance sprint for the profiler, I had to:
- Write my own paging system
- Write my own string parsing routines
- Write my own allocator
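To make the "write my own allocator" item concrete, here is a minimal sketch of one common kind of custom allocator, a bump (arena) allocator. It is an illustration only, not the allocator used in the profiler or in Raymond's program; the `Arena` class and its methods are hypothetical names, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical bump ("arena") allocator: carves allocations out of one big
// block and releases them all at once. Not thread-safe.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Returns `size` bytes aligned to `align` (must be a power of two),
    // or nullptr when the arena does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(buffer_.get()) + offset_;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;  // out of space
        }
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Frees everything in O(1); previously returned pointers become invalid.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

The appeal of this design is that an allocation is a few arithmetic operations and freeing is a single assignment, which is why hand-rolled allocators tend to show up in performance sprints like the ones described here.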
For the most part, performance optimizations fall into four categories:
- Inefficient algorithms – poor algorithmic complexity (big-O notation), etc.
- Inefficient execution – not applying caching, doing too much work upfront, doing unneeded work.
- I/O Bound – the execution waits for a file, database, socket, etc.
- CPU Bound – it just takes a lot of calculations to get the result.
I can think of very few problems that are really CPU bound; they tend to be very specific and small. And those are just about the only ones that will gain any real benefit from faster code. Of course, in pure math scenarios, which is where most CPU-bound code resides, there isn’t much of a difference between languages (assuming the one you choose is not interpreted, at least, and that it runs directly on the CPU using native instructions). But as I said, those are pretty rare.
In nearly all cases, you’ll find that the #1 cause for perf issues is IO. Good IO strategies (buffering, pre-loading, lazy loading, etc.) are usually applicable only to specific scenarios, but they are the ones that will make a world of difference between poorly performing code and highly performing code. Caching can also make a huge difference, as can deferring work until it is actually needed.
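As a rough illustration of the buffering point, the sketch below wraps a slow source in a small buffered reader and counts how often the underlying source is actually hit. `SlowSource` and `BufferedReader` are hypothetical names made up for this example; a real file, socket, or database call would take the place of `SlowSource::read`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a slow device; counts how many times it is hit.
struct SlowSource {
    std::string data;
    std::size_t pos = 0;
    std::size_t calls = 0;

    // Copies up to `n` bytes into `out`; returns the number of bytes copied.
    std::size_t read(char* out, std::size_t n) {
        ++calls;
        const std::size_t count = std::min(n, data.size() - pos);
        std::copy_n(data.data() + pos, count, out);
        pos += count;
        return count;
    }
};

// Serves single-byte reads from memory and refills from the source in bulk.
class BufferedReader {
public:
    BufferedReader(SlowSource& source, std::size_t block_size)
        : source_(source), buffer_(block_size) {
        assert(block_size > 0);
    }

    // Stores the next byte in `c`; returns false at end of input.
    bool next(char& c) {
        if (cursor_ == filled_) {
            filled_ = source_.read(buffer_.data(), buffer_.size());
            cursor_ = 0;
            if (filled_ == 0) {
                return false;
            }
        }
        c = buffer_[cursor_++];
        return true;
    }

private:
    SlowSource& source_;
    std::vector<char> buffer_;
    std::size_t cursor_ = 0;
    std::size_t filled_ = 0;
};
```

Reading 1,000 bytes one at a time through a 256-byte buffer touches the source five times (four refills plus the final empty read) instead of a thousand. The calling code stays the same; only the number of expensive round trips changes.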
I intentionally kept “optimize the algorithm” for last because, while it can make a drastic performance difference, it is also the easiest to do, since there is so much information about it, assuming that you didn’t accidentally get yourself into an O(N^2) or worse.
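For a sense of how easy it is to land in O(N^2) by accident, here is a small sketch of a duplicate check written two ways. The function names are hypothetical and chosen only for this example; both functions return the same answer, but the first compares every pair while the second does one hash-set insert per item (linear on average).

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Accidental O(N^2): compares every pair of items.
bool has_duplicate_quadratic(const std::vector<std::string>& items) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        for (std::size_t j = i + 1; j < items.size(); ++j) {
            if (items[i] == items[j]) {
                return true;
            }
        }
    }
    return false;
}

// O(N) on average: one hash-set insert per item.
bool has_duplicate_linear(const std::vector<std::string>& items) {
    std::unordered_set<std::string> seen;
    seen.reserve(items.size());
    for (const auto& item : items) {
        // insert() reports via `.second` whether the item was newly added.
        if (!seen.insert(item).second) {
            return true;
        }
    }
    return false;
}
```

On a handful of items the difference is invisible, which is exactly why the quadratic version tends to slip through until the input grows.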
Sounds very familiar. And indeed it is interesting that the tests don't evolve the C# version past the 2nd attempt. Like with the sprint you describe, I'm doing something very similar in protobuf-net at the moment, and if the scenario warrants it, you can of course always take the managed code to another level (for example, I'm currently rewriting part of the pb-net core to use yet more meta-programming, making the IL at execution minimal).
I wonder what the graphs would have looked like if the same level of rework had been done between C# / C++. I wouldn't necessarily expect C# to be faster than C++, but I suspect you could remove the word "soundly".
If that is C#/UTF8, I'd be interested to hear what you've done differently and what your results were. I'm currently using Encoding/Encoder etc, with byte/char and a custom interner - have you gone far from this?
This is my observation as well. At our company we have C#, Ajax and C++ clients for our web services. The C# client only involves 1 line of code to call our web services (since we're able to re-use our web service DataContracts), whereas for the C++ client we've had to generate thousands of lines of XML parsing code to parse the XML into our generated C++ model. What was interesting was that on the first cut the C++ client was nearly 1.5x slower than the C# client, and it was only after spending a lot of time doing heavy optimizations that we were able to get it on par with the C# client.
The lack of reflection and of the ability to generate 'custom cached code paths' means that for some tasks it will be extremely hard for C++ to have better performance than its C# equivalents. An example of this would be object serializers/deserializers (like Marc's protobuf-net): I'm able to use 'cached generic static classes' to embed a compiled delegate in the static constructor and avoid any runtime penalties of working out the right code path to use. Initial benchmarks are promising as I'm able to achieve a 6.72x and 10.18x perf increase over Microsoft's Xml and Json DataContract serializers, whilst only being ~1.92x slower than Marc's extremely fast binary protobuf-net serializer:
eh, Correction I mean a 3.5x and 5.3x perf improvement, Marc's protobuf-net serializer shows the 6.72x and 10.18x improvement.
@Demis - cute ;-p I'll have to re-double my protobuf-net efforts to keep an edge!
Also one thing that I find C/C++ seem to excel at over managed languages is rich-client GUI development.
I'm really impressed with the speed of native apps like chrome and spotify and have been hard pressed to find any examples of C#/.NET (or even Java) GUI apps with the same level of speed and responsiveness.
For the most part managed winforms/wpf/silverlight clients show good enough perf results to get the job done, but to me they always feel more sluggish than a well designed native C/C++ app.
It would be good if you did a blog post calling out for the best examples of .NET GUI clients that are available as I think it would be good to see what can be accomplished with a managed .NET GUI.
@Demis Bellot - about the only managed GUI app I use semi-frequently is paint.net, and it has a very slick gui - http://en.wikipedia.org/wiki/Paint.NET
No comment. I stopped reading Jeff's blog a year and a half ago because he seems like a 'Microsoft slave' to me.
If the C++ developer can't write version 6 the first time, then he is a bad developer. Also, I don't know what kind of C++ classes he used to get this performance; I hope he didn't use STL, because no company uses it.
Also, I guess people who use C++ use it when they have a critical project.
And as for .NET performance, this is really the biggest joke, because I don't use it in any critical project. I have already reimplemented a lot of .NET namespaces to improve performance and reduce memory consumption; on average my classes were faster by about 50%, and sometimes by a factor of 2x, and they reduced memory use by about 20~30%. These classes include: String, Parallelism, Collection, Serialization, Reflection.
Why do people even follow Jeff's blog, the guy is an idiot. Glorified writer, amateur programmer.
The other thing to keep in mind is that the average skilled C# developer would initially come up with something like iteration #1 as a solution, while a skilled C++ developer would initially come up with something like iteration #3.
It's true that with unmanaged code you can squeeze more out of it, that's a no-brainer, but in what percentage of real-world apps out there is there ever enough emphasis on squeezing out performance?
Companies look at bottom-line costs. Add up the hours spent refactoring and look at the cost of that performance. No one's going to give a rat's ass if you can shave off less than 0.1s if it's going to cost them 10+ hours to do it. So in the end, for a managed application written by competent developers, you're going to get something like Iteration #1 managed, or maybe a bit worse, for less $ than something like Iteration #3 or 4 of unmanaged C++. (And plenty of risk that you're going to get a few iteration #1 & 2's in there as well.)