Ayende @ Rahien

Refunds available at head office

Some final notes about LMDB review

Okay, having gone through the LMDB codebase with a fine-toothed comb, I think that I can safely say that it is both a very impressive codebase and one that dearly needs some TLC. I’ll freely admit that I am by no means a C guy, and it is entirely possible that a lot of the issues that have been bugging me are standard C things. But I don’t think so. Methods that go on for hundreds of lines, duplicated code and a plethora of gotos are hardly the things that come to mind when I hear "good C code".

But beyond my issues with the code, the implementation is really quite brilliant. The way LMDB manages to pack so much functionality by not doing things is quite impressive. Interestingly, you couldn’t write this database even 5 years ago. LMDB relies on being able to map the db into memory, and up until x64 became prevalent, you just couldn’t do that for any db with a meaningful size. With x64 and the effectively unlimited address space we have (will I be laughing at my naivety in a few years?), that is no longer an issue.
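To make the memory-mapping idea concrete, here is a minimal sketch in Python. It is not LMDB's actual code (LMDB is written in C), and the fixed-size record layout and the helper names `write_records`, `read_record` and `demo` are made up purely for illustration. The point is only that once a file is mapped, reading a record means decoding bytes at an offset in the mapping, with the OS paging data in on demand, rather than issuing explicit reads.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout for this sketch: one unsigned 64-bit integer.
RECORD = struct.Struct("<Q")


def write_records(path, values):
    """Write the values to a plain file as fixed-size records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_record(path, index):
    """Map the file read-only and decode one record straight from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # No explicit read() call here: the bytes are decoded directly
            # from the mapped region at the record's offset.
            (value,) = RECORD.unpack_from(view, index * RECORD.size)
            return value


def demo():
    """Write three records to a temporary file and read the last one back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [10, 20, 30])
        return read_record(path, 2)
    finally:
        os.remove(path)
```

The mapping needs a contiguous range of address space as large as the file, which is why a 32-bit process cannot map a database of any meaningful size this way.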

I learned quite a lot from the project, and it has been a frustrating, annoying and fascinating experience.

Comments

Howard Chu
08/22/2013 11:44 AM

Given what I've seen about the common wisdom of good code, I'd say most of it is garbage. Good code is code that yields the correct result using the fewest resources. In particular, your notion about gotos:

http://www.u.arizona.edu/~rubinson/copyrightviolations/GoToConsideredHarmful.html

It's an interesting argument, but it all comes crashing down in the very last sentence:

"In [2] Guiseppe Jacopini seems to have proved the (logical) superfluousness of the go to statement. The exercise to translate an arbitrary flow diagram more or less mechanically into a jump-less one, however, is not to be recommended. Then the resulting flow diagram cannot be expected to be more transparent than the original one."

Fundamentally all of those structured programming constructs are just fancy dressing for gotos. When using them gives a compact representation, use them. When using them just complicates the apparent flow, use a goto. It's that simple.

Howard Chu
08/22/2013 11:47 AM

Your comment formatter ate that URL. This is a pretty good discussion on the topic. http://c2.com/cgi/wiki?GotoConsideredHarmful

Howard Chu
08/22/2013 12:55 PM

Final comment on your final notes - thanks for taking the time to study the code and write such a comprehensive review. Open source has been an established phenomenon for many years now and yet very few people actually take the time to read the source of the code they're working with. (Indeed, very few people take the time to read and study much of anything, mostly they just skim. Getting the Cliff's Notes version of everything is no way to go through life.) I appreciate the time and respect the effort you spent on this.

Ayende Rahien
08/22/2013 01:49 PM

Howard, I would like to thank you for creating such an interesting project. The codebase has been quite fascinating, and our discussions have been eye-opening.

Judah Gabriel Himango
08/22/2013 05:51 PM

Just wanted to chime in and say, this has been a great review series.

I loved the honest criticisms, the banter between Howard and Ayende, and seeing the vast differences in viewpoint between a C developer and a .NET developer. The candor between you guys was great.

I learned some things along the way as well.

jjclockwise
08/23/2013 06:28 AM

Guys, thank you all. I have read this whole series and the comments as well, and didn't understand anything. This is no surprise though. DBs are way out of my field of expertise.

Matt Warren
08/23/2013 08:19 AM

Oren, Howard

Thanks for this great series and the accompanying comments/discussion.

I've just about followed along, but I never would've had the time to work all this out for myself.

Rob Lyndon
08/23/2013 09:12 AM

JJ -- I hear you, brother. If you have time to look at the codebase, I would highly recommend it. Start with the simple pieces that you can understand, hold on to your patience and humility, and you'll be surprised at how quickly your knowledge deepens. I myself am still very much scratching at the surface of this topic, but the combination of the codebases, these blog posts, and the contributions from Howard, Ayende, Kelly, Matt and others is as good as any university course on the market.

Matt Warren
08/23/2013 09:31 AM

@Howard

With regards to the "common wisdom of good code", I think it depends on what type of software you are writing. For you, code size absolutely matters more than readability; things like fitting code into the CPU cache and minimising the overhead of extra classes/functions are definitely a priority. But I would say that the number of people writing that type of code is a small percentage.

However, for the large majority of .NET devs (and maybe devs in general) writing L.O.B.-style apps, these things are less of an issue. Things like making the app maintainable for other developers and getting the app out as quickly as possible (whilst still having it work) are more of a priority. That's why I think code readability and best practices mean more.

For most of us, ensuring that our code fits into a CPU cache is miles away from what we have to worry about. By the time the .NET runtime is loaded up, we've done some XML/JSON serialisation, and MVC has kicked in and done loads of magic reflection etc., it doesn't really matter.

Rob Lyndon
08/23/2013 09:47 AM

How about this for an interview question? Show someone this page, and say "Discuss".

http://www.codethinked.com/ten-c-keywords-that-you-shouldne28099t-be-using

Matt Warren
08/23/2013 09:59 AM

@Rob,

I think the title is a bit misleading, but the quote in the middle is better:

"there are many features in C# that I think the average developer just shouldn’t be using unless they have a very good reason to do so"

This is about right: you need a good reason to use those keywords. But once you understand them, their usage is justified. For instance, volatile is needed if you want to write multi-threaded code that doesn't use "lock" everywhere, but you have to understand the .NET memory model to use it properly. Unsafe is absolutely needed if you want to work with memory directly, rather than passing round copies of byte[] all the time.

Howard Chu
08/23/2013 11:38 AM

@Matt - Given two apps that accomplish the same function at the same monetary price, I'll take the more efficient one, regardless of what level or line of business the app is for. This one will let me do more other things with the same compute resources I already have. Efficiency always matters; it's the difference between getting everything you need done, or nothing at all.

"If you don't have the time to do it right, when are you ever going to have the time to go back and fix it?" Getting the app out as quick as possible is what stupid managers worry about. Smart managers focus on getting it right, first and foremost.

"By the time blah blah blah has kicked in it doesn't really matter" - saying that efficiency doesn't matter because of all the inefficient layers you plan to use is not a valid argument. Indeed, it is a strong indictment of all of those bloated frameworks - they should be written well enough that you do see a difference. Computers are supposed to be fast - that is their sole reason for existence. You get folks out there saying "our software is super-fast" and people buy into it because they've never seen what truly efficient software can do, they've never seen the true power of the computer systems they already own. Heck, today's CPUs are thousands of times faster than a few years ago and yet it still takes on the order of minutes just to get thru a BIOS boot sequence. To me this is intolerable. Time is not money, time is more than money. You shouldn't accept such slowness from the systems you use.

And if we're talking business/financial apps - you hear about big stock/commodity trading houses vying to have their offices located closest to the head-end of their fiber-optic internet provider, because it shaves 3ns off their network RTT and gets their trades booked 3ns ahead of their competitors. Seems like wasted effort if all of their apps are just running inside a JVM or some other managed environment in the end, doesn't it?

Stefan Forsberg
08/23/2013 12:08 PM

Thanks a lot for this series. Even though I, like jjclockwise, didn't grasp everything that was said, it's been a very enjoyable read.

Ayende Rahien
08/23/2013 12:11 PM

Rob, I would disagree with pretty much all of those (except maybe sealed :-)). You can't do interop without all of those.

Matt Warren
08/23/2013 07:55 PM

@Howard

I wasn't talking about efficiency in general, I was merely talking about the case where you were trading off code readability to ensure that your code fits into the CPU L1 cache. I wasn't saying that inefficient abstraction layers should be excused; in fact, I know that the .NET runtime does quite a bit of work to ensure things are as efficient as possible. I was merely saying that in a managed environment, worrying about your code fitting into a CPU cache is probably a very premature optimisation. A lot of apps will have a long list of items that are much more expensive and would need to be optimised first, such as serialisation, network access, front-end code (if a web app), memory usage etc.

I absolutely agree that programs should be efficient and perform well, but I think that for the majority of developers, good performance means fixing things at a much higher level, such as minimising network calls, not allocating too much memory, using efficient algorithms, caching stuff etc. Nothing like the low-level things you have to worry about when writing a data-layer such as LMDB.

I'm a performance nut myself and I have some understanding about things like CPU caches, false-sharing, cache-friendly memory access patterns etc. But I've only ever had to deal with those when working on RavenDB (although I'd love a day job where these things matter).

BTW, with respect to high speed trading systems and JVM/managed environments, take a look at the Disruptor (http://lmax-exchange.github.io/disruptor/) and Martin Thompson's work (Mechanical Sympathy Blog). It clearly shows that you can make managed languages perform at very high levels. However, you do have to do a bit of work, for instance to make sure the garbage collector doesn't slow things down, ensure false sharing isn't an issue, etc. Generally you are fighting a bit against the "managed" things that the run-time is normally trying to hide from you, but both the JVM and .NET allow this via things like unsafe, structs, volatile etc.

Aris
10/08/2013 10:59 AM

@Howard

"Nothing like the low-level things you have to worry about when writing a data-layer such as LMDB."

This argument doesn't hold for Linux developers.

Just google this topic and you will be enlightened.
