Ayende @ Rahien

Refunds available at head office

Reviewing Lightning memory-mapped database library: A thoughtful hiatus

I thought that I would take a break from focusing on what the LMDB code is doing in favor of some observations about the code itself. Going into this codebase is like getting hit in the face with a shovel. Now, this might be my personal experience, as someone who has done a lot of managed code work in the past. But I used to be a pretty horrible C/C++ guy (the fact that I say C/C++ should tell you exactly what my level was).

But I don’t think that it was just that. Even beyond the fact that the code is C, and not C++ (which I am much more used to), there is a problem that only became clear to me well after I read the code for the millionth time. It grew. Looking at the way the code is structured, it looks like it was about as nice a C codebase as you can get (don’t get excited, that isn’t saying much). Over time, though, features were added, and the general structure of the codebase wasn’t adjusted to account for them.

I am talking about things like this:

image

image

image

There are actually 22 (!) ‘if(IS_LEAF(mp))’ references in the codebase.

Or what about this?

image

image

It looks like certain features (duplicate keys support, for example) were added that had a lot of implications for the code, but it wasn’t refactored accordingly. That makes it very hard to go through.

Comments

Anton, 08/06/2013 03:51 PM

Can this code repetition be a case of unwrapping what otherwise would have been function calls, making those things inline for performance reasons?

Ayende Rahien, 08/06/2013 07:14 PM

Anton, No, I don't believe it. There are other ways to do that, see: http://gcc.gnu.org/onlinedocs/gcc/Inline.html

Howard Chu, 08/06/2013 08:55 PM

Eh. Yes, cursor_next is mostly a mirror image of cursor_prev. Likewise cursor_first / cursor_last. But I made a conscious choice to keep them separate. I could easily have unified them but there would be additional branching and special casing going on.

As for the LEAF2 cases - it's cheaper to have the two lines of repeated code than to go thru the overhead of a function call.

Ayende Rahien, 08/07/2013 04:23 AM

Howard, you can use an inline function, which has the same cost as repeating the code but without the duplication. As it stands, if you need to change something, you have to search for all the places where this is happening.
