Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

@

Posts: 5,949 | Comments: 44,546

filter by tags archive

Reviewing Lightning memory-mapped database libraryA thoughtful hiatus


I thought that I would stop a bit from focusing on what the LMDB code is doing in favor to some observations about the code itself. Going into this codebase it like getting hit in the face with a shovel. Now, this might be my personal experience, as someone who has done a lot of managed code work in the past. But I used to be a pretty horrible C/C++ guy (the fact that I say C/C++ should tell you exactly what my level was).

But I don’t think that it was just that. Even beyond the fact that the code is C, and not C++ (which I am much more used to), there is a problem that only become clear to me well after I read the code for the millionth time. It grew. Looking at the way the code is structured, it looks like it was about as nice a C codebase as you can get (don’t get excited, that isn’t saying much). But overtime, features were added, but the general structure of the codebase wasn’t adjusted to account for that.

I am talking about things like this:

image

image

image

There are actually 22 (!) ‘if(IS_LEAF(mp))’ references in the codebase.

Or what about this?

image

image

It looks like certain features (duplicate keys support, for example) was added that had a lot of implication on the code, but it wasn’t refactored accordingly. It make it very hard to go through.

More posts in "Reviewing Lightning memory-mapped database library" series:

  1. (08 Aug 2013) MVCC
  2. (07 Aug 2013) What about free pages?
  3. (06 Aug 2013) Transactions & commits
  4. (05 Aug 2013) A thoughtful hiatus
  5. (02 Aug 2013) On page splits and other painful things
  6. (30 Jul 2013) On disk data
  7. (25 Jul 2013) Stepping through make everything easier
  8. (24 Jul 2013) going deeper
  9. (15 Jul 2013) Because, damn it!
  10. (12 Jul 2013) tries++
  11. (09 Jul 2013) Partial

Comments

Anton

Can this code repetition be a case of unwrapping what otherwise would have been function calls, making those things inline for performance reasons?

Ayende Rahien

Anton, No, I don't believe it. There are other ways to do that, see: http://gcc.gnu.org/onlinedocs/gcc/Inline.html

Howard Chu

Eh. Yes, cursornext is mostly a mirror image of cursorprev. Likewise cursorfirst / cursorlast. But I made a conscious choice to keep them separate. I could easily have unified them but there would be additional branching and special casing going on.

As for the LEAF2 cases - it's cheaper to have the two lines of repeated code than to go thru the overhead of a function call.

Ayende Rahien

Howard, You can do inline function, which has the same cost of repeating the code and not have duplicate code. Now, if you need to change something, you have to search for all the places where this is happening.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. The RavenDB Comic Strip: Part III – High availability & sleeping soundly - 3 hours from now

There are posts all the way to May 28, 2015

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    20 May 2015 - Part II – a team in trouble!
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats