Reviewing Lightning memory-mapped database library: tries++

architecture (615) rss
bugs (451) rss
challanges (123) rss
community (381) rss
databases (481) rss
design (896) rss
development (642) rss
hibernating-practices (71) rss
miscellaneous (592) rss
performance (397) rss
programming (1087) rss
raven (1456) rss
ravendb.net (540) rss
reviews (184) rss

2025
- July (6)
- June (7)
- May (10)
- April (10)
- March (10)
- February (7)
- January (12)
2024
- December (3)
- November (2)
- October (1)
- September (3)
- August (5)
- July (10)
- June (4)
- May (6)
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

RavenDB Workshops - Deep dive into practical use of Document Data Modeling

Jul 12 2013

Reviewing Lightning memory-mapped database librarytries++

time to read 4 min | 659 words

I decided to resume my attempt to figure out the LMDB codebase. The previous attempt ended when my eyes started bleeding a little, but I think that I can get a bit further now.

One thing that I figured out already is that because this is C code, memory management is really painful. I noted previous how good it was from leveldb to use std::string for most allocations, since it frees both caller and lib from having to worry about the memory. LMDB only shows how important that was. Take a look at the following method:

Quick, what do you think it does. As it turn out, what is does is update the key to point to the key that the cursor is currently pointing at. The reason for that is that the library doesn’t want to own the memory for key (since then it would have to provide a way to free it).

Now, to be fair, this appears to be an internal method, only used by the lib itself, but such things are pretty pervasive in the codebase, and it is really annoying that the code is all in one spot, with no real structure, since I am still trying to figure out some sort of hierarchy for the code so I could get a grip on what is actually going on.

I decided to just go to the start and figure out how it opens a database:

It is interesting that you need a transaction to open the database. However, the opened database survives the transaction closing, and from the docs, if you open a db in a transaction, that is the only thing you can do in that transaction.

I think that I just figured out something, the code in mdb.c is actually written in reverse depth order. So the closer to the end of the file a function is, the more visible it is.

Okay, it appears that there is the notion of multiple dbs, but there is also the notion of the main db. It looks like the main db is also used for tracking things for child dbs. In particular, when opening a named db, we have the following:

We search for the db information in the main db. As you can see, mdb_cursor_set is actually setting the data on the cursor to the value we passed in. More interesting is what happens if the db is not there. It appears to merely create the db data and set it in the main db. Okay, looking at this function docs, it appears that this is just about allocating a database handle, not about opening the actual database.

I am not really sure that I understand the reasoning behind it, but a new database is actually created in mdb_cursor_put. So only after the first time you create a value, will the db actually be created.

I guess that explains why this function goes on for over 400 lines of code and has no less than 10 goto(!) and 7 goto sections!

I get that this is C, but come on, seriously!

Tweet Share Share 7 comments

Tags:

Comments

12 Jul 2013
12:24 PM

Ayrton Gsz

You know i guess GOTOs arent bad at all if they are well explained even thought some GOTOs cannot even be understood with any kind of comment or description.

From what i see i can tell you that they use gotos to aggregate code and not merely to go back or go forward in time

12 Jul 2013
13:05 PM

Howard Chu

From your comments I take it you have probably never written in assembly language or machine language. Also you have too little experience with the C language to be giving valid opinions on C coding style.

The overall structure of the file is reverse-depth because that's the natural order for a language that requires definitions to occur before references. An experienced C coder wouldn't have given this a second thought, I was a little surprised it's been such a stumbling block for you but now that I'm told you're a .Net programmer I suppose that explains it.

Everything in LMDB is transactional. A transaction is required for dbi_open because creating a named database actually writes into the environment. If the transaction is aborted, the DB's creation will be reverted, same as you would expect anything else to be reverted in an abort. This is a key improvement over BerkeleyDB, where named DBs reside in individual filesystem files, and it's impossible to coordinate transaction events with filesystem events. (E.g., create a DB in a txn, commit, then someone outside deletes the file. txn is invalid but BDB can't figure it out.)

The B+tree structure in LMDB is essentially recursive. A data node in the main DB can be the DB record for a named DB. A data node in any of these DBs can be the DB record for a sorted-duplicate sub-DB. The DB record is only for bookkeeping of the data, it records the page number of the DB's root page and various pagecount stats. The root page is only allocated the first time you write to the DB. Why would you bother to allocate before then?

The internal comments already explain these things, perhaps you shouldn't have been so impatient in skipping over the first 2000 lines of comments in the file before beginning your analysis.

12 Jul 2013
13:24 PM

Howard Chu

re: mdb_update_key - no, that's not what it does. It does the exact opposite - it replaces the key that the cursor is pointing at with the given key. That should be obvious since "key" is marked as an input parameter. If it did what you describe, then key would be an output parameter.

12 Jul 2013
13:43 PM

Howard Chu

It might help you to read the docs from the user's perspective before trying to get into the code itself. http://fossies.org/dox/openldap-2.4.35/group__mdb.html

12 Jul 2013
14:58 PM

Ayende Rahien

Howard, I am most certainly not a C programmer. And I haven't done asm level development in a very long time. I get why you are doing reverse depth, my surprise is more about the fact that this is all in a single file than anything else. I am used to .h files that declare the functions, after which you can move them around with a great deal of freedom.

With regards to transaction to create the db. My surprise is more about the lifetime of the dbi. Usually, anything that requires a transaction can only live as long as that particular transaction.

I really like the tree impl, especially the way it works with the free db and the the free list. However, it was surprising to see that since I was thinking dbs in the sense of separate databases (including separate files). Instead, what you have is a lot close to different tables that reside in the same file.

12 Jul 2013
15:02 PM

Ayende Rahien

Howard, You are correct WRT mdb_update_key, I didn't read the code properly.

15 Jul 2013
16:07 PM

Saman

I mean seriously, why the heck should anybody use gotos.

Do human lives rely on the fact that this code has to use gotos? I want to see a fight with the project owners and Uncle Bob about this topic.

If this is the programming style in these C communities then I am happy not being part of them.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB

Reviewing Lightning memory-mapped database librarytries++

More posts in "Reviewing Lightning memory-mapped database library" series:

Comments

Comment preview

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed

Oren Eini

CEO of RavenDB

Related posts that you may find interesting:

More posts in "Reviewing Lightning memory-mapped database library" series:

Comments

Comment preview

Markdown formatting

Phrase Emphasis

Links

Images

Headers

Lists

Blockquotes

Horizontal Rules

Manual Line Breaks

Fenced Code Blocks

Header IDs

Tables

Definition Lists

Footnotes

Abbreviations

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication