Reviewing Lightning memory-mapped database library: going deeper

Jul 24 2013

Okay, I now have a pretty rough idea about how the codebase actually works. I still think that the codebase is quite ugly. For example, take a look at this:

The len parameter for CreateFile determines whether to open or create the file, or just open it (read only). But why is it in a parameter called len?

Probably because it was just there, and it would be a sin to create another local variable just for this, I guess (in a codebase where a method had > 18 local variables!). To make things more interesting, in the rest of this method, len actually holds the length of a string, sigh.

At any rate, let us actually dig deeper now. The following structure is holding data about a db.

This is actually somewhat misleading, at least with regard to how I would think about a db. This is the entry point for all the pages that belong to a specific db. But a db in LMDB is not really the same thing as a db in SQL Server or RavenDB. They all reside in the same file, and you always have at least two. The first one is the free db, which is used to track all the free pages. The second one is the main db. Then you have additional, named databases.

This is used here:

This defines the metadata for the entire environment. Note that we have the two dbs there in mm_dbs. The mm_txnid denotes the last committed transaction id. This value is what gives LMDB its MVCC support. The mm_last_pg value is the last used page; any transaction that wants to write will start writing at that value.
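To make the roles of these fields concrete, here is a minimal sketch in C. It is not LMDB's actual definition: only the names mm_dbs, mm_txnid and mm_last_pg come from the text above, while the sketch_* types, the helper functions, and the choice to start allocating at the page right after mm_last_pg are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pgno_t;   /* page number (hypothetical typedef) */
typedef uint64_t txnid_t;  /* transaction id (hypothetical typedef) */

/* Per-db record, reduced here to the root page of its tree. */
typedef struct sketch_db {
    pgno_t root;
} sketch_db;

enum { FREE_DBI = 0, MAIN_DBI = 1 };

/* Environment-wide metadata, using the field names from the text. */
typedef struct sketch_meta {
    sketch_db mm_dbs[2];   /* [FREE_DBI] tracks free pages, [MAIN_DBI] is the main db */
    pgno_t    mm_last_pg;  /* last used page in the file */
    txnid_t   mm_txnid;    /* last committed transaction id */
} sketch_meta;

/* A transaction works from its own copy of the metadata. */
typedef struct sketch_txn {
    sketch_meta snapshot;  /* state as of the moment the transaction began */
    pgno_t      next_pgno; /* where a writer allocates its next page */
    txnid_t     id;
} sketch_txn;

/* A reader keeps seeing the state of the last commit, whatever writers do later. */
static sketch_txn begin_read(const sketch_meta *meta) {
    assert(meta != NULL);
    sketch_txn txn;
    txn.snapshot = *meta;
    txn.next_pgno = 0;               /* readers never allocate */
    txn.id = meta->mm_txnid;
    return txn;
}

/* Assumption: a writer allocates new pages right after the last used one. */
static sketch_txn begin_write(const sketch_meta *meta) {
    assert(meta != NULL);
    sketch_txn txn;
    txn.snapshot = *meta;
    txn.next_pgno = meta->mm_last_pg + 1;
    txn.id = meta->mm_txnid + 1;
    return txn;
}

static pgno_t alloc_page(sketch_txn *txn) {
    return txn->next_pgno++;
}

/* Committing publishes the new last page and the new transaction id. */
static void commit(sketch_meta *meta, const sketch_txn *txn) {
    assert(txn->next_pgno > 0);      /* only write transactions commit */
    *meta = txn->snapshot;
    meta->mm_last_pg = txn->next_pgno - 1;
    meta->mm_txnid = txn->id;
}
```

In this sketch, a reader that began before the commit still holds the old mm_txnid and mm_last_pg in its snapshot, which is the essence of the MVCC behavior described above.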

Let us see how we deal with pages here, shall we?

The first part tries to find a dirty page if we are in a read/write transaction and we haven’t specified that we can write directly to memory. This is done by doing a binary search on the list of dirty pages.

Otherwise, we can just hand the user the actual page by accessing it directly.
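The two paths described above can be sketched roughly as follows. This is an illustration under assumptions, not LMDB's code: the page_txn and dirty_entry types, their field names, and the layout of the dirty list as an array sorted by page number are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pgno_t;   /* page number (hypothetical typedef) */

/* One dirty-list entry: a page number and the transaction's modified copy. */
typedef struct dirty_entry {
    pgno_t pgno;
    void  *page;
} dirty_entry;

typedef struct page_txn {
    int          read_only;    /* non-zero for read transactions */
    int          write_map;    /* non-zero if writes go straight to the map */
    dirty_entry *dirty;        /* sorted by pgno, ascending */
    size_t       dirty_count;
    uint8_t     *map;          /* start of the memory-mapped file */
    size_t       page_size;
} page_txn;

/* Binary search of the sorted dirty list; NULL if the page is not dirty. */
static void *dirty_find(const page_txn *txn, pgno_t pgno) {
    size_t low = 0, high = txn->dirty_count;   /* half-open range [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (txn->dirty[mid].pgno == pgno)
            return txn->dirty[mid].page;
        if (txn->dirty[mid].pgno < pgno)
            low = mid + 1;
        else
            high = mid;
    }
    return NULL;
}

/* Return the dirty copy if this transaction has one, else the mapped page. */
static void *page_get(const page_txn *txn, pgno_t pgno) {
    assert(txn != NULL && txn->map != NULL);
    if (!txn->read_only && !txn->write_map) {
        void *p = dirty_find(txn, pgno);
        if (p != NULL)
            return p;
    }
    /* Direct access: the page lives at a fixed offset in the map. */
    return txn->map + pgno * txn->page_size;
}
```

A read transaction, or a write transaction that writes directly to the map, skips the dirty list entirely and always gets a pointer into the map.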

Next, let us look where this is actually used. Searching for a page with a specific key in it. This is done mostly in mdb_node_search.

This seems to be doing a binary search for the keys inside a specific page (in this case, the page that is at the top of the stack on the cursor). That leads to the conclusion that pages store their data internally as sorted arrays.
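As an illustration of what searching a sorted page looks like, here is a small stand-alone sketch. It is not mdb_node_search itself: the function name node_search, the use of NUL-terminated strings as keys, and the exact-match flag are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Binary search over the sorted keys of one page.
 * Returns the index of the first key >= target (nkeys if there is none)
 * and sets *exact to 1 only when the key at that index equals target.
 */
static size_t node_search(const char *const *keys, size_t nkeys,
                          const char *target, int *exact) {
    assert((keys != NULL || nkeys == 0) && target != NULL && exact != NULL);
    size_t low = 0, high = nkeys;              /* half-open range [low, high) */
    *exact = 0;
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        int rc = strcmp(target, keys[mid]);
        if (rc == 0) {
            *exact = 1;
            return mid;
        }
        if (rc > 0)
            low = mid + 1;                     /* target sorts after keys[mid] */
        else
            high = mid;                        /* target sorts before keys[mid] */
    }
    return low;                                /* insertion point for target */
}
```

Returning the insertion point on a miss is a common design choice for this kind of search, because the same index tells the caller where a new key would have to go to keep the array sorted.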

And this leads me to another pet peeve with this code base. Take a look at this line:

Sure, this is a well-known trick to cheaply halve a number, but are you seriously telling me that the compiler isn’t going to optimize (low + high) / 2? To my knowledge, no C compiler updated in the last 10 to 15 years has missed this optimization. So why write code that is harder to read?
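The line in question is not reproduced here, so the following assumes it is the usual shift-by-one midpoint. Under that assumption, this small sketch puts the shift form, the division form, and an overflow-safe variant side by side; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* The "trick": shift right by one bit. */
static int mid_shift(int low, int high) {
    return (low + high) >> 1;
}

/* The readable form. For non-negative operands it yields the same value. */
static int mid_divide(int low, int high) {
    return (low + high) / 2;
}

/*
 * Neither form above is safe when low + high exceeds INT_MAX.
 * Computing the offset first avoids the overflow for 0 <= low <= high.
 */
static int mid_safe(int low, int high) {
    assert(0 <= low && low <= high);
    return low + (high - low) / 2;
}
```

The two spellings only diverge for negative sums, where division truncates toward zero and a right shift of a negative value is implementation-defined; binary search indexes are never negative, so the readable form costs nothing there.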

Okay, so now we know how we search for a specific key inside a page, but how do we get to the actual page that we want to search? This happens in mdb_page_search_root. Let me see if I can take it apart.

When this method is called, the cursor is set up so that the first page on the pages stack is the root page.
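The walkthrough stops before getting into the body of this method, so the following is only a generic B+tree descent that starts from the same precondition, not a reconstruction of mdb_page_search_root. Every type and name in it is hypothetical, and it scans branch keys linearly to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_KEYS  4

typedef struct sketch_page {
    int                 is_leaf;
    size_t              nkeys;
    const char         *keys[MAX_KEYS];      /* sorted; on a branch, keys[0] is ignored */
    struct sketch_page *children[MAX_KEYS];  /* used on branch pages only */
} sketch_page;

typedef struct sketch_cursor {
    sketch_page *pages[MAX_DEPTH];  /* pages[0] is the root */
    size_t       top;               /* number of pages currently on the stack */
} sketch_cursor;

/*
 * Precondition: the stack holds exactly one page, the root.
 * Walks down to the leaf that may contain key, pushing each page visited,
 * so the cursor ends up recording the whole path from root to leaf.
 */
static sketch_page *search_root(sketch_cursor *cur, const char *key) {
    assert(cur != NULL && key != NULL && cur->top == 1);
    sketch_page *page = cur->pages[0];
    while (!page->is_leaf) {
        size_t i = 0;
        /* Pick the last child whose separator key is <= key; slot 0 has no separator. */
        while (i + 1 < page->nkeys && strcmp(page->keys[i + 1], key) <= 0)
            i++;
        assert(cur->top < MAX_DEPTH);
        page = page->children[i];
        cur->pages[cur->top++] = page;
    }
    return page;
}
```

Keeping the path on the cursor's stack means a later operation can move back up to a parent page without searching from the root again.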

And… that is enough for me. Up until now, I have been trying to just read the code. Not execute it, not debug through it, nothing. Just go over the code one line at a time and figure out what is going on. I actually think that I have a really good grasp of what is going on in there, but I think that this is pretty much all I can do at this point from just reading the code. So I am going to stop now and set up a debug environment so I can work with it, and report my findings from stepping through the code.

Comments

24 Jul 2013
18:42 PM

Jiggaboo

Using one variable for more than one purpose. If it were at least called x, y, or temp, but len? :) I think it would be a great series if you refactored this code so it looks the way it is supposed to look. This is not professional code. I wouldn't accept it from interns and wouldn't let them commit it to my repository.

24 Jul 2013
19:46 PM

Rafal

But how would this help? Would renaming some variables make you suddenly understand the code?

24 Jul 2013
19:47 PM

Howard Chu

git commit 30736a0ff5baf9159e02a0562ac7bd31ea128c3d

This falls into the "who cares" category. It works, and Windows simply isn't a priority.

24 Jul 2013
21:16 PM

Howard Chu

Good question Rafal. For that specific example, is there anyone who would read that code and not understand what it's doing? Even if it's not instantly obvious, it's a standard Windows system call, and all of the parameters are already thoroughly documented.

Part of being an effective programmer is not wasting time on the things that don't matter.

25 Jul 2013
03:01 AM

Alex Spence

If you are re-using variables all over the code, it would absolutely take anyone longer to understand what it's doing. I personally would spend more time saying "wtf!?", and likely even getting some of my team members to see examples of what not to do. I don't see how introducing a new local variable for its own purpose would reduce your effectiveness. What reuse will do is introduce potential bugs into your code when you can't count on that variable's state, since it may have been modified somewhere for some other purpose.

25 Jul 2013
10:31 AM

Frank Quednau

"Part of being an effective programmer is not wasting time on the things that don't matter."

You're assuming that there is a universal consensus of what the things that don't matter are.

You prove that there isn't.

In most books, reusing local variables is a pretty effective way to waste processor cycles in the brain. A great deal of programmers spend more time reading code than writing code. In these circumstances it is a pretty good thing if the code undergoes some work towards readability.

25 Jul 2013
13:55 PM

Jiggaboo

@Rafal: Renaming variables is the first step. Then extract functions. The highest-level functions should read like normal text: the Composed Method pattern.

25 Jul 2013
13:57 PM

Jiggaboo

@Howard Chu: You are totally wrong. When I look at that call I have no idea what the 15th parameter is for. When I see it is called len, I think it should hold a length, for example of the file to create. By writing code like that you are just wasting the time of everyone who will ever read that code. If I have to think more than 5-10 seconds about what code is doing, then for me this is not finished code. Same with comments. Good code doesn't need comments most of the time.

29 Jul 2013
12:26 PM

Howard Chu

Jiggaboo: if you think good code doesn't need comments you are obviously unqualified to be in this conversation. Come back after you've had an actual course in software engineering.

30 Jul 2013
18:22 PM

Daniel Moreira Yokoyama

Howard Chu: if you think good code does need comments you are obviously unqualified to be in this conversation. Come back after you've had an actual course in code quality.

30 Jul 2013
18:26 PM

Daniel Moreira Yokoyama

Code quality concerns more than just "working software"; it is about how to align effectiveness and readability, so that the code still works fine, is easy to maintain, is easy to read, and communicates intention. And this conversation is all about that.

31 Jul 2013
07:32 AM

Ayende Rahien

Daniel, Good code can still do things that are non obvious. For example, if I have a list of pages to write to disk, and I sort them first, then write them. Unless there is a comment there saying that we do that to avoid random seeks, it can be really hard to figure out at a later point in time.

31 Jul 2013
12:07 PM

Daniel Moreira Yokoyama

I see... the thing is, where the code can't communicate its intention by itself, you should add a comment to preserve the reader's comprehension... That is very different from writing code that is as unreadable as possible and making it hell to understand what your code really does.

Oren Eini

CEO of RavenDB
