Ayende @ Rahien

It's a girl

Reviewing Lightning memory-mapped database library: Because, damn it!

I decided to resume my attempt to figure out the LMDB codebase. Okay, I did some more figuring out and I think that I know where I went wrong. The codebase appears to be trying small enough to fit into the CPU caches, in order to do that, they decided to make it hard enough to make sure it won’t fit into my cache. But I’m trying to work through that.

One of the things that I figured out was that I made a mistake, I kept thinking about databases in the same way you’ll see in RavenDB or SQL Server, as complete separate items. Instead, LMDB does something quite different. It basically uses a single file for everything. This is called the environment, and that can contains named dbs. If you have multiple databases, all of them are going to reside in the same file. The way it works, it creates a file and then map it into memory. On Windows, it actually allocate the space for the file upfront (which doesn’t appears to be necessary).

But the basic idea is that you need to specify upfront the size of the file you want, and that is the maximum data size you have. This can change, but only if you close the environment and start it up again with a higher value. This also explains why you have to specify the number of databases you want to have per environment.  Once you created the environment, everything else become an issue of just managing the pages. In particular, because LMDB only allows a single writer, it doesn’t really have to worry about concurrency issues. It has mapped all the data into memory, and then it is just a matter of creating the appropriate data structure.

A lot of the code is dedicated to managing pages of data, and now that I have gone through enough of the codebase to be sure that I sort of have a vague idea about what is going on I can still say that I think that this is way denser than it should. And I shudder to think what it would take to make any sort of change to this codebase.

Comments

Howard Chu
07/12/2013 04:35 PM by
Howard Chu

"On Windows, it actually allocate the space for the file upfront (which doesn’t appears to be necessary)."

Nothing we do in LMDB is unnecessary. The comment on that code explains why we do it. Check the Microsoft SDK docs for yourself.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537%28v=vs.85%29.aspx

"An attempt to map a file with a length of 0 (zero) fails with an error code of ERRORFILEINVALID. Applications should test for files with a length of 0 (zero) and reject those files."

re: code density - tight code is better than bloat.

Ayende Rahien
07/12/2013 04:37 PM by
Ayende Rahien

Howard, Actually, no. You can allocate 1 byte in the file, then move on normally. After which, you can continue normally, and it will behave in the same fashion.

Rafal
07/15/2013 09:54 AM by
Rafal

But Ayende, can you elaborate a bit more on why are you doing a 'review' of LMDB and what are your criteria? Do you want to use this library for something? Or modify it? Maybe you could start the review with describing what are LMDB's features, how good (or bad) is it's public API, how does it compare to LevelDB or BerkeleyDB, then what algorighms/data structures it uses. Code density or complexity is not a very interesting feature to look at without broader context. I suppose Howard would be more than happy to explain these things to us here if he wouldn't have to concentrate on minor/unimportant details ;)

Ayende Rahien
07/15/2013 12:28 PM by
Ayende Rahien

Rafal, I am reviewing LMDB because I want to know more about it. If you want to read more about LMDB itself, you are free to go and check the docs, etc. This post is about how I approach learning this project.

Saman
07/15/2013 03:58 PM by
Saman

Looking at the source I must say that I am happy not having to write C code, it's just unreadable and ugly.

Rafal
07/15/2013 08:10 PM by
Rafal

Ayende, without your post I wouldn't even know LMDB exists so you certainly got at least one person interested in it. I had a look at the code too (without trying to understand it) and it seemed quite similar to BerkeleyDB (the API, transactions, single-file databases, general usage patterns), so maybe it's worth checking how deep is this similarity?

Matt Warren
07/15/2013 09:01 PM by
Matt Warren

@Rafal

There's a really nice intro to LMDB here http://symas.com/mdb/20121107-LinuxCon-MDB.pdf

Ayende Rahien
07/17/2013 10:40 AM by
Ayende Rahien

Rafal, LMDB was modeled to be very close to BDB, since that is what the project used before hand.

Howard Chu
07/18/2013 10:52 PM by
Howard Chu

Howard, Actually, no. You can allocate 1 byte in the file, then move on normally. After which, you can continue normally, and it will behave in the same fashion.

Ayende, you're still missing something.

"If an application specifies a size for the file mapping object that is larger than the size of the actual named file on disk and if the page protection allows write access (that is, the flProtect parameter specifies PAGEREADWRITE or PAGEEXECUTE_READWRITE), then the file on disk is increased to match the specified size of the file mapping object. "

Windows will force the file to be fully allocated regardless. There's no point in trying to avoid it.

Howard Chu
07/18/2013 11:05 PM by
Howard Chu

Rafal, yes, LMDB was modeled after BDB since we've used BDB extensively for the past dozen+ years.

The original LDAPCon 2011 presentation examines BDB and other alternatives (and why they were unsuitable for OpenLDAP)

http://symas.com/mdb/20111010LDAPCon%20MDB.pdf http://symas.com/mdb/20111010LDAPCon-MDB.pdf

The newer presentation spends less time talking about BDB's shortcomings, but the earlier paper gives you more of our motivation.

Comments have been closed on this topic.