Reviewing Lightning memory-mapped database library: Because, damn it!
I decided to resume my attempt to figure out the LMDB codebase, and after some more digging I think I know where I went wrong. The codebase appears to be trying to be small enough to fit into the CPU caches; in order to do that, they made it dense enough to make sure it won't fit into my cache. But I'm trying to work through that.
One of the things I figured out was that I had made a mistake: I kept thinking about databases the same way you would in RavenDB or SQL Server, as completely separate items. Instead, LMDB does something quite different. It basically uses a single file for everything. This is called the environment, and it can contain named databases. If you have multiple databases, all of them reside in the same file. The way it works, LMDB creates a file and then maps it into memory. On Windows, it actually allocates the space for the file upfront (which doesn't appear to be necessary).
But the basic idea is that you need to specify the size of the file upfront, and that is the maximum data size you can have. This can change, but only if you close the environment and start it up again with a higher value. This also explains why you have to specify the number of databases you want per environment. Once you have created the environment, everything else becomes a matter of just managing the pages. In particular, because LMDB only allows a single writer, it doesn't really have to worry about concurrency issues. It has mapped all the data into memory, and then it is just a matter of creating the appropriate data structures.
A lot of the code is dedicated to managing pages of data. Now that I have gone through enough of the codebase to have at least a vague idea of what is going on, I can still say that I think it is way denser than it should be. And I shudder to think what it would take to make any sort of change to this codebase.
More posts in "Reviewing Lightning memory-mapped database library" series:
- (08 Aug 2013) MVCC
- (07 Aug 2013) What about free pages?
- (06 Aug 2013) Transactions & commits
- (05 Aug 2013) A thoughtful hiatus
- (02 Aug 2013) On page splits and other painful things
- (30 Jul 2013) On disk data
- (25 Jul 2013) Stepping through make everything easier
- (24 Jul 2013) going deeper
- (15 Jul 2013) Because, damn it!
- (12 Jul 2013) tries++
- (09 Jul 2013) Partial
Comments
"On Windows, it actually allocate the space for the file upfront (which doesn’t appears to be necessary)."
Nothing we do in LMDB is unnecessary. The comment on that code explains why we do it. Check the Microsoft SDK docs for yourself.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537%28v=vs.85%29.aspx
"An attempt to map a file with a length of 0 (zero) fails with an error code of ERROR_FILE_INVALID. Applications should test for files with a length of 0 (zero) and reject those files."
re: code density - tight code is better than bloat.
Howard, actually, no. You can allocate a single byte in the file and then continue normally; it will behave in the same fashion.
But Ayende, can you elaborate a bit more on why you are doing a 'review' of LMDB and what your criteria are? Do you want to use this library for something? Or modify it? Maybe you could start the review by describing LMDB's features, how good (or bad) its public API is, how it compares to LevelDB or BerkeleyDB, and then what algorithms/data structures it uses. Code density or complexity is not a very interesting feature to look at without broader context. I suppose Howard would be more than happy to explain these things to us here if he wouldn't have to concentrate on minor/unimportant details ;)
Rafal, I am reviewing LMDB because I want to know more about it. If you want to read more about LMDB itself, you are free to go and check the docs, etc. This post is about how I approach learning this project.
Looking at the source, I must say I am happy not to have to write C code; it's just unreadable and ugly.
Ayende, without your post I wouldn't even know LMDB exists, so you certainly got at least one person interested in it. I had a look at the code too (without trying to understand it) and it seemed quite similar to BerkeleyDB (the API, transactions, single-file databases, general usage patterns), so maybe it's worth checking how deep this similarity goes?
@Rafal
There's a really nice intro to LMDB here http://symas.com/mdb/20121107-LinuxCon-MDB.pdf
Rafal, LMDB was modeled to be very close to BDB, since that is what the project used beforehand.
Ayende, you're still missing something.
"If an application specifies a size for the file mapping object that is larger than the size of the actual named file on disk and if the page protection allows write access (that is, the flProtect parameter specifies PAGE_READWRITE or PAGE_EXECUTE_READWRITE), then the file on disk is increased to match the specified size of the file mapping object. "
Windows will force the file to be fully allocated regardless. There's no point in trying to avoid it.
Rafal, yes, LMDB was modeled after BDB since we've used BDB extensively for the past dozen+ years.
The original LDAPCon 2011 presentation examines BDB and other alternatives (and why they were unsuitable for OpenLDAP).
http://symas.com/mdb/20111010LDAPCon%20MDB.pdf
The newer presentation spends less time talking about BDB's shortcomings, but the earlier paper gives you more of our motivation.