Reviewing LevelDBPart V, into the MemTables we go
You can read about the theory of Sorted Strings Tables and Memtables here. In this case, what I am interested in is going a bit deeper into the leveldb codebase, and understanding how the data is actually kept in memory and what is it doing there.
In order to do that, we are going to investigate MemTable. As it turned out, this is actually a very simple data structure. A MemTable just hold a SkipList, whish is a sorted data structure that allows O(log N) access and modifications. The interesting thing about Skip List in contrast to Binary Trees, is that it is much easier to create a performant solution of concurrent skip list (either with or without locks) over a concurrently binary tree.
The data in the table is just a list of key & value (or delete marker). And that means that searches through this can give you three results:
- Here is the value for the key (exists)
- The value for the key was remove (deleted)
- The value is not in the memory table (missing)
It is the last part where we get involved with the more interesting aspect of LevelDB (and the reason it is called leveldb in the first place). The notion that you have multiple levels. The mem table is the first one, and then you spill the output out to disk (the Sorted Strings Table). Now that I figure out how simple MemTable is really is, I am going to take a look at the leveldb log, and then dive into Sorted Strings Table.
More posts in "Reviewing LevelDB" series:
- (26 Apr 2013) Part XVIII–Summary
- (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
- (12 Apr 2013) Part XV–MemTables gets compacted too
- (11 Apr 2013) Part XVI–Recovery ain’t so tough?
- (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
- (09 Apr 2013) Part XIII–Smile, and here is your snapshot
- (08 Apr 2013) Part XII–Reading an SST
- (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
- (04 Apr 2013) Part X–table building is all fun and games until…
- (03 Apr 2013) Part IX- Compaction is the new black
- (02 Apr 2013) Part VIII–What are the levels all about?
- (29 Mar 2013) Part VII–The version is where the levels are
- (28 Mar 2013) Part VI, the Log is base for Atomicity
- (27 Mar 2013) Part V, into the MemTables we go
- (26 Mar 2013) Part IV
- (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
- (21 Mar 2013) Part II, Put some data on the disk, dude
- (20 Mar 2013) Part I, What is this all about?
The writes being serialized, they only need to worry about concurrent readers, right? Also, how is the trimming of the list done?
In the article you link to, the author states that "Periodically, the MemTable is flushed to disk as an SSTable", but that doesn't seem to be the case. First, because that would give a window where updates would be lost, and also because we saw in a previous post that first a write is applied to the log, and only then to the memtable. Am I missing something?
The log and SSTables are 2 different files, see http://leveldb.googlecode.com/git/doc/impl.html for more info. Also the flusing happens in the background after a new log has been started, so no updates are lost (see "Level 0" in the link above.