Ayende @ Rahien

It's a girl

Reviewing LevelDB: Part V, into the MemTables we go

You can read about the theory of Sorted Strings Tables and Memtables here. In this case, what I am interested in is going a bit deeper into the leveldb codebase, and understanding how the data is actually kept in memory and what is it doing there.

In order to do that, we are going to investigate MemTable. As it turned out, this is actually a very simple data structure. A MemTable just hold a SkipList, whish is a sorted data structure that allows O(log N) access and modifications. The interesting thing about Skip List in contrast to Binary Trees, is that it is much easier to create a performant solution of concurrent skip list (either with or without locks) over a concurrently binary tree.

The data in the table is just a list of key & value (or delete marker). And that means that searches through this can give you three results:

  • Here is the value for the key (exists)
  • The value for the key was remove (deleted)
  • The value is not in the memory table (missing)

It is the last part where we get involved with the more interesting aspect of LevelDB (and the reason it is called leveldb in the first place). The notion that you have multiple levels. The mem table is the first one, and then you spill the output out to disk (the Sorted Strings Table). Now that I figure out how simple MemTable is really is, I am going to take a look at the leveldb log, and then dive into Sorted Strings Table.

Comments

Duarte Nunes
03/27/2013 10:49 AM by
Duarte Nunes

The writes being serialized, they only need to worry about concurrent readers, right? Also, how is the trimming of the list done?

In the article you link to, the author states that "Periodically, the MemTable is flushed to disk as an SSTable", but that doesn't seem to be the case. First, because that would give a window where updates would be lost, and also because we saw in a previous post that first a write is applied to the log, and only then to the memtable. Am I missing something?

Matt Warren
03/27/2013 02:08 PM by
Matt Warren

@Duarte

The log and SSTables are 2 different files, see http://leveldb.googlecode.com/git/doc/impl.html for more info. Also the flusing happens in the background after a new log has been started, so no updates are lost (see "Level 0" in the link above.

Comments have been closed on this topic.