Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,551

filter by tags archive

Raven’s StorageMemtables are tough

time to read 3 min | 537 words

Memtables are conceptually a very simple thing. You have the list of values that you were provided, as well as a skip list for searches.


  • Memtables are meant to be used concurrently.
  • We are going to have to have to hold all of our values in memory. And I am really not sure that I want to be doing that.
  • When we switch between mem tables (and under write conditions, we might be doing that a lot), I want to immediately clear the used memory, not wait for the GC to kick in.

The first thing to do was to port the actual SkipList from the leveldb codebase. That isn’t really hard, but I had to make sure that assumptions made for the C++ memory model are valid for the CLR memory model. In particular, .NET doesn’t have AtomicPointer, but Volatile.Read / Volatile.Write are a good replacement, it seems. I decided to port the one from leveldb because I don’t know what assumptions other list implementations have made. That was the first step in order to create a memtable. The second was to decide where to actually store the data.

Here is the most important method for that part:

   public void Add(ulong seq, ItemType type, Slice key, Stream val)

The problem is that we cannot just reference this. We have to copy those values into memory that we control. Why is that? Because the use is free to change the Stream contents or the Slice’s array as soon as we return from this method. By the same token, we can’t just batch this stuff in memory, again, because of the LOH. The way this is handled in leveldb never made much sense to me, so I am going to drastically change that behavior.

In my implementation, I decided to do the following:

  • Copy the keys to our own buffer, and keep them inside the skip list. This is what we will use for actually doing searches.
  • Change the SkipList to keep track of values, as well as the key.
  • Keep the actual values in unmanaged memory, instead of managed memory. That avoid the whole LOH issue, and give me immediate control on when the memory is disposed.

This took some careful coding, because I want to explicitly give up on the GC for this. That means that I need to make damn sure that I don’t have bugs that would generate memory leak.

Each memtable would allocate 4MB of unmanaged memory, and would write the values to it. Note that you can write over 4MB (for example, by writing a very large value, or by writing a value whose length exceed the 4MB limit. At that point, we would allocate more unmanaged memory, and hand over the memory table to compaction.

The whole thing is pretty neat, even if I say so myself Smile.

More posts in "Raven’s Storage" series:

  1. (06 May 2013) Memtables are tough
  2. (03 May 2013) Understanding the SST file format
  3. (02 May 2013) Reading a Sorted String Table
  4. (29 Apr 2013) Building a Sorted String Table



Can you link to the source code in github? would be interesting to read.

Ayende Rahien

jgauffin, Sure https://github.com/ayende/temp.raven.storage


Since I've also been experimenting with a level db inspired c# storage engine recently, I am following a similar strategy with a little twist. The fixed size rotating journal/log allocates the unmanaged memory. This allows a "write to memory" model, where on flushes (e.g. end of a batched multi-writer write) this is flushed or fsynced to disk and added to the memtable, where the memtable just references the appropriate slices of unmanaged memory in the journal.

Ayende Rahien

Alex, That is exactly what I ended up doing :-)

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - 10 hours from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats