Raven’s Storage: Reading a Sorted String Table

time to read 4 min | 637 words

When reading a SST, we have to deal with values of potentially large sizes. I want to avoid loading anything into managed memory if I can possible avoid it. That, along with other considerations has led me to use memory mapped files as the underlying abstractions for reading from the table.

Along the way, I think that I made the following assumptions:

  • Work in little endian only.
  • Can work in 32 bits, but 64 bits are preferred.
  • Doesn’t put pressure on the .NET memory manager.

In particular, I am worried about users wanting to do store large values, or large number of keys.

As it turned out, .NET’s memory mapped files are a pretty good answer for what I wanted to do. Sure, it is a bit of a pain with regards to how to handle things like WinRT / Silverlight, etc. But I am mostly focused on server side for now. And I got some ideas on how to provide the mmap illusion on top regular streams for platforms that don’t support it.

The fact that SST is written by a single thread, and once it is written it is immutable has drastically simplified the codebase. Although I have to admit that looking at hex dump to figure out that I wrote to the wrong position is a bit of a bother, but more on that later. A lot of the code is basically the leveldb code, tweaked for .NET uses. One important difference that I made was with regards to the actual API.

  • Keys are assumed to be small. Most of the time, less than 2KB in size, and there are optimizations in place to take advantage of that. (It will still work with keys bigger than that, but will consume more memory).
  • In general, through the codebase I tried to put major emphasis on performance and memory use even at this early stage.
  • Values are not assumed to be small.

What does the last one mean?

Well, to start with, we aren’t going to map the entire file into our memory space, to start with, it might be big enough to start fragmenting our virtual address space, but mostly because there is no need. We always map just a single blokc at at time, and usually we never bother to read the values into managed memory, instead just accessing the data directly from the memory mapped file.

Here is an example of reading the key for an entry from the SST:


As you can see, we need to read just the key itself to memory, the value itself is not even touched. Also, there is some buffer management going on to make sure that we don’t need to re-allocate buffers as we are scanning through the table.

When you want to get a value, you call CreateValueStream, which gives you a Stream that you can work with. Here is how you use the API:


This is actually part of the internal codebase, we are actually storing data inside the SST that will later help us optimize things, but that is something I’ll touch on a later point in time.

Except for the slight worry that I am going to have to change the underlying approach from memory mapped files to streams if I need to run it outside the server/client, this is very cool.

Next on my list is to think on how to implement the memtable, again, without impacting too much on the managed memory.