Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 5,953 | Comments: 44,409

filter by tags archive

Reviewing LevelDBPart XIII–Smile, and here is your snapshot

Okay, now that I know how data actually gets to the disk and from it, it is time to read how leveldb handles snapshots. Snapshots seems to be very simple on the surface. On every write, we generate a sequence number. We store the number locally and use the oldest still living snapshot as the oldest version that we have to retain when doing compaction work.

I had hard time figuring out how it worked out with regards to using this in memory. Consider the following code:

     leveldb::WriteOptions writeOptions;
     writeOptions.sync = true;
     db->Put(writeOptions, "test", "one");
     const leveldb::Snapshot* snapshot = db->GetSnapshot();
     db->Put(writeOptions, "test", "two");
    leveldb::ReadOptions readOptions;
    readOptions.snapshot = snapshot;
    std::string val;
    status = db->Get(readOptions, "test", &val);

This will properly give val == “one” when we execute it.

As it turned out, I missed something when I read the code earlier for MemTables. The value that is actually stored as the key is [key][tag]. And the tag is the sequence number + write type. And because of the way it is sorted (little endian, always), it means that higher values are sorted first. And what that means in turn is that unless you specify a specific snapshot number (which is what this tag contains, most of the time), you are going to get the latest version. But if you specify a snapshot number, you’ll get the value that was there as of that snapshot.

And that, in turn, means that we can write code like this:


Where key.memtable_key() contains the required snapshot value. So we can just skip all the ones larger than what we want.

That is really cool, but what about when we go to disk? Pretty much in the same way. The actual key value include the sequence & tag value. But the comparator knows to filter it out when needed. This is quite nice, and an elegant way to handle this situation.

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  8. (04 Apr 2013) Part X–table building is all fun and games until…
  9. (02 Apr 2013) Part VIII–What are the levels all about?
  10. (29 Mar 2013) Part VII–The version is where the levels are
  11. (28 Mar 2013) Part VI, the Log is base for Atomicity
  12. (27 Mar 2013) Part V, into the MemTables we go
  13. (26 Mar 2013) Part IV
  14. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  15. (21 Mar 2013) Part II, Put some data on the disk, dude



Wake me up when series finishes...


Thanks for the time you are taking with this excelent series.

For many people who in the past decade have become a bit rusty in their c++ and that consider LevelDB an interesting codebase incorporating some very nice ideas (and possibly wish there would be comparable, good and managed alternatives in the .NET world), dissecting it is not that trivial.

Comment preview

Comments have been closed on this topic.


No future posts left, oh my!


  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats