Ayende @ Rahien

It's a girl

Reviewing LevelDB: Part XIII–Smile, and here is your snapshot

Okay, now that I know how data actually gets to the disk and from it, it is time to read how leveldb handles snapshots. Snapshots seems to be very simple on the surface. On every write, we generate a sequence number. We store the number locally and use the oldest still living snapshot as the oldest version that we have to retain when doing compaction work.

I had hard time figuring out how it worked out with regards to using this in memory. Consider the following code:

     leveldb::WriteOptions writeOptions;
     writeOptions.sync = true;
     
     db->Put(writeOptions, "test", "one");
     
     const leveldb::Snapshot* snapshot = db->GetSnapshot();
     
     db->Put(writeOptions, "test", "two");
  
    leveldb::ReadOptions readOptions;
    readOptions.snapshot = snapshot;
    std::string val;
    status = db->Get(readOptions, "test", &val);

This will properly give val == “one” when we execute it.

As it turned out, I missed something when I read the code earlier for MemTables. The value that is actually stored as the key is [key][tag]. And the tag is the sequence number + write type. And because of the way it is sorted (little endian, always), it means that higher values are sorted first. And what that means in turn is that unless you specify a specific snapshot number (which is what this tag contains, most of the time), you are going to get the latest version. But if you specify a snapshot number, you’ll get the value that was there as of that snapshot.

And that, in turn, means that we can write code like this:

image

Where key.memtable_key() contains the required snapshot value. So we can just skip all the ones larger than what we want.

That is really cool, but what about when we go to disk? Pretty much in the same way. The actual key value include the sequence & tag value. But the comparator knows to filter it out when needed. This is quite nice, and an elegant way to handle this situation.

Comments

wed
04/09/2013 03:16 PM by
wed

Wake me up when series finishes...

alex
04/10/2013 03:37 PM by
alex

Thanks for the time you are taking with this excelent series.

For many people who in the past decade have become a bit rusty in their c++ and that consider LevelDB an interesting codebase incorporating some very nice ideas (and possibly wish there would be comparable, good and managed alternatives in the .NET world), dissecting it is not that trivial.

Comments have been closed on this topic.