Reviewing LevelDB: Part XIII–Smile, and here is your snapshot

Apr 09 2013

Okay, now that I know how data actually gets to the disk and back from it, it is time to read how leveldb handles snapshots. Snapshots seem to be very simple on the surface. On every write, we generate a sequence number. A snapshot just remembers the sequence number that was current when it was taken, and we use the oldest still living snapshot as the oldest version that we have to retain when doing compaction work.
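To make the retention rule concrete, here is a minimal sketch of how I understand it. This is not leveldb's actual compaction code. It assumes the versions of a single key arrive newest first, and Version, RetainedVersions and oldest_snapshot are hypothetical names used only for illustration. The idea is that a version can be dropped once a newer version of the same key is already visible to the oldest living snapshot.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of one stored version of a key.
struct Version {
  uint64_t sequence;  // sequence number assigned when the write happened
  std::string value;
};

// Given every version of ONE key, ordered newest first, return the versions
// that must survive when `oldest_snapshot` is the sequence number of the
// oldest snapshot that is still alive.
std::vector<Version> RetainedVersions(const std::vector<Version>& newest_first,
                                      uint64_t oldest_snapshot) {
  std::vector<Version> kept;
  bool shadowed = false;  // set once a kept version is visible to every snapshot
  for (const Version& v : newest_first) {
    if (shadowed) break;  // no reader can ever reach this or older versions
    kept.push_back(v);
    if (v.sequence <= oldest_snapshot) shadowed = true;
  }
  return kept;
}
```

With versions at sequence 5, 3 and 1 and the oldest snapshot at 3, the sketch keeps 5 and 3 and drops 1, since every possible reader sees either 5 or 3.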

I had a hard time figuring out how this works for data that is still in memory. Consider the following code:

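    leveldb::WriteOptions writeOptions;
    writeOptions.sync = true;

    db->Put(writeOptions, "test", "one");

    // Capture the state of the database after the first write.
    const leveldb::Snapshot* snapshot = db->GetSnapshot();

    db->Put(writeOptions, "test", "two");

    // Read through the snapshot rather than the latest state.
    leveldb::ReadOptions readOptions;
    readOptions.snapshot = snapshot;
    std::string val;
    leveldb::Status status = db->Get(readOptions, "test", &val);

    // Release the snapshot once we are done with it.
    db->ReleaseSnapshot(snapshot);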

This will properly give val == “one” when we execute it.

As it turned out, I missed something when I read the code earlier for MemTables. The value that is actually stored as the key is [key][tag], and the tag is the sequence number + write type. The tag is encoded as a fixed width little endian number, and the comparator orders entries by key first and then by tag in descending order, which means that higher sequence numbers are sorted first. What that means in turn is that unless you specify a snapshot number (which is what this tag contains, most of the time), you are going to get the latest version. But if you specify a snapshot number, you'll get the value that was there as of that snapshot.
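Here is a small sketch of that layout and ordering, as I understand it. The sequence number goes in the high 56 bits of the tag and the write type in the low 8 bits. The names PackTag, InternalKey and SortsBefore are my own for illustration, and the real comparator works on encoded bytes rather than on a struct like this.

```cpp
#include <cstdint>
#include <string>

// Write types. Treat the numeric values as assumptions for this sketch.
enum ValueType : uint8_t { kTypeDeletion = 0, kTypeValue = 1 };

// Pack a sequence number and a write type into a single 64-bit tag:
// the sequence occupies the high 56 bits, the type the low 8 bits.
uint64_t PackTag(uint64_t sequence, ValueType type) {
  return (sequence << 8) | type;
}

// Hypothetical decoded form of the [key][tag] entry.
struct InternalKey {
  std::string user_key;
  uint64_t tag;
};

// Returns true if `a` sorts before `b`: user keys ascending, and for the
// same user key, larger tags (newer writes) first.
bool SortsBefore(const InternalKey& a, const InternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.tag > b.tag;
}
```

Because the sequence number sits in the high bits, comparing tags as plain numbers is enough to put the newest write for a key in front of all the older ones.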

And that, in turn, means that a lookup in the MemTable can seek to key.memtable_key(), which contains the required snapshot value. Since newer entries are sorted first, we can just skip all the ones with a larger sequence number than what we want.
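The lookup can be sketched with a std::map standing in for the MemTable's skip list. This is only an illustration under my own assumptions: VersionedKey, NewestFirst and GetAsOf are hypothetical names, and deletion markers are ignored to keep it short.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// (user key, sequence number). Hypothetical stand-in for the memtable key.
using VersionedKey = std::pair<std::string, uint64_t>;

// User keys ascending; for the same user key, higher sequence numbers first.
struct NewestFirst {
  bool operator()(const VersionedKey& a, const VersionedKey& b) const {
    if (a.first != b.first) return a.first < b.first;
    return a.second > b.second;
  }
};

using Table = std::map<VersionedKey, std::string, NewestFirst>;

// Look up `key` as it was at sequence number `snapshot`.
// Returns false if no version of the key existed at that point.
bool GetAsOf(const Table& table, const std::string& key, uint64_t snapshot,
             std::string* value) {
  // lower_bound lands on the first entry that does not sort before
  // (key, snapshot), i.e. the newest version with sequence <= snapshot.
  Table::const_iterator it = table.lower_bound(VersionedKey(key, snapshot));
  if (it == table.end() || it->first.first != key) return false;
  *value = it->second;
  return true;
}
```

With "one" written at sequence 1 and "two" at sequence 2, asking for sequence 1 gives back "one", and asking for any sequence of 2 or above gives back "two". The seek does all of the skipping for us.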

That is really cool, but what about when we go to disk? It works pretty much the same way. The actual key that is stored includes the sequence number and the write type, and the comparator knows to filter them out when needed. This is quite nice, and an elegant way to handle this situation.
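To show what filtering the tag out might look like, here is a sketch that appends the 8 byte tag to the user key and strips it off again. The function names are hypothetical, and the layout (user key followed by a little endian 64-bit tag) is my reading of the format rather than a copy of leveldb's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append the 64-bit tag to the user key, least significant byte first.
std::string EncodeInternalKey(const std::string& user_key, uint64_t sequence,
                              uint8_t type) {
  const uint64_t tag = (sequence << 8) | type;
  std::string out = user_key;
  for (int i = 0; i < 8; ++i) {
    out.push_back(static_cast<char>((tag >> (8 * i)) & 0xff));
  }
  return out;
}

// Strip the trailing 8-byte tag to recover what the user originally wrote.
// Assumes `internal_key` was produced by EncodeInternalKey.
std::string ExtractUserKey(const std::string& internal_key) {
  return internal_key.substr(0, internal_key.size() - 8);
}

// Decode the trailing tag and return just the sequence number.
uint64_t ExtractSequence(const std::string& internal_key) {
  const std::size_t start = internal_key.size() - 8;
  uint64_t tag = 0;
  for (int i = 0; i < 8; ++i) {
    const unsigned char byte =
        static_cast<unsigned char>(internal_key[start + i]);
    tag |= static_cast<uint64_t>(byte) << (8 * i);
  }
  return tag >> 8;
}
```

A comparator built on these would compare the extracted user keys first, and only look at the decoded sequence number when the user keys are equal.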

More posts in "Reviewing LevelDB" series:

(26 Apr 2013) Part XVIII–Summary
(15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
(12 Apr 2013) Part XV–MemTables gets compacted too
(11 Apr 2013) Part XVI–Recovery ain’t so tough?
(10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
(09 Apr 2013) Part XIII–Smile, and here is your snapshot
(08 Apr 2013) Part XII–Reading an SST
(05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
(04 Apr 2013) Part X–table building is all fun and games until…
(03 Apr 2013) Part IX- Compaction is the new black
(02 Apr 2013) Part VIII–What are the levels all about?
(29 Mar 2013) Part VII–The version is where the levels are
(28 Mar 2013) Part VI, the Log is base for Atomicity
(27 Mar 2013) Part V, into the MemTables we go
(26 Mar 2013) Part IV
(22 Mar 2013) Part III, WriteBatch isn’t what you think it is
(21 Mar 2013) Part II, Put some data on the disk, dude
(20 Mar 2013) Part I, What is this all about?

Comments

09 Apr 2013
15:16 PM

wed

Wake me up when series finishes...

10 Apr 2013
15:37 PM

alex

Thanks for the time you are taking with this excellent series.

For many people who in the past decade have become a bit rusty in their c++ and that consider LevelDB an interesting codebase incorporating some very nice ideas (and possibly wish there would be comparable, good and managed alternatives in the .NET world), dissecting it is not that trivial.

Oren Eini

CEO of RavenDB