Okay, now that I know how data actually gets to and from the disk, it is time to read how leveldb handles snapshots. Snapshots seem to be very simple on the surface. On every write, we generate a sequence number. A snapshot just stores the current sequence number, and when doing compaction work we use the oldest still-living snapshot as the oldest version that we have to retain.
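To make that bookkeeping concrete, here is a small toy model. It is my own sketch, not leveldb's classes: `ToySnapshots` and `VersionsToKeep` are hypothetical names. Writes bump a counter, a snapshot remembers the counter's value, and compaction may drop an old version of a key once a newer version of that key is already visible to the oldest live snapshot. Real leveldb has extra rules for deletion markers that this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical toy model of snapshot bookkeeping; not leveldb's classes.
class ToySnapshots {
 public:
  // Every write is assigned the next sequence number.
  SequenceNumber NextWrite() { return ++last_sequence_; }

  // A snapshot only remembers the last sequence number at the time it was taken.
  SequenceNumber Acquire() {
    live_.insert(last_sequence_);
    return last_sequence_;
  }

  void Release(SequenceNumber snapshot) {
    auto it = live_.find(snapshot);
    assert(it != live_.end() && "releasing a snapshot that is not live");
    live_.erase(it);  // erase one occurrence; two snapshots may share a number
  }

  // The oldest version compaction still has to be able to serve.
  // With no live snapshots, only the latest state matters.
  SequenceNumber Smallest() const {
    return live_.empty() ? last_sequence_ : *live_.begin();
  }

 private:
  SequenceNumber last_sequence_ = 0;
  std::multiset<SequenceNumber> live_;
};

// Given the sequence numbers of one user key's versions, newest first,
// return the versions a compaction must keep. A version can be dropped once
// a newer version of the same key is already visible to the oldest snapshot.
std::vector<SequenceNumber> VersionsToKeep(
    const std::vector<SequenceNumber>& newest_first,
    SequenceNumber smallest_snapshot) {
  std::vector<SequenceNumber> kept;
  SequenceNumber newer = std::numeric_limits<SequenceNumber>::max();
  for (SequenceNumber seq : newest_first) {
    if (newer > smallest_snapshot) kept.push_back(seq);  // someone may still read it
    newer = seq;
  }
  return kept;
}
```

With versions 9, 7, 4 and 2 of a key and an oldest snapshot at 5, versions 9, 7 and 4 survive: the snapshot at 5 still reads version 4, so only version 2 is unreachable.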
I had a hard time figuring out how this works for data that is still in memory. Consider the following code:
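```cpp
#include <string>

#include "leveldb/db.h"

// Assumes db is an already-open leveldb::DB*.
leveldb::WriteOptions writeOptions;
writeOptions.sync = true;
db->Put(writeOptions, "test", "one");

// Take a snapshot after the first write...
const leveldb::Snapshot* snapshot = db->GetSnapshot();

// ...then overwrite the key.
db->Put(writeOptions, "test", "two");

// Read "test" as of the snapshot.
leveldb::ReadOptions readOptions;
readOptions.snapshot = snapshot;
std::string val;
leveldb::Status status = db->Get(readOptions, "test", &val);

// Let compaction discard the versions this snapshot was keeping alive.
db->ReleaseSnapshot(snapshot);
```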
When we execute it, val is “one”, as expected.
As it turned out, I missed something when I read the MemTable code earlier. The key that is actually stored is [key][tag], where the tag packs the sequence number together with the write type (value or deletion). The tag is encoded as a little-endian 64-bit value, but the ordering does not come from the raw bytes: the internal key comparator decodes the tag and sorts entries with the same user key by decreasing sequence number, so the newest version of a key always comes first. The lookup key carries a sequence number too. A plain read uses the latest sequence number, so you get the latest version. But if you read through a snapshot, the lookup uses the snapshot's sequence number, and you get the value that was there as of that snapshot.
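The ordering is easier to see in a toy model. This is a simplified sketch of my own, not leveldb's actual structs: `PackTag`, `InternalKey` and `InternalLess` are hypothetical names, though the type values (deletion = 0, value = 1) and the "sequence in the upper 56 bits, type in the low 8 bits" packing mirror what leveldb does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified model; the type values mirror leveldb's.
enum ValueType : uint8_t { kTypeDeletion = 0, kTypeValue = 1 };

// Sequence number in the upper 56 bits, write type in the low 8 bits.
uint64_t PackTag(uint64_t sequence, ValueType type) {
  assert(sequence < (uint64_t{1} << 56) && "sequence must fit in 56 bits");
  return (sequence << 8) | type;
}

struct InternalKey {
  std::string user_key;
  uint64_t tag;
};

// User key ascending, then tag descending: the newest write for a key sorts first.
bool InternalLess(const InternalKey& a, const InternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.tag > b.tag;
}

void SortInternal(std::vector<InternalKey>& keys) {
  std::sort(keys.begin(), keys.end(), InternalLess);
}
```

Sorting {"test"@1, "other"@3, "test"@2} with this comparator yields "other"@3, "test"@2, "test"@1: keys are grouped together, and within a key the newer write wins the first slot.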
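And that, in turn, means that the lookup in MemTable::Get can simply seek to key.memtable_key(), which contains the required snapshot value. The seek skips all the entries with sequence numbers larger than what we want, and if the entry it lands on has the same user key, that is the right version.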
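Here is a sketch of that seek-and-skip lookup, using a `std::set` as a stand-in for the memtable's skip list. `ToyMemTable` and `Entry` are hypothetical names, and deletion markers are left out to keep it short. The trick is that `lower_bound` on (key, snapshot) jumps past every newer version of the key, because those sort before the probe.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>

// Hypothetical toy memtable entry: (user key, sequence number, value).
struct Entry {
  std::string user_key;
  uint64_t sequence;
  std::string value;
};

// User key ascending, then sequence descending (newest first).
struct EntryOrder {
  bool operator()(const Entry& a, const Entry& b) const {
    if (a.user_key != b.user_key) return a.user_key < b.user_key;
    return a.sequence > b.sequence;
  }
};

class ToyMemTable {
 public:
  void Add(const std::string& key, uint64_t seq, const std::string& value) {
    // Sequence numbers are unique per write, so (key, seq) never repeats.
    [[maybe_unused]] bool inserted = table_.insert(Entry{key, seq, value}).second;
    assert(inserted);
  }

  // Seek to (key, snapshot). Entries for this key with a larger sequence
  // number sort before the probe, so the first entry at or after it is the
  // newest version visible to the snapshot, provided the user key matches.
  std::optional<std::string> Get(const std::string& key, uint64_t snapshot) const {
    auto it = table_.lower_bound(Entry{key, snapshot, ""});
    if (it != table_.end() && it->user_key == key) return it->value;
    return std::nullopt;  // no version of this key at or before the snapshot
  }

 private:
  std::set<Entry, EntryOrder> table_;
};
```

Replaying the earlier example, writing "one" at sequence 1 and "two" at sequence 2, a read at snapshot 1 returns "one" while a read at sequence 2 returns "two".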
That is really cool, but what about when the data goes to disk? It works pretty much the same way. The keys stored on disk also include the tag (sequence number and type), and the comparator knows to strip it off and compare only the user key portion, using the sequence number just to order the versions of the same key. This is quite nice, and an elegant way to handle this situation.
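A rough sketch of what such a comparator has to do with the encoded bytes. The helpers below (`EncodeInternalKey`, `DecodeTag`, `UserKey`, `CompareInternalKeys`) are hypothetical names modeled on the [user key][8-byte little-endian tag] layout; they are not leveldb's own functions. The last assertions in the usage below show why the comparator must decode the tag: comparing the raw bytes would put the older version first.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: append an 8-byte little-endian tag to the user key.
std::string EncodeInternalKey(const std::string& user_key, uint64_t sequence, uint8_t type) {
  uint64_t tag = (sequence << 8) | type;
  std::string out = user_key;
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<char>((tag >> (8 * i)) & 0xff));
  return out;
}

// Read the trailing 8-byte little-endian tag back.
uint64_t DecodeTag(const std::string& ikey) {
  assert(ikey.size() >= 8 && "internal key must end with an 8-byte tag");
  uint64_t tag = 0;
  for (int i = 0; i < 8; ++i) {
    auto byte = static_cast<unsigned char>(ikey[ikey.size() - 8 + i]);
    tag |= static_cast<uint64_t>(byte) << (8 * i);
  }
  return tag;
}

// Strip the tag, leaving only the user key portion.
std::string UserKey(const std::string& ikey) {
  assert(ikey.size() >= 8 && "internal key must end with an 8-byte tag");
  return ikey.substr(0, ikey.size() - 8);
}

// Negative/zero/positive like memcmp: user key ascending, then tag descending.
int CompareInternalKeys(const std::string& a, const std::string& b) {
  int r = UserKey(a).compare(UserKey(b));
  if (r != 0) return r;
  uint64_t ta = DecodeTag(a), tb = DecodeTag(b);
  if (ta > tb) return -1;  // larger sequence number (newer) sorts first
  if (ta < tb) return 1;
  return 0;
}
```

For example, with `newer = EncodeInternalKey("test", 2, 1)` and `older = EncodeInternalKey("test", 1, 1)`, a plain byte comparison says `newer > older`, while `CompareInternalKeys(newer, older)` is negative, so the newer version comes first.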