Reviewing LevelDB: Part VIII–What are the levels all about?

Apr 02 2013

Reviewing LevelDBPart VIII–What are the levels all about?

time to read 3 min | 567 words

When I last left things, we were still digging into the code for Version and figure out that this is a way to handle the list of files, and that it is the table cache that is actually doing IO. For this post, I want to see where continuing to read the Version code will take us.

There are some interesting things going on in here. We use the version to do a lot more. For example, because it holds all of the files, we can use it to check if they are overlaps in key spaces, which leveldb can use to optimize things later on.

I am not sure that I am following everything that is going on there, but I start to get the shape of things.

As I understand things right now. Data is stored in the following levels:

MemTable – where all the current updates are going
Immutable MemTable – which is frozen, and probably in the process of being written out
Level 0 – probably just a dump of the mem table to disk, key ranges may overlap, so you may need to read multiple files to know where a particular value is.
Level 1 to 6 – files where keys cannot overlap, so you probably only need one seek to find things. But seeks are still charged if you need to go through multiple levels to find a value.

That make sense, and it explains things like charging seeks, etc.

Now, I am currently going over VersionSet::Builder, which appears to be applying something called VersionEdit efficiently. I don't know what any of those things are yet...

VersionEdit appears to be a way to hold (and maybe store on disk?) data about pending changes to a version. This is currently not really useful because I still don't know how that is used. This means that I need to go deeper into the VersionSet code now.

VersionSet::LogAndApply looks like it is an interesting candidate for behavior.

I also found why we need VersionEdit to be persistable, we write it to the manifest file. Probably as a way to figure out what is going on when we recover. LogAndApply write the VersionEdit to a manifest file, then set the current file to point to that manifest file.

I am currently reading the Recover function, and it seems to be applying things in reverse. It reads the CURRENT file which points to the current manifest, which contains the serialized VersionEdit contents. Those are read in turn, after which we apply the edit to the in memory state.

Toward the end of VersionSet.cc file, we start seeing things regards compaction. Like:

Iterator* VersionSet::MakeInputIterator(Compaction* c)

Compaction* VersionSet::PickCompaction()

I sort of follow and sort of doesn't follow what the code is doing there. It selects the appropriate set of files that need to be compacted. I don't really get the actual logic, I'll admit, but hopefully that will become better when I read the rest of the code.

Compaction appears to work on level and level+1 files. I assume that it is going to read all of those files, merge them into level+1 and then delete (or mark for deletion) the existing files.

This is now close to midnight, and my eyes started building and the code started to do gymnastics on the screen, so I'll call it a post for now.

0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB