Reviewing LevelDB: Part XI–Reading from Sorted String Tables via the TableCache

Apr 05 2013

In the previous post, I focused mostly on the code that writes an SST to disk. But I have to admit that I am not really following how the tables are read back in a way that makes lookups easy.

In order to understand that, I think that the right place to start in the code is the TableCache. The API for that is pretty slick; here is just the header (comments were stripped).

class TableCache {
 public:
  TableCache(const std::string& dbname, const Options* options, int entries);
  ~TableCache();

  Iterator* NewIterator(const ReadOptions& options,
                        uint64_t file_number,
                        uint64_t file_size,
                        Table** tableptr = NULL);

  Status Get(const ReadOptions& options,
             uint64_t file_number,
             uint64_t file_size,
             const Slice& k,
             void* arg,
             void (*handle_result)(void*, const Slice&, const Slice&));

  void Evict(uint64_t file_number);

 private:

  Status FindTable(uint64_t file_number, uint64_t file_size, Cache::Handle**);
};

A couple of interesting things show up right away. How do you know what the number of entries is? I guess that this is stored externally, but I am not sure where. I am going to figure that out first.

And the answer is strange:

It isn't the number of entries in the table, it is the number of files? That actually says something very important, since it means that a single table cache reads multiple SST files, rather than there being one cache per file. Looking at the rest of the API, this makes even more sense: we need to pass the number of the file that we are going to read. That number comes from the Version we are using.
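As far as I can tell, the entries value is derived from the max_open_files option, with a few file descriptors held back for other uses. Here is a minimal sketch of that idea. SketchOptions, kReservedFiles and TableCacheEntries are names I made up for illustration, they are not leveldb's.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant part of leveldb's Options.
struct SketchOptions {
  int max_open_files = 1000;
};

// Assumption: a handful of descriptors are held back for the log,
// the manifest and similar files, and the rest go to the table cache.
constexpr int kReservedFiles = 10;

// Number of table files the cache may keep open at once.
int TableCacheEntries(const SketchOptions& options) {
  assert(options.max_open_files > kReservedFiles);
  return options.max_open_files - kReservedFiles;
}
```

So the cache capacity is a budget of open files, not anything about the contents of a single table.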

Side note: I just spent an hour or so setting up a tryout project for leveldb, so I can actually execute the code and see what it does. This had me learning cmake (I am using KDevelop to read the code), and I finally got it working. I still haven't figured out how to step into leveldb's code, but that is a good step.
Also, the debug experience is about 2/10 compared to VS.
And damn, I am so not a C++ developer any longer. Then again, I never was a real dev on Linux, anyway.

The first thing the TableCache does is set up a cache. This is interesting, so I followed the code: it creates 16 internal caches and hashes between them. I think that this is done to get concurrency, because all the internal cache methods look like this:

Note the mutex lock. That is pretty much how it works for all of the cache work. Looking deeper into the code, I can see another reason why I should be grateful for staying away from C++: leveldb comes with its own hash table functionality. At any rate, the cache is using LRU to evict items, and it gets the number of entries from the value that was passed in the ctor. That makes me think that it holds the opened files.
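To make the idea of 16 mutex-protected caches concrete, here is a rough sketch of a sharded LRU cache. This is not leveldb's code: it uses the standard library's list and unordered_map instead of a hand-rolled hash table, picks the shard with std::hash and a modulo (my simplification), and stores plain ints. LruShard and ShardedLruCache are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// One shard: a small LRU map guarded by its own mutex.
class LruShard {
 public:
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  void Insert(const std::string& key, int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it != index_.end()) {  // replace an existing entry
      lru_.erase(it->second);
      index_.erase(it);
    }
    lru_.emplace_front(key, value);
    index_[key] = lru_.begin();
    while (lru_.size() > capacity_) {  // evict the least recently used
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  bool Lookup(const std::string& key, int* value) {
    assert(value != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    lru_.splice(lru_.begin(), lru_, it->second);  // now most recently used
    *value = it->second->second;
    return true;
  }

 private:
  using Entry = std::pair<std::string, int>;
  std::mutex mutex_;
  size_t capacity_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Spreads keys over 16 shards so that threads touching different
// shards do not contend on the same mutex.
class ShardedLruCache {
 public:
  explicit ShardedLruCache(size_t capacity) {
    const size_t per_shard = (capacity + kNumShards - 1) / kNumShards;
    for (auto& shard : shards_) shard.SetCapacity(per_shard);
  }

  void Insert(const std::string& key, int value) {
    shards_[ShardIndex(key)].Insert(key, value);
  }

  bool Lookup(const std::string& key, int* value) {
    return shards_[ShardIndex(key)].Lookup(key, value);
  }

  static size_t ShardIndex(const std::string& key) {
    return std::hash<std::string>{}(key) % kNumShards;
  }

 private:
  static constexpr size_t kNumShards = 16;
  std::array<LruShard, kNumShards> shards_;
};
```

Each call takes only the lock of the shard that owns the key, which is where the concurrency comes from.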

Speaking of the cache, here is an example of the code using it:

The cache is also used for block caching, which is why it takes a slice as an argument. I'm going to go over that later, because this looks quite interesting. The rest of this method looks like this:

So the only really interesting bit is Table::Open. The rest is just opening the mmap file and storing it in the cache. Note that the actual file size is passed in from the outside. I am not really sure what this means yet. I'll go over the table code later, right now I want to focus on the table cache.
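The lookup-then-open flow described above can be sketched roughly like this. Everything here is a stand-in: FakeTable replaces the real table, a plain unordered_map replaces the LRU cache (so nothing is ever evicted), and the file name pattern is my assumption. The one part worth noticing is that the file number is turned into a byte-string key, which is what lets a generic slice-keyed cache be reused.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an opened table (file handle + parsed index).
struct FakeTable {
  std::string file_name;
  uint64_t file_size;
};

class TableCacheSketch {
 public:
  explicit TableCacheSketch(std::string dbname) : dbname_(std::move(dbname)) {}

  // Returns the cached table for file_number, opening it on a miss.
  std::shared_ptr<FakeTable> FindTable(uint64_t file_number,
                                       uint64_t file_size) {
    const std::string key = EncodeKey(file_number);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // cache hit

    // Cache miss: "open" the file and remember the result.
    ++opens_;
    auto table = std::make_shared<FakeTable>(
        FakeTable{TableFileName(file_number), file_size});
    cache_.emplace(key, table);
    return table;
  }

  int opens() const { return opens_; }

 private:
  // The cache key is just the raw bytes of the file number.
  static std::string EncodeKey(uint64_t file_number) {
    char buf[sizeof(file_number)];
    std::memcpy(buf, &file_number, sizeof(buf));
    return std::string(buf, sizeof(buf));
  }

  // Assumed naming scheme: <dbname>/<six digit number>.sst
  std::string TableFileName(uint64_t file_number) const {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "/%06llu.sst",
                  static_cast<unsigned long long>(file_number));
    return dbname_ + buf;
  }

  std::string dbname_;
  int opens_ = 0;
  std::unordered_map<std::string, std::shared_ptr<FakeTable>> cache_;
};
```

Asking for the same file number twice only opens the file once; the second call is served from the cache.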

Looking over the TableCache interface, we can see that we always get the file size from outside. That got me curious enough to figure out why. And the answer is that we always have it in the FileMetaData that we previously explored. I am not sure why that is so important, but I'll ignore it for now.

The rest of TableCache is pretty simple, although this made me grin:

More specifically, look at RegisterCleanup: it basically registers for the iterator's delete event, so they can unref the cache entry. Pretty nice, all around.
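Here is a minimal sketch of that cleanup pattern, under my own assumptions. IteratorSketch, CacheSketch, HandleSketch and NewPinnedIterator are hypothetical names, and the callback takes two void* arguments the way a C-style callback would. The point is only that the cache entry stays referenced for exactly as long as the iterator lives.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical cache entry with a manual reference count.
struct HandleSketch {
  int refs = 0;
};

// Hypothetical cache: Ref() pins an entry, Release() unpins it.
struct CacheSketch {
  void Ref(HandleSketch* h) { ++h->refs; }
  void Release(HandleSketch* h) {
    assert(h->refs > 0);
    --h->refs;
  }
};

// Iterator stand-in that runs registered callbacks when it is destroyed.
class IteratorSketch {
 public:
  using CleanupFunction = void (*)(void* arg1, void* arg2);

  IteratorSketch() = default;
  IteratorSketch(const IteratorSketch&) = delete;
  IteratorSketch& operator=(const IteratorSketch&) = delete;

  ~IteratorSketch() {
    for (const Cleanup& c : cleanups_) c.function(c.arg1, c.arg2);
  }

  void RegisterCleanup(CleanupFunction function, void* arg1, void* arg2) {
    cleanups_.push_back(Cleanup{function, arg1, arg2});
  }

 private:
  struct Cleanup {
    CleanupFunction function;
    void* arg1;
    void* arg2;
  };
  std::vector<Cleanup> cleanups_;
};

// The callback only receives void*, so it casts back to the original types.
static void UnrefEntry(void* arg1, void* arg2) {
  CacheSketch* cache = static_cast<CacheSketch*>(arg1);
  HandleSketch* handle = static_cast<HandleSketch*>(arg2);
  cache->Release(handle);
}

// Pins the handle and ties its release to the iterator's lifetime.
std::unique_ptr<IteratorSketch> NewPinnedIterator(CacheSketch* cache,
                                                  HandleSketch* handle) {
  cache->Ref(handle);
  std::unique_ptr<IteratorSketch> iter(new IteratorSketch());
  iter->RegisterCleanup(&UnrefEntry, cache, handle);
  return iter;
}
```

The caller never has to remember to release anything; deleting the iterator does it.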

The rest of the code is just forwarding calls to the Table, so that is what I'll be reading next...

More posts in "Reviewing LevelDB" series:

(26 Apr 2013) Part XVIII–Summary
(15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
(12 Apr 2013) Part XV–MemTables gets compacted too
(11 Apr 2013) Part XVI–Recovery ain’t so tough?
(10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
(09 Apr 2013) Part XIII–Smile, and here is your snapshot
(08 Apr 2013) Part XII–Reading an SST
(05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
(04 Apr 2013) Part X–table building is all fun and games until…
(03 Apr 2013) Part IX- Compaction is the new black
(02 Apr 2013) Part VIII–What are the levels all about?
(29 Mar 2013) Part VII–The version is where the levels are
(28 Mar 2013) Part VI, the Log is base for Atomicity
(27 Mar 2013) Part V, into the MemTables we go
(26 Mar 2013) Part IV
(22 Mar 2013) Part III, WriteBatch isn’t what you think it is
(21 Mar 2013) Part II, Put some data on the disk, dude
(20 Mar 2013) Part I, What is this all about?

Comments

06 Apr 2013
09:00 AM

tobi

What's with all the reinterpret_casts? I'm not a C++ pro but they are supposed to be more dangerous than static_cast and should be used only if needed.

09 Apr 2013
07:04 AM

Ayende Rahien

Tobi, the docs say: "The result of a reinterpret_cast cannot safely be used for anything other than being cast back to its original type." And that is what they are doing. They cast it to void* first, when they sent the value to the function pointer.

Oren Eini

CEO of RavenDB