Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,953 | Comments: 44,408

filter by tags archive

Reviewing LevelDBPart XI–Reading from Sort String Tables via the TableCache


In the previous post, I focused mostly on reading the code for writing a SST to disk. But I have to admit that I am not really following how you read them back in a way that would be easy to read.

In order to understand that, I think that the right place in the code would be the TableCache. The API for that is pretty slick, here are just the header (comments were stripped).

class TableCache {
 public:
  TableCache(const std::string& dbname, const Options* options, int entries);
  ~TableCache();

  Iterator* NewIterator(const ReadOptions& options,
                        uint64_t file_number,
                        uint64_t file_size,
                        Table** tableptr = NULL);

  Status Get(const ReadOptions& options,
             uint64_t file_number,
             uint64_t file_size,
             const Slice& k,
             void* arg,
             void (*handle_result)(void*, const Slice&, const Slice&));

  void Evict(uint64_t file_number);

 private:

  Status FindTable(uint64_t file_number, uint64_t file_size, Cache::Handle**);
};

There are a couple of interesting things here that show up right away. How do you know what is the number of entries. I guess that this is stored externally, but I am not sure where. I am going to figure that question out first.

And the answer is strange:

image

It isn't number of entries in the table, it is the number of files? That actually say something very important, since this means that the table cache is reading multiple SST files, rather than just one per cache. Looking at the rest of the API, it makes even more sense. We need to pass the file number that we are going to be reading. That one is going to be got from the Version we are using.

Side note: I just spend an hour or so setting up a tryout project for leveldb, so I can actually execute the code and see what it does. This had me learning cmake (I am using KDevelop to read the code) and I finally got it. Still haven't figure out how to step into leveldb's code. But that is a good step.

Also, the debug experience is about 2/10 compared to VS.

And damn, I am so not a C++ developer any longer. Then again, never was a real dev on linux, anyway.

The first thing the TableCache does is to setup a cache. This is interesting, and I followed the code, it create 16 internal caches and has between them. I think that this is done to get concurrency because all the internal cache methods looks like this:

image

Note the mutex lock. That is pretty much how it works for all of the cache work. Looking deeper into the code, I can see another reason why I should be grateful for staying away from C++. leveldb comes with its own hash table functionality. At any rate, the cache it using LRU to evict items, and it get the number of entries from the value that was passed in the ctor. That make me think that it hold the opened files.

Speaking of the cache, here is an example of the code using it:

image

The cache is also used to do block caching, this is why it takes a slice as an argument. I'm going to go over that later, because this looks quite interesting. The rest of this method looks like this:

image

So the only very interesting bit is the Table::Open. The rest is just opening the mmap file and storing it in the cache. Note that the actual file size is passed externally. I am not really sure what this means yet. I'll go over the table code later, right now I want to focus on the table cache.

Looking over the TableCache interface, we can see that we always get the file size from outside. That got me curious enough to figure out why. And the answer is that we always have it in the FileMetaData that we previously explored. I am not sure why that is so important, but I'll ignore it for now.

The rest of TableCache is pretty simple, although this made me grin:

image

More specifically, look at the RegisterCleanup, this is basically registering for the delete event, so they can unref the cache. Pretty nice, all around.

The rest of the code is just forwarding calls to the Table, so that is what I'll be reading next...

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  8. (04 Apr 2013) Part X–table building is all fun and games until…
  9. (02 Apr 2013) Part VIII–What are the levels all about?
  10. (29 Mar 2013) Part VII–The version is where the levels are
  11. (28 Mar 2013) Part VI, the Log is base for Atomicity
  12. (27 Mar 2013) Part V, into the MemTables we go
  13. (26 Mar 2013) Part IV
  14. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  15. (21 Mar 2013) Part II, Put some data on the disk, dude

Comments

tobi

What's with all the reinterpretcasts? I'm not a C++ pro but they are supposed to be more dangerous than staticcast and should be used only if needed.

Ayende Rahien

Tobi, The docs say: "The result of a reinterpret_cast cannot safely be used for anything other than being cast back to its original type. " And that is what they are doing. They casted it first to void* when the sent the value to the function pointer.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats