Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,026 | Comments: 44,842

filter by tags archive

Reviewing LevelDBPart XIV– there is the mem table and then there is the immutable memtable

time to read 3 min | 488 words

In leveldb we have the memtable, into which all of the current writes are going and the immutable memtable. So far, I haven't really checked what it actually means.

It looks like this happens on the write, by default, a mem table is limited to about 4 MB of space. We write to the memtable (backed by the tx log) until we get to the point where we go beyond the 4MB limit (note that again, if you have large values, you might go much higher than 4MB and then we switch to another memtable, change the existing memtable to be the immutable one, and move on.

Something that might trip you is that if you write 2 big values one after the other, in separate batches, the second one might need to wait for the compaction to complete.

Here is the test code:

   std::string big(1024 * 1024 * 5, 'a');
     for(int i=0; i < 10; i++ ){
         std::clock_t start = std::clock();
         std::ostringstream o;
         o << "test" << i;
         std::string key =  o.str();

         db->Put(writeOptions, key, big);
         std::clock_t end = std::clock();
         std::cout << i << ": " << 1000.0 *(end - start) / CLOCKS_PER_SEC << "ms " << std::endl;

And the output is:

0: 20ms

1: 30ms

2: 50ms

3: 50ms

4: 50ms

5: 50ms

6: 40ms

7: 60ms

8: 50ms

9: 50ms

Not really that interesting, I'll admit. But when I tried it with 50 mb for each value, it felt like the entire machine grind to a halt. Considering the amount of times memory is copied around, and that it needs to be saved to at least two locations (log file & sst), that makes a lot of sense.

For references ,those were my timing, but I am not sure that I trust them.

0: 170ms

1: 310ms

2: 350ms

3: 460ms

4: 340ms

5: 340ms

6: 280ms

7: 530ms

8: 400ms

9: 1200ms

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (08 Apr 2013) Part XII–Reading an SST
  8. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  9. (04 Apr 2013) Part X–table building is all fun and games until…
  10. (03 Apr 2013) Part IX- Compaction is the new black
  11. (02 Apr 2013) Part VIII–What are the levels all about?
  12. (29 Mar 2013) Part VII–The version is where the levels are
  13. (28 Mar 2013) Part VI, the Log is base for Atomicity
  14. (27 Mar 2013) Part V, into the MemTables we go
  15. (26 Mar 2013) Part IV
  16. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  17. (21 Mar 2013) Part II, Put some data on the disk, dude
  18. (20 Mar 2013) Part I, What is this all about?


Comment preview

Comments have been closed on this topic.


No future posts left, oh my!


  1. Technical observations from my wife (3):
    13 Nov 2015 - Production issues
  2. Production postmortem (13):
    13 Nov 2015 - The case of the “it is slow on that machine (only)”
  3. Speaking (5):
    09 Nov 2015 - Community talk in Kiev, Ukraine–What does it take to be a good developer
  4. Find the bug (5):
    11 Sep 2015 - The concurrent memory buster
  5. Buffer allocation strategies (3):
    09 Sep 2015 - Bad usage patterns
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats