Ayende @ Rahien


Reviewing LevelDB: Part XVIII–Summary

time to read 2 min | 349 words

Well, I am very happy to be at the conclusion of this blog post series. Besides being one of the longest that I have done, it actually stretched my ability to count using Roman numerals.

In summary, I am quite happy that I spent the time reading all of this code. The LevelDB codebase is really simple once you grok what it actually does. There is nothing there that would totally stun a person. What is there, however, is a lot of accumulated experience in building this sort of thing.

You see this all over the place: in the format of the SST, in the way compaction works, in the ability to add filters, in write merging, etc. The leveldb codebase is a really good codebase to read, and I am very happy to have done so, especially since doing this in C++ is way out of my comfort zone. It was also interesting to read what I believe is idiomatic C++ code.

Another observation about leveldb is that it is a hardcore C++ product. You can’t really take it and use the same approach in a managed environment. In particular, efforts to port leveldb to Java (https://github.com/dain/leveldb/) are going to run into hard problems like memory management. Java, like .NET, has issues with allocating large byte arrays, and even from the brief look I took, working with leveldb in Java using the codebase as-is would likely put a lot of pressure there.
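To make the memory-pressure point concrete, here is a minimal C# sketch (the 4 MB buffer is an arbitrary stand-in for the block and table buffers leveldb shuffles around, not one of its actual defaults). In .NET, any array of 85,000 bytes or more is allocated on the Large Object Heap, which is collected only as part of gen 2 and is not compacted by the CLR of this era, so constantly churning big buffers fragments it:

```csharp
using System;

class LohPressure
{
    static void Main()
    {
        // Arrays of 85,000 bytes or more go straight to the
        // Large Object Heap (LOH); smaller ones start in gen 0.
        byte[] small = new byte[80 * 1024];      // 80 KB: regular heap
        byte[] big = new byte[4 * 1024 * 1024];  // 4 MB buffer: LOH

        Console.WriteLine(GC.GetGeneration(small)); // 0: fresh gen 0 object
        Console.WriteLine(GC.GetGeneration(big));   // 2: LOH objects report as gen 2
    }
}
```

A port that keeps leveldb’s buffer-heavy style would allocate and drop such arrays constantly, which is exactly the allocation pattern the LOH handles worst.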

Initially, I wanted to use leveldb as a storage engine for RavenDB. Since I couldn’t get it compiling and working on Windows (and yes, that is a hard requirement; and yes, it has to compile on my machine to count), I thought about just porting it. That isn’t going to be possible, at least not in the trivial sense. Too much work is required to make it work properly.

Yes, I have an idea; no, you don’t get to hear it at this time. :)

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (08 Apr 2013) Part XII–Reading an SST
  8. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  9. (04 Apr 2013) Part X–table building is all fun and games until…
  10. (03 Apr 2013) Part IX- Compaction is the new black
  11. (02 Apr 2013) Part VIII–What are the levels all about?
  12. (29 Mar 2013) Part VII–The version is where the levels are
  13. (28 Mar 2013) Part VI, the Log is base for Atomicity
  14. (27 Mar 2013) Part V, into the MemTables we go
  15. (26 Mar 2013) Part IV
  16. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  17. (21 Mar 2013) Part II, Put some data on the disk, dude
  18. (20 Mar 2013) Part I, What is this all about?



Peter

What factors made you decide to study the leveldb codebase in particular?


Brian

With <gcAllowVeryLargeObjects> in .NET 4.5, would the issue with large byte arrays still be an issue? And does a full port need to happen when a person could use C++/CLI? Where I work we have a very extensive set of C++ libraries that we've written interop libraries for. Truth be told, we've contemplated porting those C++ libraries over to C#, but it's always been pushed back as a low-priority item. Given that there isn't a lot of change to the C++ libraries, we get the best of both worlds: the speed of C++ and the ease of development with C#.


I'd have to imagine the interop costs with calling C++ libraries on every db call would be devastating to the overall performance. Microsoft probably has done all it can with improving CLR performance. We may see some tweaks here and there, but I don't think we'll see native C++ speeds anytime soon. Not with managed memory and garbage collection

Eventually I'd like to see a native CPU with specialized hardware logic that can execute CLR/MSIL instructions. There was a chip a few years ago that tried to do this with Java but failed to get off the ground. It doesn't have to be an entire CPU. Just a few extra thousand transistors, sort of like how Intel uses "QuickSync" to have hardware-based encode/decode of H.264 video.

Ayende Rahien

Peter, I tried to look at a bunch of other stuff as well, but leveldb was the simplest and easiest to grok, and it is well used in production.

Ayende Rahien

Brian, why should 4.5 change things for large byte arrays? And C++/CLI doesn't work on Mono.

Ayende Rahien

JDice, you are aware that for most things, CLI is faster than C++, right? Certainly for most of the stuff that doesn't really require you to squeeze every erg of perf out of the system. See: http://www.codinghorror.com/blog/2005/05/on-managed-code-performance-again.html


hmm, didn't re-read after my post. My angle brackets got swallowed. With 4.5 and gcAllowVeryLargeObjects (and an x64 processor) you can allocate arrays larger then 2 GB. I assumed that should handle any problems with large byte arrays. But either way it sounds like not having CLI in mono is a killer anyways.

Ayende Rahien

Brian, yeah, this isn't about >2 GB arrays, it is about the large object heap and GC issues.


Comments have been closed on this topic.

