Reviewing LevelDB: Part XVIII–Summary

Apr 26 2013

Well, I am very happy to have reached the conclusion of this blog post series. Besides being one of the longest that I have done, it actually stretched my ability to count using Roman numerals.

In summary, I am quite happy that I spent the time reading all of this code. The LevelDB codebase is really simple once you grok what it actually does. There is nothing in it that would totally stun a person. What is there, however, is a lot of accumulated experience in building this sort of thing.

You see this all over the place: in the format of the SST, in the way compaction works, in the ability to add filters, in write merging, etc. The leveldb codebase is a really good codebase to read, and I am very happy to have done so, especially since doing this in C++ is way out of my comfort zone. It was also interesting to read what I believe is idiomatic C++ code.

Another observation about leveldb is that it is a hardcore C++ product. You can’t really take it and use the same approach in a managed environment. In particular, efforts to port leveldb to Java (https://github.com/dain/leveldb/) are going to run into hard problems such as memory management. Java, like .NET, has issues with allocating large byte arrays, and even from the brief look I took, working with leveldb on Java using the codebase as it is would likely put a lot of pressure there.

Initially, I wanted to use leveldb as a storage engine for RavenDB. Since I couldn’t get it compiling and working on Windows (and yes, that is a hard requirement, and yes, it has to compile on my machine to count), I thought about just porting it. That isn’t going to be possible, at least not in the trivial sense. Too much work is required to make it work properly.

Yes, I have an idea. No, you don’t get to hear it at this time :)

More posts in "Reviewing LevelDB" series:

(26 Apr 2013) Part XVIII–Summary
(15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
(12 Apr 2013) Part XV–MemTables gets compacted too
(11 Apr 2013) Part XVI–Recovery ain’t so tough?
(10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
(09 Apr 2013) Part XIII–Smile, and here is your snapshot
(08 Apr 2013) Part XII–Reading an SST
(05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
(04 Apr 2013) Part X–table building is all fun and games until…
(03 Apr 2013) Part IX- Compaction is the new black
(02 Apr 2013) Part VIII–What are the levels all about?
(29 Mar 2013) Part VII–The version is where the levels are
(28 Mar 2013) Part VI, the Log is base for Atomicity
(27 Mar 2013) Part V, into the MemTables we go
(26 Mar 2013) Part IV
(22 Mar 2013) Part III, WriteBatch isn’t what you think it is
(21 Mar 2013) Part II, Put some data on the disk, dude
(20 Mar 2013) Part I, What is this all about?

Comments

26 Apr 2013
14:07 PM

peter

what factors made you decide to study the leveldb codebase in particular?

26 Apr 2013
14:22 PM

Brian

With <gcAllowVeryLargeObjects> in .NET 4.5 would the issue with large byte arrays still be an issue? And does a full port need to happen when a person could use C++/CLI? Where I work we have a very extensive set of C++ libraries that we've written interop libraries for. Truth be told we've contemplated porting those C++ libraries over to C# but it's always been pushed back down as a low priority item. Given that there isn't a lot of change to the C++ libraries we get the best of both worlds, the speed of C++ and the ease of development with C#.

26 Apr 2013
18:53 PM

JDice

I'd have to imagine the interop costs of calling C++ libraries on every db call would be devastating to the overall performance. Microsoft has probably done all it can to improve CLR performance. We may see some tweaks here and there, but I don't think we'll see native C++ speeds anytime soon. Not with managed memory and garbage collection.

Eventually I'd like to see a native CPU with specialized hardware logic that can execute CLR/MSIL instructions. There was a chip a few years ago that tried to do this with Java but failed to get off the ground. It doesn't have to be an entire CPU. Just a few extra thousand transistors, sort of like how Intel uses "QuickSync" to have hardware-based encode/decode of H.264 video.

27 Apr 2013
07:25 AM

Ayende Rahien

Peter, I tried to look at a bunch of other stuff as well, but leveldb was the simplest, easiest to grok and well used in production.

27 Apr 2013
07:25 AM

Ayende Rahien

Brian, Why should 4.5 change things for large byte arrays? C++/CLI doesn't work in Mono.

27 Apr 2013
07:36 AM

Ayende Rahien

JDice, You are aware that for most things, CLI is faster than C++, right? Certainly for most of the stuff that doesn't really require you to squeeze every erg of perf out of the system. See: http://www.codinghorror.com/blog/2005/05/on-managed-code-performance-again.html

27 Apr 2013
14:44 PM

Brian

Hmm, I didn't re-read my post after submitting it. My angle brackets got swallowed. With 4.5 and gcAllowVeryLargeObjects (and an x64 processor) you can allocate arrays larger than 2 GB. I assumed that should handle any problems with large byte arrays. But either way, it sounds like not having C++/CLI in Mono is a killer anyway.

28 Apr 2013
06:19 AM

Ayende Rahien

Brian, Yeah, this isn't about > 2 GB arrays, it is about the large object heap and GC issues.

Oren Eini

CEO of RavenDB