
Reviewing LevelDB: Part XVIII–Summary

Well, I am very happy at the conclusion of this blog post series. Besides being one of the longest I have done, it actually stretched my ability to count using Roman numerals.

In summary, I am quite happy that I spent the time reading all of this code. The LevelDB codebase is really simple once you grok what it actually does. There is nothing there that would totally stun a person. What is there, however, is a lot of accumulated experience in building this sort of thing.

You see this all over the place: in the format of the SST, in the way compaction works, in the ability to add filters, in write merging, etc. The leveldb codebase is a really good codebase to read, and I am very happy to have done so, especially since doing this in C++ is way out of my comfort zone. It was also interesting to read what I believe is idiomatic C++ code.
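
For example, the filter support is exposed through the FilterPolicy interface, and wiring a Bloom filter into a database is just a matter of setting it on the options. A minimal sketch (the path and the bits-per-key value here are just illustrative):

    #include <cassert>
    #include "leveldb/db.h"
    #include "leveldb/filter_policy.h"

    int main() {
      leveldb::Options options;
      options.create_if_missing = true;
      // ~10 bits per key; the filter blocks get baked into each SST,
      // letting reads skip tables that cannot contain the key.
      options.filter_policy = leveldb::NewBloomFilterPolicy(10);

      leveldb::DB* db = nullptr;
      leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
      assert(status.ok());

      // ... reads and writes go here ...

      delete db;
      delete options.filter_policy;  // the caller owns the policy object
      return 0;
    }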

Another observation about leveldb is that it is a hardcore C++ product. You can't really take it and use the same approach in a managed environment. In particular, efforts to port leveldb to Java (https://github.com/dain/leveldb/) are going to run into hard problems like memory management. Java, like .NET, has issues with allocating large byte arrays, and even from the brief look I took, working with leveldb on Java using the codebase as it is would likely put a lot of pressure on the garbage collector.
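
To make the memory point concrete: leveldb hands data around as leveldb::Slice, a pointer-plus-length view that never copies the underlying bytes. A managed port has to turn each of those views into a real byte array, and that is exactly where the allocation pressure comes from. A small sketch of the idiom (the values here are made up):

    #include <string>
    #include "leveldb/slice.h"

    int main() {
      std::string value = "a value sitting in an SST block buffer";

      // A Slice is just { const char* data; size_t size; } -- constructing
      // one is free, and no bytes are copied.
      leveldb::Slice view(value.data(), value.size());

      // The slice is only valid while 'value' is alive; leveldb relies on
      // that discipline everywhere. The Java/.NET equivalent is typically
      // a fresh byte[] per key and value, all of which the GC must track.
      leveldb::Slice key("user:42");  // built from a C string, also zero-copy

      return (key.size() == 7 && view.size() == value.size()) ? 0 : 1;
    }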

Initially, I wanted to use leveldb as a storage engine for RavenDB. Since I couldn't get it compiling & working on Windows (and yes, that is a hard requirement; and yes, it has to compile on my machine to count), I thought about just porting it. That isn't going to be possible, at least not in the trivial sense. Too much work is required to make it work properly.

Yes, I have an idea; no, you don't get to hear it at this time.

Comments

04/26/2013 02:07 PM by peter

What factors made you decide to study the leveldb codebase in particular?

04/26/2013 02:22 PM by Brian

With .NET 4.5, would large byte arrays still be an issue? And does a full port need to happen when a person could use C++/CLI? Where I work, we have a very extensive set of C++ libraries that we've written interop libraries for. Truth be told, we've contemplated porting those C++ libraries over to C#, but it's always been pushed back down as a low-priority item. Given that there isn't a lot of change to the C++ libraries, we get the best of both worlds: the speed of C++ and the ease of development of C#.

04/26/2013 06:53 PM by JDice

I'd have to imagine the interop costs of calling C++ libraries on every db call would be devastating to overall performance. Microsoft has probably done all it can to improve CLR performance. We may see some tweaks here and there, but I don't think we'll see native C++ speeds anytime soon, not with managed memory and garbage collection.

Eventually I'd like to see a native CPU with specialized hardware logic that can execute CLR/MSIL instructions. There was a chip a few years ago that tried to do this with Java, but it failed to get off the ground. It doesn't have to be an entire CPU, just a few thousand extra transistors, sort of like how Intel uses "QuickSync" for hardware-based encoding and decoding of H.264 video.

04/27/2013 07:25 AM by Ayende Rahien

Peter, I tried to look at a bunch of other stuff as well, but leveldb was the simplest, the easiest to grok, and well used in production.

04/27/2013 07:25 AM by Ayende Rahien

Brian, why should 4.5 change things for large byte arrays? C++/CLI doesn't work in Mono.

04/27/2013 07:36 AM by Ayende Rahien

JDice, you are aware that for most things, CLI is faster than C++, right? Certainly for most of the stuff that doesn't really require you to squeeze every erg of perf out of the system. See: http://www.codinghorror.com/blog/2005/05/on-managed-code-performance-again.html

04/27/2013 02:44 PM by Brian

Hmm, I didn't re-read my post after submitting it, and my angle brackets got swallowed. With 4.5 and gcAllowVeryLargeObjects (and an x64 processor) you can allocate arrays larger than 2 GB. I assumed that should handle any problems with large byte arrays. But either way, it sounds like not having C++/CLI in Mono is a killer anyway.
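
For reference, the swallowed element is the runtime setting in app.config, something like:

    <configuration>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>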

04/28/2013 06:19 AM by Ayende Rahien

Brian, yeah, this isn't about >2 GB arrays; it is about the large object heap and GC issues.
