LevelDB & Windows: It ain’t a love story
I have been investigating the LevelDB project for the purpose of adding another storage engine to RavenDB. The good news is that there is a very strong likelihood that we can actually use that as a basis for what we want.
The bad news is that it is insanely easy to get LevelDB to compile and work on Linux, while doing the same on Windows appears to be an insurmountable barrier.
Yes, I know that I can get it working by just using a precompiled binary, but that won’t work. I actually want to make some changes there (mostly in the C API, right now).
These instructions appear to be no longer current. And this thread was promising, but didn’t lead anywhere.
I am going to go over the codebase with a fine-tooth comb, but I am no longer a C++ programmer, and the intricacies of the build system are putting up a very high roadblock of frustration.
Comments
Is transaction support in leveldb sophisticated enough for RavenDB needs?
@Ayende There are so many virtualization solutions (and cloud) nowadays. I don't think we should care if something works on Windows anymore.
@rafal
LevelDB doesn't support transactions; you have to implement your own as a layer on top. See IndexedDB in Chrome for an example implementation.
LevelDB gives you atomic batched updates, but that's it.
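To illustrate what "a layer on top" of atomic batches could look like, here is a minimal sketch in Python. ToyStore is a hypothetical in-memory stand-in, not the LevelDB API, and Transaction is an assumed name for illustration. It only buffers writes and commits them as one batch; it does no conflict detection and gives no isolation between concurrent transactions.

```python
import threading

_DELETED = object()  # marker for a buffered delete


class ToyStore:
    """Hypothetical in-memory key-value store with atomic batches (not LevelDB)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def write_batch(self, ops):
        # ops is a list of ("put", key, value) or ("delete", key, None).
        # Everything is applied under one lock, so readers see all or nothing.
        with self._lock:
            for kind, key, value in ops:
                if kind == "put":
                    self._data[key] = value
                else:
                    self._data.pop(key, None)


class Transaction:
    """Buffers writes and applies them as a single atomic batch on commit."""

    def __init__(self, store):
        self._store = store
        self._pending = {}  # key -> value, or _DELETED

    def put(self, key, value):
        self._pending[key] = value

    def delete(self, key):
        self._pending[key] = _DELETED

    def get(self, key):
        # Read your own uncommitted writes first, then fall back to the store.
        if key in self._pending:
            value = self._pending[key]
            return None if value is _DELETED else value
        return self._store.get(key)

    def commit(self):
        ops = [
            ("delete", key, None) if value is _DELETED else ("put", key, value)
            for key, value in self._pending.items()
        ]
        self._store.write_batch(ops)
        self._pending.clear()

    def rollback(self):
        self._pending.clear()
```

Nothing reaches the store until commit() is called, and rollback() simply discards the buffer, which is roughly the minimum such a layer has to provide.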
Hi Oren, what is the promise of having another DB option? Do you plan to supersede Esent?
Daniel
There are a couple of LevelDB ports that should compile in VS, see https://code.google.com/p/leveldbwin/ and https://code.google.com/r/kkowalczyk-leveldb/
However I think that's the problem with a Windows version of LevelDB, you're relying on someone porting the low-level parts (such as threading, mutexes and I/O) from the official Google Linux version.
Compared to Esent that MS officially supports and ships in every version of Windows, that's a big difference for something that you want to be robust and fully tested.
You might also want to take a look at this https://groups.google.com/forum/#!topic/leveldb/g_fWOcIwNDM, it should save you some time
Rafal, No, but we already have written the code to compensate for that.
j23, It matters, a lot. To start with, we are developing mostly on Windows. Having to develop on a separate platform is a huge barrier.
Daniel, I don't trust munin, and I would like to have something better.
Matt, Those were both last updated in 2011, which is a bit too old for me. And I agree on the problem there.
Matt, I know of leveldb-sharp, but it has major implementation issues making it unsuitable for what we want to do.
Ayende, you don't trust Munin, you say (which I understand). I'm curious: did you have higher expectations before starting the project? Did it turn out to be too costly to get Munin to a "trustworthy" state?
Tobi, Yes, it was always going to be a toy thing, but it got pretty expensive when we ran into race conditions there.
Ayende, I notice you already have an alternative to Esent (not recommended for production, I see). Why are you considering other engines such as LevelDB instead of pursuing your own? (for interest's sake)
Wal, We want something that can run on Linux. And building a proper storage engine is HARD, I want to skip doing all the hard work and take something that is already known to be working.
Having looked at Raven's ESENT table structure and the source code that interacts with it, I would think SQLite would be an obvious choice as an alternative: a fully cross-platform, extremely reliable and fast data engine.
Have you guys looked at SQLite for this purpose?
You don't really want to use a storage engine which has already made decisions about concurrency control and transactions. SQLite wouldn't be a good fit for the first-come, first-served model preferred by Raven.
Hey Oren, switch to Linux then you just have to type "make" ;-)
Rob,
I would be interested to hear some more details on what is meant by "already made decisions" on SQLite vs ESENT.
Is it that SQLite only does read_uncommitted and serializable vs ESENT only doing snapshot isolation?
Control over transactions seems comparable between them.
Data types on table columns seem comparable, except for no built-in multi-value tagged types.
SQLite can run in memory, journal in memory, fsync off, etc. to get varying level of performance vs safety.
You can spread data across multiple files to increase concurrency among other tricks.
Hard to beat how well tested and how stable the file format is, seems like a good fit for an embedded cross platform db engine with good .Net bindings.
Of course Postgresql would give you all the concurrency, transactions and datatypes you could ask for including a full JSON type or other multi-value types, but you would need to spin up a separate process for it since it won't run in-process.
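As a rough sketch of the SQLite knobs mentioned above (journal in memory, fsync off), using Python's built-in sqlite3 module. The function name open_relaxed and the file name are assumptions for illustration. These settings trade durability for speed, so this shows how they are set rather than recommending them.

```python
import os
import sqlite3
import tempfile


def open_relaxed(path):
    """Open a SQLite database with durability relaxed in favor of speed."""
    conn = sqlite3.connect(path)
    # Keep the rollback journal in memory instead of in a file on disk.
    journal_mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]
    # Do not wait for fsync after writes; a crash or power loss may corrupt the file.
    conn.execute("PRAGMA synchronous = OFF")
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, journal_mode, synchronous


workdir = tempfile.mkdtemp()
conn, journal_mode, synchronous = open_relaxed(os.path.join(workdir, "relaxed.db"))
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
print(journal_mode, synchronous)
```

Passing ":memory:" as the path instead would keep the whole database in memory.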
@Justin
I thought that Sqlite had issues with multi-threaded access, see http://ayende.com/blog/3400/in-search-of-an-embedded-db
Isn't SQLite using BerkeleyDB underneath?
No, ESENT
SQLite has either database-file-level locks or, in the case of shared cache mode, table-level locks. These are shared-reader, single-writer locks, but you can allow reads while writing with read uncommitted if needed.
See: http://www.sqlite.org/sharedcache.html
That mode would be ideal for Raven IMO.
And as I mentioned, you could spread each of the 18 Raven tables to separate files if needed and you can still do joins across files, which I am not sure you guys do. But the table-level locking should give the same result.
This is obviously not as nice as snapshot isolation with MVCC that is given to you for free by more complex DBs like PostgreSQL or ESENT, but it looks like Mongo is doing just fine with a per-database global lock:
http://docs.mongodb.org/manual/faq/concurrency/
Mongo's was process wide up until 2.2!
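A small sketch of the "separate files, still joinable" idea using SQLite's ATTACH DATABASE from Python's sqlite3 module. The file names, the idx alias, and the table names are made up for illustration and are not Raven's actual schema.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
docs_path = os.path.join(workdir, "documents.db")
index_path = os.path.join(workdir, "indexes.db")

# Open one file as the main database and attach the second under an alias.
conn = sqlite3.connect(docs_path)
conn.execute("ATTACH DATABASE ? AS idx", (index_path,))

# Each table lives in its own file (hypothetical tables).
conn.execute("CREATE TABLE main.documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE idx.entries (doc_id INTEGER, term TEXT)")
conn.execute("INSERT INTO main.documents VALUES (1, 'hello world')")
conn.executemany(
    "INSERT INTO idx.entries VALUES (?, ?)",
    [(1, "hello"), (1, "world")],
)
conn.commit()

# A single query can join tables that are stored in different files.
rows = conn.execute(
    "SELECT d.id, e.term "
    "FROM main.documents AS d "
    "JOIN idx.entries AS e ON e.doc_id = d.id "
    "ORDER BY e.term"
).fetchall()
print(rows)
```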
"Isn't SQLite using BerkeleyDB underneath?"
No: http://www.sqlite.org/about.html
It has minimal platform dependencies and a well documented file format:
http://www.sqlite.org/selfcontained.html http://www.sqlite.org/fileformat.html
I think someone made a SQLite wrapper for BerkeleyDB at one point using virtual tables: http://www.sqlite.org/vtab.html
Hey, I totally mis-read the question above as I was on my iPhone :-)
WRT Mongo doing DB-wide locks, that's not the same thing. It's easy to do DB-wide locks if you have a single writer thread and you don't care about fsync (as an example).
Rob,
Not sure I follow. You can turn off fsync with SQLite and use a single writer thread as well; isn't that up to Raven?
Of course SQLite will let you use multiple writer threads but they will wait on the same table.
That wasn't what I was saying, I was saying that Mongo gets away with its full-db locks because it works that way, and Raven doesn't work that way so it's a bad comparison.
Rob,
My point is that this is an internal implementation detail of Raven. From the outside, Mongo and Raven are very similar, and Mongo is able to provide adequate performance, arguably as good as or better than Raven, even though it has a global lock.
If Mongo can do this with a global lock, so should Raven, and if so then you now have the option of using a very well-tested and performant database engine that's cross-platform.
If this is not possible because Raven's design is so tied to a storage engine having snapshot isolation, then I guess SQLite is not a viable choice and my original question is answered. It will be interesting to see what solution is chosen to accomplish this goal in the end.
Ayende,
Is LevelDB going to provide better performance than Esent? If so, how much would you estimate?
Did anything ever come of looking into BangDB?
Justin, I agree, but SQLite is pretty bad at multi-threaded access, and that is something that we rely on heavily.
Matt, That would NOT be ideal for RavenDB, actually. We really do need to have multiple concurrent writers at the same time. Just to give you some examples: map/reduce indexes, stats, replication, user writes. A lot of those generate concurrent writes. We actually need to make a lot of writes to the same "table" at the same time, and any storage solution we use has to reflect that.
And Mongo's decision to do that is... well, let us say that it caused a lot of problems for Mongo's users (search the forums), and Mongo doesn't have nearly as much background stuff as we do.
JDice, As I can't run it yet, I really have no way to tell. Performance is something that we would like to improve, but I have no idea how, or whether that will be the case.
Matt, We tried looking at BangDB, I couldn't find the code. It is supposed to be OSS project, and I couldn't find the code (and I looked). That is the point when I gave up.
And you would like to have one storage engine at least that can work on all platforms that you want RavenDB to work on. Otherwise you could just use ESENT on Windows and LevelDB on Linux.
Chris, That is a VERY important issue, yes.
Did you check out https://github.com/hsn10/leveldb-mingw? It seems like it is active.
@ayende The question is why should Google care to support LevelDB on Windows? I don't think they have any serious server software on non-*nix.
j23, Where did I say that Google is obligated to do so?
There is a Berkeley DB back-end to SQLite which supports multiple writers: http://stackoverflow.com/questions/2824135/how-fast-is-berkeley-db-sql-compared-to-sqlite
Giorgi, I don't trust BDB at all. See my previous experiments with it.
Ayende,
The bug that you encountered is fixed and there is a .NET binding for 5.3, so why not give it a try again?
Giorgi, Which bug are you talking about? And I want to be able to compile & step through the code myself.
The bug which you linked to at http://ayende.com/blog/3411/observations-on-embedded-databases
Giorgi, I lost trust in that, and that is _important_. I prefer concentrating my effort on things that didn't fall off & die the first time I touched them.
@ayende ... have you looked at https://github.com/bitcoin/bitcoin/tree/master/src/leveldb
btw we use sqlite right now and are fine with it, but are also looking into leveldb (but need cross platform)
there is also an article on that:
http://www.codeproject.com/Articles/569146/LevelDB-DLL-for-Windows-A-New-Approach-to-Exportin
Matthias, I am sorry, but I didn't know about that. At this point, however, it isn't that relevant.