Ayende @ Rahien

My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

Rhino Divan DB reboot idea

Divan DB is my pet database. I created it to scratch an itch [Nitpickers: please note this!], to see if I could create a Couch DB-like system in .NET. You can read all about it in the following series of posts.

It stalled for a while, mostly because I ran into the hard problems (building map/reduce views). But I think I now have a better idea: instead of trying to build something that would just mimic Couch DB, a .NET-based document DB is actually a very achievable goal.

The way it would work is actually pretty simple: the server would accept JSON-formatted documents, like these:

    {
        "id": 153,
        "type": "book",
        "name": "Storm from the Shadows",
        "authors": [
            "David Weber"
        ],
        "categories": [
            "You gotta read it"
        ],
        "avg_stars": 4.5,
        "reviews": [13, 5423, 423, 123, 512]
    }

    {
        "id": 1337,
        "type": "book",
        "name": "DSLs in Boo",
        "authors": [
            "Ayende Rahien",
            "Oren Eini"
        ],
        "categories": [
            "You REALLY gotta read it"
        ],
        "avg_stars": 7,
        "reviews": [843, 214, 451]
    }

Querying could be done either by id, or using a query on an index. Indexes can be defined using the following syntax:

var booksByTitle = 
   from book in docs
   where book.type == "book"
   select new { book.title };

The fun part here is that this index would be translated into a Lucene index, which means that you could query the index using a query:

Query("booksByTitle", "title:Boo") -> documents that match this query.

As well as apply any & all the usual Lucene tricks.
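To make the "usual Lucene tricks" concrete, here are a few query forms the stock Lucene query parser understands (assuming it is exposed directly, as the post implies):

```
title:Boo*               wildcard match on the title field
title:Boo~               fuzzy match (edit-distance based)
title:[Aardvark TO DSL]  inclusive term range over titles
+type:book +title:Boo    require both terms to match
```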

You don’t get Map/Reduce using this method, but the amount of complexity you have is quite low, and the implementation should take only several days to build.



Kyle Szklenski

Don't think we missed your little ID number for your own book - I didn't miss it last time either, just didn't have time to comment on it. ;)


It was and still is a very good idea to implement a lightweight document db in .NET, especially if it could be used both embedded and standalone and supported System.Transactions.

On the other hand, there are several JSON document databases like MongoDB mentioned above or the Persevere project, both offering very rich functionalities, so you must think what will be the niche for DivanDB and what features will make it a superior tool in that niche.


Those pointing out MongoDB and other NoSQL alternatives need to keep in mind that the .NET connectors for these products are still in their infancy. I know that Craig and Samus are working on a MongoDB connector, but it's not quite there yet, in my opinion. I'm not trying to downplay the effort in any way, though. I just don't think I could take that to my company and say we should build our apps using it. Yet!

So the idea of someone taking the time to put together an alternative that integrates well from the start and, in Oren's case, supports their style of work, all while releasing the code and discussing the design choices, is a good thing. It's both a learning experience for all of us and could lead to some new insight into the problem that benefits us all.

Judah Himango

and the implementation should take only several days to build.

Famous last words. :-)

But hey, if anybody could pull it off, it's you, Oren.

Andrew Stewart


Purely from a learning perspective I've been hacking up a document db in .NET, inspired by MongoDB. I've used Esent as the backing store - thanks to Oren for the idea.

If anyone would be interested in helping out or just looking over the codebase, you can find it below (very hacky in places still).


It's very early, about a week old, and it currently only works in embedded mode, but I thought it would add to the conversation.


Ayende Rahien


That post was written about a week ago; over the last two days I completed the engine implementation.

It is working.

Judah Himango

Awesome, nice work! So when can we play with it? :-)

Rob Conery

How about instead of a Document, we build an Expression-based database :) that uses Linq internally to query the data?

You can do it... of all people :). I've mused on what it would take but... it's over my head. Your head is massive.


Can't wait to have a go at it - so is this built using Lucene.NET as a class library, i.e. I could use it in a medium trust environment even?


I guess I don't quite understand the difference between creating a JSON document DB versus just using XML or YAML. Yeah, XML isn't as readable and YAML isn't as widespread, but unless you're only ever going to parse the result with JavaScript, what's the advantage of storing the data as JSON?

Ayende Rahien


Can you explain what you mean by expression based?

Ayende Rahien


This is using Esent, so probably not.

Ayende Rahien

JSON can be read anywhere, including in the browser; it is cheap to create and read, easy to understand, and easy to manipulate.

Compare the cost of reading an XML doc vs. a JSON doc, and you'll see the issues.

But it is also much more readable and ideally suited for object graph serialization.
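The "JSON maps straight onto objects" point can be shown in a few lines. This is a sketch in Python for brevity (the thread is about .NET, but the contrast is the same): the JSON document deserializes directly into native structures, while the XML document yields a tree that still has to be walked and whose values are all strings.

```python
import json
import xml.etree.ElementTree as ET

json_doc = '{"id": 153, "type": "book", "authors": ["David Weber"]}'
xml_doc = '<book id="153"><authors><author>David Weber</author></authors></book>'

# JSON deserializes straight into native dicts, lists, and numbers.
book = json.loads(json_doc)
authors = book["authors"]

# XML gives back an element tree that must be traversed, and every
# value (including the numeric id) comes out as a string.
root = ET.fromstring(xml_doc)
xml_authors = [a.text for a in root.find("authors")]

assert book["id"] == 153            # a real int, no casting needed
assert root.get("id") == "153"      # still a string
assert authors == xml_authors == ["David Weber"]
```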


@MattMc3 I don't think the point is to STORE the data in JSON, but to interact with it in JSON. Essentially the data store would be any level of native to object representation of said data. The real advantage is having a light interaction to the database query/storage mechanism. Not to mention that queries can naturally translate via JSON based calls to/from the database layer.

@Ayende Are you planning on using IronJS + DLR for a query engine at all? I'm just curious here, as it could allow a similar interactive ability to what is offered by MongoDB. I really like what I've seen in MongoDB, and would love to see the client interactions natively available in Jaxer (spidermonkey based) and node.js (V8 based).



Darn (re: Esent) - oh well, still can't wait to see how it turns out. Question: would it be possible to do what you're looking to do with just Lucene.NET indexes?

Rob Conery

@Oren what I mean is that (and I'm arm-waving... again) the storage model here is JSON, which is groovy, but I was thinking it would be interesting to somehow store the data using an Expression Tree. The storage would be MemberAccess I spose (not sure how to best leverage it... just musing) but instead of working up JSON calls, you could use LINQ directly.

Does this make sense?

Andrew Stewart


Isn't all you're asking for a way to query the documents you store through Linq? i.e.

using (var docDb = new DocDb())
{
    var doc = new Document<Company>();
    doc.Data = new Company { Name = "My Company" };

    var docsFound = docDb.Query<Company>().Where(q => q.Name == "My Company");
}


This is exactly the syntax I'm working on for my docdb; the docs' contents are still stored in JSON, though. They're just queried through Linq, via strongly typed access.



Andy Hitchman

I've had similar ideas about using Lucene as a KV store. Hit a bit of a roadblock when I realised that you have to re-open IndexReaders to pick up changes made by IndexWriters...

It looks like IndexReaders take a snapshot of the index when they open, which could be a massive scalability issue for a line-of-business transaction processing app.

I had a PoC working very quickly, but on realising the above, I switched my backing store to Berkeley DB.

I might be wrong about Lucene's behaviour though.


Ayende Rahien


Right now, no. It actually shouldn't be hard at all.

Right now I am using Linq-based expressions to do this, and it is fairly easy to work with.

Ayende Rahien


What I actually need is just a way to store blobs by key in an ACID manner.

It isn't hard to write, but ESENT gives it to me for free, and is very easy to use.

It is separated into a distinct place in the app, so you can replace it if you really want to.

It might be interesting to do a BDB storage implementation.

And no, Lucene doesn't offer TX guarantees, so you need to handle this differently.
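The storage contract described above (blobs by key, with ACID guarantees, behind a replaceable seam) is small enough to sketch. The following is purely illustrative, in Python with SQLite standing in for ESENT; the class and method names are my own, not Divan DB's:

```python
import json
import sqlite3

class BlobStore:
    """A minimal 'store blobs by key in an ACID manner' sketch.

    SQLite plays the role ESENT plays in the post: it gives us
    atomic, durable writes for free, so the store itself is tiny.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (key TEXT PRIMARY KEY, blob TEXT)")

    def put(self, key, doc):
        # The 'with' block wraps the write in a transaction: it commits
        # on success and rolls back if anything raises.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO docs (key, blob) VALUES (?, ?)",
                (key, json.dumps(doc)))

    def get(self, key):
        row = self.conn.execute(
            "SELECT blob FROM docs WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

store = BlobStore()
store.put("books/153", {"type": "book", "name": "Storm from the Shadows"})
assert store.get("books/153")["name"] == "Storm from the Shadows"
```

Because the store is isolated behind this narrow put/get interface, swapping SQLite for ESENT, BDB, or anything else transactional is exactly the kind of replacement Ayende describes.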

Ayende Rahien

You are hand-waving enough to make this impossible to evaluate. I don't think that what you want is even desirable.

Can you show some code here to explain that?

Ayende Rahien


The problem is that your method requires an O(N) approach and running in the same address space.

Ayende Rahien


You are correct about Lucene needing to re-open the index readers; that is actually a big plus, because it means that readers don't have to wait for writers, so I don't get the scalability issue.

Lucene is highly scalable.

Demis Bellot

Redis lets you store and retrieve blobs atomically; it also lets you store and retrieve lists, sets, and ordered sets of blobs fast. I've developed an open source generic Redis client that can store and retrieve the entire Northwind database (3,202 records) in less than 1.2 seconds (on my 3yo iMac) here:


I've opted for a smaller, faster serialization format that is over 5x faster than JSON (it's effectively JSON with the quotes and whitespace removed), as the JSON serializers in .NET were having a noticeable impact on performance:



Serialization format:


Though I have to say, adding search capabilities with Lucene is quite an interesting idea. Are you doing real-time searches with Lucene? i.e. are you adding it to the Lucene index as soon as you've added it to Rhino DB? My old problem with Lucene was that it didn't use to handle updates very well (they recommended that you build a new index instead), but it now looks like the new version does.

Ayende Rahien


I am doing background indexing, but we are talking about a 25 ms update time from insert to index update.

Chris Smith

FYI, Lucene 2.9+ supports "near realtime" updates. You can get an IndexReader from an IndexWriter that will return documents that haven't been flushed yet.

You can also take a look at Zoie ( http://code.google.com/p/zoie/) which does something similar.

J Chris A

Very cool project to be doing this. I think there is definitely room for more diversity in the document database world.

I'd ask you to at least consider maintaining superficial compatibility with CouchDB (e.g. use _id instead of id, and _rev if you have MVCC), and consider reusing our JavaScript view engine (it can be imported without any Erlang code at all, etc.).

There is still a lot of room to add some new value beyond CouchDB. We consider CouchDB to be a protocol even more than a database. I've started a project to port it to Ruby here: http://github.com/jchris/booth This would be a nice place to steal the map reduce code from.

I'd be really curious to see how far you can deviate from CouchDB (implementation, API, targeted use cases) and still maintain, for instance, the ability to replicate with CouchDB, reuse CouchDB design documents, etc.

What do you think of that challenge?

Ayende Rahien


I am actually thinking about having each document be composed of data & metadata, the id & rev would go in the metadata.

That would allow easier extensibility for things like adding security filters using user's code. I believe I got the idea from couch, but I am not sure.
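One possible shape for such a data-plus-metadata document, purely illustrative (the field names here are my own invention, not Divan DB's actual format):

```json
{
    "metadata": {
        "id": 1337,
        "rev": "2",
        "allowed-users": ["ayende"]
    },
    "data": {
        "type": "book",
        "name": "DSLs in Boo"
    }
}
```

Keeping the id and rev in a metadata envelope means new concerns like the security filter above can be added without touching the user's document body.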

Replication is something that I don't intend to deal with for the v1.0 release.

We have something very similar to design docs: basically a set of docs that use Linq to define an index, which allows very efficient indexing.

Note that we don't even use the term view, mostly because it isn't a view like in couch, but rather just a way to define interesting indexes. We do allow filtering on the indexes, though, and some rather interesting document flattening.

Thanks for the pointer about booth, I'll look into that once the current crunch is over. I certainly find Ruby easier to grok than Erlang.

J Chris A


It sounds like you are on the right track.

One really cool benefit you have if you stay close enough to the CouchDB API is that Futon should "just work". If your API differs enough that Futon doesn't work out of the box, you could of course build an HTTP facade to mimic CouchDB.

There is, for instance, a Futon 4 Mongo project that does this. Pretty cool if you ask me: http://github.com/sbellity/futon4mongo If they got cross-replication with CouchDB to work, then it'd be not just cool, but seriously useful.


Let's say I want to store and query over dates. How would I do that?

Ayende Rahien

Range query, probably; you need to be more specific.


Are floats, strings and lists the only data types supported in the JSON representation? If I need a higher-level data type, such as Date, how does the index get generated? Or do I have to represent it as ISO 8601 and index that?
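The ISO 8601 idea in the question works because ISO 8601 strings sort lexicographically in chronological order, so a plain term range query over the raw strings is also a correct date range query. A sketch of that, in Python for brevity (string comparison stands in for Lucene's range query; the documents and helper are invented for illustration):

```python
from datetime import date

# Hypothetical stored documents, with dates serialized as ISO 8601 strings.
docs = [
    {"name": "Storm from the Shadows", "published": "2009-03-01"},
    {"name": "DSLs in Boo",            "published": "2010-01-01"},
    {"name": "Some older book",        "published": "1999-07-15"},
]

def published_between(docs, start: date, end: date):
    lo, hi = start.isoformat(), end.isoformat()
    # Lexicographic comparison of ISO 8601 strings == chronological
    # comparison of the dates they encode, so no parsing is needed.
    return [d["name"] for d in docs if lo <= d["published"] <= hi]

result = published_between(docs, date(2009, 1, 1), date(2010, 12, 31))
assert result == ["Storm from the Shadows", "DSLs in Boo"]
```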
