Why Raven DB?
One question that I got a few times regarding Raven is why? Richard Lopes puts the question nicely:
However as a pragmatic developer, I am wondering what new this project is offering in a saturated market where you have quite mature alternatives like CouchDB, MongoDB, Tokyo, Redis, and many more ?
Many of these products are also cross platform and run at C speed with a proven record, being used in very big web sites where their sharding capabilities and fault tolerance have been pushed far.
The answer is composed of several parts, and covers quite a bit of history.
Why Raven DB from Ayende’s point of view?
Almost two years ago, I decided that it was time to give my Erlang reading abilities a big push, so I sat down and read the CouchDB source code. That was quite interesting, and was one of the reasons that I got interested in that NoSQL Thing. Unfortunately, I am one of those people who have a really hard time learning by osmosis; I have to do something to truly understand it. I have used (and built) a distributed key value store in several projects, but I felt that I didn’t really have a good understanding of what it means to use a document database.
I really hate having ideas stuck in my head. They tend to ping, and then someone tells me that I have been staring at a blank wall for two hours, and I realize that I just finished designing a document database. And about a year ago, it finally got bad enough that I sat down and wrote an implementation, just to get it off my chest. That was Rhino DivanDB. In most ways, it was a proof of concept more than anything else. Just enough so I could tell myself, yes, I can do it.
Then I ran into situations where a document database would be an ideal fit, except… that the available choices wouldn’t quite do what I wanted. They are all open source, however, so no problem there, right? Except that none of them are really approachable from the .NET ecosystem. Yes, I can do both C++ and Erlang, but I don’t really like it. Moreover, it seems like .NET support is almost an afterthought (if it exists at all) for those projects.
There are some people who call me arrogant, but I really do think that I can do better. And I think that we did. Raven is a project where I tried a lot of new things, not from a coding perspective, but from the community and launch perspectives. It will be out soon, and I think you’ll be able to appreciate the level of focus on the non-coding aspects of the project.
Why Raven DB from your point of view?
Raven is an OSS (with a commercial option) document database for the .NET/Windows platform. While there are other document databases around, such as CouchDB or MongoDB, there really isn’t anything that a .NET developer can pick up and use without a significant amount of friction. Those projects are excellent in what they do, but they aren’t targeting the .NET ecosystem.
Raven does, and in so doing, it brings a lot of benefits to the table. When building Raven and the supporting infrastructure, the focus was always on making sure that it did the Right Thing from the .NET developer’s point of view. Below you can see a more detailed analysis of Raven’s benefits, but it comes down to that. Raven is built by .NET developers for .NET developers.
Corny, isn’t it? But true nonetheless.
What does Raven DB have to offer? Raven…
- builds on existing infrastructure that is known to scale to amazing sizes (Raven’s storage can handle up to 16 terabytes on a single machine).
- runs, natively and with no effort, on Windows. In comparison, to get CouchDB to run on Windows you start by compiling Erlang from source.
- is not just a server. You can easily (trivially) embed Raven inside your application.
- is transactional. That means ACID: if you put data in it, that data is going to stay there.
- supports System.Transactions and can take part in distributed transactions.
- allows you to define indexes using Linq queries.
- supports map/reduce operations on top of your documents using Linq.
- comes with a fully functional .NET client API, which implements Unit of Work, change tracking, read and write optimizations, and a bunch more.
- has an amazing web interface allowing you to see, manipulate and query your documents.
- is REST based, so you can access it via the JavaScript API directly.
- can be extended by writing MEF plugins.
- has trigger support that allows you to do some really nifty things, like document merges, auditing, versioning and authorization.
- supports partial document updates, so you don’t have to send full documents over the wire.
- supports sharding out of the box.
- is available in both OSS and commercial modes.
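To make the map/reduce item in the list above a little more concrete, here is a minimal, language-neutral sketch of the idea in Python. It is not Raven's Linq index syntax; the document shape and the function names (`map_post`, `reduce_counts`, `run_index`) are made up purely for illustration.

```python
from collections import defaultdict

def map_post(doc):
    # Map step: emit one (tag, 1) pair for every tag on a document.
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_counts(pairs):
    # Reduce step: sum the emitted values per key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run_index(docs):
    # Run the map step over every document, then reduce the results.
    pairs = [pair for doc in docs for pair in map_post(doc)]
    return reduce_counts(pairs)

# Hypothetical blog-post documents.
posts = [
    {"title": "Why Raven DB?", "tags": ["raven", "nosql"]},
    {"title": "Document databases", "tags": ["nosql"]},
    {"title": "Untagged post"},
]
tag_counts = run_index(posts)
```

The point of the shape is that the map step only looks at one document at a time, which is what lets a database keep such an index up to date as documents change.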
There are probably other things, but I need to head out for a client now, so I’ll stop.
I would love to hear your opinions about it, both positive and negative.
I'm currently choosing to use it because:
Being able to embed it makes it really easy to write integration tests (Create directory, run tests, delete directory)
Easy Unit of Work implementation (Important if you want to keep a tidy system)
Because I like having to define indexes up front; the idea of just dummy querying against data without indexes defeats the whole point of having a speedy read-store.
Because it's new and offers opportunities to help shape it and talk about it at an early stage.
Because it's natively written in .NET, it means I've been able to start contributing a lot easier than if I was having to switch between different platforms constantly
Other than that I haven't got a lot of justification, which is why I keep telling people that once I've written the basic blog entries, and all the functionality I want to write works against RavenDb I'll do a comparison of the usages of couch/mongo against RavenDb and see how they actually stack up so I can have a valid opinion.
Oh, and because you're indexing into Lucene we get an awful lot of fuzzy search stuff for free!
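The "create directory, run tests, delete directory" pattern mentioned above can be sketched roughly like this. `TinyStore` is a hypothetical stand-in for an embedded document store (one JSON file per document), not RavenDB's embedded API; the only point is the lifecycle of the data directory around the test.

```python
import json
import os
import tempfile

class TinyStore:
    """Hypothetical stand-in for an embedded store: one JSON file per document."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def _path(self, key):
        return os.path.join(self.data_dir, key + ".json")

    def put(self, key, doc):
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(doc, f)

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)

def run_integration_test():
    # Create directory -> run test -> delete directory (on leaving the 'with').
    with tempfile.TemporaryDirectory() as data_dir:
        store = TinyStore(data_dir)
        store.put("users-1", {"name": "Ayende"})
        loaded = store.get("users-1")
    return loaded, os.path.exists(data_dir)
```

Because every test gets a fresh directory, tests cannot leak state into each other, and nothing needs to be installed or started before the test run.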
It also looks like RavenDB may be pretty capable as the base for a content repository, which is a pretty big hole in .NET OSS. There's a partial implementation of JSR-170, but nothing else as far as I can see.
I'd like to download and play with this. Is there an example app built on RavenDB somewhere?
Oh, and when will RavenProf be available? ;)
You're using Lucene for searching if I recall correctly?
Does your implementation happen to be transactional? At the moment a crash during a Lucene index update can corrupt the index :(
Looking forward to giving this a poke regardless!
One simple question. You wrote: "Raven is an OSS (with a commercial option) document database for the .NET/Windows platform." Does that mean that there are no plans to support the Mono platform?
I think you have every reason to start a completely managed NoSQL implementation for .NET, as there is definitely a lack of competing products in this space. However, like any server software, there are 2 parts involved in building a complete NoSQL solution, i.e. client and server. To use an RDBMS as an analogy, you have the RDBMS server, which is almost always written in C/C++ for performance reasons, and a feature-rich client, e.g. ADO.NET, ORMs à la NHibernate, EF, etc.
Now we all know that you are a very accomplished developer, but the managed .NET runtime will not allow even you to match the speed of a high-performance network server written in C, which lets you access low-level event-based async IO and native memory layout. For these reasons managed server implementations will never be as fast or as efficient as properly developed native implementations. Look to nginx, memcached and Redis as prime examples.
Personally I think you've made a mistake by choosing to develop the entire stack yourself instead of focusing on creating a value-added client solution on top of an existing established implementation. It obviously gives you a lot of advantages, as you can influence your own direction and features, but in the process of making a .NET implementation for the .NET ecosystem, you've made every other client a second-class citizen. To re-use the above analogy, even though SQL Server is primarily used by .NET clients, I think it would be a massive disadvantage if it only supported .NET clients.
With that out of the way, RavenDB does have a lot of USPs and is poised to bring a lot of competitive advantages to the table. And with your full development effort behind it, it has a very bright future indeed. For me the biggest competitive advantages are its built-in Lucene index, LINQ integration and rich-querying support, System.Transactions support, embeddable option, optional commercial support and, I'm assuming, your rich web interface (although I have yet to see it, I'm sure it will be good). If I next need a NoSQL solution with a built-in search index and rich querying support I will definitely be evaluating RavenDB.
Not wanting to detract from your open-source offering too much, but your posts suggest that there aren't many viable .NET client offerings for existing NoSQL solutions. Although it's not a DocumentDB per se, Redis is a rich data-structures server, and I believe I offer one of the best clients there is (feature complete with the latest dev version of redis-server), providing much of the value-added functionality expected from a first-class C# client, available at: code.google.com/.../ServiceStackRedis
I beg to differ. In many cases C# can perform just as well as C/C++ apps.
Take a look at this discussion on stackoverflow, stackoverflow.com/.../c-performance-vs-java-c
.NET is now so mature, that you will never experience any difference.
Although I'm not a C/C++ expert, I do believe you have to do all your memory/IO management yourself, and unless you're an optimization expert, you're likely to make code that performs worse than anything you write in C#.
With your argument we should all ditch .NET altogether and go back to native languages. ASP.NET and MVC are made in C#, and if I'm not mistaken they perform quite well.
Besides, it would probably be more efficient to just add another server to the cluster if you start to reach max capacity.
I'm very interested in Raven. Downloaded and built sources and gave it a try. It's fast and easy to get hands on it (just had some troubles with authentication).
I like the developer APIs in Raven. I sampled a few other NoSQL DBs and they are usually a real pain to set up on Windows, or the drivers are incomplete or immature.
I'm not considering how the server is implemented when it comes to performance (I just don't mind...), but I felt it was a lot slower than MongoDB (it's just a feeling, I should take time to measure it).
Raven being RESTful on the server side, I don't get the point about Raven not being usable with non-.NET clients.
Could you give us a clue about when Raven will get to an "official" Beta status?
C# is mostly comparable to C/C++ in a lot of areas; one of the areas in which it is not is high-performance network servers. In these cases using threading for concurrency adds additional overhead. For more information, read up on the 'C10K Problem': http://www.kegel.com/c10k.html
This is the problem that nodejs.org exists for and is looking to solve completely, by having all IO operations event-based and completely asynchronous. There is an introductory PDF on the site that explains the difference and the benefits of its approach. There is no need to guess here; the benchmarks speak for themselves. Look for any managed or threaded servers compared against any of the products I've mentioned: these native implementations always perform faster and require less memory.
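For readers unfamiliar with the event-based style being discussed, here is a minimal sketch using Python's standard `asyncio` streams: a single thread serves many concurrent connections, with every read and write being a non-blocking await rather than a blocked thread. It says nothing about the relative speed of managed versus native servers; the handler, the uppercase-echo protocol and the client count are arbitrary choices for the illustration.

```python
import asyncio

async def handle(reader, writer):
    # One coroutine per connection; no thread is blocked while waiting on IO.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.strip()

async def demo(n_clients=50):
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are in flight at once on the same event loop thread.
        return await asyncio.gather(
            *(client(port, b"hello %d" % i) for i in range(n_clients))
        )
```

Calling `asyncio.run(demo(20))` runs twenty concurrent request/response exchanges on one thread.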
That's actually the opposite of what I suggested. For maximum efficiency, the server should remain native, and for maximum flexibility and data accessibility there should be a client available for every language binding, e.g. in .NET's case one for C#.
Well not that it matters here, MVC performance is quite good. Not as fast or as efficient as say an equivalent nodejs application but for the type safety and productivity gains I will always choose C#. If however I wanted to build a scalable comet or web sockets server I will not be using an ASP.NET solution.
I've never advocated building custom web server applications in native code. If I suggested this much I apologize.
Any plans for replication support?
This is what I meant by second-class citizen. It's still accessible via a REST+JSON endpoint, but most of the advanced features will only be available to .NET clients. Granted, this is not really a problem for the audience here, but it is still a disadvantage compared with competing solutions: when you choose to develop the server yourself, you don't take advantage of the existing ecosystem.
Note: Although CouchDB only provides a REST interface as well, the difference with it is that all clients have equal access to the core functionality.
As most of the comments are on the whole positive, I thought I would chip in with something negative :-).
If RavenDB's main selling point is that it is written in .NET code or even C#, then why is this important?
I have been using many databases for many years and it has never been important to me that it is written in a particular language.
Is C# or the CLR not a bad fit due to the "heavy" nature of CLR threads? Something like Erlang makes more sense, as processes are cheap.
The same could be said of the parallel framework which I think RavenDb uses.
Yes, it is. It is able to recover nicely from crashes.
One of the most popular NoSQL databases out there right now, used by Twitter, Facebook, Digg, etc., is Cassandra, which is written completely in Java. So the "heavy" CLR argument sort of falls by the wayside.
Speaking of Cassandra, does RavenDB support distributed node replication? In other words, is each Key/Value also tagged with a timestamp so that it can easily be synchronized with other nodes?
Not at the moment, no.
We have some plans to do so, but they are on a backburner for now.
I can do thousands of writes per second, _transactionally_.
Your examples are all for software that gets data and saves it to memory, flushing to disk in the background, if at all. That is a totally different requirement and problem set.
I don't think so, the client uses REST, so can anyone else.
It would be trivial to add support to that for any platform that can make HTTP calls.
Mongo saves to memory, Raven saves to disk.
Raven can do crash recovery, Mongo cannot.
Raven is transactional, Mongo is not.
That is for writes; for reads, Raven should be just as fast.
Raven will be out on the 18th.
I wrote a .NET chat server that supported 20,000 connected clients easily.
It is really not that hard, and yes, I used async IO for pretty much everything.
The client is communicating with the server using HTTP+REST, there isn't any secret backdoor for .NET only stuff.
It means that it is more accessible, it means not letting a new runtime into production env.
It means that a .NET dev can figure out what is wrong if he needs to.
It means that extending the server is EASY
Something similar is planned, yes.
I have not closed down the details yet. I would love to talk with you more about it, to see what you mean.
Mongo also does not scale well (for the moment at least). No offense, as I didn't mean to "compare" both tools. I'm aware that Mongo is designed for performance, so other tools will hardly compete in this field.
One other thing that makes Mongo fast for writes is that it does not return errors when writing; it immediately returns to the client and performs the write operation in the background. Is Raven working like this too? If not (as I guess), is it something worth adding (as an option, for example)?
If my understanding of 'nosql' is correct, the biggest advantage is horizontal scaling. If that's the case, then a majority of applications wouldn't need a nosql solution. An RDBMS would do just fine.
One thing that does look nice about nosql is the direct object serialization. No need for mapping the domain to the database. This does bring up one question though. Does Raven serialize protected/private fields? Similar to how NH can map private fields.
"Because I can" is always a valid reason.
What do you mean, Mongo doesn't scale?
My reading says that it is very scalable.
Raven currently writes to disk before returning to the user. I am thinking of adding untransactional writes, but I am not sure that I want them.
Yes, that is one of the major reasons, but not the only one. The flexible data model, querying capabilities and schemaless nature mean that it is suitable for a lot of other tasks.
By default, no. Raven allows you to customize it, though.
Glad to hear there are plans for replication support. High Availability is something our DocumentDb needs!
I've noticed on the tweet-osphere that you've been AFK for 4 hours during NHibernate talk, congratz I heard it was successful. So I'll wait for you to catch up to replying to all the comments before replying.
Just in case it was implied otherwise: CouchDB, MongoDB and Redis all support atomic operations, while Redis also supports custom atomic transactions (with optional command pipelining). These are maybe not as flexible as RavenDB's, but they are suitable for a large class of applications, and high-level application locking can achieve the rest. Also, because it was mentioned, Redis supports a recoverable append-only file mode and trivial replication in addition to async background saves.
Agreed they are targeting 2 different ends of the NoSQL solution spectrum, but there are still some overlapping problem sets.
Sounds awesome, looks like it will still be very relevant today - it's not open source by any chance? :)
Ok I didn't know that - it makes sense I guess, which means you must have some cool LINQ 2 REST projection going on under the covers - sounds cool.
Even though it may be trivial it is really not an option for alternate application platform developers unless a supported client exists. This is one of the things you've given up by going your own route. Hopefully RavenDB will be hugely successful and you will get external developers donating rich client support. Although I suspect the .NET IDIOMS which is a benefit here may also be a disadvantage for 'foreign clients'.
Tell you the truth, the embedded option sounds very compelling in which case you stand a very good chance at succeeding in db4o's embedded persistence space, so I hope you look to fully support this feature. Will it require admin privileges?
When I am talking about transactions, I am talking about never losing writes.
Redis' append-only format sounds similar, but I haven't been able to find any mention of the perf hit that this causes.
To give you an idea, Raven's approach is fsync after every command.
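As an illustration of what "fsync after every command" means in practice, here is a minimal append-only log sketch in Python. It is not Raven's storage engine; `DurableLog` and `replay` are hypothetical names, and the JSON-lines format is just a convenient choice for the example.

```python
import json
import os

class DurableLog:
    """Hypothetical append-only command log that syncs to disk on every append."""

    def __init__(self, path):
        self._file = open(path, "ab")

    def append(self, command):
        self._file.write(json.dumps(command).encode("utf-8") + b"\n")
        self._file.flush()              # move data from Python's buffer to the OS
        os.fsync(self._file.fileno())   # ask the OS to flush it to the device
        # Only now would a server acknowledge the write to the client.

    def close(self):
        self._file.close()

def replay(path):
    # After a crash, the log is read back to rebuild state.
    with open(path, "rb") as f:
        return [json.loads(line) for line in f]
```

The cost is one synchronous disk flush per command, which is the trade-off against stores that acknowledge a write once it is in memory and flush in the background.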
And no, the chat server isn't OSS, commercial work for an old client.
The embedded version requires no privileges.
This is intriguing to me. I'm just now jumping on the NoSQL movement and love what I'm seeing. The performance aspect is really what's got me sucked in. I see a method of using the document DB as the real-time data store for the web application and running a background process that pulls the data into a SQL DB for the reporting team to use (SQL Reporting Services). I think this is the best of both worlds. From what I've seen, the performance can be so good that I may be able to eliminate a good deal of data caching.
I'm hoping RavenDB's read performance is on par with MongoDB (my current NoSQL favorite). I'd love to see some benchmarks to compare. It will also be important to compare the write speeds, considering RavenDB is transactional but MongoDB isn't.
Great work and thanks for sharing it!
And I as well, in addition to data integrity. Redis supports multiple configurable modes including 'fsyncalways' so you never lose a write as well as 'fsync when one run in the event loop exists' - so that can potentially write tens of commands at once.
Although it's heavily optimized, this obviously has a perf hit (which you would experience as well), since it needs multiple sync-hits to disk depending on your configuration, but it's a necessary trade-off for consistency.
Do you have any numbers?
The overall published benchmarks for Redis are maintained here:
Although these numbers don't mean much unless you're doing an apples-to-apples comparison against a competing solution. It is easy enough to test, though, because Redis comes with a redis-benchmark utility to measure the performance you can get in your environment. The different supported server platforms will be a problem, though, since RavenDB is supported on Windows and Redis is supported on *NIX servers. I host unsupported 32-bit Windows builds here, but they use Cygwin so it would not be a fair comparison:
The best benchmark will probably need to have the same server dual-booting into a vanilla Windows and a Linux distro (as they are the expected production platforms):
You should be able to easily configure Redis to a setup similar to how RavenDB operates, as everything is configurable in a simple redis.conf file.
I actually think comparing against RavenDB will prove to be an interesting exercise - I will be curious of the results as well!
This is easily achieved in C#, as the underlying I/O completion ports implementation of the Windows O/S is exposed to the C# Socket class.
A little while ago I had to write a chat server test client and I could quite easily simulate 100K clients on a single PC using this technique and CPU never went above 10%
Sounds awesome, can you port it to Ruby ;)
Those are very good client numbers, though I imagine the high-performance server handling the load on the other end would've been more difficult to implement. What was the chat server software like, i.e. was it native? as those single-server numbers are very impressive?
Otherwise it may be time to do some benchmarking of my own to compare against the leading node.js chat servers. Do you know of any open source high-performance chat servers available that are using this technique? Could prove very insightful and make a good blog entry :)
Can't disclose the chat server as it's client confidential, but my point still holds, as my chat test client *was* a server; i.e. it was sending out requests to the chat server, listening for the responses, and then performing one or more actions involving sending different traffic.
Apart from the fact that the RFC is a mess, I thought at the time that a nice C# implementation of a chat server would be a good idea - Oren's next project? ;-)
Only if someone would pay for it
How about 999 internet kudos points instead :)
Seriously though, high-perf, scalable comet / web socket servers are all the rage now, that I don't think you would have too hard a time commercializing it. Apply a pluggable API for custom application events/notifications and you've got yourself a product! Make it horizontally scale and you got even more products!
We've developed a custom C# Windows service at work (to handle all of mflow's notifications), but it uses long-polling so it's pretty inefficient; the next implementation will be node.js + web sockets ftw!
@Paul, btw in case you didn't know, your blog is down :(
Maybe I haven't looked at the right part of the source code, and I'm just missing it, but it would be really nice if the client library exposed Begin/End methods that forward to the Begin/End methods on WebRequest. I know that these methods are a pain to use from C#, but they could be trivially transformed into async workflows in F# for some elegant code that doesn't tie up any ThreadPool threads while waiting for a response from the DB server.
What Mongo did is to create a shared database, which is probably going to hurt them, when the mongos server dies.
It is simpler on the face of it, because you don't have to deal with a lot of things, but it also means that you have introduced a single point of failure.
OSS is AGPL, Commercial is a pretty standard commercial license.
I am not quite sure what you are actually asking here.
Please join the mailing list and give me some more information about what you need.
So your main license is AGPL - does that mean everything built with it has to be open source? You can't even build a utility or tool without releasing the source? Or is it more like LGPL where things can be closed source, but only if they just link to the library. I am not familiar with AGPL.
Are you planning to release a commercial version? Under what terms? It seems to me that most open source projects that try a commercial version don't really do that well. I mean some do, but MySql probably has 1000 users who use it commercially illegally for every one paying user. Maybe that is ok.
I like the concepts you have here. Definitely love the .Net 4 and task library concepts. These are things that will set you apart.
If you distribute your code, including if you put it on a server, you need to make it OSS.
Yes, there will be a commercial version, under pretty standard commercial license.
We will announce it all on the 18th
@Jason - it's a misconception that Mysql has to be paid for to use it commercially
kudos for System.Transactions support
@Rob - ISTM you have to pay for it if you install it on a Windows Server
First, thanks for answering this question properly and in so much detail.
I am keen to say it looks good on paper but a few features are not yet ready. Still I believe they will. I know you are a devoted developer.
I still have 2 concerns about the project. They affect developers like me but probably not most of the others.
Is the project going to be cross platform? I do like C# and .Net, but I develop mostly on Mac and target platforms from Linux to Windows. So, I switched to Mono for .Net, which opens cross platform opportunities.
I know a bit about Tokyo, Redis, Couch but I settled for MongoDB. To be fair, I use it with Python, not with .Net (not even IronPython). Still, this project, like many others, has been around for a while and is field tested, with big community support and several committers, sometimes with big names using it. Often they originated from one of those big .com. My concern here is about real life usage. RavenDB is brand new and doesn't really have this background.
Still I understand there is a lack of a .Net friendly option, but I believe you could have contributed an elegant solution to reduce the amount of friction .Net developers have with the others. There is always a possibility to build on top of the low-level APIs these stores provide.
That said, I wish I could work on such a project. There are a lot of great problems to solve here and hours of fun.
Looks great Oren ... started testing it yesterday. Got a few questions:
Is it possible to 'disable' the help pages in the server? (Removed from the HTTP UI + prevent them being stored inside the database itself.)
Why is there already a ~100 Mb hit taken by the server after 1st initialization? (This might have an impact for the embedded scenario.)
I got a crash when using a TransactionScope around a session ... will send u small test case about it.
On top of what was already said, it looks amazingly cheap to do Domain-Driven oriented prototyping using RavenDB. Also, can't wait to test it paired with NServiceBus.
Re: Mono, right now, it won't run there, and running there is a low priority item.
Re: why I wrote RavenDB, see the post :-)
Yes, it is possible, just not exposed by default, I'll add that.
@Paul why would that be? I'm not an expert, but MySql is just a plain ordinary GPL license, nothing in there specifically for OS usage etc :)
I am interested in how Raven DB handles failure, and adding/removing new nodes to scale?
That depends on the mode that you use.
If you use sharding, it will split data among the shards, and it can handle adding nodes dynamically.
Removing nodes (or node failures) requires replication to be able to recover that, which is something that Raven does.
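As a rough picture of what "splitting data among the shards" can look like, here is a minimal hash-based placement sketch in Python. The thread does not describe Raven's actual sharding strategy, so this is only one common approach, shown under that assumption; the shard names and the `shard_for` and `distribute` helpers are hypothetical.

```python
import hashlib

def shard_for(doc_id, shards):
    # Use a stable hash so every client picks the same shard for the same id.
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def distribute(doc_ids, shards):
    # Group document ids by the shard that would store them.
    placement = {shard: [] for shard in shards}
    for doc_id in doc_ids:
        placement[shard_for(doc_id, shards)].append(doc_id)
    return placement

# Hypothetical shard names and document ids.
shards = ["shard-a", "shard-b", "shard-c"]
doc_ids = ["users/%d" % i for i in range(100)]
placement = distribute(doc_ids, shards)
```

Note that with plain modulo placement like this, changing the number of shards moves most documents to a different shard, which is why handling nodes being added dynamically usually needs a smarter scheme, and why recovering from a lost node needs the data to be replicated somewhere else.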