RavenDB 3.0 Status Update

time to read 7 min | 1260 words

We are gearing up for the RavenDB Conference in April and we just released a private alpha preview sneak peek to a few external people. But we have been working on RavenDB 3.0 for the past 18 months or so, some bits in it are actually dated from 2011(!) that we only now are able to actually put into production. That is a lot of work that is going on and it is easy to actually get lost in what is going on there.

So, without further ado, here are the major highlights of RavenDB 3.0…

Indexing

Yes, we did more work on improving indexing performance. But that is actually secondary. What we really focused on this release are operational indexing concerns.

What does this means? Well, to start with, internally we don’t use the index name any longer. That means that we can do silly things like make an index delete async. Users that have large indexes, especially map/reduce or indexes with LoadDocument calls found that deleting an index can take a very long while. Now this is no longer the case, we are now able to immediately delete the index, and actually do the cleanup in the background.

For that matter, another operational concern people has is the introduction of new indexes. Especially if the index in question covers just small part of a very large database, that used to take a very long time. The index would have to go through all the documents in the database to complete indexing. Now, we are able to make several optimizations that means that we can take just the relevant documents, and complete indexing much more quickly in this scenario.

Introducing a new index would split it entirely from the other indexes while it is running, so you won’t have to deal with a new index slowing down other indexes. And neither will big deletes impact indexing performance in this manner any longer, we have much better interleaving of that now.

Still on an operational bent, we now have much better reporting on what is actually being indexed. You can see what the indexes are doing, and act accordingly.

From a development point of view, we have added some nice things. The ability to index an attachment, so you can index big text without it residing in documents. We have also added some better situational awareness in the indexing code. We have some people who were doing… funny things there. Indexes that were producing hundreds of index entries per every document they indexed. Then we had to deal with the associated performance problems. We now can detect and warn about that, and we let the user specific the valid limits on a per index case.

There are other stuff in indexing, but I want to go over the rest of what we did for 3.0…

RavenFS

RavenFS was created because RavenDB’s attachments are nice, but they aren’t nearly good enough. We have a lot of users that want to use attachments, then they find that they can’t see them, search on them in meaningful ways, etc. More importantly, they are limited because we intended them to be of relatively small size. Users really asked us for something better, and RavenFS is the answer.

We are talking about a replicated file system, which supports very large files as well as all the facilities to work and manage them. It was explicitly designed to  be geo-distributed and it can drastically decrease the network load on systems that need to share very large files that change frequently.

I’m going to talk about RavenFS quite a bit in my keynote in the RavenDB Conference, but I think that it is really cool and there are some very nice use cases for it. Just for fun, it has been used in production by several customers for the past 2 years, so we already have some great experience with that.

JVM Client API

A fully functional JVM client opens us for more fronts with regards to who can make use of RavenDB. We already have people building applications using that. And we intend to have more clients for additional platforms after 3.0 is shipped.

Internal changes

There have been a lot of that, actually. But the one of the most important ones is that we are now hosting RavenDB on top of OWIN and Web API. The change, from our own HTTP server, was done in order to help our users have a better foundation to understand how RavenDB works internally, and to encourage contributions. It also allow us to do some nice things, like have an end point that documents all the endpoints in a RavenDB server.

Another important change is that we are moving away from the Silverlight studio in favor of a brand new HTML5 studio. That is quite exciting, especially because the performance and responsiveness of the system as a whole become so much better. And it doesn’t hurt that we don’t have to deal with the complexities of Silverlight.

Voron

This one probably got the most attention, but hopefully it will be least noticed once you actually get the bits. Voron is our new storage engine. In fact, it is more than that, it is a new way for us to store data, which we use in RavenDB to store our transactional data. The reason it is more than that is that it isn’t limited to what RavenDB currently needs to do. It can do much more, and we are already in the process of doing quite a lot more with it than one might suspect from outside. I’ll leave that for later, and just talk about what we have already done.

RavenDB now have the option of running on Voron. In fact, we have tested it with the entire RavenDB test suite (over 3,000 tests) and it passes with flying colors.

Voron will allow us to run on Linux, at some point, but it is a lot more important that it allows us to very carefully tune our storage usage and get a much better appreciation for how we are actually doing things. We expect to be able to do some really nice things with it, and it has already shown itself to be competitive with regards to performance against Esent.

Operations

There is always that, isn’t it. And there is a reason why is is last, but never least, in this list. (Try to say that repeatedly, fast Smile).

We kicked off performance counters, which caused no end of operational headaches (corrupted counters, permission issues, hanging, etc) in favor of an internal metrics library. Because it is internal, we are able to add a lot more metrics and a lot more meaningful metrics to the system.

We have new endpoints that expose even more internal states. We improved periodic backup support so it would be much nicer to work with (we now allow to define: full export every week, periodic export every day). There are quite a few goodies there available for the ops people to get insight into what is going on.

And… those are the highlights, and the code aspect of things.

We have  a team of about a dozen people working on RavenDB at this time, and we keep growing. This is quite exciting, and I’m really looking forward to getting to meet our users in the conference…

Here is a hint, there are going to be surprises…