Ayende @ Rahien

RavenDB 3.0 Status Update

We are gearing up for the RavenDB Conference in April, and we just released a private alpha preview to a few external people. We have been working on RavenDB 3.0 for the past 18 months or so, and some bits in it actually date from 2011(!) and are only now making it into production. That is a lot of work, and it is easy to get lost in what is going on there.

So, without further ado, here are the major highlights of RavenDB 3.0…

Indexing

Yes, we did more work on improving indexing performance. But that is actually secondary. What we really focused on in this release is operational indexing concerns.

What does this mean? Well, to start with, internally we don’t use the index name any longer. That means that we can do silly things like make index deletion async. Users with large indexes, especially map/reduce indexes or indexes with LoadDocument calls, found that deleting an index could take a very long while. This is no longer the case: we are now able to delete the index immediately and do the actual cleanup in the background.
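To make the idea concrete, here is a minimal Python sketch of a delete that returns immediately and cleans up later. It is not RavenDB's code: the `IndexRegistry` class, its methods, and the in-memory dictionaries are hypothetical names invented for illustration. The only point it shows is that once index data is keyed by an internal id rather than by name, a delete just has to drop the name mapping, and the expensive part can run on a background thread.

```python
import itertools
import queue
import threading


class IndexRegistry:
    """Hypothetical sketch: index data is keyed by an internal id, not by name."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._name_to_id = {}          # what users see
        self._storage = {}             # internal id -> index data (stand-in for disk)
        self._pending = queue.Queue()  # ids waiting for background cleanup
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._cleanup_loop, daemon=True)
        worker.start()

    def create(self, name):
        with self._lock:
            if name in self._name_to_id:
                raise ValueError(f"index {name!r} already exists")
            index_id = next(self._next_id)
            self._name_to_id[name] = index_id
            self._storage[index_id] = []
            return index_id

    def delete(self, name):
        # Fast part: the name disappears immediately and can be reused.
        with self._lock:
            index_id = self._name_to_id.pop(name)
        # Slow part: the actual data removal is deferred to the worker.
        self._pending.put(index_id)

    def exists(self, name):
        with self._lock:
            return name in self._name_to_id

    def wait_for_cleanup(self):
        # Blocks until every queued cleanup has been processed.
        self._pending.join()

    def _cleanup_loop(self):
        while True:
            index_id = self._pending.get()
            with self._lock:
                self._storage.pop(index_id, None)
            self._pending.task_done()


registry = IndexRegistry()
first_id = registry.create("Orders/ByCustomer")
registry.delete("Orders/ByCustomer")
# The same name can be reused right away because it maps to a fresh id.
second_id = registry.create("Orders/ByCustomer")
registry.wait_for_cleanup()
```

Because the old data lives under `first_id` and the recreated index under `second_id`, the background cleanup can never touch the new index, even though both share a name.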

For that matter, another operational concern people have is the introduction of new indexes. If the index in question covers just a small part of a very large database, that used to take a very long time, because the index had to go through all the documents in the database to complete indexing. Now we are able to make several optimizations that mean we can take just the relevant documents and complete indexing much more quickly in this scenario.

Introducing a new index now splits it entirely from the other indexes while it is running, so you won’t have to deal with a new index slowing down other indexes. Big deletes will no longer impact indexing performance in this manner either; we have much better interleaving of that now.

Still on an operational bent, we now have much better reporting on what is actually being indexed. You can see what the indexes are doing, and act accordingly.

From a development point of view, we have added some nice things. There is the ability to index an attachment, so you can index big text without it residing in documents. We have also added some better situational awareness in the indexing code. We had some people who were doing… funny things there, such as indexes that produced hundreds of index entries for every document they indexed. Then we had to deal with the associated performance problems. We can now detect and warn about that, and we let the user specify the valid limits on a per-index basis.
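As a rough illustration of that kind of check, here is a small Python sketch. It is an assumption-laden toy, not how RavenDB does it: `index_document`, `tags_map`, and the default limit of 15 are all made up for this example. It runs a map function over a single document, counts the index entries produced, and warns when the count exceeds a limit that each index can override.

```python
import warnings

DEFAULT_MAX_ENTRIES_PER_DOCUMENT = 15  # assumed value for the sketch, not RavenDB's


def index_document(map_fn, document, max_entries=DEFAULT_MAX_ENTRIES_PER_DOCUMENT):
    """Run a map function over one document and warn if it fans out too much."""
    entries = list(map_fn(document))
    if len(entries) > max_entries:
        warnings.warn(
            f"document {document.get('id')!r} produced {len(entries)} index entries "
            f"(limit is {max_entries})"
        )
    return entries


def tags_map(doc):
    # One index entry per tag: fine for a few tags, a problem for hundreds.
    for tag in doc["tags"]:
        yield {"tag": tag, "doc_id": doc["id"]}


small = {"id": "posts/1", "tags": ["ravendb", "indexing"]}
huge = {"id": "posts/2", "tags": [f"tag-{i}" for i in range(300)]}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small_entries = index_document(tags_map, small)
    huge_entries = index_document(tags_map, huge)
    # The limit is per index, so an index that legitimately fans out can raise it.
    relaxed_entries = index_document(tags_map, huge, max_entries=500)

warning_messages = [str(w.message) for w in caught]
```

Only the second call triggers a warning; the third call indexes the same document but passes a higher limit, which is the "per index" knob in miniature.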

There is other stuff in indexing, but I want to go over the rest of what we did for 3.0…

RavenFS

RavenFS was created because RavenDB’s attachments are nice, but they aren’t nearly good enough. We have a lot of users that want to use attachments, then they find that they can’t see them, search on them in meaningful ways, etc. More importantly, they are limited because we intended them to be of relatively small size. Users really asked us for something better, and RavenFS is the answer.

We are talking about a replicated file system that supports very large files, along with all the facilities needed to work with and manage them. It was explicitly designed to be geo-distributed, and it can drastically decrease the network load on systems that need to share very large files that change frequently.

I’m going to talk about RavenFS quite a bit in my keynote at the RavenDB Conference, but I think that it is really cool and there are some very nice use cases for it. It has also been used in production by several customers for the past 2 years, so we already have some great experience with it.

JVM Client API

A fully functional JVM client opens up more fronts with regard to who can make use of RavenDB. We already have people building applications using it, and we intend to have more clients for additional platforms after 3.0 ships.

Internal changes

There have been a lot of those, actually. But one of the most important is that we are now hosting RavenDB on top of OWIN and Web API. The change, from our own HTTP server, was done in order to give our users a better foundation for understanding how RavenDB works internally, and to encourage contributions. It also allows us to do some nice things, like have an endpoint that documents all the endpoints in a RavenDB server.

Another important change is that we are moving away from the Silverlight studio in favor of a brand new HTML5 studio. That is quite exciting, especially because the performance and responsiveness of the system as a whole becomes so much better. And it doesn’t hurt that we don’t have to deal with the complexities of Silverlight.

Voron

This one probably got the most attention, but hopefully it will be least noticed once you actually get the bits. Voron is our new storage engine. In fact, it is more than that, it is a new way for us to store data, which we use in RavenDB to store our transactional data. The reason it is more than that is that it isn’t limited to what RavenDB currently needs to do. It can do much more, and we are already in the process of doing quite a lot more with it than one might suspect from outside. I’ll leave that for later, and just talk about what we have already done.

RavenDB now has the option of running on Voron. In fact, we have tested it with the entire RavenDB test suite (over 3,000 tests) and it passes with flying colors.

Voron will allow us to run on Linux, at some point, but it is a lot more important that it allows us to very carefully tune our storage usage and get a much better appreciation for how we are actually doing things. We expect to be able to do some really nice things with it, and it has already shown itself to be competitive with regards to performance against Esent.

Operations

There is always that, isn’t there? And there is a reason why it is last, but never least, in this list. (Try to say that repeatedly, fast.)

We kicked out performance counters, which caused no end of operational headaches (corrupted counters, permission issues, hanging, etc.), in favor of an internal metrics library. Because it is internal, we are able to add a lot more metrics, and a lot more meaningful metrics, to the system.

We have new endpoints that expose even more internal state. We improved periodic backup support so it is much nicer to work with (you can now define: full export every week, periodic export every day). There are quite a few goodies available there for the ops people to get insight into what is going on.
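For readers who want to picture the "full export every week, periodic export every day" setup, here is a tiny Python sketch of the scheduling decision. It is only an illustration under my own assumptions: `next_export`, the two interval constants, and the simulated dates are hypothetical and do not reflect RavenDB's configuration format or implementation.

```python
from datetime import datetime, timedelta

FULL_INTERVAL = timedelta(weeks=1)     # "full export every week"
PERIODIC_INTERVAL = timedelta(days=1)  # "periodic export every day"


def next_export(now, last_full, last_periodic):
    """Return "full", "periodic", or None depending on what is due at `now`."""
    if last_full is None or now - last_full >= FULL_INTERVAL:
        return "full"
    # A full export also resets the clock for the periodic one.
    last_any = max(last_full, last_periodic or last_full)
    if now - last_any >= PERIODIC_INTERVAL:
        return "periodic"
    return None


# Simulate nine consecutive days, checking once per day.
start = datetime(2014, 3, 1)
schedule = []
last_full = last_periodic = None
for day in range(9):
    now = start + timedelta(days=day)
    kind = next_export(now, last_full, last_periodic)
    if kind == "full":
        last_full = now
    elif kind == "periodic":
        last_periodic = now
    schedule.append(kind)
```

The simulated run starts with a full export, follows it with six daily periodic exports, and then starts the cycle again with another full export on the eighth day.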

And… those are the highlights, and the code aspect of things.

We have a team of about a dozen people working on RavenDB at this time, and we keep growing. This is quite exciting, and I’m really looking forward to meeting our users at the conference…

Here is a hint, there are going to be surprises…

Posted By: Ayende Rahien

Comments

02/26/2014 12:26 PM by peter

Is there a reason the conference page prevents text selection? I wanted to google Durham, NC and found that I couldn't select it.

02/26/2014 12:59 PM by Khalid Abuhakmeh

Wow, that is a lot of stuff (stating the obvious). The indexing changes sound amazing, especially about the part where you know what an index is doing. Also I am surprised that indexing in RavenDB 3.0 will be intelligent enough to select particular documents out of storage rather than cycle through all documents. That should be a huge plus.

I'm curious to see how you index an attachment, probably for another blog post. Great work and congratulations to you and the RavenDB team and community.

02/26/2014 01:01 PM by Ayende Rahien

Khalid, take a look at LoadAttachment, IIRC.

02/26/2014 01:05 PM by Ayende Rahien

peter, No, there isn't. We'll fix that.

02/26/2014 03:00 PM by J Healy

"operational indexing concerns" HUGE!!! Thanks for that - it will go far in tilting the balance in favor of using RavenDB on projects going forward...

02/26/2014 06:47 PM by Khalid Abuhakmeh

I found this, and it looks pretty straightforward to index an attachment: https://github.com/ayende/ravendb/pull/439/files

Dumb question based on the unit test, what if you decide to do something stupid like index an image attachment. What does that do to the Lucene index? What do the terms end up looking like?

02/26/2014 09:08 PM by Nick Champion

Following up on Khalid's comment about indexing being intelligent enough to pick the correct docs from storage without cycling through all docs: does this feature rely on using Voron as the storage engine, or will it also work with Esent?

Sounds like some great features to look forward to!

02/27/2014 06:43 AM by Ayende Rahien

Khalid, we read the attachment as text. If you try to pass an image in, you'll get the same result as opening a PNG in Notepad.

02/27/2014 06:43 AM by Ayende Rahien

Nick, that feature is not dependent on the storage engine used. It will work with both Esent and Voron.
