Ayende @ Rahien

It's a girl

Awesome RavenDB feature of the Day, Compression

RavenDB stores JSON documents. Internally on disk, we actually store the values in BSON format. This works great, but there are occasions where users are storing large documents in RavenDB.

In those cases, we have found that compressing those documents can drastically reduce the on-disk size of the documents.

Before we go on, we have to explain what this is for. It isn’t actually disk space that we are trying to save, although that is a nice benefit. What we are actually trying to do is reduce the IO cost we incur when loading / saving documents. By compressing the documents before they hit the disk, we can save valuable IO time (at the expense of using relatively bountiful CPU time). Reducing the amount of IO we use has a nice impact on performance, and it means that we can put more documents in our page cache without running out of room.

And yes, it does reduce the total disk size, but the major thing is the IO cost.

Note that we only support compression for documents, not for indexes. The reason for that is quite simple: for indexes, we are doing a lot of random reads, whereas with documents, we almost always read or write the full document.

Because of that, we would have needed to break the index apart into manageable chunks (and thus allow random reads), but that would pretty much ensure a poor compression ratio. We ran some tests, and it just wasn’t worth the effort.
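If you want to see this effect for yourself, here is a quick sketch (in Python, just for illustration; this is obviously not RavenDB’s storage code) that compresses a repetitive JSON document as a whole versus in small, randomly-readable chunks:

```python
import gzip
import json

# a repetitive JSON document, like many real-world large documents
doc = json.dumps(
    {"lines": [{"product": "widget", "qty": i} for i in range(500)]}
).encode("utf-8")

# compress the whole document in one go
whole = gzip.compress(doc)

# compress it as small independent chunks that would allow random reads
chunk_size = 256
chunks = [gzip.compress(doc[i:i + chunk_size])
          for i in range(0, len(doc), chunk_size)]

# whole-document compression exploits repetition across the entire value,
# so it beats the sum of the independently compressed chunks
print(len(doc), len(whole), sum(len(c) for c in chunks))
```

The per-chunk overhead (headers plus a cold dictionary for every chunk) is exactly why chunking the index wasn’t worth it.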

A final thought: this feature is going to be available for RavenDB Enterprise only.

I am not showing any code because the only thing you need to do to get it to work is use:

<add key="Raven/ActiveBundles" value="Compression"/>

And everything works, just a little bit smaller on disk.


Published at

Originally posted at

Comments (6)

RavenDB 1.0 & Newtonsoft.Json 4.5.7

A common complaint that we hear about RavenDB 1.0 is that it depends on Newtonsoft.Json 4.0.8, while many libraries already use 4.5.7. We already resolved the problem once and for all in the RavenDB 1.2 branch, but that is still a few months from going live.

Therefore, we created a new NuGet package: http://nuget.org/packages/RavenDB.Client/1.0.971

This NuGet package is exactly the same as 960, except that we compiled it against Newtonsoft.Json 4.5.7. Note that this is only supported in client mode; if you want to run RavenDB Server or RavenDB Embedded, the 1.0 version still requires Newtonsoft.Json 4.0.8.

The main idea is that you can run against RavenDB Server using Newtonsoft.Json 4.5.7 on the client side, which is the most common scenario for RavenDB.



The RavenDB Tour in the States

I mentioned a few times that we have been combing Europe recently, teaching RavenDB in a lot of courses.

The time is approaching for the same thing to happen in the States as well.

In this tour, we are also going to dedicate some time to discussing some of the really awesome features coming in the next release of RavenDB. I talked about some of them on the blog, and more are upcoming. Come see us and learn all about it.



RavenDB Changes API on the wire

I promised that I’d talk about the actual implementation details of how RavenDB deals with changes, after moving from SignalR to our own implementation.

First, let us examine the problem space. We need to be able to get notified by the server whenever something interesting happened. We don’t want to do active polling.

That leaves the following options:

  • TCP Connection
  • WebSockets
  • Long Polling
  • Streamed download

TCP connections won’t work here. We are relying on HTTP for all things, and I like HTTP. It is easy to work with, there are great tools (thanks, Fiddler!) around for it, and you can debug/test/scale it without major hurdles. Writing your own TCP socket server is a lot of fun, but debugging why something went wrong is not.

WebSockets would have been a great option, but they aren’t widely available yet, and won’t work well without special servers, which I currently don’t have.

Long Polling is an option, but I don’t like it. It seems like a waste and I think we can do better.

Finally, we have the notion of a streamed download. This is basically the client downloading from the server, but instead of having the entire request download in one go, the server will send events whenever it has something.

Given our needs, this is the solution that we chose in the end.

How it works is a tiny bit complex, so let us see if I can explain with a picture. This is the Fiddler trace that you see when running a simple subscription test:


The very first thing that happens is that we make a request to /changes/events?id=CONNECTION_ID. The server is going to keep this connection open, and whenever it has something new to send to the client, it will use this connection. In order to get this to work, you have to make sure to turn off buffering in IIS (HttpListener doesn’t do buffering), and when running in Silverlight, you have to disable read buffering. Once that is done, on the client side you need to read from the server in an async manner and raise events whenever you get a full response back.

For our purposes, we used new lines as the event delimiter, so we read from the stream until we get a new line, raise that event, and move on.
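In other words, the client side framing boils down to “buffer bytes until you see a new line, then raise an event”. A minimal sketch of that loop (in Python, for illustration only) looks like this:

```python
import io

def read_events(stream, on_event):
    """Read newline-delimited events from a streamed HTTP body."""
    buffer = b""
    while True:
        chunk = stream.read(1)  # in practice you would read larger chunks
        if not chunk:
            return  # connection closed
        buffer += chunk
        # raise an event for every complete line we have buffered
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            on_event(line.decode("utf-8").rstrip("\r"))

events = []
read_events(
    io.BytesIO(b'{"Type":"Put","Id":"users/1"}\n{"Type":"Delete","Id":"users/2"}\n'),
    events.append)
print(events)  # two complete events, raised in order
```

A partial line simply stays in the buffer until the rest of it arrives, which is exactly what the real client has to do with partial responses.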

Now, HTTP connections are only good for one request/response. So we actually have a problem here, how do we configure this connection?

We use a separate request for that. Did you notice the “1/J0JP5” connection id? This is generated on the client (part always-incrementing number, part random) for each connection. The first part is a sequential id that is there strictly to help us debug things: “1st request, 2nd request” is a lot easier to follow than J0JP5 or some guid.
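Generating such an id is trivial; here is a sketch (in Python, and the exact format is my guess for illustration, not necessarily what RavenDB emits):

```python
import itertools
import random
import string

counter = itertools.count(1)

def next_connection_id():
    # sequential part for easy debugging ("1st request, 2nd request"),
    # random part to keep ids unique across client restarts
    sequence = next(counter)
    noise = "".join(random.choice(string.ascii_uppercase + string.digits)
                    for _ in range(5))
    return f"{sequence}/{noise}"

print(next_connection_id())  # e.g. "1/J0JP5"
```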

We can then issue commands for this connection, in the sample above you can see those commands for watching a particular document and finally stopping altogether.

This is what the events connection looks like:


Each change will be a separate line.

Now, this isn’t everything, of course. We still have to deal with errors and network hiccups. We do that by aborting the events connection and retrying. On the server, we keep track of connections and of pending messages for each connection, and if you reconnect within the timeout limit (a minute or so), you won’t miss any changes.
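That server side bookkeeping can be sketched (again in Python, purely as an illustration of the idea, not RavenDB’s actual code) as a per-connection buffer of pending messages with a timeout:

```python
import time

RECONNECT_TIMEOUT = 60  # seconds a disconnected client may miss (assumed value)

class PendingMessages:
    def __init__(self):
        self.queues = {}  # connection id -> (last_seen, [messages])

    def publish(self, connection_id, message):
        # buffer the message even if the client is currently disconnected
        last_seen, messages = self.queues.setdefault(
            connection_id, (time.time(), []))
        messages.append(message)

    def reconnect(self, connection_id):
        """Return missed messages if the client came back within the timeout."""
        last_seen, messages = self.queues.get(connection_id, (0, []))
        if time.time() - last_seen > RECONNECT_TIMEOUT:
            self.queues.pop(connection_id, None)
            return []  # too late, the buffer was dropped
        self.queues[connection_id] = (time.time(), [])
        return messages

bus = PendingMessages()
bus.publish("1/J0JP5", '{"Type":"Put","Id":"users/3"}')
print(bus.reconnect("1/J0JP5"))  # the missed change is redelivered
```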

If this sounds like the way SignalR works, that is no accident. I think that SignalR is awesome software, and I copied much of the design ideas off of it.



Awesome RavenDB Feature of the day: Eval Patching, Part II–Denormalized References

I mentioned yesterday that I am keeping the best for today. What I am going to show you is how you can use Eval Patching for keeping track of denormalized references.

In this case, we have Users & Posts. Each Post contains a UserName property as well as the user id. When the user changes his name, we need to update all of the relevant posts.

Here is how you can do this:

    new IndexQuery { Query = "UserId:" + userId },
    new AdvancedPatchRequest
    {
        Script = @"
var user = LoadDocument(this.UserId);
this.UserName = user.Name;
"
    }

And this is a really simple scenario. The ability to load a separate document and modify the current document based on its value opens up some really powerful options.



Silverlight streaming - the race condition is already included

I am not sure what to call this issue, except maddening. For a simple repro, you can check this GitHub repository.

The story is quite simple, let us assume that you need to send a set of values from the server to the client. For example, they might be tick values, or updates, or anything of this sort.

You can do this by keeping an HTTP connection open and sending data periodically. This is a well known technique and it works quite well. Except in Silverlight, where it works, but only if you put the appropriate Thread.Sleep() in crucial places.

Here is an example of the server behavior:

var listener = new HttpListener { Prefixes = { "http://+:8080/" } };
listener.Start();
while (true)
{
    var ctx = listener.GetContext();
    using (var writer = new StreamWriter(ctx.Response.OutputStream))
    {
        writer.WriteLine("hello");
        writer.Flush(); // flush now; the connection stays open for more
    }
}

In this case, note that we are explicitly flushing the response, then just wait. If you look at the actual network traffic, you can see that this will actually be sent, the connection will remain open, and we can actually send additional data as well.

But how do you consume such a thing in Silverlight?

var webRequest = (HttpWebRequest)WebRequestCreator.ClientHttp.Create(new Uri("http://localhost:8080/"));
webRequest.AllowReadStreamBuffering = false;
webRequest.Method = "GET";

Task.Factory.FromAsync<WebResponse>(webRequest.BeginGetResponse, webRequest.EndGetResponse, null)
    .ContinueWith(task =>
    {
        var responseStream = task.Result.GetResponseStream();
        ReadAsync(responseStream);
    });


We start by making sure that we disable read buffering, then we get the response and start reading from it. The read method is a bit complex, because it has to deal with partial responses, but it should still be fairly obvious what is going on:

byte[] buffer = new byte[128];
private int posInBuffer;

private void ReadAsync(Stream responseStream)
{
    Task.Factory.FromAsync<int>(
        (callback, o) => responseStream.BeginRead(buffer, posInBuffer, buffer.Length - posInBuffer, callback, o),
        responseStream.EndRead, null)
        .ContinueWith(task =>
        {
            var read = task.Result;
            if (read == 0)
                throw new EndOfStreamException();

            // find \r\n in newly read range
            var startPos = 0;
            byte prev = 0;
            bool foundLines = false;
            for (int i = posInBuffer; i < posInBuffer + read; i++)
            {
                if (prev == '\r' && buffer[i] == '\n')
                {
                    foundLines = true;
                    // yeah, we found a line, let us give it to the users
                    var data = Encoding.UTF8.GetString(buffer, startPos, i - 1 - startPos);
                    startPos = i + 1;
                    Dispatcher.BeginInvoke(() =>
                    {
                        ServerResults.Text += data + Environment.NewLine;
                    });
                }
                prev = buffer[i];
            }
            posInBuffer += read;
            if (startPos >= posInBuffer) // read to end
            {
                posInBuffer = 0;
                return;
            }
            if (foundLines == false)
                return;

            // move remaining to the start of buffer, then reset
            Array.Copy(buffer, startPos, buffer, 0, posInBuffer - startPos);
            posInBuffer -= startPos;
        })
        .ContinueWith(task =>
        {
            if (task.IsFaulted)
                return; // the connection died, stop reading
            ReadAsync(responseStream); // keep reading the next event
        });
}

While I am sure that you could find bugs in this code, that isn’t the crucial point.

If we run the server, then run the SL client, we can see that we get just one lousy byte, and that is it. Reading about this, it appears that in some versions of some browsers, you need to send 4KB of data to get the connection going. But that isn’t what I observed. I tried sending 4KB+ of data, and I still saw the exact same behavior: we got called for the first byte, and nothing else.

Eventually, I boiled it down to the following non working example:


Versus this working example:


Yes, you got it right: if I put the Thread.Sleep in the server, I get both values in the client. Without the Thread.Sleep, we get only the first byte. It seems like it isn’t an issue of size, but rather of time, and I am at an utter loss to explain what is going on.

Oh, and I am currently awake for 27 hours straight, most of them trying to figure out what the )(&#@!)(DASFPOJDA(FYQ@YREQPOIJFDQ#@R(AHFDS:OKJASPIFHDAPSYUDQ)(RE is going on.



Awesome RavenDB Feature of the day: Evil Patching

Oh, wait, that is actually Eval Patching.

From the very start, RavenDB supported the ability to patch documents. To send a command to the server with some instructions about how to modify a document or a set of documents. For example, we have this:

                        new PatchRequest
                        {
                                Type = PatchCommandType.Add,
                                Name = "Comments",
                                Value = RavenJObject.FromObject(comment)
                        }

This approach works, is easy to understand and support, and is quite simple to implement.

Unfortunately, it is limited. Users have all sorts of very complex scenarios that they want to run that this approach isn’t really suitable for. For example, if a user wanted to move from FirstName and LastName properties to a FullName property, this won’t give that to you.

Enter Matt Warren, who has contributed some really nice features to RavenDB (like facets), and who contributed the ability to do patching by sending a JavaScript function to the server.

Here is how it works using the new syntax:

    new AdvancedPatchRequest
    {
        Script = "this.Comments.push(newComment)",
        Values = {{"newComment", comment}}
    }

Note that you can send variables to the server and they are exposed to your script.

How about our previous example of moving from FirstName, LastName to FullName? Let us see:

 new IndexQuery { Query = "Tag:Users" },
 new AdvancedPatchRequest
 {
        Script = @"
this.FullName = this.FirstName + ' ' + this.LastName;
delete this.FirstName;
delete this.LastName;
"
 }

So we support full computation abilities during the patch. You can now modify things pretty much as you feel like.

Here are a few other interesting things you can do.

Remove an item by value from an array:

    new AdvancedPatchRequest
    {
        Script = "this.Tags.Remove(tagToRemove)",
        Values = {{"tagToRemove", "Interesting"}}
    }

Remove an item using a condition:

    new AdvancedPatchRequest
    {
        Script = "this.Comments.RemoveWhere(function(comment) { return comment.Spam; });"
    }

This isn’t all, mind, but I’ll keep the really cool part for my next post.



It uses async, run for the hills (On .Net 4.0)

One of the major problems with the async operation support in .NET 4.0 is the fact that an unobserved exception will ruthlessly kill your application.

Let us look at an example:


On startup, check the server for any updates, without slowing down my system startup time. All well and good, as long as that server is reachable.

When it isn’t, an exception will be thrown, but not on the current thread; it will be thrown on another thread, and when the task is finalized, it will raise an UnobservedTaskException. Okay, so I’ll fix that and write code like this:

CheckForUpdatesAsync().ContinueWith(task=> GC.KeepAlive(task.Exception));

And that would almost work, except the implementation of CheckForUpdatesAsync is:

private static Task CheckForUpdatesAsync()
{
    var webRequest = WebRequest.Create("http://myserver.com/update-check");
    webRequest.Method = "POST";
    return webRequest.GetRequestStreamAsync()
        .ContinueWith(task => task.Result.WriteAsync(CurrentVersion)) // this inner task is never observed
        .ContinueWith(task => webRequest.GetResponseAsync())
        .ContinueWith(task => new StreamReader(task.Result.GetResponseStream()).ReadToEnd())
        .ContinueWith(task =>
        {
            if (task.Result != "UpToDate")
            {
                // notify the user about the update (elided in the original post)
            }
        });
}

Note the WriteAsync line, where we are essentially ignoring a failure to write to the server. That task is going to go away unobserved, and when the GC happens, you’ll get an unobserved task exception.

This sort of error has all of the fun aspects of a good problem:

  • Only happens during errors
  • Async in nature
  • Brings down your application
  • Error location and error notification are completely divorced from one another

It is actually worse than having a memory leak!

This post explains some of the changes made with regard to unobserved exceptions in .NET 4.5, and I wholeheartedly support them. But in 4.0, writing code that uses the TPL is easy and fun, yet it requires careful code review to make sure that you aren’t leaking an unobserved exception.
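The same failure mode exists in other async runtimes, which makes for a handy illustration. Here is the moral equivalent of the GC.KeepAlive trick in Python’s asyncio (where an unretrieved task exception is merely logged rather than fatal): reading the exception in a continuation marks it as observed.

```python
import asyncio

async def check_for_updates():
    # pretend the update server is unreachable
    raise ConnectionError("server unreachable")

def observe(task: asyncio.Task) -> None:
    # Reading task.exception() marks the failure as observed, the moral
    # equivalent of ContinueWith(task => GC.KeepAlive(task.Exception)).
    if task.exception() is not None:
        print("update check failed:", task.exception())

async def main():
    task = asyncio.ensure_future(check_for_updates())
    task.add_done_callback(observe)
    await asyncio.sleep(0.01)  # let the check run (and fail) in the background
    return task

task = asyncio.run(main())
```

Without the done-callback, asyncio logs "Task exception was never retrieved" when the task is garbage collected; .NET 4.0 escalates the same situation into a process crash.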



Rant: SignalR, Craziness, Head Butting & Wall Crashing

Before I get to the entire story, a few things:

  • The SignalR team is amazingly helpful.
  • SignalR isn’t released, it is a 0.5 release.
    • Even so, the version that I was using was the very latest, not even the properly released 0.5 version.
  • My use cases are probably far out from what SignalR is set out to support.
  • A lot of the problems were actually my fault.

One of the features for 1.2 is the Changes API, a way to subscribe to notifications from the database, so you won’t have to poll for them. Obviously, this sounded like a good candidate for SignalR, so I set out to integrate SignalR into RavenDB.

Now, that ain’t as simple as it sounds.

  • SignalR relies on Newtonsoft.Json, which RavenDB also used to use. A problem with version compatibility meant that we ended up internalizing this dependency, so we had to resolve this first.
  • RavenDB runs in IIS and as its own (HttpListener based) host. SignalR does the same, but makes assumptions about how it runs.
  • We need to minimize connection counts.
  • We need to support logic & filtering for events on both server side and client side.

The first two problems we solved by brute force. We internalized the SignalR codebase and converted its Newtonsoft.Json usage to RavenDB’s internalized version. Then I modified one of the SignalR hosts to allow us to integrate it with the way RavenDB works.

So far, that was a relatively straightforward process. Then we had to write the integration parts. I posted about the external API yesterday.

My first attempt to write it was something like this:

    public class Notifications : PersistentConnection
    {
        public event EventHandler Disposed = delegate { };
        private HttpServer httpServer;
        private string theConnectionId;

        public void Send(ChangeNotification notification)
        {
            Connection.Send(theConnectionId, notification);
        }

        public override void Initialize(IDependencyResolver resolver)
        {
            httpServer = resolver.Resolve<HttpServer>();
        }

        protected override System.Threading.Tasks.Task OnConnectedAsync(IRequest request, string connectionId)
        {
            theConnectionId = connectionId;
            var db = request.QueryString["database"];
            if (string.IsNullOrEmpty(db))
                throw new ArgumentException("The database query string element is mandatory");

            httpServer.RegisterConnection(db, this);

            return base.OnConnectedAsync(request, connectionId);
        }

        protected override System.Threading.Tasks.Task OnDisconnectAsync(string connectionId)
        {
            Disposed(this, EventArgs.Empty);
            return base.OnDisconnectAsync(connectionId);
        }
    }

This is the very first attempt. I then added the ability to add items of interest via the connection string, but that is the basic idea.

It worked, I was able to write the feature, and aside from some issues that I had grasping things, everything was wonderful. We had passing tests, and I moved on to the next step.

Except that… sometimes… those tests failed. Once every so often, and that indicated a race condition.

It took a while to figure out what was going on, but basically, what happened was that sometimes SignalR uses a long polling transport to send messages. Note the code above: we register for events only as long as we are connected. In a long polling system (and in general with persistent connections that may come & go), it is quite common to have periods of time where you aren’t actually connected.

The race condition would happen because of the following sequence of events:

  • Connected
  • Got message (long polling, causes a disconnect)
  • Disconnect
  • Message raised, client is not connected, message is gone
  • Connected
  • No messages for you

I want to emphasize that this particular issue is all me. I was the one misusing SignalR, and the behavior makes perfect sense.

SignalR actually contains a message bus abstraction for exactly those reasons, so I was supposed to use that. I know that now, but back then I decided that I was probably using the API at the wrong level, and moved to use hubs and groups.

In this way, you could connect to the hub, request to join the group watching a particular document, and voila, we are done. That was the theory, at least. In practice, this was very frustrating. The first major issue was that I just couldn’t get this thing to work.

The relevant code is:

    return temporaryConnection.Start()
                .ContinueWith(task =>
                {
                    hubConnection = temporaryConnection;
                    proxy = hubConnection.CreateProxy("Notifications");
                });

Note that I create the proxy after the connection has been established.

That turned out to be the issue: you have to create the proxy first, then call Start. If you don’t, SignalR will look like it is working fine, but will ignore all hub calls. I had to trace really deep into the SignalR codebase to figure that one out.

My opinion (already communicated to the team) is that starting a hub connection without a proxy is probably an error and should throw.

Once we got that fixed, things started to work, and the tests ran.

Most of the time, that is. Once in a while, the tests would fail. Again, the issue was a race condition. But I wasn’t doing anything wrong; I was using SignalR’s API in a way straight out of the docs. This turned out to be a probable race condition inside InProcessMessageBus, where, because of multiple threads running, registering for a group inside SignalR isn’t visible on the next request.

That was extremely hard to debug.

Next, I decided to do away with hubs. By this time, I had a lot more understanding of the way SignalR worked, and I decided to go back to persistent connections and simply implement the message dispatch in my own code, rather than rely on SignalR groups.

That worked, great. The tests even passed more or less consistently.

The problem was that they also crashed the unit testing process, because of leaked exceptions. Here is one such case, in HubDispatcher.OnReceivedAsync():

 return resultTask
                .ContinueWith(_ => base.OnReceivedAsync(request, connectionId, data));

Note that “_” parameter (this is a convention I use as well, to denote a parameter that I don’t care about). The problem here is that this parameter is a task, and if this task failed, you have a major problem, because on .NET 4.0, this will crash your process. In 4.5, that is fine and can be safely ignored, but RavenDB runs on 4.0.

So I found those places and I fixed them.

And then we ran into hangs. Specifically, we had issues with disposing of connections, and sometimes with not disposing them, and…

That was the point when I cut it.

I like the SignalR model, and most of the codebase is really good. But it is just not in the right shape for what I needed. By this time, I already had a pretty good idea about how SignalR operates, and it took only a few hours of work to get things working without SignalR. RavenDB now sports a streamed endpoint that you can register yourself to, and we have a side channel that you can use to send commands to the server. It might not be as elegant, but it is simpler by a few orders of magnitude, and once we figured that out, we had a full blown working system on our hands. All the tests pass, we have no crashes. Yeah!

I will post exactly how we did it in a future post.



Awesome feature of the day, RavenDB Changes API

This was a really hard feature. I’ll discuss exactly why and how in my next post, but for now, I pretty much want to gloat about this. We now have the ability to subscribe to events from the server on the client.

This opens up some really nice stories for complex apps, but for now, I want to show you what it does:


You can subscribe to multiple documents (or even all documents), and you can also subscribe to changes in indexes as well.

Why is this an awesome feature? It opens up a lot of interesting stories.

For example, let us assume that the user is currently editing an order. You can use this feature to detect, at almost no cost, when someone else has changed that order, saving the user the frustration of trying to save his changes and getting a concurrency exception.

You can also use this to subscribe to a particular index and update in-memory caches on update, so the data is kept in memory and you don’t have to worry about your cache being stale, because you’ll be notified when it changes and can act on that.

You can even use this to watch for documents of a particular type coming in and do something about that. For example, you might set up a subscription for all alerts, and whenever any part of the system writes a new alert, you will show that to the user.
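The cache invalidation pattern above boils down to a tiny observer setup. Here is a sketch (in Python, with a toy notifier standing in for the actual changes subscription; none of this is the RavenDB client API):

```python
class ChangeNotifier:
    """Toy stand-in for a server changes subscription."""
    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def notify(self, doc_id):
        for handler in self.handlers:
            handler(doc_id)

class DocumentCache:
    def __init__(self, notifier):
        self.items = {}
        # evict a cached document as soon as the server reports a change
        notifier.subscribe(self.evict)

    def put(self, doc_id, doc):
        self.items[doc_id] = doc

    def get(self, doc_id):
        return self.items.get(doc_id)

    def evict(self, doc_id):
        self.items.pop(doc_id, None)

notifier = ChangeNotifier()
cache = DocumentCache(notifier)
cache.put("orders/1", {"Total": 10})
notifier.notify("orders/1")   # a server side change arrives
print(cache.get("orders/1"))  # the stale entry is gone
```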

The last one, by the way, is a planned feature for RavenDB Studio itself. As well as a few others that I’ll keep hidden for now.



System vs. User task security: Who pays the sports writer?

Let us assume for a moment that we are building a system for a sports site. We have multiple authors, submitting articles, and we pay each author for those articles.

The data model might look like this:


In this post, I want to talk about the security implications of such a system. Typically, this gets translated to requirements such as:

  • Authors can edit their articles.
  • Authors cannot modify / view any payments.

Which very often gets boiled down to something like this:


What do you think of such a system? My take: this is a horrible mess altogether. Think what it means for something like this:

public ActionResult SubmitArticle(Article article)
{
    if (ModelState.IsValid == false)
        return View();

    // save the article (elided in the original post)

    var payment = GetOrCreatePaymentFor(article.Author);
    // update the payment for this article (elided as well)

    return RedirectToAction("index");
}

In order to run, this code would actually have to run under several different security credentials in order to work successfully.

That is before we take into account how using multiple users for different operations would result in total chaos for small things like connection pooling.

In real world systems, the security can’t really operate based on the physical structure of the data in the data store. It is far too complex to manage. Instead, we implement security by separating system tasks (such as adding a payment for an article) from tasks the system performs on behalf of the user.

The security rules are implemented in the application, and the application users have no physical manifestation (such as being DB users) in the system at all.

And to the commentators: I know some of you are going to claim that physical security at the database level is super critical. While you are doing that, please also answer the problems of connection pooling and the complexity of the multiple security contexts required for most real world business operations.

Implications of design decisions: Read Striping

When using RavenDB replication, you have the option to do something that is called “Read Striping”. Instead of using replication only for High Availability and Disaster Recovery, we can also spread our reads among all the replicating servers.

The question then becomes, how would you select which server to use for which read request?

The obvious answer is to do something like this:

Request Number % Count of Servers = current server index.

This is simple to understand, easy to implement, and horribly wrong, in a way so subtle that it is scary.

Let us look at the following scenario, shall we?


  • Session #1 loads users/3 from Server A
  • Session #1 then queries (with non-stale results) from Server B
  • The users/3 document hasn’t been replicated to Server B yet, so Server B sends back a reply: “here are my (non-stale) results”
  • Session #1 assumes that users/3 must be in the results, since they are non-stale, and blows up as a result

We want to avoid such scenarios.

Werner Vogels, Amazon’s CTO, has the following consistency definitions:

  • Causal consistency. If process A has communicated to process B that it has updated a data item, a subsequent access by process B will return the updated value and a write is guaranteed to supersede the earlier write. Access by process C that has no causal relationship to process A is subject to the normal eventual consistency rules.
  • Read-your-writes consistency. This is an important model where process A after it has updated a data item always accesses the updated value and never will see an older value. This is a special case of the causal consistency model.
  • Session consistency. This is a practical version of the previous model, where a process accesses the storage system in the context of a session. As long as the session exists, the system guarantees read-your-writes consistency. If the session terminates because of certain failure scenarios a new session needs to be created, and the guarantees do not overlap the sessions.
  • Monotonic read consistency. If a process has seen a particular value for the object any subsequent accesses will never return any previous values.
  • Monotonic write consistency. In this case the system guarantees to serialize the writes by the same process. Systems that do not guarantee this level of consistency are notoriously hard to program.

What is the consistency model offered by this?

Request Number % Count of Servers = current server index.

As it turns out, it pretty much resolves to “none”. Because each request may be directed to a different server, there are no consistency guarantees that we can make. That can make reasoning about the system really hard.

Instead, RavenDB chose to go another route, and we use the following formula to calculate which server we will use.

Session Number % Count of Servers = current server index.

By using a session counter, rather than a request counter, to decide which server we will use when spreading reads, we ensure that all of the requests made within a single session always go to the same server, giving us session consistency. Within the scope of a single session, we can rely on that consistency, instead of the chaos of no consistency at all.
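The difference between the two formulas is easy to demonstrate (Python, illustration only):

```python
servers = ["A", "B", "C"]

def server_for_request(request_number):
    # per-request striping: consecutive requests from one session land on
    # different servers, so replication lag becomes visible to the client
    return servers[request_number % len(servers)]

def server_for_session(session_number):
    # per-session striping: every request in a session hits one server,
    # which is what gives us session consistency
    return servers[session_number % len(servers)]

print([server_for_request(n) for n in range(4)])   # bounces across servers
print([server_for_session(7) for _ in range(4)])   # the same server every time
```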

It is funny how such a small change can have such a profound implication.



Unprofessional code, take II–Answer

Yesterday, I called this code unprofessional, and it is.

The reason that I call this code unprofessional is that it is a black box. Oh, sure, it is great code in terms of readability and understanding what is going on. But it is also a very complex piece of code, with a lot of things going on, and it is in a critical location.

This sort of code cannot be a black box. Case in point, take a look at this email thread. We have a customer who is trying to understand why dynamic indexes are being created, when he has the proper indexes to answer the query.

And I had to ask a lot of questions to figure out what the problem was. It turned out to be a by-design issue, but we could fix that.

That is not a good place to be. I want my customers to be able to figure those things out on their own.

That is why the code now looks like this:


And you can ask RavenDB to explain how it got to a specific decision.

For example, when we want to figure out why a specific index was selected (or, more likely, was not selected), we can issue the following request:


Which will give us the following output:

      "Reason":"Can't choose an index with a different number of from clauses / SelectMany, will affect queries like Count()."
{ "Index":"Freebies/Search", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Orders/Search", "Reason":"The following fields are missing: " }, { "Index":"Orders/Stats", "Reason":"Can't choose a map/reduce index for dynamic queries." }, { "Index":"Products/Stats", "Reason":"Can't choose a map/reduce index for dynamic queries." }, { "Index":"Raven/DocumentsByEntityName", "Reason":"Query is specific for entity name, but the index searches across all of them, may result in a different type being returned." }, { "Index":"Test", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Trials/Search", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Auto/Orders/ByOrderNumber", "Reason":"Selected as best match" } ]

As you can see, that is a pretty good way to debug things, in production, without any complex steps or dedicated tools.

And, like in the previous example, reducing support costs is important.


Published at

Originally posted at

Comments (12)

Unprofessional code, take II

The previous time that I posted unprofessional code, I got a lot of comments about how cruel I am to poor developers, so I decided to post some of my own code that isn’t professional.

The following is part of the RavenDB Dynamic Query Optimizer:


Hint: this single expression goes on for about 120 lines (the length isn’t the reason it is unprofessional), and it is quite important for RavenDB.



The importance of temporary indexes for ad hoc queries

In RavenDB, when you make a query without explicitly specifying which index you want to use, the query optimizer will select one for you. If none is found that can satisfy your query, the query optimizer will create one that matches your query, but it will do so on a temporary basis. That index will function normally, but if it isn’t used, it will be removed after a while. If it is heavily used, it will be converted to a real index.

I was reminded of this today when I realized that I had a bug in our code that caused a value to be misspelled. There were just a few such documents, and I went into the studio and fixed them manually. I had to use a dynamic query to do so, and I was amused to realize that this is the exact scenario for which we built them. An admin doing ad hoc operations, probably to resolve some bug or issue.

Sometimes, just having things work out the way you planned makes for a great day. And hey, this is what I was avoiding:



Good errors

I am trying out the new Encryption support in RavenDB 1.2 (written by Dor), and the very first thing I got was a failure:

The Raven/Encryption/Key setting must be set to an encryption key. The key should be in base 64, and should be at least 8 bytes long.
You may use EncryptionSettings.GenerateRandomEncryptionKey() to generate a key. If you'd like, here's a key that was randomly generated: <add key="Raven/Encryption/Key" value="3w17MIVIBLSWZpzH0YarqRlR2+yHiv1Zq3TCWXLEMI8=" />

I like this!



Raven Suggest: Review & Options

Phil Jones has posted some demo code for doing suggestions in RavenDB.

public JsonResult CompanySuggestions(string term)
{
    var rq = RavenSession.Query<Company, Companies_QueryIndex>();

    var ravenResults = rq.Search(x => x.Name, string.Format("*{0}*", term),
            escapeQueryOptions: EscapeQueryOptions.AllowAllWildcards,
            options: SearchOptions.And)
        .Take(5)
        .ToList();

    return Json(ravenResults.Select(x => new
    {
        id = x.Id,
        value = x.Name,
        description = string.Format("({0}, {1})", x.Category, x.Location)
    }));
}


And the data looks like this:


The Companies_QueryIndex is a simple one, with the Name field marked as analyzed.

This code works, but it isn’t ideal. In fact, it contains a major problem. It uses string.Format("*{0}*", term), which is the equivalent of doing Foo LIKE ‘%’ + @term + ’%’ in SQL. And it is bad for the same reason: it means that we can’t use the index efficiently, but have to scan the entire index.
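To see why the leading wildcard is so costly, consider how a sorted term dictionary (the shape of a Lucene-style index) behaves. Here is a small Python sketch of the idea, not RavenDB’s actual implementation, with a made-up term list:

```python
import bisect

# Illustrative sketch: a sorted term dictionary, like the one a
# full text index keeps per field. Terms are invented for the demo.
terms = sorted(["luau", "lucene", "ludo", "luke", "lumber", "peluang", "value"])

def prefix_match(prefix):
    # term* : binary-search to the first candidate, then walk forward
    # while the prefix still matches -- cost is O(log N + matches).
    start = bisect.bisect_left(terms, prefix)
    out = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        out.append(term)
    return out

def contains_match(fragment):
    # *term* : there is nothing to seek on, so every term must be
    # inspected -- an O(N) scan over the whole dictionary.
    return [t for t in terms if fragment in t]

print(prefix_match("lu"))    # seeks straight to the "lu" range
print(contains_match("lu"))  # also drags in "peluang" and "value"
```

The second function also explains the odd matches: any term that merely contains the fragment qualifies, which is exactly the relevance problem shown in the screenshots.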

The test app also populates the database with about 30,000 documents, enough to get a handle on how things are going, even if this is a relatively small data set. Let us see how this actually behaves:



There are a few things to note here:

  • The query time is significant. Leading wildcard queries tend to be expensive; they are essentially an O(N) query.
  • The results are… interesting. In particular, look at the first few results: they are a match, but not what I would have expected.

A very simple change would be:



All I did was remove the leading wildcard, and suddenly the results look a lot nicer, if only because they are closer to what I am actually searching on. By the way, note that in the last result we are actually finding a company whose second name starts with Lua. It isn’t just a raw StartsWith, since we are working at the level of individual tokens, not the entire name. This is because we indexed the Name as analyzed.

Let us see how this plays out in the actual app, first, using *term*:


Now, using term*:


That is somewhat better, but the results are still poor in terms of search relevance. Let us try something a little bit different:

var ravenResults = rq
    .Search(x => x.Name, term)
    .Search(x => x.Name, term + "*", escapeQueryOptions: EscapeQueryOptions.AllowPostfixWildcard)
    .Take(5)
    .ToList();

Here we force the search to look for the user’s actual term, as well as terms that start with it. This means that any result that actually contains the term will stand out:


The most important thing about searching is to know that the problem is NOT that you can’t find enough information, it is that you find too much information that is not relevant.

But a better alternative altogether might be using RavenDB’s suggest feature:

public JsonResult CompanySuggestions(string term)
{
    var rq = RavenSession.Query<Company, Companies_QueryIndex>()
        .Search(x => x.Name, term);

    var ravenResults = rq.Take(5).ToList();

    if (ravenResults.Count < 5)
    {
        var suggestionQueryResult = rq.Suggest();

        ravenResults.AddRange(RavenSession.Query<Company, Companies_QueryIndex>()
            .Search(x => x.Name, string.Join(" ", suggestionQueryResult.Suggestions))
            .Take(5 - ravenResults.Count));
    }

    return Json(ravenResults.Select(x => new
    {
        id = x.Id,
        value = x.Name,
        description = string.Format("({0}, {1})", x.Category, x.Location)
    }));
}
Which result in:


What we are doing here is basically querying the database for exact matches. Then, if we have too few matches, we ask RavenDB to suggest additional words, and query for those as well.

Here are the raw network traffic:

    Query: Name:<<lua>>
    Time: 1 ms
    Index: Companies/QueryIndex
    Results: 1 returned out of 1 total.

    Index: Companies/QueryIndex
    Term: lua
    Time: 1 ms

    Query: Name:<<wlua luaz luka luau laua luoa lue luw lut lum luj lui lug luf lwa>>
    Time: 3 ms
    Index: Companies/QueryIndex
    Results: 4 returned out of 22 total.

As you can see, the speed difference between this and the first version is non-trivial. More than that, note that it found companies with names such as “wlua”, which were also found in the first (*term*) version.

Just about any search strategy requires that you take into account the dataset, the customer requirements, the user behavior, and more. But I would start with the suggestion option before going to anything as brute force as *term*.



RavenDB, .NET memory management and variable size obese documents

We just got a support issue from a customer, regarding out-of-control memory usage of RavenDB during indexing. That was very surprising, because a few months ago I spent a few extremely intense weeks making sure that this wouldn’t happen, building RavenDB’s auto tuning support.

Luckily, the customer was able to provide us with a way to reproduce things locally. And that is where things get interesting. Here are a few fun facts, looking at the timeline of the documents, it was something like this:


Note that the database actually had several hundreds of thousands of documents; the reason I am showing you this is merely to give you some idea about the relative sizes.

As it turned out, this particular mix of timeline sizes is quite unhealthy for RavenDB during the indexing period. Why?

RavenDB has a batch size: the number of documents that will be indexed in a particular batch. This is used to balance throughput against latency. The higher the batch size, the higher the latency, but the bigger the throughput.

Along with the actual number of documents to index, we also have the need to balance things like CPU and memory usage. RavenDB assumes that the cost of processing a batch of documents is roughly related to the number of documents.

In other words, if we just used 1 GB to index 512 documents, we would probably use roughly 2 GB to index the next 1,024 documents. This is a perfectly reasonable assumption to make, but it also hides an implicit assumption: that the size of documents is roughly the same across the entire data set. This is important, because otherwise you have the following situation:

  • Index 512 documents – 1 GB consumed, there are more docs, there is more than 2 GB of available memory, double batch size.
  • Index 1,024 documents – 2.1 GB consumed, there are more docs, there is more than 4 GB of available memory, double batch size.
  • Index 2,048 documents – 3 GB consumed, there are more docs, there is enough memory, double batch size.
  • Index 4,096 documents – and here we get to the obese documents!
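A toy simulation makes the doubling dynamic in those bullets concrete. The document sizes and the doubling policy below are invented for illustration; RavenDB’s actual auto tuning is more involved:

```python
# Toy simulation of the batch-size doubling described above.
# Numbers and policy are made up for illustration only.
doc_sizes_mb = [2] * 3000 + [5] * 1000 + [25] * 500   # obese docs at the tail

batch_size = 512
index_pos = 0
peak_batch_mb = 0
while index_pos < len(doc_sizes_mb):
    batch = doc_sizes_mb[index_pos:index_pos + batch_size]
    batch_mb = sum(batch)                  # memory this batch needs
    peak_batch_mb = max(peak_batch_mb, batch_mb)
    index_pos += len(batch)
    batch_size *= 2                        # "there is enough memory, double"

print(peak_batch_mb)  # → 14580
```

By the time the doubling reaches the obese tail, a single batch is trying to hold over 14 GB of documents, even though the early batches fit comfortably in 1–2 GB.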

By the time we get to the obese documents, we have already increased our batch size significantly, so we are actually trying to read a LOT of documents, and suddenly a lot of them are very big.

That caused RavenDB to try to consume more and more memory. Now, if it HAD enough memory to do so, it would detect that it is using too much memory, and drop back, but the way this dataset is structured, by the time we get there, we are trying to load tens of thousands of documents, many of them are in the multi megabyte range.

This was pretty hard to fix, not because of the actual issue, but because just reproducing it was tough, since we had other issues just getting the data in. For example, if you tried to import this dataset with a batch size greater than 128, you would also get failures, because suddenly you had a set of extremely large documents that all happened to fall within a single batch, resulting in an error saving them to the database.

The end result of this issue is that we now take into account actual physical size in many more places inside RavenDB, and that this error has been eradicated. We also have much nicer output for the smuggler tool Smile.

On a somewhat related note: RavenDB and obese documents.

RavenDB doesn’t actually have a max document size limitation. In contrast to other document databases, which have a hard limit at 8 or 16 MB, you can have a document as big as you want*. That doesn’t mean that you should work with obese documents. Documents that are multiple megabytes tend to be… awkward to work with, and they generally don’t respect the most important aspect of document modeling in RavenDB: follow the transaction boundary. What does it mean that it is awkward to work with obese documents?

Just that, it is awkward. Serialization times are proportional to the document size, as are retrieval times from the server, and of course the actual memory usage on both server and client is impacted by the size of the documents. It is often easier to work with many smaller documents than a few obese ones.

* Well, to be truthful, we do have a hard limit, it is somewhere just short of the 2 GB mark, but we don’t consider this realistic.



Geo Location & Spatial Searches with RavenDB–Part VII–RavenDB Client vs. Separate REST Service

In my previous post, I discussed how we put the GeoIP dataset in a separate database, and how we access it through a separate session. I also asked, why use RavenDB Client at all? I mean, we might as well just use the REST API and expose a service.

Here is what such a service would look like, by the way:

public class GeoIPClient : IDisposable
{
    private readonly HttpClient httpClient;

    public GeoIPClient(string url, ICredentials credentials)
    {
        httpClient = new HttpClient(new HttpClientHandler { Credentials = credentials })
        {
            BaseAddress = new Uri(url)
        };
    }

    public Task<Location> GetLocationByIp(IPAddress ip)
    {
        if (ip.AddressFamily != AddressFamily.InterNetwork)
            return null;

        var reverseIp = (long)BitConverter.ToUInt32(ip.GetAddressBytes().Reverse().ToArray(), 0);

        var query = string.Format("Start_Range:[* TO 0x{0:X16}] AND End_Range:[0x{0:X16} TO NULL]", reverseIp);

        return httpClient.GetAsync("indexes/Locations/ByRange?pageSize=1&" + query)
            .ContinueWith(task => task.Result.Content
                .ContinueWith(task1 => task1.Result.Results.FirstOrDefault()))
            .Unwrap();
    }

    public void Dispose()
    {
        httpClient.Dispose();
    }
}
I think you can agree that this is fairly simple and easy to understand. It makes it explicit that we are just going to query the database, and it is even fairly easy to read.
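The heart of the service is the range trick: an IPv4 address is turned into a 32-bit integer, and a location row matches when that integer falls between its Start_Range and End_Range. A minimal sketch of the same idea in Python, with made-up data:

```python
import ipaddress

# Toy location rows -- (start, end, name); not real GeoIP data.
locations = [
    (int(ipaddress.IPv4Address("10.0.0.0")),
     int(ipaddress.IPv4Address("10.255.255.255")), "Private-A"),
    (int(ipaddress.IPv4Address("192.168.0.0")),
     int(ipaddress.IPv4Address("192.168.255.255")), "Private-C"),
]

def location_by_ip(ip):
    # Same value the BitConverter/Reverse trick in the C# code computes:
    # the address as a big-endian 32-bit integer.
    ip_int = int(ipaddress.IPv4Address(ip))
    for start, end, name in locations:
        # Equivalent of Start_Range:[* TO ip] AND End_Range:[ip TO NULL]
        if start <= ip_int <= end:
            return name
    return None

print(location_by_ip("192.168.1.17"))  # → Private-C
print(location_by_ip("8.8.8.8"))       # → None
```

In the real service the containment check is done by the Locations/ByRange index on the server, which is why a single request with pageSize=1 is enough.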

Why not go with that route?

Put simply, because it does only about 10% of what the RavenDB Client does. The first thing that pops to mind is that this service doesn’t support caching, HTTP ETag responses, etc. That means we would have to implement those ourselves, which is decidedly non-trivial.

The RavenDB Client will automatically cache all data for you if it can; you don’t have to think about it, worry about it, or even pay it any mind. It is just there, working hard to make sure that your application is more performant.

Next, this will only support Windows Authentication. RavenDB also supports OAuth, so if you wanted to run this on RavenHQ, for example, which requires OAuth, you would have to write some additional stuff as well.

Finally, using the RavenDB Client leaves us open to do additional things in the future very easily, while using a dedicated service means that we are on the hook for implementing from scratch basically anything else that we want.

Sure, we could implement this service using RavenDB Client, but that is just adding layers, and I really don’t like that. There is no real point.

Reviewing NAppUpdate

I was asked to review NAppUpdate, a simple framework for providing auto-update support to .NET applications, available here. Just to make things more interesting, the project leader of NAppUpdate is Itamar, who works for Hibernating Rhinos, and we actually use NAppUpdate in the profiler.

I treated it as an implementation detail and never actually looked at it closely before, so this is the first time that I am actually going over the code. On first impression, there is nothing that makes me want to hurl myself into the ocean from a tall cliff:


Let us dig deeper, and almost on the first try, we hit something that I seriously dislike.


Which leads to this:

public static class Errors
    public const string UserAborted = "User aborted";
    public const string NoUpdatesFound = "No updates found";

And I wouldn’t mind that, except that those are used like this:


There are actually quite a lot of issues with this small code sample. To start with, LatestError? Seriously?!

LatestError evoke strong memories of GetLastError() and all the associated fun with that.

It doesn’t give you the ability to report multiple errors, and it is a string, so you can’t put an exception into it (more on that later).

Also, note how this works with the callback and the return code. Both of them have a boolean for success/failure. That is wrong.

That sort of style was valid for C, but in .NET we actually have exceptions, and they are quite a nice way to handle things.

Worse than that, it means that you have to check the return value, then go to LatestError and check what is going on there. Except… what happens if there was an actual error?


Note the todo, it is absolutely correct. You really can’t just call ToString() on an exception and get away with it (although I think that you should be able to). There are a lot of exceptions where you simply won’t get the required information. ReflectionTypeLoadException, for example, asks you to look at the LoaderExceptions property, and there are other such things.

In the same error handling category, this bugs me:


This is an exception that implements the serialization constructor, but doesn’t have the [Serializable] attribute, which all exceptions should have.

Moving on, a large part of what NAU does is check a remote source for updates, then apply them. I liked this piece of code:

public class AppcastReader : IUpdateFeedReader
{
    // http://learn.adobe.com/wiki/display/ADCdocs/Appcasting+RSS

    #region IUpdateFeedReader Members

    public IList<IUpdateTask> Read(string feed)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(feed);
        XmlNodeList nl = doc.SelectNodes("/rss/channel/item");

        List<IUpdateTask> ret = new List<IUpdateTask>();

        foreach (XmlNode n in nl)
        {
            FileUpdateTask task = new FileUpdateTask();
            task.Description = n["description"].InnerText;
            task.UpdateTo = n["enclosure"].Attributes["url"].Value;

            FileVersionCondition cnd = new FileVersionCondition();
            cnd.Version = n["appcast:version"].InnerText;
            task.UpdateConditions.AddCondition(cnd, BooleanCondition.ConditionType.AND);

            ret.Add(task);
        }

        return ret;
    }

    #endregion
}

This integrates with an external resource, and I like that it is simple to read and understand. There isn’t a lot of infrastructure that I have to deal with just to get what I want done.

There is a more complex feed reader for an internal format that allows you to use the full option set of NAU, but it is a bit complex (it does a lot more, to be fair), and again, I dislike the error handling there.

Another thing that bugged me on some level was this code:


The issue, and I admit that this is probably academic, is what happens if the string is large. I try to avoid exposing an API that might force users to materialize a large data set in memory. This has implications for the working set, the large object heap, etc.

Instead, I would probably have exposed a TextReader or even a Stream.
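To illustrate the difference, here is a sketch of a reader-based API that processes data chunk by chunk. The names are illustrative, not NAppUpdate’s actual API:

```python
import io

# A reader-based consumer: it never holds more than one chunk in
# memory, so it works the same for a small in-memory source or a
# huge file on disk.
def process(reader, chunk_size=4096):
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)   # do the work on the chunk, then let it go
    return total

print(process(io.StringIO("x" * 10_000)))  # → 10000
```

A string-returning API forces the caller to hold the whole payload at once; a reader lets the caller decide how much to keep around, which is the point of preferring TextReader/Stream.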

Coming back to the error handling, we have this:

FileDownloader fd = null;
if (Uri.IsWellFormedUriString(url, UriKind.Absolute))
    fd = new FileDownloader(url);
else if (Uri.IsWellFormedUriString(baseUrl, UriKind.Absolute))
    fd = new FileDownloader(new Uri(new Uri(baseUrl, UriKind.Absolute), url));
else if (Uri.IsWellFormedUriString(new Uri(new Uri(baseUrl), url).AbsoluteUri, UriKind.Absolute))
    fd = new FileDownloader(new Uri(new Uri(baseUrl), url));

if (fd == null)
    throw new ArgumentException("The requested URI does not look valid: " + url, "url");

My gripe is with the last line. Instead of doing it like this, I would have just created the new Uri and let it throw the error. It is likely to have much more accurate information about the actual reason this Uri isn’t valid.

On the gripping hand, we have this guy:


This is the abstraction that NAU manages, the Prepare() method is called to do all the actual work (downloading files, for example) and Execute() is called when we are done and just want to do the actual update.

I know that I keep harping on this, but it is really important. Take a look at another error handling issue (this is representative of the style of coding inside this particular class) in the RegistryTask:


So, I have an update that is failing at a customer site. What error do I have? How do I know what went wrong?

Worse, here is another snippet from the FileUpdateTask:


For some things, an exception is thrown; for others, we just return false.

I can’t follow the logic for this, and trying to diagnose issues in production can be… challenging.

In summary, I went through most of the actual important bits of NAppUpdate, and like in most reviews, I focused on the stuff that can be improved. I didn’t touch on the stuff it does well, and that is its job.

We have been using NAppUpdate for the last couple of years, with very few issues. It quite nicely resolves the entire issue of auto updates, to the point where in our code we only need to call a few methods and it takes care of the entire process for us.

Major recommendations from this codebase:

  • Standardize the error handling approach. It is important.
  • And even more important, logging is crucial to be able to diagnose issues in the field. NAU should log pretty much everything it does and why it does it.

This will allow later diagnosis of issues with relative ease, vs. “I need you to reproduce this and then break into the debugger”.

RavenDB and complex tagging

In the RavenDB mailing list, we got a question about tagging. In this case, the application needs:

1. Tags have identity ("set" has a different meaning if I'm talking math, music or sports).

2. I want to know who tagged what and when.

3. I want to do this once, as a service, so I don’t need to have ids in each document I want to tag. In my app, there are many such document types.

Let us see how we can approach this in RavenDB. We are going to do it like so:


Note that because tags have identity, we store only the tag id inside the tagged object, along with the required information about who tagged it and when.

Now, let us try to have some fun with this. Let us say that I want to be able to show, given a specific album, all the albums that have any of the same tags as the specified album.

We start by defining the following index:


Note that the naming convention matches what we would expect using the default Linq convention, so we can easily query this index using Linq.

And now we want to query it, which, assuming that we are starting with albums/1, will look like:


This translates to: “show me all of the albums that share any of the specified tags, except albums/1”.

And this is pretty much it, to be fair. Oh, if you want to show the tag names you’ll have to include the actual tag documents, but there really isn’t anything complex going on.
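To make the shape of that query concrete, here is the same “albums sharing any tag” idea sketched over plain dictionaries instead of a RavenDB index. The document shape is assumed from the post (each album holds a list of tag ids), and the data is made up:

```python
# Toy documents: album id -> document with a Tags list of tag ids.
albums = {
    "albums/1": {"Tags": ["tags/rock", "tags/live"]},
    "albums/2": {"Tags": ["tags/rock"]},
    "albums/3": {"Tags": ["tags/jazz"]},
    "albums/4": {"Tags": ["tags/live", "tags/jazz"]},
}

def related_albums(album_id):
    wanted = set(albums[album_id]["Tags"])
    # any shared tag qualifies, and the source album itself is excluded
    return sorted(
        other for other, doc in albums.items()
        if other != album_id and wanted & set(doc["Tags"])
    )

print(related_albums("albums/1"))  # → ['albums/2', 'albums/4']
```

In RavenDB the index does the fan-out from album to tag ids, so the query is a single “any of these tags, not this id” lookup rather than a scan like this sketch.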

But what about the 3rd requirement?

Well, it isn’t really meaningful. You can move this Tags collection to the layer supertype, but if you want to be able to do nice tagging with RavenDB, this is probably the easiest way to go.



RavenHQ Outage: What happened and what WILL happen

“We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area,” Amazon reported at 8:30 pm Pacific time on Friday the 29th.

Among those servers were RavenHQ-web-1 server (responsible for showing the www.RavenHQ.com marketing site) and RavenHQ-DB-1 (responsible for all the databases located on 1.ravenhq.com).

The reason for that was apparently a big storm that hit the US-East-1 region (Virginia data center). The data center lost power (and no generators came up, somehow) for about 30 minutes, which caused an extended outage for some of our customers. Small comfort, but we were in the same boat as Netflix, Heroku, Pinterest, and Instagram, among others.

Before we dig any deeper: no data was lost, and we resumed normal operations within a few hours. Users on replicated plans had no interruption of service.

During the outage, we were in the process of bringing up a new node with all of the databases from the impacted servers, but that would have entailed customers having to change connection strings, and the outage was resolved before we got to that point.

I want to show you how RavenHQ is architected from a physical standpoint. What you can see isn’t actually the servers (those are fairly dynamic) but it is enough for you to get the picture.


In particular, both RavenHQ-web-1 and RavenHQ-db-1 were in the US-East-1 region and were impacted by this issue. The good news is that the rest of our servers were located in different availability zones and were not impacted.

In particular, that was a good stress test (which we could have done without, thank you very much, Amazon’s non-working generators) for our HA scenario. None of the replicated plan customers experienced much of a problem; we had an auto failover to the secondary server (and depending on your plan, if that had gone down, the failover would go to the tertiary server, etc.). We actually have a customer with a 4-way master/master replicated plan, so he is super safe Smile.

Unfortunately, that meant that any customer that wasn’t on a replicated plan and was located on a US-East-1 server felt the impact. That included a lot of the free plan customers, since those are predominantly located in that region, as well as a number of actual paying customers.

We are sorry for that, and we all understand the need to balance “what happens if” against “what does it cost”. As a result, we are going to offer all existing customers a 25% discount for all replicated plans for the next 6 months. Just contact us and ask for an upgrade to a replicated plan, and we will set it up for you.

One of the things that we are trying to do with RavenHQ is really commoditize the notion of a database that is just there, one you don’t have to worry about. Going with the replicated plans is probably the best way to go about doing that, since your data is going to live on at least two physically remote servers, and we have auto failover ready to pop in and support your application if there are any issues. That is why we are offering the discount, as a way to make it even more affordable to go into High Availability mode.

Speaking of High Availability, I should probably talk about what happened to www.ravenhq.com, and the answer is fairly simple. The cobbler's children go barefoot. We spent a lot of time designing and building RavenHQ to be sustainable in the face of outages, but we focused all of our attention on the actual production instances; we didn’t really pay any mind to www.ravenhq.com. As far as we were concerned, this is a marketing site, and it was a low priority for the HA story.

Unfortunately, when we actually had an outage, people really freaked out because www.ravenhq.com was down. Even though the actual database servers were up and running, the website being down gave the impression that all of RavenHQ was down, which was decidedly not the case.

Lessons learned

  • We need to encourage customers to go to the replicated plans by default. They are more expensive, yes, but they are also safer.
  • We need a better process in the case of outages, to move databases from failed nodes to new ones, and inform customers about this change.
  • More parts of our actual core infrastructure need to be HA. In particular, we need:
    • To make sure that authentication works when core servers are missing (wasn’t a problem in this particular case, but our investigation revealed that it could be, so we need to solve that).
    • Ensure that www.ravenhq.com is fully replicated.
    • Create a /status page, where you can look at the status of the various servers and see how they are acting.

The last two are more for peace of mind than any real production need, but any cloud service runs on trust, and we think that adding those would ensure that if there are any problems in the future, we would be able to provide you with better service.



That ain’t no Open Source that I see here

Some things just piss me off. But before I get to what pissed me off this time, let me set the scene.

We usually ask candidates applying for a job at Hibernating Rhinos to submit some piece of code that they wrote. They get additional points if their code is an Open Source project.

Some people have some… issues with the concept. The replies I got were:

  • I don’t know if I can send you the code, I’ll have to ask my employer. (Which seems a really silly thing to do, considering you want to get the code to show it to another company that you want to hire you.)
  • Here is the code, but don’t tell anyone. (Those usually get deleted immediately after I send them a scathing reply about things like IP and how important it is to respect that).
  • Here is my last course code. (Which is what actually triggered this post).

Here is the deal: if you aren’t coding for fun, you are not suitable for a developer position at Hibernating Rhinos. Just to give you some idea, currently we have the following pet projects that I am aware of:

  • Jewish Sacred Books repository – display / commentary
  • Jewish Sacred Books repository – search / organization (Note that the two are by two different people and are totally unrelated.)
  • Music game app for Android, iOS and WP7
  • Personal finance app
  • Auto update library for .NET
  • Various OSS projects

And probably other stuff that I am not aware of. (Just for the record, those are things that they are working on in their own time, not company time, and not because I or anyone else told them to.)

Why is this relevant? Because I keep getting people who think that submitting some random piece of code from their latest university course is a good way to show their mad code skillz.

I mean, sure, that might do it, but consider carefully what sort of projects you are usually given as part of university courses. They are usually very small, focusing on just one aspect, and they are totally driven by whatever the crazy professor thinks is a valid coding standard. Usually, that is NOT good code to send as part of a job interview.

I am going to share just one line from a codebase that I recently got:

private void doSwap(ref Album io_Album1, ref Album io_Album2)

The code is in C#, in case you are wondering. And you can probably learn a lot about the state of the codebase from just this line of code. Among my chief complaints:

  • Violating the .NET framework naming guidelines (method name).
  • Violating the .NET framework naming guidelines (argument names).
  • Swapping parameters, seriously?! What, are you writing your own sort routine? And yeah, the answer is yes.

When I pinged the author of the code, he replied that this was because of the course requirements. They had strict Hungarian notation guidelines, and io_ is for an input & output parameter.

They had other guidelines (you may not use foreach, for example) that explained some strangeness in the codebase.

But that isn’t really the point. I can understand crazy coding standards, what I can’t understand is why someone would submit something that would raise so many red flags so quickly as part of a job application process.

This is wasting everyone’s time. And that is quite annoying.

