Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,317 | Comments: 46,923

filter by tags archive

RavenDB – Working with the query statistics

time to read 2 min | 260 words

Querying in RavenDB is both similar and dissimilar to the sort of queries that you are familiar with from RDBMS. For example, in RavenDB, it is crazily cheap to find out count(*). And then there is the notion of potentially stale queries that we need to expose.

Following the same logic as my previous post, I want to have an easy way of exposing this to the user without making it harder to use. I think that I found a nice balance here:

RavenQueryStatistics stats;
var results =  s.Query<User>()
     .Statistics(out stats)
     .Where(x => x.Name == "ayende")

The stats expose some additional information about the query:


You can use those to decide how to act.

But I have to admit that, like in the MultiQuery case with NHibernate, the real need for this was to be able to do paged query + total count in a single remote request.

Distributed transactions with remote queuing system

time to read 3 min | 529 words

I got into an interesting discussion about that two days ago, so I thought that I would put down some of my thoughts about the topic.

Just to put some context into operation here, I am NOT talking about solving the generic problem of DTC on top of remote queuing system. I am talking specifically about sharing a transaction between a remote queuing system and a database. This is important if we want to enable “pull msg from queue, update db” scenarios. Without those, you are running into danger zones of processing a message twice.

We will assume that the remote queuing system operations on the Dequeue/Timeout/Ack model. In which you dequeue a message from the queue, and then you have a timeout to acknowledge its processing before it is put back into the queue. We will also assume a database that supports transactions.

Basically, what we need to do is to keep a record of all the messages we processed. We do that by storing that record in the database, where it is subject to the same transactional rules as the rest of our data. We would need a table similar to this:

CREATE TABLE Infrastructure.ProcessedMessages
   MsgId uniqueidentifier primary key,
   Timestamp DateTime not null

Given that, we can handle messages using the following code:

using(var tx = con.BeginTransaction())
    var msg = queue.Dequeue();

    // process the msg using this transaction context

Because we can rely on the database to ensure transactional consistency, we can be sure that:

  • we will only process a message once
  • an error in processing a message will result in removing the process record from the database

We will probably need a daily job or something similar to clear out the table, but that would be about it.


Queuing Systems

time to read 4 min | 756 words

It is not a surprise that I have a keen interest in queuing system and messaging. I have been pondering building another queuing system recently, and that led to some thinking about the nature of queuing systems.

In general, there are two major types of queuing systems. Remote Queuing Systems and Local Queuing Systems. While the way they operate is very similar, they are actually major differences in the way you would work with either system.

With a Local Queuing System, the queue is local to your machine, and you can make several assumptions about it:

  • The queue is always going to be available – in other words: queue.Send()  cannot fail.
  • Only the current machine can pull data from the queue – in other words: you can use transactions to manage queue interactions.
  • Only one consumer can process a message.
  • There is a lot of state held locally.

Examples of Local Queuing Systems include:

  • MSMQ
  • Rhino Queues

A Remote Queuing System uses a queue that isn’t stored on the same machine. That means that:

  • Both send and receive may fail.
  • Multiple machines may work against the same queue.
  • You can’t rely on transactions to manage queue interactions.
  • Under some conditions, multiple consumers can process the same message.
  • Very little state on the client side.

An example of Remote Queuing System is a Amazon SQS.

Let us take an example of simple message processing with each system. Using local queues, we can:

using(var tx = new TransactionScope())
   var msg = queue.Receive();

   // process msg

There are a lot actually going on here. The act of receiving a message in transaction means that no other consumer may receive it. If the transaction complete, the message will be removed from the queue. If the transaction rollbacks, the message will become eligible for consumers once again.

The problem is that this pattern of behavior doesn’t work when using remote queues. DTC are a really good way to kill both scalability and performance when talking to remote systems. Instead, Remote Queuing System apply the concept of a timeout.

var msg = queue.Receive( ackTimeout: TimeSpan.FromMniutes(1) );

// process msg


When the message is pulled from the queue, we specify the time that we promise to process the message by. The server is going to set aside this message for that duration, so no other consumer can receive it. If the ack for successfully processing the message arrives in the specified timeout, the message is deletes and everything just works. If the timeout expires, however, the message is now available for other consumers to process. The implication is that if for some reason processing a message exceed the specified timeout, it may be processed by several consumers. In fact, most Remote Queuing Systems implement a poison message handling so if X number of time consumers did not ack a message in the given time frame, the message is marked as poison and moved aside.

It is important to understand the differences between those two systems, because they impact the system design for systems using it. Rhino Service Bus, MassTransit and NServiceBus, for example, all assume that the queuing system that you use is a local one.

A good use case for a remote queuing system is when your clients are very simple (usually web clients) or you want to avoid deploying a queuing system.

The building blocks of a database: Transactional & persistent hash table

time to read 2 min | 373 words

I am working on managed storage solution for RavenDB in an off again on again fashion for a while now. The main problem Is that doing something like this is not particularly hard, but it is complex. You either have to go with a transaction log or an append only model.

There are more than enough material on the matter, so I won’t touch that. The problem is that building that is going to take time, and probably a lot of it I decided that it is better off to have something than nothing, and scaled back the requirements.

The storage has to have:

  • ACID
  • Multi threaded
  • Fully managed
  • Support both memory and files
  • Fast
  • Easy to work with

The last one is important. Go and take a look at the codebase of any of the available databases. They can be… pretty scary.

But something has to be give, so I decided that to make things easier, I am not going to implement indexing on the file system. Instead, I’ll store the data on the disk, and keep the actual index completely in memory. There is an overhead of roughly 16 bytes per key plus the key itself, let us round it to 64 bytes per held per key. Holding 10 million keys would cost ~600 MB. That sounds like a lot, because it is. But it actually not too bad. It isn’t that much memory for modern hardware. And assuming that our documents are 5 KB in size, we are talking about 50 GB for the database size, anyway.

Just to be clear, the actual data is going to be on the disk, it is the index that we keep in memory. And once we have that decision, the rest sort of follow on its own.

It is intentionally low level interface, and mostly it gives you is a Add/Read/Remove interface, but it gives you multiple tables, the ability to do key and range scans and full ACID compliance (including crash recovery).

And the fun part, it does so in 400 lines!


Smart post re-scheduling with Subtext

time to read 1 min | 114 words

As you know, I uses future posting quite heavily, which is awesome, as long as I keep to the schedule. Unfortunately, when you have posts two or three weeks in advance, it is actually quite common for you need to post things in a more immediate sense.

And that is just a pain. I just added smart re-scheduling to my fork of Subtext. Basically, it is very simple. If I post now, I want the post now. If I post it in the future, move everything one day ahead. If I post with no date, put it as the last item on the queue.

This is the test for this feature.

Why I hate implementing Linq

time to read 6 min | 1200 words

Linq has two sides. The pretty side is the one that most users see, where they can just write queries and C# and a magic fairy comes and make it work on everything.

The other side is the one that is shown only to the few brave souls who dare contemplate the task of actually writing a Linq provider. The real problem is that the sort of data structure that a Linq query generates has very little to the actual code that was written. That means that there are multiple steps that needs to be taken in order to actually do something useful in a real world Linq provider.

A case in point, let us take NHibernate’s Linq implementation. NHibernate gains a lot by being able to use the re-linq project, which takes care of a lot of the details of linq parsing. But even so, it is still too damn complex. Let us take a simple case as an example. How the Cacheable operation is implemented.

  from order in session.Query<Order>()
  where order.User == currentUser
  select order

Cacheable is used to tell NHibernate that it should use the 2nd level cache for this query. Implementing this is done by:

  1. Defining an extension method called Cachable:
  2. public static IQueryable<T> Cacheable<T>(this IQueryable<T> query)
  3. Registering a node to be inserted to the parsed query instead of that specific method:
  4. MethodCallRegistry.Register(new[] { typeof(LinqExtensionMethods).GetMethod("Cacheable"),}, typeof(CacheableExpressionNode));
  5. Implement the CacheableExpressionNode, which is what will go into the parse tree instead of the Cacheable call:
  6. public class CacheableExpressionNode : ResultOperatorExpressionNodeBase
  7. Actually, the last thing was a lie, because the action really happens in the CacheableResultOperator, which is generated by the node:
  8. protected override ResultOperatorBase CreateResultOperator(ClauseGenerationContext clauseGenerationContext)
        return new CacheableResultOperator(_parseInfo, _data);
  9. Now we have to process that, we do that by registering operator processor:
  10. ResultOperatorMap.Add<CacheableResultOperator, ProcessCacheable>();
  11. That processor is another class, in which we finally get to the actual work that we wanted to do:
  12. public void Process(CacheableResultOperator resultOperator, QueryModelVisitor queryModelVisitor, IntermediateHqlTree tree)
        NamedParameter parameterName;
        switch (resultOperator.ParseInfo.ParsedExpression.Method.Name)
            case "Cacheable":
                tree.AddAdditionalCriteria((q, p) => q.SetCacheable(true));


Actually, I lied. What is really going on is that this is just the point where we are actually registering our intent. The actual code will be executed at a much later point in time.

To foretell the people who knows that this is an overly complicated mess that could be written in a much simpler fashion…

No, it can’t be.

Sure, it is very easy to write a trivial linq provider. Assuming that all you do is a from / where / select and another else. But drop into the mix multiple from clauses, group bys, joins, into clauses, lets and… well, I could probably go on for a while there. The point is that industrial strength Linq providers (i.e. non toy ones) are incredibly complicated to write. And that is a pity, because it shouldn’t be that hard!

Exception handling for flow control is EVIL!!!

time to read 4 min | 792 words

I just spent over half a day trying to fix a problem in NHibernate. To be short one of the updates that I made caused a backward compatibility error, and I really wanted to fix it. The actual error condition happen only when you use triple nested detached criteria with reference to the root from the inner most child. To make things fun, that change was close to 9 months ago, and over a 1,000 revisions.

I finally found the revision that introduced the error, and started looking at the changes. It was a big change, and a while ago, so it took some time. Just to give you some idea, here is where the failure happened:

public string GetEntityName(ICriteria criteria)
  ICriteriaInfoProvider result;
  if(criteriaInfoMap.TryGetValue(criteria, out result)==false)
    throw new ArgumentException("Could not find a matching criteria info provider to: " + criteria);
  return result.Name;

For some reason, the old version could find it, and the new version couldn’t. I traced how we got the values to criteriaInfoMap every each way, and I couldn’t see where the behavior was different between the two revisions.

Finally, I resorted to a line by line revert of the revision, trying to see when the test will break. Oh, I forgot to mention, here is the old version of this function:

public string GetEntityName(ICriteria criteria)
  string result;
  criteriaEntityNames.TryGetValue(criteria, out result);
  return result;

The calling method (several layers up) looks like this:

private string[] GetColumns(string propertyName, ICriteria subcriteria)
  string entName = GetEntityName(subcriteria, propertyName);
  if (entName == null)
    throw new QueryException("Could not find property " + propertyName);
  return GetPropertyMapping(entName).ToColumns(GetSQLAlias(subcriteria, propertyName), GetPropertyName(propertyName));

But even when I did a line by line change, it still kept failing. Eventually I got fed up and change the GetEntityName to return null if it doesn’t find something, instead of throw.

The test passed!

But I knew that returning null wasn’t valid, so what the hell(!) was going on?

Here is the method that calls the calling method;

public string[] GetColumnsUsingProjection(ICriteria subcriteria, string propertyName)
  // NH Different behavior: we don't use the projection alias for NH-1023
    return GetColumns(subcriteria, propertyName);
  catch (HibernateException)
    //not found in inner query , try the outer query
    if (outerQueryTranslator != null)
      return outerQueryTranslator.GetColumnsUsingProjection(subcriteria, propertyName);

I actually tried to track down who exactly wrote this mantrap, (this nasty came from the Hibernate codebase). But I got lost in migrations, reformatting, etc.

All in all, given how I feel right now, probably a good thing.

Synchronization primitives, MulticastAutoResetEvent

time to read 4 min | 638 words

I have a very interesting problem within RavenDB. I have a set of worker processes that all work on top of the same storage. Whenever a change happen in the storage, they wake up and start working on it. The problem is that this change may be happening while the worker process is busy doing something other than waiting for work, which means that using Monitor.PulseAll, which is what I was using, isn’t going to work.

AutoResetEvent is what you are supposed to use in order to avoid losing updates on the lock, but in my scenario, I don’t have a single worker, but a set of workers. And I really wanted to be able to use PulseAll to release all of them at once. I started looking at holding arrays of AutoResetEvents, keeping tracking of all changes in memory, etc. But none of it really made sense to me.

After thinking about it for  a while, I realized that we are actually looking at a problem of state. And we can solve that by having the client hold the state. This led me to write something like this:

public class MultiCastAutoResetEvent 
    private readonly object waitForWork = new object();
    private int workCounter = 0;
    public void NotifyAboutWork()
        Interlocked.Increment(ref workCounter);
        lock (waitForWork)
            Interlocked.Increment(ref workCounter);
    public void WaitForWork(TimeSpan timeout, ref int workerWorkCounter)
        var currentWorkCounter = Thread.VolatileRead(ref workCounter);
        if (currentWorkCounter != workerWorkCounter)
            workerWorkCounter = currentWorkCounter;
        lock (waitForWork)
            currentWorkCounter = Thread.VolatileRead(ref workCounter);
            if (currentWorkCounter != workerWorkCounter)
                workerWorkCounter = currentWorkCounter;
            Monitor.Wait(waitForWork, timeout);

By forcing the client to pass us the most recently visible state, we can efficiently tell whatever they still have work to do or do they have to wait.

Why RavenDB is OSS

time to read 2 min | 279 words

There are actually several reasons for that, not the least of which is that I like working on OSS projects. But there is also another reason for that, which is very important for adoption:


Joel Spolsky have an interesting article about just this topic.

When I sit down to architect a system, I have to decide which tools to use. And a good architect only uses tools that can either be trusted, or that can be fixed. "Trusted" doesn't mean that they were made by some big company that you're supposed to trust like IBM, it means that you know in your heart that it's going to work right. I think today most Windows programmers trust Visual C++, for example. They may not trust MFC, but MFC comes with source, and so even though it can't be trusted, it can be fixed when you discover how truly atrocious the async socket library is. So it's OK to bet your career on MFC, too.

You can bet your career on the Oracle DBMS, because it just works and everybody knows it. And you can bet your career on Berkeley DB, because if it screws up, you go into the source code and fix it. But you probably don't want to bet your career on a non-open-source, not-well-known tool. You can use that for experiments, but it's not a bet-your-career kind of tool.

I have used the same logic myself in the past, and I think it is compelling.

What is the cost of storage, again?

time to read 3 min | 447 words

So, I got a lot of exposure about my recent post about the actual costs of saving a few bytes from the fields names in schema-less databases. David has been kind enough to post some real numbers about costs, which I am going to use for this post.

The most important part is here:

  • We have now moved to a private cloud with Terremark and use Fibre SANs. Pricing for these is around $1000 per TB per month.
  • We are not using a single server – we have 4 servers per shard so the data is stored 4 times. See why here. Each shard has 500GB in total data so that’s 2TB = $2000 per month.

So, that gives a price point of 4 US Dollars per gigabyte.

Note that this is a pre month cost, which means that it is going to cost a whopping of 48$ per year. Now, that is much higher cost than the 5 cents that I gave earlier, but let us see what this gives us.

We will assume that the saving is actually higher than 1 GB, let us call it 10 GB across all fields across all documents. Which seems a reasonable number.

That thing now costs 480 $per year.

Now let us put this in perspective, okay?

At 75,000$ a year (which is decidedly on the low end, I might add), that comes to less than 2 days of developer time.

It is also less than the following consumer items

  • The cheapest iPad - 499$
  • The price of the iPhone when it came out - 599$

But let us talk about cloud stuff, okay?

  • A single small Linux instance on EC2 – 746$ per year.

In other words, your entire saving isn’t even the cost of adding the a single additional node to your cloud solution.

And to the nitpickers, please note that we are talking about data is is already replicated 4 times, so it already includes such things as backups. And back to the original problem. You are going to lose more than 2 days of developer time on this usage scenario when you have variable names like tA.

A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO, but isn’t really important to MongoDB, since it does most operations in RAM anyway, or just implement document compression per document, like you can do with RavenDB.


  1. RavenDB Conference videos: Implementing CQRS and Event Sourcing with RavenDB - 10 hours from now
  2. How did the milk get to the fridge? - about one day from now
  3. RavenDB Conference videos: Building Codealike: a journey into the developers analytics world - 4 days from now
  4. Low level Voron optimizations: Transaction lock handoff - 5 days from now
  5. RavenDB Conference Videos: Delving into Documents with Data Subscriptions - 6 days from now

And 7 more posts are pending...

There are posts all the way to Mar 10, 2017


  1. RavenDB Conference videos (12):
    21 Feb 2017 - Zapping ever faster
  2. Low level Voron optimizations (5):
    20 Feb 2017 - Recyclers do it over and over again.
  3. Implementing low level trie (4):
    26 Jan 2017 - Digging into the C++ impl
  4. Answer (9):
    20 Jan 2017 - What does this code do?
  5. Challenge (48):
    19 Jan 2017 - What does this code do?
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats