RavenDB’s transaction merging


I like reading interesting code bases. In this case, I have been reading through the leveldb codebase for a while. Most of the time, I concentrate on grokking how other people do things, and the knowledge I glean from that usually only becomes relevant much later.

In this case, I was able to apply leveldb's lessons in very short order. We got a complaint about the speed of RavenDB under a high transactional write load. The customer complained that they were seeing about 100 – 200 writes/sec with multi-threaded inserts, using a separate session per document. This simulates pretty well the behavior of a web application that does a write per request.

The problem was that we basically had many write requests all competing for the disk. Since every transaction needs to call fsync, that meant that we were limited by the number of fsync calls we could make to the physical medium (more or less; it is a bit more complex than that, but it is a good approximation).
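To make the bottleneck concrete, here is a minimal illustrative sketch (not RavenDB's actual storage code) of what each writer effectively pays for; in .NET, FileStream.Flush(true) is the closest equivalent to fsync:

using System.IO;

// Illustrative only: every transaction ends with a flush to the physical
// medium, so N concurrent writers still serialize on N separate disk flushes.
void WriteDocument(FileStream file, byte[] doc)
{
    file.Write(doc, 0, doc.Length);
    file.Flush(flushToDisk: true); // forces the data down to the disk itself
}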

There really isn’t much that I can do about it when the transactions are sequential. But there might be when we have parallel transactions. Instead of making them wait for one another, I took a leaf from the leveldb codebase and decided to merge them. I rewrote the code so we would use the following approach:

// We queue our batch, then wait until either another thread has executed
// it for us, or we are at the head of the queue and must execute it ourselves.
pendingBatches.Enqueue(pending);

while (currentTask.Completed() == false && pendingBatches.Peek() != pending)
{
    batchCompletedEvent.Wait();
}
if (currentTask.Completed())
    return pending.Result; // someone else already ran our batch for us

lock (putSerialLock)
{
    try
    {
        batchCompletedEvent.Reset();

        var sp = Stopwatch.StartNew();
        TransactionalStorage.Batch(actions =>
        {
            int totalCommands = 0;
            PendingBatch currentBatch;
            while (totalCommands < 1024 &&  // let us not overload the transaction buffer size too much
                    pendingBatches.TryDequeue(out currentBatch))
            {
                batches++; // track how many batches we merged into this transaction
                if (ProcessBatch(currentBatch) == false)
                    // we only go on committing transactions as long as we are successful, if there is an error, 
                    // we abort and do the rest in another cycle
                    break;
                actions.General.PulseTransaction();
                totalCommands += currentBatch.Commands.Count;
            }
        });
    }
    finally
    {
        batchCompletedEvent.Set(); // wake everyone waiting on this merged transaction
    }
}

As you can see, this code is very similar to the leveldb one. We queue the transaction to be executed and then check whether we are first in the queue. If we are, we execute our own transaction and then keep going, executing all the other pending transactions as well.

The key here is that because we merge those transactions, we can benefit from only calling fsync once, at the end of the global operation.
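To show the shape of the idea in a self-contained form, here is a sketch of the same pattern, simplified to use a plain lock instead of the queue-peek and event signaling above. The names (PendingWrite, WriteMerger, CommitToDisk) are made up for illustration; this is not RavenDB's actual code. Whoever acquires the lock first drains the queue and commits everyone's work with a single fsync:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class PendingWrite
{
    public byte[] Data;
    public TaskCompletionSource<bool> Done = new TaskCompletionSource<bool>();
}

class WriteMerger
{
    private readonly ConcurrentQueue<PendingWrite> pending = new ConcurrentQueue<PendingWrite>();
    private readonly object writeLock = new object();

    public void Write(byte[] data)
    {
        var mine = new PendingWrite { Data = data };
        pending.Enqueue(mine);

        lock (writeLock)
        {
            // Another thread may have already committed our write
            // while we were waiting for the lock.
            if (mine.Done.Task.IsCompleted)
                return;

            var batch = new List<PendingWrite>();
            PendingWrite current;
            while (pending.TryDequeue(out current))
                batch.Add(current);

            CommitToDisk(batch); // one write + one fsync for the entire batch

            foreach (var write in batch)
                write.Done.SetResult(true);
        }
    }

    private void CommitToDisk(List<PendingWrite> batch)
    {
        // Append all buffered writes, then flush to the physical disk once.
    }
}

The more writers pile up behind the lock, the bigger the batch that the winning thread commits, which is exactly where the throughput gain comes from.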

This code is nice because it allows us to take advantage of the load on our system. The more load we have, the more efficiently we can batch things. And at the same time, if there isn’t any load, we don’t pay any cost for it.

Note that I limited the amount of work that can be done in a merged transaction, because we don’t want the first transaction, the one that is actually doing the work for all of the others, to take too long. This is a really pretty way of doing things, especially since it doesn’t even require a background thread, which is how I have usually solved this kind of issue in the past.

Oh, and the results?

On my machine, without this change, we got about 400 writes / second. With this change, using 25 threads, we get over 1,100 writes / second.