Fruits of the poisonous tree: Voron performance refactoring


Earlier this month I posted our results for the initial performance of Voron. We learned a lot from that, and it led us to do some major refactoring of the codebase. Now that we are back on more or less stable ground, it is time to see where we actually stand. Where before we had a baseline against other storage engines, now we are testing against the previous version as well as Esent.

To get good, repeatable numbers, I decided to run the tests on an EC2 machine, in particular an m1.large instance. The idea is that this gives us a standard machine configuration that anyone can use to reproduce the tests.

For the purpose of this benchmark, we are writing to one of the instance’s temporary storage disks, rather than a permanent EBS volume. This will presumably be somewhat faster than a real world configuration, but I think that running on a common production machine (rather than a dedicated development machine) should give us better, certainly more realistic, numbers.

The first thing to check, of course, is the simplest. Sequential writes:

[Image: sequential writes benchmark results]

Here we can see that we are actually worse off than we were before. That is kind of sad, but we’ll get to that later on.

For sequential reads, we are still very good, and it is obvious that we can pretty easily make use of all of the resources that the machine gives us. Here we are consistently slightly faster than the older version, but that is probably accidental. We certainly did no work on improving read performance (there was very little need to, after all).

[Image: sequential reads benchmark results]

Random writes were a big surprise:

[Image: random writes benchmark results]

As it happens, we are significantly faster than Esent in this scenario, which was quite a surprise. For some reason, the new code is also faster than the old version, which is quite interesting.

Finally, we have random reads.

[Image: random reads benchmark results]

No, it isn’t an error. We really do see a rate of ~1,600 reads/sec for Esent, single threaded. I am not quite sure why it is behaving in this fashion, but I am willing to say that something is wrong there and simply discount this result.

I mentioned before that it appears that we are actually slower than the old code, so I decided to check what is going on there. My tool of choice: the profiler. Here is the old version:

[Image: profiler results for the old version]

You can see that we have done two runs (one sequential, one random) of 10,000 transactions, writing a total of 2 million items, and that roughly 26% of the time is spent committing the transaction.
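To make the shape of that workload concrete, here is a minimal sketch of such a run. The interface and method names (OpenWriteTransaction, Put, Commit) are stand-ins I am using for illustration, not Voron’s actual API; the point is simply that each transaction writes a small batch of items and then commits, and it is that commit that accounts for the 26% above.

    using System;

    // Illustrative only: IWriteBatch / Put / Commit are hypothetical stand-ins
    // for a storage engine API, not Voron's real method names.
    public interface IWriteBatch : IDisposable
    {
        void Put(string key, byte[] value);
        void Commit();
    }

    public static class WriteBenchmark
    {
        public static void Run(Func<IWriteBatch> openWriteTransaction,
                               int transactions,          // 10,000 per run
                               int itemsPerTransaction,   // sized to reach the item totals above
                               Func<long, string> keyFor) // sequential or random key generator
        {
            var payload = new byte[128]; // arbitrary value size for the sketch
            long counter = 0;

            for (int i = 0; i < transactions; i++)
            {
                using (var tx = openWriteTransaction())
                {
                    for (int j = 0; j < itemsPerTransaction; j++)
                        tx.Put(keyFor(counter++), payload);

                    tx.Commit(); // this is where the 26% shows up in the profile
                }
            }
        }
    }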

Digging into that, a lot of that time is spent just flushing to disk:

[Image: profiler breakdown of the commit in the old version]

Note, however, that a decidedly non-trivial amount of time is spent on ImmutableDictionary, but I’ll touch on that later on.

With our changes, the new code looks like this:

[Image: profiler results for the new version]

The one thing that jumps out at me is that the cost of creating a transaction is now very significant. And here is why:

[Image: profiler breakdown of transaction creation in the new version]

The likely reason is that we now also have the background journal flush, which needs to take the transaction lock as well, and that introduces contention into the mix. We’ll need to look into how to resolve that.
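To illustrate the kind of contention I mean, here is a simplified sketch (with hypothetical names, not Voron’s actual internals) of a single lock shared between starting a write transaction and the background journal flush. While the flush holds the lock to do its slow I/O, every transaction that wants to start has to wait, and that waiting shows up as transaction creation cost in the profile above.

    using System;
    using System.Threading;

    // Simplified sketch; the names here are hypothetical, not Voron's internals.
    public class JournalContentionSketch
    {
        private readonly object _txLock = new object();

        // Every write transaction starts by taking the shared lock.
        public IDisposable BeginWriteTransaction()
        {
            Monitor.Enter(_txLock);
            return new Releaser(_txLock);
        }

        // The background journal flush takes the same lock, so a slow flush
        // here stalls every transaction that is waiting to start.
        public void BackgroundFlushLoop(CancellationToken token)
        {
            while (!token.IsCancellationRequested)
            {
                lock (_txLock)
                {
                    FlushJournalToDataFile(); // stand-in for the real flush work
                }
                Thread.Sleep(100);
            }
        }

        private void FlushJournalToDataFile() { /* copy pages, fsync, etc. */ }

        private sealed class Releaser : IDisposable
        {
            private readonly object _lock;
            public Releaser(object toRelease) { _lock = toRelease; }
            public void Dispose() { Monitor.Exit(_lock); }
        }
    }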

Let us dig into the commit. The changes there were pretty much the whole reason for this round of work.

[Image: profiler breakdown of the commit in the new version]

You can see that we are now using WriteFileGather. Unlike before, where syncing took about 15% of the overall time, it is now taking just 7%, so we have better than a 50% improvement on that score.
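For reference, WriteFileGather is the Win32 call that lets us hand the operating system a set of page-sized buffers scattered in memory and have it write them to a contiguous region of an unbuffered, overlapped file in a single operation. A plausible managed declaration looks roughly like this; it is only a sketch of the P/Invoke, not our actual code:

    using System;
    using System.Runtime.InteropServices;
    using System.Threading;
    using Microsoft.Win32.SafeHandles;

    // Sketch of a P/Invoke declaration for WriteFileGather; not Voron's code.
    internal static unsafe class NativeFileMethods
    {
        // Each element points to one page-sized, page-aligned buffer, and the
        // array must be terminated by a zeroed element. The file handle has to
        // be opened with FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED.
        [StructLayout(LayoutKind.Explicit, Size = 8)]
        public struct FileSegmentElement
        {
            [FieldOffset(0)] public IntPtr Buffer;
            [FieldOffset(0)] public ulong Alignment;
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool WriteFileGather(
            SafeFileHandle hFile,
            FileSegmentElement* aSegmentArray,
            uint nNumberOfBytesToWrite,   // must be a multiple of the sector size
            IntPtr lpReserved,
            NativeOverlapped* lpOverlapped);
    }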

But the really interesting bit? Look at what was by far the most expensive operation there: the calls to ImmutableDictionary. In fact, let us look at the profiler results for the immutable collections:

[Images: profiler results for the immutable collections]

So now we have two new things to investigate. One is reducing contention, and the second is seeing how we can optimize our usage of the immutable collections.
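On the second point, the likely direction is to batch our updates instead of paying for a brand new immutable dictionary on every single change. As a rough illustration (the page table here is hypothetical, not the actual Voron data structure), compare updating an ImmutableDictionary one item at a time with funneling the same updates through its Builder:

    using System.Collections.Generic;
    using System.Collections.Immutable;

    // Rough illustration; the "page table" here is hypothetical, not Voron's.
    public static class PageTableSketch
    {
        // Per-item updates: every SetItem call allocates a new dictionary.
        public static ImmutableDictionary<long, long> UpdateOneByOne(
            ImmutableDictionary<long, long> pages,
            IEnumerable<KeyValuePair<long, long>> modified)
        {
            foreach (var kvp in modified)
                pages = pages.SetItem(kvp.Key, kvp.Value);
            return pages;
        }

        // Batched updates: mutate a builder, then freeze it once at the end.
        public static ImmutableDictionary<long, long> UpdateBatched(
            ImmutableDictionary<long, long> pages,
            IEnumerable<KeyValuePair<long, long>> modified)
        {
            var builder = pages.ToBuilder();
            foreach (var kvp in modified)
                builder[kvp.Key] = kvp.Value;
            return builder.ToImmutable();
        }
    }

Both produce the same result, but the batched version avoids allocating a fresh dictionary instance for every individual update, which is typically where the cost shows up.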

I’ll report on our findings…