How do they DO this?


As mentioned, we are doing some more performance work in Voron, and we got some really surprising results there. Voron is writing at a really good rate (better than anything else we tested against), just not a good enough rate.

To be fair, if we hadn’t seen the Esent benchmark with close to 750k writes / second, we might have been happy, but obviously it is possible to be much faster than we are now. So I decided to figure it out.

To start with, I ran Voron through a profiler, and verified that the actual cost there was purely in calling FlushFileBuffers (the Windows equivalent of fsync). In fact, in our tests, about 75% of the time was spent just calling this function. The test in question does 1 million inserts, using 10,000 transactions of 100 items each. But Esent can basically do so many that it doesn’t even count. So how do they do that?
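To make the shape of that test concrete, here is a minimal sketch of a "batch of items, then flush" loop that measures how much of the wall-clock time goes to the flush call. It is not Voron's code: the function name `run_benchmark`, the flat append-only file, and the 128-byte item size are all assumptions made for illustration. In CPython on Windows, `os.fsync` goes through the C runtime's `_commit`, which as far as I know ends up in FlushFileBuffers.

```python
import os
import tempfile
import time


def run_benchmark(path, transactions, items_per_tx, item_size=128):
    """Append items in batches and force every batch to disk, like a commit.

    Returns (total_seconds, seconds_spent_in_fsync).
    Hypothetical stand-in for a storage engine; it only appends to a file.
    """
    payload = b"x" * item_size
    flush_seconds = 0.0
    started = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(transactions):
            for _ in range(items_per_tx):
                f.write(payload)
            f.flush()  # drain Python's user-space buffer into the OS cache
            t0 = time.perf_counter()
            os.fsync(f.fileno())  # the durable commit point
            flush_seconds += time.perf_counter() - t0
    total_seconds = time.perf_counter() - started
    return total_seconds, flush_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        journal = os.path.join(tmp, "journal.bin")
        # Scaled down from the 10,000 x 100 run described above.
        total, flushing = run_benchmark(journal, 100, 100)
        print(f"total {total:.3f}s, in fsync {flushing:.3f}s")
```

The absolute numbers depend entirely on the disk and the OS, so treat this as a way to see where the time goes rather than as a reproduction of the results in this post.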

I’m going to dedicate this post to discussing the process for finding it out, then spend the next one or two discussing the implications. At this level, we can’t really use something like a profiler to figure out what is wrong; we need a more dedicated tool. And in this case, we are talking about Process Monitor. It gives you the ability to see what system calls are being made on your system.

Here is what it looks like when we are committing a transaction with Voron:

image

And here is what it looks like when we are committing a transaction with Esent:

image

I was curious to test SQL Server too, and here is what it looks like when SQL Server is committing a transaction:

image

And since I’m already doing this, here is a SQL CE transaction commit:

image

No, this isn’t a mistake. It didn’t do anything. By default, SQL CE only flushes to memory. You have to force it to flush to disk by using tx.Commit(CommitMode.Immediate). If you do that, the transaction commit looks like this:

image

Not a mistake, you still get nothing. It appears that even with Immediate, it is only writing to disk when it feels like it. At a guess, it is using memory mapped files and doing FlushViewOfFile, instead of calling FlushFileBuffers, but I am not really sure.
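To illustrate the difference that guess is about, here is a small sketch of writing through a memory-mapped view and flushing the view, with an optional file-level flush on top. It says nothing about what SQL CE actually does internally; `make_file` and `commit_via_mapping` are hypothetical helpers. In CPython on Windows, `mmap.flush()` is implemented with FlushViewOfFile, and the Win32 documentation notes that FlushViewOfFile does not wait for the data to be physically written, which is why it has to be followed by FlushFileBuffers for a durable commit.

```python
import mmap
import os
import tempfile


def make_file(path, size):
    """Create a zero-filled file; a mapping needs a non-empty file."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)


def commit_via_mapping(path, data, durable):
    """Write data at offset 0 through a memory map.

    With durable=False only the view is flushed (FlushViewOfFile on
    Windows, msync on POSIX). With durable=True the file handle is also
    flushed (FlushFileBuffers on Windows, fsync on POSIX).
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:
            view[: len(data)] = data
            view.flush()
        if durable:
            os.fsync(f.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.bin")
        make_file(target, 4096)
        commit_via_mapping(target, b"committed", durable=True)
        with open(target, "rb") as f:
            print(f.read(9))
```

Both variants leave the same bytes in the file; the only difference is whether the process waits for the disk before it considers the commit done.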

Since I ran the benchmarks without Immediate, I decided to try running the SQL CE stuff there again, this time with it. Here are the numbers:

image

This brings to mind an interesting question: what the hell is it doing that takes so long if it doesn’t even flush to disk?

Anyway, let us look at the SQLite version:

image

And… I don’t really know how to comment on that, to tell you the truth. I can’t figure out what it is doing, and I probably don’t really want to.

Now, let us look at LMDB:

image

I am not really sure how to explain the amount of work done here. I think that is because it uses manual file I/O. When I use the WriteMap option, I get:

image

Which is more reasonable and expected.

I would have shown leveldb as well, but I can’t run it on Windows.

I think that this is enough for now. I’ll discuss the implications of the difference in behavior in my next post. In the meantime, I would love to know what you think about this.