With malice aforethought, we can try even better
Continuing on with the same theme from our last post. How can we improve the speed in which we write to disk. In particular, I am currently focused on my worst case scenario:
fill rnd buff 10,000 tx : 161,812 ms 6,180 ops / sec
This is 10,000 transactions all running one after another, and taking really way too long to go about doing their thing. Now, we did some improvements and we got it all the way to 6,340 ops / sec, but I think you’ll agree that even this optimization is probably still bad. We spent more time there, trying to figure out exactly how we can do micro optimizations, and we got all the way up to 8,078 ops /sec.
That is the point where I decided that I would really like to look at the raw numbers that I can get from this system. So I wrote the following code:
This code mimic the absolute best scenario we could hope for. Zero cost for managing the data, pure sequential writes. Note that we call to Flush(true) to simulate 10,000 transactions. This code gives me: 147,201 ops / sec.
This is interesting, mostly because I thought the reason our random writes with 10K transactions are bad was the calls to Flush(), but it appears that this is actually working very well. I then tested this with some random writes, by adding the following lines before line 13:
I then decided to try it with memory mapped files, and I wrote:
You’ll note that I am not doing any flushing here. That is intention for now, using this, I am getting 5+ million ops per second. But since I am not doing flushing, this is pretty much me testing how fast I can write to memory.
Adding a single flush cost us 1.8 seconds for a 768MB file. And what about doing the right thing? Adding the following in line 26 means that we are actually flushing the buffers.
Note that we are not flushing to disk, we still need to do that. But for now, let us try doing it. This single line changed the code from 5+ million ops to doing 170,988 ops / sec. And that does NOT include actual flushing to disk. When we do that, too, we get a truly ridiculous number: 20,547 ops / sec. And that explains quite a lot, I think.
For reference, here is the full code:
This is about as efficient a way you can get for writing to the disk using memory mapped files if you need to do that using memory mapped files in a transactional manner. And that is the absolute best case scenario, pretty much. Where we know exactly what we wrote and were we wrote it. And we always write a single entry, of a fixed size, etc. In Voron’s case, we might write to multiple pages at the same transaction (in fact, we are pretty much guaranteed to do just that).
This means that I need to think about other ways of doing that.