Voron InternalsThe diff is the way
I talked about the goals of using diffs for the journals in a previous post, but in this one, I want to talk about what it actually did. To start with, it turn out that using diffs in the journal invalidates quite a few optimizations that we had. We had a whole bunch of stuff that we could do to avoid writing data to the data file if there are additional modifications to it. When using diffs, we can’t do that, because the next diff is building on the previous version. That actually ended up improving the code quality, because a bunch of pretty tricky code had to go away.
It was tricky code because we tried to reduce the data file I/O, but only in the cases that it was allowed, which wasn’t always. Now we always write the latest version of all the modified pages, which might add some additional I/O in some cases, but in practice, this didn’t show up in our benchmarks. And the reduction of complexity might have been worth it even if it was.
We actually have two diff implementation, one that tests two different versions of a page and find the differences, and one that test a page against zeros. That, in addition to collapsing runs of zeros, is the entire implementation. We are actually not as small as we could get, because we insert some metadata into the diff stream to allow easier debugging and error detection. But the nice thing about the diff output is that we are still compressing it, so working extra hard to reduce the uncompressed data isn’t that important if compression will cover it anyway.
We tested before & after of using diffs for journal writes, and we found the following:
|Journal size (inserts)||37% reduction in size|
|Journal size (updates)||80% reduction in size|
|Bulk insert speed||5% – 10% speed improvement *|
|Writes / sec (inserts)||12% improvement|
|Writes / sec (updates)||18% improvement|
* If the sizes went down so significantly, why haven’t the insert & update speed improve by a comparable amount?
The answer to that is that for the most part, reducing the amount of data we are writing is awesome, a major limiting factor is the number of times we write to the journal, rather than the number of bytes. And we have to write once per transaction, so that is a limiting factor.
However, the current numbers we are looking at is a benchmark showing roughly 30,000 write requests / second and are unable to get anything higher, because we have saturated the local network. We are currently setting up a dedicated testing environment to see how far we can push this.