When perf optimization breaks tests in a GOOD way
You might have noticed a recurring theme in RavenDB: we care a lot about performance. The problem with optimizing performance is that sometimes you have a great idea, you implement it, the performance gains are there to be had - and then a test fails, and you realize that your great idea now needs to be ten times more complex to handle a niche edge case.
We did a lot of work around optimizing the performance of RavenDB at the lowest levels for the next major release (8.0), and we got a persistently failing test that we started to look at.
Here is the failure message:
Restore with MaxReadOpsPerSecond = 1 should take more than '11' seconds, but it took '00:00:09.9628728'
The test in question is ShouldRespect_Option_MaxReadOpsPerSec_OnRestore, part of the MaxReadOpsPerSecOptionTests suite. What it tests is that we can limit how fast RavenDB can restore a database.
The reason you want to do that is to avoid consuming too many system resources when performing a big operation. For example, I may want to restore a big database, but I don’t want to consume all the IOPS on the server, because there are additional databases running on it.
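The post doesn't show the throttling code itself, but conceptually the option boils down to pacing read operations so that no more than the configured number happen per second. Here is a minimal sketch of that idea in Python; the RateLimiter class and restore_database function are illustrative names for the concept, not RavenDB's actual implementation:

```python
import time


class RateLimiter:
    """Allow at most max_ops_per_sec operations per second by sleeping
    whenever the caller gets ahead of the allowed pace."""

    def __init__(self, max_ops_per_sec: int) -> None:
        self.min_interval = 1.0 / max_ops_per_sec
        self.next_allowed = time.monotonic()

    def acquire(self) -> None:
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.min_interval


def restore_database(read_ops, max_read_ops_per_sec: int) -> None:
    # Pace every read so the restore never exceeds the configured rate
    # and doesn't starve the other databases on the server of IOPS.
    limiter = RateLimiter(max_read_ops_per_sec)
    for op in read_ops:
        limiter.acquire()
        op()  # placeholder for the actual read operation
```

With MaxReadOpsPerSecond set to 1, that pacing alone puts a hard floor on how long a restore that issues a given number of read operations can take, and that floor is what the test asserts on.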
At any rate, this test started failing, and a deeper investigation revealed something quite amusing. We made the entire system more efficient. In particular, we managed to significantly reduce the size of the buffers we use, so we can push more data faster. It turns out that this was enough to break the test.
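The failing assertion is a lower bound on elapsed time. A rough, hypothetical reconstruction of its shape (the operation count, helper names, and numbers are illustrative, not the real test code):

```python
import time

MAX_READ_OPS_PER_SEC = 1
READ_OPS_IN_RESTORE = 12     # illustrative: depends on how the data is buffered
MIN_EXPECTED_SECONDS = 11    # the budgeted floor from the failure message


def throttled_restore(num_read_ops: int, max_read_ops_per_sec: int) -> None:
    # Stand-in for the real restore: every read operation is paced by the limit.
    for _ in range(num_read_ops):
        time.sleep(1.0 / max_read_ops_per_sec)


def test_should_respect_max_read_ops_per_sec_on_restore() -> None:
    start = time.monotonic()
    throttled_restore(READ_OPS_IN_RESTORE, MAX_READ_OPS_PER_SEC)
    elapsed = time.monotonic() - start

    # The assertion only holds if the restore stays slow enough. Make the
    # code efficient enough to get through the work with less throttled
    # effort and the elapsed time can dip below the floor, failing the test.
    assert elapsed > MIN_EXPECTED_SECONDS
```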
The fix was simply to reduce the time that the test budgets as the minimum viable time. And I have to say that this is one of those pull requests that lights a warm fire in my heart.