Even though RavenDB 7.0 isn’t quite out of the gate yet (expect the release very soon), I want to start talking about the RavenDB 7.1 release. This release is going to represent the biggest change in RavenDB in about 7 years, and at the same time, it is expected to be a complete non-event for most people.
We care enough about performance that we have both a dedicated team to verify it and really talented people whose sole job is to run benchmarks and optimize RavenDB. This year, the goal was to test RavenDB’s performance when running on large machines (many hundreds of GBs of RAM, dozens of cores, etc.). The idea was to find bottlenecks and remove them, and we quickly found a few.
Performance optimization is often a very repetitive process: find a problem, figure out the root cause, fix it, and move on. It is a process full of small changes and 0.1% improvements. The key is that if you practice this routinely, you end up with real, measurable results over time, since your changes compound.
Most of the time, these are the changes you celebrate:
Here you can see a 6.6% increase in the number of operations per millisecond. That is a pretty big change, especially since it affects reading documents from storage.
We were roughly two months into this process when we ran into the end of the line. The scenario in question was 95% reads / 5% writes, with the idea of seeing how quickly the system responds under load, but without stressing it to the limits. Take a look at our findings:
What you can see here is that creating a transaction is costly, and over 50% of that cost is due to contention over a lock. We didn’t notice that until we had a lot of cores to go around, since there weren’t enough threads actively contending for this lock. Once we started looking for it, the problem was clearly visible on smaller machines as well.
RavenDB creates a single transaction per request, so that means we are spending a lot of time doing essentially nothing useful. What is worse, this is actively spinning. So the more cores we have, the more CPU we’ll use and the less useful work we’ll do. Yuck!
When we looked at the root cause, the problem became pretty apparent. Take a look at the following snippet:
_txCreation.EnterReadLock(); // creating read transaction
try
{
    _cancellationTokenSource.Token.ThrowIfCancellationRequested();

    tx = new LowLevelTransaction(previous.LowLevelTransaction, transactionPersistentContext, context);

    ActiveTransactions.Add(tx);
}
finally
{
    _txCreation.ExitReadLock();
}
using (PreventNewTransactions()) // during commit of write transaction
{
    if (tx.Committed && tx.FlushedToJournal)
        Interlocked.Exchange(ref _transactionsCounter, tx.Id);

    State = tx.State;

    Journal.Applicator.OnTransactionCommitted(tx);
    ScratchBufferPool.UpdateCacheForPagerStatesOfAllScratches();
    Journal.UpdateCacheForJournalSnapshots();

    tx.OnAfterCommitWhenNewTransactionsPrevented();
}
What happens is that during the commit of a write transaction, we need to make updates in a bunch of places, and we need all new read transactions to see all of our changes at once. So during commit, we take a short lock to update those locations, and when we create a read transaction, we hold a read lock so we can read all of those values safely.
This particular code (and the approach in general) dates back to the very early days of Voron:
On one hand, I look at this code and cringe internally. On the other hand, it has served us really well for a long time. Of course, on the gripping hand, that code has been around for long enough to have grown some barnacles.
Notice the OnAfterCommitWhenNewTransactionsPrevented() call we have there? It is meant to allow other operations that need to run while no read transactions are allowed, in order to maintain in-memory consistency.
Fixing this issue was both straightforward and quite complex at the same time. Instead of storing all the data in various places, I introduced the following record:
record EnvironmentStateRecord(
    Pager.State DataPagerState,
    long TransactionId,
    ImmutableDictionary<long, PageFromScratchBuffer> ScratchPagesTable,
    TreeRootHeader Root,
    long NextPageNumber,
    (long Number, long Last4KWritePosition) Journal,
    object ClientState);
The actual details of what it does aren’t really that interesting. What is important is that we have an immutable record that is being generated by the write transaction. That record contains all the database state that we carry over between transactions. When we want to publish this state, we can just make a single atomic write to a pointer, no need for a lock.
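To make that concrete, here is a minimal sketch of the publish/read pattern, assuming illustrative names (_currentState, PublishState, ReadCurrentState) rather than the actual Voron code:

using System.Threading;

// A minimal sketch of lock-free state publication - the class and member
// names here are illustrative assumptions, not the actual Voron code.
public class EnvironmentStatePublisher
{
    private EnvironmentStateRecord _currentState;

    // Called at the end of a write transaction commit, after the next
    // immutable record has been built. A single atomic reference write
    // makes the entire new state visible to readers at once.
    public void PublishState(EnvironmentStateRecord next)
    {
        Volatile.Write(ref _currentState, next);
    }

    // Called when creating a read transaction. A single volatile read gives
    // the transaction a consistent, immutable snapshot to use for its whole
    // lifetime - no lock required on either side.
    public EnvironmentStateRecord ReadCurrentState()
    {
        return Volatile.Read(ref _currentState);
    }
}

Because the record is immutable, a reader that grabbed the previous instance keeps seeing a consistent view, even while a newer one is being published.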
Of course, it can’t be that simple. We needed to essentially convert large portions of the code to take their state from the environment record and provide a way for higher-level code to add their own state to the record on write.
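As a rough illustration of what that conversion looks like (the method and parameter names here are my own, not the actual code), a write transaction can derive the next record from the previous one with a with expression, and higher-level code can attach its own immutable state through the ClientState slot:

// Sketch only - deriving the next state record from the previous one.
// The method and parameter names are illustrative assumptions.
EnvironmentStateRecord BuildNextState(
    EnvironmentStateRecord previous,
    long committedTransactionId,
    ImmutableDictionary<long, PageFromScratchBuffer> scratchPages,
    object higherLevelState)
{
    // Records are immutable, so 'with' produces a brand-new instance;
    // readers still holding the previous record are completely unaffected.
    return previous with
    {
        TransactionId = committedTransactionId,
        ScratchPagesTable = scratchPages,
        ClientState = higherLevelState // whatever the upper layers need carried across transactions
    };
}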
That took about two months of hard work, I think. Not complicated work, just grinding through all of those places, figuring out the best way to represent that, and making all the necessary changes.
Unfortunately, that wasn’t the only lock convoy we found. Files inside RavenDB are held by a class called Pager (since it serves pages of data). A Pager may hold multiple States, which is a fancy way of referring to the memory-mapped ranges we use, and it may hold multiple such mappings concurrently. For example, if we increase the file size, we need to map the memory again - so for a certain period of time, we have two such mappings.
A transaction may create a reference to such a state (and thus may access the mapping) for its lifetime. At the end of the transaction, we can release the state, and if it is the last reference, we’ll close the mapping.
This is implemented using the following code:
This runs for multiple Pagers for each transaction, mind you. The actual AddRef() call is very cheap, but the lock…
public void AddRef() => Interlocked.Increment(ref _refs);
As you can see from the image, this is also pretty old code. The question was how to deal with the issue: I have to be able to keep track of the mapped memory while a transaction is using it, but at the same time, I want to dispose of it quickly once it isn’t being used.
The answer to this question was to lean on the GC. Instead of implementing AddRef() / Release(), I hold the Pager.State object and handle disposal in the finalizer. If there is a reference to it, the finalizer isn’t called, so this gives me the exact behavior I want, for free.
However… I also need to be able to dispose of this explicitly when closing the pager. A common scenario is that I’m closing the Pager and deleting the directory. If I have an outstanding reference to the file, that will fail.
Here is how I’m dealing with the issue. The Pager holds a WeakReference to the State and may explicitly dispose of it:
public void Dispose()
{
    foreach (WeakReference<State> state in _states)
    {
        if (state.TryGetTarget(out var v))
        {
            v.Dispose();
        }
    }
}
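For completeness, here is a minimal sketch of what the State side of such a finalizer-based pattern could look like - the member names and the cleanup details are assumptions for illustration, not the actual Voron implementation:

// Sketch of a finalizer-backed mapping state, using assumed member names.
public sealed class State : IDisposable
{
    private IntPtr _mappedAddress;

    // Explicit disposal - used by the Pager when it is being closed and the
    // mapping must go away immediately (e.g. before deleting the directory).
    public void Dispose()
    {
        ReleaseMapping();
        GC.SuppressFinalize(this);
    }

    // If no transaction references this state anymore and nobody disposed it
    // explicitly, the GC will eventually run the finalizer and release the
    // mapping for us - which is exactly the behavior described above.
    ~State()
    {
        ReleaseMapping();
    }

    private void ReleaseMapping()
    {
        if (_mappedAddress == IntPtr.Zero)
            return;

        // The actual unmapping call (through the platform abstraction layer)
        // is elided here; it is not part of this sketch.
        _mappedAddress = IntPtr.Zero;
    }
}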
That also allowed me to refactor the way we deal with files in general inside Voron. The first version of Voron did everything in managed code (including significant P/Invoke and cross-platform work). At some point, we introduced a native DLL (PAL - Platform Abstraction Layer - written in C) that allowed us to have a more consistent API.
The issue was that there was no clean division of labor between the native C code and the managed code; some responsibilities were split between the two in ways that were hard to follow.
RavenDB is using Zig
To be rather more exact, I’m using Zig as a build system to make cross-compilation a lot easier. I initially moved everything to use Zig, but it turns out that Zig isn’t able to produce DLLs that would run properly on older versions of Windows.
At least, nothing that I did convinced Zig to give me something that wouldn’t fail to load. So we are still using MSVC to compile for Windows, and Zig for both Linux & Mac (x64 & ARM). If you have good ideas on how to solve that, I would really appreciate it.
As a final thought, this work took several months, and at the end of it we had a pull request with some interesting numbers (+11.3K LOC, -11.2K LOC).
I have to admit that I feel a bit bad for the reviewers on that one 🙂. I was really hoping that I would end up with a net negative number of lines changed, but that couldn’t be helped. The end result is that all of our state is now far simpler, yet we didn’t actually change anything in the way we behave. We removed two frequently hit lock convoys and allowed RavenDB to be far more efficient.
How efficient? Well… that is a bit complex to answer, because we didn’t stop here. There are more things that happened in RavenDB 7.1 that I need to talk to you about. We’ll cover them in my next post.