When we started to build Voron, we based some of its behavior around how LevelDB works. While the storage details are heavily influenced by LMDB, we did take a few things from LevelDB. In particular, the idea of transaction merging.
You can read about our implementation in our design notes for Voron from 2013. The idea is that even though you have a single writer, you can prepare the transaction separately, then submit the transaction to be processed by a dedicated thread. This thread will merge all pending transaction requests into a single physical transaction and allow us to parallelize some of the work, amortizing the cost of going to disk across multiple concurrent transactions.
This is how Voron is running now, and it was a feature that I, personally, was very excited about. And in RavenDB 4.0, we killed this feature.
If this is such a great and exciting feature, why kill it and go to a single writer only mode?
There are actually several distinct reasons, each of them serving as a big black mark against this feature.
The first strike against this feature is that is result in much higher memory usage and copying of data. Whenever we need to create a transaction, we have to write all the data into a temporary buffer, which is then sent to the transaction merger. This result in memory hanging around longer, higher allocations, and double copying of the data.
The second strike against this feature is that it result in unpredictable behavior. Because transactions are merged on a first come/first served basis, small differences in the execution of transactions can dramatically change the order of operations that is actually committed. Usually it doesn’t matter, but if we need to track down on a particular issue, that is a really important. Having a single writer means that we have very predictable behavior.
The third strike against this feature is that it leads to concurrency aware code. Because you are going to submit a transaction to be processed, there is potentially other transactions that can change the data that you rely on. We have ways to handle that, but requesting optimistic concurrency checks to be done, but this end up being quite complex to manage properly.
The forth strike against this feature is that the major reason it was needed was that we wanted to be able to parallelize the work of indexing and documents, and that was meant to handle just that. But the re-shaping of the indexes storage and documents storage means that we have separate Voron storages for the documents and for each index, so we still have this ability, but were able to remove this code and reduce our complexity significantly.
More posts in "The design of RavenDB 4.0" series:
- (26 May 2016) The client side
- (24 May 2016) Replication from server side
- (20 May 2016) Getting RavenDB running on Linux
- (18 May 2016) The cost of Load Document in indexing
- (16 May 2016) You can’t see the map/reduce from all the trees
- (12 May 2016) Separation of indexes and documents
- (10 May 2016) Voron has a one track mind
- (05 May 2016) Physically segregating collections
- (03 May 2016) Making Lucene reliable
- (28 Apr 2016) The implications of the blittable format
- (26 Apr 2016) Voron takes flight
- (22 Apr 2016) Over the wire protocol
- (20 Apr 2016) We already got stuff out there
- (18 Apr 2016) The general idea