Note: I started writing a few minutes after starting the performance test, it took that long to run.
As you are probably already aware, we used the Immutable Collections package in Voron. That turned out to be a cause for pretty major performance issues, so I started investigating other approaches for dealing with the problem of both immutability and high performing code.
Before I go on, I would like to explain what we are using the immutable collections for. Basically, we have a page translation table, which tells us which pages belong where. This table changes on every write transaction.
However, locking isn’t a good option for us, because once started, the transaction view of the world can never be changed. That means that while we are modifying it in a write transaction, the read transaction should not ever see the modifications. This is exactly the case where immutable collections make sense. Read only views aren’t going to work, since once the write tx modify the page table, that change is going to be visible to read transactions, which isn’t what we want.
Our test scenario is that after each write tx commit, we will update the page table with all the new pages locations. That means that we wrote the following test case:
Every iteration is a “transaction”, and the work inside the for loop simulate us updating the page locations. Note, however, that the code above uses a mutable dictionary, and the changes are neither thread safe, nor are we isolating other readers from seeing our changes.
We can do this using immutable dictionary using the following code:
Note that we are using the optimal “SetItems” method, so we only modify the immutable dictionary once per transaction. That means that we create roughly 1 instance per transaction.
I then decided to see what would be the cost of running this with 50,000 transactions. Note that this means that in the end we have 149,697 items in the dictionary.
- Cost for running this in mutable dictionary: 1 minute and 12 seconds.
- Cost for running this in immutable dictionary: About 30 minutes.
Cost for reading 10,000 items from the dictionary:
- 0.0002 seconds for mutable dictionary
- 0.0061 seconds for immutable dictionary
Remember, the reason we go to this place is because we actually got performance issues specifically because of the SetItems calls. But note that we also have a performance problem for reads as well.
Initially, I tried to do the simple thing of just creating a new instance for every transaction. That worked quite nicely to remove the read costs, but it had a non trivial cost for adds, quite aside from the GC cost. The test is:
Cost for adding using new instance every time:
- 7 minutes and 37 seconds.
So right off the bat, we are over 4 times faster than immutable dictionary. But this is still expensive. But at least is has the same read speed as a standard dictionary, because it is.
In order to deal with that, I created something that I call LinkedDictionary. A linked dictionary is based on the following idea: Since dictionaries can only contain a single value per item, it is perfectly safe to layer them on one another. This means that we can just have a linked list of dictionaries, and when we want to read from it, we just need to walk through the list in reverse addition order. That means that it is pretty much perfect for append only linked list model.
The downside of that is that we now have to traverse the linked list in order to find our value. The good thing about this is that we are able to limit that, by introducing a merge step along the way. The merge step would merge all the pending parts into a single dictionary. Currently we are doing a merge every 32 writes or so. This strikes a nice balance between read and write speed.
The end result is that for 50,000 iterations, we have the following values:
|Mutable Dictionary||Immutable Dictionary||Cloned Dictionary||Linked Dictionary|
1 minutes and 12 seconds
29 minutes and 28 seconds
7 minutes and 37 seconds
4 minutes and 37 seconds
In other words, we are able to get write speeds that are just about 4 times the standard mutable behavior, and we are able to get read speeds that are half the immutable dictionary model.
For reference, the entire code is in the following gist.