Excerpts from the RavenDB Performance team reportJSON & Structs in Voron

time to read 3 min | 570 words

What seems to be a pretty set in stone rule of thumb is that when building a fool proof system, you will also find a better fool. In this case, the fool was us.

In particular, during profiling of our systems, we realized that we did something heinous, something so bad that it deserve a true face palm:

In essence, we spent all this time to create a really fast key/value store with Voron. We spent man years like crazy to get everything working just so… and then came the time to actually move RavenDB to use Voron. That was… a big task, easily as big as writing Voron in the first place. The requirements RavenDB makes of its storage engine are anything but trivial. But we got it working. And it worked good. Initial performance testing showed that we were  quite close to the performance of Esent, within 10% – 15%, which was good enough for us to go ahead with.

Get things working, get them working right and only then get them working fast.

So we spent a lot of time making sure that things worked, and worked right.

And then the time came to figure out if we can make things faster. Remember, Voron is heavily based around the idea of memory mapping and directly accessing values at very little cost. This is important, because we want to use Voron for a whole host of stuff. That said, in RavenDB we use it to store JSON documents, so we obviously store the JSON in it. There are quite a lot of book keeping information that we keep track of.

In particular, let us focus on the indexing stats. We need to keep track of the last indexed etag per index, when it was changed, number of successful indexing attempts, number of failed indexing attempts, etc.

Here is how we updated those stats:

image

As you can imagine, a lot of the time was actually spent just deserializing and deserializing the data back & forth from JSON. In fact, a lot of time was spend doing just that.

Now, indexing status needs to be queried on every single query, so that was a high priority for us to fix. We did this by not storing the data as JSON, but storing the data in its raw form directly into Voron.

Voron gives us a memory of size X, and we put all the relevant data into that memory, and pull the data directly from the raw memory. No need for serialization/deserialization step.

There is a minor issue about the use of strings .While using fixed size structure (like the indexing stats) is really easy, we just read/write to that memory. When dealing with variable length data (such as strings), you have harder time at it, but we dealt with that as well by defining a proper way to encode string lengths and read them back.

The results, by the way:

image

That means that we got close to 4 times faster! That is certainly going to help us down the road.

More posts in "Excerpts from the RavenDB Performance team report" series:

  1. (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
  2. (18 Feb 2015) JSON & Structs in Voron
  3. (13 Feb 2015) Facets of information, Part II
  4. (12 Feb 2015) Facets of information, Part I
  5. (06 Feb 2015) Do you copy that?
  6. (05 Feb 2015) Optimizing Compare – Conclusions
  7. (04 Feb 2015) Comparing Branch Tables
  8. (03 Feb 2015) Optimizers, Assemble!
  9. (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
  10. (29 Jan 2015) Optimizing Memory Comparisons, size does matter
  11. (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
  12. (27 Jan 2015) Optimizing Memory Comparisons
  13. (26 Jan 2015) Optimizing Memory Compare/Copy Costs
  14. (23 Jan 2015) Expensive headers, and cache effects
  15. (22 Jan 2015) The long tale of a lambda
  16. (21 Jan 2015) Dates take a lot of time
  17. (20 Jan 2015) Etags and evil code, part II
  18. (19 Jan 2015) Etags and evil code, Part I
  19. (16 Jan 2015) Voron vs. Esent
  20. (15 Jan 2015) Routing