Excerpts from the RavenDB Performance team reportRouting

time to read 4 min | 759 words

We have been quite busy lately doing all sort of chores. The kind of small features that you don’t really notice, but make a lot of difference over time. One of the ways that we did that was to assign a team for the sole purpose of improving the performance of RavenDB. This is done by writing a lot of stress tests, running under profilers and in general doing very evil things to the code to get it to work much faster. The following is part of the results that we have found so far, written mostly by Tal.

We decided to use Yahoo! Cloud Service Benchmark (AKA YCSB) for that purpose, while YCSB is great in creating stress on the target database servers it does not offer complex interface to stress complex scenario, and seems to be targeting key value solutions. YCSB allows you to basically control some parameters to the workload engine and implementation of five basic operation: Insert, Update, Read, Scan and Delete of a single record, in a single table/collection.

Note: I think that this is because it focused (at least initially) primarily on the Big Table type of NoSQL solutions.

This very thin interface allows us to mainly stress the PUT/GET document. The first thing we found is that running with indexes actually slowed us down since we are “wasting” precious CPU time on indexing and not doing any querying, that is enough to show you the true face of benchmark miracle, since where have you ever seen a database that you didn’t wish to run a query upon?  But this is a topic for another post… lets go back to performance.

The second thing we found was a hotspot in the route data retrieval  mechanism of the Web API/ While using Web API routing seems to work like magic, when you get to have approximately four hundred endpoints you will start suffering from performance issues retrieving route data. In our case we were spending about 1ms per http request just to retrieve the route data for the request, that may not seems like much but for 1,500 request that took 5 seconds to execute this took 1.5 seconds just for route data retrieval. Please, do not try to extrapolate the throughput of RavenDB from those numbers since we ran this test on a weak machine also under a profiling tool that slow it down significantly.

Note: Effectively, for our use case, we were doing an O(400) operation on every request, just to find what code we needed to run to handle this request.

We decided to cache the routes for each request but just straight forward caching wouldn’t really help in our case. The problem is that we have many unique requests that would fill the cache and then wouldn’t actually be of use.

Our major issue was requests that had a route that looked like so databases/{databaseName}/docs/{*docId}. While we we do want to cache routes such as: databases/{databaseName}/stats. The problem is that we have both types of routes and they are used quite often so we ended up with a greatest common route cache system. That means that /databases/Northwind/docs/Orders/order1 and /databases/Northwind/docs/Orders/order2 will go into the same bucket that is /databases/Northwind/docs.

In that way, we would still have to do some work per request, but now we don’t have to go through all four hundred routes we have to find the relevant routes.

What was the performance impact you wonder? We took off 90% of get route data time that is spending 0.15 seconds instead of 1.5 seconds for those 1,500 requests, reducing the overall runtime to 3.65 seconds, that is 27% less run time.

Note: Again, don’t pay attention to the actual times, it is the rate of improvement that we are looking at.

As an aside, another performance improvement that we made with regards to routing was routing discovery. During startup, we scan the RavenDB assembly for controllers, and register the routes. That is all great, as long as you are doing that once. And indeed, in most production scenarios, you’ll only do that once per server start. That means that the few dozen milliseconds that you spend on this aren’t really worth thinking about. But as it turned out, there was one key scenario that was impacted by this. Tests! In tests, you start and stop RavenDB per test, and reducing this cost is very important to ensuring a pleasant experience. We were able to also optimize the route discovery process, and we were actually able to reduce our startup time from cold boot to below what we had in 2.5.

More posts in "Excerpts from the RavenDB Performance team report" series:

  1. (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
  2. (18 Feb 2015) JSON & Structs in Voron
  3. (13 Feb 2015) Facets of information, Part II
  4. (12 Feb 2015) Facets of information, Part I
  5. (06 Feb 2015) Do you copy that?
  6. (05 Feb 2015) Optimizing Compare – Conclusions
  7. (04 Feb 2015) Comparing Branch Tables
  8. (03 Feb 2015) Optimizers, Assemble!
  9. (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
  10. (29 Jan 2015) Optimizing Memory Comparisons, size does matter
  11. (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
  12. (27 Jan 2015) Optimizing Memory Comparisons
  13. (26 Jan 2015) Optimizing Memory Compare/Copy Costs
  14. (23 Jan 2015) Expensive headers, and cache effects
  15. (22 Jan 2015) The long tale of a lambda
  16. (21 Jan 2015) Dates take a lot of time
  17. (20 Jan 2015) Etags and evil code, part II
  18. (19 Jan 2015) Etags and evil code, Part I
  19. (16 Jan 2015) Voron vs. Esent
  20. (15 Jan 2015) Routing