Trie based routing
In my previous post, I discussed the cost of routing in MVC vs. our own implementation. The MVC routing / infrastructure consumed about 90% of the time, while the code that actually does stuff is left with less than 10%. In our implementation, the actual request handling code gets about 85% of the time, while all the infrastructure is handled in the other 15%.
Of that , a large portion of that time is actually spent on routing. In our tests, we saw something like 40% of the time spent in selecting the right controller action to execute, this is when there is just one such action that can be run. Now, to be fair, this is a scenario where we don’t actually have any meaningful logic. We are just saying “hello world” over and over again, so the infrastructure costs are really noticeable in this benchmark. But that is the point, we want to see what are the fixed costs of the infrastructure we use.
Now, in RavenDB we have several hundreds routes (last count was something upward of 400), and we noticed that routing take a lot of time to select the right action to run. We optimized the hell out of that to the point by specializing that to our own situation and based on our own knowledge. But when we started to look at moving our code to Kestrel, we took the time to move to a system that is a better fit for us.
The major reason routing is such an expensive issue in MVC is that it does quite a lot. It needs to handle route selection, capturing values, coercing data to the right types and a lot more. All of that takes time, memory and cpu cycles. But as it turn out, we don’t really need all of that. We have relatively simple routing needs.
Here are a few example of RavenDB routes:
So we have the fixed routes, like /admin/stats. We have routes that contain a database name in the middle, and we have routes with a suffix. We can take advantage of that to create an optimal solution.
In our case, we used a trie:
In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
The cost of querying a trie is O(L), where L is the length of the key that we are querying. Here is how a very small trie looks in our tests.
We start from the route, matching each character against the trie node key. Once we have consumed the full key, we check if it has any children starting with the next character in the url. Note that the Children array here is of size 127. This allows us to index directly into the array (at cost of O(1)) to find the next trie node. We didn’t want to use a dictionary here because of its size and its cost relative to an array access.
This has some interesting implications:
- The route definition is limited to ASCII characters (but the actual urls are not, mind).
- Regardless of the number of routes, we are ensured of reliable performance.
But that still doesn’t let us handle routes like the ones above. In order to do that, we had to extend the syntax a bit.
/admin/stats would match exactly. That is easiest.
/databases/*/docs is a route that has a capture. The * symbol says “matches anything but /”, so we can get the database name captured without complex matching. Note that by capture, I mean just record the start position and length, we don’t want to pull a substring out of that and pay for an extra allocation.
/databases/*/indexes/$ is another route that has special meaning. The $ symbol says “the match is done, even if the url has additional characters”. That way we can capture embedded ids easily (again, no allocations).
The end result?
Or roughly a route match in 0.9 μs. Pretty good, even if I say so myself.
I suppose all these posts about optimizations applied are showing what is coming in version 4. It would be nice to benchmark cumulative effect of all them combined together
Dejan, That will be coming, yes
Damn! I'm really amazed. Have you implemented the trie yourself? Is it a IDictionary<T,TT>? I'm very impressed.
Vinicius, This is our own impl, and it isn't using Dictionary. In high performance, the cost of dictionary is very high. Even in theoretical terms alone, we are at least twice as fast as the dictionary. The dictionary needs to do at least O(2W) operations (where W is the length of the key), we just need to do O(W)
Would love to see you focusing your incredible skills and effort on improving the .net core code vs making specialized versions of components for your projects.
Eric, I'm not sure that this is possible. One of the reasons that we can get much better performance is that we are able to narrow the requirements. For example, I can say that my routes are only ASCII. But I can't make such a decision for users of the .NET FX. It is very common to want to have routes that have unicode characters. And that has a cost that you have to pay
Apparently they changed the router to use a trie in RC2. So it seems that you probably could have contributed your efforts there. Also, everything in .net core is pay for play so maybe you could've helped the entire .net community with a routing option that used an ASCII only route table. I would imagine that the vast majority of apps would fall into the ASCII only category. Also, if you contribute things to .net core it's less code in your app and more code that the MS team will maintain for you. You are a highly skilled developer and you could really set the example for the .net community by embracing the fact that .net has been completely open sourced and making contributions that the entire .net ecosystem can benefit from.
@Eric Smith We are actually providing effort, there has been specific optimizations that have been done based on information we provided, most of them in the JIT area but also on investigations on specialized routines on Kestrel. I did myself the prior analysis that allowed Ben Adams to pinpoint and optimize the popcount implementation for the headers parsing which had an insane performance boost because of that.
https://github.com/dotnet/coreclr/issues/1619 https://github.com/dotnet/corefx/issues/2209#issuecomment-139636114 https://github.com/aspnet/KestrelHttpServer/pull/245
Routing as Ayende said is very much specific for the needs of RavenDB (because of the restrictions we apply). Can contribute such things? Certainly, but that means there must have to be an extension point and general agreement that having such optimization is necessary (will payoff accounting the complexity of mantaining it).
Yes, I understand and I appreciate your efforts. Just seems like this one could've been contributed, but maybe I'm wrong. :-)
Eric, I strongly doubt that this can be contributed. We accept strict limitation on both functionality and external API in order to get this kind of performance.
Do you have a link to the trie implementation? I'm a bit curious since there was some talk about people using that data structure in a ruby meetup I frequent.
Oskar, Take a look: https://github.com/ayende/ravendb/blob/v4.0/src/Raven.Server/Routing/Trie.cs
Is there any chance you could release a copy of Trie.cs as MIT etc?
Thanks for the link!
Chris: Or someone could do a port based on an existing trie implementation in another language. There is a trie in c++ that could perhaps be ported to c#
Oskar, What would be the point? Can you expand?
Chris, If you want to use that class in another project, go right ahead. I don't think it would be very useful in general, though
Oren, I'm hoping that there is an OK implementation of the data structure with some tests. The C++ version looks a bit noisy, so might not be ideal for a port.
Oskar, I'm not following why you would want a port in the first place?
Oren, usually I'm porting in order to get an implementation (so that I can spend less time writing the code). In this case, I clearly looked at the wrong starting point.
Looks like the trie algorithm is simple enough. The c++ version I found is heavily optimised...