Trie based routing

time to read 4 min | 764 words

In my previous post, I discussed the cost of routing in MVC vs. our own implementation. The MVC routing / infrastructure consumed about 90% of the time, while the code that actually does stuff is left with less than 10%. In our implementation, the actual request handling code gets about 85% of the time, while all the infrastructure is handled in the other 15%.

Of that , a large portion of that time is actually spent on routing. In our tests, we saw something like 40% of the time spent in selecting the right controller action to execute, this is when there is just one such action that can be run. Now, to be fair, this is a scenario where we don’t actually have any meaningful logic. We are just saying “hello world” over and over again, so the infrastructure costs are really noticeable in this benchmark. But that is the point, we want to see what are the fixed costs of the infrastructure we use.

Now, in RavenDB we have several hundreds routes (last count was something upward of 400), and we noticed that routing take a lot of time to select the right action to run. We optimized the hell out of that to the point by specializing that to our own situation and based on our own knowledge. But when we started to look at moving our code to Kestrel, we took the time to move to a system that is a better fit for us.

The major reason routing is such an expensive issue in MVC is that it does quite a lot. It needs to handle route selection, capturing values, coercing data to the right types and a lot more. All of that takes time, memory and cpu cycles. But as it turn out, we don’t really need all of that. We have relatively simple routing needs.

Here are a few example of RavenDB routes:

/admin/stats
/databases/{databaseName}/docs
/databases/{databaseName}/indexes/{*id}

So we have the fixed routes, like /admin/stats. We have routes that contain a database name in the middle, and we have routes with a suffix. We can take advantage of that to create an optimal solution.

In our case, we used a trie:

In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.

The cost of querying a trie is O(L), where L is the length of the key that we are querying. Here is how a very small trie looks in our tests.

We start from the route, matching each character against the trie node key. Once we have consumed the full key, we check if it has any children starting with the next character in the url. Note that the Children array here is of size 127. This allows us to index directly into the array (at cost of O(1)) to find the next trie node. We didn’t want to use a dictionary here because of its size and its cost relative to an array access.

This has some interesting implications:

The route definition is limited to ASCII characters (but the actual urls are not, mind).
Regardless of the number of routes, we are ensured of reliable performance.

But that still doesn’t let us handle routes like the ones above. In order to do that, we had to extend the syntax a bit.

/admin/stats would match exactly. That is easiest.

/databases/*/docs is a route that has a capture. The * symbol says “matches anything but /”, so we can get the database name captured without complex matching. Note that by capture, I mean just record the start position and length, we don’t want to pull a substring out of that and pay for an extra allocation.

/databases/*/indexes/$ is another route that has special meaning. The $ symbol says “the match is done, even if the url has additional characters”. That way we can capture embedded ids easily (again, no allocations).

The end result?

Or roughly a route match in 0.9 μs. Pretty good, even if I say so myself.

Tweet Share Share 20 comments

Tags:

Comments

18 Feb 2016
10:40 AM

Dejan

I suppose all these posts about optimizations applied are showing what is coming in version 4. It would be nice to benchmark cumulative effect of all them combined together

18 Feb 2016
11:37 AM

Oren Eini

Dejan, That will be coming, yes

18 Feb 2016
11:40 AM

Vinicius Menezes

Damn! I'm really amazed. Have you implemented the trie yourself? Is it a IDictionary<T,TT>? I'm very impressed.

18 Feb 2016
11:46 AM

Oren Eini

Vinicius, This is our own impl, and it isn't using Dictionary. In high performance, the cost of dictionary is very high. Even in theoretical terms alone, we are at least twice as fast as the dictionary. The dictionary needs to do at least O(2W) operations (where W is the length of the key), we just need to do O(W)

18 Feb 2016
12:22 PM

Eric Smith

Would love to see you focusing your incredible skills and effort on improving the .net core code vs making specialized versions of components for your projects.

18 Feb 2016
12:29 PM

Oren Eini

Eric, I'm not sure that this is possible. One of the reasons that we can get much better performance is that we are able to narrow the requirements. For example, I can say that my routes are only ASCII. But I can't make such a decision for users of the .NET FX. It is very common to want to have routes that have unicode characters. And that has a cost that you have to pay

18 Feb 2016
12:42 PM

Eric Smith

Apparently they changed the router to use a trie in RC2. So it seems that you probably could have contributed your efforts there. Also, everything in .net core is pay for play so maybe you could've helped the entire .net community with a routing option that used an ASCII only route table. I would imagine that the vast majority of apps would fall into the ASCII only category. Also, if you contribute things to .net core it's less code in your app and more code that the MS team will maintain for you. You are a highly skilled developer and you could really set the example for the .net community by embracing the fact that .net has been completely open sourced and making contributions that the entire .net ecosystem can benefit from.

18 Feb 2016
15:50 PM

Federico Lois

@Eric Smith We are actually providing effort, there has been specific optimizations that have been done based on information we provided, most of them in the JIT area but also on investigations on specialized routines on Kestrel. I did myself the prior analysis that allowed Ben Adams to pinpoint and optimize the popcount implementation for the headers parsing which had an insane performance boost because of that.

https://github.com/dotnet/coreclr/issues/1619 https://github.com/dotnet/corefx/issues/2209#issuecomment-139636114 https://github.com/aspnet/KestrelHttpServer/pull/245

Routing as Ayende said is very much specific for the needs of RavenDB (because of the restrictions we apply). Can contribute such things? Certainly, but that means there must have to be an extension point and general agreement that having such optimization is necessary (will payoff accounting the complexity of mantaining it).

18 Feb 2016
16:42 PM

Eric Smith

Yes, I understand and I appreciate your efforts. Just seems like this one could've been contributed, but maybe I'm wrong. :-)

19 Feb 2016
17:43 PM

Oren Eini

Eric, I strongly doubt that this can be contributed. We accept strict limitation on both functionality and external API in order to get this kind of performance.

22 Feb 2016
10:50 AM

Oskar Gewalli

Do you have a link to the trie implementation? I'm a bit curious since there was some talk about people using that data structure in a ruby meetup I frequent.

22 Feb 2016
10:52 AM

Oren Eini

Oskar, Take a look: https://github.com/ayende/ravendb/blob/v4.0/src/Raven.Server/Routing/Trie.cs

23 Feb 2016
17:26 PM

Chris Marisic

Is there any chance you could release a copy of Trie.cs as MIT etc?

24 Feb 2016
05:27 AM

Oskar Gewalli

Thanks for the link!

Chris: Or someone could do a port based on an existing trie implementation in another language. There is a trie in c++ that could perhaps be ported to c#

24 Feb 2016
05:53 AM

Oren Eini

Oskar, What would be the point? Can you expand?

24 Feb 2016
05:58 AM

Oren Eini

Chris, If you want to use that class in another project, go right ahead. I don't think it would be very useful in general, though

24 Feb 2016
06:25 AM

Oskar Gewalli

Oren, I'm hoping that there is an OK implementation of the data structure with some tests. The C++ version looks a bit noisy, so might not be ideal for a port.

24 Feb 2016
06:28 AM

Oren Eini

Oskar, I'm not following why you would want a port in the first place?

24 Feb 2016
06:43 AM

Oskar Gewalli

Oren, usually I'm porting in order to get an implementation (so that I can spend less time writing the code). In this case, I clearly looked at the wrong starting point.

24 Feb 2016
07:35 AM

Oskar Gewalli

Looks like the trie algorithm is simple enough. The c++ version I found is heavily optimised...

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB