The cost of routing in NancyFX

time to read 7 min | 1393 words

I got the following feedback on this post on Twitter.


Using trie for routing isn't new by a long shot. It is a fairly obvious optimization. But not all tries are equal to one another. I decided to see what it would look like using Nancy.

Here is the code I run:

 class Program
     static void Main(string[] args)
         using (var host = new NancyHost(new Uri("http://localhost:8080")))

 public class SampleModule : Nancy.NancyModule
     public SampleModule()
         Get["/"] = _ => "Hello World!";

And like before, we tested this by running it in Release mode using the following gobench command:

.\gobench.exe -c 100 -r 25000 -u http://localhost:8080/

The results weren't good.

Requests:                          2,500,000 hits
Successful requests:               2,500,000 hits
Network failed:                            0 hits
Bad requests failed (!2xx):                0 hits
Successful requests rate:             23,364 hits/sec
Read throughput:                   3,785,046 bytes/sec
Write throughput:                  2,009,345 bytes/sec
Test time:                               107 sec

On the same machine, similar tests using WebAPI gave me a 56,818 hits/sec and using my routing trie we get 86,206 hits/sec. Nancy also seems to be using a lot more CPU than the other two solutions.

Now, we could leave it like that, but that wouldn't be fair. Let us dig a little bit further and see if we can figure out what is costing us there?

I run the same scenario under the profiler, and we got some interesting results.


In terms of percentage, it compares favorably to my trie's results. 1.91% compared to 1.02%. But it terms of absolute time (same load, both running under the profiler), we see 64.7 seconds for 2.5 million routes in the nancy implementation vs. 2.2 seconds for 2.5 million routes in my implementation. We'll ignore the rest of the costs of Nancy, because there is quite a lot here that doesn't matter. Let us focus specifically on the routing trie, shall we?


Now, here is the really sad thing. This is running under a profiler, which slows things down tremendously.

Under those conditions, this code still manage to process a route in just 25 μs. That is pretty awesome, regardless of how you look at it. And for the vast majority of applications, that is way more than what you'll see in most applications. The moment you start actually doing stuff, you aren't even in the same territory. But one of the things that we have been focusing on is the need to speed RavenDB up by [not ready to divulge yet, but let us say… a lot]. So we pay attention to every little bit.

Now, let us look in detail on this code in GetMatches, which at the time of this writing looked something like this:

return this.routeTries[method].GetMatches(path.Split(splitSeparators, StringSplitOptions.RemoveEmptyEntries), context)

I'm actually going to ignore the code we see here and just talk about the profiler output, because in this case, the code is misleading.

We can see a whole bunch of things that would fail our current code review. Here is a commonly called method that does a whole bunch of expensive allocations.

On each call, we:

  • Split the string – which is both expensive in computation  and creates multiple sub strings and the array to contain them.
  • Create two dictionaries:
    • one is done inside the TrieNode.GetMatches call:
      return this.GetMatches(segments, 0, new Dictionary<string, object>(this.AdditionalParameters), context);
    • one is done inside the TrieNode.BuildResults call:
      var parameters = new Dictionary<string, object>(capturedParameters);
  • Call ToArray – which create another allocation for the results.
  • Uses Any & Select – which often requires allocation of a lambda instance.

There are also the dictionary lookup (which is actually done twice, once using ConstainsKey, then using the indexer):

if (!this.routeTries.ContainsKey(method))
    return MatchResult.NoMatches;

return this.routeTries[method] /* removed for clarity */;

Note that just replacing this with a TryGetValue is likely to give us about 8% improvement in this method.

It actually gets better, looking at the code, it looks like we hit the code path for the root route, while other routes need to do more work. I changed the route to be Get["/values"], and run the benchmark under profiler again.

The only change I made is that now I have a non root route, the cost for routing is up by about 75%:



Now, I'm being utterly unfair here. NancyFX actually has a really sophisticated routing system, it does a lot. I mean, really a lot.

My routing trie, on the other hand, is intentionally dumb and require the request handling code to participate in some decisions. This is done intentionally, because it gives me the chance to do a whole bunch of optimizations cheaply. To start with, I don't need to do any allocations whatsoever throughout the call, and the request handling code can also skip string allocations in most cases.

This isn't an issue of algorithms. It is an issue of what you are trying to do.

For comparisons, doing the same amount of work in my current version (which was even further optimized), give us this:


You can see what the cost of routing is for us. While the optimized root route cost for Nancy is 25 μs, we are able to route the request in 0.23 μs, that is two orders of magnitudes faster. My entire routing cost can fit 10 times in the cost of a single dictionary lookup in the NancyFX case.

So, to answer the original question, no, I really couldn't just use NancyFX code to get "this optimization". The algorithm is important, but the actual implementation details are far more interesting and relevant.