The cost of routing in NancyFX


I got the following feedback on this post on Twitter.

[image: tweet screenshot]

Using a trie for routing isn't new by a long shot. It is a fairly obvious optimization. But not all tries are equal to one another. I decided to see what it would look like using Nancy.

Here is the code I ran:

 class Program
 {
     static void Main(string[] args)
     {
         using (var host = new NancyHost(new Uri("http://localhost:8080")))
         {
             host.Start();
             Console.ReadLine();
         }
     }
 }

 public class SampleModule : Nancy.NancyModule
 {
     public SampleModule()
     {
         Get["/"] = _ => "Hello World!";
     }
 }

And like before, we tested this by running it in Release mode using the following gobench command:

.\gobench.exe -c 100 -r 25000 -u http://localhost:8080/

The results weren't good.

Requests:                          2,500,000 hits
Successful requests:               2,500,000 hits
Network failed:                            0 hits
Bad requests failed (!2xx):                0 hits
Successful requests rate:             23,364 hits/sec
Read throughput:                   3,785,046 bytes/sec
Write throughput:                  2,009,345 bytes/sec
Test time:                               107 sec

On the same machine, similar tests using WebAPI gave me 56,818 hits/sec, and using my routing trie we get 86,206 hits/sec. Nancy also seems to be using a lot more CPU than the other two solutions.

Now, we could leave it like that, but that wouldn't be fair. Let us dig a little bit further and see if we can figure out where the cost is.

I ran the same scenario under the profiler, and we got some interesting results.

[image: profiler results]

In terms of percentage, it compares favorably to my trie's results: 1.91% compared to 1.02%. But in terms of absolute time (same load, both running under the profiler), we see 64.7 seconds for routing 2.5 million requests in the Nancy implementation vs. 2.2 seconds for 2.5 million requests in my implementation. We'll ignore the rest of Nancy's costs, because there is quite a lot here that doesn't matter. Let us focus specifically on the routing trie, shall we?

[image: profiler results for the routing trie]

Now, here is the really sad thing. This is running under a profiler, which slows things down tremendously.

Under those conditions, this code still manages to process a route in just 25 μs. That is pretty awesome, regardless of how you look at it, and for the vast majority of applications, more than fast enough. The moment you start actually doing stuff, routing isn't even in the same territory as the rest of your costs. But one of the things that we have been focusing on is the need to speed RavenDB up by [not ready to divulge yet, but let us say… a lot]. So we pay attention to every little bit.

Now, let us look in detail at this code in GetMatches, which at the time of this writing looked something like this:

return this.routeTries[method].GetMatches(path.Split(splitSeparators, StringSplitOptions.RemoveEmptyEntries), context)
                                          .ToArray();

I'm actually going to ignore the code we see here and just talk about the profiler output, because in this case, the code is misleading.

We can see a whole bunch of things that would fail our current code review. Here is a commonly called method that does a whole bunch of expensive allocations.

On each call, we:

  • Split the string – which is both computationally expensive and creates multiple substrings, plus the array to contain them.
  • Create two dictionaries:
    • one is done inside the TrieNode.GetMatches call:
      return this.GetMatches(segments, 0, new Dictionary<string, object>(this.AdditionalParameters), context);
    • one is done inside the TrieNode.BuildResults call:
      var parameters = new Dictionary<string, object>(capturedParameters);
  • Call ToArray – which creates another allocation for the results.
  • Use Any & Select – which often require the allocation of a lambda instance.
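The split allocations in particular are avoidable. As a sketch of the general technique (my own illustration, not Nancy's actual code), the path can be walked in place and each segment reported as a (start, length) range into the original string, so no substrings and no array are ever created:

```csharp
using System;

// Hypothetical helper: enumerates the segments of a path without
// allocating substrings or an array. Each segment is reported as a
// (start, length) range into the original string. Empty segments are
// skipped, matching StringSplitOptions.RemoveEmptyEntries.
static class PathSegments
{
    public static int ForEachSegment(string path, Action<int, int> onSegment)
    {
        int count = 0, start = 0;
        for (int i = 0; i <= path.Length; i++)
        {
            if (i == path.Length || path[i] == '/')
            {
                if (i > start) // non-empty segment between start and i
                {
                    onSegment(start, i - start);
                    count++;
                }
                start = i + 1;
            }
        }
        return count;
    }
}
```

The trie can then compare each range against a node's segment with string.CompareOrdinal, without ever materializing a substring.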

There is also the dictionary lookup (which is actually done twice, once using ContainsKey, then again using the indexer):

if (!this.routeTries.ContainsKey(method))
{
    return MatchResult.NoMatches;
}

return this.routeTries[method] /* removed for clarity */;

Note that just replacing this with TryGetValue is likely to give us about an 8% improvement in this method.
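To make the difference concrete, here is a minimal sketch of both lookup patterns (the RouteTable class and its string payload are stand-ins I invented for illustration, not Nancy's actual types):

```csharp
using System;
using System.Collections.Generic;

class RouteTable
{
    private readonly Dictionary<string, string> routeTries =
        new Dictionary<string, string> { ["GET"] = "root-trie" };

    // Double lookup: the method name is hashed and probed twice,
    // once for ContainsKey and once more for the indexer.
    public string LookupTwice(string method) =>
        routeTries.ContainsKey(method) ? routeTries[method] : null;

    // Single lookup: TryGetValue hashes and probes the dictionary once.
    public string LookupOnce(string method) =>
        routeTries.TryGetValue(method, out var trie) ? trie : null;
}
```

Both return the same results; the TryGetValue version simply does half the hashing work on the hot path.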

It actually gets better. Looking at the code, it seems we hit the code path for the root route, while other routes need to do more work. I changed the route to be Get["/values"], and ran the benchmark under the profiler again.

The only change I made is that now I have a non-root route, and the cost of routing went up by about 75%:

[image: profiler results for the non-root route]


Now, I'm being utterly unfair here. NancyFX actually has a really sophisticated routing system; it does a lot. I mean, really a lot.

My routing trie, on the other hand, is intentionally dumb and requires the request handling code to participate in some decisions. This is deliberate, because it gives me the chance to do a whole bunch of optimizations cheaply. To start with, I don't need to do any allocations whatsoever throughout the call, and the request handling code can also skip string allocations in most cases.
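To show what "dumb but allocation-free" can look like, here is a hypothetical sketch in that spirit (my illustration, not the actual RavenDB code): lookup walks the path in place and compares (start, length) slices against each child's segment, so the hot path allocates nothing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal routing trie. Add() may allocate freely (it runs
// once at startup); Match() allocates nothing per request.
class TrieNode
{
    private readonly List<(string Segment, TrieNode Child)> children =
        new List<(string Segment, TrieNode Child)>();
    public string Handler; // non-null if a route terminates at this node

    public void Add(string route, string handler)
    {
        var node = this;
        foreach (var segment in route.Split(new[] { '/' },
                     StringSplitOptions.RemoveEmptyEntries))
        {
            var next = node.children.Find(c => c.Segment == segment).Child;
            if (next == null)
            {
                next = new TrieNode();
                node.children.Add((segment, next));
            }
            node = next;
        }
        node.Handler = handler;
    }

    public string Match(string path)
    {
        var node = this;
        int start = 0;
        for (int i = 0; i <= path.Length; i++)
        {
            if (i != path.Length && path[i] != '/')
                continue;
            int len = i - start;
            if (len > 0) // non-empty segment: descend into a matching child
            {
                TrieNode next = null;
                foreach (var (segment, child) in node.children)
                {
                    // Compare the slice in place; no substring is created.
                    if (segment.Length == len &&
                        string.CompareOrdinal(segment, 0, path, start, len) == 0)
                    {
                        next = child;
                        break;
                    }
                }
                if (next == null)
                    return null;
                node = next;
            }
            start = i + 1;
        }
        return node.Handler;
    }
}
```

A linear scan of the children replaces the dictionary lookup here; for the handful of children a typical route node has, that is cheaper than hashing the segment, and it is what makes the in-place slice comparison possible.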

This isn't an issue of algorithms. It is an issue of what you are trying to do.

For comparison, doing the same amount of work in my current version (which has been even further optimized) gives us this:

[image: profiler results for my current version]

You can see what the cost of routing is for us. While the optimized root route cost for Nancy is 25 μs, we are able to route the request in 0.23 μs; that is two orders of magnitude faster. My entire routing cost fits ten times over into the cost of a single dictionary lookup in the NancyFX case.

So, to answer the original question, no, I really couldn't just use NancyFX code to get "this optimization". The algorithm is important, but the actual implementation details are far more interesting and relevant.