Nagle algorithm is a pretty annoying thing. Basically, it means that when we write to the network, the TCP stack will wait a bit to see if we have more stuff to send to that destination before actually emitting the packet. Most people run into this when they wonder why the minimum time for a remote operation is 200ms, even on the local network. Then you figure out how to disable Nagle Algorithm, and you are happy again.
Nagle algorithm was designed for remote terminals, where the speed difference between a human typing and the machine sending packet was big enough that each single letter you typed would be sent as a separate packet. That led to a really high overhead, 40 bytes overhead to send just a single byte to the server, and the number of packets that were sent may be high enough to cause the pipe to choke. So you buffer it, the basic algorithm goes like this, if we don’t have enough data to send a full packet, and if 200 ms didn’t pass since the first buffered data, wait up to 200 ms for more data. In this manner, if you type fast enough, you will send more than a single character to the server, dramatically reducing the cost of talking with the server, and speeding up everything significantly.
So this is Nagle Algorithm, it is a pretty low level TCP detail, and often overlooked by people. If you don’t study networks, you’ll typically only find out about it when you have a perf problem that you can’t figure out. So how is this relating to business applications?
Imagine that you work on a system that does snail mail sending. A user will call the API and then a few days later a physical letter will show up at your door. Nothing really special about that, right? Except that we charge users based on the plan they choose. For simplicity’s sake, we’ll say that we have two plans:
- Pay as you go – can send as many letters as they want, and we charge a certain amount for each.
- Budgetted plan – can send up to certain number of letters per month.
In either case, it is pretty important to us to record that the mail was sent, and if the user is on a budget plan (or has spending alerts on his account), we need to respond to it in certain ways.
Overall, there is nothing really surprising here, and the code to handle this is pretty simple, right?
The problem is that under load, we’re going to have query a few requests going on to the billing service, and that is likely to be a bottleneck for us. Note that this is the case because while processing a single RecordMail request is pretty fast, the problem is that we are actually going to have to wait for both the actual processing and the back and forth between the machines. If the cost of processing a RecordMail request is 1 ms, and the cost of going over the network is 0.5 ms, that adds up quickly.
Now, let us see how we can do it better. We’ll start with moving from making the call immediately to placing the details in a queue.
You can see that we use a task completion source to wait for the result of the call. Next, we need to actually handle this, this is done in the following methods:
What this ends up doing is to reduce the network costs. Instead of going to the server once for every request, we’ll go to the server once per 200 ms or every 256 requests. That dramatically reduce network traffic. And the cost of actually sending 256 requests instead of just one isn’t really that different. Note that this gives us a higher latency per request, since we may have to wait longer for the request to go out to the server, but it also gives me much better throughput, because I can process more requests in a shorter amount of time.
A change like that can push your performance up by an order of magnitude or more. It also gives you economy of scale benefits on the other side. Instead of having to do a fixed amount of work around processing each request, we can now do it over a whole bunch of them.
For example, if I need to load the customer entity, instead of doing it once per mail, I can do that once per batch, and it is likely that this customer will be in the batch more than once, reducing the amount of work I have to do.