HTTP benchmark and pipelining
Here is an interesting problem. If you want to load test a server, it is very hard to truly do so. Put simply, after a while the problem isn’t with your code, it is with the ability of the surrounding systems to actually get the requests to you fast enough.
In this case, let us talk about what is going on when you are actually doing an HTTP request.
We’ll start from the following code:
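Something along these lines, as a minimal sketch in Python (localhost:8080 is a placeholder for whatever server you are benchmarking):

import urllib.request

# A single HTTP GET request; the library hides everything below the
# HTTP layer from us.
response = urllib.request.urlopen("http://localhost:8080/")
body = response.read()
print(response.status, len(body))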
Seems pretty simple, right? And all we need to do is send enough of those, and we’ll be able to put enough load on the server to matter, right? Except that it doesn’t quite work like this. Let us see what the code above is actually doing by stripping away the HTTP layer and dropping down to TCP, shall we?
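Something like this sketch, under the same placeholder assumptions: we open the TCP connection ourselves, write the raw bytes of the HTTP request, and read back whatever the server sends.

import socket

# Speak HTTP over a TCP stream by hand. The Connection: close header
# asks the server to shut the connection down after the response.
with socket.create_connection(("localhost", 8080)) as sock:
    sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break  # the server closed the connection, the response is complete
        response += chunk
print(response.decode("latin-1"))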
And that looks good, right? Except that it is still hiding some details. I’m too lazy to go all the way down to raw sockets and demonstrate what happens at the packet level, and anyway, it would be way too much code to show here.
Here is a diagram that demonstrates what is going over the network for the two code samples above:
+---------+                          +---------+
| Client  |                          | Server  |
+---------+                          +---------+
     |                                    |
     | [SYN]                              |
     |----------------------------------->|
     |                                    |
     |                          [SYN-ACK] |
     |<-----------------------------------|
     |                                    |
     | [ACK]                              |
     |----------------------------------->|
     |  -----------------------------\    |
     |-| Connection now established  |    |
     |  |----------------------------|    |
     |                                    |
     | [GET / HTTP/1.1]                   |
     |----------------------------------->|
     |  -------------------\              |
     |-| The HTTP request  |              |
     |  |------------------|              |
     |                                    |
     |          [HTTP/1.1 302 Found ... ] |
     |<-----------------------------------|
     |  --------------------\             |
     |-| The HTTP response  |             |
     |  |-------------------|             |
     |  -----------------------------------\
     |-| Client now will close connection  |
     |  |----------------------------------|
     |                                    |
     | FIN                                |
     |----------------------------------->|
     |                                    |
     |                                ACK |
     |<-----------------------------------|
     |                                    |
     |                                FIN |
     |<-----------------------------------|
     |                                    |
     | ACK                                |
     |----------------------------------->|
     |                                    |
Note that this is the simplest possible case: we are assuming that the response is just one packet, that no packets are dropped, and we are ignoring stuff like HTTPS, which adds another 4 packets to the initialization. We are also accounting for the last 4 packets that are required to properly close a connection. This is important, because if you are trying to run a high load benchmark, creating TCP connections and not properly closing them means that you’ll soon run out of available ports (all your connections will be stuck in the CLOSE_WAIT or TIME_WAIT state).
Now, the problem is that this is really expensive. As in, wow expensive. So pretty much as soon as the web started to take off (mid ’90s or so), people realized that this wasn’t going to work, and the notion of Keep-Alive was born.
With Keep-Alive, you reuse the same TCP connection to send multiple requests to the server. The idea is that once the connection is open, there is a strong likelihood that you’ll use it again soon, so why pay the 7-packet cost of opening & closing the TCP connection for every single request?
With that optimization, we then have:
+---------+                          +---------+
| Client  |                          | Server  |
+---------+                          +---------+
     |                                    |
     | [SYN]                              |
     |----------------------------------->|
     |                                    |
     |                          [SYN-ACK] |
     |<-----------------------------------|
     |                                    |
     | [ACK]                              |
     |----------------------------------->|
     |  -----------------------------\    |
     |-| Connection now established  |    |
     |  |----------------------------|    |
     |                                    |
     | [GET / HTTP/1.1]                   |
     |----------------------------------->|
     |  -------------------\              |
     |-| The HTTP request  |              |
     |  |------------------|              |
     |                                    |
     |          [HTTP/1.1 302 Found ... ] |
     |<-----------------------------------|
     |  --------------------\             |
     |-| The HTTP response  |             |
     |  |-------------------|             |
     |                                    |
     | [GET /index HTTP/1.1]              |
     |----------------------------------->|
     |  -------------------\              |
     |-| 2nd HTTP request  |              |
     |  |------------------|              |
     |                                    |
     |                [HTTP/1.1 200 ... ] |
     |<-----------------------------------|
     |  --------------------\             |
     |-| 2nd HTTP response  |             |
     |  |-------------------|             |
     |  -----------------------------------\
     |-| Client now will close connection  |
     |  |----------------------------------|
     |                                    |
     | FIN                                |
     |----------------------------------->|
     |                                    |
     |                                ACK |
     |<-----------------------------------|
     |                                    |
     |                                FIN |
     |<-----------------------------------|
     |                                    |
     | ACK                                |
     |----------------------------------->|
     |                                    |
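As a sketch of what the client side of this exchange can look like (Python’s http.client reuses the underlying socket; the host, port, and paths are placeholders):

import http.client

# One TCP connection, several requests. HTTP/1.1 defaults to
# keep-alive, so the socket is reused for as long as the server allows.
conn = http.client.HTTPConnection("localhost", 8080)
for path in ("/", "/index"):
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()  # drain the body before reusing the socket
    print(path, response.status, len(body))
conn.close()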
And the more requests we make on the same connection, the better off we are. Now, there is another trick that we can apply here. Remember that TCP is stream oriented, not packet oriented. That means that as far as the calling code is concerned, we aren’t actually seeing packets, just bytes arriving one after another.
So we can change the way things work to this:
+---------+                                                    +---------+
| Client  |                                                    | Server  |
+---------+                                                    +---------+
     |                                                              |
     | [SYN]                                                        |
     |------------------------------------------------------------->|
     |                                                              |
     |                                                    [SYN-ACK] |
     |<-------------------------------------------------------------|
     |                                                              |
     | [ACK]                                                        |
     |------------------------------------------------------------->|
     |  -----------------------------\                              |
     |-| Connection now established  |                              |
     |  |----------------------------|                              |
     |                                                              |
     | [GET / HTTP/1.1, GET /data HTTP/1.1, GET /fast HTTP/1.1]     |
     |------------------------------------------------------------->|
     |  -------------------------------------\                      |
     |-| 3 HTTP requests in a single packet  |                      |
     |  |------------------------------------|                      |
     |                                                              |
     |             [HTTP/1.1 302 Found ..., HTTP/1.1 200, HTTP 403] |
     |<-------------------------------------------------------------|
     |  -----------------------------------\                        |
     |-| All HTTP responses in one packet  |                        |
     |  |----------------------------------|                        |
     |  -----------------------------------\                        |
     |-| Client now will close connection  |                        |
     |  |----------------------------------|                        |
     |                                                              |
     | FIN                                                          |
     |------------------------------------------------------------->|
     |                                                              |
     |                                                          ACK |
     |<-------------------------------------------------------------|
     |                                                              |
     |                                                          FIN |
     |<-------------------------------------------------------------|
     |                                                              |
     | ACK                                                          |
     |------------------------------------------------------------->|
     |                                                              |
What we did is pretty simple. Instead of waiting for the server to respond to a request, and only then reusing the connection to send the next one, we send the requests immediately, one after the other, without waiting for the responses.
In some cases, we can even pack multiple requests into a single TCP packet. And the server shouldn’t care about that.
Here is what this looks like in practice:
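(A sketch under the same placeholder assumptions, with the paths from the diagram above. The important part is that leftover bytes from one response are carried into the parse of the next.)

import socket

def recv_more(sock, buffer):
    # Pull more bytes off the stream, failing loudly if the server
    # closes the connection mid-response.
    chunk = sock.recv(4096)
    if not chunk:
        raise ConnectionError("server closed the connection early")
    return buffer + chunk

def read_response(sock, buffer):
    # Read one complete response, returning it along with any bytes that
    # already belong to the next response. Assumes Content-Length framing;
    # a real client must also handle chunked transfer encoding.
    while b"\r\n\r\n" not in buffer:
        buffer = recv_more(sock, buffer)
    headers, _, buffer = buffer.partition(b"\r\n\r\n")
    length = 0
    for line in headers.split(b"\r\n"):
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":", 1)[1])
    while len(buffer) < length:
        buffer = recv_more(sock, buffer)
    return headers, buffer[:length], buffer[length:]

paths = (b"/", b"/data", b"/fast")
pipelined = b"".join(
    b"GET " + p + b" HTTP/1.1\r\nHost: localhost\r\n\r\n" for p in paths
)
with socket.create_connection(("localhost", 8080)) as sock:
    sock.sendall(pipelined)  # all three requests, quite possibly one packet
    leftover = b""
    for p in paths:
        headers, body, leftover = read_response(sock, leftover)
        print(p.decode(), headers.split(b"\r\n")[0].decode())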
Now, naïve server code will fail here, because it will read from the socket into a buffer (including some part of the next request) and then forget about those extra bytes. But it isn’t hard to make sure that this works properly, and doing so is key for all high performance servers.
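For instance, here is a sketch of a server-side read loop that gets this right; handle_request is a hypothetical stand-in for the actual request processing:

import socket

def handle_connection(sock: socket.socket) -> None:
    # Keep one buffer per connection. Anything past the end of a complete
    # request stays in the buffer and seeds the parse of the next one.
    # For brevity, this assumes requests without bodies (e.g. GET).
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return  # the client closed the connection
        buffer += chunk
        while b"\r\n\r\n" in buffer:
            request, _, buffer = buffer.partition(b"\r\n\r\n")
            handle_request(request)  # hypothetical handler, not shown here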
Basically, the real problem is driving enough packets into the server to generate load. By pipelining requests like that, we reduce the number of packets we need to send while at the same time generating a much higher load.
The cost of routing a packet is independent of its size, and while the amount of data you send matters for bandwidth, packet latency matters much more for actual speed (latency vs. bandwidth, again). So if we can pack the data into fewer packets, that is a net win. In other words, this is HTTP doing carpooling.
And now that you can drive enough requests into your server to actually stress it, you can work your way toward handling that load.
Comments
You failed to mention that you cannot return a result while transmitting another. It's useful to notice that HTTP/2 IS packet oriented, exactly for this reason.