Refactoring C CodeGoing to async I/O

time to read 6 min | 1064 words

Now that I have a good idea on how to use OpenSSL and libuv together, I’m going to change my code to support that mode of operation. I have already thought about this a lot, and the code I already have is ready to receive the change in behavior, I think.

One of the things that I’m going to try to do while I move the code over is properly handle all error conditions. We’ll see how that goes.

I already have the concept of a server_state_run() method that handles all the network activity, dispatching,  etc. So that should make it easy. I’m going to start by moving all the libuv code there. I’m also going to take the time to refactor everything to an API that is more cohesive and easier to deal with.

There is some trouble here, with having to merge together two similar (but not quite identical) concepts. My libuv & openssl post dealt with simply exposing a byte stream to the calling code. My network protocol code is working at a higher level. Initially I tried to layer things together, but that quickly turned out to be a bad idea. I decided to have a single layer that handles both the reading from the network, using OpenSSL and parsing the commands over the network.

The first thing to do was to merge the connection state, I ended up with this code:

There are a few things that are interesting here. On the one hand, I want to keep the state of the connection private, but on the other, we need to expose this out to the user to use some parts of it. The way libuv handles it is with comments denoting what are considered public / private portions of the interface. I decided to stick it in a dedicated struct. This also allowed me to get the size of the private members, which is important for what I wanted to do next.

The connection state struct have the following sections:

  • private / reserved – 64 bytes
  • available for user to use – 64 bytes (and aligned on 64 bytes boundary)
  • msg buffer – 8,064 bytes

The idea here is that we give the user some space to keep their own data in, and that the overall connection state size is exactly 8KB, so can fit in two OS pages. On Linux, in most cases, we’ll not need a buffer that is over 3,968 bytes long, we can even save the second page materialization (because the OS lazily allocate memory to the process). I’m using 64 bytes alignment for the user’s data to reduce any issues that the user have for storing data about the connection. It will also keep it nicely within the data the user need to handle the connection nearby the actual buffer.

I’m 99% sure that I won’t need any of these details, but I thought it is best to think ahead, and it was fun to experiment.

Here is how the startup code for the server changed:

I removed pretty much all the functions that were previously used to build it. We have the server_state_init_t struct, which contains everything that is required for the server to run. Reducing the number of functions to build this means that I have to do less and there is a lot less error checking to go through. Most of the code that I had to touch didn’t require anything interesting. Take the code from the libuv/openssl project, make sure it compiles, etc. I’m going to skip talking about the boring stuff.

I did run into an couple of issues that are worth talking about. Error handling and authentication. As mentioned, I’m using client certificates for authentication, but unlike my previous code, I’m not explicitly calling SSL_accept(), instead, I rely on OpenSSL to manage the state directly.

This means that I don’t have a good location to put the checks on the client certificate that is used. For that matter, our protocol starts with the server sending an: “OK\r\n” message to the client to indicate successful connection. Where does this go? I put all of this code inside the handle_read() method.

This method is called whenever libuv has more data to give us on the connection. The actual behavior is on ensure_connection_intialized(), where we check a flag on the connection, and if we haven’t done the initialization of the connection, we check i OpenSSL consider the connection established. If it is established, we validate the connection and then send the OK to start the ball rolling.

You might have noticed a bunch of work with flags CONNECTION_STATUS_WRITE_AND_ABORT and CONNECTION_STATUS_INIT_DONE. What is that about?

Well, CONNECTION_STATUS_INIT_DONE is self explanatory, I hope. This just tells us whatever the connection has already been checked or not. This save us the cost of validate the client cert of each packet. Usually, SSL handshake means that we could do this check only inside the “need to read more from the network”, but I think that there are certain communication patterns in which the SSL handshake could be completed and the packet will already have additional encrypted information for the connection. For example, I’m pretty sure that TLS 1.3 0-RTT is one such case. This is why the ensure_connection_initialized() is called twice in the code.

Of more interest is the CONNECTION_STATUS_WRITE_AND_ABORT flag. This is set in one of two locations. First, if we fail to validate the certificate for the connection. Second, if we failed to process the message that was sent to us (inside read_message()).

In either case, we want to close the connection, but we have a problem: Error handling. We use libuv for all I/O, and that is asynchronous in nature. We want to write an error to the other side, to be nice, and in order to do that, we need to process the write and keep the connection around long enough that we’ll actually send it to the other side. Because of this, when this flag is set, we have the following behaviors:

  • Any newly available data on that connection is immediately discarded
  • The next write will flush all the data to the network, wait for confirmation that this was sent and close the connection.

This works very nicely to allow me to abort on an error and still get really nice errors on the other side.

As usual, you can read the full code for the network protocol for this post here.

More posts in "Refactoring C Code" series:

  1. (12 Dec 2018) Going to async I/O
  2. (10 Dec 2018) Do we need a security review?
  3. (07 Dec 2018) Implementing parsing
  4. (04 Dec 2018) Multi platform and Valgrind
  5. (03 Dec 2018) Choosing the next direction
  6. (30 Nov 2018) Giving good SSL errors to your client…
  7. (29 Nov 2018) Starting with an API
  8. (27 Nov 2018) Error handling is HARD, error REPORTING is much harder