A PKI-less secure communication channel: Error handling at the protocol level
One of the things that I find myself paying a lot of attention to is the error handling portion of writing software. This is one of the cases where I’m sounding puffy even to my own ears, but from over two decades of experience, I can tell you that getting error handling right is one of the most important things that you can do for your systems. I spend a lot of time on getting errors right. That doesn’t just mean error handling, but error reporting and giving enough context that the other side can figure out what we need to do.
In a secured protocol, that is a bit harder, because we need to safeguard ourselves from eavesdroppers, but I spent a significant amount of time thinking about how to do this properly. Here are the ground rules I set out for myself:
- The most common scenario is a client failing to connect to the server.
- We need to properly report underlying issues (such as TCP errors) while also exposing any protocol level issues.
- Errors can happen during the handshake and during the processing of application messages. Both scenarios should be handled.
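The second rule can be sketched with Zig error sets. This is only an illustration: the error names, the `Origin` enum and the `originOf` function are hypothetical and not taken from the actual protocol code. The idea is that transport failures and protocol failures travel through the same error union, yet the caller can still tell them apart and decide whether there is any point in trying to notify the other side.

```zig
const std = @import("std");

// Hypothetical error sets for illustration; not the names used by the real code.
const TransportError = error{ ConnectionResetByPeer, EndOfStream };
const ProtocolError = error{ BadChallengeResponse, UnsupportedRoute };

// Merging the sets lets one function surface both kinds of failure.
const ConnectError = TransportError || ProtocolError;

const Origin = enum { transport, protocol };

// Classify a failure. For a transport failure the connection is already gone,
// so only protocol failures are worth reporting to the peer.
fn originOf(err: ConnectError) Origin {
    return switch (err) {
        error.ConnectionResetByPeer, error.EndOfStream => .transport,
        error.BadChallengeResponse, error.UnsupportedRoute => .protocol,
    };
}

pub fn main() void {
    std.debug.print("{s}\n", .{@tagName(originOf(error.BadChallengeResponse))});
}
```

Because the `switch` has no `else` branch, the compiler forces the classification to be revisited whenever a new error is added to either set.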
We already saw in the previous post that there is the concept of data messages and alert messages (of which there can only be one). Let’s look at how that works for the handshake scenario. I’m focusing on the server side here, because I’m assuming that this side is more likely to be opaque. A client side issue can be troubleshot much more easily. And the issue isn’t error handling inside the code, it is distributed error handling. In other words, if the server has an issue, how does it report that to the client?
The other side, where the client wants to report an issue to the server, is of no interest to us. From our perspective, a client can cut off at any point (TCP connection broke, etc.), so there is no point in trying to do that gracefully or in giving more data to the server. What would the server do with that?
Here is the server portion of establishing a secured connection:
```zig
pub fn serverConnection(allocator: *std.mem.Allocator, stream: std.net.Stream, server_keys: crypto.KeyPair) !AuthenticatedConnection {
    errdefer stream.close();

    var handshake = protocol.Server.initialize(server_keys);

    var reader = stream.reader();
    var hello: protocol.HelloMessage = undefined;
    try reader.readNoEof(std.mem.asBytes(&hello));

    try hello.route(&handshake); // no routing supported here
    var challenge = try hello.challenge(&handshake);

    var writer = stream.writer();
    try writer.writeAll(std.mem.asBytes(&challenge));

    var resp: protocol.ChallengeResponse = undefined;
    try reader.readNoEof(std.mem.asBytes(&resp));

    var session = try handshake.generateKey();

    var rc: AuthenticatedConnection = undefined;
    std.mem.copy(u8, &rc.pub_key, &handshake.client.long_term_public_key);
    rc.stream = try crypto.NetworkStream.init(allocator, stream, session);
    try resp.completeAuth(&handshake);
    return rc;
}
```
I’m using Zig to write this code, and you can see every potential error in the process marked with the try keyword. Looking at the code, everything up to line 24 (the completeAuth() call) is mechanically sending and receiving data. Any error up to that point is likely network related (so the connection is broken). You can see that the protocol call challenge() can fail, as can the call to generateKey(); in both cases, there isn’t much that I can do about it. If the generateKey() call fails, there is no shared secret (for that matter, it doesn’t look like that can fail, but we’ll ignore that). As for the challenge() call, the only way that can fail is if the server has failed to encrypt its challenge properly. That is not something that the client can do much about. And anyway, there isn’t a failing code path there either.
In other words, aside from network issues, which will break the connection (meaning we cannot send the error to the client anyway), we have to wait until we process the challenge response from the client to have our first viable failure. In the code above, I’m just calling try, which means that we’ll fail the connection attempt, close the socket and basically just hang up on the client. That isn’t nice to do at all. Here is what I replaced line 24 with:
```zig
resp.completeAuth(&handshake) catch |e| {
    // we use the secure channel to send an error to the other side (will also abort the connection there)
    var msg = "Failed to validate challenge response".*;
    rc.stream.send_alert(crypto.AlertTypes.BadChallengeResponse, &msg) catch {
        // there is nothing we can do here, ignoring the error
    };
    return e; // implicitly close the connection
};
```
What is going on here is that by the time I get the challenge response from the client, I have enough information to derive the shared key. I can use that to send an alert to the other side, letting them know what the failure was. A client will complete the challenge, and if there is a handshake failure, we proceed to fail gracefully with a meaningful error.
But there is another point to this protocol: an alert message doesn’t have to show up only in the handshake part. Consider a long running response that runs into an error. Here is how you’ll usually handle that in TCP / HTTP scenarios. Assume that we are streaming data to the client and suddenly run into an issue:
```
{
  "Databases": [
    {
      "Name": "Northwind",
      "Disabled": false,
      "TotalSize": {
        "HumaneSize": "327.81 MBytes",
        "SizeInBytes": 343736320
      }
    },
    {
      "Name": "Darksand",
      "Disabled": false,
      "TotalSize": {
Unhandled Exception: System.UnauthorizedAccessException: Access to the path '/data/darksand' is denied. ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at Microsoft.Diagnostics.Runtime.Linux.LinuxLiveDataReader.OpenMemFile()
   at Microsoft.Diagnostics.Runtime.Linux.LinuxLiveDataReader.ReadMemory(UInt64 address, IntPtr buffer, Int32 bytesRequested, Int32& bytesRead)
   at Microsoft.Diagnostics.Runtime.DacInterface.DacDataTargetWrapper.ReadVirtual(IntPtr self, UInt64 address, IntPtr buffer, Int32 bytesRequested, Int32& bytesRead)
   at Microsoft.Diagnostics.Runtime.DacLibrary..ctor(DataTarget dataTarget, String dacDll)
   at Microsoft.Diagnostics.Runtime.DataTarget.ConstructRuntime(ClrInfo clrInfo, String dac)
```
How do you send an error midstream? Well, you don’t. If you are lucky, you’ll have the error output and have some way to get the full message and manually inspect it. That is a distressingly common issue, by the way, and a huge problem for proper error reporting with long running responses.
With the alert model, we effectively have multiple channels in the same TCP stream, and we can use them to send a clear and independent error to the client. Much nicer overall, even if I say so myself.
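To make the "multiple channels in one stream" idea concrete, here is a minimal sketch of how a receiver might tell data records from an alert once a record has been decrypted. Everything in it is an assumption made for illustration: the one byte type tag, the tag values, and the `Record`, `decode` and `consume` names are hypothetical and do not describe the actual record layout from the earlier posts.

```zig
const std = @import("std");

// Hypothetical wire tags: the first byte of a decrypted record says what it carries.
const tag_data: u8 = 1;
const tag_alert: u8 = 2;

const Record = union(enum) {
    data: []const u8, // application payload
    alert: []const u8, // human readable error message from the peer
};

const DecodeError = error{ EmptyRecord, UnknownRecordType };

// Split an already decrypted record into its type and its payload.
fn decode(plaintext: []const u8) DecodeError!Record {
    if (plaintext.len == 0) return error.EmptyRecord;
    const body = plaintext[1..];
    return switch (plaintext[0]) {
        tag_data => Record{ .data = body },
        tag_alert => Record{ .alert = body },
        else => error.UnknownRecordType,
    };
}

// Count payload bytes until the records run out or the peer raises an alert.
// The alert is never mixed into the payload; it surfaces as a distinct error.
fn consume(records: []const []const u8) !usize {
    var total: usize = 0;
    for (records) |raw| {
        switch (try decode(raw)) {
            .data => |bytes| total += bytes.len,
            .alert => return error.PeerAlert,
        }
    }
    return total;
}

pub fn main() void {
    const stream = [_][]const u8{ "\x01first chunk", "\x02Access to the path is denied" };
    if (consume(&stream)) |total| {
        std.debug.print("received {d} bytes\n", .{total});
    } else |err| {
        std.debug.print("stream aborted: {s}\n", .{@errorName(err)});
    }
}
```

Compare this with the half written JSON above: the consumer never has to guess whether the bytes it got are payload or an error message, because the alert arrives as its own record.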
And it just occurred to me that this mimics quite nicely the same approach that Zig itself uses for error handling.
More posts in "A PKI-less secure communication channel" series:
- (12 Oct 2021) Using TLS
- (08 Oct 2021) Error handling at the protocol level
- (07 Oct 2021) Implementing the record stream
- (06 Oct 2021) Coding the handshake
- (04 Oct 2021) The record layer
- (01 Oct 2021) design