A PKI-less secure communication channel: design
Following our recent hiccup with certificate expiration, I spent some time thinking about what we could do better. One of the paths that this led me to was to consider how I would design the underlying communication channel for RavenDB if I had a blank slate. Currently, RavenDB uses TLS over TCP and HTTPS (which is the same thing) as the sole communication mechanism between servers and clients and between the servers in the cluster. That relies on TLS to ensure the safety of the information, as well as client certificates for authentication. TLS, of course, requires the use of server certificates, which means that we have mutual authentication between clients and servers. However, the PKI infrastructure that is required to support that is freaking complex. It is mostly invisible, except when something fails.
The idea in this design exercise is to consider how I would do things differently. This is a thought exercise only, not something that we intend to put into any kind of system at this point in time. The use of TLS has proven itself to be very successful and was greatly beneficial. I consider such design exercises to be vital to the overall health of a project (and my own mind), because they allow me to dive deeply into a topic and consider it from a different viewpoint. Therefore, I’m going to proceed based on RavenDB’s set of requirements, even though this is all theoretical.
That disclaimer aside, what do we actually need from a secure communication channel?
- Build on top of TCP – nothing else would do, and while UDP is nice to consider, that isn’t relevant for RavenDB’s scenario, so not worth considering. RavenDB makes a lot of use of the streaming nature of TCP connections. It allows us to make a lot of assumptions on the state of the other side. The key aspect we take advantage of is the fact that for a given connection, if I send you a document, I can assume that you already got (and successfully processed) all previous documents. That saves a lot of back & forth to maintain distributed state.
- Encrypted over the wire – naturally that means that we need to satisfy the same level of security as TLS.
- Provide mutual authentication of clients and servers – including in a hostile network environment.
Let’s consider what we want to achieve here. The situation is not deployment of servers and clients by many independent organizations (each distrusting all others). Instead, we are setting up a cluster of RavenDB nodes that will talk to one another as well as any number of clients that will talk to those servers. That means that we can safely assume that there is a background channel which we trust. That removes the need to set up PKI and to have a trusted third party that we’ll talk to. Instead, we are going to use public key cryptography to do authentication between nodes and clients.
Here is what it is going to look like. When setting up a cluster, the admin will generate a key pair, like so:
Server Secret: I_lfn5vna3p1OxyJ_kCJzRaBOWD-vio6hvpL6b2qYs8
Server Public: oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk
The secret portion is going to remain written to the server’s configuration file, and the public portion will be used when connecting to the server, to ensure that we are talking to the right one. In the same sense, we’ll have the client generate a key pair as well:
Client Secret: TVwQXoiYfvuToz5NY8D27bIeJR-LgR4y8gCM4UE3ZSc
Client Public: 5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs
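The sample keys above are 43 characters long, which is what 32 bytes look like as unpadded URL-safe Base64. That encoding is an inference from the samples, not something the post states. Under that assumption, a minimal sketch of decoding and re-encoding such a key (the helper names `decode_key` and `encode_key` are hypothetical):

```python
import base64

KEY_SIZE = 32  # X25519 keys are 32 bytes long


def decode_key(text: str) -> bytes:
    """Decode an unpadded URL-safe Base64 key and check its length."""
    padded = text + "=" * (-len(text) % 4)  # restore the stripped padding
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != KEY_SIZE:
        raise ValueError(f"expected {KEY_SIZE} bytes, got {len(raw)}")
    return raw


def encode_key(raw: bytes) -> str:
    """Encode raw key bytes the same way the sample keys appear to be encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


server_public = decode_key("oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk")
print(len(server_public))
```

Checking the length at the edge means a truncated or mistyped key is rejected before it ever reaches the cryptographic code.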
With those in place, we can now set up the following configuration on the server side:
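As a rough illustration only: the setting names below (`Security.ServerKeyPair`, `Security.AuthorizedClients`) are hypothetical, and only the overall shape (the server's own key pair plus the public keys of the clients it accepts) comes from the text. The sketch builds the settings in Python and prints them as JSON:

```python
import json

# Hypothetical setting names. The server holds its full key pair,
# but only the PUBLIC keys of the clients that may connect.
settings = {
    "Security.ServerKeyPair": {
        "Secret": "I_lfn5vna3p1OxyJ_kCJzRaBOWD-vio6hvpL6b2qYs8",
        "Public": "oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk",
    },
    "Security.AuthorizedClients": [
        "5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs",
    ],
}


def is_authorized(client_public_key: str, config: dict = settings) -> bool:
    """A client is accepted only if its public key was registered up front."""
    return client_public_key in config["Security.AuthorizedClients"]


print(json.dumps(settings, indent=2))
```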
Note that the settings.json contains the key pair of the server, but only the public key of the authorized clients. Conversely, the connection string for RavenDB would be:
Server=crypto.protocol.ravendb.example;
ServerPublicKey=oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk;
ClientSecretKey=TVwQXoiYfvuToz5NY8D27bIeJR-LgR4y8gCM4UE3ZSc;
ClientPublicKey=5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs;
In this case, the client connection string has the key pair of the client, and just the public key of the server. The idea is that we’ll use these to validate that either end is actually who we think they are.
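A minimal sketch of how a client might split such a connection string into its parts. The function name `parse_connection_string` is hypothetical, and the sketch assumes values never contain `;` (true for unpadded URL-safe Base64 keys and host names):

```python
def parse_connection_string(text: str) -> dict:
    """Split 'Name=Value;Name=Value;' pairs into a dictionary."""
    settings = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing ';' and blank lines
        name, _, value = part.partition("=")
        settings[name.strip()] = value.strip()
    return settings


connection_string = """
Server=crypto.protocol.ravendb.example;
ServerPublicKey=oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk;
ClientSecretKey=TVwQXoiYfvuToz5NY8D27bIeJR-LgR4y8gCM4UE3ZSc;
ClientPublicKey=5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs;
"""
parsed = parse_connection_string(connection_string)
print(parsed["Server"])
```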
The details of public key cryptography are beyond the topic of this blog post (or indeed, my own understanding, if you get down to it), but the best metaphor that I found was the color mixing one. I’ll remind you that in public key cryptography, we have:
- Client Secret Key (CSK), Client Public Key (CPK)
- Server Secret Key (SSK), Server Public Key (SPK)
We can use the following operations:
- Encrypt(CPK, SSK) –> Decrypt(SPK, CSK)
- Encrypt(SPK, CSK) –> Decrypt(CPK, SSK)
In other words, we can use a public / secret from both ends to encrypt and decrypt the data. Note that so far, everything I did was pretty bog standard Intro to Cryptography 101. Let’s see how we take those ideas and turn them into an actual protocol. The details are slightly more involved, and instead of using just two key pairs, we actually need to use five(!). Let’s look at them in turn.
The first couple of key pairs are the ones that we are familiar with, the server’s and the client’s. However, we are going to tag them as the long term key pairs.
The problem with using those keys is that we have to assume that they will leak at some point. In fact, one of the threat models that TLS has is dealing with adversaries that can record all network communication between parties for an arbitrary amount of time. Given that this is encrypted, and assuming that no one can break the encryption algorithm itself, we need to worry about key leakage after the fact. In other words, if we use a pair of keys to communicate securely, but the communication was recorded, it is enough to capture a single key (from either server or client) to be able to decrypt past conversations. That is not ideal. In order to handle that, we introduce the notion of session keys. Those are keys that are in no way, shape or form related to the long term keys. They are generated using a secure cryptographic method and are used for a single connection. Once that connection is closed, they are discarded.
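To make the "public / secret from both ends" symmetry concrete, here is an illustration-only sketch of X25519, the key exchange that crypto_box builds on, written out from RFC 7748 in plain Python. It is not constant time and is not meant for real use; in practice libsodium would do this (crypto_box additionally hashes the shared point before using it as a key). The point is only that combining the client's secret with the server's public gives the same value as combining the server's secret with the client's public:

```python
import os

P = 2**255 - 19          # the field prime used by Curve25519
A24 = 121665             # curve constant from RFC 7748
BASE_POINT = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u_coordinate: bytes) -> bytes:
    """Montgomery ladder from RFC 7748 (illustration only, not constant time)."""
    k = bytearray(scalar)
    k[0] &= 248          # clamp the secret scalar as the RFC requires
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    u = int.from_bytes(u_coordinate, "little") & ((1 << 255) - 1)
    x1, x2, z2, x3, z3, swap = u, 1, 0, u, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


# Each side keeps its secret and publishes only the public half.
client_secret, server_secret = os.urandom(32), os.urandom(32)
client_public = x25519(client_secret, BASE_POINT)
server_public = x25519(server_secret, BASE_POINT)

shared_on_client = x25519(client_secret, server_public)
shared_on_server = x25519(server_secret, client_public)
print(shared_on_client == shared_on_server)
```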
The idea is that even if you manage to lay your hands on the long term keys, the session keys, which are actually used to encrypt the communication, are long gone (and were never kept) anyway. For more details, the Wiki article on Perfect Forward Secrecy does a great job explaining the details.
I’m counting four pairs of keys so far, but I mentioned that we’ll use five in this protocol, so what is that about? I’m going to introduce the idea of a middlebox key. A middlebox is a server that the client will connect to. The client wants to provide just enough information to the middlebox to route the request to the right location, without giving any external observer any idea about the final destination of the client. In essence, this is ESNI (Encrypted Server Name Indication). A key aspect of this is that the client does not trust the middlebox, and the only thing a malicious middlebox can do is to record the final destination of the connection. It cannot eavesdrop on the details or modify them in any way.
With all of that in place, and hopefully clear, let’s talk about the handshake that is required to make both sides verify that the other one is legit. The connection starts with a hello message, with the following details:
- Client –> Server
- Overall size: 108 bytes
- Algorithm – crypto_box (sodium)
  - Key exchange: X25519
  - Encryption: XSalsa20 stream cipher
  - Authentication: Poly1305 MAC
Field # | Size | Content | Encrypted using |
1 | 4 | Version | Plain text |
2 | 32 | Client’s session public key | Plain text |
3 | 32 | Expected server public key | |
4 | 16 | MAC for field 3 | |
5 | 24 | Nonce for field 3 | |
This requires some explanation. I know enough to know my limitations with cryptography. I’m going to lean on a well known and tested library, libsodium, for the actual cryptographic details and try to do as little as possible on my own. The hello message contains just three actual fields, but the third field is encrypted. Modern encryption practices are meant to make it as hard as possible to misuse. That means that pretty much any encryption algorithm that you are likely to use will use Authenticated Encryption. This is to ensure that any modification to the cipher text will fail the decryption process, rather than give corrupted results.
To handle that scenario, we need to send a MAC (message authentication code), which you can see as field 4 on the message. The last field is a random value (a nonce) that is used to ensure that when we encrypt the same data with the same keys, we'll not output the same value. Producing the same output twice can have a catastrophic impact on the safety of your system. You can think of the last two fields as part of the encryption envelope we need to properly encrypt the data.
As the first field, we have the protocol version, which allows us to change the protocol over time. Note that this is the only choice that we have, there is no negotiation or choice involved here at all. If we want to change the cryptographic details of the protocol, we’ll need to create a new version for that. This is in contrast with how TLS works, where we have both clients and servers offering their supported options and having to pick which one to use. That ends up being complex, so it is simpler to tie it down. WireGuard works in a similar manner, for example.
You’ll notice that the client’s session public key is sent in the clear. That is fine, it is the public key, after all, and since each separate connection generates a new key pair, there is nothing that can be gleaned from this data.
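The hello layout described above can be sketched with Python's `struct` module. The field sizes come from the table; the little-endian byte order for the version and the version value `1` are assumptions, since the post does not specify either, and the helper names are hypothetical:

```python
import struct

# version (4) + client session public key (32) + encrypted expected
# server key (32) + MAC (16) + nonce (24) = 108 bytes.
# "<" = little-endian with no padding (byte order is an assumption).
HELLO = struct.Struct("<I32s32s16s24s")
PROTOCOL_VERSION = 1  # assumed value


def pack_hello(client_session_pk: bytes, encrypted_expected_key: bytes,
               mac: bytes, nonce: bytes) -> bytes:
    return HELLO.pack(PROTOCOL_VERSION, client_session_pk,
                      encrypted_expected_key, mac, nonce)


def unpack_hello(message: bytes):
    """Parse a hello message, rejecting any version we do not implement."""
    if len(message) != HELLO.size:
        raise ValueError("hello message must be exactly 108 bytes")
    version, client_session_pk, encrypted_expected_key, mac, nonce = \
        HELLO.unpack(message)
    if version != PROTOCOL_VERSION:
        # No negotiation: a different version is simply a different protocol.
        raise ValueError(f"unsupported protocol version {version}")
    return client_session_pk, encrypted_expected_key, mac, nonce


print(HELLO.size)
```

Because the version pins every cryptographic parameter, the only decision the receiver makes is "do I speak this version or not".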
Now, let’s go back to the fields that are actually meaningful, the client’s session public key and the expected server public key. What is that about?
The client will first generate a key pair and send to the server the public portion of that key pair. Along with another key pair, we’ll be able to establish communication. However, what other key pair? In order to trust the remote server, we need to know its public key in advance. The administrator will be able to tell us that, of course, but requiring this is a PITA. We may want to implement TOFU (Trust On First Use), like SSH does, or we may want to tie ourselves to a particular key. In any event, at the protocol level, we cannot require that the public key for the server will be known before the first message, not if we want to apply it.
To solve this issue, we have to consider why we have this expected server public key in the message in the first place. This is there to provide the middlebox a secure manner to discover what server the client wants to connect to. How the client discovers the public key of the middlebox is intentionally left blank here. You can use the same manner as ESNI and grab the public key from a DNS entry, for example. Regardless, a key aspect of this is that the expected server public key is meant to be advisory only. If we are able to successfully decrypt it, then we know what server public key the client is looking for. We can look it up in some table and route the connection directly, without being able to figure out anything else about the contents of any future traffic.
If we cannot successfully decrypt this, we can just ignore this and assume that the client is expecting any key (at any rate, the client itself will do its own validation down the line). In many cases, by the way, I expect that the middlebox and the end server will be one and the same, this middlebox feature is meant for some advanced scenarios, likely never to be relevant here.
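The advisory nature of that field can be sketched as a routing decision. Everything here is hypothetical naming: `try_decrypt` stands in for the middlebox's decryption of field 3 (returning `None` on failure), and sending an unknown key to the default backend is an assumption, since the post does not say what happens in that case:

```python
from typing import Callable, Dict, Optional


def pick_backend(encrypted_expected_key: bytes,
                 try_decrypt: Callable[[bytes], Optional[bytes]],
                 routes: Dict[bytes, str],
                 default_backend: str) -> str:
    """Route by the expected server public key, if it can be read at all."""
    expected_server_pk = try_decrypt(encrypted_expected_key)
    if expected_server_pk is None:
        # Could not decrypt: treat the client as accepting any server.
        # The client validates the server's key itself later on.
        return default_backend
    # Unknown key: fall back to the default (an assumption of this sketch).
    return routes.get(expected_server_pk, default_backend)


routes = {b"A" * 32: "node-a.internal:4833"}
print(pick_backend(b"opaque", lambda blob: b"A" * 32, routes, "node-default:4833"))
```

Note that the middlebox never sees a secret key of either endpoint, so a wrong routing decision can at worst make the handshake fail.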
The server will reply to the hello message with a challenge, here is what it looks like:
- Server –> Client
- Overall size: 176 bytes
- Algorithm – crypto_box (sodium)
Field # | Size | Content | Encrypted using |
1 | 32 | Server’s session public key | Plain text |
2 | 32 | Server’s long term public key | Client’s session public key + Server’s session secret key |
3 | 16 | MAC for field 2 | |
4 | 24 | Nonce for field 2 | |
5 | 32 | Client’s session public key | Client’s session public key + Server’s long term secret key |
6 | 16 | MAC for field 5 | |
7 | 24 | Nonce for field 5 | |
Here we are starting to see some more interesting details. The server is sending its session public key, to complete the key exchange between the client and server. As before, this is a transient value, generated on a per connection basis and has no relation to the actual long term key pair. There is nothing that you can figure out from the plain text public key, so we don’t mind sending it.
We send the long term key in field 2, on the other hand, encrypted. Why are we encrypting this? To prevent an outside observer from figuring out what server we are using (if we are using a middlebox).
The idea is that once we exchange the public keys for the session key pairs for both sides, we’ll encrypt the long term public key using them and let the client know. We’ll also encrypt the client’s session public key. This time, however, we’ll encrypt using the server’s long term secret key as well as the client’s session public key. The idea is that the server takes a value that the client chose (the client’s session public key, which is also transient) and encrypts it with Authenticated Encryption. If the client can successfully decrypt that, we know that the session’s public key was encrypted using the long term secret key. In this manner, we prove that we own the long term key pair.
The client, upon receiving this message, will do the following:
- Decrypt field 2 – verifying its authenticity using the MAC in field 3.
- Decrypt field 5 – using the long term public key we got from the server in field 2.
Assuming that those two decryption procedures were successful, we can compare the plain text value of field 5 with the session public key that the client sent in its hello message. If they are the same, we know that the server has the long term key pair (both public and secret). If it didn’t have the secret portion of the key, the server would be unable to encrypt the value in a way that we could read. The fact that it does this encryption with the client’s session key (which differs on each call) means that you can’t do replay / caching or any such tricks.
The last thing that the client needs to do now is to figure out if the long term public key they got from the server is a match to the public key that they need. That can be part of a TOFU system, or we can reject the connection if the public key does not match.
- Client –> Server
- Overall size: 144 bytes
- Algorithm – crypto_box (sodium)
Field # | Size | Content | Encrypted using |
1 | 32 | Client’s long term public key | Server’s session public key + client’s session secret key |
2 | 16 | MAC for field 1 | |
3 | 24 | Nonce for field 1 | |
4 | 32 | Server’s session public key | Server’s session public key + client’s long term secret key |
5 | 16 | MAC for field 4 | |
6 | 24 | Nonce for field 4 | |
At this point, the same pattern applies. The server will decrypt the client’s long term public key from field 1 using the session keys. It will then use its own secret session key in conjunction with the client’s long term public key to decrypt the value in field 4. The act of successfully decrypting the value in field 4 serves as proof that the client indeed holds the secret key for the long term value. At the end of processing this message, the server knows who the client is and has verified that they possess the relevant key pair.
From there, we are left with the simple act of doing key exchange using the session keys. Now both client and server know who the other side is and have agreed on the cryptographic keys that they will use to communicate with one another.
I mentioned that I’m not an expert cryptographer, right? The design of this protocol isn’t innovative in any way. It takes heavily from the design of TLS 1.3, the most successful cryptographic protocol on the planet, which was designed by people who actually know their craft here. What I’m mostly doing here is making assumptions, because I can:
- I don’t need PKI infrastructure, the communicating nodes all have a separate channel to establish trust by distributing the public keys.
- There is no need for negotiation between the client & server, we fixed all the parameters at the protocol version.
- The messages exchanged are all pretty small, that means that we can put them all on a single packet.
Most importantly of all, the entire system relies on local state, there is absolutely nothing here that relies on or uses any external party. That is kind of amazing, when you think about it, and obviously one of the major reasons why I’m doing this exercise.
The tables and description above make it hard to see exactly what is going on, even if they give all the details. I find that code samples make more sense. Here is some sample code, showing how the server works:
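What follows is a minimal sketch of the server side, not the post's original listing. It works on byte strings rather than sockets. The `seal` and `open_box` callables are stand-ins modelled loosely on libsodium's `crypto_box_detached` and `crypto_box_open_detached`, and are passed in so that the sketch stays independent of any particular binding. Message sizes are the sums of the field sizes in the tables, using 32 bytes for every public key (176 bytes for the challenge, 144 for the client's reply); the version value and byte order are assumptions.

```python
import hmac
import os
import struct
from typing import Callable, Set, Tuple

HELLO = struct.Struct("<I32s32s16s24s")               # 108 bytes
CHALLENGE = struct.Struct("<32s32s16s24s32s16s24s")   # 176 bytes
RESPONSE = struct.Struct("<32s16s24s32s16s24s")       # 144 bytes
PROTOCOL_VERSION = 1                                  # assumed value
NONCE_SIZE = 24

# seal(plaintext, nonce, their_public, my_secret) -> (ciphertext, mac)
Seal = Callable[[bytes, bytes, bytes, bytes], Tuple[bytes, bytes]]
# open_box(ciphertext, mac, nonce, their_public, my_secret) -> plaintext,
# raising an exception if the MAC does not verify.
Open = Callable[[bytes, bytes, bytes, bytes, bytes], bytes]


def build_challenge(hello: bytes, session_pk: bytes, session_sk: bytes,
                    long_term_pk: bytes, long_term_sk: bytes,
                    seal: Seal) -> Tuple[bytes, bytes]:
    """Read the hello and answer with the challenge message."""
    version, client_session_pk, _expected, _mac, _nonce = HELLO.unpack(hello)
    if version != PROTOCOL_VERSION:
        raise ValueError("unsupported protocol version")
    # Field 2: our long term public key, hidden from observers by the session keys.
    n2 = os.urandom(NONCE_SIZE)
    c2, m2 = seal(long_term_pk, n2, client_session_pk, session_sk)
    # Field 5: the client's own session public key, sealed with our LONG TERM
    # secret key. Only the real owner of the long term key can produce this.
    n5 = os.urandom(NONCE_SIZE)
    c5, m5 = seal(client_session_pk, n5, client_session_pk, long_term_sk)
    return CHALLENGE.pack(session_pk, c2, m2, n2, c5, m5, n5), client_session_pk


def verify_response(response: bytes, client_session_pk: bytes,
                    session_pk: bytes, session_sk: bytes,
                    authorized_clients: Set[bytes], open_box: Open) -> bytes:
    """Validate the client's reply and return its long term public key."""
    c1, m1, n1, c4, m4, n4 = RESPONSE.unpack(response)
    client_long_term_pk = open_box(c1, m1, n1, client_session_pk, session_sk)
    echoed = open_box(c4, m4, n4, client_long_term_pk, session_sk)
    if not hmac.compare_digest(echoed, session_pk):
        raise ValueError("client did not prove ownership of its key")
    if client_long_term_pk not in authorized_clients:
        raise PermissionError("client key is not in the authorized list")
    return client_long_term_pk
```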
The server will read the first message and then send a reply, the client will respond to the challenge, and the server will read the data and validate it. This is meant to be pseudo code, mind you, not real code. Just to get you to figure out how this interacts. Here is the client side of things:
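Again as a sketch rather than the post's original listing: the client's half of the exchange, under the same assumptions as before. `seal` and `open_box` are stand-ins modelled loosely on libsodium's `crypto_box_detached` and `crypto_box_open_detached`, and the message sizes are the sums of the table fields with 32-byte public keys. Passing `pinned_server_pk=None` models trust on first use: the caller gets the server's long term key back and can record it for next time.

```python
import hmac
import os
import struct
from typing import Callable, Optional, Tuple

CHALLENGE = struct.Struct("<32s32s16s24s32s16s24s")   # 176 bytes
RESPONSE = struct.Struct("<32s16s24s32s16s24s")       # 144 bytes
NONCE_SIZE = 24

# seal(plaintext, nonce, their_public, my_secret) -> (ciphertext, mac)
Seal = Callable[[bytes, bytes, bytes, bytes], Tuple[bytes, bytes]]
# open_box(ciphertext, mac, nonce, their_public, my_secret) -> plaintext,
# raising an exception if the MAC does not verify.
Open = Callable[[bytes, bytes, bytes, bytes, bytes], bytes]


def verify_challenge(challenge: bytes, session_pk: bytes, session_sk: bytes,
                     open_box: Open,
                     pinned_server_pk: Optional[bytes] = None
                     ) -> Tuple[bytes, bytes]:
    """Check the server's challenge; return (server session pk, server long term pk)."""
    server_session_pk, c2, m2, n2, c5, m5, n5 = CHALLENGE.unpack(challenge)
    # Field 2 tells us which long term key the server claims to own.
    server_long_term_pk = open_box(c2, m2, n2, server_session_pk, session_sk)
    # Field 5 only opens if it was sealed with the matching long term secret.
    echoed = open_box(c5, m5, n5, server_long_term_pk, session_sk)
    if not hmac.compare_digest(echoed, session_pk):
        raise ValueError("server did not prove ownership of its key")
    if pinned_server_pk is not None and not hmac.compare_digest(
            server_long_term_pk, pinned_server_pk):
        raise PermissionError("server key does not match the expected key")
    return server_session_pk, server_long_term_pk


def build_response(server_session_pk: bytes, session_sk: bytes,
                   long_term_pk: bytes, long_term_sk: bytes,
                   seal: Seal) -> bytes:
    """Answer the challenge by proving ownership of our long term key."""
    n1 = os.urandom(NONCE_SIZE)
    c1, m1 = seal(long_term_pk, n1, server_session_pk, session_sk)
    n4 = os.urandom(NONCE_SIZE)
    c4, m4 = seal(server_session_pk, n4, server_session_pk, long_term_sk)
    return RESPONSE.pack(c1, m1, n1, c4, m4, n4)
```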
I hope that the code sample makes it clearer what is going on. I haven’t mentioned the key generation for the follow up communication. All I talked about here is the ability to set up a key exchange after validating the keys from both sides. At the same time, the long term keys aren’t used for anything except authentication, so we get perfect forward secrecy. The idea with the middlebox key also allows us to natively support more complex routing and topologies, which is nice (but also probably YAGNI for this exercise).
I would love to get your feedback and thoughts about this idea.
More posts in "A PKI-less secure communication channel" series:
- (12 Oct 2021) Using TLS
- (08 Oct 2021) Error handling at the protocol level
- (07 Oct 2021) Implementing the record stream
- (06 Oct 2021) Coding the handshake
- (04 Oct 2021) The record layer
- (01 Oct 2021) design
Comments
You should check out double ratchet: https://signal.org/docs/specifications/doubleratchet/ Provides everything you need and - as building upon a messaging pattern - can be used for http / rest / rpc like communications with strong security. in general it's even better than "just" TLS.
you aren't using symmetric encryption for communication after handshake is done?
I am a bit at a loss why you would want to do this? Running and operating SSL is a solved problem nowadays. If you have secure back-channels why not just use self-signed and manually trusted SSL certificates?
Inventing your own protocol is almost always a bad idea. All this might be super exciting and cool - but good luck getting this certified and audited for use in security conscious enterprise settings :/. Let alone all the millions of hours of hardening that our normal crypto protocols have already received and still we end up with vulnerabilities and exploits every few years :/
Really - don't do it. There are ways to make normal PKI work that are a lot less involved and error prone than trying to do this.
Rafal,
All of this is just the handshake, not the actual encryption.
Daniel,
From the post:
I do that all the time, it is fun, educational and gives me a lot of insight into exactly what is going on under the hood.
The exercise aside, there are also the following reasons why I think something like this would be better to use than TLS.
Those are just the things off the top of my head, by the way.
At the same time, TLS is _everywhere_, pretty much every admin has at least passing familiarity with it and how to use it. The system has been battle tested and any issues with it are not my fault :-)
I'm not intending to replace TLS, but I think that building on top of well known and stable primitives, following the guidelines that were established by actual cryptographers isn't that big a deal. I'm not inventing my own crypto. I'm doing secure auth using a very similar manner to how TLS does that, just ripping aside the things that I don't need. In the same sense, the actual stream of data will be encrypted using the same algorithm and approach that you'll see in TLS.
And just to be clear, actually deploying something like that would absolutely require a review by actual experts in the field, no mistake on that.
Daniel,
That is relevant if you need to exchange _messages_, but in this case, we are talking about a streaming context. There is no need to change the keys on each message.
Hi Ayende,
I'm aware that this is for a streaming context. Even if you're using double ratchet, it's your choice when to rotate a key (like a "new" message). The concept only addresses a mechanism for secure key exchange and derivation with strong guarantees on future and past security. Technically you don't need to rotate at any time, so you can stay symmetric with the same key all the time.
If you alter the key derivation of double ratchet a little bit you can get to a point where you can even establish a PKI based scenario and opt-in to a verification of the remote peer's identity using a cert chain validation. By default it's not part of the protocol as X25519 keys without any chains are used.
Regarding the fact that TLS at HTTP layer is just the surrounding stream on the actual application protocol, you can indeed treat HTTP as messaging based protocol (also like any other RPC protocol, like gRPC). One party sends a request, another sends a response which are transmitted on a possibly shared channel (starting with http/2). The possibility that a http request or response might span multiple gigabytes of data doesn't change the fact it's still some sort of message.
In case of ravendb a message would be easiest seen as a single request and response (just like any other HTTP request today).
The ciphers can be used across multiple parallel tcp streams, so there's little to no overhead when reusing these (depending on how multichannel / session lifetime should be designed). This means you can resume a session with zero roundtrips for cipher setup on a second, third, etc. channel.
Hi Ayende,
The advice that Daniel Hölbling-Inzko gave is solid. TLS works. The problem you want to solve is establishing trust with self-signed certs, not creating a new protocol.
Jacques,
Again, this is an exercise in design, which was quite interesting and fun to do.
The problem with TLS is that it does too much. You can probably generate a bare bones self signed certificate that wouldn't cause any external activity, but that is not certain.
The problem with actually using TLS, however, is that trust is now at the system level settings, which is hard to manage. You can override that on a case by case basis, usually, but that isn't so fun to do.
You'll also need to handle all the usual caveats of TLS (algorithm negotiation, etc). The good part, however, is that you can usually assume that there is already an implementation for your needs everywhere.
This configuration resembles the WireGuard VPN, where the server has its private key and each client's public key, and clients have the server's public key as their configuration. See: https://www.wireguard.com/#simple-network-interface
Carlos,
That isn't an accident, yes. I am very much a fan of Jason's work.