About a month ago I wrote about a particular issue that we wanted to resolve. RavenDB uses X509 certificates for authentication. These are highly secure and are a good answer for our clients who need to host sensitive information or work in highly regulated environments. However, certificates have a problem: they expire. In particular, if you are following common industry best practices, you’ll replace your certificates every 2 – 3 months. In fact, the common setup of using RavenDB with Let’s Encrypt will do just that. Certificates will be replaced on the fly by RavenDB without any administrator involvement.
If you are running inside a single cluster, that isn’t something you need to worry about. RavenDB will coordinate the certificate update between the nodes in such a way that it won’t cause any disruption in service. However, it is pretty common in RavenDB to have multi-cluster topologies, either because you are deployed in a geo-distributed manner or because you are running complex topologies (edge processing, multiple cooperating clusters, etc.). That means that when cluster A replaces its certificate, we need to have a good story for cluster B still allowing it access, even though the certificate has changed.
I outlined our thinking in the previous post, and I got a really good hint: 13xforever suggested that we look at HPKP (HTTP Public Key Pinning) as another way to handle this. HPKP is a security technology that was widely used, ran into issues and was replaced (mostly by certificate transparency). With this hint, I started to investigate this further. Here is what I learned:
- A certificate is composed of some metadata, the public key and the signature of the issuer (skipping a lot of stuff here, obviously).
- Keys for certificates can be either RSA or ECDSA. In both cases, there is a 1:1 relationship between the public and private keys (in other words, each public key has exactly one private key).
Given these facts, we can rely on the key pair, rather than on the certificate itself, to avoid the issues with certificate expiration, distributing new certificates, etc.
Whenever a cluster needs a new certificate, it will use the same private/public key pair to generate the new certificate. Because the public key is the same (and we verify that the client has the private key during the handshake), even if the certificate itself changed, we can verify that the other side knows the actual secret, the private key.
In other words, we slightly changed the trust model in RavenDB. From trusting a particular certificate, we trust that certificate’s private key. That is what grants access to RavenDB. In this way, when you update the certificate, as long as you keep the same key pair, we can still authenticate you.
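To make the difference between the two trust models concrete, here is a minimal sketch in Python. It is not RavenDB's code: the `Cert` record is a toy stand-in for a parsed X509 certificate, `public_key` stands in for the encoded public key bytes, and `thumbprint`, `key_pin` and `is_trusted` are names I made up for illustration. It also assumes the TLS handshake has already proven that the other side holds the matching private key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy stand-in for an X509 certificate (hypothetical, not a RavenDB type)."""
    subject: str
    public_key: bytes  # stands in for the encoded public key of the certificate
    serial: int
    not_after: str


def thumbprint(cert: Cert) -> str:
    # Hash over the whole certificate: changes every time it is re-issued.
    data = f"{cert.subject}|{cert.serial}|{cert.not_after}|".encode() + cert.public_key
    return hashlib.sha256(data).hexdigest()


def key_pin(cert: Cert) -> str:
    # Hash over the public key only: stable as long as the key pair is reused.
    return hashlib.sha256(cert.public_key).hexdigest()


def is_trusted(presented: Cert, trusted_pins: set) -> bool:
    # Assumes the handshake already proved possession of the private key.
    return key_pin(presented) in trusted_pins


original = Cert("cluster-a", b"key-pair-1", serial=1, not_after="2019-06-01")
renewed = Cert("cluster-a", b"key-pair-1", serial=2, not_after="2019-09-01")
stranger = Cert("cluster-a", b"key-pair-2", serial=3, not_after="2019-09-01")

trusted_pins = {key_pin(original)}

print(thumbprint(original) == thumbprint(renewed))  # False: the certificate changed
print(is_trusted(renewed, trusted_pins))            # True: same key pair
print(is_trusted(stranger, trusted_pins))           # False: different key pair
```

Trusting the thumbprint would lock out the renewed certificate, while trusting the key pin keeps it working and still rejects a certificate built on a different key pair.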
This feature means that you can drastically reduce the amount of work that an admin has to do, and it leads to a system that you set up once and that just keeps working.
There are some fine details that we still had to deal with, of course. An admin may issue a certificate and want it to expire, so just letting the user generate a new certificate with the same private key isn’t going to work for us. Instead, RavenDB validates that the chain of signatures on the certificate is the same. To be more exact, it verifies that the original (trusted by the admin) certificate and the new certificate that was just presented to us were signed by the same chain of issuers, identified by their public key hashes.
In this way, if the original issuer gave you a new certificate, it will just work. If you generate a new certificate on your own with the same key pair, we’ll reject that. The model that we have in mind here is trusting a driver’s license. If you have an updated driver’s license from the same source, that is considered just as valid as the original one on file. If the driver’s license is from Toys R Us, not so much.
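Here is a rough sketch of that check, again in Python and again not RavenDB's implementation. A chain is modeled as a list of toy `Cert` records ordered leaf first, root last, and `pin`, `chain_pins` and `accept_renewal` are hypothetical names. The sketch only compares public key hashes. It assumes the signatures along each chain were already verified cryptographically, which a real implementation must do first.

```python
import hashlib
from typing import List, NamedTuple


class Cert(NamedTuple):
    """Toy stand-in for an X509 certificate (hypothetical, not a RavenDB type)."""
    subject: str
    public_key: bytes  # stands in for the encoded public key of the certificate


def pin(cert: Cert) -> str:
    # Hash of the public key only, so it survives re-issuing with the same key.
    return hashlib.sha256(cert.public_key).hexdigest()


def chain_pins(chain: List[Cert]) -> List[str]:
    return [pin(cert) for cert in chain]


def accept_renewal(trusted_chain: List[Cert], presented_chain: List[Cert]) -> bool:
    # Chains are ordered leaf first, root last.
    # Assumes signatures along both chains were already validated.
    if not trusted_chain or not presented_chain:
        return False
    same_key = pin(trusted_chain[0]) == pin(presented_chain[0])
    same_issuers = chain_pins(trusted_chain[1:]) == chain_pins(presented_chain[1:])
    return same_key and same_issuers


root = Cert("Root CA", b"root-key")
intermediate = Cert("Intermediate CA", b"intermediate-key")

original = [Cert("cluster-a", b"leaf-key"), intermediate, root]
renewed = [Cert("cluster-a", b"leaf-key"), intermediate, root]
self_issued = [Cert("cluster-a", b"leaf-key")]
other_issuer = [Cert("cluster-a", b"leaf-key"), Cert("Toys R Us CA", b"toy-key")]

print(accept_renewal(original, renewed))       # True: same key, same issuers
print(accept_renewal(original, self_issued))   # False: no issuer chain at all
print(accept_renewal(original, other_issuer))  # False: different issuer key
```

Comparing issuer keys rather than issuer certificates is a deliberate choice in this sketch: the issuers can renew their own certificates too, and the check still passes as long as their key pairs stay the same.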
Naturally, all such automatic certificate updates are going to be logged to the audit log, and we’ll show the updated certificates in the management studio as well.
As usual, we welcome your feedback. The previous version of this post got us a great feature, after all.