RavenDB uses X509 certificates for many purposes. One of them is authentication using client certificates. This creates a highly secure authentication method with quite a lot to recommend it. But it does create a problem. Certificates, by their very nature, expire. Furthermore, certificates usually have relatively short expiration times. For example, Let’s Encrypt certificates expire in 3 months. We don’t have to use the same cert for client authentication that we use for server authentication, but doing so creates a nice symmetry and simplifies the job of the admin.
Except that on every cert replacement (3 months, remember?) the admin now needs to go to each of the systems that we talk to and update its list of allowed certificates. One of the reasons behind this 3 month deadline is to ensure that you’ll automate the process of cert replacement, so it is obvious that we need a way to automate the process of updating third parties about cert replacements.
Our current design goes like this:
- This design applies only to the nodes for which we authenticate using our own server certificate (thus excluding Pull Replication, for example).
- Keep track of all the third-party RavenDB instances that we talk to.
- Whenever we have an updated certificate, contact each of those instances and let them know about the cert change. This is done using a request that authenticates with the old certificate and provides the new one.
- The actual certificate replacement is delayed until all of those endpoints have been reached or until the expiration of the current certificate is near.
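The "wait until everyone knows, or until we run out of time" rule above can be sketched roughly as follows. This is only an illustration in Python, not RavenDB code: the `PendingReplacement` class, its fields, and the 3 day `safety_margin` are all assumptions made for the example (the design only says the expiration is "near").

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Set


@dataclass
class PendingReplacement:
    """Hypothetical bookkeeping for one in-flight certificate replacement."""
    new_thumbprint: str
    old_expiration: datetime
    destinations: Set[str]                      # instances we must notify
    notified: Set[str] = field(default_factory=set)

    def mark_notified(self, destination: str) -> None:
        # Only record destinations we actually track.
        if destination in self.destinations:
            self.notified.add(destination)

    def ready_to_switch(self, now: datetime,
                        safety_margin: timedelta = timedelta(days=3)) -> bool:
        # Switch once every destination has been reached, or when the
        # current certificate is about to expire (the margin is an assumption).
        all_reached = self.notified >= self.destinations
        near_expiry = now >= self.old_expiration - safety_margin
        return all_reached or near_expiry
```

The point of the second condition is that an unreachable destination cannot block the replacement forever; at some point we have to start using the new certificate anyway.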
Things to consider:
- Certificate updates are written to the audit log, so you can always track the chain of updates backward.
- Obviously, a certificate can only register a replacement as long as it is active.
- The updated certificate will have the exact same permissions as the current certificate.
- A certificate can only ever have one replacement certificate at a time. We allow it to register a replacement multiple times, but each newly registered cert replaces the previously registered one.
- A certificate cannot replace a certificate that it registered if that certificate has already registered an updated certificate of its own.
In other words, consider certificate A that is registered in a RavenDB instance:
- Cert A can ask the RavenDB instance to register updated certificate B, at which point users can connect to the RavenDB instance using either A or B, until certificate A expires. This is to ensure that during the update process, we won’t see some nodes that we need to talk to using cert A and some nodes that we need to talk to using cert B.
- Cert A can ask the RavenDB instance to register updated certificate C, at which point, certificate B is removed and is no longer valid. This is done in case we failed to update the certificate and need to update with a different certificate.
- Cert C can then ask the RavenDB instance to register updated certificate D. At this point, certificate A becomes invalid and can no longer be used. Only certs C and D are now active.
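To make the A / B / C / D walkthrough concrete, here is a toy in-memory model of those rules in Python. It is a sketch under my own assumptions, not the actual RavenDB implementation: `CertRegistry`, `CertRecord`, and every field name are hypothetical, certificates are identified by plain names instead of thumbprints, and "active" simply means "registered and not yet expired".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class CertRecord:
    name: str
    expires: datetime
    permissions: frozenset
    replaces: Optional[str] = None      # cert that registered this one
    replaced_by: Optional[str] = None   # replacement this cert registered


class CertRegistry:
    """Toy model of the destination instance's list of allowed certs."""

    def __init__(self) -> None:
        self.certs: Dict[str, CertRecord] = {}
        self.audit_log: List[Tuple[datetime, str, str]] = []

    def add(self, name: str, expires: datetime, permissions) -> None:
        self.certs[name] = CertRecord(name, expires, frozenset(permissions))

    def is_active(self, name: str, now: datetime) -> bool:
        rec = self.certs.get(name)
        return rec is not None and now < rec.expires

    def register_replacement(self, current: str, new_name: str,
                             new_expires: datetime, now: datetime) -> None:
        # Only an active certificate may register a replacement.
        if not self.is_active(current, now):
            raise PermissionError(f"{current} is not an active certificate")
        cur = self.certs[current]
        # A second registration discards the earlier replacement (B in the text).
        if cur.replaced_by is not None:
            self.certs.pop(cur.replaced_by, None)
        # When a replacement registers its own replacement, the cert that
        # originally registered it is retired (A in the text).
        if cur.replaces is not None:
            self.certs.pop(cur.replaces, None)
            cur.replaces = None
        # The new cert inherits exactly the current cert's permissions.
        self.certs[new_name] = CertRecord(new_name, new_expires,
                                          cur.permissions, replaces=current)
        cur.replaced_by = new_name
        self.audit_log.append((now, current, new_name))
```

In this model, the rule that a certificate cannot replace a replacement that has already moved on falls out of retiring the predecessor: once C registers D, A is gone and any request authenticated with A is rejected.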
More things to consider:
- Certain certificates, such as the ones exposing Pull Replication, are likely going to be used by many clients. I’m not sure if we should allow certificate replacement there. Given that we usually won’t use the server cert for authentication in Pull Replication, I don’t see that as a problem.
- The certificate update process will be running on only a single node in the cluster, to avoid concurrency issues.
- We’ll provide a way for the admin to purge all expired certificates (although, with one update every 3 months, I don’t expect there to be many).
- We are considering limiting this to non-admin certificates only, so you will not be able to update a certificate in an automated manner if it has admin privileges. I’m not sure if this is a security feature or a feel good feature.
- We’ll likely provide administrator notification that this update has happened on the destination node, and that might be enough to allow updating of admin certificates.
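Two of the points above, the purge of expired certificates and the possible restriction on admin certificates, are simple enough to sketch in Python. As before, this is an illustration only: `purge_expired`, `may_auto_update`, and the `allow_admin_updates` switch are hypothetical names, and whether admin certificates will be updatable at all is still an open question in the design.

```python
from datetime import datetime
from typing import Dict


def purge_expired(certs: Dict[str, datetime], now: datetime) -> Dict[str, datetime]:
    """Return only the certificates (thumbprint -> expiration) still valid at `now`."""
    return {thumbprint: expires
            for thumbprint, expires in certs.items()
            if expires > now}


def may_auto_update(is_admin: bool, allow_admin_updates: bool = False) -> bool:
    """Non-admin certs can always be auto-updated; admin certs only if
    the (hypothetical) policy switch allows it."""
    return (not is_admin) or allow_admin_updates
```

With the default policy, an admin certificate would still have to be replaced by hand, while everything else rolls over automatically.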
Any feedback you have would be greatly appreciated.