When it comes to security, the typical question isn’t whatever they are after you but how much. I love this paper on threat modeling, and I highly recommend it. But sometimes, you have information that you just don’t want to have. In other words, you want to store information inside of the database, but without the database or application being able to read said information without a key supplied by the user.
For example, let’s assume that we need to store the credit card information of a customer. We need to persist this information, but we don’t want to know it. We need something more from the user in order to actually use it.
The point of this post isn’t actually to talk about how to store credit card information in your database, instead it is meant to walk you through an approach in which you can keep data about a user that you can only access in the context of the user.
In terms of privacy, that is a very important factor. You don’t need to worry about a rogue DBA trawling through sensitive records or be concerned about a data leak because of an unpatched hole in your defenses. Furthermore, if you are carrying sensitive information that a third party may be interested in, you cannot be compelled to give them access to that information. You literally can’t, unless the user steps up and provide the keys.
Note that this is distinctly different (and weaker) than end to end encryption. With end to end encryption the server only ever sees encrypted blobs. With this approach, the server is able to access the encryption key with the assistance of the user. That means that if you don’t trust the server, you shouldn’t be using this method. Going back to the proper threat model, this is a good way to ensure privacy for your users if you need to worry about getting a warrant for their data. Basically, consider this as one of the problems this is meant to solve.
When the user logs in, they have to use a password. Given that we aren’t storing the password, that means that we don’t know it. This means that we can use that as the user’s personal key for encrypting and decrypting the user’s information. I’m going to use Sodium as the underlying cryptographic library because that is well known, respected and audited. I’m using the Sodium.Core NuGet package for my code samples. Our task is to be able to store sensitive data about the user (in this case, the credit card information, but can really be anything) without being able to access it unless the user is there.
A user is identified using a password, and we use Argon2id to create the password hash. This ensures that you can’t brute force the password. So far, this is fairly standard. However, instead of asking Argon2 to give us a 16 bytes key, we are going to ask it to give us a 48 bytes key. There isn’t really any additional security in getting more bytes. Indeed, we are going to consider only the first 16 bytes that were returned to us as important for verifying the password. We are going to use the remaining 32 bytes as a secret key. Let’s see how this looks like in code:
Here is what we are doing here. We are getting 48 bytes from Argon2id using the password. We keep the first 16 bytes to authenticate the user next time. Then we generate a random 256 bits key and encrypt that using the last part of the output of the Argon2id call. The function returns the generated config and the encryption key. You can now encrypt data using this key as much as you want. But while we assume that the CryptoConfig is written to a persistent storage, we are not keeping the encryption key anywhere but memory. In fact, this code is pretty cavalier about its usage. You’ll typically store encryption keys in locked memory only, wipe them after use, etc. I’m skipping these steps here in order to get to the gist of things.
Once we forget about the encryption key, all the data we have about the user is effectively random noise. If we want to do something with it, we have to get the user to give us the password again. Here is what the other side looks like:
We authenticate using the first 16 bytes, then use the other 32 to decrypt the actual encryption key and return that. Without the user’s password, we are blocked from using their data, great!
You’ll also notice that the actual key we use is random. We encrypt it using the key derived from the user’s password but we are using a random key. Why is that? This is to enable us to change passwords. If the user want to change the password, they’ll need to provide the old password as well as the new. That allows us to decrypt the actual encryption key using the key from the old password and encrypt it again with the new one.
Conversely, resetting a user’s password will mean that you can no longer access the encrypted data. That is actually a feature. Leaving aside the issue of warrants for data seizure, consider the case that we use this system to encrypt credit card information. If the user reset their password, they will need to re-enter their credit card. That is great, because that means that even if you managed to reset the password (for example, by gaining access to their email), you don’t get access tot he sensitive information.
With this kind of system in place, there is one thing that you have to be aware of. Your code needs to (gracefully) handle the scenario of the data not being decryptable. So trying to get the credit card information and getting an error should be handled and not crash the payment processing system . It is a different mindset, because it may violate invariants in the system. Only users with a credit card may have a pro plan, but after a password reset, they “have” a credit card, in the sense that there is data there, but it isn’t useful data. And you can’t check, unless you had the user provide you with the password to get the encryption key.
It means that you need to pay more attention to the data model you have. I would suggest not trying to hide the fact that the data is encrypted behind a lazily decryption façade but deal with it explicitly.