Using client side encrypted fields in RavenDB
Sometimes, you need to hold on to data that you really don’t want to have access to. A great example may be that your user will provide you with their theme color preference. I’m sure you can appreciate the sensitivity of preferring a light or dark theme when working in the IDE.
At any rate, you find yourself in an interesting situation: you have a piece of data that you don’t want to know about. In other words, the threat model we have to work with is that we protect the data from a malicious administrator. This may seem to be a far-fetched scenario, but just today I was informed that my email was inside the 200M users leak from Twitter. Having an additional safeguard ensures that even if someone manages to lay their hands on your database, there is little that they can do with it.
RavenDB supports Transparent Data Encryption. In other words, the data is encrypted on disk and will only be decrypted while there is an active transaction looking at it. That is a server-side operation, and there is a single key (not actually true, but close enough) that is used for all the data in the database. For this scenario, that is not good enough. We need to use a different key for each user. And even if we have all the data and the server’s encryption key, we should still not be able to read the sensitive data.
How can we make this happen? The idea is that we want to encrypt the data on the client, with the client’s own key, which is never sent to the server. What the server sees is basically an encrypted blob. The question is, how can we make it work as easily as possible? Let’s look at the API that we use to get it working:
As you can see, we indicate that the value is encrypted using the Encrypted<T> wrapper. That class is a very simple wrapper, with all the magic actually happening in the assigned JSON converter. Before we look into how that works, allow me to present how this document looks to RavenDB:
As you can see, we don’t actually store the data as is. Instead, we have an object that stores the encrypted data as well as the authentication tag. The above document was generated from the following code:
The JSON document holds the data we have, but without knowing the key, we do not know what the encrypted value is. The actual encrypted value is composed of three separate (and quite important) fields:
- Tag – the authentication tag that ensures that the value we decrypt is indeed the value that was encrypted
- Data – this is the actual encrypted value. Note that the size of the value is far larger than the value we actually encrypted. We do that to avoid leaking the size of the value.
- Nonce – a value that ensures that even if we encrypt similar values, we won’t end up with an identical output. I talk about this at length here.
Just storing the data in the database is usually not sufficient, mind. Sure, with what we have right now, we can store and read the data back, safe from data leaks on the server side. However, we have another issue: we want to be able to query the data.
In other words, the question is how, without telling the database server what the value is, can we query for matching values? The answer is that we need to provide a value during the query that would match the value we stored. That is typically fairly obvious & easy. But it runs into a problem when we have cryptography. Since we are using a Nonce, each time we encrypt the value, we’ll get a different encrypted value. How can we then query for the value?
The answer to that is something called DAE (deterministic authenticated encryption). Here is how it works: instead of generating the nonce using random values and ensuring that it is never repeated, we’ll go the other way. We’ll generate the nonce in a deterministic manner, by effectively taking a hash of the data we encrypt. That ensures that we’ll get a unique nonce for each unique value we encrypt. And it means that for the same value (and the same key), we’ll get the same encrypted output, which means that we can then query for that.
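To make the deterministic nonce idea concrete, here is a minimal sketch. It is written in Python with only the standard library, not the .NET code this post is about, and it makes a few assumptions of my own: the nonce is 24 bytes, the keyed mode of BLAKE2b stands in for the hash, and the demo keys and the `deterministic_nonce` name are hypothetical.

```python
import hashlib

def deterministic_nonce(siv_key: bytes, plaintext: bytes, size: int = 24) -> bytes:
    """Derive the nonce from the plaintext itself using keyed BLAKE2b.

    The 24-byte size is an assumption for this sketch, not something the
    post specifies.
    """
    return hashlib.blake2b(plaintext, key=siv_key, digest_size=size).digest()

# Hypothetical demo keys; real keys must come from a secure source.
key_a = bytes(range(32))
key_b = bytes(range(32, 64))

same_1 = deterministic_nonce(key_a, b'"Dark"')
same_2 = deterministic_nonce(key_a, b'"Dark"')
other_value = deterministic_nonce(key_a, b'"Light"')
other_key = deterministic_nonce(key_b, b'"Dark"')
```

The same value under the same key always yields the same nonce, which is what makes equality queries possible. A different value, or the same value under a different key, yields an unrelated nonce.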
Here is an example of how we can use this from RavenDB:
And with that explanation out of the way, let’s see the wiring we need to make this happen. Here is the JsonConverter implementation that makes this possible:
There is quite a lot that is going on here. This is a JsonConverter, which translates the in-memory data to what is actually sent over the wire for RavenDB.
On read, there isn’t much going on: we pull the individual fields from the JSON and pass them to the DeterministicEncryption class, which we’ll look at shortly. We get the plain text back, read the JSON we previously stored, and translate that back into a .NET object.
On write, things are slightly more interesting. We convert the object to a string, and then we write that to an in-memory stream. We ensure that the stream is always aligned on a 32-byte boundary (to avoid leaking the size). Without that step, you could distinguish between “Dark” and “Light” theme users simply based on the length of the encrypted value. We pass the data to the DeterministicEncryption class for actual encryption and build the encrypted value. I chose to use a complex object, but we could also put this into a single field just as easily.
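The padding step is easy to get subtly wrong, so here is a small sketch of one way to do it. This is Python, not the converter from this post, and the padding byte is my assumption: I pad the serialized JSON with trailing spaces, which a JSON parser ignores on the way back. The `pad_json` and `unpad_json` names are hypothetical.

```python
import json

BLOCK = 32  # alignment boundary in bytes, as described in the post

def pad_json(value, block: int = BLOCK) -> bytes:
    """Serialize to JSON and right-pad with spaces to a multiple of `block`."""
    raw = json.dumps(value).encode("utf-8")
    padded_len = -(-len(raw) // block) * block  # round up to the next multiple
    return raw.ljust(padded_len, b" ")

def unpad_json(padded: bytes):
    """Parse the padded bytes; trailing JSON whitespace is ignored."""
    return json.loads(padded.decode("utf-8"))

dark = pad_json("Dark")
light = pad_json("Light")
```

Both `dark` and `light` end up exactly 32 bytes long, so the length of the encrypted value no longer tells the two themes apart. Values longer than 32 bytes round up to 64, 96, and so on, which means the size still leaks in coarse 32-byte steps.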
With that in place, the last thing to understand is how we perform the actual encryption:
There is actually very little code here, which is pretty great. The first thing to note is that we have GetCurrentKey, which is a delegate you need to provide to find the current key. You can have a global key for the entire application or for the current user, etc. This key isn’t the actual encryption key, however. In the DerivedKeys function, we use the Blake2b algorithm to turn that 32-byte key into a 64-byte value. We then split this into two 32-byte keys. The idea is that we separate the domains: we have one key that is used for computing the SIV and another for the actual encryption.
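A rough sketch of that key split follows, again in Python rather than the .NET code discussed here. The context string and the `derive_keys` name are hypothetical, and using BLAKE2b in keyed mode over a fixed label is my assumption about how the 64-byte value might be produced.

```python
import hashlib
from typing import Tuple

def derive_keys(master_key: bytes) -> Tuple[bytes, bytes]:
    """Expand a 32-byte master key into a SIV key and an encryption key."""
    if len(master_key) != 32:
        raise ValueError("expected a 32-byte master key")
    # Hypothetical context label; any fixed, application-specific string works.
    okm = hashlib.blake2b(
        b"client-side-field-encryption",
        key=master_key,
        digest_size=64,
    ).digest()
    # First half is used for the SIV (nonce) computation, second half for encryption.
    return okm[:32], okm[32:]

siv_key, enc_key = derive_keys(bytes(range(32)))
```

Keeping the two keys separate means the value that is published as the nonce is never computed with the same key that drives the cipher.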
We use HMAC-Blake2b using the SIV key to compute the nonce of the value in a deterministic manner and then perform the actual encryption. For decryption, we go in reverse, but we don’t need to derive a SIV, obviously.
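To show the overall flow end to end, here is a toy round trip in Python. Please treat it strictly as an illustration: the Python standard library has no AEAD cipher, so I swap in a BLAKE2b-based keystream as a stand-in, which is not the cipher used in this post and is not something to use in production. Because the toy has no separate authentication tag, it checks integrity on decryption by recomputing the nonce, which differs from the code described above. All names and keys are hypothetical.

```python
import hashlib
import hmac

def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: keyed BLAKE2b over nonce || counter (stand-in cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "little")
        out += hashlib.blake2b(block_input, key=enc_key, digest_size=64).digest()
        counter += 1
    return bytes(out[:length])

def _siv(siv_key: bytes, plaintext: bytes) -> bytes:
    """Deterministic nonce: keyed hash of the plaintext (24 bytes assumed)."""
    return hashlib.blake2b(plaintext, key=siv_key, digest_size=24).digest()

def encrypt(siv_key: bytes, enc_key: bytes, plaintext: bytes):
    nonce = _siv(siv_key, plaintext)
    stream = _keystream(enc_key, nonce, len(plaintext))
    data = bytes(p ^ k for p, k in zip(plaintext, stream))
    return nonce, data

def decrypt(siv_key: bytes, enc_key: bytes, nonce: bytes, data: bytes) -> bytes:
    stream = _keystream(enc_key, nonce, len(data))
    plaintext = bytes(c ^ k for c, k in zip(data, stream))
    # Toy integrity check: the nonce must match the hash of what we decrypted.
    if not hmac.compare_digest(_siv(siv_key, plaintext), nonce):
        raise ValueError("decryption failed: value was modified or key is wrong")
    return plaintext

# Hypothetical demo keys.
siv_key = bytes(range(32))
enc_key = bytes(range(32, 64))

nonce_1, data_1 = encrypt(siv_key, enc_key, b'"Dark"')
nonce_2, data_2 = encrypt(siv_key, enc_key, b'"Dark"')
nonce_3, data_3 = encrypt(siv_key, enc_key, b'"Light"')
```

Encrypting the same value twice gives byte-for-byte identical output, so a server can match `(nonce_1, data_1)` against `(nonce_2, data_2)` without ever seeing the key or the plaintext. The `"Light"` value gets a different nonce and therefore a different keystream.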
With this in place, we have about 100 lines of code that add the ability to store client-side encrypted values and query them. Pretty neat, even if I say so myself.
Note that we can store the encrypted value inside of RavenDB, which the database has no way of piercing, and retrieve those values back as well as query them for equality. Other querying capabilities, such as range or prefix scans, are far more complex and tend to come with security implications that weaken the level of guarantees you can provide.
Doesn't it defeat the whole point of using a nonce?
I was wondering the same thing as Thomas - doesn't having a calculable nonce increase susceptibility to dictionary attacks, especially if you are using a global, rather than per user, key? In the global key case, any two users encrypting the same value would have both the same nonce and encrypted value. This would be eliminated in the per user key case, unless one user encrypted the same value multiple times, which doesn't seem very likely.
Not necessarily. If the attacker only knows that two values are the same (i.e. same nonce and hash), but doesn't know what that value is, then there's no additional information gained about the key that encrypted the value; this is because all you have is the information that the same value was encrypted twice.
Where the risk comes is if the same nonce is used for different values, because two encrypted values that both contain the same plaintext (the nonce) could reveal information about the key. That's less of a risk with modern encryption algorithms, but still a best practice to avoid.
In this case, I don't think there's any security risk. The only increased risk here is potentially some PII in comparing two users against each other and being able to group them, even if the value itself is unknown.
No, since the nonce for the same value is the same, the only thing that you leak is the equality of the values. You don't leak anything else.
In other words, given two identical values encrypted using the same key, you'll get the same output. But you can't do anything else with that.
Given two identical values encrypted with different keys, the outputs (and the nonces) will be different.
You are protected against the catastrophe of nonce reuse.
Note that this is pretty much mandatory if you want to be able to query on the data.
Chris, if you want to query the data, you have to deal with this issue, I'm afraid. Case in point, you consider the "theme selection" to be sensitive, and encrypt that using a global key. You then need to query "all users with Dark theme". At that point, you need to be able to do that. Without the key, you'll be able to find all the users with the same theme, but not know what it is. That assumes that you'll use the same key globally, of course, and that the sensitive information is repeated. I think that both of those are false, by the way, since most encrypted information is unique (SSN, Credit Card, etc). On the other hand, IP may be encrypted, and that would allow you to correlate requests (but not know which is which). That is just something that you have to do to ensure that you can handle queries on the data. Alternatively, you can just not use this and generate a random nonce, instead. You'll lose querying capabilities, but be more secure.
Stuart, FWIW, nonce reuse is an absolute catastrophe for modern encryption systems. Both AES-GCM and Chacha-Poly will die horribly in this case. The scheme I have in this post, however, is actually AES-GCM-SIV, basically. So it isn't something that I just came up with, and it is shown to be resilient to the problem, with the exception of being able to detect duplicate items. That is a desirable property on our end, of course.
Interesting, I knew that nonce reuse on different data could be leveraged, but I didn't realize it was that bad. Good to know!
And further to the point of nonce reuse on the same data, this is how MSSQL does client-encrypted searchable columns as well, I believe. Though MS is very clear that it is only equality searchable, not sortable/comparable in any way, for what should be obvious reasons.
Yes, I wrote about it here, including some examples: https://ayende.com/blog/196481-A/badly-implementing-encryption-part-v-nonce-reuse
It shows (very badly) how you can crack messages that have nonce reuse.
As for MSSQL - yes, that is probably the only way. There is a notion of order preserving encryption, but that has its own set of problems. There is an interesting source for that here: https://eprint.iacr.org/2016/786.pdf
Basically, it is a far weaker form and subject to plenty of abuse potential.