Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 209 words

imageThis issue in the RavenDB Security Report is pretty simple, when we generate a certificate, we need to generate a certificate serial number. We were using a random number that is 64 bits in length, but that is too small. The problem is the birthday attack. For a 64 bits number, you only need about 5 billion attempts to generate a collision. In modern cryptography, that is actually a very low security threshold.

So we fixed it and used a random value that is 20 bytes in length. Or so we thought. This single issue is worth the trouble of publicly discussing the security report. As it turned out, I didn’t read the API docs properly and used this construction:

new BigInteger(20, random);

Where the random is a cryptographically secured random number generator. The problem here is that this BigInteger constructor uses bits length, not bytes length. And that resulted in a security “fix” that actually much worse than the previous situation (you only need a bit over a thousand tries to generate a collision). This has already been fixed, obviously, but I’m very happy that it was caught.

time to read 3 min | 548 words

imageThe RavenDB Security Report most significant finding is something that cannot be fixed. Let me try to explain the core of this issue.

We want RavenDB to be secured, and we have chosen to use the well known (and trusted) TLS infrastructure. This means that we can use HTTPS, client certificate authentication and TLS 1.2. Basically, this means that we have a very high degree security and we use a common (and trusted) methods for both trust and encryption on the wire.  That does leave us with the problem of where to get the certificates from. Browsers has been tightening security for a while now, and the kind of alerts you get for self signed certificates are too scary to show by default.

So we need a solution that will be trusted. One option is to generate and install a root certificate when installing RavenDB. I don’t really like this option, to start with, installing a root certificate seems like an invasive action, even if it was generated locally. But this doesn’t solve the problem of accessing the server remotely. The root certificate will be installed on the server, not the client. So that isn’t a good option for us.

Enter Let’s Encrypt and the ability to generate certificates for free. That is a perfect solution for the problem. It is possible to generate them during installation, it is trusted by all major browsers and voila, we are there. Except there is still one issue in place. In order to get the certificate, we need to prove to Let’s Encrypt that we own the domain. But we can’t expect every user to configure DNS or setup routing properly during installation. So instead of making the user do the work, the automatic Let’s Encrypt installation is going to do that using a domain that RavenDB controls (ravendb.community, development.run, ravendb.run, etc). As part of the installation, the local RavenDB instance will talk to our cloud API to complete the Let’s Encrypt challenge. Each user gets their own subdomain under one of the root domains we use and the certificate is being generate locally (the cloud API is involved only for setting up the DNS entries).

This is perfect, because it means that you can very easily get a secured cluster (with URLs such as https://a.oren.development.run) which will just work.

However, from the point of view of the customer, there is an issue. The customer doesn’t own these domains, they are owned by Hibernating Rhinos. This means that technically,  we can issue additional certificates for the cluster domain and even update the DNS records to point to another server. This is something that we will never do, but it is a concern that should be raised during security reviews. For production usage, we expect operators to use their own certificates and domains to ensure that they have full control of their environment.

This is the only issue in the security review that we couldn’t fix and had to document as a warning to users, because it is too convenient a feature and the expected usage scenario (development and quick setup mode) are not likely to concern themselves with the full blown process of defining DNS and certificates.

time to read 1 min | 151 words

imageThe RavenDB Security Report called out the fact that we were using 2048 bits RSA keys when we were generating certificates. RavenDB generates certificates during automatic setup and when you want to generate client certificates directly from RavenDB.

Now, 2048 bits RSA has no known attacks, it seems that there wouldn’t be any shock and awe at the cryptographic community if it would be broken at sometimes in the future.

Because of that, the general recommendation is to use at least 3072 bits, but I don’t like that number, so RavenDB is now using 4096 bits RSA keys when it needs to generate a certificate. This significantly increases the certificate generation time (to the point where it is humanly observable!), but that is a very rare operation, so we don’t really care.

time to read 3 min | 426 words

imageThe RavenDB security report pointed out that we weren’t consistent in our usage of the Master Encryption Key. As a result, we changed things in a few locations, and we ended up never using the Master Encryption Key to encrypt anything in RavenDB.

If you aren’t familiar with encryption, that might raise a few eyebrows. If we aren’t using an encryption key to encrypt, what are we using? And what is the Master Encryption Key (and with Capitals, too) all about?

This is all part of the notion of defense in depth. A database has the Master Encryption Key. This is the key that open all the gates, but we never actually use this key to encrypt anything. Instead, we use it to generate keys. This is what the KDF (Key Derivation Function) comes into play. We start from the assumption that we have an attacker that was able to get us into a Bad State. For example, maybe we had nonce reuse (even though we already eliminated that), or maybe they have a team of Hollywood cryptographers that can crack encryption in under 30 seconds (if they have a gun to their head).

Regardless of the actual reason, we assume that an attacker has found a way to get the encryption key from the data on disk. Well, that wouldn’t really help them too much. Because that encryption key they got isn’t the key to the entire kingdom, it is the key for a very specific cupboard inside a specific room into a specific house in that kingdom. The idea is that whenever we need to encrypt a particular piece of data, we’ll use:

pageKey = KDF(MasterEncryptionKey, “Pages”, PageNumber);

And then we’ll use the pageKey to actually encrypt the page itself. Even if an attacker somehow managed to crack the encryption key on the page, all that gave them is the page (typically 8KB). They don’t get full access.

In similar vein, we also use the notion of different domains (“Pages, “Transactions”, “Indexes”, etc) to generate different keys for the same numeric value. This will generate a different key to encrypt any one of these values. So even if we have to encrypt Page 55 and Transaction 55, they would use a different derived key.

This is not needed assuming all else is well, but we don’t assume that, we actually assume Bad Stuff, and try to get ahead of the game so even then, we’re still safe.

time to read 3 min | 506 words

imageThe issue of authentication was brought up twice in the RavenDB security report. But what does this means?

Usually when talking about authentication we think about how we authenticate a user, but in this case, we refer to authenticating the encryption itself. You might consider that this is something that a cryptographer might need to do to prove a new algorithm, but it actually refers to something quite different.

Consider the following encrypt cookie: {"Qdph":"Ruhq","Dgplq":"Q"}

This was encrypted using Caesar’s cypher, with the secret key 3. Because it is encrypted, no one can figure out what is written inside it (let’s assume that this is the case and this is actually a high security methods, showing how things actually works with bits is too cumbersome).

The problem is that we handled an opaque block to the user (who is not to be trusted) and we will get it back at some later point in time. This is great, except for the part where the user might modify the data. Now, sure, they don’t know what the encryption key is, but let’s assume that they have good idea about the structure of the data, which something like:

{“Name”: <user name>, “Admin”: <N / Y> }

Given this knowledge, I can start mutating the end of the encrypted buffer. Because the decryption of the data is a pure transformation function, it doesn’t matter to it that the data has changed, and it will “decrypt” it just fine.

Now, in many cases that would decrypt to something totally wrong. Changing the encrypted value to be: {"Qdph":"Ruhq","Dgplq":"R"} will give us a decrypted value of “Admin”: “O”, which is obviously not valid and will cause an error. But all I have to do is keep trying until I get to the point where I send a modified encrypted value where decrypting “Admin”: “Y”.

This is because in many cases, developers assume that if the value was properly decrypted and has the proper format it is known to be valid. This is not the case and there have been many real world attacks on such systems.

The solution to that is to add, as part of the encryption algorithm itself, a part where we verify a signature on the data. This signature is also signed with the secret key, so the idea is that if the data was modified, if you don’t have the secret key, you’ll not be able to fix the signature. The decryption process will fail. In other words, we authenticated that the value was indeed encrypted using the secret key, and wasn’t modified by a 3rd party somewhere along the way.

There has been a case where we wrote to a temporary file without also doing authenticated encryption and a case where we validated a hash manually while also using authenticated encryption. Unfortunately, they did not balance each other out, so we had to fix it. Luckily, it was a pretty easy fix.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}