Badly implementing encryptionPart I

time to read 4 min | 770 words

One of the important adages in computer programming is: “Thou shall not write your own encryption”. That is because building your own encryption system is fraught will all sorts of really subtle details that are easy to get wrong, which will entirely eliminate any security that you have in place. For that matter, professional cryptographers are getting those details wrong often enough that WEP has been completely broken. Do not make use of any code that I’m showing in this series of posts to do anything except expand your horizons. This is incredibly insecure. If you want to use encryption, go and use something like Sodium, which is a well reviewed and thought-out encryption library.

With that said, let’s see what kind of encryption I can implement. My math skills are lousy, so there is going to be absolutely no math whatsoever in this series. That sounds like it makes it tough to implement encryption, but it isn’t so bad.

The absolute best encryption method, in terms of secrecy, is  the One Time Pad. There is a proof that it is unbreakable, which I linked but won’t discuss (nor do I claim to understand). The idea is simple, given a random set of bits, we can XOR that with our message, and be sure that there is no way to derive what the original message was. All possible values are equally likely. While One Time Pad is ideal, its use in practice has severe limitations:

The key problem with One Time Pad (pun intended) is how you distributed the pad material. Your security is only as valid as the pad transfer mechanism. And if you want to exchange complex messages, you need big pads. The key to modern encryption is that while we can’t mathematically prove that our method is as secured as One Time Pad, we can become confident enough in our methods to make practical use of that.

A One Time Pad is a shared source of truly random data. If we had a way to create a random stream of bits for a smaller shared state, that might do it, right? It wouldn’t be true randomness, of course, but if there is no way to predict what the output should be without knowing a shared secret, that should be enough, we hope.

How can we generate some random data from a shared secret? Well, I’m going to use the MD5 primitive as the key function here. That is a cryptographic hash function that you can feed it some buffer and it will compute a cryptographic hash of the buffer. The output of the hash function is pseudo random, so that might be a good building block to create a stream of random bits.

I’m using MD5 intentionally here. I assume that by now anyone that pays any attention to cryptography will know that MD5 is considered broken. I don’t want anyone to use this method.

In general, the concept that I’m going for is something like this:

In other words, I’m generating a stream of bits by continuously feeding the output of Md5 into itself, starting from the provided key.

Again, this is not secured, I’ll touch on exactly why in a future post.

A small optimization that I can make is that I don’t actually need to store all the full key bits-stream ahead of time, I can generate that on the fly, as we get data to encrypt. Here is what the encryption algorithm looks like, we generate the key stream and compute the XOR over the provided data:

Note that we don’t have a distinction here between encryption and decryption. We generate the key stream and XOR it with the caller’s data. If the data is plain text, it is encrypted, if it is encrypted text, it is decrypted.

Here is how this will work:

Key (base64)

+oupDG0cMVQr7hWqctWZEA==

Text attack at dawn!
Encrypted (hex) 4771EC89753EEB16A899E2F79DE9D6

And the code to make it happen:

We mutate the data in place, I should note. Given that we are not changing the size of the data, that is the simplest way to write the API.

This “encryption” method works, and we go from a reasonable looking plain text to something that certainly looks like it is encrypted. This is also a fairly horrible encryption scheme, of course, but I’ll touch on that in my next post.

More posts in "Badly implementing encryption" series:

  1. (24 Feb 2022) Part X-Additional data
  2. (23 Feb 2022) Part IX–SIV
  3. (22 Feb 2022) Part VIII–timings attacks and side channels
  4. (21 Feb 2022) Part VII–implementing authenticated encryption
  5. (18 Feb 2022) Part VI–malleable encryption
  6. (17 Feb 2022) Part V–nonce reuse
  7. (16 Feb 2022) Part IV–keyed hash function
  8. (15 Feb 2022) Part III–breaking your encryption apart
  9. (14 Feb 2022) Part II–breaking the code
  10. (11 Feb 2022) Part I