Badly implementing encryptionPart II–breaking the code

time to read 3 min | 574 words

In my previous post, I implemented an “encryption system” using a stream cipher based on top of Md5. The idea is to start with a given key (128 bits in size) and feed it to Md5 recursively, leading to a random looking set of bits that we can generate for both encryption and decryption.

I mentioned, but it is worth repeating, this is not a secure scheme, you should not be using this for anything except (maybe) to learn.

As a reminder, here is what the result of this algorithm looks like when we run it:

Key Plain text Encrypted text
attack at dawn!
attack at dusk!

You might notice something pretty awful about the outputs that we see here. Take a look at the encrypted text, do you see the problem?

Here is what happens when I diff the output of those two strings:


Do you see the problem?

In my encryption algorithm, there is a huge hole. If I have the ability to ask you to encrypt a text that I control, I can compare that to other encrypted texts that I have. And in many cases, it is fairly easy to get the other side to encrypt a value that I control.

For that matter, using this approach, I can also simply iterate over the values one byte at a time, until it matches what I expect. In this manner, I can figure out the plain text of the messages very quickly.

This isn’t the sole attack that we have on the system, however. Even if I don’t have the ability to choose the plain text, I have other options available to me. I can inspect traffic and see that those two messages are very similar. If I correlate them to the times of attack, I can watch out for the next time that I get this message and know what it means, even if I wasn’t able to decrypt it.

To resolve this issue, we need to ensure that two messages, encrypted with the same key, will never look the same, even if they have repeated sequences. How do we do that? We introduce a random variable (which is public), that will ensure that we’ll generate a new message each time.

Here is the code:

In order to encrypt, I’m generating a nonce (number only used once) randomly. That number is 128 bits in length and will make sure that even if we encrypt the same value twice, we’ll always get a different encrypted string. Here are the results:

Key Plain text Nonce Encrypted text
+oupDG0cMVQr7hWqctWZEA== attack at dusk! 91AA2DB93DE35785D7FC1D5394F524C4 FC8638CA6BDEEDEE7F79D1BF3F72D6
+oupDG0cMVQr7hWqctWZEA== attack at dusk! 16C915439BA0EC862307D091293B0D7E 2080498822D299FDEE6B2C31C1AE87

It is quite apparent that even though we encrypted the same string, the output is entirely different from one another.

One thing to point out, I’m using 128 bits nonce here, but I didn’t bother to actually check what level of security adding a nonce of this size provides. It permutes the output of the encrypted text, but to what level, you’ll need an actual cryptographer to tell you. I’m simply using 128 bits as a nice value.

The text attack at dawn! on the other hand, is encrypted to BF5946EAC0C7BA5EDF611D440F7223 using the nonce: DE296C6916183A5B38480E971DDEF48C. This is radically different from the previous example, and does not provide us with a place to start analyzing the encrypted text for similarities between messages.

I’m happy, since we dealt with one (of the many) weaknesses in the encryption algorithm. In my next post, I’m going to look at a few more.

More posts in "Badly implementing encryption" series:

  1. (24 Feb 2022) Part X-Additional data
  2. (23 Feb 2022) Part IX–SIV
  3. (22 Feb 2022) Part VIII–timings attacks and side channels
  4. (21 Feb 2022) Part VII–implementing authenticated encryption
  5. (18 Feb 2022) Part VI–malleable encryption
  6. (17 Feb 2022) Part V–nonce reuse
  7. (16 Feb 2022) Part IV–keyed hash function
  8. (15 Feb 2022) Part III–breaking your encryption apart
  9. (14 Feb 2022) Part II–breaking the code
  10. (11 Feb 2022) Part I