Badly implementing encryptionPart IV–keyed hash function

time to read 3 min | 536 words

I’m trying to not get too deep into the theory of encryption. I’m happy to say that so far I was able to avoid any math whatsoever and hopefully this is an interesting series. I do have to touch on an important topic.

I’m using MD5 here for the purpose of generating a random bitstream to be used as a stream cypher. In the previous post, we looked into a key issue. If we don’t do things properly, we can easily get to the point where a single 16 bytes block that we guess can allow us to decrypt the entire message.

I “solved” that in the previous post by adding the key back again into the MD5 computation. That works, but it isn’t ideal. There are all sorts of subtle issues that you have to take into account (length extensions, for example) and probably other stuff that I’m not even aware of. There is the HMAC family of functions. That is a keyed hash function that has far stronger security properties. Wikipedia does a great job explaining it. Note that there is a cost, HMAC is more expensive than the underlying hash function it uses.

The best practical explanation, by the way, I found here. Our previous method of adding a key to the mix was to concatenate it in front of the message. The problem is that if md5(msgA) == md5(msgB), then md5(key || msgA) == md5(key || msgB). And that isn’t something that we want. I (personally) can’t think of a way to abuse that property to get something nasty going on with the way we use it in this encryption algorithm, but I’m very much not an expert. The HMAC model, on the other hand, would use: md5( key1 || md5(key2 || msgA)). And there is no way to get that to collide, even if the messages generate collisions for MD5.

Let’s see what we need to do to switch to HMAC-MD5 instead of MD5. Here is what the code now looks like:

There are a few things that are worth noting here. We changed the size of the key (since HMAC-MD5 uses 32 bytes, not 16 bytes). Again, I have no comment on the actual security of such a scheme, mostly because I wouldn’t know where to even begin doing this analysis.

We are also no longer using the previous block as the input to the next block. Instead, we use a counter mode, where we hash the nonce and an ever incrementing counter using the provided key. That gives us some additional safety from the previously seen issue where we could figure out what the rest of the key stream would be.

Of course, better than before doesn’t mean that this is actually good. There are several other problems that I (as a non cryptographer) still need to fix, and probably more that I’m not seeing.

Another aspect of using this sort of construction is the additional cost that is involved. The HMAC computation isn’t that much more expensive. Looking at some benchmarks, this is about 3% slower, which is quite reasonable.

Next, we are going to see how we can abuse the malleability of this encryption system for fun and nefarious people’s profit.

More posts in "Badly implementing encryption" series:

  1. (24 Feb 2022) Part X-Additional data
  2. (23 Feb 2022) Part IX–SIV
  3. (22 Feb 2022) Part VIII–timings attacks and side channels
  4. (21 Feb 2022) Part VII–implementing authenticated encryption
  5. (18 Feb 2022) Part VI–malleable encryption
  6. (17 Feb 2022) Part V–nonce reuse
  7. (16 Feb 2022) Part IV–keyed hash function
  8. (15 Feb 2022) Part III–breaking your encryption apart
  9. (14 Feb 2022) Part II–breaking the code
  10. (11 Feb 2022) Part I