Don’t assume the result of read()

time to read 2 min | 294 words

I read this post and it took me very little time to spot a pretty nasty bug. Here is the relevant section:

The bug is on line #4. The code assumes that a call to read() will return less than the requested number of bytes only at the end of the file. The problem with that approach is that this is explicitly documented to not work this way:

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet. This may happen for example because fewer bytes are actually available right now (e. g. being close to end-of-file) or because read() was interrupted by a signal.

This is a super common error in many cases. And in the vast majority of the cases, that would work. Except when it wouldn’t.

The underlying implementation of File::read() will call read() or ReadFile(). ReadFile() (Windows) is documented to read as much as you requested, unless you hit the end of file. The read() call, on Unix, is documented to allow returning less than requested:

It is not an error if this number is smaller than the number of bytes requested

Aside from signals, the file system is free to do a partial read if it has some of the data in memory and some not. I’m not sure if this is implemented in this manner, but it is allowed to do so. And the results for the code above in this case are absolutely catastrophic (decryption will fail, encryption will emit partial information with no error, etc).

I’m writing this blog post because reading the code made the error jump at me. Was bitten by this assumption too many times.