Don’t assume the result of read()
I read this post and it took me very little time to spot a pretty nasty bug. Here is the relevant section:
The bug is on line #4. The code assumes that a call to read() will return less than the requested number of bytes only at the end of the file. The problem with that approach is that this is explicitly documented to not work this way:
It is not an error if the returned value
n
is smaller than the buffer size, even when the reader is not at the end of the stream yet. This may happen for example because fewer bytes are actually available right now (e. g. being close to end-of-file) or because read() was interrupted by a signal.
This is a super common error in many cases. And in the vast majority of the cases, that would work. Except when it wouldn’t.
The underlying implementation of File::read() will call read() or ReadFile(). ReadFile() (Windows) is documented to read as much as you requested, unless you hit the end of file. The read() call, on Unix, is documented to allow returning less than requested:
It is not an error if this number is smaller than the number of bytes requested
Aside from signals, the file system is free to do a partial read if it has some of the data in memory and some not. I’m not sure if this is implemented in this manner, but it is allowed to do so. And the results for the code above in this case are absolutely catastrophic (decryption will fail, encryption will emit partial information with no error, etc).
I’m writing this blog post because reading the code made the error jump at me. Was bitten by this assumption too many times.
Comments
Just to confirm that the code shown is the fixed version, yes? The linked article does not include the final
else
block. This block resolves the issue, right?Andy, no I don't think so. In the else block there's a break at the end, which means the code is assuming that if we didn't get all the bytes we requested that means we're done reading. Which isn't the case per the rest of the article.
Andy,
The code in the post is the probelmatic version, it isn't fixed. The code assumes that
read_count == BUFFER_LEN
means EOF, which is wrong in some (rare) casesComment preview