Don’t assume the result of read()

Jan 20 2022

Don’t assume the result of read()

time to read 2 min | 294 words

I read this post and it took me very little time to spot a pretty nasty bug. Here is the relevant section:

The bug is on line #4. The code assumes that a call to read() will return less than the requested number of bytes only at the end of the file. The problem with that approach is that this is explicitly documented to not work this way:

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet. This may happen for example because fewer bytes are actually available right now (e. g. being close to end-of-file) or because read() was interrupted by a signal.

This is a super common error in many cases. And in the vast majority of the cases, that would work. Except when it wouldn’t.

The underlying implementation of File::read() will call read() or ReadFile(). ReadFile() (Windows) is documented to read as much as you requested, unless you hit the end of file. The read() call, on Unix, is documented to allow returning less than requested:

It is not an error if this number is smaller than the number of bytes requested

Aside from signals, the file system is free to do a partial read if it has some of the data in memory and some not. I’m not sure if this is implemented in this manner, but it is allowed to do so. And the results for the code above in this case are absolutely catastrophic (decryption will fail, encryption will emit partial information with no error, etc).

I’m writing this blog post because reading the code made the error jump at me. Was bitten by this assumption too many times.

Tweet Share Share 3 comments

Tags:

Comments

20 Jan 2022
15:35 PM

Andy

Just to confirm that the code shown is the fixed version, yes? The linked article does not include the final else block. This block resolves the issue, right?

20 Jan 2022
18:41 PM

Eluvatar

Andy, no I don't think so. In the else block there's a break at the end, which means the code is assuming that if we didn't get all the bytes we requested that means we're done reading. Which isn't the case per the rest of the article.

20 Jan 2022
19:55 PM

Oren Eini

Andy,

The code in the post is the probelmatic version, it isn't fixed. The code assumes that read_count == BUFFER_LEN means EOF, which is wrong in some (rare) cases

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB