Challenges: Spot the bug in the stream – answer
In my previous post, I asked you to find the bug in the following code:
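The original code block did not survive here; the following is a sketch of the kind of timeout wrapper the post describes. The method name `ReadWithTimeout` and the use of `Task.WhenAny` are illustrative assumptions, not necessarily the exact original code:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamTimeoutBug
{
    // Illustrative sketch of the buggy pattern: a timeout wrapper over
    // Stream.ReadAsync. If the timeout fires we throw, but the ReadAsync
    // task is still running and still owns the buffer we handed it.
    public static async Task<int> ReadWithTimeout(
        Stream stream, byte[] buffer, int offset, int count, TimeSpan timeout)
    {
        Task<int> readTask = stream.ReadAsync(buffer, offset, count);
        Task completed = await Task.WhenAny(readTask, Task.Delay(timeout));
        if (completed != readTask)
            throw new TimeoutException(); // the pending read is NOT aborted here
        return await readTask;
    }
}
```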
This code looks okay at a glance, but it turns out to hide a really nasty data corruption bug waiting to happen. Here is what the problematic usage looks like:
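The usage code is also missing here; this sketch shows the shape of the problem, assuming a timeout wrapper along the lines described in the post (`ReadWithTimeout` is an illustrative name) and a buffer rented from the shared `ArrayPool`:

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

public static class PooledBufferUsage
{
    // Illustrative sketch: the buffer comes from a shared pool, and the
    // finally block returns it even when the read timed out.
    public static async Task ReadMessage(Stream stream)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            int read = await StreamTimeoutBug.ReadWithTimeout(
                stream, buffer, 0, buffer.Length, TimeSpan.FromSeconds(5));
            Console.WriteLine($"Read {read} bytes");
        }
        finally
        {
            // On timeout, the buffer goes back to the pool while the pending
            // ReadAsync can still write into it. Whoever rents it next may
            // observe data written by the stale I/O operation.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```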
Do you see the error now?
If the operation times out, an exception is raised, but the underlying I/O operation isn't over. We are using a shared pool, so the buffer we use may be handed over to someone else. At that point, that someone does something with the buffer, but the still-pending I/O operation will write data into it, meaning it will probably contain garbage when actually used.
For this to actually happen, you need a timeout, reuse of the buffer, and the I/O operation completing at just the wrong time. In other words, a sequence of highly unlikely events that would assuredly happen within an hour of pushing something like that to production. For extra fun, this will reliably happen the moment you have network issues. So imagine that you have a slow node, which then causes memory corruption, which ends up as a visible bug (instead of, perhaps, an aborted request) very rarely, and with no indication of how it happened.
How do you fix this? Like this:
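The fixed code block is missing here as well; a sketch of the fix the post describes, pushing the timeout down to the stream as a cancellation token (names illustrative):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamTimeoutFixed
{
    // Illustrative sketch of the fix: a CancellationTokenSource constructed
    // with a timeout cancels the token after the given delay, and the token
    // is passed to ReadAsync so the stream itself aborts the read.
    public static async Task<int> ReadWithTimeout(
        Stream stream, byte[] buffer, int offset, int count, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await stream.ReadAsync(buffer, offset, count, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The read was aborted at the stream level, so the buffer is
            // no longer in use and can safely go back to the pool.
            throw new TimeoutException();
        }
    }
}
```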
This uses a cancellation token, which causes the operation to be aborted at the stream level, meaning that we can safely reuse the buffers we passed to the underlying stream.
More posts in "Challenges" series:
- (03 Jan 2020) Spot the bug in the stream – answer
- (15 Feb 2010) Where is the optimization?
Comments
I assume this isn't a theoretical question, but an actual issue you encountered... within an hour of being used
Peter,
We ran into this in code review.
I think this still has a race condition. At least the default implementation of Stream only checks whether the operation has been cancelled before it calls BeginRead. Later cancellations via the linked CancellationTokenSource will have no effect on an already running read operation, e.g. a slow network read.
The code above looks better, but it will perform just as badly as the original version if the underlying stream does not override the default implementation. Cancellation is always asynchronous, and you never know when you have reached a safe state unless you wait for the operation to complete and time out properly.
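To illustrate the point being made: the default implementation of `Stream.ReadAsync` consults the token only once, up front, before falling back to the `BeginRead`/`EndRead` path. This is a rough paraphrase, not the exact runtime source:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Rough paraphrase of the base Stream.ReadAsync overload: if the token is
// already cancelled the call short-circuits; otherwise the read runs via the
// APM-based path, which never observes the token again.
public virtual Task<int> ReadAsync(
    byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
    return cancellationToken.IsCancellationRequested
        ? Task.FromCanceled<int>(cancellationToken)
        : BeginEndReadAsync(buffer, offset, count); // token not observed here
}
```

A stream like `NetworkStream` overrides this and plumbs the token through to the OS-level I/O, which is why the fix works there.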
Alois,
If there are calls that need to be cancelled, the underlying stream likely already handles them. For example, when using NetworkStream, you'll end up here:
https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.Windows.cs#L233
https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs#L796
If the underlying stream impl can't be cancelled, you can't safely abort the operation anyway, so nothing really can be done.
Yes, NetworkStream seems to handle this properly. Thanks for the NetworkStream links. Async code is much harder to understand and riddled with race conditions. Personally, I am still not convinced that everything-is-async is the right thing to do. If you have chained thousands of async operations that work nicely in a healthy system, everything is fine, but how do you handle, at every layer, a sudden slowdown because e.g. the network is slow or file operations take longer than before? In that case, thousands of tasks pile up and will eventually hit the thread pool worker thread limit, leading to thread pool starvation and a sudden drop in throughput. If you have dependent tasks, you can even get a deadlock by awaiting tasks that are never scheduled to run because of the worker thread limit. Those are hard-to-debug issues. You need to reverse engineer all the task dependencies to get things working again, which is no fun. Are there any libraries you would recommend that make such things easier?
Alois,
The key here is that you don't try to handle them all at a granular level. That is why the CancellationToken abstraction is so great. You get the ability to set policy at the highest level.

Then you must define when you want to cancel. Is the system just slow, or did it deadlock? How would you decide to cancel running operations? Cancel with a timeout, as the code above implies? On a slow machine, timing is very different compared to a big server machine. In principle it is easy, but in practice it is still a hard problem.
Alois,
We aren't trying to provide a generic overall system. That is typically not viable. We know that the underlying stream is going to be either a file or a network stream, that is it. And there are mechanisms to cancel those sorts of I/O operations.
Deciding to cancel is typically easy:
Those sorts of things give me something to react to; I can then cancel the operation and clean up the rest of the system.