Challenge: The code review bug that gives me nightmares – the issue
In my previous post, I discussed a bug that was brought up in code review, one that sent me into near panic mode. Here is the issue:
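The code under review isn't reproduced in this extract. As a minimal sketch of the pattern being discussed, assuming a Microsoft.Extensions.Caching.Memory cache and the shared ArrayPool&lt;byte&gt; (every name except ComputeHash and ComputeHashAndPutInCache is a hypothetical reconstruction), it might look like this:

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Security.Cryptography;
using Microsoft.Extensions.Caching.Memory;

public class HashCache
{
    private readonly MemoryCache _cache = new(new MemoryCacheOptions());

    public byte[] ComputeHash(string file)
    {
        // Thread A gets the cached buffer here...
        if (_cache.TryGetValue(file, out byte[]? buffer))
            return buffer!; // ...but eviction may return it to the pool before A is done reading
        return ComputeHashAndPutInCache(file);
    }

    private byte[] ComputeHashAndPutInCache(string file)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(32); // rented, not allocated
        using (FileStream stream = File.OpenRead(file))
            SHA256.HashData(stream, buffer); // .NET 7+ overload that fills an existing buffer

        var options = new MemoryCacheEntryOptions();
        options.RegisterPostEvictionCallback(
            // On eviction, the buffer goes straight back to the pool - even if a
            // reader obtained it from the cache a moment ago and is still using it.
            static (key, value, reason, state) => ArrayPool<byte>.Shared.Return((byte[])value!));
        _cache.Set(file, buffer, options);
        return buffer;
    }
}
```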
To understand this bug, you have to consider multiple threads running concurrently. Look at the ComputeHashAndPutInCache() method, where we register an eviction callback for the item in the cache. When the item is evicted, we return the buffer to the buffer pool.
We want to avoid allocating memory, so that is certainly desirable, no? However, consider what happens when one thread is in ComputeHash(), getting a value from the cache. Before it has a chance to look at the value, the cache decides to evict it, at which point the eviction callback runs.
The buffer is returned to the buffer pool, where it may be rented again by something else, while the caller of ComputeHash() is still using that very buffer. This is, basically, a somewhat convoluted use-after-free issue.
And I find this a super scary bug because of its effects:
- Randomly, and rarely, some buffer will contain the wrong data, leading to wrong results (which are hard to track down).
- Trying to find such a bug after the fact, however, is nearly impossible.
- Most of the debugging techniques (repeating the operation for a particular value) will make it go away (the cache will keep the value and not evict it).
In short, this is a recipe for a really nasty debugging session and an impossible-to-resolve bug. All from code that looks very much like an innocent bystander.
Now, I can obviously fix it by not using the array pool, but that may cause me to allocate more memory than I should. How would you approach fixing this issue?
More posts in "Challenge" series:
- (01 Jul 2024) Efficient snapshotable state
- (13 Oct 2023) Fastest node selection metastable error state–answer
- (12 Oct 2023) Fastest node selection metastable error state
- (19 Sep 2023) Spot the bug
- (04 Jan 2023) what does this code print?
- (14 Dec 2022) What does this code print?
- (01 Jul 2022) Find the stack smash bug… – answer
- (30 Jun 2022) Find the stack smash bug…
- (03 Jun 2022) Spot the data corruption
- (06 May 2022) Spot the optimization–solution
- (05 May 2022) Spot the optimization
- (06 Apr 2022) Why is this code broken?
- (16 Dec 2021) Find the slow down–answer
- (15 Dec 2021) Find the slow down
- (03 Nov 2021) The code review bug that gives me nightmares–The fix
- (02 Nov 2021) The code review bug that gives me nightmares–the issue
- (01 Nov 2021) The code review bug that gives me nightmares
- (16 Jun 2021) Detecting livelihood in a distributed cluster
- (21 Apr 2020) Generate matching shard id–answer
- (20 Apr 2020) Generate matching shard id
- (02 Jan 2020) Spot the bug in the stream
- (28 Sep 2018) The loop that leaks–Answer
- (27 Sep 2018) The loop that leaks
- (03 Apr 2018) The invisible concurrency bug–Answer
- (02 Apr 2018) The invisible concurrency bug
- (31 Jan 2018) Find the bug in the fix–answer
- (30 Jan 2018) Find the bug in the fix
- (19 Jan 2017) What does this code do?
- (26 Jul 2016) The race condition in the TCP stack, answer
- (25 Jul 2016) The race condition in the TCP stack
- (28 Apr 2015) What is the meaning of this change?
- (26 Sep 2013) Spot the bug
- (27 May 2013) The problem of locking down tasks…
- (17 Oct 2011) Minimum number of round trips
- (23 Aug 2011) Recent Comments with Future Posts
- (02 Aug 2011) Modifying execution approaches
- (29 Apr 2011) Stop the leaks
- (23 Dec 2010) This code should never hit production
- (17 Dec 2010) Your own ThreadLocal
- (03 Dec 2010) Querying relative information with RavenDB
- (29 Jun 2010) Find the bug
- (23 Jun 2010) Dynamically dynamic
- (28 Apr 2010) What killed the application?
- (19 Mar 2010) What does this code do?
- (04 Mar 2010) Robust enumeration over external code
- (16 Feb 2010) Premature optimization, and all of that…
- (12 Feb 2010) Efficient querying
- (10 Feb 2010) Find the resource leak
- (21 Oct 2009) Can you spot the bug?
- (18 Oct 2009) Why is this wrong?
- (17 Oct 2009) Write the check in comment
- (15 Sep 2009) NH Prof Exporting Reports
- (02 Sep 2009) The lazy loaded inheritance many to one association OR/M conundrum
- (01 Sep 2009) Why isn’t select broken?
- (06 Aug 2009) Find the bug fixes
- (26 May 2009) Find the bug
- (14 May 2009) multi threaded test failure
- (11 May 2009) The regex that doesn’t match
- (24 Mar 2009) probability based selection
- (13 Mar 2009) C# Rewriting
- (18 Feb 2009) write a self extracting program
- (04 Sep 2008) Don't stop with the first DSL abstraction
- (02 Aug 2008) What is the problem?
- (28 Jul 2008) What does this code do?
- (26 Jul 2008) Find the bug fix
- (05 Jul 2008) Find the deadlock
- (03 Jul 2008) Find the bug
- (02 Jul 2008) What is wrong with this code
- (05 Jun 2008) why did the tests fail?
- (27 May 2008) Striving for better syntax
- (13 Apr 2008) calling generics without the generic type
- (12 Apr 2008) The directory tree
- (24 Mar 2008) Find the version
- (21 Jan 2008) Strongly typing weakly typed code
- (28 Jun 2007) Windsor Null Object Dependency Facility
Comments
Change the public interface to accept a span, and the caller can then manage memory as it wishes. However, that still wouldn't solve the issue, because you can still have a race between the moment the cache evicts the entry (and the array is returned to the pool) and the moment the array is copied to the span (see the sketch below).
In this case the option might be to not solve it at all and simply allocate an array yourself. ArrayPools are primarily meant to solve the problem of short-term buffer allocation, for instance when processing streams of information. Here, you're holding onto the allocated buffers, possibly exhausting the pool (depending on the underlying implementation).
You can of course still use the ArrayPool while you're computing the hash.
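For illustration, here is a hypothetical sketch of that span-based shape (reusing the _cache field from the sketch earlier in the post); note that the copy still races with eviction, exactly as described above:

```csharp
// Hypothetical span-based variant of the suggestion above.
// 'destination' must be at least 32 bytes long.
public bool TryComputeHash(string file, Span<byte> destination)
{
    if (_cache.TryGetValue(file, out byte[]? buffer))
    {
        // Race: the entry can be evicted (and the array returned to the pool,
        // re-rented, and overwritten) between TryGetValue and this copy.
        buffer!.AsSpan(0, 32).CopyTo(destination);
        return true;
    }
    return false; // caller computes the hash itself into 'destination'
}
```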
One option would be to wrap it in a class with a destructor and cache that class instead. But you have to be careful not to expose that array directly.
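A sketch of that idea, with hypothetical names; the cache stores the wrapper, and the buffer only returns to the pool once the wrapper itself is collected:

```csharp
using System;
using System.Buffers;

// Hypothetical wrapper: cache this object instead of the raw byte[].
public sealed class PooledHash
{
    private readonly byte[] _buffer;

    public PooledHash(byte[] buffer) => _buffer = buffer;

    // The finalizer runs only after no one references the wrapper anymore,
    // so a reader holding the wrapper keeps the buffer out of the pool.
    ~PooledHash() => ArrayPool<byte>.Shared.Return(_buffer);

    // Expose a copy rather than the array itself, per the caveat above.
    public void CopyTo(Span<byte> destination) => _buffer.AsSpan(0, 32).CopyTo(destination);
}
```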
You can wrap it in a class/struct and add a usage counter, but then you have to decrement the counter once you're done with the hash value. The memory is released not on eviction, but when the counter reaches 0, which can happen from either the eviction or the manual release. There's a risk of a memory leak if you forget to release it, but that's much better than the "ghost" bug you described.
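A sketch of the usage-counter idea (hypothetical names): the count starts at 1 for the cache's own reference, and both the eviction callback and each finished reader call Release:

```csharp
using System;
using System.Buffers;
using System.Threading;

public sealed class RefCountedHash
{
    private readonly byte[] _buffer;
    private int _count = 1; // the cache's own reference

    public RefCountedHash(byte[] buffer) => _buffer = buffer;

    public void AddRef() => Interlocked.Increment(ref _count);

    public void Release()
    {
        // The last reference out (eviction or reader) returns the buffer.
        if (Interlocked.Decrement(ref _count) == 0)
            ArrayPool<byte>.Shared.Return(_buffer);
    }

    public ReadOnlySpan<byte> Hash => _buffer.AsSpan(0, 32);
}
```

Note the hazard raised further down the thread: a reader has to call AddRef between fetching the wrapper from the cache and the final Release, and those two steps can race.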
An array of 32 bytes has its own overhead (object header + length). Depending on the evictions and the load, a first approach could be to turn it into an object with 4 longs, which is sufficient to carry the payload but doesn't have the length overhead. Then, once they are no longer used, just discard them.
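One hypothetical reading of that shape, as a 32-byte value type that carries a SHA-256-sized payload with no array header, length field, or pool bookkeeping:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 4-longs value type: 32 bytes of payload, no length overhead,
// and nothing to return to a pool - just let the GC discard it.
public readonly struct Hash256
{
    private readonly long _a, _b, _c, _d;

    public Hash256(ReadOnlySpan<byte> bytes) // expects at least 32 bytes
    {
        ReadOnlySpan<long> longs = MemoryMarshal.Cast<byte, long>(bytes);
        _a = longs[0]; _b = longs[1]; _c = longs[2]; _d = longs[3];
    }
}
```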
Add a counter to each hash value, similar to the slots in ConcurrentQueue, that represents both the state and the epoch. Then, with volatile access, assert before and after the access that the value that was read was OK. The read must be a copy, to preserve the order and the value. This could be done with the array itself (alignment permitting), but also with the helper class described above; with the latter, the cache would only be storing values (one possible reading is sketched below).
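One hypothetical reading of that, as a seqlock-style validated read (made-up names): readers copy the value first, then re-check the version to detect a concurrent recycle:

```csharp
using System;
using System.Threading;

public sealed class VersionedHash
{
    private int _sequence;                    // even: stable, odd: being recycled
    private readonly byte[] _buffer = new byte[32];

    // Writer (not shown): increment to odd, recycle the buffer, increment to even.
    public bool TryRead(Span<byte> destination)
    {
        int before = Volatile.Read(ref _sequence);
        if ((before & 1) != 0)
            return false;                     // a writer is recycling the slot
        _buffer.AsSpan().CopyTo(destination); // copy first...
        Thread.MemoryBarrier();               // ...make sure the copy happens before the re-check...
        return Volatile.Read(ref _sequence) == before; // ...then validate the epoch
    }
}
```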
Unless this is for building a db with hundreds of files, or an upload service, I probably wouldn't cache it at all, though.
@dejan Who bumps up the counter, and when? How would you protect it from the ABA problem? It would be interesting if you could elaborate.
I see several potential fixes for this issue depending on the required optimization:
1) Make a copy of the byte[] array before putting it into cache or returning the cached version to the caller.
Pros: Minor changes to existing code
Cons: Extra allocations on every call
2) Change the signature of ComputeHash to accept a callback that takes a ReadOnlySpan (and returns void). The caller cannot modify the array and cannot keep a reference to it. This also requires pausing evictions while inside this method (see the sketch after this list).
Pros: Prevents extra allocations
Cons: More difficult to use. Pausing evictions might not be easy to implement.
3) Do not use the ArrayPool; just store byte[] values in the cache. ComputeHash would return the same byte[] array, wrapped in a ReadOnlyMemory&lt;byte&gt; struct.
Pros: Only one allocation per file.
Cons: You still have one allocation per file, instead of a single array pool.
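A hypothetical sketch of option 2's shape, as a method added to the HashCache sketch earlier in the post (the eviction-pausing part, which is the hard bit, is only marked as a comment):

```csharp
// Hypothetical callback-based signature: the hash is only reachable inside
// the callback, so no reference to the pooled array can escape.
// ReadOnlySpanAction<T, TArg> lives in System.Buffers.
public void ComputeHash(string file, ReadOnlySpanAction<byte, string> callback)
{
    if (!_cache.TryGetValue(file, out byte[]? buffer))
        buffer = ComputeHashAndPutInCache(file);
    // Eviction would have to be held off from here...
    callback(buffer!.AsSpan(0, 32), file);
    // ...to here.
}
```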
@Scooletz
For a start, I'm not sure the whole wrapper-class idea is a good one in this context, because of the extra allocations. If we go that way, IMO ComputeHash should be renamed to suggest that we're borrowing something which has to be released.
We can then bump the counter in ComputeHash. We'll also need some ReleaseHash, and both it and the evict method will call some ReturnHashToPoolIfNotUsed, which decreases the counter. But we'll have to put a lock on _cache to prevent getting the value while we're releasing it. So, extra allocations + locking, not that great.
We can avoid the locking and the counters if we know for sure that borrows are short-lived, i.e. that whoever needs these hash values will complete its job with them in, let's say, under 1s. If that can be guaranteed, we can remove values from _cache on eviction, thus preventing new borrowers from getting them, but not return the bytes to the pool immediately. We'll do the actual return a few seconds later. We'll just need a background thread for the cleanup and some way to keep the release timestamps together with the byte arrays.
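A sketch of that delayed-return idea, with made-up names; a background thread or timer sweeps a quarantine queue and only then hands buffers back to the pool:

```csharp
using System;
using System.Buffers;
using System.Collections.Concurrent;

public sealed class DelayedPoolReturn
{
    private readonly ConcurrentQueue<(DateTime EvictedAt, byte[] Buffer)> _quarantine = new();

    // Called from the eviction callback instead of ArrayPool.Return.
    public void OnEvicted(byte[] buffer) =>
        _quarantine.Enqueue((DateTime.UtcNow, buffer));

    // Called periodically from a background thread or timer.
    public void Sweep(TimeSpan grace)
    {
        while (_quarantine.TryPeek(out var entry) &&
               DateTime.UtcNow - entry.EvictedAt >= grace &&
               _quarantine.TryDequeue(out entry))
        {
            ArrayPool<byte>.Shared.Return(entry.Buffer);
        }
    }
}
```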
But I'd ask these questions first:
Sebastiaan,
The actual issue here is that I'm holding on to the arrays for a long time, so they are likely to be in Gen2. Not pooling them would result in me continuously putting stuff into Gen2, which is rarely cleaned up. This is especially true when you are operating at or near the cache limits.
At that point, certain behaviors (an access pattern that is larger than the cache) will mean that you keep filling and dropping entries from the cache. Each entry lives just long enough to be promoted to Gen2, which then makes the overall memory usage far more problematic.
Kaotis,
Yes, that is an option, for sure. However, that leads to a far more subtle issue.
Note that in this case, we are exposed to the JIT deciding that we aren't using the `ra` after the call to `DoSomethingWithBuffer` and cleaning it up. That ties into your "not expose the buffer" point, but that is hard to do. The issue is that even if we make `DoSomethingWithBuffer` an instance method of `ReleaseArray`, the JIT may decide to inline it, resulting in the same behavior. Non trivial issue.
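For illustration, the usual mitigation for that lifetime hazard is GC.KeepAlive; a sketch using the hypothetical names from above:

```csharp
// 'ra' is a wrapper (e.g. the PooledHash/ReleaseArray sketches above) whose
// finalizer returns the buffer to the pool. GetFromCache is a hypothetical lookup.
var ra = GetFromCache(file);
DoSomethingWithBuffer(ra.Buffer);
// Without this, the JIT may consider 'ra' dead during the call above, making it
// eligible for finalization while the buffer is still in use.
GC.KeepAlive(ra);
```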
Dejan,
That leads to the ABA problem, however. Because the act of getting the instance to increment the counter may be racy with decrementing the counter.
Scooletz,
The scenario I actually have is caching buffers that may be in the MB range, and I don't actually follow your approach. 4 longs vs. an array is the exact same scenario. You are now not using a buffer pool, and will have a lot of allocations. That is also something we want to avoid.
Dalibor,
1 & 2 - those are really complex solutions. In both cases, you need a way to avoid evictions while either copying the values or calling the callback. That means you are always paying that cost, which is something we want to avoid.
3 - That would expose us to a situation where the working set is greater than the cache size, which means the cache will just make things a LOT harder for us, instead of better.
I did say locking would be needed, and more specifically, locking on the _cache object. Although, my sixth sense always says "danger" when locking is involved.
```csharp
lock (_cache) { ... _cache.Get(file) ... counter++; }

ReturnHashToPoolIfNotUsed(OurHashClass hash) { lock (_cache) { counter--; ... } }
```
Dejan,
Yep, and now you are paying the locking cost on every access, which isn't nice to have.