Invisible race conditions: The cache has poisoned us

Jan 02 2018

Invisible race conditionsThe cache has poisoned us

time to read 2 min | 262 words

We got a memory corruption error one of those days that was quite interesting. It was in a place where we previous fixed a memory corruption error and was, at a glance, quite impossible.

The code would checkout an item from the cache and increment its ref count, which will keep it alive for as long as we were using it. But something made it fail, and quite horribly, too. We finally tracked the code down to this piece of code, which is run when we update the cache:

When the ref count goes to zero, we’ll release the memory, and _items is a Concurrent Dictionary.

Do you see the error?

The AddOrUpdate method will call the updateValueFactory when it needs to update a value, but it makes no promises with regards to its atomicity. In other words, if you have two threads calling this method, the update lambda will be called twice with the same item, resulting in early release of the value and hence memory corruption.

This can be seen here:

As you can see, we are looking at a loop that may be executed several times, as such the updateValueFactory can be called several times, and the only guarantee we have is that after the method has returned, the last value we were called with was the value that was in the cache and we replaced.

Here is the fix:

That was quite hard to figure out, because at a glance, this looks just fine.

Tweet Share Share 4 comments

Tags:

More posts in "Invisible race conditions" series:

(02 Jan 2018) The cache has poisoned us
(27 Dec 2017) The sometimes failing script
(26 Dec 2017) The async query

Comments

02 Jan 2018
15:07 PM

Julie Koubova

Hello 😊 what is your view on System.Collections.Immutable? I find them easier to reason about, but have no idea how un/performant they actually are… probably nothing to write home about?

02 Jan 2018
19:32 PM

Oren Eini

Julie, Conceptually, I find them wonderful. I wish we could use them. Performance wise, there are a no go for us. See the discussion around them here: https://ayende.com/blog/164739/immutable-collections-performance

And in the related posts.

03 Jan 2018
10:42 AM

Ian Yates

I see that sort of mistake commonly made with concurrent dictionaries and the conceptual "Add or get existing" call. Using a ConcurrentDictionary<K, Lazy<V>> tends to solve that but those who haven't read the docs closely don't realise that multiple callers are allowed to race when adding.

What's worse is that the MSDN docs at https://msdn.microsoft.com/en-us/library/ee378665%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396 don't really call out this race condition. The remarks found on a linked webpage for one of the method overloads does call this out... https://msdn.microsoft.com/en-us/library/ee378675(v=vs.110).aspx

If you call AddOrUpdate simultaneously on different threads, addValueFactory may be called multiple times, but its key/value pair might not be added to the dictionary for every call.

That's not the best wording but it gets the point across, but only mentions the addValueFactory.

Reading for the other overload - https://msdn.microsoft.com/en-us/library/ee378664(v=vs.110).aspx - finds a horrible example. It goes about calculating some concurrency level using the number of processors available. But then it does some for loops and regular single-threaded AddOrUpdate calls. It's like whoever wrote the docs didn't have their brain engaged that day???

The same bad code is copied across to the newer docs.microsoft.com site (https://docs.microsoft.com/en-au/dotnet/api/system.collections.concurrent.concurrentdictionary-2.addorupdate?view=netframework-4.7.1), although the example at the top at least does something in parallel.

It's interesting that the docs do not call out that updateFunc may be called twice. Certainly that section of code demonstrates this. But, without seeing the source, one could argue that perhaps the add Func is special because it allows race to initialisation, but once there's a key/value in the dictionary, there's at least something for the dictionary to serialise and thus the updates would be processed atomically.

I might make it a bit of a mission later this evening to submit an update to the new docs.microsoft.com site :)

18 Jan 2018
14:10 PM

Stig Christensen

Last paragraph here explains it pretty good.

https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/how-to-add-and-remove-items

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB