Public Service AnnouncementConcurrentDictionary.Count is locking
During a performance problem investigation, we discovered that the following innocent looking code was killing our performance.
This is part of a code that allow users to subscribe to changes in the database using a WebSocket. This is pretty rare, so we check that there aren’t any and skip all the work.
We had a bunch of code that run on many threads and ended up calling this method. Since there are not subscribers, this should be very cheap, but it wasn’t. The problem was that the call to Count was locking, and that created a convoy that killed our performance.
We did some trawling through our code base and through the framework code and came up with the following conclusions.
- Clear locks
- CopyTo, GetEnumerator, ToArray, Count creates a snapshot (consume more memory)
- TryPeek, IsEmpty are cheap
Here is a piece of problematic code, we are trying to keep the last ~25 items that we looked at:
The problem is that this kind of code ensures that there will be a lot of snapshots and increases memory utilization.
- Count, Clear, IsEmpty, ToArray - lock the entire thing
- TryAdd, TryUpdate, TryRemove – lock the bucket for this entry
- GetEnumerator does not lock
- Keys, Values both lock the table and forces a temporary collection
If you need to iterate over the keys of a concurrent dictionary, there are two options:
Iterating over the entire dictionary is much better then iterating over just the keys.
More posts in "Public Service Announcement" series:
- (11 Aug 2017) ConcurrentDictionary.Count is locking
- (28 Dec 2010) Git master repositories for the Rhino Tools projects
My first instinct would be - why is this not using
Any? For much the same reason I'd tell off someone using
COUNT(*)in SQL if all they care about is whether any row exists.
The enumerator is super friendly, using only volatile reads. No locking performance overhead. I've used the iterator over Keys/Count several times already. Not to mention, writing a custom structure if lookup/cache is needed.
@Damien: I think the comparison with sql's COUNT(*) is wrong. The database has to count the rows but the dictionary just looks it up.
@Oren, whats was the actual solutions here?
BTW found this blog recently, https://blogs.msdn.microsoft.com/dotnet/2017/06/07/performance-improvements-in-net-core/ Would be interesting to know if you're playing with .Net Core 2 yet? Looks like there's a load of performance improvements around collections
@TomCollins - except as already observed here, with the concurrent version it can't just "look it up". It takes locks, it has to iterate over its internal tables, etc.
Anyis semantically what's being asked for and should be the default unless or until it's proven to be a performance issue itself.
@Ian we are still not using CoreCLR 2.0, the work is under way and if everything goes well we should be able to have it running there within the coming weeks. There are improvements but the general locking mechanisms for concurrent data structures haven't changed. @Damien the problem with LINQ operations (in general) on hot spots (where this was used) is that the predicate requires a call. LINQ is essentially banned from the core of RavenDB (and afaik it is from other high throughput software like Roslyn) even if semantically is what you need.
core keeps the same behavior, it seems. https://github.com/dotnet/corefx/issues/3357
not sure this is the same issue, I didn't look into c# source if it escalates locks, but here is a java discussion: http://stackoverflow.com/a/22996395/1268003
And how is it fixed? Implemented your own 'Any' that uses the non-blocking GetEnumerator?
Ryan, We changed things so we are very careful about access
Values, etc. In some cases, we started tracking this value externally.