The design and implementation of a better ThreadLocal<T>

time to read 7 min | 1291 words

I talked about finding a major issue with ThreadLocal and the impact that it had on long lived and large scale production environments. I’m not sure why ThreadLocal<T> is implemented the way it does, but it seems to me that it was never meant to be used with tens of thousands of instances and thousands of threads. Even then, it seems like the GC pauses issue is something that you wouldn’t expect to see by just reading the code. So we had to do better, and this gives me a chance to do something relatively rare. To talk about a complete feature implementation in detail. I don’t usually get to do this, features are usually far too big for me to talk about in real detail.

I’m also interested in feedback on this post. I usually break them into multiple posts in a series, but I wanted to try putting it all in one location. The downside is that it may be too long / detailed for someone to read in one seating. Please let me know your thinking in the matter, it would be very helpful.

Before we get started, let’s talk about the operating environment and what we are trying to achieve:

Running on .NET core.
Need to support tens of thousands of instances (I don’t like it, but fixing that issue is going to take a lot longer).
No shared state between instances.
Cost of the ThreadLocal is related to the number of thread values it has, nothing else.
Should automatically clean up after itself when a thread is closed.
Should automatically clean up after itself when a ThreadLocal instance is disposed.
Can access all the values across all threads.
Play nicely with the GC.

That is quite a list, I have to admit. There are a lot of separate concerns that we have to take into account, but the implementation turned out to be relatively small. First, let’s show the code, and then we can discuss how it answer the requirements.

This shows the LightThreadLocal<T> class, but it is missing the CurrentThreadState, which we’ll discuss in a bit. In terms of the data model, we have a concurrent dictionary, which is indexed by a CurrentThreadState instance which is held in a thread static variable. The code also allows you to define a generator and will create a default value on first access to the thread.

The first design decision is the key for the dictionary, I thought about using Thread.CurrentThread and the thread id.Using the thread id as the key is dangerous, because thread ids may be reused. And that is a case of a nasty^nasty bug. Yes, that is a nasty bug raised to the power of nasty. I can just imagine trying to debug something like that, it would be a nightmare. As for using Thread.CurrentThread, we’ll not have reused instances, so that is fine, but we do need to keep track of additional information for our purposes, so we can’t just reuse the thread instance. Therefor, we created our own class to keep track of the state.

All instances of a LightThreadLocal are going to share the same thread static value. However, that value is going to be kept as small as possible, it’s only purpose is to allow us to index into the shared dictionary. This means that except for the shared thread static state, we have no interaction between different instances of the LightThreadLocal. That means that if we have a lot of such instances, we use a lot less space and won’t degrade performance over time.

I also implemented an explicit disposal of the values if needed, as well as a finalizer. There is some song and dance around the disposal to make sure it plays nicely with concurrent disposal from a thread (see later), but that is pretty much it.

There really isn’t much to do here, right? Except that the real magic happens in the CurrentThreadState.

Not that much magic, huh? Smile

We keep a list of the LightThreadLocal instance that has registered a value for this thread. And we have a finalizer that will be called once the thread is killed. That will go to all the LightThreadLocal instances that used this thread and remove the values registered for this thread. Note that this may run concurrently with the LightThreadLocal.Dispose, so we have to be a bit careful (the careful bit happens in the LightThreadLocal.Dispose).

There is one thing here that deserve attention, though. The WeakReferenceToLightThreadLocal class, here it is with all its glory:

This is basically wrapper to WeakReference that allow us to get a stable hash value even if the reference has been collected. The reason we use that is that we need to reference the LightThreadLocal from the CurrentThreadState. And if we hold a strong reference, that would prevent the LightThreadLocal instance from being collected. It also means that in terms of the complexity of the object graph, we have only forward references with no cycles, cross references, etc. That should be a fairly simple object graph for the GC to walk through, which is the whole point of what I’m trying to do here.

Oh, we also need to support accessing all the values, but that is so trivial I don’t think I need to talk about it. Each LightThreadLocal has its own concurrent dictionary, and we can just access that Values property and we get the right result.

We aren’t done yet, though. There are still certain things that I didn’t do. For example, if we have a lot of LightThreadLocalinstances, they would gather up in the thread static instances, leading to large memory usage. We want to be able to automatically clean these up when the LightThreadLocalinstance goes away. That turn out to be somewhat of a challenge. There are a few issues here:

We can’t do that from the LightThreadLocal.Dispose / finalizer. That would mean that we have to guard against concurrent data access, and that would impact the common path.
We don’t want to create a reference from the LightThreadLocal to the CurrentThreadState, that would lead to more complex data structure and may lead to slow GC.

Instead of holding references to the real objects, we introduce two new ones. A local state and a global state:

The global state exists at the level of the LightThreadLocal instance while the local state exists at the level of each thread. The local state is just a number, indicating whatever there are any disposed parents. The global state holds the local state of all the threads that interacted with the given LightThreadLocal instance. By introducing these classes, we break apart the object references. The LightThreadLocal isn’t holding (directly or indirectly) any reference to the CurrentThreadState and the CurrentThreadState only holds a weak reference for the LightThreadLocal.

Finally, we need to actually make use of this state and we do that by calling GlobalState.Dispose() when the LightThreadLocal is disposed. That would mark all the threads that interacted with it as having a disposed parents. Crucially, we don’t need to do anything else there. All the rest happens in the CurrentThreadState, in its own native thread. Here is what this looks like:

Whenever the Register method is called (which happens whenever we use the LightThreadLocal.Value property), we’ll register our own thread with the global state of the LightThreadLocal instance and then check whatever we have been notified of a disposal. If so, we’ll clean our own state in RemoveDisposedParents.

This close down all the loopholes in the usage that I can think about, at least for now.

This is currently going through our testing infrastructure, but I thought it is an interesting feature. Small enough to talk about, but complex enough that there are multiple competing requirements that you have to consider and non trivial aspects to work with.

Tweet Share Share 16 comments

Tags:

Comments

03 Feb 2020
13:11 PM

Patrick Huizinga

I'm surprised you dispose the values of a threadlocal when the threadlocal is finalized. That seems like a big no-no to me and I see no justification why you think it's worth it.

Also ThreadLocal.Dispose has a bug:

var copy = _values;
_values = null;
while (copy.Count > 0)

When Dispose is called a second time, copy will be null, resulting in a NRE in the while condition. Given that you make a weak attempt to handle the edge case of a value being added during the dispose (why else have the while loop), it is strange you don't have one for concurrent disposes.

And Interlocked.Add(ref parentsDisposed, -parentsDisposed);in RemoveDisposedParents doesn't look right either

03 Feb 2020
19:26 PM

Jason

I have notice one more null reference issue. Inside LightThreadLocal's get Value. The _values could be null. Where the code calls _values.TryGetValue. It should be _values?.TryGetValue(out _) == false. It might be typo, since you specifically use == false instead of !.

The other thing I have notice is ConcurrentDictionary<TKey, TValue>.Count. The count value will lock and loop through all entities. It would be better to use Any() instead. You even had an blog post talking about Count locking. Guess that's just make code smaller to read?

04 Feb 2020
09:29 AM

Paulo Morgado

You can release (set to null) the _generator after being used. It's GC friendly because not only reduces that memory and object count but because it might be releasing any closed over objects.

04 Feb 2020
11:25 AM

Pavel Tupitsyn

I have recently dealt with deterministic clean-up of thread locals in Apache Ignite.NET, the requirement was to have a callback when a thread is about to exit, and the callback must happen on the exiting thread itself (so I can clean up JNI locals).

This can probably simplify your code quite a bit: https://github.com/ptupitsyn/UnmanagedThreadUtils

04 Feb 2020
12:02 PM

Oren Eini

Patrick,

You are correct on the NRE, I have fixed that in the code itself. You can see the full thing here: https://github.com/ravendb/ravendb/blob/9aa3542919a736ca3286c3f30b1494235866dcf9/src/Sparrow/Threading/LightWeightThreadLocal.cs

As for the finalizer's dispose, it is a bit complex, but let me see if I can explain.

When the CurrentThreadState is disposing, it is removing the value from the parent dictionary. That is known to have a live instance, since it is held via a weak reference. So when we pull the value from the dictionary, we know it hasn't been finalized. At this point, we immediately dispose it to reduce the time it is held alive. Otherwise, we'll need another full GC / finalizer run to dispose it properly.

04 Feb 2020
12:03 PM

Oren Eini

Patrick,

About the Interlocked.Add call. This is because while we are handling the disposal of a parent in a thread, another parent may also die, so we'll need to do this again next time.

04 Feb 2020
12:05 PM

Oren Eini

Jason,

You are correct, the real code has a check to see if the value has been disposed, which prevent this issue. I'm not using Any() here because there is not supposed to be any cost here. During the dispose, the caller is supposed to ensure no one is using the instance.

04 Feb 2020
12:05 PM

Oren Eini

Paulo,

I don't think that I can release the generator, it isn't used once, it may be needed again to create the value from a different thread. Remember that each thread will need a separate instance.

04 Feb 2020
12:07 PM

Oren Eini

Pavel,

That assumes that I own all the threads, but I need to deal with calls from thread pool threads as well, which I don't create.

04 Feb 2020
12:08 PM

Oren Eini

Pavel,

Sorry, I just read more deeply into what you are doing, and it is really interesting, yes. You are registering a callback using the OS mechanisms, so that will be called regardless. Very nice!

04 Feb 2020
12:42 PM

Patrick Huizinga

@Oren,

I understand why you want to clean up other instances when the CurrentThreadState dies. What I don't understand is why you would Dipose values you have 'no' control over on the finalizer thread. 9 times out of 10 when I write a Dispose, I have no need for a finalizer.

Even worse is that ~LightThreadLocal indirectly calls item.Dispose on every value as well. At that point you 'know for sure' that if a LightThreadLocal is GCed, so must its _values, therefor so will all the values inside that dictionary.

As for the Interlocked.Add, it's working on a parameter, not a field.

04 Feb 2020
22:16 PM

Paulo Morgado

Right. My bad.

05 Feb 2020
15:43 PM

Oren Eini

Patrick,

Assume that I have a thread local value, for example, an open file that I use to write logs from a single thread.I store that in the thread local, and then the thread local is disposed. I want to immediately clean it up.However, what happens when the thread is just closed, but the parent instance is alive? I can do nothing, which will likely cause the finalizer to run eventually on the file instance. But I want to do that as soon as possible, instead of waiting for another GC cycle and finalizer run. The difference can be in _minutes_.

The other side of this is what happens when the ThreadLocal instance is discarded. The problem here is that it is very likely that the instance is still referenced from the thread static variable.I changed things to make sure that we are using weak reference, but didn't clean it up. I removed the thread local finalizer.

As for the Interlocked.Add, you are absoutely correct, there was supposed to be a ref there. Thanks for catching this.

10 Feb 2020
09:15 AM

Shad Storhaug

I have spotted a potential problem. There is a non-zero chance that the following sequence could occur:

Thread A calls _values.TryGetValue() in LightThreadLocal<T>.Value, gets a false value and enters the block
Thread B calls setter of LightThreadLocal<T>.Value and sets it to a specific value
Thread A calls _values[_state] = v; overwriting the value from Thread B

Seems like the most reasonable solution would be to use _values.GetOrAdd() in the getter. Unfortunately, it is missing an overload with an addCondition to eliminate the extra memory allocation for null values. But, that is still better than using a lock, which would defeat the purpose of using ConcurrentDictionary.

In our case, we could probably swap in LurchTable in place of ConcurrentDictionary and add the missing overload that is needed to skip the add operation when _generator is null. Although, in the long run it would probably be best to submit a PR to Microsoft to add the functionality to ConcurrentDictionary, since others are also experiencing the same issue.

10 Feb 2020
10:26 AM

Oren Eini

Shad,

But Thread A and Thread B have a different _state, so they are going to go to different locations in the _values, the _state isn't the same between them. Am I missing something?

10 Feb 2020
12:51 PM

Shad Storhaug

Ahh, okay. I guess that would make all the difference.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB