ThreadPool vs Pool<Thread>


One of the changes we made to RavenDB 4.0 in response to feedback from production runs was to introduce the notion of a pool of threads. This is quite distinct from the notion of a thread pool, and it deserves its own explanation.

A thread pool is something that is very commonly used in server applications. Instead of spawning a thread per task (a relatively expensive operation), the system keeps a pool of threads around and provides some way to queue tasks for them to run. Such tasks are typically expected to be short and transient. They should also neither make assumptions about the state of the thread they run on nor modify that state.
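To make the contrast concrete, here is a minimal thread pool sketched in Python. It is only an illustration under our own assumptions: RavenDB is a .NET codebase, and this is not how the .NET `ThreadPool` is implemented (that one grows and shrinks dynamically). A fixed set of worker threads pulls short tasks off a shared queue; the names `SimpleThreadPool` and `queue_task` are hypothetical.

```python
import queue
import threading
import traceback
from typing import Callable


class SimpleThreadPool:
    """A fixed set of worker threads that run short tasks from a shared queue."""

    def __init__(self, num_workers: int) -> None:
        self._tasks = queue.Queue()  # holds callables, or None as a shutdown sentinel
        self._workers = [
            threading.Thread(target=self._worker_loop, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def queue_task(self, task: Callable[[], None]) -> None:
        """Enqueue a short task; whichever worker is free picks it up."""
        self._tasks.put(task)

    def shutdown(self) -> None:
        """Let already-queued tasks finish, then stop every worker."""
        for _ in self._workers:
            self._tasks.put(None)  # one sentinel per worker, queued after the tasks
        for worker in self._workers:
            worker.join()

    def _worker_loop(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:
                return  # sentinel: this worker exits
            # Tasks should be short and must not rely on or change the worker
            # thread's state, because the next task will reuse the same thread.
            try:
                task()
            except Exception:
                traceback.print_exc()  # a failing task must not kill the worker
```

Note that no task "owns" a worker: consecutive tasks from the same caller may land on different threads, which is exactly why tasks must not depend on thread state.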

In .NET, all async work typically goes through the system thread pool. When you are processing a request in ASP.NET, you are running on a thread pool thread, and the thread pool is heavily used in any sort of server environment.

RavenDB also makes heavy use of the thread pool to service requests and handle most operations. But it also needs to process several long-term tasks, which can run for days or longer. Because of that, and because we need both fine-grained control and the ability to easily inspect the state of the system, we typically spawn a new, dedicated thread for each such task. As it turns out, under high memory load this is a dangerous thing to do. The new thread might try to commit some stack space when there is no memory left in the system to do so, resulting in a fatal stack overflow.

I should note that the kinds of tasks we use dedicated threads for are pretty rare and long-lived. They also do things like mutate the thread state (changing the priority, for example).

Because of that, we can’t just use the thread pool, nor do we want a similar abstraction. Instead, we created a pool of threads. A job can request a thread, and it gets a thread of its own to run on and do with as it pleases. When the job is done, which can be in a minute or in a week, it returns the thread to the pool, where the thread remains until another job needs it.
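A minimal sketch of the pool-of-threads idea, again in Python and under our own assumptions rather than RavenDB’s actual code. `PoolOfThreads`, `PooledThread`, `run_job` and `idle_count` are hypothetical names. Each job gets a thread to itself for as long as it runs. When the job finishes, the thread resets the state the job may have changed (here only its name, since Python threads expose no priority setting) and parks itself in the idle list until the next job arrives.

```python
import threading
import traceback
from typing import Callable, List, Optional


class PooledThread:
    """A long-lived thread that runs one job at a time on behalf of the pool."""

    def __init__(self, pool: "PoolOfThreads", index: int) -> None:
        self._pool = pool
        self._base_name = f"pooled-{index}"
        self._job: Optional[Callable[[], None]] = None
        self._wakeup = threading.Event()
        self.thread = threading.Thread(target=self._loop, name=self._base_name, daemon=True)
        self.thread.start()

    def assign(self, job: Optional[Callable[[], None]]) -> None:
        """Hand this thread a job; None asks the thread to exit."""
        self._job = job
        self._wakeup.set()

    def _loop(self) -> None:
        while True:
            self._wakeup.wait()
            self._wakeup.clear()
            job, self._job = self._job, None
            if job is None:
                return
            try:
                job()  # may run for a minute or a week, and may mutate thread state
            except Exception:
                traceback.print_exc()  # a failed job must not kill the pooled thread
            # Undo state the job may have changed, then park in the pool.
            self.thread.name = self._base_name
            self._pool._return(self)


class PoolOfThreads:
    """Gives each job a dedicated thread, reusing idle threads when possible."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle: List[PooledThread] = []
        self.threads_created = 0

    def run_job(self, job: Callable[[], None]) -> None:
        with self._lock:
            if self._idle:
                worker = self._idle.pop()
            else:
                # The single place where a new thread (and a new stack) is created.
                self.threads_created += 1
                worker = PooledThread(self, self.threads_created)
        worker.assign(job)

    def idle_count(self) -> int:
        with self._lock:
            return len(self._idle)

    def shutdown_idle(self) -> None:
        """Stop the threads currently parked in the pool."""
        with self._lock:
            idle, self._idle = self._idle, []
        for worker in idle:
            worker.assign(None)
            worker.thread.join()

    def _return(self, worker: PooledThread) -> None:
        with self._lock:
            self._idle.append(worker)
```

Because finished threads are parked rather than discarded, a new thread is created only when every existing one is busy; a job that arrives after another has finished runs on a thread whose stack has already been used.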

In this way, under high memory usage we won’t be creating new threads all the time, and the threads’ stacks are likely to already be committed and available to the process.

Update: To clarify, even if we do need to create a new thread, we now control that in a single place. If there isn't enough memory available to actually use the new thread's stack, we refuse to create it.
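Because thread creation happens in one place, a single check can guard it. The post does not say how RavenDB measures available memory, so this Python sketch assumes a caller-supplied `available_memory` probe (returning free bytes) and an expected per-thread `stack_size`. Both of these, along with `create_guarded_thread`, `safety_margin` and `InsufficientMemoryError`, are hypothetical.

```python
import threading
from typing import Callable


class InsufficientMemoryError(RuntimeError):
    """Raised when we refuse to create a thread whose stack we may not be able to commit."""


def create_guarded_thread(
    target: Callable[[], None],
    name: str,
    stack_size: int,
    available_memory: Callable[[], int],
    safety_margin: int = 0,
) -> threading.Thread:
    """Create (but do not start) a thread only if memory for its stack appears available.

    Assumptions: `available_memory` is a hypothetical probe returning free bytes,
    and `stack_size` is the per-thread stack the caller expects to need.
    """
    free = available_memory()
    required = stack_size + safety_margin
    if free < required:
        # Refuse up front instead of letting the thread die later on a stack commit.
        raise InsufficientMemoryError(
            f"refusing to create {name!r}: {free} bytes free, {required} required"
        )
    return threading.Thread(target=target, name=name, daemon=True)
```

Such a check is inherently racy, since memory can be consumed between the check and the moment the stack is actually touched, so it reduces the risk rather than eliminating it.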