What is new in RavenDB 3.5: My thread pool is smarter
The .NET thread pool is a really amazing piece of technology, and it is suitable for a wide range of usages. RavenDB has been using it for almost all of its concurrent work since the very beginning.
In RavenDB 3.5, we have decided to change that. RavenDB has a lot of parallel execution requirements, but most of them have unique characteristics that we can express better with our own thread pool.
To start with, unlike the normal thread pool, we aren’t registering just a delegate and some state for it to execute. We are always registering a list of items to process, and a delegate that takes either a single item from that list or a section of that list. This lets us do a much better job at work stealing, because we have a lot more context about the actual operation. We know that when we are done executing a particular delegate, we get to run the same delegate on the next available item in the list that was passed to it. That gives us higher locality of code, because we are always executing the same task, as long as we have work for it in the pool.
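To make the "list of items plus one delegate" idea concrete, here is a minimal sketch in Python. It is not RavenDB's code (which is C#), and the names `BatchJob`, `try_run_next` and `run_batch` are invented for illustration. It only shows the shape of the registration: every worker keeps running the same delegate on the next unclaimed item until the list is exhausted.

```python
import threading


class BatchJob:
    """Hypothetical unit of work: a list of items and a single delegate."""

    def __init__(self, items, action):
        self.items = list(items)
        self.action = action
        self._next = 0
        self._lock = threading.Lock()

    def try_run_next(self):
        """Claim the next unclaimed item and run the delegate on it.

        Returns False once every item has been claimed.
        """
        with self._lock:
            if self._next >= len(self.items):
                return False
            index = self._next
            self._next += 1
        self.action(self.items[index])
        return True


def run_batch(items, action, workers=4):
    """Run `action` over `items` using `workers` threads that share one job."""
    job = BatchJob(items, action)

    def worker():
        # Each worker keeps executing the same delegate for as long as
        # there are items left, which is what gives the code locality.
        while job.try_run_next():
            pass

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the pool knows about the whole list, any idle thread can pick up the next item of a job that another thread started, without that work having been split up front.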
We often have nested operations: a parallel task (execute indexing work) that spawns additional parallel work (index the following documents). By basing this all on our custom thread pool, we can perform those operations in a way that doesn’t involve waiting for that work to be done. Instead, the thread pool thread that we run on is able to “wait” by executing the work that we are waiting for. We have no blocked threads, and in many cases we can avoid context switches entirely.
Under load, this means that threads won’t put a lot of work on the thread pool and then fight with each other over whose work finishes first. Instead, each thread gets to run its own tasks, and only when there are enough threads available for other work will we spread to additional threads.
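The "wait by doing the work" trick can be sketched like this. Again, this is illustrative Python under my own assumptions, not the actual implementation: `HelpingPool`, `Batch` and `execute_batch` are hypothetical names, and this simplified version still blocks briefly for items that another thread has already claimed and is busy running.

```python
import threading
from collections import deque


class Batch:
    """A list of items, one delegate, and a completion signal."""

    def __init__(self, items, action):
        self.items = list(items)
        self.action = action
        self.next_index = 0
        self.remaining = len(self.items)
        self.lock = threading.Lock()
        self.done = threading.Event()
        if not self.items:
            self.done.set()

    def run_one(self):
        """Claim and run one item; return False if none are left to claim."""
        with self.lock:
            if self.next_index >= len(self.items):
                return False
            item = self.items[self.next_index]
            self.next_index += 1
        try:
            self.action(item)
        finally:
            with self.lock:
                self.remaining -= 1
                if self.remaining == 0:
                    self.done.set()
        return True


class HelpingPool:
    """Hypothetical pool where the caller helps run its own batch."""

    def __init__(self, workers=2):
        self.batches = deque()
        self.lock = threading.Lock()
        self.has_work = threading.Condition(self.lock)
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _discard(self, batch):
        with self.lock:
            try:
                self.batches.remove(batch)
            except ValueError:
                pass  # another thread already removed it

    def _worker(self):
        while True:
            with self.has_work:
                while not self.batches:
                    self.has_work.wait()
                batch = self.batches[0]
            if not batch.run_one():
                self._discard(batch)  # fully claimed, nothing left to take

    def execute_batch(self, items, action):
        batch = Batch(items, action)
        with self.has_work:
            self.batches.append(batch)
            self.has_work.notify_all()
        # "Wait" by running the items ourselves instead of blocking.
        while batch.run_one():
            pass
        # Only items claimed by other threads can still be in flight here.
        batch.done.wait()
        self._discard(batch)
```

A nested call, such as an indexing task that calls `execute_batch` again for its documents, works even with very few pool threads, because the thread that is waiting on the inner batch is the one draining it.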
Speaking of load, the new thread pool also has a dynamic load balancing feature. Because we know that RavenDB will use the thread pool for background work only, we can prioritize things accordingly. RavenDB tries to keep the CPU usage in the 60% – 80% range by default. If we detect a higher CPU usage, we’ll start decreasing the background work we are doing, to make sure that we aren’t impacting front row work (like serving requests). We’ll start by lowering the priority of the background threads, and eventually just stop processing work in most of the background threads (we always have a minimum number of threads that will remain working, of course).
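As a rough sketch of that policy, here is a small decision function in Python. Only the 60% – 80% target range and the order of the two steps (lower priority first, then stop threads, never below a minimum) come from the description above. The step size of one thread, the recovery order, the thread counts and the names `BackgroundState` and `rebalance` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BackgroundState:
    low_priority: bool = False   # have background threads been deprioritized?
    active_threads: int = 8      # how many background threads are processing


def rebalance(state, cpu_percent, min_threads=2, max_threads=8,
              low=60.0, high=80.0):
    """Return the next background state for one CPU usage sample."""
    if cpu_percent > high:
        # Step 1: lower the priority of the background threads.
        if not state.low_priority:
            return BackgroundState(True, state.active_threads)
        # Step 2: stop threads one at a time, but keep a minimum running.
        return BackgroundState(True, max(min_threads, state.active_threads - 1))
    if cpu_percent < low:
        # Assumed recovery: bring threads back first, then restore priority.
        if state.active_threads < max_threads:
            return BackgroundState(state.low_priority, state.active_threads + 1)
        return BackgroundState(False, state.active_threads)
    # Inside the target range: leave things as they are.
    return state
```

Feeding it a run of high CPU samples first flips `low_priority` and then walks `active_threads` down to the minimum. Samples inside the range change nothing.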
Another fun thing that the thread pool can do is detect and handle slowpokes. A common example is an index that takes significantly longer to run than all the other indexes. The thread pool can release all the other indexes, and let the calling code know that this particular task has been left to run on its own. RavenDB will then split the indexing work so the slow index will not slow down the rest of the indexing.
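One way to picture slowpoke handling is the sketch below. It uses Python's standard `ThreadPoolExecutor` as a stand-in for the custom pool, and a fixed time budget instead of comparing a task against its siblings, so treat both as simplifying assumptions. The function name `run_with_slowpoke_detection` is made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_slowpoke_detection(tasks, budget_seconds, executor):
    """Run named tasks in parallel and report the ones that overrun.

    `tasks` maps a name to a zero-argument callable. Returns a pair:
    results of tasks that finished within the budget, and the futures of
    tasks that were left to keep running on their own.
    """
    futures = {executor.submit(fn): name for name, fn in tasks.items()}
    done, still_running = wait(list(futures), timeout=budget_seconds)
    finished = {futures[f]: f.result() for f in done}
    slowpokes = {futures[f]: f for f in still_running}
    return finished, slowpokes
```

The caller gets the fast results back right away and can decide what to do with the stragglers, for example scheduling them separately from the next batch.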
And having split the work between two thread pools, the standard .NET thread pool for front row work such as request processing and our own custom implementation for background work, we get a lot more predictability in the environment. We don’t have to worry about indexing jobs taking over the threads required to serve requests, or about requests on the server impacting the loading of a new database, etc.
And finally, like every other feature in RavenDB nowadays, we have a rich set of debug endpoints that can tell us in detail exactly what is going on. That is crucial when we are talking about systems that run for months and years, or when we are trying to troubleshoot a problematic server.
More posts in "What is new in RavenDB 3.5" series:
- (12 Aug 2015) Monitoring support
- (11 Aug 2015) Monitoring active I/O operations
- (10 Aug 2015) Filters & transformers with RavenDB Replication
- (06 Aug 2015) Collection Specific Replication
- (15 Jul 2015) Exploring data in the dark
- (14 Jul 2015) My thread pool is smarter
- (10 Jul 2015) Smuggling data across servers
Comments
Did you decide to increase the idle time default value for unloading databases or does this new design increase performance enough to fix "The case of the hung over server" completely?
Eli, the new design prevents this issue.