RavenDB 4.0 Unsung HeroesThe indexing threads

time to read 3 min | 473 words

wire-33134_640A major goal in RavenDB 4.0 is to eliminate as much as possible complexity from the codebase. One of the ways we did that is to simplify thread management. In RavenDB 3.0 we used the .NET thread pool and in RavenDB 3.5 we implemented our own thread pool to optimize indexing based on our understanding of how indexing are used. This works, is quite fast and handles things nicely as long as everything works. When things stop working, we get into a whole different story.

A slow index can impact the entire system, for example, so we had to write code to handle that, and noisy indexing neighbors can impact overall indexing performance  and tracking costs when the indexing work is interleaved is anything but trivial. And all the indexing code must be thread safe, of course.

Because of that, we decided we are going to dramatically simplify our lives. An index is going to use a single dedicated thread, always. That means that each index gets their own thread and are only able to interfere with their own work. It also means that we can have much better tracking of what is going on in the system. Here are some stats from the live system.

image

And here is another:

image

What this means is that we have fantastically detailed view of what each index is doing, in terms of CPU, memory and even I/O utilization is needed. We can also now define fine grained priorities for each index:

image

The indexing code itself can now assume that it single threaded, which free a lot of complications and in general make things easier to follow.

There is the worry that a user might want to run 100 indexes per database and 100 databases on the same server, resulting in a thousand of indexing threads. But given that this is not a recommended configuration and given that we tested it and it works (not ideal and not fun, but works), I’m fine with this, especially given the other alternative that we have today, that all these indexes will fight over the same limited number of threads and stall indexing globally.

The end result is that thread per index allow us to have fine grained control over the indexing priorities, account for memory and CPU costs as well simplify the code and improve the overall performance significantly. A win all around, in my book.

More posts in "RavenDB 4.0 Unsung Heroes" series:

  1. (30 Oct 2017) automatic conflict resolution
  2. (05 Oct 2017) The design of the security error flow
  3. (03 Oct 2017) The indexing threads
  4. (02 Oct 2017) Indexing related data
  5. (29 Sep 2017) Map/reduce
  6. (22 Sep 2017) Field compression