Fine-grained work control
With RavenDB 3.5, we are focusing on performance as one of the key features. I’ve already spoken at length about the kinds of changes we have made to improve performance. A few percentage points here and there end up being quite significant when you add them all together.
But just sanding over the rough edges isn’t quite enough for us. We want to have a major impact, not just an avalanche of small improvements. To get there, we needed to be much more aware of how we are making use of resources in the system.
The result of several months of work is now ready to enter performance testing, and I’m quite excited about it. But before I show you the results, what is it? Well, RavenDB does quite a lot in the background to avoid holding up a request thread when you are calling RavenDB. That means we have a lot of background work: indexing, map/reduce, etc.
We have been using the default .NET thread pool for a long time to do that, and it has served us very well. But it is a generic construct, without awareness of the unique needs that RavenDB has. Therefore, we worked for quite some time to create our own thread pool that matches what we do.
The major changes with the RavenThreadPool (RTP from now on) are:
- There is a fixed (and dedicated) number of threads that will do the work, sharing (and stealing) work among themselves.
- Indexing tasks are continuous and shared, so a big indexing job will spread across all threads, with a preference for locality of work when there is a lot to parallelize.
- A slow index doesn’t stop us from working on other indexes; we let it process on its own and let the other indexes run ahead without it.
- Dynamic adjustment of the amount of work allowed for the indexes means that under load, we can rapidly reduce the amount of indexing work we are doing, freeing resources for processing requests.
There are other changes, but they are mostly of interest to the people who work on RavenDB, not to those who use it.
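To make the scheduling ideas above a bit more concrete, here is a minimal, hypothetical sketch in C#. This is not RavenDB's actual RavenThreadPool; the class, its members, and the throttling heuristic are all my own illustrative assumptions. It shows dedicated worker threads that prefer a "home" index queue, steal from the other queues when idle, and back off when the allowed concurrency is reduced:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Illustrative only: a fixed set of dedicated worker threads sharing
// per-index work queues, stealing from each other when their own queue
// is empty, and honoring a throttle that can shrink concurrency under load.
public sealed class MiniIndexingPool : IDisposable
{
    private readonly List<Thread> _threads = new List<Thread>();
    private readonly ConcurrentQueue<Action>[] _queues; // one queue per index
    private readonly SemaphoreSlim _workAvailable = new SemaphoreSlim(0);
    private volatile int _allowedWorkers;               // dynamic throttle
    private volatile bool _disposed;

    public MiniIndexingPool(int threadCount, int indexCount)
    {
        _allowedWorkers = threadCount;
        _queues = new ConcurrentQueue<Action>[indexCount];
        for (var i = 0; i < indexCount; i++)
            _queues[i] = new ConcurrentQueue<Action>();

        for (var i = 0; i < threadCount; i++)
        {
            var ordinal = i;
            var thread = new Thread(() => WorkLoop(ordinal))
            {
                IsBackground = true,
                Name = "mini-indexing-" + ordinal
            };
            thread.Start();
            _threads.Add(thread);
        }
    }

    // Called when request load rises or falls, e.g. SetAllowedWorkers(1)
    // under heavy request traffic, back to the full count when it subsides.
    public void SetAllowedWorkers(int count)
    {
        _allowedWorkers = Math.Max(1, Math.Min(count, _threads.Count));
    }

    public void Enqueue(int indexId, Action work)
    {
        _queues[indexId].Enqueue(work);
        _workAvailable.Release();
    }

    private void WorkLoop(int ordinal)
    {
        var home = ordinal % _queues.Length; // prefer locality of work
        while (!_disposed)
        {
            _workAvailable.Wait(TimeSpan.FromMilliseconds(250));
            if (_disposed)
                return;

            // Crude throttle: workers above the current allowance back off,
            // leaving the CPU free for request processing.
            if (ordinal >= _allowedWorkers)
            {
                _workAvailable.Release(); // give the permit back
                Thread.Sleep(100);
                continue;
            }

            // Try the home queue first, then steal from the others, so a
            // slow index never blocks progress on the fast ones.
            for (var i = 0; i < _queues.Length; i++)
            {
                if (_queues[(home + i) % _queues.Length].TryDequeue(out var work))
                {
                    work();
                    break;
                }
            }
        }
    }

    public void Dispose()
    {
        _disposed = true;
        for (var i = 0; i < _threads.Count; i++)
            _workAvailable.Release(); // wake everyone so they can exit
    }
}
```

The point of the sketch is the shape of the design, not the details: a fixed set of threads, work grouped per index so one slow index cannot starve the rest, and a single knob that the server can turn down when requests need the CPU.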
And the results? They are pretty good. Here is the before and after sample.
Note that we have a mix here of various types of indexes. The X axis is time, and the Y axis is the number of documents indexed.
As you can see, in the before case (without RTP), we are processing all indexes at roughly the same pace, with fairly flat growth over time. With RTP, the situation is different. You can see that very quickly the fast indexes start to outpace both their counterparts without RTP and the slower indexes.
That, in turn, means that they complete much faster. The Simple Map index, for example, completes indexing roughly 50% faster than without RTP. But even more interesting is what happens globally: because the fast indexes finish quickly, we can devote more resources to the slow index (HeavyMapReduce). So even this slowpoke completes about 15% faster with RTP than without it.
We are still running tests, but even so, this is quite exciting.
Comments
This is indeed great news and a big step in making RavenDb a fast solution.
Not sure if it fits your usage patterns, but if you are interested I've ported the LongAdder from java.util.concurrent.atomic to .net (https://github.com/etishor/ConcurrencyUtilities). It provides a very fast way of incrementing a counter concurrently without introducing contention. In my tests it is at least 10x faster than using Interlocked.*
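[For readers who haven't seen the technique this comment refers to, here is a rough sketch of the striping idea in C#. This is not the ported library; the cell count and the thread-to-stripe mapping are simplified assumptions. Increments go to one of several padded cells, so concurrent writers rarely touch the same cache line, and readers pay the cost of summing the cells.]

```csharp
using System;
using System.Threading;

// Sketch of the LongAdder idea (not the ported library itself): spread
// increments across several padded cells to reduce contention, and only
// pay the cost of summing the cells when the value is read.
public sealed class StripedCounter
{
    // Padding around each cell reduces false sharing between adjacent slots.
    private struct PaddedLong
    {
        public long Value;
#pragma warning disable 169
        private long _p1, _p2, _p3, _p4, _p5, _p6, _p7; // intentionally unused padding
#pragma warning restore 169
    }

    private readonly PaddedLong[] _cells = new PaddedLong[Environment.ProcessorCount];

    public void Increment()
    {
        // Pick a stripe per thread; collisions are possible but rare.
        var idx = Thread.CurrentThread.ManagedThreadId % _cells.Length;
        Interlocked.Increment(ref _cells[idx].Value);
    }

    public long Sum()
    {
        long sum = 0;
        for (var i = 0; i < _cells.Length; i++)
            sum += Volatile.Read(ref _cells[i].Value);
        return sum;
    }
}
```

[The trade-off is that reads are slower and may miss in-flight increments, which is usually fine for metrics-style counters.]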
Iulian, We haven't gotten to the point where Interlocked calls (which we don't have too many of) are causing issues.
Howdy,
Firstly, congrats on Tamar! It's a project that was estimated at 9 months, but takes 20 years :-)
Second, the Google OpenID isn't working too well; I got sent here: https://support.google.com/accounts/answer/6206245?p=openid&rd=1
And finally, how did it perform against TPL and the native thread pool?
Best wishes to all!
Hi, We'll have a new blog version soon.
And the benchmark isn't against the thread pool; it is against the scheduling inside it.
Those 3D bar charts make it look like the RTP versions have a lower max throughput...
/whiner mode
Also, will there be a nuget package for the RTP alone? (pretty please?)
Remco, A standalone RTP is very unlikely. It is really tied to the kind of workloads and requirements that we have in RavenDB (batch mode, distinct and separate workloads per item that can be processed concurrently and in groups).