The RavenDB indexing process: Optimization–De-parallelizing work
One of the major dangers in doing perf work is that you have a scenario, and you optimize the hell out of that scenario. It is actually pretty easy to do without even noticing. The problem is that when you do that, you are likely making a single scenario perform really well while hurting the overall system performance.
In this example, we have moved heaven and earth to make sure that we are indexing things as fast as possible, and we tested with 3 indexes on a 4-core machine. As it turned out, we had actually improved things, for that particular scenario.
Running the same test case on a single-core machine was suddenly far more heavyweight, because we were pushing a lot of work at the same time, more than the machine could process. The end result was that it got there eventually, but much more slowly than if we had run things sequentially.
Of course, those are the outliers, but they are good indicators of what we found. Initially, we thought that we could resolve this by using the TPL's MaxDegreeOfParallelism, but it turned out to be more complex than that. We have both IO-bound and CPU-bound tasks to execute, and running the IO-heavy tasks under that setting would still cause issues in this scenario.
We had to throttle things manually, both to ensure a limited amount of parallel work and because we have a lot more information about the actual tasks than the TPL does. We can schedule them far more efficiently because we can tell what is actually going on.
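To make that concrete, here is a minimal sketch of the kind of manual throttling described above. The types and names (IndexingTask, IndexingScheduler, IsIoBound) are illustrative, not RavenDB's actual code; the point is that the work is partitioned up front and IO-heavy tasks are treated differently from CPU-heavy ones, instead of relying on MaxDegreeOfParallelism alone.

```csharp
// Illustrative sketch only; names and structure are hypothetical,
// not RavenDB's actual implementation.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class IndexingTask
{
    public string IndexName;
    public bool IsIoBound;   // we know this about our own tasks; the TPL doesn't
    public Action Execute;
}

public static class IndexingScheduler
{
    // Run CPU-bound tasks with bounded parallelism and keep IO-heavy tasks
    // out of the parallel loop, instead of handing everything to the TPL.
    public static void Run(IEnumerable<IndexingTask> tasks, int maxCpuParallelism)
    {
        var all = tasks.ToList();
        var cpuBound = all.Where(t => !t.IsIoBound).ToList();
        var ioBound = all.Where(t => t.IsIoBound).ToList();

        // Partition CPU-bound work up front so we never exceed maxCpuParallelism,
        // regardless of how many indexes there are.
        var partitions = cpuBound
            .Select((task, i) => new { task, slot = i % maxCpuParallelism })
            .GroupBy(x => x.slot, x => x.task)
            .ToList();

        Parallel.ForEach(partitions, partition =>
        {
            foreach (var task in partition)
                task.Execute();
        });

        // IO-heavy work is throttled separately so it doesn't saturate the machine.
        foreach (var task in ioBound)
            task.Execute();
    }
}
```

The partitioning step is what makes the cap deterministic: the TPL cannot tell an IO-heavy task from a CPU-heavy one, but the scheduler that created the tasks can.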
The end result is that we are actually using less parallelism, overall, but in a more efficient manner.
In my next post, I’ll discuss the auto batch tuning support, which allows us to do some really amazing things from the point of view of system performance.
More posts in "The RavenDB indexing process" series:
- (24 Apr 2012) Optimization–Tuning? Why, we have auto tuning
- (23 Apr 2012) Optimization–Getting documents from disk
- (20 Apr 2012) Optimization–De-parallelizing work
- (19 Apr 2012) Optimization–Parallelizing work
- (18 Apr 2012) Optimization
Comments
What API did you actually use for the parallel computations? Parallel.ForEach? It's not suitable for IO-bound concurrency; for IO-bound work you should use Tasks.
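To illustrate the distinction the commenter is drawing, here is a rough sketch under the same assumption; the class and method names are made up and this is not RavenDB code. Parallel.ForEach keeps a thread-pool thread busy per item, which suits CPU-bound work, while Task-based IO lets many operations be in flight without tying up threads.

```csharp
// Hypothetical example of CPU-bound vs. IO-bound concurrency in .NET.
using System;
using System.Linq;
using System.Net.Http;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class BoundednessExample
{
    // CPU-bound work: Parallel.ForEach keeps every core busy doing real work.
    public static void HashAll(byte[][] blocks)
    {
        Parallel.ForEach(blocks, block =>
        {
            using (var sha = SHA256.Create())
                sha.ComputeHash(block);
        });
    }

    // IO-bound work: Task-based calls don't block a thread-pool thread per
    // request, so many outstanding operations don't need many threads.
    public static Task<string[]> FetchAll(string[] urls)
    {
        var client = new HttpClient();
        var downloads = urls.Select(url => client.GetStringAsync(url));
        return Task.WhenAll(downloads);
    }
}
```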
Enjoying these posts on the ongoing development of RavenDB. Could you imagine if the SQL Server or Oracle devs did posts like this?
Performance tuning on a line of business app is expensive and labor-intensive to get right: you really need a comprehensive suite of load tests on the same hardware profile as production using an equivalent network profile - easy-peasy. I can only imagine the headache involved with a more general-purpose tool like Raven. Like Daniel above, I'm enjoying the peek into your world.
Sorry if this question is too naive, but is the indexing primarily CPU-bound, memory-bound, or IO-bound? Would it be helpful or possible to use cloud computing to create indexes in a speedy fashion?
I've just heard that slow indexing speed is a major drawback of doc dbs, a prime reason why reporting etc needs to be done on sql... just wondering if you can throw a little cloud money at the problem to get faster turnaround on ad hoc reporting or index fixes.
Matthew, it uses a lot of CPU for full-text indexing, it requires a lot of memory, and it writes a lot to disk. It wouldn't be workable to do this on the cloud, because the cost of actually sending the data up there and then getting it back would be too high.
Madhav, RavenDB is DivanDB
A nice thing about indexing in Raven is that you usually have all recently modified documents in memory, so you can index them without reading from storage. You will not have such a luxury when the Lucene index is external to the application.
Have you implemented your own Task Scheduler for the Task library?
Frank, no, we didn't do that. We handle the control in a much simpler way, by partitioning the work before starting the parallel work.
The wonderful thing about that scenario is that if the region of code you are optimising is modular (which I'm sure it is), the problem space is not variable once the software is installed. Hence, you could provide two indexing modules: one designed for single-core, and one for multi-core parallelism.
Of course you would increase your code maintenance, but that's just another decision...