Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 82

filter by tags archive

RavenDB indexing optimizations, Step I–dynamic batches

time to read 2 min | 350 words

One of the major features in RavenDB was significant improvements to indexing speed. I thought that this would be a good idea to discuss this in detail.

Here is a very simple example of how RavenDB used to handle indexing.

The blue boxes are inserts, and the green and red ones are showing the indexing work done.

image

 

This is simplified, of course, but that is a good way to show exactly what is going on there. In particular, you can see that it takes quite a long time for us to index everything.

The good thing here is that the actual cost we have is per index, so we want to batch things properly. Note that this behavior is already somewhat optimized. Most of the time, you don’t have a few calls with thousands of documents. Usually we are talking about many calls each saving a small number of documents. This approach would balance those things out, because we can merge many save calls into a single indexing run.

However, that still took too much time, so we introduced the idea of dynamically change the batch size (the reason we have batches is to limit the amount of RAM we use, and allow us to respond more quickly in general).

So we changed things to do this:

image

Note that we increased the batch size as we note that we have more things to index. The batch size will automatically grow all the way to 128K docs, depending on a whole host of factors (load, speed, memory, number of indexes, etc).

Since the cost is most in per batch, we actually got a not insignificant improvement from this approach.  But we can do better, as we will see in our next post.


Comments

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. The insidious cost of allocations - 8 hours from now
  2. Buffer allocation strategies: A possible solution - 3 days from now
  3. Buffer allocation strategies: Explaining the solution - 4 days from now
  4. Buffer allocation strategies: Bad usage patterns - 5 days from now
  5. The useless text book algorithms - 6 days from now

And 1 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    03 Sep 2015 - The industry at large
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats