Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Shiny features in the depth: New index optimization

time to read 2 min | 305 words

One of the nice features in RavenDB 3.0 is an optimization of the process of creating a new index. In particular, we wanted to optimize the case where you create a new index on a small collection in a large database.

If you have a small database, you don’t care; indexing is going to complete quickly anyway. If you are creating an index on a collection that makes up a significant portion of the documents in the database, you don’t care either; you are going to have to do a lot of work anyway. But the common case for a big database is one very big collection and much smaller collections for everything else.

In RavenDB 2.x, creating such an index still means paying the full price of going over every document in the database, but that isn’t the case in RavenDB 3.0. In this scenario, we now preload only the documents that belong to the relevant collection and send them directly to be indexed.

We do this by using the Raven/DocumentsByEntityName index, which has already indexed everything in the database anyway. This is a nice little feature, because it lets us take advantage of work we already did long ago. Using one index to pre-populate another is a neat trick, and one that I am very happy about.
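To make the difference concrete, here is a minimal sketch in Python of the two ways to populate a new index. Everything in it is hypothetical: `ToyDatabase`, `by_entity_name`, `populate_full_scan`, and `populate_from_entity_index` are illustrative names, not RavenDB APIs, and `by_entity_name` merely plays the role that Raven/DocumentsByEntityName plays in the post (a collection-to-documents lookup that is already maintained on every write). It only shows why the lookup lets a new index on a small collection read a handful of documents instead of the whole database.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Set, Tuple


class ToyDatabase:
    """Hypothetical in-memory document store, for illustration only."""

    def __init__(self) -> None:
        # doc id -> (collection name, document body)
        self.docs: Dict[str, Tuple[str, Dict[str, Any]]] = {}
        # Stand-in for Raven/DocumentsByEntityName: collection -> doc ids.
        # It is updated on every write, so the work is already done by the
        # time someone creates a new index.
        self.by_entity_name: Dict[str, Set[str]] = defaultdict(set)

    def put(self, doc_id: str, collection: str, body: Dict[str, Any]) -> None:
        # Assumption: a document never moves between collections.
        self.docs[doc_id] = (collection, body)
        self.by_entity_name[collection].add(doc_id)


MapFn = Callable[[Dict[str, Any]], Any]


def populate_full_scan(db: ToyDatabase, collection: str, map_fn: MapFn):
    """Naive approach: read every document, keep those in the collection.
    Returns (index entries, number of documents read)."""
    entries, docs_read = {}, 0
    for doc_id, (coll, body) in db.docs.items():
        docs_read += 1
        if coll == collection:
            entries[doc_id] = map_fn(body)
    return entries, docs_read


def populate_from_entity_index(db: ToyDatabase, collection: str, map_fn: MapFn):
    """Optimized approach: ask the entity-name lookup which documents belong
    to the collection and feed only those to the new index's map function.
    Returns (index entries, number of documents read)."""
    ids = db.by_entity_name.get(collection, set())
    entries = {doc_id: map_fn(db.docs[doc_id][1]) for doc_id in ids}
    return entries, len(ids)


if __name__ == "__main__":
    db = ToyDatabase()
    for i in range(10_000):  # one very big collection
        db.put(f"orders/{i}", "Orders", {"total": i})
    for i in range(5):  # and a small one
        db.put(f"users/{i}", "Users", {"name": f"user{i}"})

    def by_name(body: Dict[str, Any]) -> Any:
        return body["name"]

    slow, slow_reads = populate_full_scan(db, "Users", by_name)
    fast, fast_reads = populate_from_entity_index(db, "Users", by_name)
    assert slow == fast
    print(slow_reads, fast_reads)  # 10005 5
```

Both paths produce the same index entries; the only difference is how many documents have to be read to get there, which is exactly the small-collection-in-a-big-database case the optimization targets.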

Because this optimization is a new code path, it is executed outside of standard indexing. That, in turn, means that adding a new index will not impact other indexes at all.
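One way to picture why running outside of standard indexing isolates the existing indexes is the toy model below. It is an assumption for illustration, not RavenDB’s actual scheduler: it pretends that a single shared indexing loop reads documents starting from the oldest position (etag) that any index still needs. Under that assumption, a brand new index joining the loop at position 0 drags everyone back to a full rescan, while an index populated separately can join already caught up. `IndexState`, `shared_loop_start`, `add_index_through_shared_loop`, and `add_index_precomputed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexState:
    name: str
    last_indexed_etag: int  # how far into the document stream this index has got


def shared_loop_start(indexes: List[IndexState]) -> int:
    # Toy assumption: the shared loop starts reading from the oldest
    # position that any index still needs.
    return min(ix.last_indexed_etag for ix in indexes)


def add_index_through_shared_loop(indexes: List[IndexState], name: str) -> None:
    # The new index starts from scratch inside the shared loop, pulling the
    # loop's starting point back to the beginning for every index.
    indexes.append(IndexState(name, 0))


def add_index_precomputed(indexes: List[IndexState], name: str,
                          current_etag: int, populate: Callable[[], None]) -> None:
    # Assumes the caller captured current_etag *before* calling, so documents
    # written while populate() runs are still seen by the shared loop later.
    populate()  # e.g. the entity-name lookup, run outside the shared loop
    indexes.append(IndexState(name, current_etag))


if __name__ == "__main__":
    CURRENT_ETAG = 1_000_000

    def existing() -> List[IndexState]:
        return [IndexState("Orders/ByTotal", CURRENT_ETAG),
                IndexState("Users/ByName", CURRENT_ETAG)]

    naive = existing()
    add_index_through_shared_loop(naive, "Users/ByEmail")
    print(shared_loop_start(naive))  # 0

    optimized = existing()
    add_index_precomputed(optimized, "Users/ByEmail", CURRENT_ETAG, populate=lambda: None)
    print(shared_loop_start(optimized))  # 1000000
```

In the naive variant the existing indexes have to wait while the loop walks the whole database again; in the precomputed variant their starting point does not move.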

This is a small feature, but it does address a common pain point for our users when running RavenDB in production.

As a reminder, our RavenDB Conference is coming up in April, where we’ll discuss other features of the 3.0 release.


Comments

Jeff Harris

Sounds great. When I've run into this problem, I've stopped to ask myself whether I really have the same sharding/replication requirements for these kinds of documents. Most of the time, I've ended up creating separate DBs.

Igor Nosyryev

It would be great to allow indexing off another index, especially for Map/Reduce. Right now, in 2.x, I have to use SIR to save the results of the first index as documents and then build another index on them. It looks like this allocates space twice: once for the index, and again to save the results as documents.

Ayende Rahien

Igor, we don't allow indexing off another index; we make use of an index that is likely already there.

Igor Nosyryev

SIR is ScriptedIndexResults. I catch updates in the first index and modify documents that represent the entries in that index. The second index is based on these documents. This way, I can reduce already-reduced data. I can't come up with another approach.

Ayende Rahien

Igor, Yes, that is the appropriate way to handle this.

Igor Nosyryev

Yeah, it works well on a small dataset. I haven't tried it yet in production, which has about 3M items. My only concern is efficiency. Items in a Map/Reduce index are already stored in the index. When I save them as documents, the same data is stored twice, so it needs twice the space. It's also an unnecessary performance hit on the database. I think it would be a nice feature to expose items from Map/Reduce indexes as documents and allow building other indexes (both regular and M/R) on them.

Ayende Rahien

Igor, That wouldn't really have an impact, even on large databases.

Comments have been closed on this topic.

FUTURE POSTS

  1. Production postmortem: The case of the memory eater and high load - about one day from now
  2. Production postmortem: The case of the lying configuration file - 2 days from now
  3. Production postmortem: The industry at large - 3 days from now
  4. The insidious cost of allocations - 4 days from now
  5. Find the bug: The concurrent memory buster - 5 days from now

And 4 more posts are pending...

There are posts all the way to Sep 10, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    14 Aug 2015 - The case of the man in the middle
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't