Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Rob’s Sprint: Query optimizer jumped a grade


RavenDB’s query optimizer is pretty smart. It knows how to find the appropriate index for your queries, and can even create a new index to match your query if one doesn’t exist. But that was the limit of its abilities. A human could still go into the database and say, look at those:

image

Those all operate on Posts, and you should be able to merge them all into a single index. Reducing the number of indexes is a good thing, as it reduces the amount of IO on the system, which is typically our limiting factor.

Now, there was no real reason why we couldn’t actually tell the query optimizer that it should be smart enough that when it creates a new index, it will use all of the properties that have been previously indexed.
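To make the idea concrete, here is a minimal sketch of that merge step. It is an illustration under stated assumptions, not RavenDB's actual code: `IndexDef`, `merged_index_for`, and the field names (`Title`, `Tags`, `PublishedAt`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical stand-in for an auto index: a collection plus its indexed fields."""
    collection: str
    fields: frozenset


def merged_index_for(query_collection, query_fields, existing):
    """Build a definition covering the query's fields plus every field
    already indexed on the same collection."""
    fields = set(query_fields)
    for index in existing:
        if index.collection == query_collection:
            fields |= index.fields
    return IndexDef(query_collection, frozenset(fields))


# Assumed starting point: two narrow indexes on Posts, one unrelated index on Users.
existing = [
    IndexDef("Posts", frozenset({"Title"})),
    IndexDef("Posts", frozenset({"Tags"})),
    IndexDef("Users", frozenset({"Name"})),
]

# A new query on Posts.PublishedAt yields one wider index instead of a third narrow one.
merged = merged_index_for("Posts", {"PublishedAt"}, existing)
print(sorted(merged.fields))
```

The index on Users is left alone, since only indexes on the same collection are candidates for merging.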

However, doing so would not actually have made a difference to us, because until now we didn’t have a way to stop an index. With the new index idling feature, the query optimizer can create a new merged index, and the database will just mark the extra indexes as idle after a while.
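A rough sketch of what "mark as idle after a while" could look like, assuming the database tracks when each index was last queried. The one hour threshold, the `indexes_to_idle` helper, and the index names are assumptions made for illustration; the post does not say how RavenDB decides.

```python
from datetime import datetime, timedelta


def indexes_to_idle(last_queried, now, idle_after=timedelta(hours=1)):
    """Return the names of indexes that have not been queried for `idle_after`.

    `last_queried` maps index name -> time of the most recent query (hypothetical bookkeeping).
    """
    return sorted(
        name for name, queried_at in last_queried.items()
        if now - queried_at >= idle_after
    )


now = datetime(2013, 3, 6, 12, 0)
last_queried = {
    "Posts/ByTitle": now - timedelta(hours=3),            # superseded by the merged index
    "Posts/ByTitleAndTags": now - timedelta(minutes=5),   # the merged index, still in use
}
idle = indexes_to_idle(last_queried, now)
print(idle)
```

Once all queries are served by the merged index, the narrow ones stop being queried and fall past the threshold on their own.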

Almost. There is still another issue that we have to resolve. What happens when we have a big database, and we introduce a new (and wider) index? By default, all matching queries would actually hit that index, and not the previously existing index. That is great, except… the new index is stale, and might remain stale for a few minutes. During that time, we have a perfectly serviceable index that is just sitting there.

The query optimizer can now take into account the staleness level of an index as well when selecting it, meaning that there should be no interruption from the point of view of other queries. The new index will be introduced, go through all the documents, and then take over as the serving index for all queries. The existing index will wither away and die.
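The selection rule can be sketched roughly as follows. This is an assumption about the shape of the logic, not the real optimizer: `IndexState`, `select_index`, the `docs_behind` staleness measure, and the index names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexState:
    name: str
    fields: frozenset
    docs_behind: int  # hypothetical staleness measure: documents not yet indexed


def select_index(query_fields, indexes, max_docs_behind=0):
    """Pick an index that covers the query, preferring ones that are caught up."""
    wanted = frozenset(query_fields)
    candidates = [i for i in indexes if wanted <= i.fields]
    if not candidates:
        return None  # at this point the optimizer would create a new index
    fresh = [i for i in candidates if i.docs_behind <= max_docs_behind]
    if fresh:
        # Among caught-up indexes, prefer the widest so the narrow ones can go idle.
        return max(fresh, key=lambda i: len(i.fields))
    # Nothing is caught up yet: fall back to the least stale candidate.
    return min(candidates, key=lambda i: i.docs_behind)


narrow = IndexState("Posts/ByTitle", frozenset({"Title"}), docs_behind=0)
wide_new = IndexState("Posts/ByTitleAndTags", frozenset({"Title", "Tags"}), docs_behind=50_000)
wide_done = IndexState("Posts/ByTitleAndTags", frozenset({"Title", "Tags"}), docs_behind=0)

during_build = select_index({"Title"}, [narrow, wide_new])
after_build = select_index({"Title"}, [narrow, wide_done])
print(during_build.name, after_build.name)
```

While the wide index is still catching up, queries on Title keep hitting the existing narrow index; once it is caught up, the wide index takes over.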

More posts in "Rob’s Sprint" series:

  1. (08 Mar 2013) The cost of getting data from LevelDB
  2. (07 Mar 2013) Result Transformers
  3. (06 Mar 2013) Query optimizer jumped a grade
  4. (05 Mar 2013) Faster index creation
  5. (04 Mar 2013) Indexes and the death of temporary indexes
  6. (28 Feb 2013) Idly indexing

Comments

Afif

From our experience with RavenDB in production, when you have a few million records, a new index is stale for hours. Worse still, many times when the index goes stale, it returns no results, as opposed to always returning what is already indexed.

Chad T

Which build will these new index related features be in?

Matt Warren

@Afif

The index definitely should return results whilst it is stale. It will, however, not return results for the most recently inserted docs, as it indexes in insert order.

What build are you using? You might want to post an issue on the mailing list, https://groups.google.com/forum/#!forum/ravendb

Also with the latest builds the indexing times have fallen, see http://ayende.com/blog/160033/what-is-up-with-ravendb-2-0-performance for instance

Ayende Rahien

Afif, We improved on the "new index" story as well. But now that the QO takes this into account, if there is an existing index that can serve, it will use that.

Ayende Rahien

Chad, The 2.5 experimental builds.

Afif

Matt, We are using build 1.0.960.

configurator

Suppose I have a query that uses values A and B; it will create an auto index on A and B. I then change my query to use values B and C instead, and the index will be expanded to index all of A, B and C. But if I never query by A again, isn't this a bit of a waste?

Ayende Rahien

Configurator, Not really, no. The cost of indexing another field is several orders of magnitude lower than the cost of maintaining another index.

configurator

Sure, but with constantly changing requirements you could end up with dozens of fields being indexed.

Ayende Rahien

Configurator, With constantly changing requirements, you are likely to need those fields later on, no? And even so, you aren't likely to see that in an actual production system.
