Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Rob’s Sprint: Idly indexing


During Rob Ashton’s visit to our secret lair, we did some work on hard problems. One of those problems was the issue of index prioritization. As I have discussed before, this isn’t really easy to do, because of the IO costs associated with not indexing properly.

With Rob’s help, we have defined the following:

  • An auto index can be set to idle if it hasn’t been queried for a time.
  • An index can be forced to be idle by the user.
  • An index that was automatically set to idle will be set to normal on its first query.
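The three rules above can be read as a small state machine. The following is a minimal sketch of that reading, not RavenDB's actual code: `Priority`, `IndexState`, the function names, and the `idle_after` parameter are all hypothetical, and the post does not say how long "a time" is. The sketch also assumes that a forced-idle index stays idle when queried, which the third rule implies but does not state outright.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Priority(Enum):
    NORMAL = "normal"
    IDLE = "idle"


@dataclass
class IndexState:
    # Hypothetical bookkeeping record, not a RavenDB type.
    name: str
    is_auto: bool              # an auto index rather than a user-defined one
    last_queried: datetime
    priority: Priority = Priority.NORMAL
    forced: bool = False       # True once the user has pinned the priority


def mark_idle_if_unused(index, now, idle_after):
    """Rule 1: an auto index not queried for `idle_after` becomes idle."""
    if index.is_auto and not index.forced and now - index.last_queried >= idle_after:
        index.priority = Priority.IDLE


def force_idle(index):
    """Rule 2: the user can force any index to be idle."""
    index.priority = Priority.IDLE
    index.forced = True


def on_query(index, now):
    """Rule 3: an automatically idled index returns to normal when queried."""
    index.last_queried = now
    if index.priority is Priority.IDLE and not index.forced:
        index.priority = Priority.NORMAL
```

Keeping `forced` separate from `priority` is what lets a query wake an automatically idled index while leaving a user-pinned one alone.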

What are the implications of that? An idle index will not be indexed by RavenDB during the normal course of things. Only when the database is idle for a period of time (by default, about 10 minutes with no writes) will we actually get it indexing.

Idle indexes will continue indexing as long as there is no other activity that requires their resources. When such activity shows up, they will complete their current run and then go back to waiting for the database to become idle again.
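To make the gating concrete, here is a rough sketch of how an indexing pass might decide which indexes are eligible. It only illustrates the behavior described above: the function names and the tuple layout are made up for this example, and the 10 minute threshold is the approximate default mentioned in the post.

```python
from datetime import datetime, timedelta

# The post's default: about 10 minutes with no writes.
IDLE_THRESHOLD = timedelta(minutes=10)


def database_is_idle(last_write, now, threshold=IDLE_THRESHOLD):
    """The database counts as idle once no write has arrived for `threshold`."""
    return now - last_write >= threshold


def indexes_to_run(indexes, last_write, now):
    """Pick the indexes eligible for this indexing pass.

    `indexes` is a list of (name, priority) pairs, where priority is the
    string "normal" or "idle" (a hypothetical layout for this sketch).
    """
    allowed = {"normal"}
    if database_is_idle(last_write, now):
        allowed.add("idle")
    return [name for name, priority in indexes if priority in allowed]
```

Because eligibility is re-evaluated on every pass, a new write simply drops the idle indexes from the next pass, while a run already in progress is left to finish.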

But wait, there is more. In addition to introducing the notion of idle indexes, we have also created another two types of indexes. The first is pretty obvious: the disabled index will use no system resources and will never take part in indexing. This is mostly there so you can manually shut down a single index. For example, maybe it is a very expensive one and you want to stop it while you are doing an import.

More interesting, however, is the concept of an abandoned index. Even idle indexes can take some system resources, so we have added another level beyond that: an abandoned index is one that hasn’t been queried in 72 hours. At that point, RavenDB is going to avoid indexing it even during idle periods. It will still get indexed, but only if enough time has passed since the last time it was indexed.
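As a rough illustration of the abandoned rule, the check boils down to two time comparisons. This is a sketch under assumptions: the function names are hypothetical, and `min_gap` stands in for the "long enough time" between runs, which the post does not specify.

```python
from datetime import datetime, timedelta

# From the post: not queried in 72 hours means abandoned.
ABANDONED_AFTER = timedelta(hours=72)


def is_abandoned(last_queried, now):
    """An index is abandoned once it has gone 72 hours without a query."""
    return now - last_queried >= ABANDONED_AFTER


def should_index_abandoned(last_indexed, now, min_gap):
    """An abandoned index is still indexed, but only once `min_gap` has
    passed since its last run. `min_gap` is an assumed parameter; the
    post gives no concrete value for it.
    """
    return now - last_indexed >= min_gap
```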

Next, we will discuss why this feature was a crucial step on the way to killing temporary indexes.

More posts in "Rob’s Sprint" series:

  1. (08 Mar 2013) The cost of getting data from LevelDB
  2. (07 Mar 2013) Result Transformers
  3. (06 Mar 2013) Query optimizer jumped a grade
  4. (05 Mar 2013) Faster index creation
  5. (04 Mar 2013) Indexes and the death of temporary indexes
  6. (28 Feb 2013) Idly indexing

Comments

Patrik Potocki

Cool,

When will you push it into the unstable branch so we can test it out?

configurator

"an abandoned index is one that hasn’t been queried in 72 hours" - so a weekly report will never be up to date?

Also, why do idle indexes wait for 10 minutes of inactivity instead of just working only when all other indexes are up to date?

Chris Marisic

"An index that was automatically set to idle will be set to normal on its first query."

What if you want the index to always be an idle index? Like a reporting index that pulls tons of things together, or a crazy reporting map/reduce that is not relevant to OLTP functionality at all?

Rob Ashton

Chris - while not covered explicitly in the entry above, there is a flag to "force idle" and this will be exposed in the studio

Alex Spence

Can we get a way to set these flags on the index creators as well?

Ayende Rahien

Patrik, This is already available at: http://hibernatingrhinos.com/builds/ravendb-unstable-v2.5

Ayende Rahien

Configurator, You can force an index to not go into idle / abandoned mode. But in general, if you have an index that is queried weekly, you can afford to wake it up and then wait for it to catch up.

Ayende Rahien

Configurator, And the reason we wait for 10 minutes on inactivity is that we don't want to get into: "we have 1 second of rest, let us start indexing all the idle indexes, which can be VERY expensive".

Ayende Rahien

Alex, No, you can't do that at creation, but you can do that immediately after.

Alex Spence

In my still limited experience with Raven, specifically trying to work with bundles like replication and versioning, I have noticed that it's not very straightforward to accomplish certain functionality without using the studio.

This specific feature is not that big of a deal to us, but we would really love to see functionality like this be configurable without going through the UI.

Ayende Rahien

Alex, ALL of RavenDB functionality is exposed via the REST interface, and you can do absolutely everything the studio does. After all, the studio just uses HTTP to talk to RavenDB itself; it is not a privileged client.

Ayende Rahien

Alex, In other words, anything that you can do through the UI can be done in code, and pretty easily, at that.

Damian Hickey

RavenDB already caches compiled indexes ( https://github.com/ayende/ravendb/blob/master/Raven.Database/Linq/QueryParsingUtils.cs#L334 , discussion https://groups.google.com/d/msg/ravendb/hsMc4lLnaXU/h0WRLOYog9EJ ) which makes second and subsequent test runs that create that index much faster.

I'm wondering if it would be possible to configure the indexes to be lazily compiled? That is, compiled and loaded when first queried?

Am currently doing system acceptance tests where we have an increasing number of indexes and am experiencing some time pain (20-30s +) on single test runs.

Ayende Rahien

Damian, There is really no cost in doing the compilation (it happens once, and that is it.)

Ayende Rahien

Oh, you are talking about the cost per test run, right? I was thinking about production runs, actually. In that case, can't you handle this via the index compilation caching that we already have?

Damian Hickey

Yes, the cost per test run, where I am running one test at a time, in the usual TDD(-ish) scenario. The index compilation caching (which is great) only kicks in when I run 2 or more tests per session. http://i.imgur.com/38DF0fc.png - second test benefits from the caching.

My other approach is to be able to supply a predicate to my application so the test fixture can configure it to only create indexes that are going to be used. But that means my acceptance test fixtures need to know what indexes may be required which I find to be leaky. (I take a different approach with my unit tests, no problems there)

Yes, it's a development pain and not a production issue. I may be an edge case though.

Ayende Rahien

Damian, In that case, how about implementing on disk caching for this?

Damian Hickey

Yeah, that sounds good too. Generate a hash from the source, use it as the CompilerParameters.OutputAssemblyName and if the assembly already exists on disk (in a location that will exist between test sessions i.e. users temp dir) load it.

Or something like that :)

Damian Hickey

Actually, that may be a nice-to-have from a production pov. An index that is deleted and then re-created, assuming it is exactly the same, would be slightly faster. Don't know how often that would happen though really.

Ayende Rahien

Damian, We have 2K+ tests, most of them with some form of indexes. We run them a LOT. Any saving there would be useful in general.

Damian Hickey

Cool. Created the issue: http://issues.hibernatingrhinos.com/issue/RavenDB-969
