Ayende @ Rahien

Rob’s Sprint: Idly indexing

During Rob Ashton’s visit to our secret lair, we did some work on hard problems. One of those problems was the issue of index prioritization. As I have discussed before, this isn’t easy to do, because of the IO costs associated with not indexing properly.

With Rob’s help, we have defined the following:

  • An auto index can be set to idle if it hasn’t been queried for a time.
  • An index can be forced to be idle by the user.
  • An index that was automatically set to idle will be set to normal on its first query.
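The rules above can be sketched as a tiny state machine. This is only an illustration in Python, not RavenDB's actual code: the `IndexState` class, the function names, and the `IDLE_AFTER_SECONDS` threshold are all made up here (the real threshold is just "a time").

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    NORMAL = "normal"
    IDLE = "idle"


# Hypothetical threshold: the post only says "for a time".
IDLE_AFTER_SECONDS = 60 * 60


@dataclass
class IndexState:
    name: str
    is_auto: bool                       # True for auto indexes
    priority: Priority = Priority.NORMAL
    forced: bool = False                # True when the user picked the priority
    last_query_at: float = 0.0          # seconds, on the caller's clock


def force_idle(index: IndexState) -> None:
    """The user forces an index to be idle; queries will not wake it up."""
    index.priority = Priority.IDLE
    index.forced = True


def demote_if_unused(index: IndexState, now: float) -> None:
    """An auto index that has not been queried for a while becomes idle."""
    if index.forced or not index.is_auto:
        return
    if index.priority is Priority.NORMAL and now - index.last_query_at >= IDLE_AFTER_SECONDS:
        index.priority = Priority.IDLE


def on_query(index: IndexState, now: float) -> None:
    """A query wakes up an index that was set to idle automatically."""
    index.last_query_at = now
    if index.priority is Priority.IDLE and not index.forced:
        index.priority = Priority.NORMAL
```

The `forced` flag is what separates the two kinds of idle: an index the database demoted on its own comes back on its first query, while one the user forced to idle stays there.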

What are the implications of that? An idle index will not be indexed by RavenDB during the normal course of things. Only when the database is idle for a period of time (by default, about 10 minutes with no writes) will we actually get it indexing.

Idle indexes will continue indexing as long as there is no other activity that requires their resources. When such activity shows up, they will complete their current run and go back to waiting for the database to become idle again.
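Here is a rough sketch of that gating logic in Python. Again, this is an assumption about the shape of the thing, not the real implementation: `database_is_idle`, `run_idle_indexing`, and the callables passed in are hypothetical names, and the only number taken from the behavior described here is the 10 minute default.

```python
from typing import Callable, Iterable

# "by default, about 10 minutes with no writes"
DATABASE_IDLE_AFTER_SECONDS = 10 * 60


def database_is_idle(now: float, last_write_at: float,
                     threshold: float = DATABASE_IDLE_AFTER_SECONDS) -> bool:
    """The database counts as idle once no write has arrived for `threshold` seconds."""
    return now - last_write_at >= threshold


def run_idle_indexing(batches: Iterable[Callable[[], None]],
                      clock: Callable[[], float],
                      last_write_at: Callable[[], float],
                      threshold: float = DATABASE_IDLE_AFTER_SECONDS) -> int:
    """Run indexing batches for idle indexes while the database stays idle.

    Idleness is checked only between batches, so a batch that has already
    started always completes, even if a write arrives while it is running.
    Returns the number of batches that were run.
    """
    completed = 0
    for run_batch in batches:
        if not database_is_idle(clock(), last_write_at(), threshold):
            break  # other activity showed up, wait for the next idle period
        run_batch()
        completed += 1
    return completed
```

Checking only between batches is what gives the "complete their current run" behavior: nothing is interrupted halfway, the idle indexes just don't pick up the next batch.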

But wait, there is more. In addition to introducing the notion of idle indexes, we have also created two more types of indexes. The first is pretty obvious: the disabled index will use no system resources and will never take part in indexing. This is mostly there so you can manually shut down a single index. For example, maybe it is a very expensive one and you want to stop it while you are doing an import.

More interesting, however, is the concept of an abandoned index. Even idle indexes can take some system resources, so we have added another level beyond that: an abandoned index is one that hasn’t been queried in 72 hours. At that point, RavenDB is going to avoid indexing it even during idle periods. It will still get indexed, but only if enough time has passed since the last time it was indexed.
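In code, the abandoned rule might look something like the Python sketch below. The 72 hours come from the description above; how long "a long enough time" is isn't stated, so `ABANDONED_REINDEX_INTERVAL_SECONDS` is a placeholder value, and the function names are hypothetical as well.

```python
# An index that has not been queried in 72 hours is abandoned.
ABANDONED_AFTER_SECONDS = 72 * 60 * 60

# Hypothetical value: the post does not say how long "long enough" is.
ABANDONED_REINDEX_INTERVAL_SECONDS = 3 * 60 * 60


def is_abandoned(now: float, last_query_at: float) -> bool:
    """True once the index has gone 72 hours without a query."""
    return now - last_query_at >= ABANDONED_AFTER_SECONDS


def should_index_during_idle(now: float, last_query_at: float,
                             last_indexed_at: float) -> bool:
    """Decide whether an idle index takes part in the current idle period.

    A merely idle index always does. An abandoned index is skipped unless
    enough time has passed since it was last indexed.
    """
    if not is_abandoned(now, last_query_at):
        return True
    return now - last_indexed_at >= ABANDONED_REINDEX_INTERVAL_SECONDS
```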

Next, we will discuss why this feature was a crucial step on the way to killing temporary indexes.

Comments

02/28/2013 12:12 PM by Patrik Potocki

Cool,

When will you push it into the unstable branch so we can test it out?

02/28/2013 02:32 PM by configurator

"an abandoned index is one that hasn’t been queried in 72 hours" - so a weekly report will never be up to date?

Also, why do idle indexes wait for 10 minutes of inactivity instead of just working only when all other indexes are up to date?

02/28/2013 02:45 PM by Chris Marisic

"An index that was automatically set to idle will be set to normal on its first query."

What if you want the index to always be an idle index? Like a reporting index that pulls tons of things together, or a crazy reporting map/reduce that is not relevant to OLTP functionality at all?

02/28/2013 03:42 PM by Rob Ashton

Chris - while not covered explicitly in the entry above, there is a flag to "force idle" and this will be exposed in the studio

02/28/2013 11:33 PM by Alex Spence

Can we get a way to set these flags on the index creators as well?

03/01/2013 12:04 AM by Ayende Rahien

Patrik, This is already available at: http://hibernatingrhinos.com/builds/ravendb-unstable-v2.5

03/01/2013 12:05 AM by Ayende Rahien

Configurator, You can force an index to not go into idle / abandoned mode. But in general, if you have an index that is queried weekly, you can afford to wake it up and then wait for it to catch up.

03/01/2013 12:06 AM by Ayende Rahien

Configurator, And the reason we wait for 10 minutes of inactivity is that we don't want to get into: "we have 1 second of rest, let us start indexing all the idle indexes, which can be VERY expensive".

03/01/2013 12:07 AM by Ayende Rahien

Alex, No, you can't do that at creation, but you can do that immediately after.

03/01/2013 01:18 AM by Alex Spence

In my still limited experience with Raven, specifically trying to work with bundles like replication and versioning, I have noticed that it's not very straightforward to accomplish certain functionality without using the studio.

This specific feature is not that big of a deal to us, but we would really love to see functionality like this be configurable without going through the UI.

03/01/2013 07:18 AM by Ayende Rahien

Alex, ALL of RavenDB functionality is exposed via the REST interface, and you can do absolutely everything the studio does. After all, the studio just uses HTTP to talk to RavenDB itself, it is not a privileged client.

03/01/2013 07:18 AM by Ayende Rahien

Alex, In other words, anything that you can do through the UI can be done in code, and pretty easily, at that.

03/01/2013 11:48 AM by Damian Hickey

RavenDB already caches compiled indexes ( https://github.com/ayende/ravendb/blob/master/Raven.Database/Linq/QueryParsingUtils.cs#L334 , discussion https://groups.google.com/d/msg/ravendb/hsMc4lLnaXU/h0WRLOYog9EJ ) which makes second and subsequent test runs that create that index much faster.

I'm wondering if it would be possible to configure the indexes to be lazily compiled? That is, compiled and loaded when first queried?

Am currently doing system acceptance tests where we have an increasing number of indexes and am experiencing some time pain (20-30s +) on single test runs.

03/01/2013 11:51 AM by Ayende Rahien

Damian, There is really no cost in doing the compilation (it happens once, and that is it.)

03/01/2013 11:52 AM by Ayende Rahien

Oh, you are talking about the cost per test run, right? I was thinking about production runs, actually. In that case, can't you handle this via the index compilation caching that we already have?

03/01/2013 12:16 PM by Damian Hickey

Yes, the cost per test run, where I run one test at a time, in the usual TDD(-ish) scenario. The index compilation caching (which is great) only kicks in when I run 2 or more tests per session. http://i.imgur.com/38DF0fc.png - second test benefits from the caching.

My other approach is to be able to supply a predicate to my application so the test fixture can configure it to only create indexes that are going to be used. But that means my acceptance test fixtures need to know what indexes may be required which I find to be leaky. (I take a different approach with my unit tests, no problems there)

Yes, it's a development pain and not a production issue. I may be an edge case though.

03/01/2013 12:24 PM by Ayende Rahien

Damian, In that case, how about implementing on disk caching for this?

03/01/2013 12:37 PM by Damian Hickey

Yeah, that sounds good too. Generate a hash from the source, use it as the CompilerParameters.OutputAssemblyName and if the assembly already exists on disk (in a location that will exist between test sessions, i.e. the user's temp dir) load it.

Or something like that :)

03/01/2013 12:41 PM by Damian Hickey

Actually, that may be a nice-to-have from a production pov. An index that is deleted and then re-created, assuming it is exactly the same, would be slightly faster. Don't know how often that would happen though really.

03/01/2013 01:05 PM by Ayende Rahien

Damian, We have 2K+ tests, most of them with some form of indexes. We run them a LOT. Any saving there would be useful in general.

03/01/2013 02:27 PM by Damian Hickey

Cool. Created the issue: http://issues.hibernatingrhinos.com/issue/RavenDB-969

Comments have been closed on this topic.