Feature dependency chains
I wanted to improve the RavenDB query optimizer, so it can merge similar indexes into a single index.
It is actually fairly simple to do (conceptually). But that leaves us with the problem of what to do with the indexes that we just superseded.
The whole point here is to reduce the number of indexes, not to create an index that will never be used.
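As a rough illustration of the merging idea (this is a sketch, not RavenDB's actual optimizer code, and the names are made up): if we treat an auto index as the set of fields it covers on a single collection, merging similar indexes amounts to taking the union of their fields.

```python
# Hypothetical sketch of merging similar auto indexes: one wider index
# per collection, covering the union of the original indexes' fields.
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoIndex:
    collection: str
    fields: frozenset


def merge_similar(indexes):
    """Merge all indexes on the same collection into one wider index."""
    by_collection = {}
    for idx in indexes:
        prev = by_collection.get(idx.collection, frozenset())
        by_collection[idx.collection] = prev | idx.fields
    return [AutoIndex(c, f) for c, f in by_collection.items()]


merged = merge_similar([
    AutoIndex("Users", frozenset({"Name"})),
    AutoIndex("Users", frozenset({"Name", "Age"})),
])
# The single merged index on Users covers both Name and Age,
# so it supersedes the two originals.
```

The merged index can answer every query the originals could, which is exactly why the originals become candidates for cleanup.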
So the next step was to make sure that we can clean up unused auto indexes. That is great…
Except that we don’t have any way of tracking unused indexes, so before we can do that, we have to add tracking for query times.
Okay, let us do that.
Wait, we don’t have a place to put that data. And anyway, we don’t want to save it all the time; that would mean IO for every query. And what about concurrent queries?
So we need to add a way to save this, and make sure that there is some batching going on, so we won’t go to disk all the time. And wait, there is this other feature over there that needs something similar, so we might as well solve both problems at once…
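The batching idea above can be sketched like this (assumed design, not RavenDB's actual code): queries only touch memory under a lock, and a separate flush step does a single IO for the whole accumulated batch.

```python
# Sketch of batched query-time tracking: concurrent queries record into
# memory; disk IO happens only when the batch is flushed.
import threading
import time


class QueryTimeTracker:
    def __init__(self, flush):
        self._lock = threading.Lock()  # handles concurrent queries
        self._pending = {}             # index name -> last query time
        self._flush = flush            # called with the batch; does the IO

    def record(self, index_name):
        with self._lock:               # cheap: an in-memory write only
            self._pending[index_name] = time.time()

    def flush_now(self):
        """Swap the batch out under the lock, then do IO outside it."""
        with self._lock:
            batch, self._pending = self._pending, {}
        if batch:
            self._flush(batch)         # one IO covering many queries


writes = []  # stand-in for the real persistence layer
tracker = QueryTimeTracker(flush=writes.append)
tracker.record("Users/ByName")
tracker.record("Orders/ByDate")
tracker.flush_now()
# writes now holds a single batch covering both queries.
```

In a real system the flush would be driven by a timer or a size threshold, but the key property is the same: a query never pays the disk cost itself.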
Okay, let us add that.
Now, where were we again?
The annoying thing about this? You only realize that those are actually problems when you run a deep analysis on the actual feature request. And usually it takes multiple cycles to get it all out.
It is also very easy to go down one branch all the way to the bottom, then go up to another branch which was opened up because of the work you just did. Makes for a good workday, but makes it harder to predict what you are going to do today.
Comments
+1. We developers live by the code but die with poor problem analysis.
I find that, when this happens, sometimes it helps to back up and think about the problem you started out with initially. You're arriving at the "perfect" solution, but sometimes the problem can be simplified if you insist on thinking about the problem itself, resisting the temptation to go "down the rabbit-hole".
In this particular case, dropping unused indexes - the core problem is to identify indexes that haven't been used "in a while"... how accurate does this need to be? It doesn't sound like you actually need a very high degree of precision, just an indicator.
The solution could be as simple as a timer that gets triggered every hour - production systems where there's a substantial benefit to dropping unused indexes, probably run for longer periods without being restarted. Simply track the names of indexes that have been used, in memory. On the hour, increase a counter (in the metadata) on all indexes - then reset the counter on the indexes that were actually used (from the list in memory), then clear the list. Now you have a "staleness" indicator - in effect, a number indicating how many consecutive hours in which an index was unused.
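The scheme described above can be sketched in a few lines (names and structure are mine, purely illustrative): track used index names in a set, and on each hourly tick bump every index's counter, then zero the counters of the indexes that were used.

```python
# Sketch of the hourly staleness counter: staleness[name] ends up being
# the number of consecutive hours in which that index went unused.
used_this_hour = set()   # index names recorded as queries come in
staleness = {}           # index name -> consecutive unused hours


def on_query(index_name):
    used_this_hour.add(index_name)


def hourly_tick(all_indexes):
    for name in all_indexes:
        staleness[name] = staleness.get(name, 0) + 1
    for name in used_this_hour:
        staleness[name] = 0
    used_this_hour.clear()


indexes = ["Users/ByName", "Orders/ByDate"]
on_query("Users/ByName")   # only Users/ByName is queried in hour 1
hourly_tick(indexes)       # hour 1 ends
hourly_tick(indexes)       # hour 2 ends, nothing was queried
# staleness is now {"Users/ByName": 1, "Orders/ByDate": 2}
```

The manual cleanup the commenter suggests would then just be `[n for n, s in staleness.items() if s > x]` for some operator-chosen threshold `x`.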
I would not personally suggest that something like this be fully automated, because some systems will have indexes that are only used once a year to run a specific annual report - that doesn't mean those indexes are unimportant, it could be a monster query that makes the system inaccessible to users for hours. Instead, I would suggest allowing somebody to manually issue a command that drops all indexes with a staleness-value greater than x.
I'm working with limited info about your internals, so forgive me if it's a dumb question: rather than creating a new index to merge similar ones, could you detect existing ones that could serve before creating a new one?
Gene, If you have an index that covers 2 fields, and you have a query that requires 5, you can't use the existing one. You have to create a new one, which supersedes the old one.
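The covering check implied here is just a subset test (this is my illustration, with made-up field names, not RavenDB's internals): an existing index can serve a query only if it covers every field the query needs.

```python
# Sketch of the covering check: an index serves a query only if the
# query's fields are a subset of the index's fields.
def covers(index_fields, query_fields):
    return set(query_fields) <= set(index_fields)


existing = {"Name", "Age"}                            # index covering 2 fields
query = {"Name", "Age", "City", "Country", "Email"}   # query needing 5

can_reuse = covers(existing, query)   # False: a wider index is needed
# The new index covers the union of the fields, so it supersedes the
# old one, which then becomes a candidate for cleanup.
new_index = existing | query
```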
I see...I was thinking the opposite case.