Ayende @ Rahien

Refunds available at head office

Raccoon Blog and RavenDB–One month later

One of the fun parts about RavenDB is that it will optimize itself for you, depending on how you are using your data.

With this blog, I decided when going live with RavenDB that I would not follow the best practices of ensuring static indexes for everything, but would let it figure it out on its own.

Today, I got curious and decided to check up on that:

[Image: the list of indexes in the blog's RavenDB database]

What you see is pretty interesting.

  • The first three indexes were automatically created by RavenDB in response to queries made on the database.
  • The Raven/* indexes are created by RavenDB itself, for the Raven Studio.
  • The MapReduce indexes are for statistics on the blog, and are the only two that were actually created by the application explicitly.

Posted By: Ayende Rahien

Comments

Ajai
06/24/2011 09:27 AM by
Ajai

Very nice!

porges
06/24/2011 09:42 AM by
porges

I've subscribed to your blog for a while now, mainly to read your posts about weird .NET behaviour/bugs... I've ignored all the RavenDB stuff for the most part - until now!

This is very cool, and I'll definitely be reading up on RavenDB because of it :)

Idsa
06/24/2011 09:52 AM by
Idsa

Very interesting. I am curious why relational databases ignore this feature.

Mani
06/24/2011 11:10 AM by
Mani

I would be a bit wary of this feature. I'm not really a control freak, but I wouldn't be comfortable with a system creating indexes on its own. What are the maintenance cost and size requirements of such an index? How many indexes will it create over a period of time? Do we have an option to switch this off?

Itamar
06/24/2011 11:16 AM by
Itamar

@Mani, if you queried the database you probably wanted some info from it, so an index will be created to satisfy your request.

It will be marked as Temp, not persisted to disk, and deleted after a while, unless you keep querying it consistently (a configurable number of times within a specified TimeSpan), in which case you probably do care about that index.
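As a rough illustration of the policy described in this comment, here is a toy sketch in Python. It is not RavenDB's code: the class names, the index names, and the thresholds (`promote_after`, `window`, `idle_limit`) are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TempIndex:
    name: str
    promoted: bool = False
    hits: list = field(default_factory=list)  # timestamps of recent queries


class TempIndexPolicy:
    """Toy policy; the thresholds are illustrative, not RavenDB defaults."""

    def __init__(self, promote_after=3, window=timedelta(minutes=1),
                 idle_limit=timedelta(minutes=10)):
        self.promote_after = promote_after
        self.window = window
        self.idle_limit = idle_limit
        self.indexes = {}

    def record_query(self, name, now):
        # The first query creates the temp index; later ones count as hits.
        index = self.indexes.setdefault(name, TempIndex(name))
        # Keep only the hits that fall inside the sliding window.
        index.hits = [t for t in index.hits if now - t <= self.window]
        index.hits.append(now)
        if len(index.hits) >= self.promote_after:
            index.promoted = True
        return index

    def cleanup(self, now):
        # Drop temp indexes that were never promoted and have gone idle.
        for name, index in list(self.indexes.items()):
            if not index.promoted and now - index.hits[-1] > self.idle_limit:
                del self.indexes[name]


policy = TempIndexPolicy()
start = datetime(2011, 6, 24, 11, 0)
for seconds in (0, 10, 20):
    # Hypothetical index name, queried three times within one minute.
    hot = policy.record_query("Temp/Posts/ByTag", start + timedelta(seconds=seconds))
policy.record_query("Temp/Posts/ByAuthor", start)  # queried only once
policy.cleanup(start + timedelta(minutes=30))
print(hot.promoted, sorted(policy.indexes))
```

The index that keeps getting queried is promoted and survives cleanup; the one that was queried once is dropped after going idle.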

Anon
06/24/2011 01:04 PM by
Anon

@Idsa SQL Server has something similar: its DMVs can suggest missing indexes.

Simon Bartlett
06/24/2011 01:21 PM by
Simon Bartlett

@Mani,

Raven DB requires an index to perform queries. Raven DB can not perform queries with an index.

Simon Bartlett
06/24/2011 01:21 PM by
Simon Bartlett

I meant to say "Raven DB can not perform queries WITHOUT an index."

Ayende Rahien
06/24/2011 04:57 PM by
Ayende Rahien

Idsa, because with an RDBMS there is a non-trivial cost to adding new indexes. In RavenDB, we have clients running systems with 500 indexes with no issues.

Alexander
06/24/2011 05:17 PM by
Alexander

@Ayende, but what is so different between relational and RavenDB indexes?

Ayende Rahien
06/24/2011 09:19 PM by
Ayende Rahien

Alexander, RavenDB makes indexing inexpensive by moving it to the background. That means it can add indexes on the fly and optimize itself.

Frans Bouma
06/25/2011 09:05 AM by
Frans Bouma

Inexpensive, yes, but also less ideal: after the transaction has been completed, it's unclear whether the updated data is in the index to use, as it has to be updated in the background (which might take 'some time', e.g. longer than the next query took). This breaks consistency requirements: to get the data which was just updated to be included in the next query, it either has to match a row in the index, or a table / tree scan has to be performed (which defeats the purpose of an index). RavenDB can't guarantee the data which was just inserted/updated through a transaction is available in the query following directly after the transaction: to be able to do that, it has to update indexes prior to commit, like RDBMSs do.

This is why RDBMSs do it directly, instead of postponing it. Both are fine though, it's just you make it seem like your approach has only advantages and no downsides.

Besides, RDBMSs use statistics as well to optimize queries.

Itamar
06/25/2011 06:02 PM by
Itamar

@Frans, in the case you described, the appropriate indexes will be marked as stale, and RavenDB will make sure to let you know of that. This is by design, and definitely better than how RDBMSes do this.
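To make the trade-off in this exchange concrete, here is a toy sketch of background indexing with a stale flag. It is an assumption-laden model, not RavenDB's implementation: `ToyStore`, its methods, and the document ids are invented for the example, and it only handles inserts of new documents.

```python
from collections import deque


class ToyStore:
    """Documents are written immediately; the index catches up later."""

    def __init__(self):
        self.documents = {}
        self.index_by_tag = {}   # tag -> set of document ids
        self.pending = deque()   # writes that are not indexed yet

    def put(self, doc_id, tags):
        # The write completes without touching the index.
        self.documents[doc_id] = tags
        self.pending.append(doc_id)

    def run_indexer_once(self):
        # Stand-in for the background indexing task: index one pending write.
        if self.pending:
            doc_id = self.pending.popleft()
            for tag in self.documents[doc_id]:
                self.index_by_tag.setdefault(tag, set()).add(doc_id)

    def query(self, tag):
        # Return the results plus a flag saying whether the index is stale.
        is_stale = bool(self.pending)
        return sorted(self.index_by_tag.get(tag, set())), is_stale


store = ToyStore()
store.put("posts/1", ["ravendb"])
before = store.query("ravendb")   # runs before the indexer has caught up
store.run_indexer_once()
after = store.query("ravendb")    # runs after the indexer has caught up
print(before, after)
```

The first query returns no results but reports that the index is stale, so the caller can decide whether to accept that or wait. The second query sees the new document and reports a fresh index.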

Ayende Rahien
06/26/2011 11:17 AM by
Ayende Rahien

Frans, yep, exactly. That is how RavenDB works; this is by design. You can see the previous post about ways to avoid this (at extra cost), but most times, you don't really care about this, so why pay the price?

Mani
06/27/2011 10:32 AM by
Mani

Thanks all (@Itamar, @Simon). I think we are so used to relational DBs (i.e. creating an index and its cost) that it takes some time to make the paradigm shift to how NoSQL works. But you guys are doing a great job.

Daniel Lang
07/14/2011 11:33 AM by
Daniel Lang

Ayende,

I've read several times that it is not advised to use dynamic queries instead of statically indexed queries. But how big is the actual performance difference at run time? Did you do any benchmarks?

Ayende Rahien
07/14/2011 10:44 PM by
Ayende Rahien

Daniel, Dynamic queries will go to the best index that matches them, via the query optimizer. The main difference is that for a dynamic query that doesn't have an index, one will be created, and may be stale initially.
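The behaviour described in this reply can be sketched roughly as follows. This is a toy model, not RavenDB's query optimizer: the class, the "narrowest covering index wins" rule, and the `Auto/...` naming scheme are assumptions for illustration only.

```python
class ToyQueryOptimizer:
    def __init__(self):
        self.indexes = {}  # index name -> (collection, frozenset of fields)

    def add_index(self, name, collection, fields):
        self.indexes[name] = (collection, frozenset(fields))

    def index_for(self, collection, query_fields):
        """Return (index name, may_be_stale) for a dynamic query."""
        wanted = frozenset(query_fields)
        # Candidates are indexes on the same collection that cover every
        # field used by the query.
        candidates = [
            (len(fields), name)
            for name, (coll, fields) in self.indexes.items()
            if coll == collection and wanted <= fields
        ]
        if candidates:
            # "Best" here means the narrowest covering index (an assumption).
            return min(candidates)[1], False
        # No match: create an index on the fly; it may be stale initially.
        name = "Auto/{}/By{}".format(collection, "And".join(sorted(wanted)))
        self.indexes[name] = (collection, wanted)
        return name, True


optimizer = ToyQueryOptimizer()
optimizer.add_index("Posts/ByTagAndAuthor", "Posts", ["Tag", "Author"])
first = optimizer.index_for("Posts", ["Tag"])    # covered by the existing index
second = optimizer.index_for("Posts", ["Slug"])  # nothing matches, so one is created
third = optimizer.index_for("Posts", ["Slug"])   # the created index is reused
print(first, second, third)
```

Only the query that triggers index creation pays the "may be stale" price; subsequent queries with the same shape go to the index that already exists.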

Comments have been closed on this topic.