Raccoon Blog and RavenDB–One month later
One of the fun parts about RavenDB is that it will optimize itself for you, depending on how you are using your data.
With this blog, when going live with RavenDB, I decided not to follow the best practice of defining static indexes for everything, and instead let RavenDB figure it out on its own.
Today, I got curious and decided to check up on that:
What you see is pretty interesting.
- The first three indexes were automatically created by RavenDB in response to queries made on the database.
- The Raven/* indexes are created by RavenDB itself, for the Raven Studio.
- The MapReduce indexes are for statistics on the blog, and are the only two that the application explicitly created.
Comments
Very nice!
I've subscribed to your blog for a while now, mainly to read your posts about weird .NET behaviour/bugs... I've ignored all the RavenDB stuff for the most part - until now!
This is very cool, and I'll definitely be reading up on RavenDB because of it :)
Very interesting. I am curious why relational databases ignore this feature.
I would be a bit wary of this feature. I'm not really a control freak, but I wouldn't be comfortable with a system creating indexes on its own. What is the cost of maintaining these indexes, and what are their size requirements? How many indexes will it create over time? Is there an option to switch this off?
@Mani, if you queried the database you probably wanted some info from it, so an index will be created to satisfy your request.
It will be marked as Temp, not persisted to disk, and deleted after a while, unless you keep querying it consistently (a configurable number of times within a specified TimeSpan), in which case you probably do care about that index.
@Idsa, SQL Server has something similar: it uses its DMVs to suggest missing indexes.
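The temp-index lifecycle described in the comment above (create on first query, promote if queried often enough within a time window, otherwise discard) can be sketched as a small toy model. This is not RavenDB's implementation: the class names, the `Auto/...` naming scheme, and the threshold values (3 queries, 10-minute window, 20-minute idle timeout) are all hypothetical stand-ins for RavenDB's configurable settings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AutoIndex:
    # Hypothetical record for an auto-created index; not a RavenDB type.
    name: str
    created: datetime
    is_temp: bool = True
    query_times: list = field(default_factory=list)


class AutoIndexManager:
    """Toy model of the temp-index lifecycle; all thresholds are hypothetical."""

    def __init__(self, promote_after=3, window=timedelta(minutes=10),
                 idle_timeout=timedelta(minutes=20)):
        self.promote_after = promote_after  # queries needed inside `window` to promote
        self.window = window
        self.idle_timeout = idle_timeout    # unused temp indexes are dropped after this
        self.indexes = {}

    def query(self, fields, now):
        # Hypothetical naming scheme: one auto index per distinct set of queried fields.
        name = "Auto/" + "And".join(sorted(fields))
        index = self.indexes.setdefault(name, AutoIndex(name, created=now))
        index.query_times.append(now)
        # Keep only the queries that fall inside the sliding window.
        index.query_times = [t for t in index.query_times if now - t <= self.window]
        if index.is_temp and len(index.query_times) >= self.promote_after:
            index.is_temp = False  # queried consistently: keep it as a permanent index
        return index

    def cleanup(self, now):
        # Drop temp indexes that have not been queried for longer than `idle_timeout`.
        for name, index in list(self.indexes.items()):
            last_used = index.query_times[-1] if index.query_times else index.created
            if index.is_temp and now - last_used > self.idle_timeout:
                del self.indexes[name]


if __name__ == "__main__":
    mgr = AutoIndexManager()
    t0 = datetime(2011, 5, 1, 12, 0)
    for minute in range(3):
        tags = mgr.query(["Tags"], t0 + timedelta(minutes=minute))
    mgr.query(["Title"], t0)  # queried only once, so it stays temp
    mgr.cleanup(t0 + timedelta(hours=1))
    print(tags.is_temp, sorted(mgr.indexes))  # False ['Auto/Tags']
```

The sliding window is what separates "you queried this once" from "you keep relying on this"; only the second case earns a permanent index.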
@Mani,
Raven DB requires an index to perform queries. Raven DB can not perform queries with an index.
I meant to say "Raven DB can not perform queries WITHOUT an index."
Idsa, because with an RDBMS there is a non-trivial cost to adding new indexes. With RavenDB, we have clients running systems with 500 indexes with no issues.
@Ayende, but what is so different between relational and RavenDB indexes?
Alexander, RavenDB makes indexing inexpensive by moving it to the background. That means it can add indexes on the fly and optimize itself.
Inexpensive, yes, but also less ideal: after a transaction has completed, it's unclear whether the updated data is already in the index, because the index is updated in the background (which might take 'some time', e.g. longer than it takes to issue the next query). This breaks consistency requirements: for data that was just updated to be included in the next query, it either has to match a row in the index, or a table/tree scan has to be performed (which defeats the purpose of an index). RavenDB can't guarantee that data just inserted or updated through a transaction is available to the query directly following the transaction: to do that, it would have to update its indexes prior to commit, as RDBMSs do.
This is why RDBMSs do it directly instead of postponing it. Both approaches are fine, though; it's just that you make it seem like yours has only advantages and no downsides.
Besides, RDBMSs use statistics as well to optimize queries.
@Frans, in the case you described, the appropriate indexes will be marked as stale, and RavenDB will make sure to let you know of that. This is by design, and definitely better than how RDBMSes do this.
Frans, yep, exactly. That is how RavenDB works; it is by design. You can see the previous post about ways to avoid this (at extra cost), but most of the time you don't really care about this, so why pay the price?
Thanks all (@Itamar, @Simon). I think we are so used to relational databases (i.e., creating an index and paying its cost) that it takes some time to make the paradigm shift to how NoSQL works. But you guys are doing a great job.
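The trade-off debated above (writes commit immediately, indexes catch up in the background, and queries report whether they may be stale) can be illustrated with a toy, single-threaded model. Everything here is hypothetical: `Store`, `run_indexer`, and the `wait_for_non_stale` flag are illustrative names, not RavenDB's API, and the "background" indexer is simulated by an explicit method call so the behavior is deterministic.

```python
class Store:
    """Toy document store with one background-maintained index on a single field.

    Illustrates stale-index semantics only; not RavenDB's implementation.
    """

    def __init__(self, field):
        self.field = field
        self.docs = {}          # committed documents, keyed by id
        self.etag = 0           # increments on every write
        self.pending = []       # (etag, doc_id) writes not yet indexed
        self.index = {}         # field value -> set of doc ids
        self.indexed_etag = 0   # last etag the indexer has processed

    def put(self, doc_id, doc):
        # The write commits immediately; indexing is deferred.
        self.etag += 1
        self.docs[doc_id] = doc
        self.pending.append((self.etag, doc_id))

    def run_indexer(self, batch_size=1):
        # Simulates one pass of the background indexer over pending writes.
        batch, self.pending = self.pending[:batch_size], self.pending[batch_size:]
        for etag, doc_id in batch:
            for ids in self.index.values():
                ids.discard(doc_id)  # remove the document's old index entry
            value = self.docs[doc_id].get(self.field)
            self.index.setdefault(value, set()).add(doc_id)
            self.indexed_etag = etag

    def query(self, value, wait_for_non_stale=False):
        if wait_for_non_stale:
            # Pay the extra cost: drain the indexing backlog before answering.
            while self.pending:
                self.run_indexer()
        is_stale = self.indexed_etag < self.etag
        return sorted(self.index.get(value, set())), is_stale


if __name__ == "__main__":
    store = Store("author")
    store.put("posts/1", {"author": "ayende"})
    print(store.query("ayende"))  # ([], True)
    store.run_indexer()
    print(store.query("ayende"))  # (['posts/1'], False)
    store.put("posts/2", {"author": "ayende"})
    print(store.query("ayende", wait_for_non_stale=True))  # (['posts/1', 'posts/2'], False)
```

The default query answers immediately and is honest about possibly missing recent writes; waiting for a non-stale answer is available but costs latency, which is the "why pay the price?" choice described in the comment.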
Ayende,
I've read several times that using dynamic queries instead of statically indexed queries is not advised. But how big is the actual run-time performance difference? Have you done any benchmarks?
Daniel, dynamic queries will go to the best index that matches them, via the query optimizer. The main difference is that for a dynamic query that has no matching index, one will be created, and it may be stale initially.
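The routing described in the reply above (send a dynamic query to the best matching index, or create a new one if none matches) can be sketched as follows. The matching rule here is a hypothetical heuristic chosen for illustration: an index "matches" if it covers every queried field, and among matches the one with the fewest fields wins. RavenDB's real query optimizer may weigh things differently; the field names and `Auto/...` naming scheme are also hypothetical.

```python
def choose_index(indexes, query_fields):
    """Pick an index for a dynamic query, creating one if nothing covers it.

    `indexes` maps index name -> frozenset of indexed field names (hypothetical model).
    Returns (index_name, created), where `created` means results may be stale at first.
    """
    wanted = frozenset(query_fields)
    # An index can serve the query only if it indexes every field being queried.
    candidates = [(name, fields) for name, fields in indexes.items() if wanted <= fields]
    if candidates:
        # Hypothetical heuristic: prefer the smallest covering index, then by name.
        name, _ = min(candidates, key=lambda c: (len(c[1]), c[0]))
        return name, False
    # No match: create a new auto index for exactly these fields.
    name = "Auto/" + "And".join(sorted(wanted))
    indexes[name] = wanted
    return name, True


if __name__ == "__main__":
    indexes = {
        "Auto/PublishAt": frozenset({"PublishAt"}),
        "Auto/PublishAtAndTags": frozenset({"PublishAt", "Tags"}),
    }
    print(choose_index(indexes, ["PublishAt"]))  # ('Auto/PublishAt', False)
    print(choose_index(indexes, ["Tags"]))       # ('Auto/PublishAtAndTags', False)
    print(choose_index(indexes, ["Title"]))      # ('Auto/Title', True)
```

Once an index exists, a dynamic query and a query against an equivalent static index hit the same kind of structure; the extra cost appears only on the first query, which pays for index creation and may see stale results.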