Ayende @ Rahien

Raven’s dynamic queries

One of the pieces of feedback that we got from people frequently enough to be annoying is that the requirement to define indexes upfront before you can query is annoying. For a while, I thought that yes, it is annoying, but so is the need to sleep. And that there isn’t anything much you can do about it.

Then Rob asked: why do we require users to define indexes when we can gather enough information to do this ourselves? And since I couldn’t think of a good reason why, he went ahead and implemented this.

var activeUsers = from user in session.Query<User>()
                  where user.IsActive == true
                  select user;

You don’t have to define anything, it just works.

There are a few things to note, though.

First, unindexed queries are notorious for working on small data sets and killing systems in production. That was one of the main reasons that I didn’t want to run

Moreover, since RavenDB’s philosophy is that it should be pretty hard to shoot yourself in the foot, we figured out how to make this efficient.

  • RavenDB will look at the query, see that there is no matching index, and create one for you.
    • Currently, it will create one index per query type. In the future (as in, by the time you read this, most probably) we will have a query optimizer that can deal with selecting the appropriate index.
  • Those indexes are going to be temporary indexes, maintained for a short amount of time.
  • However, if RavenDB notices that you are making a large number of queries to a particular index, it will materialize it and turn it into a permanent index.

In other words, based on your actual system behavior, RavenDB will optimize itself for you :-) !
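
The lifecycle described in the list above can be sketched roughly as follows. This is only an illustrative model, not RavenDB's actual implementation: the `DynamicIndexTracker` class, the hit threshold, the idle lifetime, and the "Auto/" prefix for promoted indexes are all assumptions made up for the example. Only the idea of a temporary index that is either promoted or dropped comes from the post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the temporary-index lifecycle; none of these
// names or thresholds are taken from RavenDB itself.
public sealed class DynamicIndexTracker
{
    private sealed class Entry
    {
        public int Hits;
        public DateTime LastUsed;
        public bool Permanent;
    }

    private readonly Dictionary<string, Entry> indexes = new Dictionary<string, Entry>();
    private readonly int promoteAfterHits;        // assumed promotion threshold
    private readonly TimeSpan temporaryLifetime;  // assumed idle lifetime

    public DynamicIndexTracker(int promoteAfterHits, TimeSpan temporaryLifetime)
    {
        this.promoteAfterHits = promoteAfterHits;
        this.temporaryLifetime = temporaryLifetime;
    }

    // Called for every query that has no matching predefined index.
    // Creates a temporary index on first use and promotes it once it
    // has been queried often enough. Returns the index name used.
    public string Query(string queryShape, DateTime now)
    {
        Entry entry;
        if (!indexes.TryGetValue(queryShape, out entry))
        {
            entry = new Entry();
            indexes[queryShape] = entry;
        }

        entry.Hits++;
        entry.LastUsed = now;
        if (!entry.Permanent && entry.Hits >= promoteAfterHits)
            entry.Permanent = true;

        return (entry.Permanent ? "Auto/" : "Temp/") + queryShape;
    }

    // Drops temporary indexes that have been idle for longer than the
    // configured lifetime. Returns how many indexes were removed.
    public int Cleanup(DateTime now)
    {
        var stale = new List<string>();
        foreach (var pair in indexes)
        {
            if (!pair.Value.Permanent && now - pair.Value.LastUsed > temporaryLifetime)
                stale.Add(pair.Key);
        }

        foreach (var key in stale)
            indexes.Remove(key);

        return stale.Count;
    }
}
```

With a threshold of 3, for example, the first two calls to `Query("Users/ByIsActive", now)` would return a "Temp/" name and the third would return the promoted name. A query shape that is used once and then left alone would be removed by the next `Cleanup` call after its lifetime has passed.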

This feature actually made us stop and think, because it fundamentally changes the way that you work with RavenDB. We had to ask ourselves why you would still want to create indexes manually.

As it turned out, there are still a number of reasons why you would want to do that, but they become far more rare:

  • You want to apply complex filtering logic or do something to the index output. (For example, you may be interested in aggregating several fields into one searchable item.)
  • You want to use RavenDB’s spatial support.
  • Aggregations still require defining a map/reduce query.
  • You want to use the Live Projections feature, which I’ll discuss in my next post.

Posted By: Ayende Rahien

Comments

10/20/2010 01:02 PM by Richard Wilde

This all sounds great, one quick question though:-

If a temporary index is created, does it not take time to index all the documents, so that you may not get the desired results straight away?

10/20/2010 01:20 PM by Rob Ashton

It will wait until it has filled up 'a page' of information and then return. The temporary index WILL stay there though, and subsequent queries will get fresher information.

Thus, if you really care about fresh data, you can call WaitForNonStaleResultsAsOfNow() on the client and wait for indexing to complete.

Because new systems tend to have little data this shouldn't be a problem (and if it is, you can still always just use a pre-written index)

10/20/2010 01:33 PM by Peter Morris

1: You'd want to be able to define indexes in advance if you already know it will be used heavily later. No point in letting the app decide later to permanently index it.

2: I thought it was bad practice to index booleans in a db? To be honest it doesn't make sense to me that it would be, but I do remember reading it in a book once.

10/20/2010 02:01 PM by Ayende Rahien

Richard,

That is the fun with RavenDB's stale-by-default.

Yes, it may take a while, but you know that because the answer is stale, so you can either wait for it to be non-stale or use the potentially stale results.

10/20/2010 02:06 PM by Ayende Rahien

Peter,

1) Not really, no. We actually recommend using the dynamic indexes by default, and letting RavenDB do that for you. Defining indexes yourself is for more complex issues.

2) You are talking about sparse indexes, and that is an implementation detail for the DB.

10/20/2010 02:08 PM by Peter Morris

Oren:

If this is a massive query that is run once per month for example as a payroll processing task then the app is really going to suffer isn't it?

10/20/2010 02:10 PM by Ayende Rahien

Peter,

Actually, no.

It is BETTER to make this a temporary index, because after a while, it gets cleared out, and doesn't take resources.

And indexing is always done in the background, anyway.

10/20/2010 02:13 PM by Peter Morris

I'm not sure why you would think that. If this query runs once per month and returns data from a table with millions of rows that match specific criteria, then it is going to have to read all rows to create the indexes and then select the data. This would be slower than just selecting without the index there at all, especially as the index will then be dropped before the query is run again next month.

10/20/2010 02:16 PM by Richard Wilde

@Ayende - waiting for the "first page to be filled" answers my query. I understand the staleness versus fresh debate and have no problems with it

10/20/2010 02:44 PM by Ayende Rahien

Not really, no.

Work is done in the background, so you aren't really paying all that much for it.

Especially since in Raven, readers don't block the writer.

And you CAN do an effectively unindexed query with Raven, if you want.

10/20/2010 02:49 PM by Peter Morris

Here are the scenarios as I see them

A] Allowing a pre-defined index...

Inserts/updates/deletes on that table take a little longer, but the large infrequent query is very fast.

B] Not allowing a pre-defined index...

Inserts/updates/deletes on that table are a bit quicker, but the large infrequent query is much slower because it is running off unindexed data. An index is also created in the background which will never be used.

Why am I wrong?

10/20/2010 02:53 PM by Ayende Rahien

Peter,

Because with Raven, those things happen in the background.

You aren't paying for indexes.

In fact, one of RavenDB's users has > 500 users and millions of documents, without any ill effect.

10/20/2010 03:14 PM by Yitzchok

Wow, this is a very cool feature, Oren. I think it will help people get into RavenDB much faster.

10/20/2010 03:27 PM by Stevo

Is there any plan on having some sort of royalty-free distribution license (commercial)? Currently the lack of this keeps us from using the software.

10/20/2010 03:58 PM by Peter Morris

I think you are missing my point. If there is no index when the query runs, then the user is already paying the price. When Raven then indexes the data in the background after the query has run, the server pays the extra cost of generating an index that will not be used.

I just think it would be wise to also let people define indexes.

10/20/2010 04:11 PM by Rob Ashton

Peter - this functionality doesn't prevent people from defining indexes up front, the choice is entirely yours

10/20/2010 04:39 PM by Jakub Borys

Will temporary indexes be created for Lucene queries as well?

10/20/2010 05:00 PM by Nick Aceves

This is probably Raven's single most awesome feature. And that's saying something, because Raven has a boatload of awesome features.

10/20/2010 06:34 PM by Rob Ashton

Jakub - yes

You can even query collection properties like

Tags,Name:Fish

Where , denotes that Tags is a collection and Name is the property on the items in the collection you want to query

The only difference is that you include the full path to properties you want to query

User.Address.FirstLine

10/20/2010 06:54 PM by josh

This is the sort of improvement that will keep devs like me interested in the .Net world.

Nice Job, Oren & Rob!

10/20/2010 08:55 PM by Ayende Rahien

Stevo,

Yes, there is the OEM licensing.

10/20/2010 08:57 PM by Ayende Rahien

Peter,

We didn't take away indexes, they are still there.

We are simply giving the option to query without an index and have that done in an efficient manner.

As for querying without an index, you can do that too, with no index created. Not really useful, to tell you the truth, but you can.

10/20/2010 08:57 PM by Ayende Rahien

Jakub,

Yes, you can do that using Lucene as well

10/20/2010 09:01 PM by Peter Morris

That's what I suggested in the first place, why are you arguing with me? :-)

10/23/2010 01:03 AM by Steve Gentile

Great stuff Ayende - I really like seeing how Raven is maturing

10/23/2010 04:05 PM by scooletz

Finally! Will you provide some kind of strategies, which can be used to configure Raven to determine a lifespan of a created temporary index?

10/23/2010 05:48 PM by Ayende Rahien

scooletz ,

Yes, they are already there.

10/31/2010 12:07 PM by BrightSoul

Hello, this is awesome stuff, but can I leverage dynamic queries if I don't have server-side entities?

You see, I'm working with JObjects, storing and retrieving them with DatabaseCommands.Put and DatabaseCommands.Set.

session.Query<Type>() is not what I'm looking for since my asp.net application has no knowledge of "Type".

Then I found the method DatabaseCommands.Query but it's expecting an index name as one of its parameters. I want to avoid defining indexes upfront, if possible.

What I'd need is an overload of DatabaseCommands.Query that takes no parameters and returns an IRavenQueryable<JObject>. That would allow me to do Linq queries using a fluent interface.

Is this possible? Thanks for your answers.

10/31/2010 12:16 PM by Ayende Rahien

BrightSoul,

You can use Advanced.LuceneQuery to do so

10/31/2010 12:17 PM by BrightSoul

uhm, angled brackets are not htmlencoded... just to clarify on my previous comment, I meant IRavenQueryable(Of JObject) as the return type of the DatabaseCommands.Query overload; and session.Query(Of Type) as the method I can't use.

10/31/2010 01:29 PM by BrightSoul

Thanks, Ayende, for such a quick response!

I still don't get it though, hehe :D Sorry to bother you... I think I need a line of code to understand.

LuceneQuery is a generic method, so I still have to provide a type, which I can't do. Assuming I have to retrieve all the Orders from Italy, I would do the following:

session.Advanced.LuceneQuery(Of Order)().Where("Country:Italy")

But "Order" is a type that doesn't exist in my application. The domain model is actually in another application I know nothing about.

Then, I gave it a shot trying JObject as the type.

session.Advanced.LuceneQuery(Of JObject)().Where("country:Italy")

But as expected, this query didn't return any result since I couldn't instruct it that my documents are inside docs.Orders. The following is the content of the temp index that's being created:

Temp/JObjects/ByCountry

from doc in docs.JObjects

select new { Country = doc.Country }

Correct me if I'm wrong, but the AbstractIndexCreationTask is generating the fragment "docs.JObjects" by looking at the Type name, pluralized.

What I need here is the ability to use a string (ie. "docs.Orders"), not a type, to instruct the index creator where my documents exactly are.

10/31/2010 01:31 PM by Ayende Rahien

BrightSoul,

Use LuceneQuery(of Object) for now, it should respect the JObject / dynamic, but it doesn't at the moment.

10/31/2010 01:32 PM by Ayende Rahien

And it would be better to have this discussion on the mailing list.

Comments have been closed on this topic.