Raven’s dynamic queries

One of the pieces of feedback that we got from people frequently enough to be annoying is that the requirement to define indexes upfront before you can query is annoying. For a while, I thought that yes, it is annoying, but so is the need to sleep. And that there isn’t anything much you can do about it.

Then Rob asked, why do we require users to define indexes when we can gather enough information to do this ourselves? And since I couldn’t think of a good reason why, he went ahead and implemented this.

var activeUsers = from user in session.Query<User>()
                  where user.IsActive == true
                  select user;

You don’t have to define anything, it just works.

There are a few things to note, though.

First, unindexed queries are notorious for working on small data sets and killing systems in production. That was one of the main reasons that I didn’t want to run

Moreover, since RavenDB’s philosophy is that it should be pretty hard to shoot yourself in the foot, we figured out how to make this efficient.

RavenDB will look at the query, see that there is no matching index, and create one for you.

Currently, it will create one index per query type. In the future (as in, by the time you read this, most probably) we will have a query optimizer that can deal with selecting the appropriate index.

Those indexes are going to be temporary indexes, maintained for a short amount of time.
However, if RavenDB notices that you are making a large number of queries to a particular index, it will materialize it and turn it into a permanent index.

In other words, based on your actual system behavior, RavenDB will optimize itself for you :-) !
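
To make the temporary-to-permanent idea concrete, here is a minimal sketch of that kind of bookkeeping. It is not RavenDB's actual implementation: the `DynamicIndexTracker` class, the query-count threshold and the idle lifetime are all assumptions made up for illustration, and the real server may decide on promotion and cleanup quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only: names, threshold and lifetime are assumptions,
// not RavenDB's real internals.
public enum IndexState { Temporary, Permanent }

public sealed class DynamicIndexTracker
{
    private sealed class Entry
    {
        public int QueryCount;
        public DateTime LastQueried;
        public IndexState State;
    }

    private readonly Dictionary<string, Entry> _indexes = new Dictionary<string, Entry>();
    private readonly int _promoteAfterQueries;     // assumed promotion threshold
    private readonly TimeSpan _temporaryLifetime;  // assumed idle lifetime for temp indexes

    public DynamicIndexTracker(int promoteAfterQueries, TimeSpan temporaryLifetime)
    {
        _promoteAfterQueries = promoteAfterQueries;
        _temporaryLifetime = temporaryLifetime;
    }

    // Record a query against an index. An unknown index starts out as temporary;
    // once it has been queried often enough it becomes permanent.
    public IndexState RecordQuery(string indexName, DateTime now)
    {
        Entry entry;
        if (!_indexes.TryGetValue(indexName, out entry))
        {
            entry = new Entry { State = IndexState.Temporary };
            _indexes[indexName] = entry;
        }

        entry.QueryCount++;
        entry.LastQueried = now;

        if (entry.State == IndexState.Temporary && entry.QueryCount >= _promoteAfterQueries)
            entry.State = IndexState.Permanent;

        return entry.State;
    }

    // Remove temporary indexes that have been idle for longer than the lifetime.
    // Permanent indexes are never removed. Returns the number of indexes dropped.
    public int CleanUp(DateTime now)
    {
        var expired = new List<string>();
        foreach (var pair in _indexes)
        {
            if (pair.Value.State == IndexState.Temporary &&
                now - pair.Value.LastQueried > _temporaryLifetime)
            {
                expired.Add(pair.Key);
            }
        }

        foreach (var name in expired)
            _indexes.Remove(name);

        return expired.Count;
    }

    public bool Exists(string indexName)
    {
        return _indexes.ContainsKey(indexName);
    }
}
```

With a threshold of 3 and a lifetime of ten minutes, an index queried three times is promoted and survives cleanup, while an index queried once and then left alone is dropped by the next `CleanUp` call after the ten minutes have passed.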

This feature fundamentally changed the way that you work with RavenDB. We actually had to stop and think about why you would still want to create indexes manually.

As it turned out, there are still a number of reasons why you would want to do that, but they become far more rare:

You want to apply complex filtering logic or do something to the index output. (For example, you may be interested in aggregating several fields into one searchable item.)
You want to use RavenDB’s spatial support.
Aggregations still require defining a map/reduce query.
You want to use the Live Projections feature, which I’ll discuss in my next post.

Tags:

Raven

Comments

20 Oct 2010
13:02 PM

Richard Wilde

This all sounds great, one quick question though:-

If a temporary index is created, does it not take time to index all the documents, so that you may not get the desired results straight away?

20 Oct 2010
13:20 PM

Rob Ashton

It will wait until it has filled up 'a page' of information and then return, the temporary index WILL stay there though, and subsequent queries will get fresher information.

Thus, if you really care about fresh data, you can do WaitForNonStaleResultsAsOfNow() on the client and wait for the index to fully take place.

Because new systems tend to have little data this shouldn't be a problem (and if it is, you can still always just use a pre-written index)

20 Oct 2010
13:33 PM

Peter Morris

1: You'd want to be able to define indexes in advance if you already know it will be used heavily later. No point in letting the app decide later to permanently index it.

2: I thought it was bad practise to index booleans in a db? To be honest it doesn't make sense to me that it would be, but I do remember reading it in a book once.

20 Oct 2010
14:01 PM

Ayende Rahien

Richard,

That is the fun with RavenDB's stale-by-default.

Yes, it may take a while, but you know that because the answer is stale so you can either wait for it to be non stale or use the potentially stale results.

20 Oct 2010
14:06 PM

Ayende Rahien

Peter,

1) Not really, no. We actually recommend using the dynamic indexes by default, and letting RavenDB do that for you. Defining indexes yourself is for more complex issues.

2) You are talking about sparse indexes, and that is an implementation detail for the DB.

20 Oct 2010
14:08 PM

Peter Morris

Oren:

If this is a massive query that is run once per month for example as a payroll processing task then the app is really going to suffer isn't it?

20 Oct 2010
14:10 PM

Ayende Rahien

Peter,

Actually, no.

It is BETTER to make this a temporary index, because after a while, it gets cleared out, and doesn't take resources.

And indexing is always done in the background, anyway.

20 Oct 2010
14:13 PM

Peter Morris

I'm not sure why you would think that. If this query runs once per month and returns data from a table with millions of rows that match a specific criteria then it is going to have to read all rows to create the indexes and then select the data, this would be slower than just selecting without the index there at all; especially as the index will then be dropped before the query is run again next month.

20 Oct 2010
14:16 PM

Richard Wilde

@Ayende - waiting for the "first page to be filled" answers my query. I understand the staleness versus fresh debate and have no problems with it

20 Oct 2010
14:44 PM

Ayende Rahien

Not really, no.

Work is done in the background, so you aren't really paying all that much for it.

Especially since in Raven, readers don't block the writer.

And you CAN do an effectively unindexed query with Raven, if you want.

20 Oct 2010
14:49 PM

Peter Morris

Here are the scenarios as I see them

A] Allowing a pre-defined index...

Inserts/updates/deletes on that table take a little longer, but the large infrequent query is very fast.

B] Not allowing a pre-defined index...

Inserts/updates/deletes on that table are a bit quicker, but the large infrequent query is much slower because it is running off unindexed data. An index is also created in the background which will never be used.

Why am I wrong?

20 Oct 2010
14:53 PM

Ayende Rahien

Peter,

Because with Raven, those things happen in the background.

You aren't paying for indexes.

In fact, one of RavenDB users has > 500 users and millions of documents, without any ill effect.

20 Oct 2010
15:14 PM

Yitzchok

Wow this is very cool feature Oren. I think it will help people get into RavenDB much faster.

20 Oct 2010
15:27 PM

Stevo

Is there any plan on having some sort of royalty-free distribution license (commercial)? Currently the lack of this keeps us from using the software.

20 Oct 2010
15:58 PM

Peter Morris

I think you are missing my point. If there is no index when the query runs then the user is already paying the price, when Raven then indexes the data in the background after the query has run the server pays the extra cost of generating an index that will not be used.

I just think it would be wise to also let people define indexes.

20 Oct 2010
16:11 PM

Rob Ashton

Peter - this functionality doesn't prevent people from defining indexes up front, the choice is entirely yours

20 Oct 2010
16:39 PM

Jakub Borys

Will temporary indexes be created for Lucene queries as well?

20 Oct 2010
17:00 PM

Nick Aceves

This is probably Raven's single most awesome feature. And that's saying something, because Raven has a boatload of awesome features.

20 Oct 2010
18:34 PM

Rob Ashton

Jakub - yes

You can even query collection properties like

Tags,Name:Fish

Where , denotes that Tags is a collection and Name is the property on the items in the collection you want to query

The only difference is that you include the full path to properties you want to query

User.Address.FirstLine

20 Oct 2010
18:54 PM

josh

This is the sort of improvement that will keep devs like me interested in the .Net world.

Nice Job, Oren & Rob!

20 Oct 2010
20:55 PM

Ayende Rahien

Stevo,

Yes, there is the OEM licensing.

20 Oct 2010
20:57 PM

Ayende Rahien

Peter,

We didn't take away indexes, they are still there.

We are simply giving the option to query without an index and have that done in an efficient manner.

As for querying without an index, you can do that too, with no index created. Not really useful, to tell you the truth, but you can

20 Oct 2010
20:57 PM

Ayende Rahien

Jakub,

Yes, you can do that using Lucene as well

20 Oct 2010
21:01 PM

Peter Morris

That's what I suggested in the first place, why are you arguing with me? :-)

23 Oct 2010
01:03 AM

Steve Gentile

Great stuff Ayende - I really like seeing how Raven is maturing

23 Oct 2010
16:05 PM

scooletz

Finally! Will you provide some kind of strategies, which can be used to configure Raven to determine a lifespan of a created temporary index?

23 Oct 2010
17:48 PM

Ayende Rahien

scooletz ,

Yes, they are already there.

31 Oct 2010
12:07 PM

BrightSoul

Hello, this is awesome stuff, but can I leverage dynamic queries if I don't have server-side entities?

You see, I'm working with JObjects, storing and retrieving them with DatabaseCommands.Put and DatabaseCommands.Set.

session.Query<Type>() is not what I'm looking for since my asp.net application has no knowledge of "Type".

Then I found the method DatabaseCommands.Query but it's expecting an index name as one of its parameters. I want to avoid defining indexes upfront, if possible.

What I'd need is an overload of DatabaseCommands.Query that takes no parameters and returns an IRavenQueryable<JObject>. That would allow me to do Linq queries using a fluent interface.

Is this possible? Thanks for your answers.

31 Oct 2010
12:16 PM

Ayende Rahien

BrightSoul,

You can use Advanced.LuceneQuery to do so

31 Oct 2010
12:17 PM

BrightSoul

uhm, angled brackets are not htmlencoded... just to clarify on my previous comment, I meant IRavenQueryable(Of JObject) as the return type of the DatabaseCommands.Query overload; and session.Query(Of Type) as the method I can't use.

31 Oct 2010
13:29 PM

BrightSoul

Thanks, Ayende, for such a quick response!

I still don't get it though, hehe :D Sorry to bother you... I think I need a line of code to understand.

LuceneQuery is a generic method, so I still have to provide a type, which I can't do. Assuming I have to retrieve all the Orders from Italy, I would do the following:

session.Advanced.LuceneQuery(Of Order)().Where("Country:Italy")

But "Order" is a type that doesn't exist in my application. The domain model is actually in another application I know nothing about.

Then, I gave it a shot trying JObject as the type.

session.Advanced.LuceneQuery(Of JObject)().Where("country:Italy")

But as expected, this query didn't return any result since I couldn't instruct it that my documents are inside docs.Orders. The following is the content of the temp index that's being created:

Temp/JObjects/ByCountry

from doc in docs.JObjects

select new { Country = doc.Country }

Correct me if I'm wrong but, the AbstractIndexCreationTask is generating the fragment "docs.JObjects" looking at the Type name, pluralized.

What I need here is the ability to use a string (ie. "docs.Orders"), not a type, to instruct the index creator where my documents exactly are.

31 Oct 2010
13:31 PM

Ayende Rahien

BrightSoul,

Use LuceneQuery(of Object) for now, it should respect the JObject / dynamic, but it doesn't at the moment.

31 Oct 2010
13:32 PM

Ayende Rahien

And it would be better to have this discussion in the mailing list

Oren Eini

CEO of RavenDB