Raven’s dynamic queries
One of the pieces of feedback that we got from people frequently enough to be annoying is that the requirement to define indexes upfront before you can query is annoying. For a while, I thought that yes, it is annoying, but so is the need to sleep. And that there isn’t anything much you can do about it.
Then Rob asked: why do we require users to define indexes when we can gather enough information to do this ourselves? And since I couldn’t think of a good reason why, he went ahead and implemented this.
var activeUsers = from user in session.Query<User>() where user.IsActive == true select user;
You don’t have to define anything, it just works.
There are a few things to note, though.
First, unindexed queries are notorious for working on small data sets and killing systems in production. That was one of the main reasons that I didn’t want to run
Moreover, since RavenDB’s philosophy is that it should be pretty hard to shoot yourself in the foot, we figured out how to make this efficient.
- RavenDB will look at the query, see that there is no matching index, and create one for you.
- Currently, it will create one index per query type. In the future (as in, by the time you read this, most probably) we will have a query optimizer that can deal with selecting the appropriate index.
- Those indexes are going to be temporary indexes, maintained for a short amount of time.
- However, if RavenDB notices that you are making a large number of queries to a particular index, it will materialize it and turn it into a permanent index.
In other words, based on your actual system behavior, RavenDB will optimize itself for you :-) !
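To make that lifecycle concrete, here is a minimal sketch of the bookkeeping involved. It is a toy model and not RavenDB's actual code: the DynamicIndexTracker class, the promotion threshold, the idle lifetime, and the index name "Temp/Users/ByIsActive" are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model, NOT RavenDB code: tracks how often each
// automatically created index is queried and decides its fate.
public enum IndexState { Temporary, Permanent, Removed }

public sealed class DynamicIndexTracker
{
    private sealed class Entry
    {
        public int QueryCount;
        public DateTime LastQueried;
        public IndexState State;
    }

    private readonly Dictionary<string, Entry> _indexes = new Dictionary<string, Entry>();
    private readonly int _promoteAfterQueries; // assumed threshold
    private readonly TimeSpan _idleLifetime;   // assumed lifetime

    public DynamicIndexTracker(int promoteAfterQueries, TimeSpan idleLifetime)
    {
        _promoteAfterQueries = promoteAfterQueries;
        _idleLifetime = idleLifetime;
    }

    // Called for every query that has no matching predefined index.
    public IndexState RecordQuery(string indexName, DateTime now)
    {
        Entry entry;
        if (!_indexes.TryGetValue(indexName, out entry) || entry.State == IndexState.Removed)
        {
            // No usable index yet: create a fresh temporary one.
            entry = new Entry { State = IndexState.Temporary };
            _indexes[indexName] = entry;
        }

        entry.QueryCount++;
        entry.LastQueried = now;

        // Heavily used temporary indexes become permanent.
        if (entry.State == IndexState.Temporary && entry.QueryCount >= _promoteAfterQueries)
            entry.State = IndexState.Permanent;

        return entry.State;
    }

    // Periodic cleanup: drop temporary indexes that have gone idle.
    public void Cleanup(DateTime now)
    {
        foreach (var entry in _indexes.Values)
        {
            if (entry.State == IndexState.Temporary && now - entry.LastQueried > _idleLifetime)
                entry.State = IndexState.Removed;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new DynamicIndexTracker(3, TimeSpan.FromMinutes(10));
        var start = new DateTime(2010, 1, 1, 12, 0, 0);

        Console.WriteLine(tracker.RecordQuery("Temp/Users/ByIsActive", start));               // Temporary
        Console.WriteLine(tracker.RecordQuery("Temp/Users/ByIsActive", start.AddMinutes(1))); // Temporary
        Console.WriteLine(tracker.RecordQuery("Temp/Users/ByIsActive", start.AddMinutes(2))); // Permanent
    }
}
```

The point of the sketch is that the decision is driven purely by observed query traffic: an index that is hit often enough is kept, and one that is left idle is cleaned up.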
This feature actually had us stop and think, because it fundamentally changes the way that you work with RavenDB. We had to ask ourselves why you would still want to create indexes manually.
As it turned out, there are still a number of reasons why you would want to do that, but they become far more rare:
- You want to apply complex filtering logic or do something to the index output. (For example, you may be interested in aggregating several fields into one searchable item.)
- You want to use RavenDB’s spatial support.
- Aggregations still require defining a map/reduce query.
- You want to use the Live Projections feature, which I’ll discuss in my next post.
Comments
This all sounds great, one quick question though:-
If a temporary index is created, does it not take time to index all the documents, so that you may not get the desired results straight away?
It will wait until it has filled up 'a page' of information and then return. The temporary index WILL stay there though, and subsequent queries will get fresher information.
Thus, if you really care about fresh data, you can do WaitForNonStaleResultsAsOfNow() on the client and wait for the index to fully catch up.
Because new systems tend to have little data, this shouldn't be a problem (and if it is, you can still always just use a pre-written index).
1: You'd want to be able to define indexes in advance if you already know it will be used heavily later. No point in letting the app decide later to permanently index it.
2: I thought it was bad practise to index booleans in a db? To be honest it doesn't make sense to me that it would be, but I do remember reading it in a book once.
Richard,
That is the fun with RavenDB's stale-by-default.
Yes, it may take a while, but you know that because the answer is marked as stale, so you can either wait for it to become non-stale or use the potentially stale results.
Peter,
1) Not really, no. We actually recommend using the dynamic indexes by default, and letting RavenDB do that for you. Defining indexes yourself is for more complex issues.
2) You are talking about sparse indexes, and that is an implementation detail for the DB.
Oren:
If this is a massive query that is run once per month for example as a payroll processing task then the app is really going to suffer isn't it?
Peter,
Actually, no.
It is BETTER to make this a temporary index, because after a while it gets cleared out and doesn't take up resources.
And indexing is always done in the background, anyway.
I'm not sure why you would think that. If this query runs once per month and returns data from a table with millions of rows that match a specific criteria then it is going to have to read all rows to create the indexes and then select the data, this would be slower than just selecting without the index there at all; especially as the index will then be dropped before the query is run again next month.
@Ayende - waiting for the "first page to be filled" answers my query. I understand the staleness versus fresh debate and have no problems with it
Not really, no.
Work is done in the background, so you aren't really paying all that much for it.
Especially since in Raven, readers don't block the writer.
And you CAN do an effectively unindexed query with Raven, if you want.
Here are the scenarios as I see them
A] Allowing a pre-defined index...
Inserts/updates/deletes on that table take a little longer, but the large infrequent query is very fast.
B] Not allowing a pre-defined index...
Inserts/updates/deletes on that table are a bit quicker, but the large infrequent query is much slower because it is running off unindexed data. An index is also created in the background which will never be used.
Why am I wrong?
Peter,
Because with Raven, those things happen in the background.
You aren't paying for indexes.
In fact, one of RavenDB users has > 500 users and millions of documents, without any ill effect.
Wow this is very cool feature Oren. I think it will help people get into RavenDB much faster.
Is there any plan on having some sort of royalty-free distribution license (commercial)? Currently the lack of this keeps us from using the software.
I think you are missing my point. If there is no index when the query runs then the user is already paying the price, when Raven then indexes the data in the background after the query has run the server pays the extra cost of generating an index that will not be used.
I just think it would be wise to also let people define indexes.
Peter - this functionality doesn't prevent people from defining indexes up front, the choice is entirely yours
Will temporary indexes be created for Lucene queries as well?
This is probably Raven's single most awesome feature. And that's saying something, because Raven has a boatload of awesome features.
Jakub - yes
You can even query collection properties like
Tags,Name:Fish
Where , denotes that Tags is a collection and Name is the property on the items in the collection you want to query
The only difference is that you include the full path to properties you want to query
User.Address.FirstLine
This is the sort of improvement that will keep devs like me interested in the .Net world.
Nice Job, Oren & Rob!
Stevo,
Yes, there is the OEM licensing.
Peter,
We didn't take away indexes, they are still there.
We are simply giving the option to query without an index and have that done in an efficient manner.
As for querying without an index, you can do that too, with no index created. Not really useful, to tell you the truth, but you can.
Jakub,
Yes, you can do that using Lucene as well
That's what I suggested in the first place, why are you arguing with me? :-)
Great stuff Ayende - I really like seeing how Raven is maturing
Finally! Will you provide some kind of strategies, which can be used to configure Raven to determine a lifespan of a created temporary index?
scooletz ,
Yes, they are already there.
Hello, this is awesome stuff, but can I leverage dynamic queries If I don't have server-side entities?
You see, I'm working with JObjects, storing and retrieving them with DatabaseCommands.Put and DatabaseCommands.Set.
session.Query<Type>() is not what I'm looking for since my asp.net application has no knowledge of "Type".
Then I found the method DatabaseCommands.Query but it's expecting an index name as one of its parameters. I want to avoid defining indexes upfront, if possible.
What I'd need is an overload of DatabaseCommands.Query that takes no parameters and returns an IRavenQueryable<JObject>. That would allow me to do Linq queries using a fluent interface.
Is this possible? Thanks for your answers.
BrightSoul,
You can use Advanced.LuceneQuery to do so
uhm, angled brackets are not htmlencoded... just to clarify on my previous comment, I meant IRavenQueryable(Of JObject) as the return type of the DatabaseCommands.Query overload; and session.Query(Of Type) as the method I can't use.
Thanks, Ayende, for such a quick response!
I still don't get it though, hehe :D Sorry to bother you... I think I need a line of code to understand.
LuceneQuery is generic method, so I still have to provide a type, which I can't do. Assuming I have to retrieve all the Orders from Italy, I would do the following:
session.Advanced.LuceneQuery(Of Order)().Where("Country:Italy")
But "Order" is a type that doesn't exist in my application. The domain model is actually in another application I know nothing about.
Then, I gave it a shot trying JObject as the type.
session.Advanced.LuceneQuery(Of JObject)().Where("country:Italy")
But as expected, this query didn't return any result since I couldn't instruct it that my documents are inside docs.Orders. The following is the content of the temp index that's being created:
Temp/JObjects/ByCountry
from doc in docs.JObjects
select new { Country = doc.Country }
Correct me if I'm wrong, but the AbstractIndexCreationTask is generating the fragment "docs.JObjects" by looking at the Type name, pluralized.
What I need here is the ability to use a string (ie. "docs.Orders"), not a type, to instruct the index creator where my documents exactly are.
BrightSoul,
Use LuceneQuery(of Object) for now, it should respect the JObject / dynamic, but it doesn't at the moment.
And it would be better to have this discussion in the mailing list