Relational searching sucks, don’t try to replicate it
This question on Stack Overflow is a fairly common one. Here is the data:
And the question was about how to get RavenDB to create an index that would have the following results:
|
|
|
Now, if you try really hard, you can probably try to get something like that, but that is the wrong way to go about this in RavenDB.
Instead, we can write the following index:
Note that this index is a simple multi map index, it isn’t a multi map/reduce index. There is no need.
This index can return one of three types.
- Car – just show the car to the user
- Person – now that we have a person, we have the id, and we can query for that:
- Unit – now that we have a unit, we have the id, and we can query for that:
This method means that we have to generate an additional query for some cases, but it has a lot of advantages. It is simple. It requires very little work from both client and server and it doesn’t suffer from the usual issues that you run into when you attempt to query over multiple disjointed data sets.
Now, the bad thing about this is that this won’t allow me to query for cross entity values, so it would be hard for me to query for the cars in Hadera owned by Ayende. But in most cases, that isn’t really a requirement. We just want to be able to search by either one of those, not all of them.
Comments
This blog post could use some relevant examples of your Session.Query usage in this context.
Chris, Huh? Those are the exact same queries that you would run.
This is a very simple solution that addresses a very simple data-access pattern. But as you pointed out, this won't let you query for cars in Hadera owned by Ayende - that doesn't sound like an unlikely query at all.
In most of the applications I build, the client expects lots of query freedom across many entities. I'm frequently asked to build multi-tab search-forms, allowing you to filter (and sort) by dozens of related entities, at the same time.
Maybe in your next post, you could show how to query by make, model, name and address, at the same time? That would be much more likely to happen with the clients I work with.
Data access-patterns within many parts of an application may be as simple as what you demonstrated here, but in my experience, they're never that simple throughout the application.
Ayende, could you please explain better what effect is happening when your map returns an array? Usually, the examples we see with indexes have a map that returns a single object. The [] brackets are throwing me off. Thanks.
Rasmus, Actually, I would say that having search forms with that many items is a pretty bad idea. See here for the details: http://ayende.com/blog/152833/orders-search-in-ravendb
And if you wanted to do a query on all the properties at the same time, I would use multi map reduce:
http://ayende.com/blog/89089/ravendb-multi-maps-reduce-indexes
Matt, Because we return an array,we basically tell RavenDB: "Ignore the field names, just index everything under a field named 'Query'" This way, we can get disjointed data set and query over it all.
Every attempt I see to avoid the use of relational databases and SQL just ends up resulting in far more work and effort than would be needed had a relational DB just been used in the first place.
This is a perfect example of how something simple ends up quite convoluted in the NoSQL world. And like your last paragraph states, there are many gotchas that make certain queries difficult, if not impossible.
@Ronaldo - A poor model is going to be difficult to query regardless of your storage medium. While I think there's something to be said for the flexibility a relational database provides when working with models that aren't a natural fit for their problem space, there are many other costs/benefits that need to be taken into consideration. Saying "every attempt" at an alternative storage medium results in "far more work and effort" is using way too broad a brush.
Ronaldo, Please go ahead and try to write this sort of query using a relational DB and your OR/M of choice Then try to make it run fast enough to actually be viable in production.
As I explicitly pointed out in the post, this is a good idea if what you are trying to do is to search for cars based on some piece of data. What is actually much more common for us to do is instead of giving the user dozens of search fields, we give him one, and then allow him to narrow the search easily using Google like search engine capabilities and facets. You'll be surprised how user friendly and easy to use those are, and they are efficient ways to search large amount of data.
I looked at your other post - sure, in this case, the "google metaphor" works well. What you get is a very convenient search - you don't have to think about where to put your search-term. I do usually consider more than one design-approach, and suggest alternatives to clients - I too am a big fan of simple things that are easy to use.
But in some cases, you can't (or shouldn't) simplify things - I can think of two recent projects that had extremely complex search forms, and neither one lends itself to this kind of simplification. One is actually a data-vendor, who sells subscription access to a large dataset via search interfaces that must be extremely flexible and provide very accurate results - getting first and last names or e-mail addresses confused in the results, for example, is totally unacceptable.
Don't get me wrong, but the fact that you can design around problems, doesn't tell me anything about RavenDB. Some problems are not simple, and can't be solved by simplifying... I'm interested in seeing RavenDB stand up to those.
Oren, You mentioned for more complex queries, one should go the multi map with reduce route. The original question on stack overflow is about errors encountered while implementing a multi map reduce. Last I checked no one posted a solution to resolving those errors. Can you please explain what was wrong in that multi map reduce code, why does it throw errors?
+1 to Rasmus's request!
Rasmus, You seem to be confusing two problems. RavenDB can handle multi entity search quite nicely, including scenarios like the one you are describing. But I think that the major problem is that you seem to want to go that way by default. In most cases, this is a remnant of the way things were because that is the only way that you could handle that.
If you care for distinguishing between first / last name - make them distinct. If you want to have a free form search, then don't.
I am presenting options here, don't mistake that to say that those are the only options.
Ayende, I'm not Ronaldo, but in my ORM of choice (BLToolkit) it would look like this: var views = db.Cars.Select(c => new { Car = c, Unit = c.Unit, Person = c.Person });
That should work pretty fast, as you imagine, and also much easier to implement. Not undermining RavenDB though, I've been using it for some time and I love it.
Mihhail, Um, nope. In a relational database, that wouldn't actually work. Translate that to SQL and you can see why it would be expensive. You have to do three or five joins, and then do searches on all those values? And then take into account cartesian product.
Try writing the actual SQL and consider the implicatons
What you showed here can be done in SQL with a series of or-clauses, and I have done so in some cases. I would like you to assume that I have done the design (properly) and focus on the technical solution - I know there is a design-philosophy to which RavenDB is geared, but I am more interested in seeing how you tackle problems with the tool, rather than with design.
+1 +1 to Rasmus's requests. I'm finding similar issues with my migration to RavenDB even with a relatively simple problem domain. Rather than need to change things to suit RavenDB (which is fine to a point - you're probably already designing things around relational database limitations whether it's a conscious thing or not) I'm really interested in seeing RavenDB at work against more real-world problems where you aren't able to change things around the data storage limitations.
"Actually, I would say that having search forms with that many items is a pretty bad idea" Perhaps from the perspective of a developer, but sometimes the problem demands it (even if part of that problem is what the user wants, which is a valid problem).
Rasmus, Try writing the SQL for this query, then try to make it run fast, then try it on a real world data set. I have built similar stuff several times, it is not trivial at all.
In particular, it is NOT just a series of or-clauses.
And given a problem, I usually give a solution, but I would also consider the problem first.
Ferret, Please present your own problem, it is much more interesting than trying to solve an abstract one.
I was referring to the "google" style search you mentioned in your other post, not the one on this page. I don't know what kind of performance-challenges you've faced, but a series of or-clauses has worked for me many times, performed fine, and was trivial to implement. I haven't done any applications with FaceBook, Twitter or Google-level traffic - I would use a very different approach if I was faced with that sort of challenge.
I can appreciate your preference towards solving problems with design first, if possible - that's my personal preference. But I'm afraid some people read your posts and are left with the impression that the hard problems have to be solved with design, in order to make the most of RavenDB - I believe there are some of us who need to see technical solutions to an identified problem, much more so than trying to simplify the problem.
You're the RavenDB guy! I know you're also an architect, but I wish some of your articles would emphasize technical aspects of using RavenDB, and less on the design/philosophy aspects.
In some ways, these articles demonstrate your architectural skills more than they demonstrate the software. And since I'm here to learn about the software, not to hire an architect, I'd prefer to see you attack some of the difficult problems head-on, rather than teaching us how to avoid them by applying design. Even after you've done careful design, I'm sure you're sometimes left with complex problems that can't be reduced any further?
Rasmus, The JOB of an architect is to reduce all problems to annoying little O(n) vs O(1) issues for the devs. It is to simplify things to the utmost possible level.
When asked about a scenario with ravendb, I often find myself having to deal with a lot of preconceptions with regards to the general approach. It is often far simpler to solve the actual problem, rather than give the proposed solution.
I get what you're saying - if I was strictly an architect, that would be a wonderful and fun job. Unfortunately I am also a programmer, and often the client just wants what the client wants, despite my best attempts to sell them a cheaper, simpler, better, faster, more elegant solution. Such is life with clients ;-)
Oren what I stated earlier on this post was you don't show your readers the usage for Session.Query CarSearchIndex
Comment preview