Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Searching ain’t simple: solution


In my last post, I described the following problem:

[Image: UI mockup of the search screen]

And stated that the following trivial solution is the wrong approach to the problem:

select d.* from Designs d 
 join ArchitectsDesigns da on d.Id = da.DesignId
 join Architects a on da.ArchitectId = a.Id
where a.Name = @name

The most obvious reason is actually that we are thinking too linearly. I intentionally showed the problem statement in terms of UI, not in terms of a document specifying what should be done.

The reason for that is that in many cases, a spec document is making assumptions that the developer should not. When working on a system, I like to have drafts of the screens with rough ideas about what is supposed to happen, and not much more.

In this case, let us consider the problem from the point of view of the user. Searching by the architect's name makes sense to the user; that is usually how they think about it.

But does it make sense from the point of view of the system? We want to provide a good user experience, which means that we aren’t just going to provide the user with a text box to plug in some values. For one thing, they would have to put in the architect's full name exactly as it is stored in our system. That is going to be a tough call in many cases. Ask any architect what the first name of Gaudi is, and see what sort of response you’ll get.

Another problem is how to deal with misspellings, partial names, and other information. What if we actually have the architect id, and are used to typing that? I would much rather type 1831 than Mies Van Der Rohe, and most users who work with the application day in and day out would agree.

From the system perspective, we want to divide the problem into two separate issues: finding the architect and finding the appropriate designs. From a user experience perspective, that means that the text box is going to be an Ajax suggest box, and the results would be loaded based on a valid id.

Using RavenDB and ASP.NET MVC, we would have the following solution. First, we need to define the search index:

[Image: code defining the ArchitectsSearch index]
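The idea behind such an index can be illustrated outside RavenDB. The following is only a rough sketch in plain Python, not the RavenDB index definition: it maps every lowercased word of an architect's name, plus the id as text, to the matching architect ids. The sample data, `build_search_index`, and `search` are hypothetical names made up for this illustration.

```python
from collections import defaultdict

# Hypothetical sample data; the ids are made up for illustration.
ARCHITECTS = {
    1831: "Mies Van Der Rohe",
    1852: "Antoni Gaudi",
    1867: "Frank Lloyd Wright",
}

def build_search_index(architects):
    """Map each lowercased name word, and the id as text, to architect ids."""
    index = defaultdict(set)
    for architect_id, name in architects.items():
        index[str(architect_id)].add(architect_id)
        for word in name.lower().split():
            index[word].add(architect_id)
    return index

def search(index, query):
    """Return the ids of architects that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)

index = build_search_index(ARCHITECTS)
print(search(index, "1831"))   # lookup by id
print(search(index, "gaudi"))  # lookup by a single word of the name
```

Because the id and the name words live in the same lookup structure, the caller does not need to know which of the two the user typed.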

This gives us the ability to search across both name and id easily, and it allows us to do full text searches as well. The next step is the actual querying for an architect by name:

[Image: code for the ArchitectsByName controller action]

Looks complex, doesn’t it? Well, there is certainly a lot of code there, at least.

First, we look for a matching result in the index. If we find anything, we send just the name and the id of the matching documents to the user. That part is perfectly simple.

The interesting bits happen when we can’t find anything at all. In that case, we ask RavenDB to find us results that might be the things that the user is looking for. It does that by running a string distance algorithm over the data already in the database and providing us with a list of suggestions about what the user might have meant.
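To make "string distance" concrete, here is a minimal sketch of one common choice, the Levenshtein edit distance, used to rank known terms against a misspelled one. This is an illustration only; it is an assumption that this particular algorithm resembles what RavenDB runs internally, and `edit_distance`, `suggest`, and the term list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

def suggest(term, known_terms, max_distance=2):
    """Return the known terms within max_distance edits, closest first."""
    scored = [(edit_distance(term, known), known) for known in known_terms]
    return [known for distance, known in sorted(scored) if distance <= max_distance]

# Made-up terms standing in for what is already indexed.
KNOWN_TERMS = ["gaudi", "rohe", "wright", "mies"]
print(suggest("gaudy", KNOWN_TERMS))  # prints ['gaudi']
```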

We take it one step further. If there is just one suggestion, we assume that this is what the user meant, and just return the results for that value. If there is more than one, we send an empty result set to the client along with a list of alternatives that they can suggest to the user.
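The decision flow described above (direct match, then suggestions, then automatic resolution of a single suggestion) can be sketched as follows. This is plain Python over an in-memory dictionary, not the RavenDB or ASP.NET MVC code; `difflib.get_close_matches` from the standard library stands in for RavenDB's suggestion feature, and the data, the `0.7` cutoff, and the function names are assumptions made for the example.

```python
import difflib

# Hypothetical sample data; the ids are made up for illustration.
ARCHITECTS = {
    1831: "Mies Van Der Rohe",
    1852: "Antoni Gaudi",
    1853: "Antonio Sant'Elia",
}

def find_matches(query):
    """Architects whose id equals the query or whose name contains it."""
    q = query.strip().lower()
    if not q:
        return []
    return [
        {"id": architect_id, "name": name}
        for architect_id, name in ARCHITECTS.items()
        if q == str(architect_id) or q in name.lower()
    ]

def suggest_terms(query):
    """Stand-in for the database suggestions: close matches among name words."""
    words = sorted({w for name in ARCHITECTS.values() for w in name.lower().split()})
    return difflib.get_close_matches(query.strip().lower(), words, n=5, cutoff=0.7)

def architects_by_name(query):
    results = find_matches(query)
    if results:
        # Guard clause: a direct hit needs no suggestions.
        return {"results": results, "suggestions": []}
    suggestions = suggest_terms(query)
    if len(suggestions) == 1:
        # Exactly one candidate: assume it is what the user meant.
        return {"results": find_matches(suggestions[0]), "suggestions": []}
    # Zero or several candidates: let the client offer the alternatives.
    return {"results": [], "suggestions": suggestions}

print(architects_by_name("gaudy"))
```

Returning early on a direct hit keeps the suggestion path separate from the common case, and the client only has to handle two shapes: a list of results or a list of alternatives.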

From here, the actual task of getting the designs for this architect becomes as simple as:

[Image: code that loads the designs for the selected architect]
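Once the architect has been resolved to an id, the second half of the problem is a plain lookup. The sketch below shows that step in Python under the assumption that each design stores the ids of its architects; the `Design` class, its fields, and the sample ids are hypothetical and are not taken from the original code.

```python
from dataclasses import dataclass, field

@dataclass
class Design:
    id: int
    title: str
    architect_ids: list = field(default_factory=list)

# Hypothetical sample data; the ids are made up for illustration.
DESIGNS = [
    Design(1, "Barcelona Pavilion", [1831]),
    Design(2, "Seagram Building", [1831, 1900]),
    Design(3, "Casa Mila", [1852]),
]

def designs_by_architect(designs, architect_id):
    """Return every design that lists the given architect id."""
    return [d for d in designs if architect_id in d.architect_ids]

for design in designs_by_architect(DESIGNS, 1831):
    print(design.title)
```

No name matching happens at this stage; by the time the designs are requested, the input is already a valid id.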

And it turns out that when you think about it right, searching is simple.


Comments

peter

When you say "It does that by running a string distance algorithm", do you mean you already have this capability implemented in RavenDB? Is it something like this one (what we ended up using in-house):

https://github.com/lorenzo-stoakes/spell-correct

His implementation discussed here: http://www.codegrunt.co.uk/2010/11/02/C-Sharp-Norvig-Spelling-Corrector.html

Ayende Rahien

Peter, That is done inside RavenDB, yes.

Jason Meckley

"And it turns out that when you think about it right, searching is simple." A large portion of this is because RavenDB does all the heavy lifting for us. All we need to do is map the results from RavenDB to our view model. Zero Friction FTW!

Chris

I think there is a typo in the second code sample. I'm not sure what "Results = new NameAndId[0]" is.

Please forgive me if I am being dense, but why is it necessary to specify both the query type and the result type in Session.Query<ArchitectsSearch.Result, ArchitectsSearch>? It seems that ArchitectsSearch.Result could be inferred since ArchitectsSearch is declared to be AbstractIndexCreationTask<Architect, ArchitectsSearch.Result>. Is it possible for Session.Query to be called as Session.Query<SomethingThatIsNot_ArchitectsSearch.Result, ArchitectsSearch>?

Ayende Rahien

Chris, Results = new NameAndId[0] -- Create a new empty array of NameAndId

The reason that we have two generic params is that it is NOT possible for us to infer the first parameter from the second, and yes, there are reasons why you would have different values there.

peter

I see, you are using the Lucene SpellChecker. Can I assume it is using statistics from the indexed RavenDB documents to determine the order of suggested terms?

Ayende Rahien

Peter, Yes, that is part of that.

Karep

I'd refactor that code. I don't like the return at the end of ArchitectsByName. If there are results, return immediately. Guard condition.

Phillip

I don't like the name of the action when it can be searched by id as well as name... nit-picking tho.

I've written something similar for my project, this stuff took way too long to do with a relational database, it really is frictionless.

Paulo

Sorry for the side question but... these screen caps aren't from Visual Studio IDE, are they?

Bill

@Paulo, that looks like Sublime Text 2 - http://www.sublimetext.com/2

Paulo

@Bill, thank you! I'll take a look at it!

Martin Doms

Is that q.Suggest() method provided by RavenDB, or is that an extension method that the application developer would implement? Is it the "string distance algorithm" you mentioned?

Vitaliy

I may be missing the whole idea of the post, but what should be the solution if the label was "diagrams for architects"?

Ofer

Two typo corrections:

  1. Change "descried" to "described"
  2. Change "archiect" to "architect" (in the balsamiq GIF)
