Ayende @ Rahien


Searching ain’t simple: solution

In my last post, I described the following problem:

[Image: Balsamiq UI mockup of the design search screen]

And stated that the following trivial solution is the wrong approach to the problem:

select d.* from Designs d 
 join ArchitectsDesigns da on d.Id = da.DesignId
 join Architects a on da.ArchitectId = a.Id
where a.Name = @name

The most obvious reason is that we are thinking too linearly. I intentionally presented the problem in terms of the UI, not as a document specifying what should be done.

The reason is that, in many cases, a spec document makes assumptions that the developer should not. When working on a system, I like to have drafts of the screens with rough ideas about what is supposed to happen, and not much more.

In this case, let us consider the problem from the point of view of the user. Searching by the architect's name makes sense to the user; that is usually how they think about it.

But does it make sense from the point of view of the system? We want to provide a good user experience, which means that we aren’t just going to give the user a text box to plug in some values. For one thing, they would have to type the architect's full name exactly as it is stored in our system. That is going to be a tough call in many cases. Ask any architect what Gaudi's first name is, and see what sort of response you’ll get.

Another problem is how to deal with misspellings, partial names, and other information. What if we actually know the architect id and are used to typing that? I would much rather type 1831 than Mies Van Der Rohe, and most users who work with the application day in and day out would agree.

From the system's perspective, we want to divide the problem into two separate issues: finding the architect and finding the appropriate designs. From a user experience perspective, that means that the text box is going to be an Ajax suggest box, and the results will be loaded based on a valid architect id.
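As a rough illustration of the first half of that split, here is a minimal in-memory sketch in Python (not RavenDB code) of what a suggest box could call. It matches the typed term either against the architect id or against the start of any word in the name, and returns only ids and names. The `Architect` record, the sample data, and `suggest_architects` are all hypothetical; only the id 1831 comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Architect:
    id: int
    name: str


# Hypothetical sample data: 1831 is the example id from the post,
# the other ids are made up for illustration.
ARCHITECTS = [
    Architect(1831, "Ludwig Mies van der Rohe"),
    Architect(1832, "Antoni Gaudi"),
    Architect(1833, "Frank Lloyd Wright"),
]


def suggest_architects(term: str, architects=ARCHITECTS, limit: int = 10) -> List[Tuple[int, str]]:
    """Return (id, name) pairs whose id equals the term, or whose name
    contains a word that starts with the term (case-insensitive)."""
    term = term.strip().lower()
    if not term:
        return []
    matches = []
    for a in architects:
        by_id = term.isdecimal() and int(term) == a.id
        by_name = any(word.startswith(term) for word in a.name.lower().split())
        if by_id or by_name:
            # Send only what the suggest box needs, not the whole record.
            matches.append((a.id, a.name))
    return matches[:limit]
```

For example, `suggest_architects("1831")` and `suggest_architects("mies")` both return `[(1831, 'Ludwig Mies van der Rohe')]`. Once the user picks an entry, the client holds a valid id, and the designs query never has to deal with names.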

Using RavenDB and ASP.NET MVC, we would have the following solution. First, we need to define the search index:

[Image: code screenshot of the ArchitectsSearch index definition]

This gives us the ability to search across both name and id easily, and it allows us to do full text searches as well. The next step is the actual querying for architects by name:

[Image: code screenshot of the ArchitectsByName controller action]

Looks complex, doesn’t it? Well, there is certainly a lot of code there, at least.

First, we look for a matching result in the index. If we find anything, we send just the name and the id of the matching documents to the user. That part is perfectly simple.

The interesting bits happen when we can’t find anything at all. In that case, we ask RavenDB to suggest what the user might have been looking for. It does that by running a string distance algorithm over the data already in the database and giving us a list of suggestions about what the user might have meant.
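The post does not say which string distance RavenDB uses (a comment below points to Lucene's SpellChecker). As a stand-in, the following Python sketch uses plain Levenshtein edit distance to suggest known terms within a hypothetical `max_distance` of what the user typed. `levenshtein` and `suggest_terms` are illustrative names, not RavenDB's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest_terms(term: str, known_terms, max_distance: int = 2):
    """Known terms within max_distance edits of term, closest first,
    ties broken alphabetically."""
    term = term.lower()
    scored = sorted((levenshtein(term, known.lower()), known) for known in set(known_terms))
    return [known for distance, known in scored if distance <= max_distance]
```

For instance, `suggest_terms("gaudy", ["Gaudi", "Wright"])` returns `['Gaudi']`. This sketch scans every known term. A real implementation would first use an index to narrow down the candidates.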

We take it one step further. If there is just one suggestion, we assume that this is what the user meant and return the results for that value. If there is more than one, we send an empty result set to the client along with a list of alternatives that it can offer to the user.
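The decision flow from the last three paragraphs might look roughly like this in Python. `search_architects` and its `lookup`/`suggest` callables are hypothetical stand-ins for the RavenDB query and its suggestion feature. The returned dict (results plus alternatives) follows the description above, not the post's actual view model.

```python
from typing import Callable, Dict, List, Tuple

NameAndId = Tuple[int, str]


def search_architects(
    term: str,
    lookup: Callable[[str], List[NameAndId]],
    suggest: Callable[[str], List[str]],
) -> Dict[str, list]:
    # 1. Direct hits: return them straight away.
    results = lookup(term)
    if results:
        return {"results": results, "alternatives": []}

    # 2. Nothing found: ask for terms the user might have meant.
    suggestions = suggest(term)

    # 3. Exactly one suggestion: assume that is what the user meant.
    if len(suggestions) == 1:
        return {"results": lookup(suggestions[0]), "alternatives": []}

    # 4. Zero or several suggestions: no results, but hand the
    #    alternatives to the client so it can offer them to the user.
    return {"results": [], "alternatives": suggestions}
```

Returning the alternatives alongside an empty result, rather than treating the miss as an error, leaves the client free to decide how to present them.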

From here, the actual task of getting the designs for this architect becomes as simple as:

[Image: code screenshot of the query that loads the designs for an architect id]
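Once the client holds a valid architect id, fetching designs no longer involves matching names at all. Here is a minimal in-memory Python sketch using hypothetical `Design` records and (architect id, design id) link pairs that mirror the `ArchitectsDesigns` table from the SQL query at the top of the post. In a document database the link might instead live on the design document itself. The pairs here are only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Design:
    id: int
    title: str


def designs_for_architect(
    architect_id: int,
    designs: Iterable[Design],
    architects_designs: Iterable[Tuple[int, int]],  # (architect_id, design_id)
) -> List[Design]:
    """Designs linked to the given architect id, in the order of `designs`."""
    design_ids = {d_id for a_id, d_id in architects_designs if a_id == architect_id}
    return [d for d in designs if d.id in design_ids]
```

Because the input is an id chosen from the suggest box, there is no ambiguity left to resolve at this point: the query is a plain filter.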

And it turns out that when you think about it right, searching is simple.

Comments

peter
03/23/2012 01:59 PM

When you say "It does that by running a string distance algorithm" do you mean you already have this capability implemented in RavenDB? Is it something like this one (what we ended up using in-house):

https://github.com/lorenzo-stoakes/spell-correct

His implementation discussed here: http://www.codegrunt.co.uk/2010/11/02/C-Sharp-Norvig-Spelling-Corrector.html

Ayende Rahien
03/23/2012 02:00 PM

Peter, That is done inside RavenDB, yes.

Jason Meckley
03/23/2012 02:07 PM

"And it turns out that when you think about it right, searching is simple." A large portion of this is because RavenDB does all the heavy lifting for us. All we need to do is map the results from RavenDB to our view model. Zero Friction FTW!

Chris
03/23/2012 02:44 PM

I think there is a typo in the second code sample. I'm not sure what "Results = new NameAndId[0]" is.

Please forgive me if I am being dense, but why is it necessary to specify both the query type and the result type in Session.Query<ArchitectsSearch.Result, ArchitectsSearch>? It seems that ArchitectsSearch.Result could be inferred since ArchitectsSearch is declared to be AbstractIndexCreationTask<Architect, ArchitectsSearch.Result>. Is it possible for Session.Query to be called as Session.Query<SomethingThatIsNot_ArchitectsSearch.Result, ArchitectsSearch>?

Ayende Rahien
03/23/2012 02:58 PM

Chris, Results = new NameAndId[0] -- Create a new empty array of NameAndId

The reason that we have two generic params is that it is NOT possible for us to infer the first parameter from the second, and yes, there are reasons why you would have different values there.

peter
03/23/2012 03:02 PM

I see, you are using Lucene's SpellChecker. Can I assume it is using statistics from the indexed RavenDB documents to determine the order of suggested terms?

Ayende Rahien
03/23/2012 03:03 PM

Peter, Yes, that is part of that.

Karep
03/23/2012 07:44 PM

I'd refactor that code. I don't like the return at the end of ArchitectsByName. If there are results, return immediately. Guard condition.

Phillip
03/23/2012 11:57 PM

I don't like the name of the action when it can be searched by id as well as name... nit-picking tho.

I've written something similar for my project, this stuff took way too long to do with a relational database, it really is frictionless.

Paulo
03/24/2012 01:33 PM

Sorry for the side question but... these screen caps aren't from Visual Studio IDE, are they?

Bill
03/24/2012 05:22 PM

@Paulo, that looks like Sublime Text 2 - http://www.sublimetext.com/2

Paulo
03/24/2012 06:08 PM

@Bill, thank you! I'll take a look at it!

Martin Doms
03/24/2012 09:05 PM

Is that q.Suggest() method provided by RavenDB, or is that an extension method that the application developer would implement? Is it the "string distance algorithm" you mentioned?

Vitaliy
03/24/2012 09:51 PM

I may be missing the whole idea of the post, but what should be the solution if the label was "diagrams for architects"?

Ofer
03/26/2012 07:00 AM

Two typo corrections:

  1. Change "descried" to "described"
  2. Change "archiect" to "architect" (in the balsamiq GIF)