Full Text Search takes you only so far

Oct 12 2011

A few weeks ago I had a really interesting engagement with a customer. They were using RavenDB to do some interesting searches, and eventually they hit a wall with what they were trying to do.

For simplicity's sake, we will say that the customer wanted to allow users to search for books. The scenario is something like this (totally different domain, obviously; the client isn’t Amazon, they are just a good place to get a screenshot from):

Sure, the suggest feature is really nice, but what the customer really cared about is being able to search on the whole set of options.

In their field, people usually write the book name using one of the following formats:

Author First Name, Author Last Name – Book Title, Year
Year, Book Title, Author Last Name
Author Last Name, Book Title, Year

And a bunch of other options.

Also, they want to offer a free text search option.

Also, it had to be fast. They already had an existing system that worked, but had unacceptably high latency for most queries and had… issues under load. The first approach they tried was just moving to RavenDB, enabling full text search and seeing what it got them. It got them something, but not nearly enough.

When I started looking at the problem, I had several recommendations, and none of them had much to do with full text search. They were mostly about being smarter in understanding the user.

To start with, given that most of the information was in one of a small number of formats, there was really no reason not to build a parser for that information. When you actually know which fields you are looking for, you can give the user much better results than if you are just doing a brute force full text search.

So, instead of issuing a query like this:

RavenSession.Query<Books_FullText.Result, Books_FullText>()
   .Search(x=> x.Result, searchTermFromUser)
   .ToList();

Which can work, but can’t really take advantage of your knowledge of the domain and the users, you will do something like this:

var parseResult = new BooksQueryParser(Context).Parse(searchTermFromUser);
if (parseResult.Success)
{
  var q = RavenSession.Query<Books_FullText.Result, Books_FullText>();
  // Search() returns a new query, so this assumes ApplyOn returns the narrowed query
  q = parseResult.ApplyOn(q);
  // would do things like
  // q = q.Search(x => x.Title, "the lost fleet");
  // q = q.Search(x => x.Author, "jack campbell");
  return q.ToList();
}
else // fall back, do a full text search, because there isn't anything else to do
{
  return RavenSession.Query<Books_FullText.Result, Books_FullText>()
   .Search(x => x.Result, searchTermFromUser)
   .ToList();
}

RavenDB can’t do that for you. It can provide awesome full text support, but if you guide it in this manner, it would be tremendously more helpful.
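To make the parsing step concrete, here is a minimal sketch of what such a parser could look like for the three formats listed earlier. Everything in it is an assumption for illustration: the `BookQuery` and `BookQueryParserSketch` names, the regular expressions, and the choice to treat a four-digit number as the year are not taken from the customer's system, and the `ApplyOn` step that would translate the result into `Search` calls is left out because it depends on the index definition.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical parse result; the post only shows that it exposes Success.
public sealed class BookQuery
{
    public bool Success { get; set; }
    public string Author { get; set; }
    public string Title { get; set; }
    public int? Year { get; set; }
}

// Hypothetical parser for the three formats listed in the post.
public static class BookQueryParserSketch
{
    // A hyphen, en dash or em dash surrounded by whitespace.
    private const string Dash = @"\s+[-\u2013\u2014]\s+";
    private const string YearPattern = @"(?<year>[0-9]{4})";

    // Tried in order; the first pattern that matches wins.
    private static readonly Regex[] Formats =
    {
        // Author First Name, Author Last Name – Book Title, Year
        new Regex(@"^(?<first>[^,]+?)\s*,\s*(?<last>[^,]+?)" + Dash +
                  @"(?<title>.+?)\s*,\s*" + YearPattern + "$"),
        // Year, Book Title, Author Last Name
        new Regex("^" + YearPattern + @"\s*,\s*(?<title>.+)\s*,\s*(?<last>[^,]+)$"),
        // Author Last Name, Book Title, Year
        new Regex(@"^(?<last>[^,]+?)\s*,\s*(?<title>.+?)\s*,\s*" + YearPattern + "$"),
    };

    public static BookQuery Parse(string searchTerm)
    {
        var input = (searchTerm ?? string.Empty).Trim();
        foreach (var format in Formats)
        {
            var match = format.Match(input);
            if (!match.Success)
                continue;

            // Groups that a pattern does not define come back as empty strings.
            var first = match.Groups["first"].Value.Trim();
            var last = match.Groups["last"].Value.Trim();
            return new BookQuery
            {
                Success = true,
                Author = (first + " " + last).Trim(),
                Title = match.Groups["title"].Value.Trim(),
                Year = int.Parse(match.Groups["year"].Value),
            };
        }

        // Nothing matched: the caller falls back to plain full text search.
        return new BookQuery { Success = false };
    }
}
```

Calling `BookQueryParserSketch.Parse("2006, The Lost Fleet, Campbell")` yields a title, an author and a year that can each be searched on its own field, while free text such as `"the lost fleet"` matches none of the patterns and takes the full text fallback.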

The next stage is to actually learn from your users. Whenever your users make a search, you are going to record it. In fact, you are going to track the entire interaction. It will end up looking something like this:

{ // searchInteractions/4833424
  "User": "users/93432",
  "Terms": [
    "the last feetl",
    "the lost fast",
    "the lost fleet"
  ],
  "FollowedTo": "books/40273498723"
}

In this case, the sample data shows typos, but in the customer scenario, those would be the user trying different ways to format the actual valid search, to find something that the system recognizes.
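One way to build up such a document is to keep an open interaction per user, append every search term to it, and close it when the user follows a result. The sketch below is only an illustration under that assumption: `SearchInteractionTracker`, `RecordSearch` and `RecordFollow` are hypothetical names, the state is kept in memory, and only the `SearchInteraction` shape mirrors the sample document.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the shape of the sample searchInteractions document.
public sealed class SearchInteraction
{
    public string User { get; set; }
    public List<string> Terms { get; } = new List<string>();
    public string FollowedTo { get; set; }
}

// Hypothetical in-memory tracker: one open interaction per user.
public sealed class SearchInteractionTracker
{
    private readonly Dictionary<string, SearchInteraction> _open =
        new Dictionary<string, SearchInteraction>();

    // Called for every search the user issues.
    public void RecordSearch(string userId, string term)
    {
        if (!_open.TryGetValue(userId, out var interaction))
        {
            interaction = new SearchInteraction { User = userId };
            _open[userId] = interaction;
        }
        interaction.Terms.Add(term);
    }

    // Called when the user opens a result. Returns the completed interaction,
    // ready to be stored, or null if the user had no search in progress.
    public SearchInteraction RecordFollow(string userId, string documentId)
    {
        if (!_open.TryGetValue(userId, out var interaction))
            return null;

        _open.Remove(userId);
        interaction.FollowedTo = documentId;
        return interaction;
    }
}
```

The object returned from `RecordFollow` is the thing you would then store as a document. A real system would also need to decide when an abandoned interaction expires, which this sketch ignores.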

What is important is that if you can’t find a search result with high enough ranking (for example, if you failed to parse the search terms), you can now do several fairly intelligent things.

You can search for similar searches made by other users. There is a high likelihood that the same search term was tried before, and that the user then corrected their typos or formatting errors and found what they wanted. The next user who runs into this can benefit from that experience. You can also suggest “did you mean ?” to the user when you can’t find a good result for the search query.

Note that the interaction always ends when the user has selected an appropriate result. This is the user’s way of telling you, “this is what I meant”, and you should learn from it.
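As a rough sketch of how recorded interactions could feed a "did you mean" suggestion: find past interactions in which the failing term was tried and later abandoned, then offer the term those users ended on most often. The `PastSearch` and `DidYouMeanSketch` names are hypothetical, the history is a plain in-memory list rather than a RavenDB query, and exact case-insensitive matching stands in for whatever similarity measure a real system would use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of a completed search interaction.
public sealed class PastSearch
{
    public PastSearch(string followedTo, params string[] terms)
    {
        FollowedTo = followedTo;
        Terms = terms;
    }

    public string[] Terms { get; }
    public string FollowedTo { get; }
}

public static class DidYouMeanSketch
{
    // Returns the final term that users most often ended on after trying
    // failedTerm, or null when no past interaction contains it.
    public static string Suggest(string failedTerm, IEnumerable<PastSearch> history)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        return history
            // Only interactions that ended with the user following a result.
            .Where(h => h.FollowedTo != null && h.Terms.Length > 1)
            // The failed term must be one of the earlier, abandoned attempts.
            .Where(h => h.Terms.Take(h.Terms.Length - 1).Contains(failedTerm, comparer))
            // The last term is the one that led to the followed result.
            .GroupBy(h => h.Terms[h.Terms.Length - 1], comparer)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault();
    }
}
```

Grouping by `FollowedTo` instead of by the final term would give the most likely target document directly, which may be the better choice when the result ranking is too low to trust.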

In all, I don’t think that either suggestion is truly groundbreaking, but together they can result in a huge leap in the usability of the search feature. And for that particular client, the search feature is Major.

Tags:

Design
Raven

Comments

12 Oct 2011
10:55 AM

Scooletz

"To start with, given that most of the information was in one of a small number of format, there was really no reason not to build a parser for that information." LinkedIn does pretty amazing stuff with analyzing queries before querying their indexes. Ok, they're big, but making the query semantic makes sense to me.

12 Oct 2011
12:14 PM

Frank Quednau

What happened to the Event Aggregation post? My newsreader already cached it, but here it is gone...

12 Oct 2011
12:42 PM

Ayende Rahien

Frank, We made a huge amount of changes in the architecture, it wasn't relevant any longer, so I removed it. I'll post more about the actual system architecture later.

12 Oct 2011
14:17 PM

configurator

Too easy to game this system, I think, making for example searching for "Worst software developer ever" Did you mean, "Ayende Rahien?"

Oren Eini

CEO of RavenDB