Ayende @ Rahien

NuGet Perf, Part VI AKA how to be the most popular dev around

So far, we have imported the NuGet data into RavenDB, seen how we can get it out for the packages page, and then looked into how we can use RavenDB features to help us with package search. I think we did a good job there, but we can probably do better still. In this post, I am going to stop showing off things in the Studio and focus on code, in particular on advanced searching options.

We will start from the simplest search possible. Or not, because we are doing full text search and quite a few other things besides, even in the baseline search. Anyway, here is the skeleton program:

while (true)
{
    Console.Write("Search: ");
    var search = Console.ReadLine();
    if(string.IsNullOrEmpty(search))
    {
        Console.Clear();
        continue;
    }
    using (var session = store.OpenSession())
    {
        var q = session.Query<PackageSearch>("Packages/Search")
            .Search(x => x.Query, search)
            .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
            .As<Package>()
            .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
            .Take(3);
        var packages = q.ToList();

        foreach (var package in packages)
        {
            Console.WriteLine("\t{0}", package.Id);
        }
    }
}

Now, we are going to run this and see what we get.

image

So far, so good. Now let us try to improve things. What happens when we search for “jquryt”? Nothing is found, and that is actually pretty sad, because to a human it is obvious what you are trying to search for.

If you have fat fingers and a tendency to spell words creatively, I am sure you can empathize with this feeling. Luckily for us, RavenDB is going to help. Let us see how:

image

What?!

How did it do that? Well, let us look at the changes in the code, shall we?

private static void PeformQuery(IDocumentSession session, string search, bool guessIfNoResultsFound = true)
{
    var packages = session.Query<PackageSearch>("Packages/Search")
        .Search(x => x.Query, search)
        .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
        .As<Package>()
        .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
        .Take(3).ToList();

    if (packages.Count > 0)
    {
        foreach (var package in packages)
        {
            Console.WriteLine("\t{0}", package.Id);
        }
    }
    else if(guessIfNoResultsFound)
    {
        DidYouMean(session, search);
    }
    else
    {
        Console.WriteLine("\tNo search results were found");
    }
}

The only major change was the call to DidYouMean(), so let us see what is going on in there.

private static void DidYouMean(IDocumentSession session, string search)
{
    var suggestionQueryResult = session.Query<PackageSearch>("Packages/Search")
        .Search(x => x.Query, search)
        .Suggest();
    switch (suggestionQueryResult.Suggestions.Length)
    {
        case 0:
            Console.WriteLine("\tNo search results were found");
            break;
        case 1:
            // we may have it filtered because of the other conditions, don't recurse again
            Console.WriteLine("\tSearch corrected to: {0}", suggestionQueryResult.Suggestions[0]);
            Console.WriteLine();

            PeformQuery(session, suggestionQueryResult.Suggestions[0], guessIfNoResultsFound: false);
            break;
        default:
            Console.WriteLine("\tDid you mean?");
            foreach (var suggestion in suggestionQueryResult.Suggestions)
            {
                Console.WriteLine("\t - {0} ?", suggestion);
            }
            break;
    }
}

Here, we ask RavenDB, “we couldn’t find anything for what we had, can you give me some other ideas?” RavenDB can check the actual data that we have on disk and suggest similar alternatives.

In essence, we asked RavenDB for what is nearby, and it provided us with some useful suggestions. Because the suggestions are based on the data we actually have in the database, searching on them will produce correct results.
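To get a feel for what “nearby” means, here is a minimal sketch of the general idea. This is not RavenDB's implementation, and the SuggestionSketch class, its method names, and the distance threshold are all made up for illustration. It assumes we already have a list of terms taken from the index, and it simply ranks them by Levenshtein edit distance from what the user typed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of RavenDB: ranks known terms by edit distance.
public static class SuggestionSketch
{
    // Levenshtein distance computed with two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j chars of b takes j inserts.
        for (var j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i; // deleting the first i chars of a
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insert, delete
                    previous[j - 1] + cost);                       // substitute or match
            }

            // Reuse the arrays for the next row.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // Returns up to maxSuggestions terms within maxDistance edits of the input,
    // closest first. Both limits are assumed defaults chosen for this sketch.
    public static string[] Suggest(IEnumerable<string> indexedTerms, string input,
        int maxDistance = 2, int maxSuggestions = 3)
    {
        return indexedTerms
            .Select(term => new { Term = term, Distance = Levenshtein(input, term) })
            .Where(x => x.Distance <= maxDistance)
            .OrderBy(x => x.Distance)
            .ThenBy(x => x.Term, StringComparer.Ordinal)
            .Take(maxSuggestions)
            .Select(x => x.Term)
            .ToArray();
    }
}
```

Given the terms "jquery", "json" and "nunit", calling SuggestionSketch.Suggest(terms, "jquryt") comes back with just "jquery", which is two edits away. Because the candidates come from the terms we already have, whatever is suggested is something a search can actually match.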

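Note that DidYouMean() has three code paths: if there are no suggestions, we report that nothing was found; if there is exactly one suggestion, we select it immediately and rerun the query; and if there are several, we list them for the user. Let us see what this looks like in practice: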

image

Users tend to fall in love with this sort of feature, and with RavenDB you can provide it in just a few lines of code and with absolutely no hassle.

In my next post (and probably the last in this series) we will discuss even more awesome search features.

Comments

09/04/2012 11:02 AM by Andreas Kroll

How would you say it? "It just works!"

Awesome. Not only does the NoSQL approach bring massive performance gains, but it also brings additional useful benefits from using Lucene as the query language.

I don't know if the NuGet data is hosted on a single SQL machine, but I think you should also show how easy it would be to load-balance over multiple servers with replication. If I understood RavenDB correctly the replication is more or less a configuration item stored in the database itself.

Thanks for the "real life" example. I think this is where developers can see the real gain of using RavenDB.

09/04/2012 11:41 AM by Pure Krome

This needs a meme because RavenDB kicks ass!

http://cdn.memegenerator.net/instances/400x/26194023.jpg

09/04/2012 12:04 PM by Simon Skov Boisen

Awesome stuff, though it's worth mentioning that it is not so much RavenDB as it is Lucene that makes the search correction possible. Correct?

09/04/2012 12:10 PM by Ayende Rahien

Simon, we rely on Lucene for some stuff. In this case, it is actually RavenDB that submitted patches to get this feature working properly.

09/04/2012 12:28 PM by Simon Skov Boisen

Awesome that you helped improve Lucene in order to bring great features to RavenDB. Is it Lucene.net you're using?

09/04/2012 01:02 PM by Felipe Fujiy Pessoto

Does Lucene.net accept C# contributions? Or do you need to write in Java and send to the original Lucene?

Is Lucene.net still active? The last version is 2.9.4, and the original Lucene has stable versions 3.0 to 3.6 and a 4.0 Beta.

09/04/2012 01:04 PM by Ayende Rahien

Felipe, Lucene.NET is fairly active. It just went 3.0.3, I think. We didn't try to submit an upstream java patch, just one to the C# port.

09/04/2012 02:09 PM by Igor Kalders

Watch your step. Since Google buys at least one company a week, you might be overtaken shortly. Does it actually make sense to compare BigTable to RavenDB?

09/04/2012 02:11 PM by Ayende Rahien

Igor, totally different things altogether.

09/04/2012 02:45 PM by Judah Gabriel Himango

Oren,

Very cool feature indeed!

Can I ask why we require static indexes in order to use .Suggest?

I am giving a Raven talk at Twin Cities Code Camp, and I'm bummed that I have to first explain static vs dynamic indexes before ever touching on this cool .Suggest feature.

09/04/2012 02:49 PM by Ayende Rahien

Judah, temp indexes can come and go. Also, temp indexes don't do analysis, only full term comparisons.

09/04/2012 03:32 PM by Judah Gabriel Himango

Fair enough.

Tangential: Did you see @kellabyte's Twitter feed last night? She doesn't like that Raven calls itself ACID. https://twitter.com/JudahGabriel/status/242880337410150400

09/05/2012 03:05 PM by dotnetchris

@Judah, I responded to that tweet.
