Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
NuGet Perf, Part VI AKA how to be the most popular dev around


So far, we imported the NuGet data into RavenDB, saw how we can get it out for the packages page, and then looked into how we can utilize RavenDB features to help us in package search. I think we did a good job there, but we can probably do better still. In this post, I am going to stop showing off things in the Studio and focus on code. In particular, advanced searching options.

We will start from the simplest search possible. Or not, because even the baseline search does full text search and quite a few other things on the side. Anyway, here is the skeleton program:

while (true)
{
    Console.Write("Search: ");
    var search = Console.ReadLine();
    if(string.IsNullOrEmpty(search))
    {
        Console.Clear();
        continue;
    }
    using (var session = store.OpenSession())
    {
        var q = session.Query<PackageSearch>("Packages/Search")
            .Search(x => x.Query, search)
            .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
            .As<Package>()
            .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
            .Take(3);
        var packages = q.ToList();

        foreach (var package in packages)
        {
            Console.WriteLine("\t{0}", package.Id);
        }
    }
}

Now, we are going to run this and see what we get.

image

So far, so good. Now let us try to improve things. What happens when we search for “jquryt”? Nothing is found, and that is actually pretty sad, because to a human, it is obvious what you are trying to search for.

If you have fat fingers and a tendency to spell words creatively, I am sure you can empathize with this feeling. Luckily for us, RavenDB is going to help. Let us see how:

image

What?!

How did it do that? Well, let us look at the changes in the code, shall we?

private static void PeformQuery(IDocumentSession session, string search, bool guessIfNoResultsFound = true)
{
    var packages = session.Query<PackageSearch>("Packages/Search")
        .Search(x => x.Query, search)
        .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
        .As<Package>()
        .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
        .Take(3).ToList();

    if (packages.Count > 0)
    {
        foreach (var package in packages)
        {
            Console.WriteLine("\t{0}", package.Id);
        }
    }
    else if(guessIfNoResultsFound)
    {
        DidYouMean(session, search);
    }
    else
    {
        Console.WriteLine("\tNo search results were found");
    }
}

The only major change was the call to DidYouMean(), so let us see what is going on in there.

private static void DidYouMean(IDocumentSession session, string search)
{
    var suggestionQueryResult = session.Query<PackageSearch>("Packages/Search")
        .Search(x => x.Query, search)
        .Suggest();
    switch (suggestionQueryResult.Suggestions.Length)
    {
        case 0:
            Console.WriteLine("\tNo search results were found");
            break;
        case 1:
            // we may have it filtered because of the other conditions, don't recurse again
            Console.WriteLine("\tSearch corrected to: {0}", suggestionQueryResult.Suggestions[0]);
            Console.WriteLine();

            PeformQuery(session, suggestionQueryResult.Suggestions[0], guessIfNoResultsFound: false);
            break;
        default:
            Console.WriteLine("\tDid you mean?");
            foreach (var suggestion in suggestionQueryResult.Suggestions)
            {
                Console.WriteLine("\t - {0} ?", suggestion);
            }
            break;
    }
}

Here, we ask RavenDB, “we couldn’t find anything for what we had, can you give me some other ideas?” RavenDB can check the actual data that we have on disk and suggest similar alternatives.

In essence, we asked RavenDB for what is nearby, and it provided us with some useful suggestions. Because the suggestions are based on the data we actually have in the db, searching on a suggestion will match terms that really exist in the index.
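To make the "what is nearby" idea a bit more concrete, here is a minimal, self-contained sketch of suggestion by edit distance. This is not how RavenDB implements Suggest(); it only assumes that "nearby" means a small Levenshtein distance and that the known terms fit in an in-memory list. The names SuggestSketch, EditDistance, Suggest, maxDistance and maxSuggestions are all made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: not RavenDB's suggestion implementation.
public static class SuggestSketch
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions or substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++)
            previous[j] = j; // turning "" into b[0..j] takes j insertions

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i; // turning a[0..i] into "" takes i deletions
            for (var j = 1; j <= b.Length; j++)
            {
                var substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                var insertion = current[j - 1] + 1;
                var deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            var temp = previous;
            previous = current;
            current = temp;
        }
        return previous[b.Length];
    }

    // Returns the known terms within maxDistance edits of the input,
    // closest first. Both limits are arbitrary values for this sketch.
    public static List<string> Suggest(
        IEnumerable<string> knownTerms, string input,
        int maxDistance = 2, int maxSuggestions = 3)
    {
        return knownTerms
            .Select(term => new { Term = term, Distance = EditDistance(input, term) })
            .Where(x => x.Distance <= maxDistance)
            .OrderBy(x => x.Distance)
            .ThenBy(x => x.Term, StringComparer.Ordinal)
            .Take(maxSuggestions)
            .Select(x => x.Term)
            .ToList();
    }
}
```

With the terms "jquery", "json" and "knockout", SuggestSketch.Suggest(terms, "jquryt") returns only "jquery", which is two edits away from the typo. A real index holds far more terms than you would want to scan linearly, so treat this as a picture of the idea rather than a recipe.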

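Note that we have three code paths here: no suggestions, exactly one suggestion, or several. If there is exactly one suggestion, we select it immediately and rerun the search with it (without guessing again, since the results may still be filtered out by the other conditions). Let us see how this looks in practice: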

image

Users tend to fall in love with this sort of feature, and with RavenDB you can provide it in just a few lines of code and absolutely no hassle.

In my next post (and probably the last in this series) we will discuss even more awesome search features.


Comments

Andreas Kroll

How would you say it? "It just works!"

Awesome. Not only does the NOSQL approach bring massive performance gains, but also additional and useful benefits from using lucene as query language.

I don't know if the NuGet data is hosted on a single SQL machine, but I think you should also show how easy it would be to load-balance over multiple servers with replication. If I understood RavenDB correctly the replication is more or less a configuration item stored in the database itself.

Thanks for the "real life" example. I think this is where developers can see the real gain of using RavenDB.

Pure Krome

This needs a meme because RavenDb kick ass!

http://cdn.memegenerator.net/instances/400x/26194023.jpg

Simon Skov Boisen

Awesome stuff though it's worth mentioning that it is not so much RavenDB as it is Lucene that makes the search correction possible. Correct?

Ayende Rahien

Simon, We rely on Lucene for some stuff, in this case, it is actually RavenDB that submitted patches to get this feature working properly.

Simon Skov Boisen

Awesome that you helped improve Lucene in order to bring great features to RavenDB. Is it Lucene.net you're using?

Felipe Fujiy Pessoto

Lucene.net accepts C# contrib? Or do you need to write in Java and send to original Lucene?

Lucene.net is still active? The last version is 2.9.4, and the original Lucene has 3.0 to 3.6 stable versions and a 4.0 Beta.

Ayende Rahien

Felipe, Lucene.NET is fairly active. It just went 3.0.3, I think. We didn't try to submit an upstream java patch, just one to the C# port.

Igor Kalders

Watch your step. Since Google buys at least one company a week, you might be overtaken shortly. Does it actually make sense to compare BigTable to RavenDB?

Ayende Rahien

Igor, Totally different things altogether.

Judah Gabriel Himango

Oren,

Very cool! Feature indeed.

Can I ask why we require static indexes in order to use .Suggest?

I am giving a Raven talk at Twin Cities Code Camp, and I'm bummed that I have to first explain static vs dynamic indexes before ever touching on this cool .Suggest feature.

Ayende Rahien

Judah, Temp indexes can come & go. Also, temp indexes don't do analysis, only full term comparisons.

Judah Gabriel Himango

Fair enough.

Tangential: Did you see @kellabyte twitter feed last night? She doesn't like that Raven calls itself ACID. https://twitter.com/JudahGabriel/status/242880337410150400

dotnetchris

@Judah i responded to that tweet
