NuGet Perf, Part VI AKA how to be the most popular dev around
So far, we have imported the NuGet data into RavenDB, seen how we can get it out for the packages page, and looked into how we can utilize RavenDB features to help us with package search. I think we did a good job there, but we can probably do better still. In this post, I am going to stop showing off things in the Studio and focus on code, in particular on advanced searching options.
We will start from the simplest search possible. Or not quite, because even the baseline search does full text search and quite a few other things besides. Anyway, here is the skeleton program:
    while (true)
    {
        Console.Write("Search: ");
        var search = Console.ReadLine();
        if (string.IsNullOrEmpty(search))
        {
            Console.Clear();
            continue;
        }
        using (var session = store.OpenSession())
        {
            var q = session.Query<PackageSearch>("Packages/Search")
                .Search(x => x.Query, search)
                .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
                .As<Package>()
                .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
                .Take(3);
            var packages = q.ToList();
            foreach (var package in packages)
            {
                Console.WriteLine("\t{0}", package.Id);
            }
        }
    }
Now, we are going to run this and see what we get.
So far, so good. Now let us try to improve things. What happens when we search for “jquryt”? Nothing is found, and that is actually pretty sad, because to a human, it is obvious what you are trying to search for.
If you have fat fingers and a tendency to spell words creatively, I am sure you can empathize with this feeling. Luckily for us, RavenDB is going to help. Let us see how:
What?!
How did it do that? Well, let us look at the changes in the code, shall we?
    private static void PeformQuery(IDocumentSession session, string search, bool guessIfNoResultsFound = true)
    {
        var packages = session.Query<PackageSearch>("Packages/Search")
            .Search(x => x.Query, search)
            .Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.IsPrerelease == false)
            .As<Package>()
            .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Created)
            .Take(3).ToList();
        if (packages.Count > 0)
        {
            foreach (var package in packages)
            {
                Console.WriteLine("\t{0}", package.Id);
            }
        }
        else if (guessIfNoResultsFound)
        {
            DidYouMean(session, search);
        }
        else
        {
            Console.WriteLine("\tNo search results were found");
        }
    }
The only major change was the call to DidYouMean(), so let us see what is going on in there.
    private static void DidYouMean(IDocumentSession session, string search)
    {
        var suggestionQueryResult = session.Query<PackageSearch>("Packages/Search")
            .Search(x => x.Query, search)
            .Suggest();
        switch (suggestionQueryResult.Suggestions.Length)
        {
            case 0:
                Console.WriteLine("\tNo search results were found");
                break;
            case 1:
                // we may have it filtered because of the other conditions, don't recurse again
                Console.WriteLine("\tSearch corrected to: {0}", suggestionQueryResult.Suggestions[0]);
                Console.WriteLine();
                PeformQuery(session, suggestionQueryResult.Suggestions[0], guessIfNoResultsFound: false);
                break;
            default:
                Console.WriteLine("\tDid you mean?");
                foreach (var suggestion in suggestionQueryResult.Suggestions)
                {
                    Console.WriteLine("\t - {0} ?", suggestion);
                }
                break;
        }
    }
Here, we ask RavenDB, “we couldn’t find anything for what we had, can you give me some other ideas?” RavenDB can check the actual data that we have on disk and suggest similar alternatives.
In essence, we asked RavenDB for what is nearby, and it provided us with some useful suggestions. Because the suggestions are based on the data we actually have in the db, searching on them will produce correct results.
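To give a feel for what “nearby” can mean, here is a minimal sketch of ranking known terms by edit distance. This is not RavenDB's implementation: the SuggestionSketch class, the Suggest helper, the list of indexed terms, and the distance threshold of 2 are all made up for illustration, and I am assuming a plain Levenshtein distance as the similarity measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuggestionSketch
{
    // Classic dynamic-programming Levenshtein distance: the minimum number of
    // single-character inserts, deletes and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Swap the two rows so that "previous" holds the row we just computed.
            var tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.Length];
    }

    // Hypothetical helper: keep the known terms within maxDistance edits, closest first.
    public static List<string> Suggest(string term, IEnumerable<string> knownTerms, int maxDistance)
    {
        return knownTerms
            .Select(t => new { Term = t, Distance = Levenshtein(term, t) })
            .Where(x => x.Distance <= maxDistance)
            .OrderBy(x => x.Distance)
            .Select(x => x.Term)
            .ToList();
    }

    public static void Main()
    {
        // Made-up stand-in for the terms that actually exist in the index.
        var indexedTerms = new[] { "jquery", "json", "nhibernate", "ravendb" };
        foreach (var suggestion in Suggest("jquryt", indexedTerms, 2))
            Console.WriteLine(suggestion); // prints "jquery"
    }
}
```

“jquryt” is two edits away from “jquery” (insert the missing “e”, drop the trailing “t”), so it is the only term that passes the threshold. The important part is that candidates come only from terms that really exist in the data, which is why a suggested term can be searched on directly.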
Note that we have three code paths here. If there are no suggestions, we report that nothing was found. If there is exactly one suggestion, we select it immediately and run the search again. If there are several, we list them and let the user pick. Let us see what this looks like in practice:
Users tend to fall in love with this sort of feature, and with RavenDB you can provide it in just a few lines of code and absolutely no hassle.
In my next post (and probably the last in this series) we will discuss even more awesome search features.
Comments
How would you say it? "It just works!"
Awesome. Not only does the NOSQL approach bring massive performance gains, but also additional and useful benefits from using lucene as query language.
I don't know if the NuGet data is hosted on a single SQL machine, but I think you should also show how easy it would be to load-balance over multiple servers with replication. If I understood RavenDB correctly the replication is more or less a configuration item stored in the database itself.
Thanks for the "real life" example. I think this is where developers can see the real gain of using RavenDB.
This needs a meme because RavenDb kick ass!
http://cdn.memegenerator.net/instances/400x/26194023.jpg
Awesome stuff though it's worth mentioning that it is not so much RavenDB as it is Lucene that makes the search correction possible. Correct?
Simon, We rely on Lucene for some stuff, in this case, it is actually RavenDB that submitted patches to get this feature working properly.
Awesome that you helped improve Lucene in order to bring great features to RavenDB. Is it Lucene.net you're using?
Simon, Yes.
Lucene.net accepts C# contrib? Or do you need to write in Java and send to original Lucene?
Lucene.net is still active? The last version is 2.9.4, and the original Lucene has 3.0 to 3.6 stable versions and a 4.0 Beta.
Felipe, Lucene.NET is fairly active. It just went 3.0.3, I think. We didn't try to submit an upstream java patch, just one to the C# port.
Watch your step. Since Google buys at least one company a week, you might be overtaken shortly. Does it actually make sense to compare BigTable to RavenDB?
Igor, Totally different things all together.
Oren,
Very cool feature indeed!
Can I ask why we require static indexes in order to use .Suggest?
I am giving a Raven talk at Twin Cities Code Camp, and I'm bummed that I have to first explain static vs dynamic indexes before ever touching on this cool .Suggest feature.
Judah, Temp indexes can come & go. Also, temp indexes don't do analyzed, only full term comparisons.
Fair enough.
Tangential: Did you see @kellabyte twitter feed last night? She doesn't like that Raven calls itself ACID. https://twitter.com/JudahGabriel/status/242880337410150400
@Judah i responded to that tweet