Ayende @ Rahien

Refunds available at head office

RavenDB new feature: Highlights

Before anything else, I need to thank Sergey Shumov for this feature. This is one of the features that we got as a pull request, and we were very happy to accept it.

What are highlights? Highlights are important when you want to give the user better search UX.

For example, let us take the Google Code data set and write the following index for it:

public class Projects_Search : AbstractIndexCreationTask<Project, Projects_Search.Result>
{
    public class Result
    {
        public string Query { get; set; }
    }

    public Projects_Search()
    {
        Map = projects =>
              from p in projects
              select new
              {
                  Query = new[]
                  {
                      p.Name,
                      p.Summary
                  }
              };
        Store(x => x.Query, FieldStorage.Yes);
        Index(x=>x.Query, FieldIndexing.Analyzed);
    }
}

And now, we are going to search it:

using(var session = store.OpenSession())
{
    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .ToList();

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, project.Summary).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}

The value of q is: source

Using this, we get the following results:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - BugHunter Platfrom is am open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source Bug Tracking System. The platform allows anyone to take part in the test phase of mobile software proj
  • starship-troopers-source - Starship Troopers: Source is an open source Half-Life 2 Modification.
  • static-source-analyzer - A Java static source analyzer which recursively scans folders to analyze a project's source code
  • source-osa - Open Source Admin - A Source Engine Administration Plugin

This makes sense, and it is pretty easy to work with. But it would be much nicer if we could go further and let the user know why we selected those results. Here is where highlights come into play. We will start with the actual output first, because it is more impressive:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source
  • volumetrie - code source - Volumetrie est un programme permettant de récupérer des informations sur un code source - Volumetrie is a p
  • acoustic-localization-robot - s the source sound and uses a lego mindstorm NXT and motors to point a laser at the source.
  • bamboo-invoice-ce - The source-controlled Community Edition of Derek Allard's open source "Bamboo Invoice" project

And here is the code to make this happen:

using(var session = store.OpenSession())
{
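    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        // Highlight the "Query" field: fragments of up to 128 characters,
        // one fragment per result, returned in the "Results" property.
        .Customize(x => x.Highlight("Query", 128, 1, "Results"))
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .Select(x => new
        {
            x.Name,
            // The cast only gives the anonymous type a string[] member named
            // Results; the highlighted fragments end up in it.
            Results = (string[])null
        })
        .ToList();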

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, string.Join(" || ", project.Results)).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}
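To make the idea of a "fragment" concrete, here is a minimal, self-contained sketch of cutting a short window of text around a match and marking the matched term. This is not how RavenDB does it: the server works from the index rather than scanning the text. The SnippetSketch class, the Fragment method and the <b> tags are all hypothetical names chosen for illustration.

```csharp
using System;

public static class SnippetSketch
{
    // Hypothetical helper, NOT part of RavenDB. Returns a window of at most
    // fragmentLength characters around the first case-insensitive match of
    // term, with the match wrapped in <b> tags, or null when nothing matches.
    public static string Fragment(string text, string term, int fragmentLength)
    {
        int hit = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        if (hit < 0)
            return null;

        // The window is never shorter than the term or longer than the text.
        int length = Math.Min(Math.Max(fragmentLength, term.Length), text.Length);

        // Center the window on the match, then clamp it to the text bounds.
        int start = Math.Max(0, hit - (length - term.Length) / 2);
        start = Math.Min(start, text.Length - length);

        string window = text.Substring(start, length);
        int offset = hit - start;

        return window.Substring(0, offset)
             + "<b>" + window.Substring(offset, term.Length) + "</b>"
             + window.Substring(offset + term.Length);
    }
}
```

With this sketch, SnippetSketch.Fragment("Open Source Admin - A Source Engine Administration Plugin", "source", 17) returns "Open <b>Source</b> Admin". The point of doing the equivalent on the server is that only the short fragment travels over the wire, not the whole field.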

For that matter, here is me playing with things, searching for: lego mindstorm

  • acoustic-localization-robot - ses a lego mindstorm NXT and motors to point a laser at the source.
  • dpm-group-3-fall-2011 - Lego Mindstorm Final Project
  • hivemind-nxt - k for Lego Mindstorm NXT Robots
  • gsi-lego - with Lego Mindstorm using LeJos
  • lego-xjoysticktutorial - l you Lego Mindstorm NXT robot with a joystick

You can play around with how it highlights the text, but as you can see, I am pretty happy with this new feature.

Posted By: Ayende Rahien

Comments

01/25/2013 12:27 PM by Chanan Braunstein

Very helpful feature! What is OfType? Is that the same as AsProjection?

01/25/2013 12:56 PM by Ayende Rahien

Chanan, OfType is the same as As.

01/25/2013 12:57 PM by Khalid Abuhakmeh

To play devil's advocate, why wouldn't you want to do this on the clientside with a JavaScript library?

It seems like you'd want to do more with the UI like tooltip hovers.

I still think it is cool and I'm not gonna complain about new features.

01/25/2013 01:01 PM by Ayende Rahien

Khalid, let us assume that you are indexing a text field that is 2 KB in size. You don't want to send that 2 KB times NumberOfDocs. Highlighting does the work on the server side and sends you only the snippets.

01/25/2013 01:26 PM by Khalid Abuhakmeh

My apologies, I was unclear. I didn't mean by Ajax (request to the server). I meant there are libraries that will scrape your page clientside and put the highlighting in after the page has loaded. See the example below.

http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html

Certainly the way you described it would be an inefficient way to do it.

01/25/2013 01:27 PM by Ayende Rahien

Khalid, we aren't talking about doing this on a single document; this is for you to be able to see the full search results with more data.

01/25/2013 03:41 PM by Sergey Shumov

Khalid, highlighting isn't only about putting HTML tags into a text. It also allows you to fetch snippets (short regions of text with matched tokens inside) and do it fast, because all the necessary information (token offsets) is already contained in the Lucene index.

01/25/2013 08:47 PM by Michael Carter

Khalid, if you look at the highlighted results, it doesn't look like it's displaying the entire project.Summary value. It's only showing the snippet of text that includes the highlighted term. That's pretty nice.

01/28/2013 03:53 PM by Matt Johnson

What build should I look for this in?

01/28/2013 05:21 PM by Ayende Rahien

Matt, any build from the last week or so.

01/29/2013 03:11 PM by Ciel

So this is more a server / administration ui feature?

I guess I am just confused as to how this actually works.

01/29/2013 03:22 PM by Ayende Rahien

Ciel, this is a user-facing feature. You would use it for your own search pages to give better UX.
