Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,124 | Comments: 45,475

filter by tags archive

RavenDBIndex Boosting

time to read 6 min | 1084 words

Recently we added a really nice feature, boosting the results while indexing.

Boosting is a way to give documents or attributes in a document weights. Attribute level boosting is a way to tell RavenDB that a certain  attribute in a document is more important than the others, so it will show up higher in queries when other properties are involved in a query. A document level boosting means that a certain document is more important than another (when using multi maps).

Let us see a few examples where this is happening. The simplest scenario is when we have a multi field search, and we want one of the fields to be the more important one. For example, we decided that when you make a search for first name and last name, a match on the first name has higher relevance than a match on the last name. We can define this requirement with the following index:

public class Users_ByName : AbstractIndexCreationTask<User>
{
    public Users_ByName()
    {
        Map = users => from user in users
                       select new
                       {
                           FirstName = user.FirstName.Boost(3),
                           user.LastName
                       };
    }
}

And we can query the index using:

var matches = session.Query<User,UsersByName>()
      .Where(x=>x.FirstName == "Ayende" || x.LastName == "Eini")
      .ToList()

Assuming that we have a user with the first name “Ayende” and another user with the last name “Eini”, this will find both of them, but will rank the user with the name “Ayende” first.

Let us see another variant, we have a multi map index for users and accounts, both are searchable by name, but we want to ensure that accounts are more important than users. We can do that using the following index:

public class UsersAndAccounts : AbstractMultiMapIndexCreationTask
{
    public UsersAndAccounts()
    {
        AddMap<User>(users =>
                     from user in users
                     select new {Name = user.FirstName}
            );
        AddMap<Account>(accounts =>
                        from account in accounts
                        select new {account.Name}.Boost(3)
            );
    }
}

If we have query that has matches for users and accounts, this will make sure that the account comes first.

And finally, a really interesting use case is that based on the entity itself, you decide to rank it higher. For example, we want to rank customers that ordered a lot from us higher than other customers. We can do that using the following index:

public class Accounts_Search : AbstractIndexCreationTask<Account>
{
    public Accounts_Search()
    {
        Map = accounts =>
              from account in accounts
              select new
              {
                  account.Name
              }.Boost(account.TotalIncome > 10000 ? 3 : 1);
    }
}

This way, we get the more important customers first. And this is really one of those things that brings up the polish in the system, the things that makes the users sit up and take notice.

More posts in "RavenDB" series:

  1. (25 May 2016) Got anything to declare, ya smuggler?
  2. (23 May 2016) I'm no longer conflicted about this
  3. (19 May 2016) What did you subscribe to again?
  4. (17 May 2016) See here, I got a contract, I say!
  5. (13 May 2016) Deeper insights to indexing
  6. (11 May 2016) Digging deep into the internals
  7. (09 May 2016) I'll have the 3+1 goodies to go, please
  8. (04 May 2016) I’ll find who is taking my I/O bandwidth and they SHALL pay
  9. (02 May 2016) You want all the data, you can’t handle all the data
  10. (29 Apr 2016) A large cluster goes into a bar and order N^2 drinks
  11. (27 Apr 2016) I’m the admin, and I got the POWER
  12. (25 Apr 2016) Can you spare me a server?
  13. (21 Apr 2016) Configuring once is best done after testing twice
  14. (19 Apr 2016) Is this a cluster in your pocket AND you are happy to see me?

Comments

Lam Chan

That is very awesome. I'm looking forward to your feature set expansions!

Lam Chan

That is very awesome. I'm looking forward to your feature set expansions!

Jonty

How is this better than order by?

smartcaveman

Where is the source for the AbstractIndexCreationTask and AbstractMultiMapIndexCreationTask?

From this angle, your method for isolating query logic looks similar to the specification pattern implementation that you were less than enthusiastic about (see here: http://ayende.com/blog/4784/architecting-in-the-pit-of-doom-the-evils-of-the-repository-abstraction-layer). But, I'm guessing it just seems that way because I don't know what the base classes are actually doing.

smartcaveman

Nevermind (https://github.com/ravendb/ravendb/blob/master/Raven.Client.Lightweight/Indexes/AbstractIndexCreationTask.cs)

dotnetchris

How do you handle ordering when using boosting? Never use ordering? Will Skip/Take blow up for not being ordered?

Karep

Same question as Jonty. Also what if You decide that Firstname is more important then AccountName?

njy
njy

@Oren: is the value of the Boost available as a field in the result, or at least usable to specify a different ordering? Something like "order by boost desc" or something like that.

Ayende Rahien

Jonty, It is important because it makes the values not dependent on strict ordering, but their basic importance as it relates to your actual query.

Ayende Rahien

Njy, I don't understand the question

njy
njy

Using Boost gives more "importance" to some results, which actually means doing a (logical) additional "order by", is that correct?

If it is like so, i was wondering if it could be possible - for example - to search news articles by a search term and get the results ordered by the date (to get the most recent firsts) and then, if the date is the same, by "boost value". It's just an example, i hope to have clarified the hypothetical scenario.

Ayende Rahien

Njy, I actually think that this would work, yes. Itamar can probably answer this better, but I am pretty sure that the boost value would only be a factor inside the same orderby value.

njy
njy

Ok, thanks

Itamar Syn-Hershko

njy, the "score" will be computed using the sort by and the boost will be factored in. So yes - this is essentially what will happen.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.5 whirl wind tour: You want all the data, you can’t handle all the data - 10 hours from now
  2. The design of RavenDB 4.0: Making Lucene reliable - about one day from now
  3. RavenDB 3.5 whirl wind tour: I’ll find who is taking my I/O bandwidth and they SHALL pay - 2 days from now
  4. The design of RavenDB 4.0: Physically segregating collections - 3 days from now
  5. RavenDB 3.5 Whirlwind tour: I need to be free to explore my data - 4 days from now

And 14 more posts are pending...

There are posts all the way to May 30, 2016

RECENT SERIES

  1. RavenDB 3.5 whirl wind tour (14):
    29 Apr 2016 - A large cluster goes into a bar and order N^2 drinks
  2. The design of RavenDB 4.0 (13):
    28 Apr 2016 - The implications of the blittable format
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats