Ayende @ Rahien


NHibernate Search

NHibernate Search is an extension to NHibernate that allows you to use Lucene.NET, a full text search engine, as your query engine, instead of putting additional load on the database itself. In a sense, this is a good way to outsource your queries from the database.

This has several chief advantages:

  • Your database is now mostly about performing queries by primary key (fast) and handling data storage, transactional semantics, etc., all of which should be very fast.
  • Your costly queries can now run on a different (cheap) machine, and a long query isn’t going to take locks in the database (slowing everything down).
  • Lucene.NET is a document database, which means that some things are significantly cheaper to query than in an RDBMS.

Using NHibernate Search is very easy. From a configuration standpoint, we need to define the following listeners:

<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-insert'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-update'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-delete'/>

That done, we need to annotate our classes with attributes that will tell NHibernate Search how we want our entities to be indexed. Unlike NHibernate Validator, there is no alternate XML configuration for the indexing specification.

We annotate our entities like this:

[Indexed]
public class Post
{
	[DocumentId]
	public virtual int Id { get; set; }

	[IndexedEmbedded]
	public virtual Blog Blog { get; set; }

	[IndexedEmbedded]
	public virtual User User { get; set; }

	[Field(Index.Tokenized, Store = Store.Yes)]
	public virtual string Title { get; set; }

	[Field(Index.Tokenized)]
	public virtual string Text { get; set; }

	public virtual DateTime PostedAt { get; set; }

	public virtual ISet<Comment> Comments { get; set; }

	[IndexedEmbedded]
	public virtual ISet<Category> Categories { get; set; }

	[IndexedEmbedded]
	public virtual ISet<Tag> Tags { get; set; }
}

I am not going to go over the semantics of each attribute and how they play together. Suffice it to say that Hibernate Search in Action will give you all the details about how to use this.

That said, let us look at how this will actually get indexed:

image

We have a document, which has fields, and we can query those fields by any of their values and get a pretty fast reply back. Note that the document structure that we have here is flat, so what would usually be a join is now an almost no cost operation. Of course, the more we put in the index, the bigger it is, but that is another tradeoff.
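To make the flat structure a bit more concrete, here is a minimal sketch in plain C#. It does not use NHibernate Search or Lucene.NET at all; the Flatten and Matches helpers are hypothetical names made up for illustration. The Tags.Name field name is the one used by the query in this post, while Id, Title and Text assume that each field is simply named after its property. The matching is plain string equality, whereas a real index would tokenize and analyze the values first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlatDocumentSketch
{
    // Hypothetical helper: builds the flat list of (field name, value) pairs
    // that the index document for a single post might contain.
    public static List<KeyValuePair<string, string>> Flatten(
        int id, string title, string text, IEnumerable<string> tagNames)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Id", id.ToString()),
            new KeyValuePair<string, string>("Title", title),
            new KeyValuePair<string, string>("Text", text)
        };

        // An embedded collection does not become a nested structure:
        // every tag adds one more "Tags.Name" entry to the same document.
        foreach (var name in tagNames)
            fields.Add(new KeyValuePair<string, string>("Tags.Name", name));

        return fields;
    }

    // Hypothetical helper: true if any entry for the field has exactly this value.
    // No join is needed, the tag names already live inside the post's document.
    public static bool Matches(
        List<KeyValuePair<string, string>> document, string field, string value)
    {
        return document.Any(f => f.Key == field && f.Value == value);
    }

    public static void Main()
    {
        var doc = Flatten(1, "NHibernate Search", "Lucene.NET as a query engine",
                          new[] { "Hello", "World" });

        foreach (var field in doc)
            Console.WriteLine(field.Key + " = " + field.Value);

        Console.WriteLine(Matches(doc, "Tags.Name", "Hello"));
    }
}
```

The point of the sketch is only the shape of the data: the tags are repeated inside each post's document, which is why the index grows as we embed more, and why looking up a post by tag never has to touch a second table.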

We can now query the index using:

using (var s = sf.OpenSession())
using (var search = Search.CreateFullTextSession(s))
using (var tx = s.BeginTransaction())
{
	var list = search.CreateFullTextQuery<Post>("Tags.Name:Hello")
		.SetMaxResults(5)
		.List<Post>();

	foreach (var post in list)
	{
		Console.WriteLine(post.Title);
	}

	tx.Commit();
}

As I said, actually getting down into all the syntax, options and tricks that we can use here is beyond the scope of this post. That is why I pointed out the book, which covers all of them in depth.

Comments

josh
05/04/2009 04:13 AM by
josh

this is absolute gold; only wish I had this about 2.5 years ago when working on a system with very heavy use of full-text searches. Would have saved us mountains of headaches.

Andreas Öhlund
05/04/2009 06:19 AM by
Andreas Öhlund

Nice post!

What tool are you using to display the contents of the index?

Marco Parenzan
05/04/2009 01:00 PM by
Marco Parenzan

Is Lucene.NET currently supported? I've seen that Lucene was out with version 2.4.1 in March 2009, while Lucene.NET has a 2.0 version from 2007. Is it valid to use it?

Jozef Sevcik
05/04/2009 05:47 PM by
Jozef Sevcik

Marco:

NHibernate.Search project in trunk uses latest stable Lucene 2.0.

However, I can confirm NH.Search works well with Lucene.NET trunk version (marked as 2.3).

It only requires downloading Lucene.NET from their SVN, building and then building NHibernate.Search with Lucene.NET 2.3 dll.

Ayende Rahien
05/04/2009 06:20 PM by
Ayende Rahien

Marco,

I can't speak about Lucene.NET catching up to Lucene, but Lucene.NET is most certainly supported by NHibernate Search

Paul Hatcher
05/05/2009 07:41 AM by
Paul Hatcher

Marco

We can upgrade to Lucene 2.4.1 quite easily - bear in mind that the project hadn't had a formal release for quite a while, we were using the latest.

Paul

Paul Hatcher
05/05/2009 07:47 AM by
Paul Hatcher

Sorry, mis-read the comment - the version of Lucene.NET we are using is almost up to date, we are using 2.31, they recently (March) changed over to 2.3.2 so I'll update to that in the next few days.

Paul

Andrey
05/05/2009 05:56 PM by
Andrey

Ayende,

Is there any way to put those attributes dynamically? We have entities with various number of fields (configuration is built in the runtime using dynamic components) and thus we don't have explicit fields. Could you recommend something in this case?

jbland
05/05/2009 07:31 PM by
jbland

I assume that DateTime fields can be indexed ? (missing in the code above)

Ayende Rahien
05/05/2009 10:19 PM by
Ayende Rahien

Andrey,

No, but feel free to submit a patch to do that.

Ayende Rahien
05/05/2009 10:22 PM by
Ayende Rahien

jbland,

Yes, they can be, but be aware that they are done using textual indexing, not value indexing.

This means that the granularity that you use is important.

Erik
05/26/2009 11:19 AM by
Erik

Is it really possible to index the Categories like this? I'm trying to do exactly that, but the items in the list won't get indexed in the file (checked with Luke). In my case the ISet<Category> is a many to many, could that cause the non existence of indexes?

Sebastijan Pistotnik
06/06/2009 09:24 AM by
Sebastijan Pistotnik

How do I index existing data? I am reading in the Hibernate in action book that, most of the time, the data set to index cannot all fit in memory. If I attempt to load all objects and then index them, I can face an OutOfMemoryException. So there is a method indexAllItems which, in Java, uses ScrollableResults and a flushToIndexes method on the session object. What is the equivalent in .NET for ScrollableResults and flushToIndexes? I checked the source code but there is no implementation for them. Is there some example of how to do it?

Gokul
06/24/2009 05:12 AM by
Gokul

Hi Ayende,

Could you point to some articles which talk about using threaded parallel indexing to create indexes faster? I am trying to index around 2 million records and it takes around 10+ hours. The initial 100K-200K is fast, then it slows down very much.

Thanks.

Ayende Rahien
06/24/2009 05:48 AM by
Ayende Rahien

Gokul,

Ping me on email so we can discuss this more easily, I would need to see the code for doing the indexes.

atma
06/25/2009 10:12 AM by
atma

I'm wondering what the recommended way of indexing a large quantity of existing data is, so I would be interested in this email thread :)

Most of the examples found on the net regarding Lucene.NET integration with NHibernate.Search deal mainly with automatic event wiring and inserting new data in the db, but I've yet to find an example for the other problem.

I haven't done a benchmark for IFullTextSession.Index yet, but I think this will be the simpler way of working in my case:

  • Page 1024 items

  • Index them

  • Repeat until done

But i'm waiting for any other advice :)

Ayende Rahien
06/25/2009 10:30 AM by
Ayende Rahien

atma,

Yep, that is the way to go
