NHibernate Search

NHibernate Search is an extension to NHibernate that allows you to use Lucene.NET, a full text search engine, as your query engine instead of putting additional load on the database itself. In a sense, this is a good way to outsource your queries from the database.

This has several chief advantages:

Your database is now mostly about performing queries by primary key and handling data storage, transactional semantics, etc., all of which should be very fast.
Your costly queries can now run on a different (cheap) machine, and a long query isn’t going to take locks in the database (slowing everything down).
Lucene.NET is a document database, which means that some things are significantly cheaper to query than in an RDBMS.

Using NHibernate Search is very easy. From a configuration standpoint, we need to define the following listeners:

<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-insert'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-update'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search'
					type='post-delete'/>
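
If you would rather wire the listeners in code than in XML, a rough equivalent might look like the sketch below. It assumes NHibernate's `Configuration.SetListener` method and the `ListenerType` enum; the `SearchListeners` class and its `Register` method are names made up for this illustration, not part of NHibernate Search.

```csharp
using NHibernate.Cfg;
using NHibernate.Event;
using NHibernate.Search.Event;

// Hypothetical helper: registers the same three listeners as the XML above.
public static class SearchListeners
{
    public static Configuration Register(Configuration cfg)
    {
        // One listener instance handles insert, update and delete events.
        var listener = new FullTextIndexEventListener();

        cfg.SetListener(ListenerType.PostInsert, listener);
        cfg.SetListener(ListenerType.PostUpdate, listener);
        cfg.SetListener(ListenerType.PostDelete, listener);

        return cfg;
    }
}
```

You would call `SearchListeners.Register(cfg)` before building the session factory, so that the listeners are in place for every session it opens.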

That done, we need to annotate our classes with attributes that will tell NHibernate Search how we want our entities to be indexed. Unlike NHibernate Validator, there is no alternate XML configuration for the indexing specification.

We annotate our entities like this:

[Indexed]
public class Post
{
	[DocumentId]
	public virtual int Id { get; set; }

	[IndexedEmbedded]
	public virtual Blog Blog { get; set; }

	[IndexedEmbedded]
	public virtual User User { get; set; }

	[Field(Index.Tokenized, Store = Store.Yes)]
	public virtual string Title { get; set; }

	[Field(Index.Tokenized)]
	public virtual string Text { get; set; }

	public virtual DateTime PostedAt { get; set; }

	public virtual ISet<Comment> Comments { get; set; }

	[IndexedEmbedded]
	public virtual ISet<Category> Categories { get; set; }

	[IndexedEmbedded]
	public virtual ISet<Tag> Tags { get; set; }
}

I am not going to go over the semantics of each attribute and how they play together. Suffice to say that Hibernate Search in Action will give you all the details about how to use them.

That said, let us look at how this will actually get indexed:

We have a document, which has fields, and we can query those fields by any of their values and get a pretty fast reply back. Note that the document structure that we have here is flat, so what would usually be a join is now an almost no cost operation. Of course, the more we put in the index, the bigger it is, but that is another tradeoff.
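
To make the flattening idea concrete, here is a small self-contained sketch. It is not NHibernate Search's actual indexing code, and the `Post` and `Tag` classes are simplified stand-ins for the mapped entities. The only field name taken from this post is `Tags.Name`, which the query further down uses; the other names and all of the values are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the mapped entities.
public class Tag
{
    public string Name { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public List<Tag> Tags { get; set; }
}

public static class FlatDocumentSketch
{
    // Flattens a post into name/value pairs, roughly the way an embedded
    // association ends up as dotted field names on a single document.
    public static List<KeyValuePair<string, string>> Flatten(Post post)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Id", post.Id.ToString()),
            new KeyValuePair<string, string>("Title", post.Title),
        };

        // Each tag adds another value under the same field name,
        // so searching on Tags.Name needs no join.
        foreach (var tag in post.Tags)
        {
            fields.Add(new KeyValuePair<string, string>("Tags.Name", tag.Name));
        }

        return fields;
    }

    public static void Main()
    {
        var post = new Post
        {
            Id = 1,
            Title = "NHibernate Search",
            Tags = new List<Tag>
            {
                new Tag { Name = "Hello" },
                new Tag { Name = "Lucene" }
            }
        };

        foreach (var field in Flatten(post))
        {
            Console.WriteLine(field.Key + " = " + field.Value);
        }
    }
}
```

The point of the sketch is only that the tags live on the post's own document, which is why looking them up costs about the same as looking up the title.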

We can now query the index using:

// sf is the ISessionFactory built from the configuration above
using (var s = sf.OpenSession())
using (var search = Search.CreateFullTextSession(s))
using (var tx = s.BeginTransaction())
{
	// Lucene query syntax: match posts that have a tag named "Hello"
	var list = search.CreateFullTextQuery<Post>("Tags.Name:Hello")
		.SetMaxResults(5)
		.List<Post>();

	foreach (var post in list)
	{
		Console.WriteLine(post.Title);
	}

	tx.Commit();
}

As I said, getting into all the syntax, options and tricks that we can use here is beyond the scope of this post. That is why I pointed out the book, which covers all of them in depth.

Tags:

NHibernate

Comments

04 May 2009
04:13 AM

josh

this is absolute gold; only wish I had this about 2.5 years ago when working on a system with very heavy use of full-text searches. Would have saved us mountains of headaches.

04 May 2009
06:19 AM

Andreas Öhlund

Nice post!

What tool are you using to display the contents of the index?

04 May 2009
06:41 AM

Ayende Rahien

Andreas,

Luke

04 May 2009
13:00 PM

Marco Parenzan

Is Lucene.NET currently supported? I've seen that Lucene came out with version 2.4.1 in March 2009, while Lucene.NET has a 2.0 version from 2007. Is it valid to use it?

04 May 2009
17:47 PM

Jozef Sevcik

Marco:

NHibernate.Search project in trunk uses latest stable Lucene 2.0.

However, I can confirm NH.Search works well with Lucene.NET trunk version (marked as 2.3).

It only requires downloading Lucene.NET from their SVN, building and then building NHibernate.Search with Lucene.NET 2.3 dll.

04 May 2009
18:20 PM

Ayende Rahien

Marco,

I can't speak about Lucene.NET catching up to Lucene, but Lucene.NET is most certainly supported by NHibernate Search

05 May 2009
07:41 AM

Paul Hatcher

Marco

We can upgrade to Lucene 2.4.1 quite easily - bear in mind that the project hadn't had a formal release for quite a while, we were using the latest.

Paul

05 May 2009
07:47 AM

Paul Hatcher

Sorry, mis-read the comment - the version of Lucene.NET we are using is almost up to date, we are using 2.31, they recently (March) changed over to 2.3.2 so I'll update to that in the next few days.

Paul

05 May 2009
17:56 PM

Andrey

Ayende,

Is there any way to put those attributes dynamically? We have entities with various number of fields (configuration is built in the runtime using dynamic components) and thus we don't have explicit fields. Could you recommend something in this case?

05 May 2009
19:31 PM

jbland

I assume that DateTime fields can be indexed ? (missing in the code above)

05 May 2009
22:19 PM

Ayende Rahien

Andrey,

No, but feel free to submit a patch to do that.

05 May 2009
22:22 PM

Ayende Rahien

jbland,

Yes they can be, but be aware that they are done using textual indexing, not value indexing

This means that the granularity that you use is important

26 May 2009
11:19 AM

Erik

Is it really possible to index the Categories like this? I'm trying to do exactly that, but the items in the list won't get indexed in the file (checked with Luke). In my case the ISet<Category> is a many to many, could that cause the non existence of indexes?

06 Jun 2009
09:24 AM

Sebastijan Pistotnik

How do I index existing data? I am reading in Hibernate in action book about the fact that most of the time index data set cannot fit all in memory. If I attempt to load all objects and then index them, I can face OutOfMemoryException. So there is a method indexAllItems where in java it is used ScrollableResults and a method flushToIndexes on session object. What is the equivalent in .net for ScrollableResults and flushToIndexes. I checked the source code but there is no implementation for them. So is there some example how to do it?

06 Jun 2009
14:18 PM

Ayende Rahien

Sebastijan,

Paging

24 Jun 2009
05:12 AM

Gokul

Hi Ayende,

Could you point to some articles which talk about using threaded parallel indexing to create indexes faster? I am trying to index around 2 million records and it takes around 10+ hrs. The initial 100K-200K is fast, then it slows down very much.

Thanks.

24 Jun 2009
05:48 AM

Ayende Rahien

Gokul,

Ping me on email so we can discuss this more easily, I would need to see the code for doing the indexes.

25 Jun 2009
10:12 AM

atma

I'm wondering what is the recommended way of indexing a large quantity of existing data, so I would be interested in this email thread :)

The examples most often found on the net regarding Lucene.NET integration with NHibernate.Search deal mainly with automatic event wiring and inserting new data in the db, but I've yet to find an example for the other problem.

I haven't done a benchmark for IFullTextSession.Index yet, but I think this will be the simpler way of working in my case:

Page 1024 items
Index them
Repeat until done

But I'm waiting for any other advice :)

25 Jun 2009
10:30 AM

Ayende Rahien

atma,

Yep, that is the way to go

Oren Eini

CEO of RavenDB