NHibernate Search
NHibernate Search is an extension to NHibernate that lets you use Lucene.NET, a full-text search engine, as your query engine instead of putting additional load on the database itself. In a sense, this is a good way to outsource your queries from the database.
This has several chief advantages:
- Your database is now mostly about performing queries by primary key and handling data storage, transactional semantics, etc., all of which should be very fast.
- Your costly queries can now run on a different (cheap) machine, and a long query isn’t going to take locks in the database (slowing everything down).
- Lucene.NET is a document database, which means that some things are significantly cheaper to query than in an RDBMS.
Using NHibernate Search is very easy. From a configuration standpoint, we need to define the following listeners:
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search' type='post-insert'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search' type='post-update'/>
<listener class='NHibernate.Search.Event.FullTextIndexEventListener, NHibernate.Search' type='post-delete'/>
That done, we need to annotate our classes with attributes that will tell NHibernate Search how we want our entities to be indexed. Unlike NHibernate Validator, there is no alternate XML configuration for the indexing specification.
We annotate our entities like this:
[Indexed]
public class Post
{
    [DocumentId]
    public virtual int Id { get; set; }

    [IndexedEmbedded]
    public virtual Blog Blog { get; set; }

    [IndexedEmbedded]
    public virtual User User { get; set; }

    [Field(Index.Tokenized, Store = Store.Yes)]
    public virtual string Title { get; set; }

    [Field(Index.Tokenized)]
    public virtual string Text { get; set; }

    public virtual DateTime PostedAt { get; set; }

    public virtual ISet<Comment> Comments { get; set; }

    [IndexedEmbedded]
    public virtual ISet<Category> Categories { get; set; }

    [IndexedEmbedded]
    public virtual ISet<Tag> Tags { get; set; }
}
I am not going to go over the semantics of each attribute or how they play together. Suffice it to say that Hibernate Search in Action will give you all the details about how to use them.
That said, let us look at how this will actually get indexed:
We have a document, which has fields, and we can query those fields by any of their values and get a pretty fast reply back. Note that the document structure that we have here is flat, so what would usually be a join is now an almost no-cost operation. Of course, the more we put in the index, the bigger it gets, but that is another tradeoff.
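To make the flat structure more concrete, here is a minimal sketch of the shape such a document might take. It is an illustration only, not how NHibernate Search builds documents internally. The `FlatDocumentSketch` class and its `Flatten` method are hypothetical names, and the field names are assumptions modelled on the `Post` entity above.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a hypothetical helper, not part of NHibernate Search.
// It shows the flattened (field name, value) shape of an indexed document.
public static class FlatDocumentSketch
{
    // Flattens a post and its tags into a list of (field name, value) pairs.
    // Embedded associations become prefixed fields such as "Tags.Name".
    public static List<KeyValuePair<string, string>> Flatten(
        int id, string title, string text, IEnumerable<string> tagNames)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Id", id.ToString()),
            new KeyValuePair<string, string>("Title", title),
            new KeyValuePair<string, string>("Text", text),
        };

        foreach (var name in tagNames)
        {
            // A multi-valued association simply repeats the same field name,
            // so no join is needed to find a post by one of its tags.
            fields.Add(new KeyValuePair<string, string>("Tags.Name", name));
        }

        return fields;
    }
}
```

Calling `FlatDocumentSketch.Flatten(1, "Title", "Body", new[] { "Hello", "World" })` yields five fields, two of them named `Tags.Name`. A query on `Tags.Name` only has to look at the document itself.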
We can now query the index using:
using (var s = sf.OpenSession())
using (var search = Search.CreateFullTextSession(s))
using (var tx = s.BeginTransaction())
{
    var list = search.CreateFullTextQuery<Post>("Tags.Name:Hello")
        .SetMaxResults(5)
        .List<Post>();

    foreach (var post in list)
    {
        Console.WriteLine(post.Title);
    }

    tx.Commit();
}
As I said, getting into all the syntax, options and tricks that we can use here is beyond the scope of this post. That is why I pointed out the book, which covers all of them in depth.
Comments
this is absolute gold; only wish I had this about 2.5 years ago when working on a system with very heavy use of full-text searches. Would have saved us mountains of headaches.
Nice post!
What tool are you using to display the contents of the index?
Andreas,
Luke
Is Lucene.NET currently supported? I've seen that Lucene came out with version 2.4.1 in March 2009, while Lucene.NET has a 2.0 version from 2007. Is it valid to use it?
Marco:
NHibernate.Search project in trunk uses latest stable Lucene 2.0.
However, I can confirm NH.Search works well with Lucene.NET trunk version (marked as 2.3).
It only requires downloading Lucene.NET from their SVN, building and then building NHibernate.Search with Lucene.NET 2.3 dll.
Marco,
I can't speak about Lucene.NET catching up to Lucene, but Lucene.NET is most certainly supported by NHibernate Search
Marco
We can upgrade to Lucene 2.4.1 quite easily - bear in mind that the project hadn't had a formal release for quite a while, we were using the latest.
Paul
Sorry, misread the comment. The version of Lucene.NET we are using is almost up to date: we are using 2.3.1, and they recently (March) changed over to 2.3.2, so I'll update to that in the next few days.
Paul
Ayende,
Is there any way to apply those attributes dynamically? We have entities with a varying number of fields (the configuration is built at runtime using dynamic components), and thus we don't have explicit fields. Could you recommend something in this case?
I assume that DateTime fields can be indexed? (They are missing from the code above.)
Andrey,
No, but feel free to submit a patch to do that.
jbland,
Yes, they can be, but be aware that they are indexed as text, not as values.
This means that the granularity that you use is important.
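As a rough illustration of why granularity matters with textual indexing, the sketch below renders a `DateTime` as a string whose ordinal sort order matches chronological order. The `DateIndexSketch` class and its method names are hypothetical and are not part of NHibernate Search. They only show the idea that a coarser format produces fewer distinct terms but cannot tell apart two moments within the same day.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of NHibernate Search: shows how a date can be
// turned into text that still sorts and compares correctly as a string.
public static class DateIndexSketch
{
    // Day granularity: every moment on the same day maps to the same term.
    public static string ToDayTerm(DateTime value)
    {
        return value.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // Second granularity: far more distinct terms, but finer range queries.
    public static string ToSecondTerm(DateTime value)
    {
        return value.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }
}
```

With day granularity, two posts made on the same day produce the same term, so a query cannot distinguish morning from evening. With second granularity it can, at the cost of many more distinct terms in the index.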
How do I index existing data? I am reading in the Hibernate in Action book that most of the time the data set to index cannot fit in memory all at once. If I attempt to load all the objects and then index them, I can hit an OutOfMemoryException. The book has an indexAllItems method which, in Java, uses ScrollableResults and a flushToIndexes method on the session object. What is the .NET equivalent of ScrollableResults and flushToIndexes? I checked the source code, but there is no implementation of them. Is there an example of how to do it?
Sebasitjan,
Paging
Hi Ayende,
Could you point to some articles that talk about using threaded, parallel indexing to build indexes faster? I am trying to index around 2 million records, and it takes 10+ hours. The initial 100K-200K are fast, then it slows down considerably.
Thanks.
Gokul,
Ping me on email so we can discuss this more easily, I would need to see the code for doing the indexes.
I'm wondering what the recommended way of indexing a large quantity of existing data is, so I would be interested in this email thread :)
Most of the examples on the net regarding Lucene.NET integration with NHibernate.Search deal mainly with automatic event wiring and inserting new data in the db, but I've yet to find an example for the other problem.
I haven't done a benchmark for IFullTextSession.Index yet, but I think this will be the simplest way of working in my case:
Page 1024 items
Index them
Repeat until done
But I'm waiting for any other advice :)
atma,
Yep, that is the way to go
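The page, index, repeat loop described above can be sketched as a small generic helper. This is a minimal sketch under stated assumptions, not a tested NHibernate Search recipe. `PagedIndexingSketch` and `IndexInPages` are hypothetical names, and the data access is passed in as delegates so that the loop itself does not depend on any particular NHibernate call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: only the paging loop is shown; loading and indexing
// are supplied by the caller.
public static class PagedIndexingSketch
{
    // loadPage(firstResult, maxResults) returns one page of entities.
    // index is called once per entity; afterPage is called once per page,
    // which is the natural place to commit and release loaded entities.
    // Returns the total number of entities handed to index.
    public static int IndexInPages<T>(
        Func<int, int, IList<T>> loadPage,
        Action<T> index,
        Action afterPage,
        int pageSize = 1024)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int total = 0;
        while (true)
        {
            IList<T> page = loadPage(total, pageSize);
            foreach (T item in page)
            {
                index(item);
            }

            total += page.Count;
            afterPage();

            // A short (or empty) page means there is nothing left to load.
            if (page.Count < pageSize) break;
        }

        return total;
    }
}
```

With NHibernate, `loadPage` might wrap a query ordered by Id that uses `SetFirstResult` and `SetMaxResults`, `index` might call the `IFullTextSession.Index` method mentioned above, and `afterPage` might commit the transaction and call `Clear()` on the session so that each page can be garbage collected. That wiring is an assumption and should be benchmarked against your own data before relying on it.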