Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Find the differences: The optimization that changed behavior


I was thinking about ways of optimizing NHibernate Search's behavior, and I ran into a bug in my proposed solution. It is an interesting one, so I thought it would make a good post.

Right now NHibernate Search behavior is similar to this:

public IList<T> Search<T>(string query)
{
    var results = new List<T>();
    // One Get call per Lucene hit
    foreach(var idAndClass in DoLuceneSearch(query))
    {
        var result = (T)session.Get(idAndClass.ClassName, idAndClass.Id);
        // Skip hits whose entity no longer exists
        if(result != null)
            results.Add(result);
    }
    return results;
}

This isn’t the actual code, but it shows how NHibernate Search works. It also shows the problem that I thought about fixing. The way it is implemented now, NHibernate Search can end up issuing one query per search hit, which is the classic SELECT N+1 problem.

Now, the optimization is simply:

public IList<T> Search<T>(string query)
{
    // Load all the hits with a single IN query
    return session.CreateCriteria(typeof(T))
        .Add(Restrictions.In("id", DoLuceneSearch(query).Select(x=>x.Id)))
        .List<T>();
}

There are at least two major differences between the behavior of the two versions. Can you find them?



Is it possible that DoLuceneSearch can return the same row multiple times?

Also, this would return the results in a different sort order, if I'm not mistaken. (That is, if sort order is even an issue at that level, which I assume it is.)

Note: I don't use NHibernate, and I have no idea what Lucene is, so this is nothing more than an educated guess.

Dmitriy Nagirnyak

I think 3 major differences are:

  1. The 2nd case executes (in theory) 2 SQL queries: one to return Class+Id, and a second to return the actual list (probably using SELECT... FROM... WHERE id IN (...)), with all the pros and cons that brings.

  2. The 2nd case ignores the NH 2nd level cache.

  3. The 2nd approach ignores the class, so you cannot use search with inheritance.


The query in the second example would always hit the database but it would only be done once. You might hit a 2100 parameter limit if it uses the IN clause.

The first example can take advantage of the identity map.
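To illustrate the parameter limit mentioned above, here is a minimal sketch of splitting the ids into smaller batches, so that each batch could back its own IN query. The `IdBatching` class, the `InBatches` method, and the batch size are hypothetical names and values chosen for illustration; none of this is part of NHibernate or NHibernate Search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NHibernate or NHibernate Search.
public static class IdBatching
{
    // Lazily splits ids into consecutive batches of at most batchSize items,
    // preserving the original order within and across batches.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> ids, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch (if any).
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch could then be handed to its own IN restriction and the partial results concatenated. That trades the single round trip for a handful of them, which is still far fewer than one per hit.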

Markus Zywitza

1) Original code loads proxies, the optimized code loads the full objects.

2) The optimized code loads only T while the original code loads T and subclasses of T


Will the second one fail if no rows are returned by the Lucene search?

Johannes Gustafsson

The first query uses idAndClass.className to get the entity. If it is possible for idAndClass.className to be something other than T, then the second query could return an entirely different entity.

On the other hand, the first query would throw in this case.

Mogens Heller Grabe

One difference is that the first version will take advantage of the 1st level cache and the 2nd level entity cache if it is enabled.

The second version will always go to the database.

Ayende Rahien


Sort order is one such problem, yes.
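As a rough sketch of how the sort order could be put back after a set-based load: index the loaded entities by id, then walk the ids in the order the search returned them. The `ResultOrdering` class, the `InRankOrder` method, and the `getId` delegate are hypothetical names used only for illustration, and the sketch assumes the loaded entities have unique ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NHibernate or NHibernate Search.
public static class ResultOrdering
{
    // Returns the loaded entities arranged to follow rankedIds.
    // Ids with no matching entity are skipped, like the null check in the loop version.
    // An id that appears twice in rankedIds yields its entity twice, as the loop version would.
    public static List<TEntity> InRankOrder<TEntity, TId>(
        IEnumerable<TId> rankedIds,
        IEnumerable<TEntity> loaded,
        Func<TEntity, TId> getId)
    {
        // Assumes ids are unique among the loaded entities; ToDictionary throws otherwise.
        var byId = loaded.ToDictionary(getId);

        var ordered = new List<TEntity>();
        foreach (var id in rankedIds)
        {
            if (byId.TryGetValue(id, out var entity))
                ordered.Add(entity);
        }
        return ordered;
    }
}
```

This only restores the order. It does nothing about the other differences raised in this thread, such as the caches or the class of each hit.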

Ayende Rahien


Yep, that is a big change in behavior.

Ayende Rahien


1 isn't true; DoLuceneSearch doesn't hit the DB.

Howard Pinsley

Expanding on what Johannes mentioned, it seems like you can actually get a non-matching item of type T that exists with an id that was returned by the Lucene search for an entirely different class?

Johannes Gustafsson

I guess one solution could be something like this?

public IList<T> Search<T>(string query)
{
    return session.CreateCriteria(typeof(T))
        .Add(Restrictions.In("id", DoLuceneSearch(query)
            .Where(x => typeof(T).IsAssignableFrom(Assembly.GetExecutingAssembly().GetType(x.className)))
            .Select(x=>x.Id)))
        .List<T>();
}

Chris Smith

I can see a big stinking NullReferenceException about to happen though :)


It's important to at least have the option to use the IN query version, especially if the query should be decorated from other sources, such as Rhino.Security.

Pawel Lesnikowski

As far as I remember, IN queries have limits, at least in Oracle.
