Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

@

Posts: 5,947 | Comments: 44,539

filter by tags archive

Find the differences: The optimization that changed behavior


I was thinking about ways of optimizing NHibernate Search’ behavior, and I run into a bug in my proposed solution. It is an interesting one, so I thought that would be make a good post.

Right now NHibernate Search behavior is similar to this:

public IList<T> Search<T>(string query)
{
var results = new List<T>();
foreach(var idAndClass in DoLuceneSearch(query))
{
var result = (T)session.Get(idAndClass.ClassName, idAndClass.Id);
if(result != null)
results.Add(result);
}
return results;
}

This isn’t the actual code, but it shows how NHibernate works. It also shows the problem that I thought about fixing. The way it is implemented now, NHibernate Search will create a SELECT N+1 query.

Now, the optimization is simply:

public IList<T> Search<T>(string query)
{
return session
.CreateCriteria<T>()
.Add(Restrictions.In("id", DoLuceneSearch(query).Select(x=>x.Id)))
.List();
}

There are at least two major differences between the behavior of the two versions, can you find them?


Comments

configurator

Is it possible that DoLuceneSearch can return the same row multiple times?

Also, this would do this in a different sort order if I'm not mistaken. (That is, if sort order in even an issue at that level which I assume it is)

Note: I don't use NHibernate, and I have no idea what Lucene is, so this is nothing more than an educated guess.

Dmitriy Nagirnyak

I think 3 major differences are:

  1. The 2nd case executes (in theory) 2 SQL queries. One to return Class+Id, the 2nd to return the actual list (probably using SELECT... FROM... WHERE id in (...). With all the pros/cons.

  2. The 2nd case ignores the NH 2nd level cache.

  3. The 2nd approach ignores the class thus you cannot use search with inheritance.

Dmitry

The query in the second example would always hit the database but it would only be done once. You might hit a 2100 parameter limit if it uses the IN clause.

The first example can take advantage of the identity map.

Markus Zywitza

1) Original code loads proxies, the optimized code loads the full objects.

2) The optimized code loads only T while the original code loads T and subclasses of T

Bunter

Will the second one fail if no rows are returned by lucenesearch?

Johannes Gustafsson

The first query uses idAndClass.className to get the entity. If it is possible for idAndClass.className to be something else than T, then the second query could return an entirely different entity.

On the other hand, the first query would throw in this case.

Mogens Heller Grabe

One difference is that the first version will take advantage of the 1st level cache and the 2nd level entity cache if it is enabled.

The second version will always go to the database.

Ayende Rahien

Configurator,

Sort order is one such problem, yes.

Ayende Rahien

Johannes,

Yep, that is a big change in behavior.

Ayende Rahien

Dmitriy,

1 isn't true, the DoLuceneQuery doesn't hit the DB.

Howard Pinsley

Expanding on what Johannes mentioned, it seems like you can actually get an non-matching item of type T that exits with an id that was returned by the Lucene search for an entirely different class?

Johannes Gustafsson

I Guess one solution could be some kind of this?

public IList <t Search <t(string query)

{

return session

    .CreateCriteria

<t()

    .Add(Restrictions.In("id", DoLuceneSearch(query).Where(x => typeof(T).IsAssignableFrom(Assembly.GetExecutingAssembly().GetType(x.className))).Select(x=>x.Id)))

    .List();

}

Chris Smith

I can see a big stinking NullReferenceException about to happen though :)

gunteman

It's important to at least have the option to use the IN query version, especially if the query should be decorated from other sources, such as Rhino.Security

Pawel Lesnikowski

As far as I remember IN query has limits at least in Oracle database.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. RavenDB Sharding (2):
    21 May 2015 - Adding a new shard to an existing cluster, the easy way
  2. The RavenDB Comic Strip (2):
    20 May 2015 - Part II – a team in trouble!
  3. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  4. Interview question (2):
    30 Mar 2015 - fix the index
  5. Excerpts from the RavenDB Performance team report (20):
    20 Feb 2015 - Optimizing Compare – The circle of life (a post-mortem)
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats