Find the differences: The optimization that changed behavior


I was thinking about ways of optimizing NHibernate Search’s behavior, and I ran into a bug in my proposed solution. It is an interesting one, so I thought it would make a good post.

Right now, NHibernate Search’s behavior is similar to this:

public IList<T> Search<T>(string query)
{
    var results = new List<T>();
    foreach(var idAndClass in DoLuceneSearch(query))
    {
        var result = (T)session.Get(idAndClass.ClassName, idAndClass.Id);
        if(result != null)
            results.Add(result);
    }
    return results;
}

This isn’t the actual code, but it shows how NHibernate Search works. It also shows the problem that I thought about fixing. The way it is implemented now, NHibernate Search issues one Get call per search hit, which can turn into a SELECT N+1.
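To make the cost concrete, here is a minimal, self-contained sketch of that loop. It does not use NHibernate at all: `FakeSession`, its in-memory table, and this `DoLuceneSearch` are hypothetical stand-ins, and the sketch assumes that every `Get` call has to go to the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a session; it only counts simulated round trips.
public class FakeSession
{
    private readonly Dictionary<int, string> table = new Dictionary<int, string>
    {
        { 1, "first" }, { 2, "second" }, { 3, "third" }
    };

    public int QueryCount { get; private set; }

    // Simulates loading a single row by id: one round trip per call.
    public string Get(int id)
    {
        QueryCount++;
        string row;
        return table.TryGetValue(id, out row) ? row : null;
    }
}

public static class Program
{
    // Hypothetical stand-in for the Lucene search: returns ids only, no database access.
    public static IEnumerable<int> DoLuceneSearch(string query)
    {
        return new[] { 3, 1, 2 };
    }

    // Same shape as the loop above: one Get per search hit.
    public static List<string> LoadAll(FakeSession session, string query)
    {
        var results = new List<string>();
        foreach (var id in DoLuceneSearch(query))
        {
            var result = session.Get(id);
            if (result != null)
                results.Add(result);
        }
        return results;
    }

    public static void Main()
    {
        var session = new FakeSession();
        var results = LoadAll(session, "anything");
        // prints "hits: 3, round trips: 3"
        Console.WriteLine("hits: {0}, round trips: {1}", results.Count, session.QueryCount);
    }
}
```

With three hits the sketch makes three simulated round trips, and that number grows linearly with the size of the result set.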

Now, the optimization is simply:

public IList<T> Search<T>(string query) where T : class
{
    return session
        .CreateCriteria<T>()
        .Add(Restrictions.In("id", DoLuceneSearch(query).Select(x => x.Id).ToArray()))
        .List<T>();
}

There are at least two major differences between the behavior of the two versions. Can you find them?


Comments

16 Sep 2009
04:34 AM

configurator

Is it possible that DoLuceneSearch can return the same row multiple times?

Also, this would return the results in a different sort order, if I'm not mistaken. (That is, if sort order is even an issue at that level, which I assume it is.)

Note: I don't use NHibernate, and I have no idea what Lucene is, so this is nothing more than an educated guess.

16 Sep 2009
04:49 AM

Dmitriy Nagirnyak

I think 3 major differences are:

1) The 2nd case executes (in theory) 2 SQL queries: one to return Class+Id, and a second to return the actual list (probably using SELECT... FROM... WHERE id IN (...)), with all the pros and cons.
2) The 2nd case ignores the NH 2nd level cache.
3) The 2nd approach ignores the class, so you cannot use search with inheritance.

16 Sep 2009
05:00 AM

Dmitry

The query in the second example would always hit the database but it would only be done once. You might hit a 2100 parameter limit if it uses the IN clause.

The first example can take advantage of the identity map.

16 Sep 2009
05:29 AM

Markus Zywitza

1) Original code loads proxies, the optimized code loads the full objects.

2) The optimized code loads only T while the original code loads T and subclasses of T

16 Sep 2009
05:48 AM

Bunter

Will the second one fail if no rows are returned by lucenesearch?

16 Sep 2009
06:01 AM

Johannes Gustafsson

The first query uses idAndClass.ClassName to get the entity. If it is possible for idAndClass.ClassName to be something other than T, then the second query could return an entirely different entity.

On the other hand, the first query would throw in this case.

16 Sep 2009
06:04 AM

Mogens Heller Grabe

One difference is that the first version will take advantage of the 1st level cache and the 2nd level entity cache if it is enabled.

The second version will always go to the database.

16 Sep 2009
09:00 AM

Ayende Rahien

Configurator,

Sort order is one such problem, yes.

16 Sep 2009
09:02 AM

Ayende Rahien

Johannes,

Yep, that is a big change in behavior.

16 Sep 2009
09:03 AM

Ayende Rahien

Mogens,

Yep :-)

16 Sep 2009
09:11 AM

Ayende Rahien

Dmitriy,

Point 1 isn't true; DoLuceneSearch doesn't hit the DB.

16 Sep 2009
15:34 PM

Howard Pinsley

Expanding on what Johannes mentioned, it seems like you can actually get a non-matching item of type T that happens to exist with an id that the Lucene search returned for an entirely different class?

17 Sep 2009
06:51 AM

Johannes Gustafsson

I guess one solution could be something like this?

public IList<T> Search<T>(string query)
{
    return session
        .CreateCriteria<T>()
        .Add(Restrictions.In("id", DoLuceneSearch(query)
            .Where(x => typeof(T).IsAssignableFrom(Assembly.GetExecutingAssembly().GetType(x.className)))
            .Select(x => x.Id)))
        .List();
}

17 Sep 2009
10:54 AM

Chris Smith

I can see a big stinking NullReferenceException about to happen though :)

17 Sep 2009
11:09 AM

Paul Hatcher

Is this still true? I applied a patch (see http://nhjira.koah.net/browse/NHSR-17) that addressed at least one use case of this.

20 Sep 2009
12:48 PM

gunteman

It's important to at least have the option to use the IN query version, especially if the query should be decorated from other sources, such as Rhino.Security.

21 Sep 2009
06:20 AM

Pawel Lesnikowski

As far as I remember, the IN query has limits, at least in Oracle.


Oren Eini

CEO of RavenDB