Combating the Select N + 1 Problem In NHibernate

time to read 7 min | 1278 words

Using an O/RM can greatly simplify your life, but is has its on set of Gotcha that you need to be aware of. One of the more serious ones is the Select N + 1 issue. To describe the issue, let's look at a typical example. Users א M:M א Blogs א 1:M א Posts א 1:M א Comments. Here is the diagram:

I hope that both my notation above and diagram makes it clear what the relations between the objects are (isn't class designer cool)? As an aside, the above model is from NHibernate.Generics tests. All collections are lazy.

I want to show the user all the comments from all the posts, so they can delete all the nasty comments. The naןve implementation would be something like:

foreach (Post post in blog.Posts)

{

    foreach (Comment comment in post.Comments)

    {

        //print comment...

    }

}

The posts collection has 80 posts in it. When we access it, we load all the posts (1 select to grab all the posts), now, when we access each post's Comments' collection, we need to issue a new select to get the data (1 select to grab all the comments for this post). In my testing, I got 81 selects from this seemingly innocent piece of code. I'm not sure about you, but I'm usually motivated to reduce the number of external queries. To put it another way, for every iteration of the external loop above, we have a database query. And that is in order to print a fairly simple UI, and this is a fairly common task.

There are several ways to solve the issue. The easiest is to just ask NHibernate to give you it all in a single query, like this:

IList posts = session.CreateQuery(

        @"from Post post left join fetch

        post.Comments where post.Blog = :blog")

    .SetEntity("blog",blog)

    .List();

foreach (Post post in posts)

{

    foreach (Comment comment in post.Comments)

    {

        //print comment...

    }

}

This code is a little more explicit in what it does; it tells NHibernate "give me all the posts for this blog, and make sure to fetch all their comments as well, since I'm going to use this". The result of this query is a single SQL Statement to load all the data in one go.

 

Fetch Join vs. Join

One important thing that you should be aware of is the difference between join fetch and just join. NHibernate can use multiply joins in a single statement, but can fetch only a single collection – it can fetch several properties {many to one or one to one} – in a single query. Fetching means that NHibernate will load the collection and put it in its proper place (for instance, load all the comments for this post and put them in the collection).

Normal joining means that NHibernate will return the correct objects, but it will not be able to associate them to their proper place in the objects without going to the database again.

To clarify, the result of this query:

IList  blogs = session.CreateQuery(

        @"from Blog blog left join fetch

        blog.Posts left join blog.Users

        where blog.id = 1")

    .List();

It is not a Blog object with all its collections already filled. It is a list of tuples { { blog1, user1} , { blog1, user2}, { blog1, user3},{ blog1, user1} }, etc. The list in my case has a length of 567, and contains many duplicate items. The reason it contains duplicate items is that NHibernate doesn't attempt to uniqueify (is that a word?) the result from the database, and because the database returned  567 (for all permutations of a post, blog and user), this is what you get. NHibernate is smart enough to return the same instances, so it is not a problem to do the filtering yourself.

However, if we will try to access the Users' collection of the blog that was returned, we will still get a query to the database to get the collection. NHibernate is smart enough to figure out that it already has the data itself in memory, so it will just ask for the identifiers and nothing more.

 

There are other ways to get the same result, we can use the Criteria API (my favorite), to do the same:

session.CreateCriteria(typeof(Post))

    .SetFetchMode("Comments", FetchMode.Eager)

    .Add(Expression.Eq("Blog", blog))

    .List();

This is one of the more powerful features on NHibernate, since it allows you a fine grained control of the loading strategy, based on your scenario. A fairly simple change turned an 80+1 queries (for a small example. If I was using my own blog for example, I would need ~1300+1 queries to get the data!!) to a single query.

And I can do this on a case by case basis! That is truly powerful.  

The other ways that I mentioned are no as fined grained, and involve changing the mapping for the specified association(s). This is good if you need to change the behavior for all the application in one stroke. Personally, I can't think of hand on many cases where I would need that, but it is good to have the option for when I would need it.