Ayende @ Rahien

Refunds available at head office

Performance: Multiply Collections Fetch With NHibernate

We are back to the traditional model of Blog -> Posts -> Comments. Now, we want to load all the blogs, posts and comments and index them to improve search performance. A naive implemention will use this code:

foreach (Blog blog in session.CreateQuery("from Blog").List<Blog>())

{

       Index(blog,"Blog");

       foreach (Post post in blog.Posts)

       {

              Index(post, "Post");

              foreach (Comment comment in post.Comments)

              {

                     Index(comment, "Comments");

              }

       }

}

This code is going to produce an ungodly amount of database queries. We want to grab the data from the database in as few queries as possible. Obviously we want to use some sort of eager loading, but what can we use? Let us analyze what are our options for solving this.

The easiest route is to tell NHibernate to load the entire object graph in one go. For a long time, NHibernate had a limitation of a single collection eager load per query. This means that you couldn't load both the Posts collection and the Comments collection. Sergey has recently removed this limitation, so let us replacing the HQL query with:

from Blog blog

     left join fetch blog.Posts post

     left join fetch post.Comments

This query results in this SQL (I cleaned up the names a bit):

select  blog.Id ,                                    post.BlogId ,

        post.Id ,                                    post.UserId ,

        comment.Id ,                                 comment.Name ,

        blog.Title ,                                 comment.Email ,

        blog.Subtitle ,                              comment.HomePage ,

        blog.AllowsComments ,                        comment.Ip ,

        blog.CreatedAt ,                             comment.Text ,

        post.Title ,                                 comment.PostId ,

        post.Text ,                                  post.BlogId ,

        post.PostedAt ,                              post.Id ,

        comment.PostId ,                             comment.Id

from    Blogs blog

        left outer join Posts post on blog.Id = post.BlogId

        left outer join Comments comment on post.Id = comment.PostId

A word of warning, though:

There is a non trivial cartesian product here. If you know that you are working on small sets, that is fine, but be wary of using this technique over large sets of data.

Comments

Dan
03/19/2007 07:48 AM by
Dan

Wouldn't the following be just as efficient, without the chance of getting a huge Cartesian product?

IList list = (IList)session.CreateMutliQuery()

.Add("from Blog")

.Add("select p from Post p left join p.Blog b")

.Add("select c from Comment c left join c.Post p left join p.Blog b")

.List()[0];

foreach (Blog blog in list)

...

I'm a bit uncertain if I used the right syntax, I mostly use the query API, and the point with the joins is so you can add a where statement to the blog, and simply add the same statement in the other 2 queries to avoid pulling out other posts/comments.

Roger
03/19/2007 02:07 PM by
Roger

I've been waiting for this feature a long time! Perfect!

Can you please inform where to find this update?

Comments have been closed on this topic.