Combating the Select N + 1 Problem In NHibernate
Using
an O/RM can greatly simplify your life, but is has its on set of Gotcha that
you need to be aware of. One of the more serious ones is the Select N + 1
issue. To describe the issue, let's look at a typical example. Users א M:M א Blogs א 1:M א Posts א 1:M א Comments. Here
is the diagram:
I
hope that both my notation above and diagram makes it clear what the relations
between the objects are (isn't class designer cool)?
As an aside, the above model is from NHibernate.Generics
tests. All collections are lazy.
I
want to show the user all the comments from all the posts, so they can delete
all the nasty comments. The naןve implementation would be something like:
foreach
(Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
The
posts collection has 80 posts in it. When we access it, we load all the posts
(1 select to grab all the posts), now, when we access each post's Comments'
collection, we need to issue a new select to get the data (1 select to grab all
the comments for this post). In my testing, I got 81 selects from this
seemingly innocent piece of code. I'm not sure about you, but I'm usually
motivated to reduce the number of external queries. To put it another way, for every
iteration of the external loop above, we have a database query. And that is in order
to print a fairly simple UI, and this is a fairly common task.
There
are several ways to solve the issue. The easiest is to just ask NHibernate to
give you it all in a single query, like this:
IList
posts = session.CreateQuery(
@"from Post post left join fetch
post.Comments where post.Blog = :blog")
.SetEntity("blog",blog)
.List();
foreach
(Post post in posts)
{
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
This
code is a little more explicit in what it does; it tells NHibernate "give
me all the posts for this blog, and make sure to fetch all their comments as
well, since I'm going to use this". The result of this query is a single SQL Statement to load all the data
in one go.
Fetch Join vs. Join
One
important thing that you should be aware of is the difference between join fetch and just join. NHibernate can use multiply joins in
a single statement, but can fetch
only a single collection – it
can fetch several properties {many to one or one to one} – in a single
query. Fetching means that NHibernate will load the collection and put it in
its proper place (for instance, load all the comments for this post and put
them in the collection).
Normal
joining means that NHibernate will return the correct objects, but it will not
be able to associate them to their proper place in the objects without going to
the database again.
To
clarify, the result of this query:
IList
blogs = session.CreateQuery(
@"from Blog blog left join fetch
blog.Posts left join blog.Users
where
blog.id = 1")
.List();
It is
not a Blog object with all its collections already filled. It is a list of
tuples { { blog1, user1} , { blog1, user2}, { blog1, user3},{ blog1, user1} },
etc. The list in my case has a length of 567, and contains many duplicate
items. The reason it contains duplicate items is that NHibernate doesn't attempt
to uniqueify (is that a word?) the result from the database, and because the
database returned 567 (for all permutations of a post, blog and user), this is
what you get. NHibernate is smart
enough to return the same instances,
so it is not a problem to do the filtering yourself.
However,
if we will try to access the Users' collection of the blog that was returned,
we will still get a query to the
database to get the collection. NHibernate is smart enough to figure out that
it already has the data itself in memory, so it will just ask for the identifiers
and nothing more.
There
are other ways to get the same result, we can use the Criteria API (my favorite),
to do the same:
session.CreateCriteria(typeof(Post))
.SetFetchMode("Comments", FetchMode.Eager)
.Add(Expression.Eq("Blog", blog))
.List();
This
is one of the more powerful features on NHibernate, since it allows you a fine
grained control of the loading strategy, based on your scenario. A fairly
simple change turned an 80+1 queries (for a small example. If I was using my
own blog for example, I would need ~1300+1 queries to get the data!!) to a
single query.
And I
can do this on a case by case basis! That is truly powerful.
The other
ways that I mentioned are no as fined grained, and involve changing the mapping
for the specified association(s). This is good if you need to change the behavior
for all the application in one stroke. Personally, I can't think of hand on
many cases where I would need that, but it is good to have the option for when
I would need it.
Comments
nkxyoowp
Comment preview