Ayende @ Rahien

It's a girl

NH Prof Alerts: Select N + 1

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

Select N+1 is a data access anti pattern, in which we are accessing the database in one of the least optimal ways. Let us take a look at a code sample, and then discuss what is going on. I want to show the user all the comments from all the posts, so they can delete all the nasty comments. The naןve implementation would be something like:

// SELECT * FROM Posts
foreach (Post post in session.CreateQuery("from Post").List()) 
{
     //lazy loading of comments list causes: SELECT * FROM Comments where PostId = @p0
    foreach (Comment comment in post.Comments) 
    {
        //do something with comment
    }
}


In this example, we can see that we are loading a list of posts ( the first select ) and then traversing the object graph. However, we access the lazily loaded collection, causing NHibernate to go to the database and bring the results one row at a time. This is incredibly inefficient, and the NHibernate Profiler will generate a warning whenever it encounters such a case. The solution for this example is simple, we simple force an eager load of the collection up front.

Using HQL:

var posts = session
	.CreateQuery("from Post p left join fetch p.Comments")
	.List();

Using the criteria API:
session.CreateCriteria(typeof(Post)) 
	.SetFetchMode("Comments", FetchMode.Eager) 
	.List();


In both cases, we will get a join and only a single query to the database. Note, this is the classic appearance of the problem, it can also surface in other scenarios, such as calling the database in a loop, or more complex object graph traversals. In those cases, it it generally much harder to see what is causing the issue.

NHibernate Profiler will detect those scenarios as well, and give you the exact line in the source code that cause this SQL to be generated. Another option for solving this issue is: MutliQuery and MultiCriteria, which are also used to solve the issue of Too Many Queries.

Comments

Steve Bohlen
12/28/2008 08:27 PM by
Steve Bohlen

Select N+1 is a data access anti pattern, in which we are accessing the database in the least optimal way.

I think this should say 'non-optiimal' or 'sub-optimal' as I'm not 100% certain that I cannot think of an even less-optimal way to access the DB :D

which are also used to solved the issue of Too Many Queries.

Minor point, but I think '...solve the issue of...' is what's meant here (recognizing that english isn't your first language and its my ONLY (non-programming) language, so offering as helpful input, not criticism).

Looking (really) forward to this becoming a product I can buy~!

anonymous coward
12/28/2008 09:57 PM by
anonymous coward

I believe a more common name for "Select N+1" is "N+1 selects" antipattern, I don't think there's anything bad in selecting a number N+1 from a database :)

Ayende Rahien
12/29/2008 05:57 AM by
Ayende Rahien

Steve,

Thanks, I updated the docs & the post.

alberto
12/29/2008 11:04 AM by
alberto

Indirectly related, I'm not so sure how people deal with this cases when you have a separate DAL (i.e Repositories), maybe a hint would be useful for your users too.

Ayende Rahien
12/29/2008 12:37 PM by
Ayende Rahien

Alberto,

I don't follow you, can you explain further?

alberto
12/30/2008 01:15 PM by
alberto

If you have a PostRepository and a GetAll() method that lazy loads Comments. Now you want run into this problem for some scenario. What do you do?

Ayende Rahien
12/30/2008 02:35 PM by
Ayende Rahien

Alberto,

You have GetAllWithComments(), or something like that

alberto
12/30/2008 04:19 PM by
alberto

My concerns with that solution are:

1) Does it really work in more complex, real world apps? Seems like it would lead to method explosion in the repository, polluting the API.

2) It doesn't seem a big difference from a PI point of view to the (openly criticized) "explicit lazy loading" in EF.

Ayende Rahien
12/30/2008 04:27 PM by
Ayende Rahien

1) Yes, it does. In practice, there are only a few ways of processing the graph.

2) There is a big difference, the only code that needs to know about this is the persistence code, not all the rest of the code, as is the case with EF.

alberto
12/30/2008 05:15 PM by
alberto

1) Well, if you have N lazy loaded relations there are 2^N possible combinations (hence the method explosion) but you are probably right that in practice YAGNI all those.

2) But the consumers of the repository must now know which of the methods to call. Maybe I'm worrying to much, as it should be obvious (while "1" remains true) depending on what they want.

Thanks for the insight!

Henning
01/03/2009 01:57 PM by
Henning

I might be mistaken, but if I do an eager load, doesn't that like join posts and comments? So when I have like a "bundle" of comments per post, I redundantly retrieve the post-data from the db :(

I would rather like two calls to the DB:

  1. all the posts

  2. get all the comments where the post-id is in the list of post-ids of the posts

This would reduce the DB-traffic as much as possible. But how would I do that painless with NH?

If I would have a methode like "GetAllWithComments()" this could first get all posts, and then issue a second query with a "in"-restriction or something. Or is there a better way?

Ayende Rahien
01/03/2009 08:31 PM by
Ayende Rahien

Henning,

There are many options for eager loading with NH.

In this case, you are talking about the sub select option.

Henning
01/06/2009 09:35 PM by
Henning

maybe I'm missing your point :) what are you trying to tell me by mentioning the "sub select option" - I'm not sure.

I did a quick test: I have two orders, one with one orderitem, the otherone with two orderitems.

When I do the following:

DetachedCriteria criteria = DetachedCriteria.For(typeof (Order))

            .SetFetchMode("OrderItems", FetchMode.Select);


        IList

<order orders = criteria.GetExecutableCriteria(_sessionFactory.OpenSession()).List <order();

I get three SQL-statements (N+1 select!).

When I do the following instead:

DetachedCriteria criteria = DetachedCriteria.For(typeof (Order))

            .SetFetchMode("OrderItems", FetchMode.Eager);


        IList

<order orders = criteria.GetExecutableCriteria(_sessionFactory.OpenSession()).List <order();

I only get one statement, but each row contains the redundant order information.

I would like to get two statements: one for all orders and one for all orderitems, belonging to the previously fetched orders (in this case the orders are not restricted for simplicity).

Henning
01/06/2009 09:38 PM by
Henning

... ahh; does it matter that I'm still using NH 1.2? :)

Ayende Rahien
01/07/2009 06:44 AM by
Ayende Rahien

Yes, NH 1.2 doesn't have subselect fetching.

Henning
01/07/2009 11:18 AM by
Henning

OK - so I linked my little playground to the 2.0.1 GA ... so what do I do next :) I mean - how do I this subselect fetching?

Neal Blomfield
01/09/2009 12:13 AM by
Neal Blomfield

Select N+1 also seems to be caused by specifying lazy = false for the collection in the mapping file. I am guessing that NH2 defaults to using FetchMode.Select if you do not explicitly specify a fetch mode (I need to go read the docs / look at the code but commenting while I remember).

Might be worth adding something about this to the alert docs?

Comments have been closed on this topic.