Solving the Select N+1 Problem
Note, I am currently writing the NH Prof documentation, so you are probably going to get quite a bit of NHibernate tidbits over the next few days.
I want to show the user all the comments from all the posts, so they can delete all the nasty comments. The naןve implementation would be something like:
foreach (Post post in session.CreateQuery("from Post").List()) // SELECT * FROM Posts}
{
foreach (Comment comment in post.Comments) //lazy loading of comments list causes: SELECT * FROM Comments where PostId = @p0
{
//print comment...
}
In this example, we can see that we are loading a list of posts ( the first select ) and then traversing the object graph. However, we access the lazily loaded collection, causing NHibernate to go to the database and bring the results one row at a time. This is incredibly inefficient, and the NHibernate Profiler will generate a warning whenever it encounters such a case.
The solution for this example is simple, we simple force an eager load of the collection up front.
Using HQL:
var posts = session.CreateQuery("from Post p left join fetch p.Comments").List();
Using the criteria API:
session.CreateCriteria(typeof(Post))
.SetFetchMode("Comments", FetchMode.Eager)
.List();
In both cases, we will get a join and only a single query to the database.
Note, this is the classic appearance of the problem, it can also surface in other scenarios, such as calling the database in a loop, or more complex object graph traversals. In those cases, it it generally much harder to see what is causing the issue.
NHibernate Profiler will detect those scenarios as well, and give you the exact line in the source code that cause this SQL to be generated.
Another option for solving this issue is: MutliQuery and MultiCriteria, which are also used to solved the issue of Too Many Queries.
Comments
The Select N +1 problem is certainly something commonly used by proponents of stored procedures to dismiss ORMs as being inefficient and suffering from poor performance. Obviously, it takes a careful and somewhat experienced eye with ORMs to recognize when this occurs and how to fix it.
Your NHibernate Profiler should hopefully help mitigate this.
As an aside, any NHibernate tools in the future to help identify potential security holes such as not using named parameters (another favorite target of ORM opponents)?
When accessing elements by index I can understand why an OPF would want to fetch individual elements, but when you are iterating over them using an IEnumerable it is pretty obvious you are going to be accessing them each in turn so why not fetch them in chunks of 50 or something by default?
PS: By "why not fetch them" I meant the class in NHibernate implementing the enumerator.
That join query is still inefficient, as you can introduce a lot of duplicates for Post which you have to wade through when materializing Post instances. With multi-branched graphs, it gets downhill pretty quickly.
It seems that 'SetFetchMode(propertyPath, FetchMode.Eager)' doesn't take into account the 'SubSelect' fetch mode declared in the mapping, which is usually better than outer-joining.
Peter,
It can, you need to configure it, NH in general doesn't try to be too smart in trying to figure out what the best data access strategy is.
That subject is too complex to guess successfully, so you need to tell it how you want it to behave.
See batch="50"
Frans,
For multi branched graphs, you use other options. I am using this to show the classic issue.
Also, you can do eager loading in NH using other options than a join.
Ayende: how does this NH Prof tool work? I mean, does it need to see the app running this code in order to report the warning. If yes, that's a shame because you're doing "lazy loading" of the NH usage of the code in your app :)
I mean, with a compile-time tool you would get all the warnings in a row, without trying to reproduce every bottleneck.
Yes, you need to execute the code in order to get the warning.
Trying to do this with static analysis would be extremely complex.
The piece of code above is completely valid, but you would need to search for all calling code paths and analyze them to try to figure out what is going on here.
Not impossible, but an extremely complex solution
In addition to that, those sort of things are usually not something that you have to work hard to reproduce, they are generally very easy to find.
Well, you would be amazed of the kind of things you can do with Gendarme today (it uses Cecil BTW, now that you mention it), so I'd say it's not so difficult.
In this case you would for IL that resembles a for/foreach construct on a property whose name you have not used in a fetch clause.
Andres,
That is a very simplistic solution.
You would need to scan all the code paths to get to that, and that means that you have to do full system analysis.
You suggestion would create false positive for code path that just created the objects, or would ignore something that does Select N+1 by stepping through the Parent.Parent.Parent relationship
That's why I said "in this case". Ideally you would build the rule meanwhile you come up with more test cases. I just mentioned an initial idea for the first testcase of the rule that would cover the code in this blogspot :)
I'd love to start doing Gendarme rules for NH but don't have too much time for it :(
Ok this is a manual pingback...
I've seen quite a lot of people saying that lazy load is evil. But remember when they were saying that automatic memory management was bad !
For sure I hope one day will come where data fetching will totally transparent, with a minor performance impact.
This day we'll be able to spend our time on more important things.
But today we have to decide between fetch early (performance hit on some scenarii), lazy loads (perf hit on other cases) or no fetch (the data is not available). The 3rd choice can be ok to be Yagni. Fighting 1 vs 2 before reaching a real performance problem is premature optimisation...
Stupid question:
if I have a Blog with a lazy-load collection of Posts.
And do:
Blog b = GetById(1);
foreach(Post p in b.Posts)
...
is that foreach calling each - or since I reference the collection, it gets them all at that point - one call ?
Steve,
That will be two calls.
select blog
select posts where blogid =?
Ok, thank you.
So I guess the best solution is to pass your fetch strategy to your persistence layer wherever you make your calls?
I'm using NHprofiler and I'm getting a SELECT N+1 alert .. but im not sure why.
My object graph is as follows: AuthUsers have SystemUsers, SystemUsers have Permissions with a many-many (SystemUserPermission)
In my mappings, the realtionship is not lazy between either of those entities, and I get the warning. I do see that 4 queries were executed and it looks like it's looping, but my code isn't doing that and it seems to be the result of lazy loading..
SELECT permission0_.SystemUserId as SystemUs1___1_,
FROM Omega.dbo.systemuserpermission permission0_
WHERE permission0_.SystemUserId = 89 /* @p0 */
Any suggestions?
You aren't using lazy load, so it is forcing the load of all the items in sub optimal fashion
Ok, thanks for the reply.
So I should be using lazy load here?
I'm trying to implement your commons library for your unit of work/session in view pattern. When I was using lazy loading I was getting lazy exceptions, so I suppose this was a fix in the wrong direction.
Obviously I'm not the NHibernate expert you are, but I plan to go through everything with your profiler and learn/clean.
Comment preview