Eagerly loading entity associations efficiently with NHibernate
One of the things that seems to pop up frequently is people wanting to load an entity with all of its associations eagerly. That is pretty easy to do when the associations are many-to-one (that is, there is only one of them for each root entity). Example of those would be things like Owner, Site, etc.
Here is an HQL query that would load a blog with its owner and site as well:
from Blog b left join fetch b.Owner left join fetch b.Site
The problem starts when you try to do the same for multiple collection associations. NHibernate allows you to do so, but the result is probably not what you would initially expect. This query, for example:
from Blog b left join fetch b.Posts left join fetch b.Users where b.Id = :id
Will result in the following SQL statement:
select blog0_.Id as Id7_0_, posts1_.Id as Id0_1_, user3_.Id as Id5_2_, blog0_.Title as Title7_0_, blog0_.Subtitle as Subtitle7_0_, blog0_.AllowsComments as AllowsCo4_7_0_, blog0_.CreatedAt as CreatedAt7_0_, posts1_.Title as Title0_1_, posts1_.Text as Text0_1_, posts1_.PostedAt as PostedAt0_1_, posts1_.BlogId as BlogId0_1_, posts1_.UserId as UserId0_1_, posts1_.BlogId as BlogId0__, posts1_.Id as Id0__, user3_.Password as Password5_2_, user3_.Username as Username5_2_, user3_.Email as Email5_2_, user3_.CreatedAt as CreatedAt5_2_, user3_.Bio as Bio5_2_, users2_.BlogId as BlogId1__, users2_.UserId as UserId1__ from Blogs blog0_ left outer join Posts posts1_ on blog0_.Id = posts1_.BlogId left outer join UsersBlogs users2_ on blog0_.Id = users2_.BlogId left outer join Users user3_ on users2_.UserId = user3_.Id where blog0_.Id = 1 /* @p0 */
Something that may not be apparent immediately is going to result in a Cartesian product. This is pointed out in the documentation, but I think that we can all agree that while there may be reasons for this behavior, it is far from ideal.
Let us look at what other OR/Ms are doing, shall we?
The comparable query using Entity Framework would look like this:
db.Blogs .Include("Posts") .Include("Users") .Where(x => x.Id == i) .ToArray();
And the resulting SQL would be:
SELECT [UnionAll1].[Id] AS [C1], [UnionAll1].[Title] AS [C2], [UnionAll1].[Subtitle] AS [C3], [UnionAll1].[AllowsComments] AS [C4], [UnionAll1].[CreatedAt] AS [C5], [UnionAll1].[C2] AS [C6], [UnionAll1].[C1] AS [C7], [UnionAll1].[C3] AS [C8], [UnionAll1].[Id1] AS [C9], [UnionAll1].[Title1] AS [C10], [UnionAll1].[Text] AS [C11], [UnionAll1].[PostedAt] AS [C12], [UnionAll1].[BlogId] AS [C13], [UnionAll1].[UserId] AS [C14], [UnionAll1].[C4] AS [C15], [UnionAll1].[C5] AS [C16], [UnionAll1].[C6] AS [C17], [UnionAll1].[C7] AS [C18], [UnionAll1].[C8] AS [C19], [UnionAll1].[C9] AS [C20] FROM (SELECT CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], [Extent1].[Id] AS [Id], [Extent1].[Title] AS [Title], [Extent1].[Subtitle] AS [Subtitle], [Extent1].[AllowsComments] AS [AllowsComments], [Extent1].[CreatedAt] AS [CreatedAt], 1 AS [C2], CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3], [Extent2].[Id] AS [Id1], [Extent2].[Title] AS [Title1], [Extent2].[Text] AS [Text], [Extent2].[PostedAt] AS [PostedAt], [Extent2].[BlogId] AS [BlogId], [Extent2].[UserId] AS [UserId], CAST(NULL AS int) AS [C4], CAST(NULL AS varbinary(1)) AS [C5], CAST(NULL AS varchar(1)) AS [C6], CAST(NULL AS varchar(1)) AS [C7], CAST(NULL AS datetime) AS [C8], CAST(NULL AS varchar(1)) AS [C9] FROM [dbo].[Blogs] AS [Extent1] LEFT OUTER JOIN [dbo].[Posts] AS [Extent2] ON [Extent1].[Id] = [Extent2].[BlogId] WHERE [Extent1].[Id] = 1 /* @p__linq__1 */ UNION ALL SELECT 2 AS [C1], [Extent3].[Id] AS [Id], [Extent3].[Title] AS [Title], [Extent3].[Subtitle] AS [Subtitle], [Extent3].[AllowsComments] AS [AllowsComments], [Extent3].[CreatedAt] AS [CreatedAt], 1 AS [C2], CAST(NULL AS int) AS [C3], CAST(NULL AS int) AS [C4], CAST(NULL AS varchar(1)) AS [C5], CAST(NULL AS varchar(1)) AS [C6], CAST(NULL AS datetime) AS [C7], CAST(NULL AS int) AS [C8], CAST(NULL AS int) AS [C9], [Join2].[Id] AS [Id1], [Join2].[Password] AS [Password], [Join2].[Username] AS [Username], [Join2].[Email] AS [Email], [Join2].[CreatedAt] AS [CreatedAt1], [Join2].[Bio] AS [Bio] FROM [dbo].[Blogs] AS [Extent3] INNER JOIN (SELECT [Extent4].[UserId] AS [UserId], [Extent4].[BlogId] AS [BlogId], [Extent5].[Id] AS [Id], [Extent5].[Password] AS [Password], [Extent5].[Username] AS [Username], [Extent5].[Email] AS [Email], [Extent5].[CreatedAt] AS [CreatedAt], [Extent5].[Bio] AS [Bio] FROM [dbo].[UsersBlogs] AS [Extent4] INNER JOIN [dbo].[Users] AS [Extent5] ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2] ON [Extent3].[Id] = [Join2].[BlogId] WHERE [Extent3].[Id] = 1 /* @p__linq__1 */) AS [UnionAll1] ORDER BY [UnionAll1].[Id] ASC, [UnionAll1].[C1] ASC
At this point, I am pretty sure, your eyes shut down in self defense. This is one complex query. But, basically, this is a complex query because EF is executing the following two queries and unioning them.
Eager load Blog Posts:
SELECT CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], [Extent1].[Id] AS [Id], [Extent1].[Title] AS [Title], [Extent1].[Subtitle] AS [Subtitle], [Extent1].[AllowsComments] AS [AllowsComments], [Extent1].[CreatedAt] AS [CreatedAt], 1 AS [C2], CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3], [Extent2].[Id] AS [Id1], [Extent2].[Title] AS [Title1], [Extent2].[Text] AS [Text], [Extent2].[PostedAt] AS [PostedAt], [Extent2].[BlogId] AS [BlogId], [Extent2].[UserId] AS [UserId], CAST(NULL AS int) AS [C4], CAST(NULL AS varbinary(1)) AS [C5], CAST(NULL AS varchar(1)) AS [C6], CAST(NULL AS varchar(1)) AS [C7], CAST(NULL AS datetime) AS [C8], CAST(NULL AS varchar(1)) AS [C9] FROM [dbo].[Blogs] AS [Extent1] LEFT OUTER JOIN [dbo].[Posts] AS [Extent2] ON [Extent1].[Id] = [Extent2].[BlogId] WHERE [Extent1].[Id] = 1 /* @p__linq__1 */
Eager load Blog Users:
SELECT 2 AS [C1], [Extent3].[Id] AS [Id], [Extent3].[Title] AS [Title], [Extent3].[Subtitle] AS [Subtitle], [Extent3].[AllowsComments] AS [AllowsComments], [Extent3].[CreatedAt] AS [CreatedAt], 1 AS [C2], CAST(NULL AS int) AS [C3], CAST(NULL AS int) AS [C4], CAST(NULL AS varchar(1)) AS [C5], CAST(NULL AS varchar(1)) AS [C6], CAST(NULL AS datetime) AS [C7], CAST(NULL AS int) AS [C8], CAST(NULL AS int) AS [C9], [Join2].[Id] AS [Id1], [Join2].[Password] AS [Password], [Join2].[Username] AS [Username], [Join2].[Email] AS [Email], [Join2].[CreatedAt] AS [CreatedAt1], [Join2].[Bio] AS [Bio] FROM [dbo].[Blogs] AS [Extent3] INNER JOIN (SELECT [Extent4].[UserId] AS [UserId], [Extent4].[BlogId] AS [BlogId], [Extent5].[Id] AS [Id], [Extent5].[Password] AS [Password], [Extent5].[Username] AS [Username], [Extent5].[Email] AS [Email], [Extent5].[CreatedAt] AS [CreatedAt], [Extent5].[Bio] AS [Bio] FROM [dbo].[UsersBlogs] AS [Extent4] INNER JOIN [dbo].[Users] AS [Extent5] ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2] ON [Extent3].[Id] = [Join2].[BlogId] WHERE [Extent3].[Id] = 1 /* @p__linq__1 */)
The query may be complex, but it get the job done and does so without bothering the us much. The question is, can we do the same with NHibernate?
As it run out, we can, pretty easily too. The following code will do just that:
var blogs = s.CreateQuery("from Blog b left join fetch b.Posts where b.Id = :id") .SetParameter("id", 1) .Future<Blog>(); s.CreateQuery("from Blog b left join fetch b.Users where b.Id = :id1") .SetParameter("id1", 1) .Future<Blog>();
So, what is going on here? We are actually issuing two queries, each of them to eagerly load a single collection. The trick is that we are using future queries to do so. That means that the query that is going to be sent to the database to get those results is:
select blog0_.Id as Id7_0_, posts1_.Id as Id0_1_, blog0_.Title as Title7_0_, blog0_.Subtitle as Subtitle7_0_, blog0_.AllowsComments as AllowsCo4_7_0_, blog0_.CreatedAt as CreatedAt7_0_, posts1_.Title as Title0_1_, posts1_.Text as Text0_1_, posts1_.PostedAt as PostedAt0_1_, posts1_.BlogId as BlogId0_1_, posts1_.UserId as UserId0_1_, posts1_.BlogId as BlogId0__, posts1_.Id as Id0__ from Blogs blog0_ left outer join Posts posts1_ on blog0_.Id = posts1_.BlogId where blog0_.Id = 1 /* @p0 */ select blog0_.Id as Id7_0_, user2_.Id as Id5_1_, blog0_.Title as Title7_0_, blog0_.Subtitle as Subtitle7_0_, blog0_.AllowsComments as AllowsCo4_7_0_, blog0_.CreatedAt as CreatedAt7_0_, user2_.Password as Password5_1_, user2_.Username as Username5_1_, user2_.Email as Email5_1_, user2_.CreatedAt as CreatedAt5_1_, user2_.Bio as Bio5_1_, users1_.BlogId as BlogId0__, users1_.UserId as UserId0__ from Blogs blog0_ left outer join UsersBlogs users1_ on blog0_.Id = users1_.BlogId left outer join Users user2_ on users1_.UserId = user2_.Id where blog0_.Id = 1 /* @p1 */
Note that we are essentially discarding the results of the second query. The reason for that is that we aren’t actually interested in those results, we execute this query solely to get NHibernate to fill the Users’ collection of the relevant entity.
I generally use this method so I would have the first query to eager load all the many-to-one associations, and then a series of queries (one per required loaded collection) to load one-to-many or many-to-many associations.
Comments
You are scarily relevant sometimes, are you reading my code?
"the first query to eager load all the one-to-many associations, and then a series of queries (one per required loaded collection) to load many-to-one or many-to-many associations."
Didn't you mean the opposite?
First query to load many-to-one, the rest to load one-to-many and many-to-many (collections).
I recently hit this problem when stream processing a gigantic table with a few "child"-tables like in your example. Left join did a poor job while the union all pattern worked very nice. SQL Server is defiantely able to optimize it into a merge join when the subtables are clustered in the right way or have indices.
I would like to request the union-all join for NHibernate as it seems to me that it is superior in every case.
Diego,
Yes, I did mean the opposite, fixed.
excellent job!
the only par missing are the results which we'll get using ANTLR based LINQ provider (that Steve Strong is working on) running the same LINQ query.
Will it use futures as well and thus will "spit" the same SQL? Maybe it's the question for Steve...
Ayende,
You cheated a little. You used HQL to generate your one query which is much more efficient than if you did the same through NHibernate LINQ.
Same is true for EF, if you wanted to compare apples to apples you should have used EFQL.
Nick,
You can do the same using the Criteria API, which is what the release NH Linq provider is based on.
And you can do the same with the trunk Linq provider.
In this specific example EF eager-loading API looks better:
EF uses "Includes" which is an abstract concept, while NH uses "join" which is a DB concept.
Ayende,
and what to do if a have to do the same thing but paged?
Sarong,
SetMaxResults, but that might cause issues with the loading.
At that point, I might want to execute two totally separate queries, first to get just the entities, then to load the related associations.
VERY glad you posted this, as I've been trying to figure out how to load more complex object graphs in NH for quite some time now.
I think Andrey Shchekin has it right though, the API for EF is superior when it comes to situations like this. The ease of use of the Include() method, and the flexibility of what it will interpret makes EF not only easier to use, but perhaps allows for certain queries that NH is not able to construct.
Loading a single Blog, with both the Posts and Users collections populated is fine, but it's neither a large nor an overly complex object graph. EF allows you to put together pretty much anything you want.
var q =
from blog in context.Blogs
.Include("Users.Friends")
.Include("Users.Preferences")
.Include("Comments.Replies")
.Include("Posts.Comments")
where blog.ID == 1
select blog;
That would give you a blog object, with the Users collection already populated, and for each User, their Friends collections populated, each Users Preferences object pre fetched, the root Blogs Comments collection, with each comments Replies collection populated, the root blogs Posts, with each Posts comments, etc. EF will do this in a single query, and will ONLY produce the number of records required to get the data back. The way I've been doing this with NH, I get the data back I want, but the SQL generated does a bunch of flat (no subselects) left joins that give me a giant cartesian product that produces millions of records, taking minutes to execute on the server sometimes.
Could you possibly show an example using Futures that does a more complex object graph like this? I'd be happy to see it working, but in the end, I still don't think it can be done in NH with a single query.
Also, the Include() method is arbitrary enough that I don't have to construct the queries, specifying the joins, setting up the futures, etc. It's just a list of strings indicating the relationship paths I want to eagerly fetch. This allows me to create ORM agnostic methods like:
T GetByKey<T>(object key, string where, params string[] includes)
That lets me grab any entity type by key, with a where filter, and a list of related entities I want to eagerly fetch.
Can the Futures be set up generically, using nothing but a string[] of association paths you want to load along with your object? Including any number of branches and multiple levels of collections?
Been trying to solve this for a while! I'd love some help...
We should implements a pending task merging the behaviour of <batch-size with
<fetch introducing fetch="delayed" (delayed = batch-load at first iteration).
Fabio,
I'm not sure if I understood what you mean but I think the two features missing in NH at the moment to allow for this kind of complex fetching are:
currently it always assumes lazy="true", thus requiring an Initialize() - and so an additional trip to the db - to load the collections. Specifying lazy="false" should result in combining the original sql query with the subselect query in one db trip IMO
In case you want to load the may-to-ones that are references through one-to-many. e.g. blog.Posts.Writer
I don't think it's a huge problem because yo can easily work around it by writing your own custom hql for each association of course.
Many, many thanks for this post. I'm used to LLBLGen, which handles complex prefetch graphs very easily. Every time I've tried to use NHibernate, I've hit this issue, I couldn't find any article that would at least acknowledge the issue, and try to propose solutions.
I still thing that NHibernate lacks an easy way to prefetch complex graphs (including filtering, sorting, etc.), though.
Or is it that I am not thinking "the NHibernate way"?
Hmm... I think I've been too complicated then. In the past I've done something similar with MultiQueries but simply discarded the second (or Third or whatever) set of results as I know NH has loaded it.
Though this is in a 1.2.0GA app, I assume Futures is the way to do this now in 2.1.3 ?
If you choose to do so, I'm guessing there wil be a NH-Linq equivalent in NH 3.0.0?
Graham,
Yes & Yes.
Ayende,
I lead the development of a C++ base ORM for a company starting in ’96 (using C++ and COM). The ORM is successfully running in a large percentage of the worlds news papers for the last 10 years It had a more formal approach to defining object hierarchies.
We initially used a join approach to optimise the fetching of “contained collections” (associated entities) but as you’ve noted this became complex and slow loading many associations (including child and grandchild associations).
We solved the issue by factoring out these loads into separate queries. Knowing the query to return entities for the root entity and the associations joins to that root entity it’s easy to load the associated entities via a select from their table joined to the root table. In detailed performance testing the cost of the additional query was easily outweighed by the additional cost of returning and processing the duplicated root entity data.
You’ve practically done that with the 2 future joins – with the 2 queries join via the SQL union. Could nHibernate be updated to have an option to load associated entities via additional queries instead of via joins (for worse still fetches).
I’ve moved onto a new company now and am having to do multiple future joins as you suggest to overcome some N+1 performance issues.
Thanks,
SB
Simon,
It is likely that NH will gain something like this, yes.
Ayende,
Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?
Steven,
You can't, period.
Not even using a query like ... WHERE child.id IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId) ... ?
Sorry, I meant child.ParentId ..
Correction:
select * from ChildCollection child WHERE child.id IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId)
SELECT * FROM ChildCollection child WHERE child.ParentId IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId)
Mohamed,
Depending on which DB you use, IN queries do not use the proper indexing
Comment preview