Ayende @ Rahien

It's a girl

Eagerly loading entity associations efficiently with NHibernate

One of the things that seems to pop up frequently is people wanting to load an entity with all of its associations eagerly. That is pretty easy to do when the associations are many-to-one (that is, there is only one of them for each root entity). Example of those would be things like Owner, Site, etc.

Here is an HQL query that would load a blog with its owner and site as well:

from Blog b left join fetch b.Owner left join fetch b.Site

The problem starts when you try to do the same for multiple collection associations. NHibernate allows you to do so, but the result is probably not what you would initially expect. This query, for example:

from Blog b left join fetch b.Posts left join fetch b.Users where b.Id = :id

Will result in the following SQL statement:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       user3_.Id             as Id5_2_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__,
       user3_.Password       as Password5_2_,
       user3_.Username       as Username5_2_,
       user3_.Email          as Email5_2_,
       user3_.CreatedAt      as CreatedAt5_2_,
       user3_.Bio            as Bio5_2_,
       users2_.BlogId        as BlogId1__,
       users2_.UserId        as UserId1__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
       left outer join UsersBlogs users2_
         on blog0_.Id = users2_.BlogId
       left outer join Users user3_
         on users2_.UserId = user3_.Id
where  blog0_.Id = 1 /* @p0 */

Something that may not be apparent immediately is going to result in a Cartesian product. This is pointed out in the documentation, but I think that we can all agree that while there may be reasons for this behavior, it is far from ideal.

Let us look at what other OR/Ms are doing, shall we?

The comparable query using Entity Framework would look like this:

db.Blogs
    .Include("Posts")
    .Include("Users")
    .Where(x => x.Id == i)
    .ToArray();

And the resulting SQL would be:

SELECT   [UnionAll1].[Id]             AS [C1],
         [UnionAll1].[Title]          AS [C2],
         [UnionAll1].[Subtitle]       AS [C3],
         [UnionAll1].[AllowsComments] AS [C4],
         [UnionAll1].[CreatedAt]      AS [C5],
         [UnionAll1].[C2]             AS [C6],
         [UnionAll1].[C1]             AS [C7],
         [UnionAll1].[C3]             AS [C8],
         [UnionAll1].[Id1]            AS [C9],
         [UnionAll1].[Title1]         AS [C10],
         [UnionAll1].[Text]           AS [C11],
         [UnionAll1].[PostedAt]       AS [C12],
         [UnionAll1].[BlogId]         AS [C13],
         [UnionAll1].[UserId]         AS [C14],
         [UnionAll1].[C4]             AS [C15],
         [UnionAll1].[C5]             AS [C16],
         [UnionAll1].[C6]             AS [C17],
         [UnionAll1].[C7]             AS [C18],
         [UnionAll1].[C8]             AS [C19],
         [UnionAll1].[C9]             AS [C20]
FROM     (SELECT CASE 
                   WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
                   ELSE 1
                 END AS [C1],
                 [Extent1].[Id]             AS [Id],
                 [Extent1].[Title]          AS [Title],
                 [Extent1].[Subtitle]       AS [Subtitle],
                 [Extent1].[AllowsComments] AS [AllowsComments],
                 [Extent1].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C2],
                 CASE 
                   WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
                   ELSE 1
                 END AS [C3],
                 [Extent2].[Id]             AS [Id1],
                 [Extent2].[Title]          AS [Title1],
                 [Extent2].[Text]           AS [Text],
                 [Extent2].[PostedAt]       AS [PostedAt],
                 [Extent2].[BlogId]         AS [BlogId],
                 [Extent2].[UserId]         AS [UserId],
                 CAST(NULL AS int)          AS [C4],
                 CAST(NULL AS varbinary(1)) AS [C5],
                 CAST(NULL AS varchar(1))   AS [C6],
                 CAST(NULL AS varchar(1))   AS [C7],
                 CAST(NULL AS datetime)     AS [C8],
                 CAST(NULL AS varchar(1))   AS [C9]
          FROM   [dbo].[Blogs] AS [Extent1]
                 LEFT OUTER JOIN [dbo].[Posts] AS [Extent2]
                   ON [Extent1].[Id] = [Extent2].[BlogId]
          WHERE  [Extent1].[Id] = 1 /* @p__linq__1 */
          UNION ALL
          SELECT 2                          AS [C1],
                 [Extent3].[Id]             AS [Id],
                 [Extent3].[Title]          AS [Title],
                 [Extent3].[Subtitle]       AS [Subtitle],
                 [Extent3].[AllowsComments] AS [AllowsComments],
                 [Extent3].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C2],
                 CAST(NULL AS int)          AS [C3],
                 CAST(NULL AS int)          AS [C4],
                 CAST(NULL AS varchar(1))   AS [C5],
                 CAST(NULL AS varchar(1))   AS [C6],
                 CAST(NULL AS datetime)     AS [C7],
                 CAST(NULL AS int)          AS [C8],
                 CAST(NULL AS int)          AS [C9],
                 [Join2].[Id]               AS [Id1],
                 [Join2].[Password]         AS [Password],
                 [Join2].[Username]         AS [Username],
                 [Join2].[Email]            AS [Email],
                 [Join2].[CreatedAt]        AS [CreatedAt1],
                 [Join2].[Bio]              AS [Bio]
          FROM   [dbo].[Blogs] AS [Extent3]
                 INNER JOIN (SELECT [Extent4].[UserId]    AS [UserId],
                                    [Extent4].[BlogId]    AS [BlogId],
                                    [Extent5].[Id]        AS [Id],
                                    [Extent5].[Password]  AS [Password],
                                    [Extent5].[Username]  AS [Username],
                                    [Extent5].[Email]     AS [Email],
                                    [Extent5].[CreatedAt] AS [CreatedAt],
                                    [Extent5].[Bio]       AS [Bio]
                             FROM   [dbo].[UsersBlogs] AS [Extent4]
                                    INNER JOIN [dbo].[Users] AS [Extent5]
                                      ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2]
                   ON [Extent3].[Id] = [Join2].[BlogId]
          WHERE  [Extent3].[Id] = 1 /* @p__linq__1 */) AS [UnionAll1]
ORDER BY [UnionAll1].[Id] ASC,
         [UnionAll1].[C1] ASC

At this point, I am pretty sure, your eyes shut down in self defense. This is one complex query. But, basically, this is a complex query because EF is executing the following two queries and unioning them.

Eager load Blog Posts:

SELECT CASE 
       WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
       ELSE 1
     END AS [C1],
     [Extent1].[Id]             AS [Id],
     [Extent1].[Title]          AS [Title],
     [Extent1].[Subtitle]       AS [Subtitle],
     [Extent1].[AllowsComments] AS [AllowsComments],
     [Extent1].[CreatedAt]      AS [CreatedAt],
     1                          AS [C2],
     CASE 
       WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
       ELSE 1
     END AS [C3],
     [Extent2].[Id]             AS [Id1],
     [Extent2].[Title]          AS [Title1],
     [Extent2].[Text]           AS [Text],
     [Extent2].[PostedAt]       AS [PostedAt],
     [Extent2].[BlogId]         AS [BlogId],
     [Extent2].[UserId]         AS [UserId],
     CAST(NULL AS int)          AS [C4],
     CAST(NULL AS varbinary(1)) AS [C5],
     CAST(NULL AS varchar(1))   AS [C6],
     CAST(NULL AS varchar(1))   AS [C7],
     CAST(NULL AS datetime)     AS [C8],
     CAST(NULL AS varchar(1))   AS [C9]
FROM   [dbo].[Blogs] AS [Extent1]
     LEFT OUTER JOIN [dbo].[Posts] AS [Extent2]
       ON [Extent1].[Id] = [Extent2].[BlogId]
WHERE  [Extent1].[Id] = 1 /* @p__linq__1 */

Eager load Blog Users:

SELECT 2                          AS [C1],
     [Extent3].[Id]             AS [Id],
     [Extent3].[Title]          AS [Title],
     [Extent3].[Subtitle]       AS [Subtitle],
     [Extent3].[AllowsComments] AS [AllowsComments],
     [Extent3].[CreatedAt]      AS [CreatedAt],
     1                          AS [C2],
     CAST(NULL AS int)          AS [C3],
     CAST(NULL AS int)          AS [C4],
     CAST(NULL AS varchar(1))   AS [C5],
     CAST(NULL AS varchar(1))   AS [C6],
     CAST(NULL AS datetime)     AS [C7],
     CAST(NULL AS int)          AS [C8],
     CAST(NULL AS int)          AS [C9],
     [Join2].[Id]               AS [Id1],
     [Join2].[Password]         AS [Password],
     [Join2].[Username]         AS [Username],
     [Join2].[Email]            AS [Email],
     [Join2].[CreatedAt]        AS [CreatedAt1],
     [Join2].[Bio]              AS [Bio]
FROM   [dbo].[Blogs] AS [Extent3]
     INNER JOIN (SELECT [Extent4].[UserId]    AS [UserId],
                        [Extent4].[BlogId]    AS [BlogId],
                        [Extent5].[Id]        AS [Id],
                        [Extent5].[Password]  AS [Password],
                        [Extent5].[Username]  AS [Username],
                        [Extent5].[Email]     AS [Email],
                        [Extent5].[CreatedAt] AS [CreatedAt],
                        [Extent5].[Bio]       AS [Bio]
                 FROM   [dbo].[UsersBlogs] AS [Extent4]
                        INNER JOIN [dbo].[Users] AS [Extent5]
                          ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2]
       ON [Extent3].[Id] = [Join2].[BlogId]
WHERE  [Extent3].[Id] = 1 /* @p__linq__1 */) 

The query may be complex, but it get the job done and does so without bothering the us much. The question is, can we do the same with NHibernate?

As it run out, we can, pretty easily too. The following code will do just that:

 var blogs = s.CreateQuery("from Blog b left join fetch b.Posts where b.Id = :id")
     .SetParameter("id", 1)
     .Future<Blog>();

 s.CreateQuery("from Blog b left join fetch b.Users where b.Id = :id1")
     .SetParameter("id1", 1)
     .Future<Blog>();

So, what is going on here? We are actually issuing two queries, each of them to eagerly load a single collection. The trick is that we are using future queries to do so. That means that the query that is going to be sent to the database to get those results is:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
where  blog0_.Id = 1 /* @p0 */
select blog0_.Id             as Id7_0_,
       user2_.Id             as Id5_1_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       user2_.Password       as Password5_1_,
       user2_.Username       as Username5_1_,
       user2_.Email          as Email5_1_,
       user2_.CreatedAt      as CreatedAt5_1_,
       user2_.Bio            as Bio5_1_,
       users1_.BlogId        as BlogId0__,
       users1_.UserId        as UserId0__
from   Blogs blog0_
       left outer join UsersBlogs users1_
         on blog0_.Id = users1_.BlogId
       left outer join Users user2_
         on users1_.UserId = user2_.Id
where  blog0_.Id = 1 /* @p1 */

Note that we are essentially discarding the results of the second query. The reason for that is that we aren’t actually interested in those results, we execute this query solely to get NHibernate to fill the Users’ collection of the relevant entity.

I generally use this method so I would have the first query to eager load all the many-to-one associations, and then a series of queries (one per required loaded collection) to load one-to-many or many-to-many associations.

Comments

Barry Dahlberg
01/16/2010 10:38 AM by
Barry Dahlberg

You are scarily relevant sometimes, are you reading my code?

Diego Mijelshon
01/16/2010 11:49 AM by
Diego Mijelshon

"the first query to eager load all the one-to-many associations, and then a series of queries (one per required loaded collection) to load many-to-one or many-to-many associations."

Didn't you mean the opposite?

First query to load many-to-one, the rest to load one-to-many and many-to-many (collections).

tobi
01/16/2010 11:56 AM by
tobi

I recently hit this problem when stream processing a gigantic table with a few "child"-tables like in your example. Left join did a poor job while the union all pattern worked very nice. SQL Server is defiantely able to optimize it into a merge join when the subtables are clustered in the right way or have indices.

I would like to request the union-all join for NHibernate as it seems to me that it is superior in every case.

Ayende Rahien
01/16/2010 02:19 PM by
Ayende Rahien

Diego,

Yes, I did mean the opposite, fixed.

cowgaR
01/16/2010 02:37 PM by
cowgaR

excellent job!

the only par missing are the results which we'll get using ANTLR based LINQ provider (that Steve Strong is working on) running the same LINQ query.

Will it use futures as well and thus will "spit" the same SQL? Maybe it's the question for Steve...

Nick Berardi
01/16/2010 03:32 PM by
Nick Berardi

Ayende,

You cheated a little. You used HQL to generate your one query which is much more efficient than if you did the same through NHibernate LINQ.

Same is true for EF, if you wanted to compare apples to apples you should have used EFQL.

Ayende Rahien
01/16/2010 06:11 PM by
Ayende Rahien

Nick,

You can do the same using the Criteria API, which is what the release NH Linq provider is based on.

And you can do the same with the trunk Linq provider.

Andrey Shchekin
01/16/2010 07:44 PM by
Andrey Shchekin

In this specific example EF eager-loading API looks better:

EF uses "Includes" which is an abstract concept, while NH uses "join" which is a DB concept.

sarong
01/16/2010 08:00 PM by
sarong

Ayende,

and what to do if a have to do the same thing but paged?

Ayende Rahien
01/16/2010 09:16 PM by
Ayende Rahien

Sarong,

SetMaxResults, but that might cause issues with the loading.

At that point, I might want to execute two totally separate queries, first to get just the entities, then to load the related associations.

Sam Meacham
01/17/2010 12:12 AM by
Sam Meacham

VERY glad you posted this, as I've been trying to figure out how to load more complex object graphs in NH for quite some time now.

I think Andrey Shchekin has it right though, the API for EF is superior when it comes to situations like this. The ease of use of the Include() method, and the flexibility of what it will interpret makes EF not only easier to use, but perhaps allows for certain queries that NH is not able to construct.

Loading a single Blog, with both the Posts and Users collections populated is fine, but it's neither a large nor an overly complex object graph. EF allows you to put together pretty much anything you want.

var q =

from blog in context.Blogs

.Include("Users.Friends")

.Include("Users.Preferences")

.Include("Comments.Replies")

.Include("Posts.Comments")

where blog.ID == 1

select blog;

That would give you a blog object, with the Users collection already populated, and for each User, their Friends collections populated, each Users Preferences object pre fetched, the root Blogs Comments collection, with each comments Replies collection populated, the root blogs Posts, with each Posts comments, etc. EF will do this in a single query, and will ONLY produce the number of records required to get the data back. The way I've been doing this with NH, I get the data back I want, but the SQL generated does a bunch of flat (no subselects) left joins that give me a giant cartesian product that produces millions of records, taking minutes to execute on the server sometimes.

Could you possibly show an example using Futures that does a more complex object graph like this? I'd be happy to see it working, but in the end, I still don't think it can be done in NH with a single query.

Also, the Include() method is arbitrary enough that I don't have to construct the queries, specifying the joins, setting up the futures, etc. It's just a list of strings indicating the relationship paths I want to eagerly fetch. This allows me to create ORM agnostic methods like:

T GetByKey(object key, string where, params string[] includes)

That lets me grab any entity type by key, with a where filter, and a list of related entities I want to eagerly fetch.

Can the Futures be set up generically, using nothing but a string[] of association paths you want to load along with your object? Including any number of branches and multiple levels of collections?

Been trying to solve this for a while! I'd love some help...

Fabio Maulo
01/17/2010 03:27 AM by
Fabio Maulo

We should implements a pending task merging the behaviour of <batch-size with
<fetch introducing fetch="delayed" (delayed = batch-load at first iteration).

Toloma&#252;s
01/17/2010 09:22 AM by
Tolomaüs

Fabio,

I'm not sure if I understood what you mean but I think the two features missing in NH at the moment to allow for this kind of complex fetching are:

  • combining fetch="subselect" with lazy="true"/"false" for one-to-many associations

currently it always assumes lazy="true", thus requiring an Initialize() - and so an additional trip to the db - to load the collections. Specifying lazy="false" should result in combining the original sql query with the subselect query in one db trip IMO

  • the possibility to specify fetch="subselect" for many-to-one associations as well

In case you want to load the may-to-ones that are references through one-to-many. e.g. blog.Posts.Writer

I don't think it's a huge problem because yo can easily work around it by writing your own custom hql for each association of course.

Stephen
01/17/2010 08:57 PM by
Stephen

Many, many thanks for this post. I'm used to LLBLGen, which handles complex prefetch graphs very easily. Every time I've tried to use NHibernate, I've hit this issue, I couldn't find any article that would at least acknowledge the issue, and try to propose solutions.

I still thing that NHibernate lacks an easy way to prefetch complex graphs (including filtering, sorting, etc.), though.

Or is it that I am not thinking "the NHibernate way"?

Graham
01/20/2010 11:20 PM by
Graham

Hmm... I think I've been too complicated then. In the past I've done something similar with MultiQueries but simply discarded the second (or Third or whatever) set of results as I know NH has loaded it.

Though this is in a 1.2.0GA app, I assume Futures is the way to do this now in 2.1.3 ?

If you choose to do so, I'm guessing there wil be a NH-Linq equivalent in NH 3.0.0?

Simon Bateman
01/22/2010 12:02 AM by
Simon Bateman

Ayende,

I lead the development of a C++ base ORM for a company starting in ’96 (using C++ and COM). The ORM is successfully running in a large percentage of the worlds news papers for the last 10 years It had a more formal approach to defining object hierarchies.

We initially used a join approach to optimise the fetching of “contained collections” (associated entities) but as you’ve noted this became complex and slow loading many associations (including child and grandchild associations).

We solved the issue by factoring out these loads into separate queries. Knowing the query to return entities for the root entity and the associations joins to that root entity it’s easy to load the associated entities via a select from their table joined to the root table. In detailed performance testing the cost of the additional query was easily outweighed by the additional cost of returning and processing the duplicated root entity data.

You’ve practically done that with the 2 future joins – with the 2 queries join via the SQL union. Could nHibernate be updated to have an option to load associated entities via additional queries instead of via joins (for worse still fetches).

I’ve moved onto a new company now and am having to do multiple future joins as you suggest to overcome some N+1 performance issues.

Thanks,

SB

Ayende Rahien
01/22/2010 06:57 AM by
Ayende Rahien

Simon,

It is likely that NH will gain something like this, yes.

Steven De Kock
01/28/2010 09:01 AM by
Steven De Kock

Ayende,

Can you show us an example for loading collections of collections.

How would you write a query and avoid a Cartesian product multiple levels deep ?

Ayende Rahien
01/30/2010 11:16 PM by
Ayende Rahien

Steven,

You can't, period.

Mohamed Meligy
02/08/2010 01:10 PM by
Mohamed Meligy

Not even using a query like ... WHERE child.id IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId) ... ?

Mohamed Meligy
02/08/2010 01:16 PM by
Mohamed Meligy

Sorry, I meant child.ParentId ..

Correction:

select * from ChildCollection child WHERE child.id IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId)

Mohamed Meligy
02/08/2010 01:18 PM by
Mohamed Meligy

SELECT * FROM ChildCollection child WHERE child.ParentId IN (SELECT parent.id FROM parent ParentCollection WHERE parent.RootObjectId = :aggregateRootObjectId)

Ayende Rahien
02/08/2010 08:22 PM by
Ayende Rahien

Mohamed,

Depending on which DB you use, IN queries do not use the proper indexing

Comments have been closed on this topic.