If you are way off in the deep end, there is only so much that tooling can do for you

time to read 18 min | 3414 words

I get a lot of requests for what I term, the regex problem. Why the regex problem?

Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems.Jamie Zawinski in comp.lang.emacs.

A case in point, which comes up repeatedly, is this question:

Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?

In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.

Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.

First, Entity Framework, using this code:

db.Blogs
    .Include("Posts")
    .Include("Posts.Comments")
    .Where(x => x.Id == 1)
    .ToList();

This code will generate:

SELECT   [Project2].[Id]             AS [Id],
         [Project2].[Title]          AS [Title],
         [Project2].[Subtitle]       AS [Subtitle],
         [Project2].[AllowsComments] AS [AllowsComments],
         [Project2].[CreatedAt]      AS [CreatedAt],
         [Project2].[C1]             AS [C1],
         [Project2].[C4]             AS [C2],
         [Project2].[Id1]            AS [Id1],
         [Project2].[Title1]         AS [Title1],
         [Project2].[Text]           AS [Text],
         [Project2].[PostedAt]       AS [PostedAt],
         [Project2].[BlogId]         AS [BlogId],
         [Project2].[UserId]         AS [UserId],
         [Project2].[C3]             AS [C3],
         [Project2].[C2]             AS [C4],
         [Project2].[Id2]            AS [Id2],
         [Project2].[Name]           AS [Name],
         [Project2].[Email]          AS [Email],
         [Project2].[HomePage]       AS [HomePage],
         [Project2].[Ip]             AS [Ip],
         [Project2].[Text1]          AS [Text1],
         [Project2].[PostId]         AS [PostId]
FROM     (SELECT [Extent1].[Id]             AS [Id],
                 [Extent1].[Title]          AS [Title],
                 [Extent1].[Subtitle]       AS [Subtitle],
                 [Extent1].[AllowsComments] AS [AllowsComments],
                 [Extent1].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C1],
                 [Project1].[Id]            AS [Id1],
                 [Project1].[Title]         AS [Title1],
                 [Project1].[Text]          AS [Text],
                 [Project1].[PostedAt]      AS [PostedAt],
                 [Project1].[BlogId]        AS [BlogId],
                 [Project1].[UserId]        AS [UserId],
                 [Project1].[Id1]           AS [Id2],
                 [Project1].[Name]          AS [Name],
                 [Project1].[Email]         AS [Email],
                 [Project1].[HomePage]      AS [HomePage],
                 [Project1].[Ip]            AS [Ip],
                 [Project1].[Text1]         AS [Text1],
                 [Project1].[PostId]        AS [PostId],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C2],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C3],
                 [Project1].[C1]            AS [C4]
          FROM   [dbo].[Blogs] AS [Extent1]
                 LEFT OUTER JOIN (SELECT [Extent2].[Id]       AS [Id],
                                         [Extent2].[Title]    AS [Title],
                                         [Extent2].[Text]     AS [Text],
                                         [Extent2].[PostedAt] AS [PostedAt],
                                         [Extent2].[BlogId]   AS [BlogId],
                                         [Extent2].[UserId]   AS [UserId],
                                         [Extent3].[Id]       AS [Id1],
                                         [Extent3].[Name]     AS [Name],
                                         [Extent3].[Email]    AS [Email],
                                         [Extent3].[HomePage] AS [HomePage],
                                         [Extent3].[Ip]       AS [Ip],
                                         [Extent3].[Text]     AS [Text1],
                                         [Extent3].[PostId]   AS [PostId],
                                         1                    AS [C1]
                                  FROM   [dbo].[Posts] AS [Extent2]
                                         LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
                                           ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
                   ON [Extent1].[Id] = [Project1].[BlogId]
          WHERE  1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
         [Project2].[C4] ASC,
         [Project2].[Id1] ASC,
         [Project2].[C3] ASC

If you’ll look closely, you’ll see that it generate a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.

What about NHibernate? The following code:

var blogs = s.CreateQuery(
    @"from Blog b 
        left join fetch b.Posts p 
        left join fetch p.Comments 
    where b.Id = :id")
    .SetParameter("id", 1)
    .List<Blog>();

Will generate a much saner statement:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       comments2_.Id         as Id2_2_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__,
       comments2_.Name       as Name2_2_,
       comments2_.Email      as Email2_2_,
       comments2_.HomePage   as HomePage2_2_,
       comments2_.Ip         as Ip2_2_,
       comments2_.Text       as Text2_2_,
       comments2_.PostId     as PostId2_2_,
       comments2_.PostId     as PostId1__,
       comments2_.Id         as Id1__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
       left outer join Comments comments2_
         on posts1_.Id = comments2_.PostId
where  blog0_.Id = 1 /* @p0 */

While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.

And the way to do that is quite simple, don’t try to do it in a single query, instead, we can break it up into multiple queries, each loading just a part of the graph and rely on the Identity Map implementation to stitch the graph together.  You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that would be much easier to write, maintain and work with.