I get a lot of requests for what I term, the regex problem. Why the regex problem?
Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. — Jamie Zawinski in comp.lang.emacs.
A case in point, which comes up repeatedly, is this question:
Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?
In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.
Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.
First, Entity Framework, using this code:
db.Blogs
.Include("Posts")
.Include("Posts.Comments")
.Where(x => x.Id == 1)
.ToList();
This code will generate:
SELECT [Project2].[Id] AS [Id], [Project2].[Title] AS [Title], [Project2].[Subtitle] AS [Subtitle], [Project2].[AllowsComments] AS [AllowsComments], [Project2].[CreatedAt] AS [CreatedAt], [Project2].[C1] AS [C1], [Project2].[C4] AS [C2], [Project2].[Id1] AS [Id1], [Project2].[Title1] AS [Title1], [Project2].[Text] AS [Text], [Project2].[PostedAt] AS [PostedAt], [Project2].[BlogId] AS [BlogId], [Project2].[UserId] AS [UserId], [Project2].[C3] AS [C3], [Project2].[C2] AS [C4], [Project2].[Id2] AS [Id2], [Project2].[Name] AS [Name], [Project2].[Email] AS [Email], [Project2].[HomePage] AS [HomePage], [Project2].[Ip] AS [Ip], [Project2].[Text1] AS [Text1], [Project2].[PostId] AS [PostId] FROM (SELECT [Extent1].[Id] AS [Id], [Extent1].[Title] AS [Title], [Extent1].[Subtitle] AS [Subtitle], [Extent1].[AllowsComments] AS [AllowsComments], [Extent1].[CreatedAt] AS [CreatedAt], 1 AS [C1], [Project1].[Id] AS [Id1], [Project1].[Title] AS [Title1], [Project1].[Text] AS [Text], [Project1].[PostedAt] AS [PostedAt], [Project1].[BlogId] AS [BlogId], [Project1].[UserId] AS [UserId], [Project1].[Id1] AS [Id2], [Project1].[Name] AS [Name], [Project1].[Email] AS [Email], [Project1].[HomePage] AS [HomePage], [Project1].[Ip] AS [Ip], [Project1].[Text1] AS [Text1], [Project1].[PostId] AS [PostId], CASE WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int) ELSE CASE WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END END AS [C2], CASE WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int) ELSE CASE WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END END AS [C3], [Project1].[C1] AS [C4] FROM [dbo].[Blogs] AS [Extent1] LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id], [Extent2].[Title] AS [Title], [Extent2].[Text] AS [Text], [Extent2].[PostedAt] AS [PostedAt], [Extent2].[BlogId] AS [BlogId], [Extent2].[UserId] AS [UserId], [Extent3].[Id] AS [Id1], [Extent3].[Name] AS [Name], [Extent3].[Email] AS [Email], [Extent3].[HomePage] AS [HomePage], [Extent3].[Ip] AS [Ip], [Extent3].[Text] AS [Text1], [Extent3].[PostId] AS [PostId], 1 AS [C1] FROM [dbo].[Posts] AS [Extent2] LEFT OUTER JOIN [dbo].[Comments] AS [Extent3] ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1] ON [Extent1].[Id] = [Project1].[BlogId] WHERE 1 = [Extent1].[Id]) AS [Project2] ORDER BY [Project2].[Id] ASC, [Project2].[C4] ASC, [Project2].[Id1] ASC, [Project2].[C3] ASC
If you’ll look closely, you’ll see that it generate a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.
What about NHibernate? The following code:
var blogs = s.CreateQuery(
@"from Blog b
left join fetch b.Posts p
left join fetch p.Comments
where b.Id = :id")
.SetParameter("id", 1)
.List<Blog>();
Will generate a much saner statement:
select blog0_.Id as Id7_0_, posts1_.Id as Id0_1_, comments2_.Id as Id2_2_, blog0_.Title as Title7_0_, blog0_.Subtitle as Subtitle7_0_, blog0_.AllowsComments as AllowsCo4_7_0_, blog0_.CreatedAt as CreatedAt7_0_, posts1_.Title as Title0_1_, posts1_.Text as Text0_1_, posts1_.PostedAt as PostedAt0_1_, posts1_.BlogId as BlogId0_1_, posts1_.UserId as UserId0_1_, posts1_.BlogId as BlogId0__, posts1_.Id as Id0__, comments2_.Name as Name2_2_, comments2_.Email as Email2_2_, comments2_.HomePage as HomePage2_2_, comments2_.Ip as Ip2_2_, comments2_.Text as Text2_2_, comments2_.PostId as PostId2_2_, comments2_.PostId as PostId1__, comments2_.Id as Id1__ from Blogs blog0_ left outer join Posts posts1_ on blog0_.Id = posts1_.BlogId left outer join Comments comments2_ on posts1_.Id = comments2_.PostId where blog0_.Id = 1 /* @p0 */
While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.
And the way to do that is quite simple, don’t try to do it in a single query, instead, we can break it up into multiple queries, each loading just a part of the graph and rely on the Identity Map implementation to stitch the graph together. You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that would be much easier to write, maintain and work with.
