If you are way off in the deep end, there is only so much that tooling can do for you
I get a lot of requests for what I term, the regex problem. Why the regex problem?
Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. — Jamie Zawinski in comp.lang.emacs.
A case in point, which comes up repeatedly, is this question:
Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?
In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.
Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.
First, Entity Framework, using this code:
db.Blogs
.Include("Posts")
.Include("Posts.Comments")
.Where(x => x.Id == 1)
.ToList();
This code will generate:
SELECT [Project2].[Id] AS [Id],
[Project2].[Title] AS [Title],
[Project2].[Subtitle] AS [Subtitle],
[Project2].[AllowsComments] AS [AllowsComments],
[Project2].[CreatedAt] AS [CreatedAt],
[Project2].[C1] AS [C1],
[Project2].[C4] AS [C2],
[Project2].[Id1] AS [Id1],
[Project2].[Title1] AS [Title1],
[Project2].[Text] AS [Text],
[Project2].[PostedAt] AS [PostedAt],
[Project2].[BlogId] AS [BlogId],
[Project2].[UserId] AS [UserId],
[Project2].[C3] AS [C3],
[Project2].[C2] AS [C4],
[Project2].[Id2] AS [Id2],
[Project2].[Name] AS [Name],
[Project2].[Email] AS [Email],
[Project2].[HomePage] AS [HomePage],
[Project2].[Ip] AS [Ip],
[Project2].[Text1] AS [Text1],
[Project2].[PostId] AS [PostId]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Subtitle] AS [Subtitle],
[Extent1].[AllowsComments] AS [AllowsComments],
[Extent1].[CreatedAt] AS [CreatedAt],
1 AS [C1],
[Project1].[Id] AS [Id1],
[Project1].[Title] AS [Title1],
[Project1].[Text] AS [Text],
[Project1].[PostedAt] AS [PostedAt],
[Project1].[BlogId] AS [BlogId],
[Project1].[UserId] AS [UserId],
[Project1].[Id1] AS [Id2],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email],
[Project1].[HomePage] AS [HomePage],
[Project1].[Ip] AS [Ip],
[Project1].[Text1] AS [Text1],
[Project1].[PostId] AS [PostId],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C2],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C3],
[Project1].[C1] AS [C4]
FROM [dbo].[Blogs] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id],
[Extent2].[Title] AS [Title],
[Extent2].[Text] AS [Text],
[Extent2].[PostedAt] AS [PostedAt],
[Extent2].[BlogId] AS [BlogId],
[Extent2].[UserId] AS [UserId],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Email] AS [Email],
[Extent3].[HomePage] AS [HomePage],
[Extent3].[Ip] AS [Ip],
[Extent3].[Text] AS [Text1],
[Extent3].[PostId] AS [PostId],
1 AS [C1]
FROM [dbo].[Posts] AS [Extent2]
LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
ON [Extent1].[Id] = [Project1].[BlogId]
WHERE 1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
[Project2].[C4] ASC,
[Project2].[Id1] ASC,
[Project2].[C3] ASC
If you’ll look closely, you’ll see that it generate a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.
What about NHibernate? The following code:
var blogs = s.CreateQuery(
@"from Blog b
left join fetch b.Posts p
left join fetch p.Comments
where b.Id = :id")
.SetParameter("id", 1)
.List<Blog>();
Will generate a much saner statement:
select blog0_.Id as Id7_0_,
posts1_.Id as Id0_1_,
comments2_.Id as Id2_2_,
blog0_.Title as Title7_0_,
blog0_.Subtitle as Subtitle7_0_,
blog0_.AllowsComments as AllowsCo4_7_0_,
blog0_.CreatedAt as CreatedAt7_0_,
posts1_.Title as Title0_1_,
posts1_.Text as Text0_1_,
posts1_.PostedAt as PostedAt0_1_,
posts1_.BlogId as BlogId0_1_,
posts1_.UserId as UserId0_1_,
posts1_.BlogId as BlogId0__,
posts1_.Id as Id0__,
comments2_.Name as Name2_2_,
comments2_.Email as Email2_2_,
comments2_.HomePage as HomePage2_2_,
comments2_.Ip as Ip2_2_,
comments2_.Text as Text2_2_,
comments2_.PostId as PostId2_2_,
comments2_.PostId as PostId1__,
comments2_.Id as Id1__
from Blogs blog0_
left outer join Posts posts1_
on blog0_.Id = posts1_.BlogId
left outer join Comments comments2_
on posts1_.Id = comments2_.PostId
where blog0_.Id = 1 /* @p0 */
While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.
And the way to do that is quite simple, don’t try to do it in a single query, instead, we can break it up into multiple queries, each loading just a part of the graph and rely on the Identity Map implementation to stitch the graph together. You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that would be much easier to write, maintain and work with.