Tuesday, February 09, 2010
Performance optimizations, managed code and leaky abstractions
I ran into this post from Jeff Atwood, talking about the performance difference between managed and unmanaged code:

There were a lot of optimizations for this along the way, but the C++ version has soundly beaten the C# version. As expected, right?
Well, yes, but with extenuating circumstances.
So am I ashamed by my crushing defeat? Hardly. The managed code achieved a very good result for hardly any effort. To defeat the managed version, Raymond had to:
- Write his own file/io stuff
- Write his own string class
- Write his own allocator
- Write his own international mapping
Of course he used available lower level libraries to do this, but that's still a lot of work. Can you call what's left an STL program? I don't think so, I think he kept the std::vector class which ultimately was never a problem and he kept the find function. Pretty much everything else is gone.
So, yup, you can definitely beat the CLR. I think Raymond can make his program go even faster.
I find this interesting because it isn’t really specific to C++. In my recent performance sprint for the profiler, I had to:
- Write my own paging system
- Write my own string parsing routines
- Write my own allocator
For the most part, performance optimizations fall into four categories:
- Inefficient algorithms – poor Big O complexity, etc.
- Inefficient execution – not applying caching, doing too much work upfront, doing unneeded work.
- I/O Bound – the execution waits for a file, database, socket, etc.
- CPU Bound – it just takes a lot of calculations to get the result.
I can think of very few problems that are really CPU bound; they tend to be very specific and small. And those are just about the only ones that will gain any real benefit from faster code. Of course, in pure math scenarios, which is pretty much where most of the CPU bound code resides, it doesn’t make much of a difference which language you choose (assuming it is not interpreted, at least, and that you can run directly on the CPU using native instructions). But as I said, those are pretty rare.
In nearly all cases, you’ll find that the #1 cause for perf issues is IO. Good IO strategies (buffering, pre-loading, lazy loading, etc) are usually applicable for specific scenarios, but they are the ones that will make a world of difference between poorly performing code and highly performing code. Caching can also make a huge difference, as can deferring work to when it is actually needed.
I intentionally kept “optimize the algorithm” for last, because while it can make a drastic performance difference, it is also the easiest to do, since there is so much information about it, assuming that you didn’t accidentally get yourself into an O(N^2) or worse.
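To make the "accidentally O(N^2)" case concrete, here is a minimal sketch. It is written in Python purely for brevity (the posts here are about .NET), and both function names are made up for illustration. The two functions return the same duplicates, but the first rescans a list on every step while the second relies on hash lookups.

```python
def find_duplicates_quadratic(items):
    """O(N^2): the slice and both `in` checks rescan lists on every step."""
    duplicates = []
    for i, item in enumerate(items):
        if item in items[:i] and item not in duplicates:
            duplicates.append(item)
    return duplicates


def find_duplicates_linear(items):
    """O(N) on average: membership tests against a set are hash lookups."""
    seen = set()
    reported = set()
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "a"]
    print(find_duplicates_quadratic(data))  # ['a', 'b']
    print(find_duplicates_linear(data))     # ['a', 'b']
```

Nothing about the quadratic version looks slow when you read it, which is exactly how this kind of problem sneaks in.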
Monday, February 08, 2010
Say hello to Uber Prof
I got several requests for this, so I am making Uber Prof itself available for purchase.
What is Uber Prof?
It is a shorthand way of saying: All the OR/M profilers that we make.
An Uber Prof license gives you the ability to use:
And it will automatically give you the ability to use any additional profilers that we will create. And yes, there is an upgrade path if you already purchased a single profiler license and would like to upgrade to Uber Prof.
Sunday, February 07, 2010
A marketing mistake: WCF Data Services & WCF RIA Services
There are some things that I just don’t understand, and the decision to name two apparently different technologies working in the same area using those two names is one of them.
The image on the right is from In Search of Stupidity, an excellent and funny book about marketing mistakes in the software industry, which dedicates a whole chapter to this error.
Coming back to WCF Data Services & WCF RIA Services, I read this page, and I am still confused. It appears that the major difference is that RIA Services will generate the Silverlight client classes as part of the build process, whereas with Data Services this is a separate process.
And I am not sure even of that.
Saturday, February 06, 2010
Traditional architecture makes me flinch
I just finished drawing the following:
It makes me feel dirty inside, to do so. Mostly because I really don’t like or believe in building applications in this manner anymore. I would really like to be able to do this:
Unfortunately, I am talking about another subject in the context where I am showing the first architectural diagram, and I need to present only a single new concept at a time.
Friday, February 05, 2010
If you are way off in the deep end, there is only so much that tooling can do for you
I get a lot of requests for what I term the regex problem. Why the regex problem?
Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. — Jamie Zawinski in comp.lang.emacs.
A case in point, which comes up repeatedly, is this question:
Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?
In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.
Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.
First, Entity Framework, using this code:
db.Blogs
.Include("Posts")
.Include("Posts.Comments")
.Where(x => x.Id == 1)
.ToList();
This code will generate:
SELECT [Project2].[Id] AS [Id],
[Project2].[Title] AS [Title],
[Project2].[Subtitle] AS [Subtitle],
[Project2].[AllowsComments] AS [AllowsComments],
[Project2].[CreatedAt] AS [CreatedAt],
[Project2].[C1] AS [C1],
[Project2].[C4] AS [C2],
[Project2].[Id1] AS [Id1],
[Project2].[Title1] AS [Title1],
[Project2].[Text] AS [Text],
[Project2].[PostedAt] AS [PostedAt],
[Project2].[BlogId] AS [BlogId],
[Project2].[UserId] AS [UserId],
[Project2].[C3] AS [C3],
[Project2].[C2] AS [C4],
[Project2].[Id2] AS [Id2],
[Project2].[Name] AS [Name],
[Project2].[Email] AS [Email],
[Project2].[HomePage] AS [HomePage],
[Project2].[Ip] AS [Ip],
[Project2].[Text1] AS [Text1],
[Project2].[PostId] AS [PostId]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Subtitle] AS [Subtitle],
[Extent1].[AllowsComments] AS [AllowsComments],
[Extent1].[CreatedAt] AS [CreatedAt],
1 AS [C1],
[Project1].[Id] AS [Id1],
[Project1].[Title] AS [Title1],
[Project1].[Text] AS [Text],
[Project1].[PostedAt] AS [PostedAt],
[Project1].[BlogId] AS [BlogId],
[Project1].[UserId] AS [UserId],
[Project1].[Id1] AS [Id2],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email],
[Project1].[HomePage] AS [HomePage],
[Project1].[Ip] AS [Ip],
[Project1].[Text1] AS [Text1],
[Project1].[PostId] AS [PostId],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C2],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C3],
[Project1].[C1] AS [C4]
FROM [dbo].[Blogs] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id],
[Extent2].[Title] AS [Title],
[Extent2].[Text] AS [Text],
[Extent2].[PostedAt] AS [PostedAt],
[Extent2].[BlogId] AS [BlogId],
[Extent2].[UserId] AS [UserId],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Email] AS [Email],
[Extent3].[HomePage] AS [HomePage],
[Extent3].[Ip] AS [Ip],
[Extent3].[Text] AS [Text1],
[Extent3].[PostId] AS [PostId],
1 AS [C1]
FROM [dbo].[Posts] AS [Extent2]
LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
ON [Extent1].[Id] = [Project1].[BlogId]
WHERE 1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
[Project2].[C4] ASC,
[Project2].[Id1] ASC,
[Project2].[C3] ASC
If you look closely, you’ll see that it generates a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.
What about NHibernate? The following code:
var blogs = s.CreateQuery(
@"from Blog b
left join fetch b.Posts p
left join fetch p.Comments
where b.Id = :id")
.SetParameter("id", 1)
.List<Blog>();
Will generate a much saner statement:
select blog0_.Id as Id7_0_,
posts1_.Id as Id0_1_,
comments2_.Id as Id2_2_,
blog0_.Title as Title7_0_,
blog0_.Subtitle as Subtitle7_0_,
blog0_.AllowsComments as AllowsCo4_7_0_,
blog0_.CreatedAt as CreatedAt7_0_,
posts1_.Title as Title0_1_,
posts1_.Text as Text0_1_,
posts1_.PostedAt as PostedAt0_1_,
posts1_.BlogId as BlogId0_1_,
posts1_.UserId as UserId0_1_,
posts1_.BlogId as BlogId0__,
posts1_.Id as Id0__,
comments2_.Name as Name2_2_,
comments2_.Email as Email2_2_,
comments2_.HomePage as HomePage2_2_,
comments2_.Ip as Ip2_2_,
comments2_.Text as Text2_2_,
comments2_.PostId as PostId2_2_,
comments2_.PostId as PostId1__,
comments2_.Id as Id1__
from Blogs blog0_
left outer join Posts posts1_
on blog0_.Id = posts1_.BlogId
left outer join Comments comments2_
on posts1_.Id = comments2_.PostId
where blog0_.Id = 1 /* @p0 */
While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.
The optimal solution is quite simple: don’t try to do it in a single query. Instead, we can break it up into multiple queries, each loading just a part of the graph, and rely on the Identity Map implementation to stitch the graph together. You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that is much easier to write, maintain and work with.
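The idea of several narrow queries plus an identity map can be sketched without any OR/M at all. The example below is a hedged illustration in Python with an in-memory SQLite database, not NHibernate's or Entity Framework's actual mechanism. The schema is a cut-down guess at the Blogs / Posts / Comments tables, and a plain dictionary keyed by Id stands in for the identity map.

```python
import sqlite3


def build_db():
    """Create a tiny Blogs/Posts/Comments database: 1 blog, 3 posts, 4 comments each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Blogs (Id INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE Posts (Id INTEGER PRIMARY KEY, BlogId INTEGER, Title TEXT);
        CREATE TABLE Comments (Id INTEGER PRIMARY KEY, PostId INTEGER, Text TEXT);
    """)
    conn.execute("INSERT INTO Blogs VALUES (1, 'My blog')")
    for post_id in range(1, 4):
        conn.execute("INSERT INTO Posts VALUES (?, 1, ?)", (post_id, f"Post {post_id}"))
        for n in range(4):
            conn.execute("INSERT INTO Comments (PostId, Text) VALUES (?, ?)",
                         (post_id, f"Comment {n}"))
    return conn


def load_single_query(conn, blog_id):
    """One big join: blog and post columns are repeated on every comment row."""
    return conn.execute("""
        SELECT b.Id, b.Title, p.Id, p.Title, c.Id, c.Text
        FROM Blogs b
        LEFT JOIN Posts p ON p.BlogId = b.Id
        LEFT JOIN Comments c ON c.PostId = p.Id
        WHERE b.Id = ?""", (blog_id,)).fetchall()


def load_in_steps(conn, blog_id):
    """Three narrow queries; posts_by_id plays the role of the identity map."""
    blog_row = conn.execute(
        "SELECT Id, Title FROM Blogs WHERE Id = ?", (blog_id,)).fetchone()
    blog = {"id": blog_row[0], "title": blog_row[1], "posts": []}

    posts_by_id = {}
    post_rows = conn.execute(
        "SELECT Id, Title FROM Posts WHERE BlogId = ?", (blog_id,)).fetchall()
    for post_id, title in post_rows:
        post = {"id": post_id, "title": title, "comments": []}
        posts_by_id[post_id] = post
        blog["posts"].append(post)

    comment_rows = conn.execute("""
        SELECT c.Id, c.PostId, c.Text
        FROM Comments c JOIN Posts p ON p.Id = c.PostId
        WHERE p.BlogId = ?""", (blog_id,)).fetchall()
    for comment_id, post_id, text in comment_rows:
        # Stitch each comment onto the post instance we already hold.
        posts_by_id[post_id]["comments"].append({"id": comment_id, "text": text})
    return blog


if __name__ == "__main__":
    conn = build_db()
    print(len(load_single_query(conn, 1)))  # rows in the joined result
    print(len(load_in_steps(conn, 1)["posts"]))
```

In the joined result every row carries the blog and post columns again, so the duplication grows with the width of those tables. In the stepwise version each row is read once and attached to the object that is already in memory.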
Thursday, February 04, 2010
What happens behind the scenes: NHibernate, Linq to SQL, Entity Framework scenario analysis
One of the things that I began doing since starting to work on multiple OR/M Profilers is to compare how each of them handles a particular task. This is by no means a comparative analysis, but it is an interesting data point.
The scenario in question is loading a blog with all its posts and comments.
Let us start with NHibernate:
var blogs = s.CreateQuery(
@"from Blog b
left join fetch b.Posts p
left join fetch p.Comments
where b.Id = :id")
.SetParameter("id", 1)
.List<Blog>();
Will generate the following SQL:
select blog0_.Id as Id7_0_,
posts1_.Id as Id0_1_,
comments2_.Id as Id2_2_,
blog0_.Title as Title7_0_,
blog0_.Subtitle as Subtitle7_0_,
blog0_.AllowsComments as AllowsCo4_7_0_,
blog0_.CreatedAt as CreatedAt7_0_,
posts1_.Title as Title0_1_,
posts1_.Text as Text0_1_,
posts1_.PostedAt as PostedAt0_1_,
posts1_.BlogId as BlogId0_1_,
posts1_.UserId as UserId0_1_,
posts1_.BlogId as BlogId0__,
posts1_.Id as Id0__,
comments2_.Name as Name2_2_,
comments2_.Email as Email2_2_,
comments2_.HomePage as HomePage2_2_,
comments2_.Ip as Ip2_2_,
comments2_.Text as Text2_2_,
comments2_.PostId as PostId2_2_,
comments2_.PostId as PostId1__,
comments2_.Id as Id1__
from Blogs blog0_
left outer join Posts posts1_
on blog0_.Id = posts1_.BlogId
left outer join Comments comments2_
on posts1_.Id = comments2_.PostId
where blog0_.Id = 1 /* @p0 */
This results in a fairly simple query plan:
However, you should note that this also results in a Cartesian product, which may not be what you wanted.
Linq to SQL doesn’t really provide a good way to express what I wanted, but it does get the job done:
var dataLoadOptions = new DataLoadOptions();
dataLoadOptions.LoadWith<Blog>(x => x.Posts);
dataLoadOptions.LoadWith<Post>(x => x.Comments);
using (var db = new BlogModelDataContext(conStr)
{
LoadOptions = dataLoadOptions
})
{
db.Blogs.Where(x => x.Id == 1).ToList();
}
Interestingly enough, this does not generate a single query, but two queries:
-- statement #1
SELECT [t0].[Id],
[t0].[Title],
[t0].[Subtitle],
[t0].[AllowsComments],
[t0].[CreatedAt]
FROM [dbo].[Blogs] AS [t0]
WHERE [t0].[Id] = 1 /* @p0 */
-- statement #2
SELECT [t0].[Id],
[t0].[Title],
[t0].[Text],
[t0].[PostedAt],
[t0].[BlogId],
[t0].[UserId],
[t1].[Id] AS [Id2],
[t1].[Name],
[t1].[Email],
[t1].[HomePage],
[t1].[Ip],
[t1].[Text] AS [Text2],
[t1].[PostId],
(SELECT COUNT(* )
FROM [dbo].[Comments] AS [t2]
WHERE [t2].[PostId] = [t0].[Id]) AS [value]
FROM [dbo].[Posts] AS [t0]
LEFT OUTER JOIN [dbo].[Comments] AS [t1]
ON [t1].[PostId] = [t0].[Id]
WHERE [t0].[BlogId] = 1 /* @x1 */
ORDER BY [t0].[Id],
[t1].[Id]
The interesting bit is that while there are two queries here, this method does not generate a Cartesian product, so I have to consider this a plus. What I would like to know is whether this is intentional or just a result of the way Linq to SQL eager loading is structured.
The query plan for this is simple as well:
Finally, Entity Framework (both 3.5 and 4.0), using this code:
db.Blogs
.Include("Posts")
.Include("Posts.Comments")
.Where(x => x.Id == 1)
.ToList();
This code will generate:
SELECT [Project2].[Id] AS [Id],
[Project2].[Title] AS [Title],
[Project2].[Subtitle] AS [Subtitle],
[Project2].[AllowsComments] AS [AllowsComments],
[Project2].[CreatedAt] AS [CreatedAt],
[Project2].[C1] AS [C1],
[Project2].[C4] AS [C2],
[Project2].[Id1] AS [Id1],
[Project2].[Title1] AS [Title1],
[Project2].[Text] AS [Text],
[Project2].[PostedAt] AS [PostedAt],
[Project2].[BlogId] AS [BlogId],
[Project2].[UserId] AS [UserId],
[Project2].[C3] AS [C3],
[Project2].[C2] AS [C4],
[Project2].[Id2] AS [Id2],
[Project2].[Name] AS [Name],
[Project2].[Email] AS [Email],
[Project2].[HomePage] AS [HomePage],
[Project2].[Ip] AS [Ip],
[Project2].[Text1] AS [Text1],
[Project2].[PostId] AS [PostId]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Subtitle] AS [Subtitle],
[Extent1].[AllowsComments] AS [AllowsComments],
[Extent1].[CreatedAt] AS [CreatedAt],
1 AS [C1],
[Project1].[Id] AS [Id1],
[Project1].[Title] AS [Title1],
[Project1].[Text] AS [Text],
[Project1].[PostedAt] AS [PostedAt],
[Project1].[BlogId] AS [BlogId],
[Project1].[UserId] AS [UserId],
[Project1].[Id1] AS [Id2],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email],
[Project1].[HomePage] AS [HomePage],
[Project1].[Ip] AS [Ip],
[Project1].[Text1] AS [Text1],
[Project1].[PostId] AS [PostId],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C2],
CASE
WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
ELSE CASE
WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END
END AS [C3],
[Project1].[C1] AS [C4]
FROM [dbo].[Blogs] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id],
[Extent2].[Title] AS [Title],
[Extent2].[Text] AS [Text],
[Extent2].[PostedAt] AS [PostedAt],
[Extent2].[BlogId] AS [BlogId],
[Extent2].[UserId] AS [UserId],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Email] AS [Email],
[Extent3].[HomePage] AS [HomePage],
[Extent3].[Ip] AS [Ip],
[Extent3].[Text] AS [Text1],
[Extent3].[PostId] AS [PostId],
1 AS [C1]
FROM [dbo].[Posts] AS [Extent2]
LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
ON [Extent1].[Id] = [Project1].[BlogId]
WHERE 1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
[Project2].[C4] ASC,
[Project2].[Id1] ASC,
[Project2].[C3] ASC
The query plan for this seems overly complicated:
If you look closely, you’ll see that it generates a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.
I am not going to offer commentary on the results, but open a discussion on them.
Mission of Honor is now available
I just finished listening to At All Costs for the second or third time, and now Mission of Honor, the next book in the Honor Harrington series, is out!!!!
I have an article to finish and code to write, but don’t expect to hear from me much for the next day or two.
I LOVE Kindle, the best purchase I made in the last 5 years. It means that I can relax in bed and read a book I ordered two minutes ago!!!
And yes, as you can see by the !!!! I am excited.
Wednesday, February 03, 2010
Lessons learned from building NHibernate Profiler – 24th Feb, London
Along with the NHibernate course that I’ll be giving in London next month, I’ll be doing a free session about lessons learned from building the NHibernate Profiler.
I am going to talk about architecture, internal design (including showing off the code), distributed team, release per commit, making technical decisions based on business concerns, building real world application infrastructure, etc.
This is a free event, but the number of places is limited, so please register in advance.
Tuesday, February 02, 2010
DSLs in Boo discount offer
Manning is running a special today for my book, 50% off print & ebook versions.
Just use the following code: boo50
Note that this code is valid only for today, so hurry up!
It is less expensive to do it inefficiently!
This is a continuation of a twitter conversation that I had with Karl Seguin.
One of the problems for developers is that we tend to have a hard time distinguishing between the right thing from a technical perspective and the right thing from a business perspective.
One of the known issues with the profiler from the very start was how to handle large amounts of data. The profiler used to keep all data in memory, which put a hard limit on how much data it could manage. The right solution would have been to build persistence into the profiler from the get-go. It would eliminate an entire class of problems, after all, and we knew that we would have to get there in the end.
It is also, quite incidentally, the wrong decision to make. By keeping everything in memory, we significantly reduced the complexity that we had to deal with during the development of the profiler. It let us concentrate on getting features out the door and get to the point where people actually pay for it.
That decision later cost me about two weeks of complex coding, and I would do it again in a heartbeat. One very important thing to remember is that in a project, every action that you take has a certain ROI value. Spending those two weeks earlier in the game would mean that I wouldn’t be able to provide as many features, get the required feedback and actually get some money in so we could continue development.
Of special interest are those comments:
The problem with this approach is that it makes an erroneous assumption: that your time is free. This assumption is erroneous because even if you don’t pay yourself, your time represents an opportunity cost, since you could have used it to implement some other feature.
Let us try playing with Karl’s numbers for a minute, okay? Let us take a low end outsourcing hourly rate as our example, 20$ per hour.
We have two solutions in front of us, one will take about a week to develop, but would be three times less efficient than the alternative, which would take four weeks to develop. Remember, in most cases, the inefficient solution is simpler. It takes a lot of thought and sometimes complexity to figure out how to do something in the most efficient way possible.
In both cases, we are talking about a two-man team, so the total cost for the first solution is 2x40x20$ = 1,600$, while the cost for the second one is 2x160x20$ = 6,400$. The difference is 4,800$. That is a lot of money.
Amazon EC2 Double Extra Large costs (sounds like a McDonald’s order, doesn’t it?) are:
- 1.20$ / hour – Linux
- 1.44$ / hour – Windows
We will use the Windows number, since it is higher, and let us see what the results are. In the first solution, we consume 6 hours per day for this feature. Using the second one, we consume 2 hours per day. Let us plot the numbers over a period of two months, shall we?
This is a true no brainer, right? The more efficient solution (blue) handily thumps the less efficient (red) one. But what happens when we factor in the development cost?
Now the story looks far different, right?
It takes over 800 days for the more efficient solution to gain over the less efficient one. Think about this, 800 days in which the less efficient solution is less expensive. That is two years and two months.
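The break-even point follows directly from the numbers above, and a small sketch makes the arithmetic easy to replay with your own rates. It is written in Python for brevity, the function names are made up for illustration, and it assumes a 40 hour work week (as the 2x40x20$ figure does) and that the feature runs every day.

```python
HOURS_PER_WEEK = 40  # assumed work week, matching the 2x40x20$ figure


def development_cost(developers, weeks, hourly_rate):
    """Total labor cost for a team working full weeks."""
    return developers * weeks * HOURS_PER_WEEK * hourly_rate


def break_even_days(extra_dev_cost, daily_saving):
    """Days until the cheaper-to-run solution has paid back its extra labor."""
    return extra_dev_cost / daily_saving


hourly_rate = 20.0  # low end outsourcing rate, $ per hour
ec2_rate = 1.44     # Windows instance, $ per hour

inefficient_dev = development_cost(2, 1, hourly_rate)  # 1,600$
efficient_dev = development_cost(2, 4, hourly_rate)    # 6,400$

inefficient_daily = 6 * ec2_rate  # 6 machine hours per day
efficient_daily = 2 * ec2_rate    # 2 machine hours per day

days = break_even_days(efficient_dev - inefficient_dev,
                       inefficient_daily - efficient_daily)
print(round(days))  # 833
```

The saving is only 5.76$ per day, which is why 4,800$ of extra labor takes so long to recover.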
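Let us plug in more realistic numbers for the cost of labor, shall we? Let us say that the labor cost is 80$ per hour (which is still cheap). At that point, the difference in development cost is 19,200$, and at a saving of 5.76$ per day it would take over 3,300 days, which is more than 9 years.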
Nitpicker corner:
- The inefficient solution has to be adequate, of course, if it isn’t, it isn’t considered.
- Even if the inefficient solution is an order of magnitude worse than the efficient one, it will still be cheaper for a year!
- Yes, I am ignoring the refactoring cost:
- Inefficient – 1 week
- Efficient – 4 weeks
- Inefficient now, refactoring in a year – 1 week now, 7 weeks in a year.
That would look something like this:
But, and this is important, after a year, we know whether we:
- actually need it?
- can afford it?
I would choose the inefficient solution every time.
In other words, we have technical debt vs. monetary debt and time debt here. And in most cases, getting to the point when you have a cash flow is the most significant target you should approach.
Monday, February 01, 2010
My secret project: Alexandria
They say a picture is worth a thousand words:
Just to head off the obvious, this is a sample application, not a real world project.
Three brownie points to the first guy who figures out why I posted this, though.
Sunday, January 31, 2010
Linq to SQL Profiler release is upcoming
Following the tradition of choosing meaningful calendar dates for my releases (although the first few cases were accidental), the Linq to SQL Profiler will be released in a 1.0 version on the 14th of February.
At that time, the beta discount will be discontinued, so hurry up and show Linq to SQL that you love it by buying the profiler.
Saturday, January 30, 2010
Linq to SQL Profiler Video
The guys from CodeSmith have just put out a sample video showing how to use Linq to SQL Profiler.
I love it!
Friday, January 29, 2010
What is the story behind the Entity Framework vs. NHibernate posts?
A while ago I posted several posts about EF vs. NH. They generated quite a bit of commentary, but while I enjoyed the discussion, I had an ulterior motive for doing so.
I wanted to use this as a way to research the actual features that people would like to see in NHibernate.
Thursday, January 28, 2010
NHibernate new feature: No proxy associations
About three weeks ago I introduced the problem of ghost objects in NHibernate. In short, given the following model:

This code will not produce the expected result:
var comment = s.Get<Comment>(8454);
if(comment.Post is Article)
{
//
}
You can check the actual post for the details; it is related to proxying and to when NHibernate decides to load a lazy loaded instance. In short, however, comment.Post is a lazy loaded object, and NHibernate, at this point in time, has no idea what it is. But since it must return something, it returns a proxy of Post, which will load the actual instance when needed. That leads to some problems when you want to down cast the value.
Well, I got fed up with explaining about this and set about to fix the issue. NHibernate now contains the following option:
<many-to-one name="Post" lazy="no-proxy"/>
When lazy is set to no-proxy, the following things happen:
- The association is still lazy loaded (note that in older versions of NHibernate, setting it to no-proxy would trigger eager loading, this is no longer the case).
- The first time that you access the property the value will be loaded from the database, and the actual type will be returned.
In short, this should completely resolve the issue.
However, note the key phrase here: like lazy properties, this works by intercepting the property load, so if you want to take advantage of this feature you should use the property to access the value.
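The difference between the two behaviors can be modeled in a few lines. This is a rough sketch in Python rather than NHibernate code: `DATABASE`, `PostProxy` and both comment classes are hypothetical stand-ins, and the real implementation intercepts the property through a generated proxy of the owning entity instead of hand-written properties.

```python
class Post:
    def __init__(self, post_id, title):
        self.id = post_id
        self.title = title


class Article(Post):
    pass


# Hypothetical stand-in for the Posts table: row 42 is really an Article.
DATABASE = {42: Article(42, "Ghost objects")}


class PostProxy(Post):
    """Created before the row is read, so only the base type Post is known."""

    def __init__(self, post_id):
        self.id = post_id
        self._target = None

    def load(self):
        if self._target is None:
            self._target = DATABASE[self.id]
        return self._target

    @property
    def title(self):
        return self.load().title


class CommentWithProxy:
    """Default lazy association: hands out the proxy (the ghost object)."""

    def __init__(self, post_id):
        self.post = PostProxy(post_id)


class CommentNoProxy:
    """no-proxy style: the property access itself triggers the load."""

    def __init__(self, post_id):
        self._post_id = post_id
        self._post = None  # backing field, empty until the property is used

    @property
    def post(self):
        if self._post is None:
            self._post = DATABASE[self._post_id]
        return self._post


if __name__ == "__main__":
    print(isinstance(CommentWithProxy(42).post, Article))  # False
    print(isinstance(CommentNoProxy(42).post, Article))    # True
```

Reading the backing field `_post` directly returns nothing useful until the property has been used, which is the reason for going through the property.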
Wednesday, January 27, 2010
NHibernate new feature: Lazy Properties
This feature is now available on the NHibernate trunk. Please note that it is currently only available when using the Castle Proxy Factory.
Lazy properties is a very simple feature. Let us go back to my usual blog example, and take a look at the Post entity:
As you can see, it is a pretty simple example, but we have a problem. The Text property may contain a lot of text, and we don’t want to load that unless we explicitly ask for it.
If we try to execute this code:
var post = session.CreateQuery("from Post")
.SetMaxResults(1)
.UniqueResult<Post>();
You can see from the SQL that NHibernate will load the Text property. For large columns (text, images, etc), the cost of loading a column value can be prohibitive, and should be avoided unless absolutely needed.
This new feature allows you to mark a specific property as lazy, like this:
<property name="Text" lazy="true"/>
Once that is done, we can try querying for posts:
var post = session.CreateQuery("from Post")
.SetMaxResults(1)
.UniqueResult<Post>();
System.Console.WriteLine(post.Text);
And the resulting SQL is going to be:
Note that we aren’t loading the Text property when we query for the post, and if we inspect the stack trace of the second query we can see it being generated from the Console.WriteLine call.
But what if we want to query for posts with their Text property? Doing it this way may very well lead to SELECT N+1 if we need to load the Text property of all the posts. NHibernate provides an HQL hint to allow this:
var post = session.CreateQuery("from Post fetch all properties")
.SetMaxResults(1)
.UniqueResult<Post>();
System.Console.WriteLine(post.Text);
Which will result in the following SQL:
What about multiple lazy properties? NHibernate supports them, but you need to keep one thing in mind. NHibernate will load all the entity’s lazy properties, not just the one that was immediately accessed. By that same token, you can’t eagerly load just some of an entity’s lazy properties from HQL.
This feature is mostly meant for unique circumstances, such as Person.Image, Post.Text, etc. As usual, be cautious about overusing it.
One last word of caution: this feature is implemented via property interception (and not field interception, like in Hibernate). That was a conscious decision, because we didn’t want to add a bytecode weaving requirement to NHibernate. What this means is that if you mark a property as lazy, it must be a virtual automatic property. If you attempt to access the underlying field value, instead of going through the property, you will circumvent the lazy loading of the property, and may get unexpected results.
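Two of the behaviors described above (all lazy properties load together, and reading the underlying field skips the load) can be modeled with a small sketch. It is Python, not NHibernate: the `Post` class, the extra `summary` property, the `load_lazy_columns` loader and the `queries` log are all hypothetical and exist only to show the shape of property interception.

```python
queries = []  # records every simulated trip to the database


def load_lazy_columns(post_id):
    """Hypothetical loader: fetches all lazy columns of a post in one query."""
    queries.append(post_id)
    return {"text": "A very long body...", "summary": "Short summary"}


class Post:
    def __init__(self, loader, post_id, title):
        self.id = post_id
        self.title = title         # eager column, available immediately
        self._loader = loader
        self._lazy_values = None   # underlying field, empty until first access

    def _load_lazy(self):
        # The first access to any lazy property loads all of them.
        if self._lazy_values is None:
            self._lazy_values = self._loader(self.id)
        return self._lazy_values

    @property
    def text(self):
        return self._load_lazy()["text"]

    @property
    def summary(self):
        return self._load_lazy()["summary"]


if __name__ == "__main__":
    post = Post(load_lazy_columns, 7, "Hello")
    print(post._lazy_values)  # None: the field bypasses the lazy load
    print(post.text)          # triggers the single load
    print(post.summary)       # served from the values already loaded
    print(len(queries))       # 1
```

Code that reads `_lazy_values` directly sees an empty field, which is the "unexpected results" case: the interception only happens when the property itself is used.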
Tuesday, January 26, 2010
Profiler New Feature: Side by Side diff
The profiler could do session diffs (showing the difference between executed statements between two sessions) for a while now, but we got some requests for changing it to follow a more traditional source control diff style.
This is now done, and it should make it easier to understand the changes between two sessions:

Monday, January 25, 2010
Encapsulation is the enemy of the user interface
I got this question a while ago from Kyle, and I think it is a great one. It is especially great since it is an exchange of emails that resulted in the following (all of which are Kyle’s words):
I've been annoyed lately by the MVVM pattern. It seems like it requires that the data on your business classes be public so that the view-model can get at it, and that completely breaks encapsulation and goes against standard OO design theory (in my opinion).
The UI layer should be allowed to reference the data layer. I recalled a post you wrote where your UI needs to basically pull things out of queries and such directly (that's what I understood it to mean, anyway). I'm not sure how to pull this off easily just yet, because it seems like it would still break encapsulation somewhere down the line, but it's an interesting thought.
And yeah, I realized after sending the email about CQS. I've decided that my preferred way is actually having my model be able to create a view-model. It's still not pretty, but it's much better (in my view) than having all public data on my business models. I can use commands to bind directly to the model, and the view-model can cause that to happen correctly.
I thought about CQS more, and have a really nice way of doing the whole shebang, I think. It does kind of use your "Two different models for read vs write" concept. I've even come up with a little pseudo-enterprisey application to write using this design style. You'll like it - it's a Netflix for books [[netflix for books is a library]], essentially.
My answer to that is that Kyle is correct. On the one hand, we have the needs of the UI to show information, and on the other hand, we want to have good encapsulation for our business entities. The UI forces us to expose information to the user, and that encourages property-laden models. The problem with this approach is that often we try to make use of the same model for several tasks, such as using business entities for the user interface, or even asking the business entities to generate the view models that represent them.
CQS is a design methodology that is aimed at resolving this conflict. At its heart, it is actually very simple. It simply stipulates that you are going to have two different models for representing your data: one for reads (queries) and another for writes (commands). Once we accept that, we can see that we can evolve each of those models independently. And then we get to the point where we see that the data storage mechanism that we use for each model can be optimized independently for each use case.
For example, when using commands, we generally perform lookups by primary key alone, so we can avoid the overhead of indexes, or even select a storage format that is suitable for key based lookups (DHT, for example), while updating the query data store as a background process, which allows the entire system to stay stable under a high degree of stress.
In other words, once we have split the responsibilities of the system up so we don’t overload the responsibilities of a single model to be both read and write capable, we are in a much better position to shape the way we handle our software.
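A minimal sketch of that split might look like the following. It is Python for brevity, every class and function name is invented for the example, and the `sync` function is only a stand-in for whatever background process would update the query store in a real system.

```python
from dataclasses import dataclass


@dataclass
class RenameBlog:
    """A command: it describes an intent, it does not expose entity state."""
    blog_id: int
    new_title: str


class BlogWriteModel:
    """Command side: state stays private, lookups are by primary key only."""

    def __init__(self):
        self._blogs = {}
        self.pending_events = []

    def create(self, blog_id, title):
        self._blogs[blog_id] = {"title": title}
        self.pending_events.append((blog_id, title))

    def handle(self, command):
        blog = self._blogs[command.blog_id]  # key based lookup, no query needed
        blog["title"] = command.new_title
        self.pending_events.append((command.blog_id, command.new_title))


class BlogReadModel:
    """Query side: flat, display-ready rows the UI can bind to freely."""

    def __init__(self):
        self.rows = {}

    def apply(self, events):
        for blog_id, title in events:
            self.rows[blog_id] = {"id": blog_id, "display_title": title}


def sync(write_model, read_model):
    """Stand-in for the background process that updates the query store."""
    read_model.apply(write_model.pending_events)
    write_model.pending_events.clear()


if __name__ == "__main__":
    writes, reads = BlogWriteModel(), BlogReadModel()
    writes.create(1, "Old title")
    writes.handle(RenameBlog(blog_id=1, new_title="New title"))
    sync(writes, reads)
    print(reads.rows[1]["display_title"])  # New title
```

The write model never has to make its fields public for the sake of the UI, and the read model can be reshaped for a screen without touching any business rules.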
Sunday, January 24, 2010
The paradox of choice: best of breed or cheapest of the bunch
Roy Osherove has a few tweets about commercial tools vs. free ones in the .NET space. I’ll let his tweets serve as the background story for this post:
The backdrop is that Roy seems to be frustrated with the lack of adoption of what he considers to be better tools when there are free tools that deal with the same problem, even if they are inferior to the commercial tools. The example that he uses is Final Builder vs. NAnt/Rake.
As someone who is writing both commercial and free tools, I am obviously very interested in both sides of the argument. I am going to accept, for the purpose of the argument, that the commercial tool X does more than the free tool Y that deals with the same problem. Now, let us see what the motivations are for picking either one of those.
With a free tool, you can (usually) download it and start playing around with it immediately. With commercial products, you need to pay (usually after the trial is over), which means that in most companies, you need to justify yourself to someone, get approval, and generally deal with things that you would rather not do. In other words, the barrier for entry is significantly higher for commercial products. I actually did the math a while ago, and the conclusion was that good commercial products usually pay for themselves in a short amount of time.
But, when you have a free tool in the same space, the question becomes more complex. Roy seems to think that if the commercial product does more than the free one, you should prefer it. My approach is slightly different. I think that if the commercial product solves a pain point or remove friction that you encounter with the free product, you should get it.
Let us go back to Final Builder vs. NAnt. Let us say that it is going to take me 2 hours to setup a build using Final Builder and 8 hours to setup the same build using NAnt. It seems obvious that Final Builder is the better choice, right? But if I have to spend 4 hours to justify buying Final Builder, the numbers are drastically different. And that is a conservative estimate.
Worse, let us say that I am an open minded guy who has used NAnt in the past. I know that it would take ~8 hours to set up the build using NAnt, and I am pretty sure that I can find a better tool to do the work. However, doing a proper evaluation of all the build tools out there is going to take three weeks. Can I really justify that to my client?
As the author of a commercial product, it is my duty to make sure that people are aware that I am going to fix their pain points. If I have a product that is significantly better than a free product, but isn’t significantly better at reducing pain, I am not going to succeed. The target in the product design (and later in the product marketing) is to identify and resolve pain points for the user.
Another point that I want to bring up is the importance of professional networks in bringing information to us. No one can really keep track of all the things that are going on in the industry, and I have come to rely more & more on the opinions of the people in my social network to evaluate and consider alternatives in areas that aren’t causing me acute pain. That allows me to stay on top of things and learn what is going on at an “executive brief” level. It also lets me concentrate on the things that are acute to me, knowing that other people running into other problems will explore other areas and bring their results to my attention.
Saturday, January 23, 2010
#
DSLs in Boo is out!
It has been quite a journey for me, starting in 2007(!) and lasting until about a month ago, when the final revision came out. I am very happy to announce that my book is now available in its final form.
When I actually got the book in my hands I was ecstatic. That represents about two years’ worth of work, and some pretty tough hurdles to cross (think about the challenge of editing something the size of a book written in my English). And getting the content right was even harder.
On the one hand, I wanted to write something that is actionable. My success criterion for the book is that after reading it, you can go ahead and write production worthy Domain Specific Language implementations. On the other hand, I didn’t want to leave the reader without the theoretical foundation that is required to understand what is actually going on.
Looking back at this, I think that I managed to get that done well enough. The total page count is ~350 pages, and without the index & appendixes, it is just about 300 pages. Which, I hope, is big enough to give you working knowledge without bogging you down with too much theory.
Friday, January 22, 2010
#
Rejecting Dependency Injection Inversion
Uncle Bob has a post about why you should limit your use of IoC containers. I read that post with something very close to trepidation, because the first example that I saw told me a lot about the underlying assumptions made when this post was written.
Just to give you an idea about how many problems there are with this example when you want to talk about IoC in general, I made a small (albeit incomplete) list:
- The example is a class that has two dependencies, which themselves have no dependencies.
- There is manual mapping between services and their implementations.
- All services share the same life span.
- The container is used via the Service Locator pattern.
Now, moving to the concrete parts of the post, I mostly agree that this is an anti pattern, but not because the code is using IoC. The code is actually misusing it quite badly, and trying to draw conclusions about the practice of IoC from that sample (or ones similar to it) is like saying that we should abolish SQL because an example using string concatenation has security issues.
I am not really sure about the practices of IoC usage on the Java side, but in the .NET world, that sort of code has been frowned upon for at least 4 or 5 years. The .Net IoC community has been very loud about how you should use an IoC. We have been saying for a long time that the appropriate place to get instances from the IoC is deep in the bowels of the application infrastructure. A good example of that is the ASP.Net MVC Controller Factory, which is the only place in the application that will make use of the container directly.
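To illustrate the shape of that idea, here is a rough sketch in Python rather than C#, just to keep it short and self-contained. Everything in it (`Container`, `ControllerFactory`, `OrderController`, `OrderRepository`) is a hypothetical stand-in, not the ASP.Net MVC or any real container API. The point is only that the controller never sees the container; the factory is the one place that calls `resolve`.

```python
class Container:
    """Toy container (hypothetical): maps a type to a factory that builds it."""

    def __init__(self):
        self._factories = {}

    def register(self, service, factory):
        self._factories[service] = factory

    def resolve(self, service):
        return self._factories[service](self)


class OrderRepository:
    pass


class OrderController:
    # Dependencies arrive through the constructor; the controller
    # has no idea that a container exists.
    def __init__(self, repository):
        self.repository = repository


class ControllerFactory:
    """The single piece of infrastructure that talks to the container."""

    def __init__(self, container):
        self._container = container

    def create(self, controller_type):
        return self._container.resolve(controller_type)


container = Container()
container.register(OrderRepository, lambda c: OrderRepository())
container.register(
    OrderController,
    lambda c: OrderController(c.resolve(OrderRepository)),
)

factory = ControllerFactory(container)
controller = factory.create(OrderController)
```

Application code only ever asks for what it needs in its constructor, so swapping the container (or removing it) touches the factory and nothing else.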
Now that this takes care of the direct dependency on the container, let us talk about a dependency graph that has more than a single level to it. Here is something that is still fairly simplistic:
I colored all the things that share the same instance and those that do not. Trying to keep track of those manually, or through factories, would be a pure nightmare. Just try to imagine just how much code you are going to need to do that.
Furthermore, what about when we have different life spans for different components (logger is singleton, database is per request, tracking service is per session, etc.)? At this point you raise the complexity of the hand rolled solution by an order of magnitude once again. Using an IoC, on the other hand, means that you just need to configure things properly.
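As a minimal sketch of what "configure things properly" can look like, here is a toy container in Python with three life spans. The lifestyle names (`"singleton"`, `"scoped"`, `"transient"`), the classes, and the use of a plain dict as the per-request scope are all assumptions made for the illustration; real containers expose this differently.

```python
class Container:
    """Toy container (hypothetical) that tracks a life span per service."""

    def __init__(self):
        self._registrations = {}  # service -> (factory, lifestyle)
        self._singletons = {}     # instances shared for the whole application

    def register(self, service, factory, lifestyle="transient"):
        self._registrations[service] = (factory, lifestyle)

    def resolve(self, service, scope=None):
        factory, lifestyle = self._registrations[service]
        if lifestyle == "singleton":
            cache = self._singletons
        elif lifestyle == "scoped":
            if scope is None:
                raise ValueError("scoped service resolved outside a scope")
            cache = scope  # one dict per request (or per session)
        else:
            return factory(self, scope)  # transient: always a new instance
        if service not in cache:
            cache[service] = factory(self, scope)
        return cache[service]


class Logger:
    pass


class Database:
    pass


class Handler:
    def __init__(self, logger, database):
        self.logger = logger
        self.database = database


container = Container()
container.register(Logger, lambda c, s: Logger(), "singleton")
container.register(Database, lambda c, s: Database(), "scoped")
container.register(
    Handler,
    lambda c, s: Handler(c.resolve(Logger, s), c.resolve(Database, s)),
)

request_1, request_2 = {}, {}
first = container.resolve(Handler, request_1)
second = container.resolve(Handler, request_1)
other = container.resolve(Handler, request_2)
```

Here `first` and `second` are different handlers that share one database for the request, `other` gets its own database, and all three share the logger. The life span of each component is a single word in its registration rather than hand written bookkeeping.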
Which leads me to the next issue: manually mapping between services and their implementations is something that we more or less stopped doing circa 2006. All containers in the .Net space support some form of auto registration, which means that usually we don’t have to do anything to get things working.
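Here is a rough idea of what auto registration buys you, again as a Python sketch with hypothetical names. The convention assumed here is that a class is registered under itself and under every base class it declares, and that constructor parameters are resolved from their type annotations. Real .Net containers typically scan an assembly instead of taking a list of classes, so treat this as the shape of the idea and not as any container's actual API.

```python
import inspect


class Logger:
    """Plays the role of the service interface."""


class ConsoleLogger(Logger):
    pass


class OrderService:
    def __init__(self, logger: Logger):
        self.logger = logger


class Container:
    """Toy container (hypothetical) with convention based registration."""

    def __init__(self):
        self._implementations = {}

    def register_all(self, *classes):
        # Register each class under itself and its declared bases,
        # skipping `object` at the end of the MRO.
        for cls in classes:
            for service in cls.__mro__[:-1]:
                self._implementations.setdefault(service, cls)

    def resolve(self, service):
        cls = self._implementations[service]
        kwargs = {}
        for name, param in inspect.signature(cls).parameters.items():
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if param.annotation is not inspect.Parameter.empty:
                # Recursively build whatever the constructor asks for.
                kwargs[name] = self.resolve(param.annotation)
        return cls(**kwargs)


container = Container()
container.register_all(ConsoleLogger, OrderService)
service = container.resolve(OrderService)
```

Nothing maps `Logger` to `ConsoleLogger` by hand; adding another service means writing the class and nothing else.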
As I said, I am not really sure what the status is in the Java world, but I have to say that while the issues that Uncle Bob pointed out in the post are real, the root cause isn’t the use of IoC, it is the example he was working with. And if this is a typical example of IoC usage in the Java world, then he should peek over the fence to see how IoC is commonly implemented in the .Net space.
Thursday, January 21, 2010
#
Core NHibernate Course in London, 24th February
Well, it is about that time again :-)
In about a month I’ll be returning to the UK to give another round of my NHibernate Course. It has been a while since I gave it in London, but the previous two runs were very successful, and I had a great time teaching it.
This course is meant to give you working knowledge of how to effectively use NHibernate in your applications, based on real world expertise.
You can register here: http://skillsmatter.com/course/open-source-dot-net/core-persistence-with-nhibernate
Stupid support emails, #4
I am having a friendly competition with a friend about the stupidest support questions that we get from random people we never met. I posted about this previously, but I really can’t resist posting the content of something that I just received. It is either that or figure out how to send a nuke via email.
Hi,
Im f[removed] from Malaysia…. I has look at ur website about SOA. Did u do SOA application? Actually im a new learner, my knowledge are 0 about programming. I take an e-commerce course, at here I just do like a practical for 6 month and now I need to do on SOA. If u can help, I need some information about SOA, how to integrate the application, service and database. I help u can help me…
Thank you,
Best Regards;
F[removed]
This is the actual email text, I merely removed this guy’s name.
Therefore, I unilaterally declare myself the winner of the stupidest support emails contest.
When the design violates the principle of least surprise, you don’t close it as By Design
I don’t actually have an opinion about the feature itself, but I felt that I just had to comment on this post, from Brad Wilson, about the [Required] attribute in ASP.Net MVC 2.
Approximately once every 21.12 seconds, someone will ask this question on the ASP.NET MVC forums
…
The answer is the title of this blog post. ([Required] Doesn’t Mean What You Think It Does)
If this is the case, I have to say that the design of [Required] is misleading, and should be changed to match the expectations of the users.
We have a pretty common case of plenty of users finding this behavior problematic. The answer isn’t to try to educate the users, the answer is to fix the design so it isn’t misleading.
I am pretty sure that when the spec for the feature was written, it made sense, but that doesn’t mean that it works in the real world. I think it should either be fixed, or removed. Leaving this in would be a constant tripwire for people to stumble over.
Sunday, January 17, 2010
#
Army Reserve Duty
I’m currently on the way to several days of Army Reserve Duty, with limited to no internet connectivity.
I am making this announcement because the last few times I dropped offline for some time people started speculating that I am dead, which I sort of resent.