Ayende @ Rahien

Refunds available at head office

Linq to SQL Profiler release is upcoming

Following the tradition of choosing meaningful calendar dates for my releases (although the first few were accidental), the Linq to SQL Profiler will be released as version 1.0 on February 14th.

At that time, the beta discount will be discontinued, so hurry up and show Linq to SQL that you love it by buying the profiler.

NHibernate new feature: No proxy associations

About three weeks ago I introduced the problem of ghost objects in NHibernate. In short, given the following model:

[Image: class diagram — a Comment entity with a polymorphic Post association, where a post may be a Post or an Article]

This code will not produce the expected result:

var comment = s.Get<Comment>(8454);
if(comment.Post is Article)
{
   //
}

You can check the actual post for the details; it relates to proxying and to when NHibernate decides to load a lazy loaded instance. In short, however, comment.Post is a lazy loaded object, and NHibernate, at this point in time, has no idea what it is. But since it must return something, it returns a proxy of Post, which will load the actual instance when needed. That leads to some problems when you want to downcast the value.

Well, I got fed up with explaining this and set about fixing the issue. NHibernate now contains the following option:

<many-to-one name="Post" lazy="no-proxy"/>

When lazy is set to no-proxy, the following things happen:

  • The association is still lazy loaded (note that in older versions of NHibernate, setting it to no-proxy would trigger eager loading; this is no longer the case).
  • The first time that you access the property the value will be loaded from the database, and the actual type will be returned.

In short, this should completely resolve the issue.
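
For example, going back to the snippet from the start of this post (a sketch; the null check stands in for whatever you would actually do with the article):

var comment = s.Get<Comment>(8454);
var article = comment.Post as Article; // first access loads the real type
if (article != null)
{
    // we got the concrete Article back, not a Post proxy
}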

However, note the key phrase here: like lazy properties, this works by intercepting property access, so if you want to take advantage of this feature you should use the property to access the value.

NHibernate new feature: Lazy Properties

This feature is now available on the NHibernate trunk. Please note that it is currently only available when using the Castle Proxy Factory.

Lazy properties is a very simple feature. Let us go back to my usual blog example, and take a look at the Post entity:

[Image: class diagram of the Post entity, including the potentially large Text property]

As you can see, it is a pretty simple example, but we have a problem. The Text property may contain a lot of text, and we don’t want to load that unless we explicitly ask for it.

If we try to execute this code:

var post = session.CreateQuery("from Post")
    .SetMaxResults(1)
    .UniqueResult<Post>();

You can see from the SQL that NHibernate will load the Text property. For large columns (text, images, etc.), the cost of loading the column value can be prohibitive, and should be avoided unless absolutely needed.

[Image: SQL log — the generated select includes the Text column]

This new feature allows you to mark a specific property as lazy, like this:

<property name="Text" lazy="true"/>

Once that is done, we can try querying for posts:

var post = session.CreateQuery("from Post")
    .SetMaxResults(1)
    .UniqueResult<Post>();

System.Console.WriteLine(post.Text);

And the resulting SQL is going to be:

[Image: SQL log — the initial select omits the Text column; a second select loads it when the property is accessed]

Note that we aren’t loading the Text property when we query for the post, and if we inspect the stack trace of the second query we can see it being generated from the Console.WriteLine call.

But what if we want to query for posts with their Text property? Doing it this way may very well lead to a SELECT N+1 if we need to load all the posts’ Text properties. NHibernate provides an HQL hint to allow this:

var post = session.CreateQuery("from Post fetch all properties")
    .SetMaxResults(1)
    .UniqueResult<Post>();

System.Console.WriteLine(post.Text);

Which will result in the following SQL:

[Image: SQL log — a single select that includes the Text column]

What about multiple lazy properties? NHibernate supports them, but you need to keep one thing in mind. NHibernate will load all the entity’s lazy properties, not just the one that was immediately accessed. By the same token, you can’t eagerly load just some of an entity’s lazy properties from HQL.

This feature is mostly meant for unique circumstances, such as Person.Image, Post.Text, etc. As usual, be cautious about overusing it.

One last word of caution: this feature is implemented via property interception (and not field interception, as in Hibernate). That was a conscious decision, because we didn’t want to add a bytecode weaving requirement to NHibernate. What this means is that if you mark a property as lazy, it must be a virtual automatic property. If you access the underlying field value instead of going through the property, you will circumvent the lazy loading of the property, and may get unexpected results.
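
To make that concrete, here is a minimal sketch of what a lazy-eligible entity might look like (the class shape here is illustrative, not taken from the mapping above):

public class Post
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }

    // Mapped with lazy="true"; this must be a virtual automatic property
    // so the runtime proxy can intercept access and load it on demand.
    public virtual string Text { get; set; }
}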

Encapsulation is the enemy of the user interface

I got this question a while ago from Kyle, and I think it is a great one. It is especially great since it is an exchange of emails that resulted in the following (all of which are Kyle’s words):

I've been annoyed lately by the MVVM pattern. It seems like it requires that the data on your business classes be public so that the view-model can get at it, and that completely breaks encapsulation and goes against standard OO design theory (in my opinion).

The UI layer should be allowed to reference the data layer. I recalled a post you wrote where your UI needs to basically pull things out of queries and such directly (that's what I understood it to mean, anyway). I'm not sure how to pull this off easily just yet, because it seems like it would still break encapsulation somewhere down the line, but it's an interesting thought.

And yeah, I realized after sending the email about CQS. I've decided that my preferred way is actually having my model be able to create a view-model. It's still not pretty, but it's much better (in my view) than having all public data on my business models. I can use commands to bind directly to the model, and the view-model can cause that to happen correctly.

I thought about CQS more, and have a really nice way of doing the whole shebang, I think. It does kind of use your "Two different models for read vs write" concept. I've even come up with a little pseudo-enterprisey application to write using this design style. You'll like it - it's a Netflix for books [[netflix for books is a library]], essentially.

My answer to that is that Kyle is correct. On the one hand, we have the needs of the UI to show information, and on the other hand, we want to have good encapsulation for our business entities. The UI forces us to expose information to the user, and that encourages property-laden models. The problem with this approach is that often we try to make use of the same model for several tasks, such as using business entities for the user interface, or even asking the business entities to generate the view models that they represent.

CQS is a design methodology that is aimed at resolving this conflict. At its heart, it is actually very simple: it stipulates that you are going to have two different models, one for reads (queries) and another for writes (commands). Once we accept that, we can see that we can evolve each of those models independently. And then we get to the point where we see that the data storage mechanism that we use for each model can be optimized independently for each use case.

For example, when using commands, we generally perform lookups by primary key alone, so we can avoid the overhead of indexes, or even select a storage format that is suitable for key based lookups (a DHT, for example), while updating the query data store as a background process, which allows the entire system to stay stable under a high degree of stress.

In other words, once we have split the responsibilities of the system up so we don’t overload the responsibilities of a single model to be both read and write capable, we are in a much better position to shape the way we handle our software.
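
As a minimal sketch of that split (all names here are illustrative, not taken from Kyle’s application), the write model guards its invariants while the read model is a flat, bindable shape:

// Write side: encapsulated, exposing behavior rather than data.
public class Post
{
    private string title;

    public void Rename(string newTitle)
    {
        if (string.IsNullOrEmpty(newTitle))
            throw new ArgumentException("A post must have a title");
        title = newTitle;
    }
}

// Read side: a dumb, property-laden shape, built for the view.
public class PostViewModel
{
    public int Id { get; set; }
    public string Title { get; set; }
}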

The paradox of choice: best of breed or cheapest of the bunch

Roy Osherove has a few tweets about commercial tools vs. free ones in the .NET space. I’ll let his tweets serve as the background story for this post:

[Image: Roy Osherove’s tweets about commercial vs. free tools]

The backdrop is that Roy seems to be frustrated by the lack of adoption of what he considers to be better tools when free tools deal with the same problem, even if they are inferior to the commercial ones. The example that he uses is Final Builder vs. NAnt/Rake.

As someone who writes both commercial and free tools, I am obviously very interested in both sides of the argument. I am going to accept, for the purpose of the argument, that the commercial tool X does more than the free tool Y that deals with the same problem. Now, let us see what the motivations are for picking either one of those.

With a free tool, you can (usually) download it and start playing around with it immediately. With commercial products, you need to pay (usually after the trial is over), which means that in most companies you need to justify yourself to someone, get approval, and generally deal with things that you would rather not do. In other words, the barrier to entry is significantly higher for commercial products. I actually did the math a while ago, and the conclusion was that good commercial products usually pay for themselves in a short amount of time.

But when you have a free tool in the same space, the question becomes more complex. Roy seems to think that if the commercial product does more than the free one, you should prefer it. My approach is slightly different. I think that if the commercial product solves a pain point or removes friction that you encounter with the free product, you should get it.

Let us go back to Final Builder vs. NAnt. Let us say that it is going to take me 2 hours to set up a build using Final Builder and 8 hours to set up the same build using NAnt. It seems obvious that Final Builder is the better choice, right? But if I have to spend 4 hours justifying the purchase of Final Builder, the numbers are drastically different. And that is a conservative estimate.

Worse, let us say that I am an open minded guy who has used NAnt in the past. I know that it would take ~8 hours to set up the build using NAnt, and I am pretty sure that I can find a better tool to do the work. However, doing a proper evaluation of all the build tools out there is going to take three weeks. Can I really justify that to my client?

As the author of a commercial product, it is my duty to make sure that people are aware that I am going to fix their pain points. If I have a product that is significantly better than a free product, but isn’t significantly better at reducing pain, I am not going to succeed. The target in the product design (and later in the product marketing) is to identify and resolve pain points for the user.

Another point that I want to bring up is the importance of professional networks in bringing information to us. No one can really keep track of all the things that are going on in the industry, and I have come to rely more & more on the opinions of the people in my social network to evaluate and consider alternatives in areas that aren’t causing me acute pain. That allows me to stay on top of things and learn what is going on at an “executive brief” level. It also allows me to concentrate on the things that are acute to me, knowing that other people running into other problems will explore other areas and bring their results to my attention.

DSLs in Boo is out!

It has been quite a journey for me, starting in 2007(!) up until about a month ago, when the final revision went out. I am very happy to announce that my book is now available in its final form.

When I actually got the book in my hands I was ecstatic. It represents about two years’ worth of work, and some pretty tough hurdles to cross (think of the challenge that editing something the size of a book written in my English poses). And getting the content right was even harder.

On the one hand, I wanted to write something that is actionable; my success criterion for the book is that after reading it, you can go ahead and write production worthy Domain Specific Language implementations. On the other hand, I didn’t want to leave the reader without the theoretical foundation that is required to understand what is actually going on.

Looking back at this, I think that I managed to get that done well enough. The total page count is ~350 pages, and without the index & appendixes, it is just about 300 pages. Which, I hope, is big enough to give you working knowledge without bogging you down with too much theory.

Rejecting Dependency Injection Inversion

Uncle Bob has a post about why you should limit your use of IoC containers. I read that post with something very close to trepidation, because the first example that I saw told me a lot about the underlying assumptions made when this post was written.

Just to give you an idea about how many problems there are with this example when you want to talk about IoC in general, I made a small (albeit incomplete) list:

  • The example is a class that has two dependencies, which themselves have no dependencies.
  • There is manual mapping between services and their implementations.
  • All services share the same life span.
  • The container is accessed using the Service Locator pattern.

Now, moving to the concrete parts of the post, I mostly agree that this is an anti pattern, but not because the code is using IoC. The code is actually misusing it quite badly, and trying to draw conclusions about the practice of IoC from that sample (or ones similar to it) is like saying that we should abolish SQL because an example using string concatenation has security issues.

I am not really sure about the practices of IoC usage on the Java side, but in the .NET world, that sort of code has been frowned upon for at least 4 or 5 years. The .NET IoC community has been very loud about how you should use an IoC. We have been saying for a long time that the appropriate place to get instances from the IoC is deep in the bowels of the application infrastructure. A good example of that is the ASP.NET MVC Controller Factory, which is the only place in the application that will make use of the container directly.
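
To illustrate (a sketch only, with Windsor-flavored names; the exact signatures depend on your container and MVC version), such a controller factory might look like this:

public class WindsorControllerFactory : DefaultControllerFactory
{
    private readonly IWindsorContainer container;

    public WindsorControllerFactory(IWindsorContainer container)
    {
        this.container = container;
    }

    protected override IController GetControllerInstance(
        RequestContext requestContext, Type controllerType)
    {
        // The container resolves the controller together with its entire
        // dependency graph; nothing else in the app touches the container.
        return (IController)container.Resolve(controllerType);
    }
}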

Now, that takes care of the direct dependency on the container; let us talk about a dependency graph that has more than a single level to it. Here is something that is still fairly simplistic:

[Image: a multi-level dependency graph, with components colored by whether they share the same instance]

I colored all the things that share the same instance and those that do not. Trying to keep track of those manually, or through factories, would be a pure nightmare. Just try to imagine how much code you would need to do that.

Furthermore, what about when we have different life spans for different components (the logger is a singleton, the database is per request, the tracking service is per session, etc.)? At this point you raise the complexity of the hand rolled solution by an order of magnitude once again. Using an IoC, on the other hand, means that you just need to configure things properly.

Which leads me to the next issue: manually mapping between services and their implementations is something that we more or less stopped doing circa 2006. All containers in the .NET space support some form of auto registration, which means that usually we don’t have to do anything to get things working.
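
For example (a Windsor-flavored sketch; registration APIs differ between containers and versions), convention based registration can look like this:

// Register every controller in this assembly by convention, instead of
// mapping each service to its implementation by hand.
container.Register(
    AllTypes.FromThisAssembly()
        .BasedOn<IController>()
        .Configure(c => c.LifeStyle.Transient));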

As I said, I am not really sure what the status is in the Java world, but I have to say that while the issues that Uncle Bob pointed out in the post are real, the root cause isn’t the use of IoC, it is the example he was working with. And if this is a typical example of IoC usage in the Java world, then he should peek over the fence to see how IoC is commonly used in the .NET space.

Core NHibernate Course in London, 24th February

Well, it is about that time again :-)

In about a month I’ll be returning to the UK to give another round of my NHibernate course. It has been a while since I gave it in London, but the previous two runs were very successful, and I had a great time teaching it.

This course is meant to give you working knowledge of how to use NHibernate effectively in your applications, based on real world expertise.

You can register here: http://skillsmatter.com/course/open-source-dot-net/core-persistence-with-nhibernate

Stupid support emails, #4

I am having a friendly competition with a friend about the stupidest support questions that we get from random people we have never met. I posted about this previously, but I really can’t resist posting the content of something that I just received; it is either that or figuring out how to send a nuke via email.

Hi,

Im f[removed] from Malaysia…. I has look at ur website about SOA. Did u do SOA application? Actually im a new learner, my knowledge are 0 about programming. I take an e-commerce course, at here I just do like a practical for 6 month and now I need to do on SOA. If u can help, I need some information about SOA, how to integrate the application, service and database. I help u can help me…

Thank you,

Best Regards;

F[removed]

This is the actual email text, I merely removed the guy’s name.

Therefore, I unilaterally declare myself the winner of the stupidest support emails contest.

When the design violates the principle of least surprise, you don’t close it as By Design

I don’t actually have an opinion about the feature itself, but I felt that I just had to comment on this post, from Brad Wilson, about the [Required] attribute in ASP.NET MVC 2.

Approximately once every 21.12 seconds, someone will ask this question on the ASP.NET MVC forums

The answer is the title of this blog post. ([Required] Doesn’t Mean What You Think It Does)

If this is the case, I have to say that the design of [Required] is misleading, and should be changed to match the expectations of the users.

We have a pretty common case of plenty of users finding this behavior problematic. The answer isn’t to try to educate the users; the answer is to fix the design so it isn’t misleading.

I am pretty sure that when the spec for the feature was written, it made sense, but that doesn’t mean that it works in the real world. I think it should either be fixed, or removed. Leaving this in would be a constant tripwire that people will fall into.

Army Reserve Duty

I’m currently on the way to several days of Army Reserve Duty, with limited to none internet connectivity.

I am making this announcement because the last few times I dropped offline for some time people started speculating that I am dead, which I sort of resent.

Eagerly loading entity associations efficiently with NHibernate

One of the things that seems to pop up frequently is people wanting to load an entity with all of its associations eagerly. That is pretty easy to do when the associations are many-to-one (that is, there is only one of them for each root entity). Examples of those would be things like Owner, Site, etc.

Here is an HQL query that would load a blog with its owner and site as well:

from Blog b left join fetch b.Owner left join fetch b.Site

The problem starts when you try to do the same for multiple collection associations. NHibernate allows you to do so, but the result is probably not what you would initially expect. This query, for example:

from Blog b left join fetch b.Posts left join fetch b.Users where b.Id = :id

Will result in the following SQL statement:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       user3_.Id             as Id5_2_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__,
       user3_.Password       as Password5_2_,
       user3_.Username       as Username5_2_,
       user3_.Email          as Email5_2_,
       user3_.CreatedAt      as CreatedAt5_2_,
       user3_.Bio            as Bio5_2_,
       users2_.BlogId        as BlogId1__,
       users2_.UserId        as UserId1__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
       left outer join UsersBlogs users2_
         on blog0_.Id = users2_.BlogId
       left outer join Users user3_
         on users2_.UserId = user3_.Id
where  blog0_.Id = 1 /* @p0 */

Something that may not be immediately apparent is that this is going to result in a Cartesian product: a blog with 10 posts and 5 users comes back as 50 rows, each post duplicated for every user. This is pointed out in the documentation, but I think that we can all agree that while there may be reasons for this behavior, it is far from ideal.

Let us look at what other OR/Ms are doing, shall we?

The comparable query using Entity Framework would look like this:

db.Blogs
    .Include("Posts")
    .Include("Users")
    .Where(x => x.Id == i)
    .ToArray();

And the resulting SQL would be:

SELECT   [UnionAll1].[Id]             AS [C1],
         [UnionAll1].[Title]          AS [C2],
         [UnionAll1].[Subtitle]       AS [C3],
         [UnionAll1].[AllowsComments] AS [C4],
         [UnionAll1].[CreatedAt]      AS [C5],
         [UnionAll1].[C2]             AS [C6],
         [UnionAll1].[C1]             AS [C7],
         [UnionAll1].[C3]             AS [C8],
         [UnionAll1].[Id1]            AS [C9],
         [UnionAll1].[Title1]         AS [C10],
         [UnionAll1].[Text]           AS [C11],
         [UnionAll1].[PostedAt]       AS [C12],
         [UnionAll1].[BlogId]         AS [C13],
         [UnionAll1].[UserId]         AS [C14],
         [UnionAll1].[C4]             AS [C15],
         [UnionAll1].[C5]             AS [C16],
         [UnionAll1].[C6]             AS [C17],
         [UnionAll1].[C7]             AS [C18],
         [UnionAll1].[C8]             AS [C19],
         [UnionAll1].[C9]             AS [C20]
FROM     (SELECT CASE 
                   WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
                   ELSE 1
                 END AS [C1],
                 [Extent1].[Id]             AS [Id],
                 [Extent1].[Title]          AS [Title],
                 [Extent1].[Subtitle]       AS [Subtitle],
                 [Extent1].[AllowsComments] AS [AllowsComments],
                 [Extent1].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C2],
                 CASE 
                   WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
                   ELSE 1
                 END AS [C3],
                 [Extent2].[Id]             AS [Id1],
                 [Extent2].[Title]          AS [Title1],
                 [Extent2].[Text]           AS [Text],
                 [Extent2].[PostedAt]       AS [PostedAt],
                 [Extent2].[BlogId]         AS [BlogId],
                 [Extent2].[UserId]         AS [UserId],
                 CAST(NULL AS int)          AS [C4],
                 CAST(NULL AS varbinary(1)) AS [C5],
                 CAST(NULL AS varchar(1))   AS [C6],
                 CAST(NULL AS varchar(1))   AS [C7],
                 CAST(NULL AS datetime)     AS [C8],
                 CAST(NULL AS varchar(1))   AS [C9]
          FROM   [dbo].[Blogs] AS [Extent1]
                 LEFT OUTER JOIN [dbo].[Posts] AS [Extent2]
                   ON [Extent1].[Id] = [Extent2].[BlogId]
          WHERE  [Extent1].[Id] = 1 /* @p__linq__1 */
          UNION ALL
          SELECT 2                          AS [C1],
                 [Extent3].[Id]             AS [Id],
                 [Extent3].[Title]          AS [Title],
                 [Extent3].[Subtitle]       AS [Subtitle],
                 [Extent3].[AllowsComments] AS [AllowsComments],
                 [Extent3].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C2],
                 CAST(NULL AS int)          AS [C3],
                 CAST(NULL AS int)          AS [C4],
                 CAST(NULL AS varchar(1))   AS [C5],
                 CAST(NULL AS varchar(1))   AS [C6],
                 CAST(NULL AS datetime)     AS [C7],
                 CAST(NULL AS int)          AS [C8],
                 CAST(NULL AS int)          AS [C9],
                 [Join2].[Id]               AS [Id1],
                 [Join2].[Password]         AS [Password],
                 [Join2].[Username]         AS [Username],
                 [Join2].[Email]            AS [Email],
                 [Join2].[CreatedAt]        AS [CreatedAt1],
                 [Join2].[Bio]              AS [Bio]
          FROM   [dbo].[Blogs] AS [Extent3]
                 INNER JOIN (SELECT [Extent4].[UserId]    AS [UserId],
                                    [Extent4].[BlogId]    AS [BlogId],
                                    [Extent5].[Id]        AS [Id],
                                    [Extent5].[Password]  AS [Password],
                                    [Extent5].[Username]  AS [Username],
                                    [Extent5].[Email]     AS [Email],
                                    [Extent5].[CreatedAt] AS [CreatedAt],
                                    [Extent5].[Bio]       AS [Bio]
                             FROM   [dbo].[UsersBlogs] AS [Extent4]
                                    INNER JOIN [dbo].[Users] AS [Extent5]
                                      ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2]
                   ON [Extent3].[Id] = [Join2].[BlogId]
          WHERE  [Extent3].[Id] = 1 /* @p__linq__1 */) AS [UnionAll1]
ORDER BY [UnionAll1].[Id] ASC,
         [UnionAll1].[C1] ASC

At this point, I am pretty sure, your eyes shut down in self defense. This is one complex query, but basically it is complex because EF is executing the following two queries and unioning them.

Eager load Blog Posts:

SELECT CASE 
       WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
       ELSE 1
     END AS [C1],
     [Extent1].[Id]             AS [Id],
     [Extent1].[Title]          AS [Title],
     [Extent1].[Subtitle]       AS [Subtitle],
     [Extent1].[AllowsComments] AS [AllowsComments],
     [Extent1].[CreatedAt]      AS [CreatedAt],
     1                          AS [C2],
     CASE 
       WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int)
       ELSE 1
     END AS [C3],
     [Extent2].[Id]             AS [Id1],
     [Extent2].[Title]          AS [Title1],
     [Extent2].[Text]           AS [Text],
     [Extent2].[PostedAt]       AS [PostedAt],
     [Extent2].[BlogId]         AS [BlogId],
     [Extent2].[UserId]         AS [UserId],
     CAST(NULL AS int)          AS [C4],
     CAST(NULL AS varbinary(1)) AS [C5],
     CAST(NULL AS varchar(1))   AS [C6],
     CAST(NULL AS varchar(1))   AS [C7],
     CAST(NULL AS datetime)     AS [C8],
     CAST(NULL AS varchar(1))   AS [C9]
FROM   [dbo].[Blogs] AS [Extent1]
     LEFT OUTER JOIN [dbo].[Posts] AS [Extent2]
       ON [Extent1].[Id] = [Extent2].[BlogId]
WHERE  [Extent1].[Id] = 1 /* @p__linq__1 */

Eager load Blog Users:

SELECT 2                          AS [C1],
     [Extent3].[Id]             AS [Id],
     [Extent3].[Title]          AS [Title],
     [Extent3].[Subtitle]       AS [Subtitle],
     [Extent3].[AllowsComments] AS [AllowsComments],
     [Extent3].[CreatedAt]      AS [CreatedAt],
     1                          AS [C2],
     CAST(NULL AS int)          AS [C3],
     CAST(NULL AS int)          AS [C4],
     CAST(NULL AS varchar(1))   AS [C5],
     CAST(NULL AS varchar(1))   AS [C6],
     CAST(NULL AS datetime)     AS [C7],
     CAST(NULL AS int)          AS [C8],
     CAST(NULL AS int)          AS [C9],
     [Join2].[Id]               AS [Id1],
     [Join2].[Password]         AS [Password],
     [Join2].[Username]         AS [Username],
     [Join2].[Email]            AS [Email],
     [Join2].[CreatedAt]        AS [CreatedAt1],
     [Join2].[Bio]              AS [Bio]
FROM   [dbo].[Blogs] AS [Extent3]
     INNER JOIN (SELECT [Extent4].[UserId]    AS [UserId],
                        [Extent4].[BlogId]    AS [BlogId],
                        [Extent5].[Id]        AS [Id],
                        [Extent5].[Password]  AS [Password],
                        [Extent5].[Username]  AS [Username],
                        [Extent5].[Email]     AS [Email],
                        [Extent5].[CreatedAt] AS [CreatedAt],
                        [Extent5].[Bio]       AS [Bio]
                 FROM   [dbo].[UsersBlogs] AS [Extent4]
                        INNER JOIN [dbo].[Users] AS [Extent5]
                          ON [Extent5].[Id] = [Extent4].[UserId]) AS [Join2]
       ON [Extent3].[Id] = [Join2].[BlogId]
WHERE  [Extent3].[Id] = 1 /* @p__linq__1 */

The query may be complex, but it gets the job done, and does so without bothering us much. The question is, can we do the same with NHibernate?

As it turns out, we can, pretty easily too. The following code will do just that:

 var blogs = s.CreateQuery("from Blog b left join fetch b.Posts where b.Id = :id")
     .SetParameter("id", 1)
     .Future<Blog>();

 s.CreateQuery("from Blog b left join fetch b.Users where b.Id = :id1")
     .SetParameter("id1", 1)
     .Future<Blog>();

So, what is going on here? We are actually issuing two queries, each of them eagerly loading a single collection. The trick is that we are using future queries to do so. That means that the SQL that is going to be sent to the database to get those results is:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
where  blog0_.Id = 1 /* @p0 */
select blog0_.Id             as Id7_0_,
       user2_.Id             as Id5_1_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       user2_.Password       as Password5_1_,
       user2_.Username       as Username5_1_,
       user2_.Email          as Email5_1_,
       user2_.CreatedAt      as CreatedAt5_1_,
       user2_.Bio            as Bio5_1_,
       users1_.BlogId        as BlogId0__,
       users1_.UserId        as UserId0__
from   Blogs blog0_
       left outer join UsersBlogs users1_
         on blog0_.Id = users1_.BlogId
       left outer join Users user2_
         on users1_.UserId = user2_.Id
where  blog0_.Id = 1 /* @p1 */

Note that we are essentially discarding the results of the second query. The reason for that is that we aren’t actually interested in those results; we execute this query solely to get NHibernate to fill the Users collection of the relevant entity.

I generally use this method so that the first query eagerly loads all the many-to-one associations, and then a series of queries (one per collection that needs loading) loads the one-to-many or many-to-many associations.
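
Putting the whole pattern together (a sketch; the Owner and Site associations are borrowed from the earlier example), it might look like this:

// One query for the root and its many-to-one associations...
var blogs = s.CreateQuery("from Blog b left join fetch b.Owner left join fetch b.Site where b.Id = :id")
    .SetParameter("id", 1)
    .Future<Blog>();

// ...and one future query per collection that we want filled.
s.CreateQuery("from Blog b left join fetch b.Posts where b.Id = :id1")
    .SetParameter("id1", 1)
    .Future<Blog>();

s.CreateQuery("from Blog b left join fetch b.Users where b.Id = :id2")
    .SetParameter("id2", 1)
    .Future<Blog>();

// Enumerating the first result set sends all the queries in one round trip.
var blog = blogs.First();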

WPF WebBrowser and JavaScript interaction

I found myself needing to interact with a JavaScript API in a WebBrowser hosted in a WPF application. That turned out to be quite a challenge to figure out, but I prevailed :-)

Given the following XAML:

<Window x:Class="WpfApplication1.Window1"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    Title="Window1" Height="300" Width="300">
    <Grid>
      <WebBrowser x:Name="Browser" />
    </Grid>
</Window>

You can interact with the JavaScript API using this code:

public partial class Window1 : Window
{
    public Window1()
    {
        InitializeComponent();
        Browser.LoadCompleted += BrowserOnLoadCompleted;
        Browser.Navigate(new Uri("http://example.com"));
    }

    private void BrowserOnLoadCompleted(object sender, NavigationEventArgs navigationEventArgs)
    {
        // Requires a reference to Microsoft.mshtml and using directives
        // for System.Linq and mshtml.
        var doc = (HTMLDocument)Browser.Document;
        var head = doc.getElementsByTagName("head").Cast<HTMLHeadElement>().First();
        var script = (IHTMLScriptElement)doc.createElement("script");
        script.text = "alert('hi');";
        head.appendChild((IHTMLDOMNode)script);
        // Setting text again on the already-attached element executes the new command
        script.text = "alert('bye');";
    }
}

This allows you to inject JavaScript commands into the browser by manipulating the script.text property. It isn’t generally useful, I’ll admit, but for my scenario, it seems to be doing the job.

My Java Experience

As I mentioned before, I am actively trying to find out what I don’t know, and plug it. As part of that endeavor, I spent the last week learning Java EE. I decided that in the interest of saving time I was going to invest some money, and I took a week long course in it. That allowed me to kill several birds with a single stone: I got to experience Java in an environment with an instructor that I could call on for help, it allowed me to learn Java EE, and it allowed me to experience how people using the Hibernate Profiler feel.

Now that it is over, I can say that the course has met my expectations to a T. But that isn’t the point of this post. I wanted to talk about what I learned and my experience.

Nitpicker corner: please note that I am talking only about Java EE and EJB 3.0. I am well aware of the existence of frameworks that build on top of that (and earlier versions), but right now I intend to discuss Naked Java EE.

From a language perspective, working with Java was incredibly frustrating. I had not really realized how much I had come to rely on such things as object initializers, the var keyword or the null coalescing operator.

As a platform, I have a much better understanding of how the Java EE ecosystem developed. Basically, it works like this (the concepts do not translate exactly, mind you):

Java EE    | .NET
Servlet    | Http Handler – but with singleton semantics & with threading issues.
JSP        | ASPX – but without any code behind.
Listener   | Http Module

One of the things that really drove me crazy is the notion of making everything a singleton. I wonder if there was a time when object creation in Java was very expensive, which might have resulted in this design. The problem with making everything a singleton is that it becomes shared state, which means that you need to handle concurrency yourself. This is a bad thing. In fact, I would go further and say that any framework that requires users to handle concurrency for standard scenarios is flawed.

When I learned that you have to wire each independent servlet in web.xml, I was quite shocked. The sheer friction that this is going to add to a project is mind numbing. Imagine the camera zooming out, and the full picture showing up. There are usually very few servlets in an application, often only one. They handle the request by sending the data to an EJB to handle the business end and a JSP to handle the HTML generation end. If that sounds a lot like MVC, it should. The interesting bit is that I don’t think that all the Java MVC frameworks came out because of anything unique in the Java sphere. They came out of sheer self defense, given the amount of toil & trouble that you have to go through using what bare bones Java EE gives you.

Moving on to what I would call the controller side of things, we have EJBs, Enterprise Java Beans. There are Session Beans and Message Driven Beans. The MDB reminded me quite strongly of a very basic service bus. Compared to something like Rhino ServiceBus or NServiceBus, it looks very similar in concept and execution, but without a lot of the things that those two do for you. Session Beans are supposed to handle requests; they are divided into stateful and stateless beans. In general, it seems that you would usually use the stateless beans.

A stateless bean is a class implementing a specific interface, and that is where a lot of the interesting things happen: dependency injection, automatic transaction management, session management, etc. It is interesting to note that with .NET we have gotten to the same result, but without having the overreaching presence of the container everywhere. I really like the fact that there is no attempt to disguise how they are doing things; Java’s language decision to make everything virtual by default has really paid off here.

Still, I can’t say that I like the default architecture, it seems very inflexible. I do like the idea of full blown servers that you can just deploy a file into. It is a very nice concept, but it has some severe downsides; the time that it takes to do a modify-deploy-test cycle is significantly larger than what I am used to on the .NET side of things. And keep in mind that I am talking about projects that had about 5 classes in them, all told.

During the course, the instructor said something that I found very insightful: “the tooling helps you deal with… [wiring the xml, inheriting from the appropriate interfaces, etc]”. I found this very telling, because up until then I had been quite puzzled by the behavior of all the standard wizards in Eclipse. They seem to violate the KISS principle, especially after getting used to the “Yes, Dear” experience of using R# on VS. It was only after I realized just how much work those poor wizards had to do for me that I understood what was going on.

After the course, I took a look at some of the MVC frameworks on the Java market. Just reading the tutorials is fascinating, from a code archaeology perspective. You can clearly see that Struts came early on; while I am sure it is an improvement over Naked Java EE, the amount of XML you have to deal with is not funny.

All in all, I find myself unimpressed by the amount of work that was shuffled off to the tools; it doesn’t seem right. It seems like a justification of a bad practice. When I consider my own design principles (Zero Friction!) in light of this, I am much happier that I am mainly working in the .NET world. But I think that having this understanding is going to be very helpful moving forward.

Designing the Entity Framework 2nd level cache

One of the things that I am working on is another commercial extension to EF, a 2nd level cache. At first, I thought I would implement something similar to the way NHibernate does it, that is, create two layers of caching: one for entity data and a second for query results, where I would store only scalar information and ids.

That turned out to be quite hard. In fact, it turned out to be hard enough that I almost gave up on it. Sometimes I feel that extending EF is like hitting your head against the wall: eventually either you collapse or the wall falls down, but either way you are left with a headache.

At any rate, I eventually figured out a way to get EF to tell me about entities in queries and now the following works:

// will hit the DB
using (var db = new Entities(conStr))
{
    db.Blogs.Where(x => x.Title.StartsWith("The")).FirstOrDefault();
}

// will NOT hit the DB, will use cached data for that
using(var db = new Entities(conStr))
{
   db.Blogs.Where(x => x.Id == 1).FirstOrDefault();
}

The ability to handle such scenarios is an important part of what makes the 2nd level cache useful, since it means that you aren’t limited to just caching a query, but can perform far more sophisticated caching. It means better cache freshness and a lot less unnecessary cache cleanups.

Next, I need to handle partially cached queries, cached query invalidation and a bunch of other minor details, but the main hurdle seems to have been dealt with (I am willing to lay odds that I will regret this statement).

UberProf new feature: Query Plan Cache Misuse

This is a new feature available for NHibernate Profiler*, Linq to SQL Profiler and Entity Profiler. Basically, it detects when the same query is executed with different parameter sizes, which generates different query plans in the query cache.

Let us say that we issue two queries to find users by name. (Note that I am using a syntax that will show you the size of the parameters, to demonstrate the problem.)

We can do this using the following queries.

exec sp_executesql 
      N'SELECT * FROM Users WHERE Username = @username',
      N'@username nvarchar(3)',
      @username=N'bob'
exec sp_executesql 
      N'SELECT * FROM Users WHERE Username = @username',
      N'@username nvarchar(4)',
      @username=N'john'

This sort of code results in two query plans stored in the database query cache, because of the different parameter sizes. In fact, if we assume that the Username column has a length of 16, this single query may take up 16 places in the query cache.

Worse, if you have two parameters whose sizes change, such as username (length 16) and password (length 16), you may take up to 256 places in the query cache. Obviously, if you use more parameters, or if their lengths are greater, the number of places that a single query can take in the query cache goes up rapidly.

This can cause performance problems, as the database needs to keep track of more query plans (using more memory) and may need to evict query plans from the cache, which would result in having to rebuild the query plan (increasing server load and query time).
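
The usual fix (a sketch in raw ADO.NET; the column length of 16 is carried over from the example above) is to give the parameter the column’s full size, so every execution shares a single plan:

// using System.Data; using System.Data.SqlClient;
var cmd = new SqlCommand("SELECT * FROM Users WHERE Username = @username", connection);
// Fixing the size at the column length means 'bob' and 'john'
// both hit the same cached plan.
cmd.Parameters.Add("@username", SqlDbType.NVarChar, 16).Value = "bob";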

* Please note that detecting this in NHibernate requires the trunk version of NHibernate. And it is pretty useless there, since on the trunk, NHibernate will never generate this issue.

The NIH dance

I started thinking about all the types of stuff that I have had to write or participate in, and I find it… interesting.

  1. Database – Rhino.DivanDB (hobby project).
  2. Data Access – Multitude of those.
  3. OR/M – NHibernate, obviously, but a few others as well.
  4. Distributed caching systems – NMemcached – several of those.
  5. Distributed queuing systems – Rhino Queues actually has ~7 different implementations.
  6. Distributed hash table – Rhino DHT is in production.
  7. Persistent hash tables – Rhino PHT, of course, but I actually had to write a different implementation for the profiler as well.
  8. Mocking framework – Rhino Mocks, obviously.
  9. Web frameworks – I am referring to MonoRail, although I only dabbled there, to be truthful. Rhino Igloo was a lot of fun, too, if only because I had to.
  10. Text templating language – Brail
  11. Inversion of Control containers – Windsor, and a few custom ones.
  12. AOP – I actually built several implementations; the most fun was with the Code DOM approach :-)
  13. Dynamic Proxies & IL weaving – Castle Dynamic Proxy, not the recommended way to learn IL, I must say.
  14. CMS systems – several, but I really like Impleo and the concept behind it.
  15. ETL system – Took three tries to get right.
  16. Security system – Rhino Security was fun to design, and quite interesting to implement.
  17. Licensing framework – because trying to buy one commercially just didn’t work.
  18. Service Bus – which I consider to be one of my best coding efforts.
  19. CI Server – so I can get good GitHub integration.
  20. Domain Specific Language framework – well, I did write the book on that :-)
  21. Source control server – SvnBridge

I haven’t written a testing framework, though.

I am probably forgetting a lot of stuff, actually…

How to opt out of Program Compatibility Assistant?

A recent change in the profiler has resulted in the following dialog showing up whenever you close the application on x64 Vista/Win7 machines.

[Image: the Program Compatibility Assistant dialog]

Just to be clear, I am not using Flash in any way, but something is triggering this check.

Basically, I think that somewhere a call like the one described here is made, checking for the presence of Flash, and that is what triggers the PCA dialog. That makes a sort of sense, mostly because we now shell out to IE to do some stuff for us (we use WPF’s built-in WebBrowser control).

Now, the documentation for that says:

PCA is intended to detect issues with older programs and not intended to monitor programs developed for Windows Vista and Windows Server 2008. The best option to exclude a program from PCA is to include, with the program, an application manifest with run level (either Administrator or as limited users) marking for UAC. This marking means the program is tested to work under UAC (and Windows Vista and Windows Server 2008). PCA checks for this manifest and will exclude the program. This process applies for both installer and regular programs.

The problem, however, is that even after I included the $@#$(@# manifest, it is still showing the bloody dialog.

I find it quite annoying. Here is the custom manifest that comes with the profiler.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <v3:trustInfo xmlns:v3="urn:schemas-microsoft-com:asm.v3">
    <v3:security>
      <v3:requestedPrivileges>
        <v3:requestedExecutionLevel level="asInvoker" uiAccess="false" />
      </v3:requestedPrivileges>
    </v3:security>
  </v3:trustInfo>
</assembly>

As far as I can see, it should work.

Any ideas?

And Twitter came to the rescue and told me that I need to specify that I am compatible with Win7. The current manifest, which fixes the issue, is:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <v3:trustInfo xmlns:v3="urn:schemas-microsoft-com:asm.v3">
    <v3:security>
      <v3:requestedPrivileges>
        <v3:requestedExecutionLevel level="asInvoker" uiAccess="false" />
      </v3:requestedPrivileges>
    </v3:security>
  </v3:trustInfo>
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <!--The ID below indicates application support for Windows Vista -->
      <supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
      <!--The ID below indicates application support for Windows 7 -->
      <supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
    </application>
  </compatibility>
</assembly>

I would like to thank Paul Betts for handing me the answer in less than 3 minutes.

As to the methods there may be a million, but principles are few

I ran across the following quote a while ago, and I found it quite interesting.

“As to the methods there may be a million and then some, but principles are few. The man who grasps principles can successfully select his own methods. The man who tries methods, ignoring principles, is sure to have trouble.”

- Ralph Waldo Emerson (1803-1882)

I have been programming, in one form or another, for about fifteen years, but I can put my finger on the precise moment at which I moved from mere dabbler to professional. That was in 1999, when I decided that I had had enough of toying with Pascal, VB6 & Java Applets. It was the height of the bubble, and I wanted to learn just enough to be able to get a job doing something that I enjoyed. I had about a year open to me, and I registered for a C/C++ course at a local college.

In hindsight, that was one of the best things that I have ever done. That course taught me C and pointers, and then C++ and OO. It also introduced me to concepts that I have been using ever since. Admittedly, I don’t want to look at any of my code from that time period, but that is probably a good thing :-) The most important part of the course was that it taught me how computers work, by introducing C first and forcing me to write my own implementation of any system call that I wanted to make.

I studied programming in High School as well, and I distinctly remember being utterly and completely baffled by strange things like dynamic memory and pointers. I mean, why not just allocate a bigger array? During that course I actually grasped pointers for the first time, and even looking back over the last couple of weeks, a lot of my recent performance work is directly based on things that I learned there.

After completing that course, I got several books to help me understand the fundamentals better. Operating Systems Design and Implementation, Modern Operating Systems and Operating System Concepts to understand how an operating system works, not just a single program. Win32 System Programming, which I read mainly to understand the environment in which I was working and Windows Sockets Network Programming, from which I learned the basic concepts of networking.

The common thread throughout all of them is that initially I focused on understanding the technical environment in which I was working, getting to understand how things work at a very low level. And while it may appear that understanding those low level details would be nice in terms of general education but have little relevance to where I am spending most of my time, that is quite inaccurate. The serialization system recently built for the profiler was heavily influenced by my reading of the OS books, for example.

For that matter, other good books are Practical File System Design, talking about the BeOS file system, which I found utterly fascinating, or Virtual Machine Design and Implementation in C/C++, which is a horrible book, but one that gave me the confidence to do a lot of things, since I saw how trivially simple it was to build such things.

Coming back to the quote at the beginning of this post, understanding the underlying principles has allowed me to approach a new technology with the confidence that I understand how it must work, because I understand the environment in which it works. Oh, there are a lot of details that you need to get, but once you have the conceptual model of a technology in mind, it is so much easier to get to grips with it.

Interestingly enough, I only got to software design books at a much later stage, and even today, I find the low level details quite fascinating, when I can get new material in a subject that is interesting to me.


The problem with compression & streaming

I spent some time today trying to optimize the amount of data the profiler is sending on the wire. My first thought was that I could simply wrap the output stream with a compressing stream; indeed, in my initial testing, it proved quite simple to do and reduced the amount of data being sent by a factor of 5. I played around a bit more and discovered that different compression implementations can bring me up to a factor of 50!

Unfortunately, I did all my initial testing on files, and while the profiler is able to read files just fine, it is most commonly used for live profiling, to see what is going on in the application right now. The problem here is that adding compression is a truly marvelous way to screw that up. Basically, I want to compress live data, and most compression libraries are not up for that task. It gets a bit more complex when you realize that what I actually wanted was a way to get compression to work on relatively small data chunks.

When you think about how most compression algorithms work (there is a dictionary in there somewhere), you realize what the problem is. You need to keep updating the dictionary while you are compressing the stream, and at the same time, you need the dictionary to uncompress things. That makes it… difficult to handle things. I thought about compressing small chunks (say, every 256Kb), but then I ran into problems figuring out when exactly I am supposed to flush them, how to handle partial messages, and more.
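
For reference, the chunked approach I was considering looks roughly like this (a sketch only; it ignores exactly the flushing and framing questions raised above):

// using System.IO; using System.IO.Compression;
// Compress each chunk with its own DeflateStream, so the receiver never
// needs dictionary state from a block it hasn't received yet. The price
// is a worse compression ratio, since the dictionary resets every chunk.
static byte[] CompressChunk(byte[] chunk)
{
    using (var output = new MemoryStream())
    {
        using (var deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            deflate.Write(chunk, 0, chunk.Length);
        }
        return output.ToArray();
    }
}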

In the end, I decided that while it was a very interesting trial run, this is not something that is likely to show good ROI.

NHibernate, polymorphic associations and ghost objects

One of the more interesting points of my posts about Entity Framework & NHibernate is the discovery of things that Entity Framework can do that NHibernate cannot. In fact, if you read the posts, instead of the comments, you can see that this is precisely what I asked for, but people didn’t really read the text.

I wanted to dedicate this post to ghost objects, and how NHibernate deals with them.

Before we start, let me explain what ghost objects are. Let us say that you have a many-to-one polymorphic association, such as the one represented by Comment.Post.

A post may be either a Post or an Article, and since NHibernate will lazy load the association by default, NHibernate will generate a proxy object (also called a ghost object). That, in turn, results in several common issues: leaking this and the inability to cast to the proper type are the most common ones.

In practice, this is something that you would generally run into when you are violating the Liskov Substitution Principle, so my general recommendation is to just fix your design.

Nevertheless, since the question pops up occasionally, I thought that I might write up a few more details on how to resolve this. Basically, the main issue is that at the point in time when we are loading the Comment entity, we don’t have enough information to know what the actual entity type is. The simplest way to work around this issue is to tell NHibernate to load the associated entity as part of the parent entity load.

In the case of the comment, we can do it like this:

<many-to-one name="Post" 
             lazy="false"
             outer-join="true"
             column="PostId"/>

The lazy=”false” tells NHibernate to load the association eagerly, while the outer-join attribute will add a join to load it in a single query. One thing to note, however, is that (by design) HQL queries will ignore any hints in the mapping, so you would have to specify join fetch explicitly in the query, otherwise NHibernate would generate a separate query for the association.
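
For example (a sketch using the same entities), an HQL query that wants the association loaded eagerly has to spell the fetch out:

// Without the explicit "join fetch", HQL ignores the mapping-level
// eager hint and issues a separate query for the Post.
var comments = s.CreateQuery("from Comment c left join fetch c.Post where c.Id = :id")
    .SetParameter("id", 8454)
    .List<Comment>();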

Since we eagerly load the associated entity, and we know its type, we don’t have to deal with any proxies, and can avoid the ghost objects problem completely.