Ayende @ Rahien


NHibernate vs. Entity Framework: Usage

This is in response to a question on Twitter:

image

In general, for applications, I would always use NHibernate. Mostly because I am able to do so much more with it.

For one off utilities and such, where I just need to get the data and I don’t really care how, I would generally just use the EF wizard to generate a model and do something with it. Mostly, it is so I can get the code gen stuff ASAP and have the data working.

For example, the routine to import the data from the Subtext database to the RaccoonBlog database is using Entity Framework, mostly because it is easier and it is a one time thing.

I guess that if I were using one of the tools that generate an NHibernate model from the database, I would be using NHibernate there as well, but honestly, it just doesn’t really matter at that point. I just don’t care for that sort of tool.

Is OR/M an anti pattern?

This article thinks so, and I was asked to comment on that. I have to say that I agree with a lot in this article. It starts by laying out what an anti pattern is:

  1. It initially appears to be beneficial, but in the long term has more bad consequences than good ones
  2. An alternative solution exists that is proven and repeatable

And then goes on to list some of the problems with OR/M:

  • Inadequate abstraction - The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details. The documentation of all the major ORM libraries is rife with references to SQL concepts.
  • Incorrect abstraction – …if your data is not relational, then you are adding a huge and unnecessary overhead by using SQL in the first place and then compounding the problem by adding a further abstraction layer on top of that.
    On the other hand, if your data is relational, then your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question.
  • Death by a thousand queries – …when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency. Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects.

If the article were about pointing out the problems in OR/M, I would have no issue endorsing it unreservedly. Many of the problems it points out are real. They can be mitigated quite nicely by someone who knows what they are doing, but that is beside the point.

I think that I am in a pretty unique position to answer this question. I have over 7 years of being heavily involved in the NHibernate project, and I have been living & breathing OR/M for all of that time. I have also created RavenDB, a NoSQL database, which gives me a good perspective on what it means to work with a non-relational store.

And like most criticisms of OR/M that I have heard over the years, this article does only half the job. It tells you what is good & bad (mostly bad) in OR/M, but it fails to point out something quite important.

To misquote Churchill, Object Relational Mapping is the worst form of accessing a relational database, except all of the other options when used for OLTP.

When I see people railing against the problems in OR/M, they usually point out quite correctly problems that are truly painful. But they never seem to remember all of the other problems that OR/M usually shields you from.

One alternative is to move away from Relational Databases. RavenDB and the RavenDB Client API have been specifically designed by us to overcome a lot of the limitations and pitfalls inherent in OR/M. We have been able to take advantage of all of our experience in the area and create what I consider to be a truly awesome experience.

But if you can’t move away from Relational Databases, what are the alternatives? Ad hoc SQL or Stored Procedures? You want to call that better?

A better alternative might be something like Massive, which is a very thin layer over SQL. But that suffers from a whole host of other issues (no unit of work means aliasing issues, no support for eager loading means a better chance of SELECT N+1, no easy way to handle migrations, etc.). There is a reason why OR/Ms have reached where they have. There are a lot of design decisions that simply cannot be made any other way without unacceptable tradeoffs.

From my perspective, that means that if you are using Relational Databases for OLTP, you are most likely best served with an OR/M. Now, if you want to move away from Relational Databases for OLTP, I would be quite happy to agree with you that this is the right move to make.

Moving away from the Active Record model

Fowler defines Active Record as:

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

Note that I am talking about the Active Record pattern and not a particular Active Record implementation.

A typical Active Record class will look like this:

image
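
If the image doesn’t render for you, here is a hedged sketch of the same shape, using the Person entity from the snippet further down (method bodies are stubs, not any particular framework’s implementation):

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }

    // persistence concerns live right on the entity
    public static Person FindBy(string column, object value)
    {
        // build and execute "SELECT ... WHERE <column> = @value", hydrate a Person
        throw new NotImplementedException();
    }

    public void Save()
    {
        // INSERT or UPDATE this single row
        throw new NotImplementedException();
    }
}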

By design, AR entities suffer from one major problem: they violate SRP, since the class is now responsible both for its own persistence and for the business operations that relate to it. But that is an acceptable decision in many cases, because it leads to code that is much simpler to understand.

In most cases, that is.

The problem with AR is that it also treats each individual entity independently. In other words, its usage encourages code like this:

var person = Person.FindBy("Id", 1);
person.Name = "Oren";
person.Save();

This is a very procedural way of going about things, but more importantly, it is a way that focuses on work done on a single entity.

When I compare the abilities of the Active Record style to the abilities of a modern ORM, there is really nothing that can compare. Things like automatic change tracking or batching writes to the database are part of the package with an ORM.
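
Contrast the snippet above with the ORM equivalent. A minimal sketch, assuming a configured NHibernate sessionFactory; the session’s unit of work tracks the change and flushes it on commit:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var person = session.Get<Person>(1);
    person.Name = "Oren";   // no explicit Save(); change tracking marks the entity dirty
    tx.Commit();            // flushes the UPDATE, batched with any other pending writes
}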

The sort of applications we write today aren’t touching a single row in the database; we tend to work with entities (especially aggregate roots) in operations that can touch many rows. Writing those sorts of applications using Active Record leads to complications, precisely because the pattern is meant to simplify data access. The problem is that our data access needs aren’t so simple anymore.

Active Record can still be useful for very simple scenarios, but I think that a major drive behind it was that it was so easy to implement compared to a more full featured solution. Since we don’t need to implement a full featured solution (it comes for free with your ORM of choice), that isn’t really a factor anymore. Using a full blown ORM is likely to be easier and simpler.

Playing with Entity Framework Code Only

After making EF Prof work with EF Code Only, I decided that I might take a look at how Code Only actually work from the perspective of the application developer. I am working on my own solution based on the following posts:

But since I don’t like to just read things, and I hate walkthroughs, I decided to take this down a slightly different path. In order to do that, I set myself the following goal:

image

  • Create a ToDo application with the following entities:
    • User
    • Actions (inheritance)
      • ToDo
      • Reminder
  • Query for:
    • All actions for user
    • All reminders for today across all users

That isn’t really a complex system, but my intention is to get to grips with how things work, and to see how much friction I encounter along the way.

We start by referencing “Microsoft.Data.Entity.Ctp” & “System.Data.Entity”.

There appears to be a wide range of options to define how entities should be mapped. This includes building them using a fluent interface, creating map classes, or auto mapping. All in all, the code shows a remarkable similarity to Fluent NHibernate, in spirit if not in actual API.
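
For example, the fluent style looks roughly like this. A sketch based on the CTP bits; HasKey and HasRequired are the methods I discuss below, but the exact shape may differ between builds, and the Reminder.User property is an assumption of mine:

var modelBuilder = new ModelBuilder();
modelBuilder.Entity<User>()
    .HasKey(u => u.Username);        // explicit key, overriding the Id convention
modelBuilder.Entity<Reminder>()
    .HasRequired(r => r.User);       // required many-to-one association
var model = modelBuilder.CreateModel();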

I don’t like some of the API:

  • HasRequired and HasKey, for example, seem awkwardly named to me, especially when they are used as part of a fluent sentence. I have long advocated avoiding the attempt to create real sentences in a fluent API (StructureMap was probably the worst in this regard). Dropping the Has prefix would be just as understandable, and would look better, IMO.
  • Why do we have both IsRequired and HasRequired? The previous comment applies, with the addition that having two similarly named methods that appear to do the same thing is probably not a good idea.

But aside from that, it appears very nice.

ObjectContext vs. DbContext

I am not sure why there are two of them, but I have a very big dislike of ObjectContext; the amount of code that you have to write to make it work is just ridiculous compared to the amount of code you have to write for DbContext.

I also strongly dislike the need to pass a DbConnection to the ObjectContext. The actual management of the connection is not within the scope of the application developer; that is within the scope of the infrastructure. Messing with DbConnection in application code should be left to very special circumstances and require swearing an oath of nonmaleficence. The DbContext doesn’t require that, which is another point in its favor.

Using the DbContext is nice:

public class ToDoContext : DbContext
{
    private static readonly DbModel model;

    static ToDoContext()
    {
        var modelBuilder = new ModelBuilder();
        modelBuilder.DiscoverEntitiesFromContext(typeof(ToDoContext));
        modelBuilder.Entity<User>().HasKey(x => x.Username);
        model = modelBuilder.CreateModel();
    }

    public ToDoContext():base(model)
    {
        
    }

    public DbSet<Action> Actions { get; set; }

    public DbSet<User> Users { get; set; }
}

Note that we can mix & match the configuration styles; some are auto mapped, some are explicitly stated. It appears that if you fully follow the built-in conventions, you don’t even need the ModelBuilder, as the model will be built for you automatically.

Let us try to run things:

using(var ctx = new ToDoContext())
{
    ctx.Users.ToList();
}

The connection string is specified in the app.config, by defining a connection string with the name of the context.
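
In other words, something like this (a sketch; the server and catalog names are placeholders):

<connectionStrings>
  <add name="ToDoContext"
       connectionString="Data Source=.;Initial Catalog=ToDo;Integrated Security=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>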

Then I just ran it, without creating a database. I expected it to fail, but it didn’t. Instead, it created the following schema:

image

That is a problem; DDL should never run as an implicit step. I couldn’t figure out how to disable that, though (but I didn’t look too hard). To be fair, this looks like it will only run if the database doesn’t exist (not merely if the tables aren’t there). But I would still make this an explicit step.
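
For what it’s worth, the API that eventually shipped in EF Code First lets you opt out of this explicitly; I don’t know that this CTP exposes it, so treat this as a sketch of the later releases rather than of the bits I was using:

// disable the implicit database creation / initialization step
Database.SetInitializer<ToDoContext>(null);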

The result of running the code is:

image

Now the time came to try executing my queries:

var actionsForUser = 
    (
        from action in ctx.Actions
        where action.User.Username == "Ayende"
        select action
    )
    .ToList();

var remindersForToday =
    (
        from reminder in ctx.Actions.OfType<Reminder>()
        where reminder.Date == DateTime.Today
        select reminder
    )
    .ToList();

Which resulted in:

image

That has been a pretty brief overview of Entity Framework Code Only, but I am impressed; the whole process has been remarkably friction free, and the time to go from nothing to a working model has been extremely short.

Slaying relational dragons

I recently had a fascinating support call, talking about how to optimize a very big model and an access pattern that basically required having the entire model in memory for performing certain operations.

A pleasant surprise was that it wasn’t horrible (when I get called, there is usually a mess), which is what made things interesting. In the space of two hours, we managed to:

  • Reduced number of queries by 90%.
  • Reduced size of queries by 52%.
  • Increased responsiveness by 60%, even for a data set an order of magnitude larger.

My default answer whenever I am asked when to use NHibernate is: Whenever you use a relational database.

My strong recommendation at the end of that support call? Don’t use a relational DB for what you are doing.

The ERD just below has absolutely nothing to do with the support call, but hopefully it will help make the example clear. Note that I dropped some of the association tables, to make it simpler.

image

And the scenario we have to deal with is this one:

image

Every single table in the ERD is touched by this screen.  Using a relational database, I would need something like the following to get all this data:

SELECT * 
FROM   Users 
WHERE  Id = @UserID 

SELECT * 
FROM   Subscriptions 
WHERE  UserId = @UserId 
       AND GETDATE() BETWEEN StartDate AND EndDate 

SELECT   MIN(CheckedBooks.CheckedAt), 
         Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     CheckedBooks 
         JOIN Books 
           ON BookId 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    CheckedBooks.UserId = @UserId 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    BookId IN (SELECT BookID 
                    FROM   QueuedBooks 
                    WHERE  UserId = @UserId) 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    BookId IN (SELECT BookID 
                    FROM   RecommendedBooks 
                    WHERE  UserId = @UserId) 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    Books.Name LIKE @search 
          OR Categories.Name LIKE @search 
          OR Reviews.Review LIKE @search 
GROUP BY BookId

Yes, this is a fairly simplistic approach, without de-normalization, and I would never perform searches in this manner, but… notice how complex things are getting. For bonus points, look at the fourth query; the queued books are ordered, so try to figure out how we can get the order in a meaningful way. I shudder to think about the execution plan of this set of queries, even if we ignore the last one, which does full text searching in the slowest possible way. And this is just for bringing the data for a single screen, assuming that magically it will show up (you need to do a lot of manipulation at the app level to make this happen).

The problem is simple: our data access pattern and the data storage technology that we use are at odds with one another. Relational modeling dictates normalization, but our actual data usage means that we aren’t dealing with single-row entities with relatively rare access to associations, which is the best case for OLTP. Nor are we dealing with set based logic, which is the best case for OLAP / relational queries.

Instead, we are dealing with an aggregate that spans multiple tables, mostly because we have no other way to express lists and many to many associations in a relational database.

Let us see how we could handle things if we were using a document or key/value database. We would have two aggregates, User and Book.

GetUser(userId) –> would result in:

image

We can now issue another query, to bring the associated books. GetBooks(153, 1337) would result in:

image

Note that the entire data structure is different, we haven’t just copied the normalized relational model, we have a totally different model. An aggregate (similar to DDD’s aggregate) is a single entity that contains anything except other aggregates. References to other aggregates are allowed (from user to all the books), but most of the entity’s data is stored as a single value.

That has several interesting implications. First, we need only two queries to get the data for the screen: one to get the user’s data, and a second to get the books that we need to display. Reducing remote calls is something that you really care about, and simplifying the queries to mere queries by id is going to have a significant effect as well.
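
In code, the entire screen boils down to something like this. A hedged sketch: the session API and the QueuedBooks property are hypothetical stand-ins for a document database client, not any specific product’s API:

var user = session.Load<User>(userId);            // one round trip: the whole user aggregate
var books = session.Load<Book>(user.QueuedBooks); // one round trip: all referenced books, by id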

By changing the data storage technology, we also enforced a very rigid aggregate boundary. Transactions become much simpler as well, since most transactions will now modify only a single aggregate, which is a single operation, no matter how many actual operations we perform on that aggregate. And by tailoring the data structure that we use to match our needs, we have natural aggregate boundaries.

The end result is a far simpler method of working with the data. It may mean that we have to do more work upfront, but look at the type of work we would have to do in order to try to solve our problems using the relational model. I know what model I would want for this sort of a problem.

If you are way off in the deep end, there is only so much that tooling can do for you

I get a lot of requests for what I term the regex problem. Why the regex problem?

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. – Jamie Zawinski, in comp.lang.emacs

A case in point, which comes up repeatedly, is this question:

Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?

In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.

Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.

First, Entity Framework, using this code:

db.Blogs
    .Include("Posts")
    .Include("Posts.Comments")
    .Where(x => x.Id == 1)
    .ToList();

This code will generate:

SELECT   [Project2].[Id]             AS [Id],
         [Project2].[Title]          AS [Title],
         [Project2].[Subtitle]       AS [Subtitle],
         [Project2].[AllowsComments] AS [AllowsComments],
         [Project2].[CreatedAt]      AS [CreatedAt],
         [Project2].[C1]             AS [C1],
         [Project2].[C4]             AS [C2],
         [Project2].[Id1]            AS [Id1],
         [Project2].[Title1]         AS [Title1],
         [Project2].[Text]           AS [Text],
         [Project2].[PostedAt]       AS [PostedAt],
         [Project2].[BlogId]         AS [BlogId],
         [Project2].[UserId]         AS [UserId],
         [Project2].[C3]             AS [C3],
         [Project2].[C2]             AS [C4],
         [Project2].[Id2]            AS [Id2],
         [Project2].[Name]           AS [Name],
         [Project2].[Email]          AS [Email],
         [Project2].[HomePage]       AS [HomePage],
         [Project2].[Ip]             AS [Ip],
         [Project2].[Text1]          AS [Text1],
         [Project2].[PostId]         AS [PostId]
FROM     (SELECT [Extent1].[Id]             AS [Id],
                 [Extent1].[Title]          AS [Title],
                 [Extent1].[Subtitle]       AS [Subtitle],
                 [Extent1].[AllowsComments] AS [AllowsComments],
                 [Extent1].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C1],
                 [Project1].[Id]            AS [Id1],
                 [Project1].[Title]         AS [Title1],
                 [Project1].[Text]          AS [Text],
                 [Project1].[PostedAt]      AS [PostedAt],
                 [Project1].[BlogId]        AS [BlogId],
                 [Project1].[UserId]        AS [UserId],
                 [Project1].[Id1]           AS [Id2],
                 [Project1].[Name]          AS [Name],
                 [Project1].[Email]         AS [Email],
                 [Project1].[HomePage]      AS [HomePage],
                 [Project1].[Ip]            AS [Ip],
                 [Project1].[Text1]         AS [Text1],
                 [Project1].[PostId]        AS [PostId],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C2],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C3],
                 [Project1].[C1]            AS [C4]
          FROM   [dbo].[Blogs] AS [Extent1]
                 LEFT OUTER JOIN (SELECT [Extent2].[Id]       AS [Id],
                                         [Extent2].[Title]    AS [Title],
                                         [Extent2].[Text]     AS [Text],
                                         [Extent2].[PostedAt] AS [PostedAt],
                                         [Extent2].[BlogId]   AS [BlogId],
                                         [Extent2].[UserId]   AS [UserId],
                                         [Extent3].[Id]       AS [Id1],
                                         [Extent3].[Name]     AS [Name],
                                         [Extent3].[Email]    AS [Email],
                                         [Extent3].[HomePage] AS [HomePage],
                                         [Extent3].[Ip]       AS [Ip],
                                         [Extent3].[Text]     AS [Text1],
                                         [Extent3].[PostId]   AS [PostId],
                                         1                    AS [C1]
                                  FROM   [dbo].[Posts] AS [Extent2]
                                         LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
                                           ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
                   ON [Extent1].[Id] = [Project1].[BlogId]
          WHERE  1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
         [Project2].[C4] ASC,
         [Project2].[Id1] ASC,
         [Project2].[C3] ASC

If you look closely, you’ll see that it generates a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.

What about NHibernate? The following code:

var blogs = s.CreateQuery(
    @"from Blog b 
        left join fetch b.Posts p 
        left join fetch p.Comments 
    where b.Id = :id")
    .SetParameter("id", 1)
    .List<Blog>();

Will generate a much saner statement:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       comments2_.Id         as Id2_2_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__,
       comments2_.Name       as Name2_2_,
       comments2_.Email      as Email2_2_,
       comments2_.HomePage   as HomePage2_2_,
       comments2_.Ip         as Ip2_2_,
       comments2_.Text       as Text2_2_,
       comments2_.PostId     as PostId2_2_,
       comments2_.PostId     as PostId1__,
       comments2_.Id         as Id1__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
       left outer join Comments comments2_
         on posts1_.Id = comments2_.PostId
where  blog0_.Id = 1 /* @p0 */

While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.

And the way to do that is quite simple: don’t try to do it in a single query. Instead, break it up into multiple queries, each loading just a part of the graph, and rely on the Identity Map implementation to stitch the graph together. You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that is much easier to write, maintain and work with.
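
With NHibernate, for example, the graph above can be loaded with two cheap queries, and futures send them to the database in a single round trip. A minimal sketch, using the same session and model as before:

var blogs = s.CreateQuery("from Blog b left join fetch b.Posts where b.Id = :id")
    .SetParameter("id", 1)
    .Future<Blog>();
s.CreateQuery("from Post p left join fetch p.Comments where p.Blog.Id = :id")
    .SetParameter("id", 1)
    .Future<Post>();
var blog = blogs.FirstOrDefault(); // enumerating executes both queries in one batch;
                                   // the identity map stitches posts & comments together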

Designing the Entity Framework 2nd level cache

One of the things that I am working on is another commercial extension to EF, a 2nd level cache. At first, I thought to implement something similar to the way NHibernate does it; that is, to create two layers of caching, one for entity data and the second for query results, where I would store only scalar information and ids.

That turned out to be quite hard. In fact, it turned out to be hard enough that I almost gave up on it. Sometimes I feel that extending EF is like hitting your head against the wall; eventually either you collapse or the wall falls down, but either way you are left with a headache.

At any rate, I eventually figured out a way to get EF to tell me about entities in queries and now the following works:

// will hit the DB
using (var db = new Entities(conStr))
{
    db.Blogs.Where(x => x.Title.StartsWith("The")).FirstOrDefault();
}

// will NOT hit the DB, will use cached data for that
using(var db = new Entities(conStr))
{
   db.Blogs.Where(x => x.Id == 1).FirstOrDefault();
}

The ability to handle such scenarios is an important part of what makes the 2nd level cache useful, since it means that you aren’t limited to just caching a query, but can perform far more sophisticated caching. It means better cache freshness and a lot less unnecessary cache cleanups.

Next, I need to handle partially cached queries, cached query invalidation and a bunch of other minor details, but the main hurdle seems to have been dealt with (I am willing to lay odds that I will regret this statement).

NHibernate vs. Entity Framework 4.0

This is a question that I get very frequently, and I have always tried to dodge the bullet, but I get it so much that I feel that I have to provide an answer. Obviously, I am (not so) slightly biased toward NHibernate, so while you read this, please keep that in mind.

EF 4.0 has done a lot to handle the issues that were raised with the previous version of EF. Things like transparent lazy loading, POCO classes, code only, etc. EF 4.0 is much nicer than EF 1.0.

The problem is that it is still a very young product, and the changes that were added only touch the surface. I already talked about some of my problems with the POCO model in EF, so I won’t repeat that, or my reservations about the Code Only model. But basically, the major problem that I have with those two is that there seems to be a wall between the experience of the community and what Microsoft is doing. Both of those features show many of the same issues that we ran into with NHibernate and Fluent NHibernate. Issues that were addressed and resolved there, but show up in the EF implementations.

Nevertheless, even ignoring my reservations about those, there are other indications that NHibernate’s maturity makes itself known. I ran into this several times while I was writing the guidance for EF Prof; there are things that you simply can’t do with EF that are a natural part of NHibernate.

I am not going to try to do a point by point list of the differences, but it is interesting to look where we do find major differences between the capabilities of NHibernate and EF 4.0. Most of the time, it is in the ability to fine tune what the framework is actually doing. Usually, this is there to allow you to gain better performance from the system without sacrificing the benefits of using an OR/M in the first place.

Here is a small list:

  • Write batching – NHibernate can be configured to batch all writes to the database, so that when you need to write several statements, it will make only a single round trip, instead of one round trip per statement.
  • Read batching / multi queries / futures – NHibernate allows you to batch several queries into a single round trip to the database, instead of a separate round trip per query.
  • Batched collection loads – When you lazy load a collection, NHibernate can find other collections of the same type that weren’t loaded, and load all of them in a single trip to the database. This is a great way to avoid having to deal with SELECT N+1.
  • Collection with lazy="extra" – Lazy extra means that NHibernate adapts to the operations that you might run on top of your collections. That means that blog.Posts.Count will not force a load of the entire collection, but rather would create a “select count(*) from Posts where BlogId = 1” statement, and that blog.Posts.Contains() will likewise result in a single query rather than paying the price of loading the entire collection into memory.
  • Collection filters and paged collections – this allows you to define additional filters (including paging!) on top of your entities’ collections, which means that you can easily page through the blog.Posts collection without having to load the entire thing into memory (see the sketch after this list).
  • 2nd level cache – managing the cache is complex; I touched on why this is important before, so I’ll skip it for now.
  • Tweaking – this is something that is critical whenever you need something that is just a bit beyond what the framework provides. With NHibernate, in nearly all cases, you have an extension point; with EF, you are completely and utterly blocked.
  • Integration & Extensibility – NHibernate has a lot of extension projects, such as NHibernate Search, NHibernate Validator, NHibernate Shards, etc. Such projects not only do not exist for EF, they cannot be written, for the most part, because EF has no extension points to speak of.

On the other side, however:

  • EF 4.0 has a better Linq provider than the current NHibernate implementation. This is being actively worked on, and NH 3.0 will close this gap.
  • EF is from Microsoft.

Responding to the “how EF is better than NH” commentary…

My last post caused quite a bit of furor, and I decided that I wanted to respond to all the comments in a single place.

  • EF has a designer, NHibernate does not.
    This is actually false; NHibernate has multiple designers available for it: Active Writer (free, OSS, integrated into VS), Visual NHibernate (commercial, beta) and LLBLGen (commercial, forthcoming in early 2010). That said, using a designer with NHibernate isn’t very common; most people tend to either code gen the entire model and tweak that (a minority) or hand craft the model & mapping. That isn’t for lack of options, it is because it is simply more efficient to do so in most cases.
  • EF is from Microsoft.
    Yes, it is. That is a good point, because it reflects on support & documentation. Unfortunately, the fact that this was one of the most prominent reasons quoted in the replies is also interesting; it says a lot about the relative quality of the products themselves. Another issue with a data access framework from Microsoft is that history teaches us that few data access methods from Microsoft survive the 2 year mark, and none survive the 5 year mark: RDO, ADO, ADO.Net, Object Spaces, Linq To Sql, Entity Framework – just to name a few.
  • Downstream integration.
    That was mentioned a few times, as in integration with data services, WCF RIA, etc. I want to divide my comment on that into two parts. First, everything that exists right now can be used with NHibernate. Second, EF is supposed to come with reporting / BI tools in the future. Currently, everything that came up was easily usable with NHibernate, so I am not really worried about it.
  • In the Future it will be awesome.
    Some people pointed out that Microsoft is able to allocate more resources to EF than an OSS project can. That is true, to a point. One of the problems that Microsoft faces is that it has to pay a huge amount of taxes on the way to creating a releasable product. That means that it is typically easier for an OSS project to release faster than a comparable Microsoft project.

So far, I find it really interesting that no one came up with a concrete feature that EF can do and NHibernate can’t. I am going to let you in on a secret: when EF was announced, there were exactly two things that it could do that NHibernate could not. Both were fixed before EF 1.0 shipped, just because that fact annoyed me.

Are you telling me that no such thing exists for the new version?

What can EF 4.0 do that NHibernate can’t?

I am trying to formulate my formal response to the “NH vs. EF” question, and while I have a pretty solid draft ready, I found that my response is pretty biased. I am not happy with that, so I wanted to pose this as a real question.

So far, I came up with:

  • EF 4.0 has a better Linq provider than the current NHibernate implementation. This is being actively worked on, and NH 3.0 will close this gap.

My sense of fairness says that this can’t be it, so please educate me.


Chasing the SQL Injection that never was

So, I am sitting there quietly trying to get EF Prof to work in a way that I actually like, when all of a sudden I realize that I am missing something very important: I can’t see the generated queries’ parameters in the profiler.

Looking closely, I started investigating what appeared to be a possible SQL injection issue with EF. My issue was that this query:

entities.Posts.Where(x => x.Title == "hello")

Generated the following SQL:

SELECT
1 AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Text] AS [Text],
[Extent1].[PostedAt] AS [PostedAt],
[Extent1].[BlogId] AS [BlogId],
[Extent1].[UserId] AS [UserId]
FROM [dbo].[Posts] AS [Extent1]
WHERE N'hello' = [Extent1].[Title]

It literally drove me crazy. Eventually I tried this query:

var hello = "hello";
entities.Posts.Where(x=>x.Title==hello);

Which generated:

SELECT
1 AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Text] AS [Text],
[Extent1].[PostedAt] AS [PostedAt],
[Extent1].[BlogId] AS [BlogId],
[Extent1].[UserId] AS [UserId]
FROM [dbo].[Posts] AS [Extent1]
WHERE [Extent1].[Title] = @p__linq__1

This was more like it.

It seems (and Julie Lerman confirmed it) that EF is sticking constant expressions directly into the SQL, and treating parameters differently.

I am not quite sure why it does that, but from a security standpoint, it is obviously not a problem as long as it only does so for constants. I have a lot less hair now, though.

Entity Framework Profiler

I was sitting with Julie Lerman today, and we got into a discussion on how to provide more information to EF users. It appears that there is much need for that.

This is what I do, more or less, so we decided to tackle that problem in the bar. Some drinks later, we had a working version of EF Prof that was actually able to intercept all queries coming from the Entity Framework. Initial testing also shows that I’ll be able to provide much more information about EF than I’ll be able to do with Linq to SQL.

Unfortunately, the current way of intercepting EF traffic is extremely invasive, and I don’t really like it. I consider it a good proof of concept, but I am going to spend some of next week trying to figure out a less invasive approach.

In the meanwhile, take a look (not the final look & feel):

image

image

And here is the project that we profile:

image

It supports all the usual niceties, so you get stack trace support, tracking selects (including lazy loading), updates, inserts & deletes. And I tested it on both EF 3.5 and EF 4.0.

I expect to have a private beta starting next week…

What is up with the Entity Framework vNext?

Every now and then I do a quick check on the EF blog, just to see what their status is. My latest peek caused me to gulp. I am not sure where EF is going with things, I just know that I don’t really like it.

For a start, take a look at the following sample from their Code Only mapping (basically Fluent NHibernate):

.Case<Employee>(
    e => new {
        manager = e.Manager.Id,
        thisIsADiscriminator = "E"
    }
)

There are several things wrong here: “manager” and “thisIsADiscriminator” are strings for all intents and purposes. The compiler isn’t going to check them; they aren’t there to do something, they are just there to avoid being literal strings. But they are strings.

Worse, “thisIsADiscriminator” is a magic string.

Second, and far more troubling, I am looking at this class definition and I cringe:

public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
    public List<Product> Products { get; set; }
}

The problem is quite simple: this class has no hooks for lazy loading. You have to load everything here in one go. Worse, you probably need to load the entire graph in one go. That is scaring me on many levels.

I am not sure how EF is going to handle it, but short of IL rewriting techniques, which I don’t think EF is currently using, this is a performance nightmare waiting to happen.
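
For contrast, here is the shape that proxy-based lazy loading typically requires; a sketch of the general technique, not of anything EF-specific: virtual members that a runtime-generated subclass can override to intercept access.

public class Category
{
    public virtual int ID { get; set; }
    public virtual string Name { get; set; }
    public virtual ICollection<Product> Products { get; set; } // intercepted on first access
}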

Answer: The lazy loaded inheritance many to one association OR/M conundrum

Update: It appears that I am wrong, and NHibernate can support this functionality by eagerly loading the association at load time. You can do so by specifying lazy="false" (and optionally, outer-join="true") on the many to one association.

Yesterday I presented an interesting problem that pops up with any OR/M that supports inheritance and lazy loading.

Let us say that we have the following entity model:

image

Backed by the following data model:

image

As you can see, we map the Animal hierarchy to the Animals table, and we have a polymorphic association between Animal Lover and his/her animal. Where does the problem start?

Well, let us say that we want to load the animal lover. We do that using the following SQL:

SELECT Name,
	 Animal,
	 Id
FROM AnimalLover
WHERE Id = 1 /* @p0 */

And now we have an animal lover instance:

var animalLover = GetAnimalLoverById(1);
var isDog = animalLover.Animal is Dog;
var isCat = animalLover.Animal is Cat;

Can you guess what would be the result of this code?

The answer is that both isDog and isCat would be… false.

But how is that?

To answer that question, let us take a look at the SQL that was used to load the animal lover, and at a typical example of hydrating entities. I am using Davy’s DAL here to show off the problem, because the code is simple and it demonstrates that the problem is not unique to a particular OR/M, but is shared among all of them (Davy’s DAL doesn’t even support inheritance, for example).

private void SetReferenceProperties<TEntity>(
	TableInfo tableInfo, 
	TEntity entity, 
	IDictionary<string, object> values)
{
	foreach (var referenceInfo in tableInfo.References)
	{
		if (referenceInfo.PropertyInfo.CanWrite == false)
			continue;
		
		object foreignKeyValue = values[referenceInfo.Name];

		if (foreignKeyValue is DBNull)
		{
			referenceInfo.PropertyInfo.SetValue(entity, null, null);
			continue;
		}

		var referencedEntity = sessionLevelCache.TryToFind(
			referenceInfo.ReferenceType, foreignKeyValue);
			
		if(referencedEntity == null)
			referencedEntity = CreateProxy(tableInfo, referenceInfo, foreignKeyValue);
								   
		referenceInfo.PropertyInfo.SetValue(entity, referencedEntity, null);
	}
}

Take a look at what the code is doing: we are currently processing the Animal property on the AnimalLover class, and we try to find an Animal that was loaded with a primary key matching the value of the Animal column in the AnimalLovers table.

When we can’t find it, we have to create a lazy loading proxy for the referenced entity. And here is where the conundrum kicks in. When we have inheritance, we have a real problem: what is the type of the referenced entity?

From the model, we know that it must be a derivation of Animal of some sort, and we have its PK, but we have no way of knowing which derivation without going to the database for it.

So what are we going to do? Because we don’t have enough information to create a lazy loading proxy of the appropriate type, we actually generate a lazy loading proxy of the type that we do know about: Animal.

But what about when it is being loaded?

Well, that is where the lack of #become in .NET becomes painful: we already have an instance, and we can’t change its type. And we can’t replace the reference on the AnimalLover, because someone might have grabbed a reference to the animal before the lazy load.

The way to handle it is by turning the lazy loading proxy into a forwarding proxy. We load a new instance that represents the entity, now with the correct type, since we query the DB to find out what it is (along with the rest of the entity’s data).

And the lazy loading proxy that we originally used is now loaded, and any call made on it will be forwarded to the new instance that was loaded.
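
A minimal sketch of that forwarding trick; the loader delegate and the member set are hypothetical, and this assumes Animal exposes virtual members:

public class AnimalProxy : Animal
{
    private readonly Func<Animal> loader; // supplied by the session; queries & hydrates the real row
    private Animal target;

    public AnimalProxy(Func<Animal> loader)
    {
        this.loader = loader;
    }

    public override string Name
    {
        get { EnsureLoaded(); return target.Name; }
        set { EnsureLoaded(); target.Name = value; }
    }

    private void EnsureLoaded()
    {
        if (target == null)
            target = loader(); // hits the DB, discovers the concrete type (Dog, Cat...), hydrates it
    }
}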

animalLover.Animal stays an AnimalProxy, and cannot be cast to a Dog or a Cat, even if the actual row it is pointing to is a Dog or a Cat.

Challenge: The lazy loaded inheritance many to one association OR/M conundrum

Here is an interesting problem that pops up with any OR/M that supports inheritance and lazy loading.

Let us say that we have the following entity model:

image

Backed by the following data model:

image

As you can see, we map the Animal hierarchy to the Animals table, and we have a polymorphic association between Animal Lover and his/her animal. Where does the problem start?

Well, let us say that we want to load the animal lover. We do that using the following SQL:

SELECT Name,
	 Animal,
	 Id
FROM AnimalLover
WHERE Id = 1 /* @p0 */

And now we have an animal lover instance:

var animalLover = GetAnimalLoverById(1);
var isDog = animalLover.Animal is Dog;
var isCat = animalLover.Animal is Cat;

Can you guess what would be the result of this code?

I’ll post the actual answer tomorrow…

A guide into OR/M implementation challenges: Custom Queries

Continuing to shadow Davy’s series about building your own DAL, this post is about executing custom queries.

The post does a good job of showing how you can create a good API on top of what is pretty much raw SQL. If you do have to create your own DAL, or need to use SQL frequently, please use something like that rather than using ADO.Net directly.

ADO.Net is not a data access library, it is the building blocks you use to build a data access library.

A few comments on queries in NHibernate. NHibernate uses a more complex model for queries, obviously. We have a set of abstractions between the query and the generated SQL, because we support several query options and a lot more mapping options. But while a large amount of effort goes into translating the user’s intent to SQL, there is an almost equivalent amount of work going into hydrating the queries. Davy’s DAL supports only a very flat model, but with NHibernate, you can specify eager load options, custom DTOs or just plain value queries.

In order to handle those scenarios, NHibernate tracks the intent behind each column, and knows whether a column set represents an entity, an association, a collection or a value. That goes through a fairly complex process that I’ll not describe here, but once the first stage of the hydration process is done, NHibernate has a second stage available. This is why you can write queries such as “select new CustomerHealthIndicator(c.Age, c.SickDays.size()) from Customer c”.

The first stage is recognizing that we aren’t loading an entity (just fields of an entity); the second is passing them to the CustomerHealthIndicator constructor. You can actually take advantage of the second stage yourself; it is called a result transformer, and you can provide your own and set it on a query using SetResultTransformer(…).
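
A small sketch of what that looks like from user code; CustomerDto is a hypothetical class with properties matching the column aliases:

var customers = s.CreateSQLQuery("select Name as Name, Age as Age from Customers")
    .SetResultTransformer(Transformers.AliasToBean<CustomerDto>())  // second stage: rows -> DTOs
    .List<CustomerDto>();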


A guide into OR/M implementation challenges: Lazy loading

Continuing to shadow Davy’s series about building your own DAL, this post is about lazy loading. Davy’s lazy loading support is something that I had great fun reading; it is simple, elegant and beautiful to read.

I don’t have much to say about the actual implementation; NHibernate does things in much the same way (we can quibble about minor details such as who holds the identifier and other stuff, but they aren’t really important). A major difference between Davy’s lazy loading approach and NHibernate’s is that Davy doesn’t support inheritance. Inheritance and lazy loading play some nasty games with NHibernate’s implementation of lazy loaded entities.

While Davy can get away with loading the values into the same proxy object that he is using, NHibernate must load them into a separate object. Why is that? Let us say that we have a many to one association from AnimalLover to Animal. That association is lazy loaded, so we put an AnimalProxy (which inherits from Animal) as the value of the Animal property. Now, when we want to load the Animal property, NHibernate has to load the entity; at that point, it discovers that the entity isn’t actually an Animal, but a Dog.

Obviously, you cannot load a Dog into an AnimalProxy. What NHibernate does in this case is load the entity into another object (a Dog instance) and, once that instance is loaded, direct all method calls to the new instance. It sounds complicated, but it is actually quite elegant and transparent from the user’s perspective.

A guide into OR/M implementation challenges: The Session Level Cache

Continuing to shadow Davy’s series about building your own DAL, this post is about the Session Level Cache.

The session cache (first level cache, in NHibernate terms) exists to support one main scenario: within a single session, a row in the database is represented by a single instance. That means that the session needs to track all the instances that it loads and be able to search through them. Davy does a good job covering how it is used, and the implementation is quite similar to the way it is done in NHibernate.
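
That guarantee in code form; a sketch assuming a configured NHibernate sessionFactory and a hypothetical Customer entity:

using (var session = sessionFactory.OpenSession())
{
    var first = session.Get<Customer>(1);
    var second = session.Get<Customer>(1); // answered from the session cache, no second query
    Debug.Assert(ReferenceEquals(first, second));
}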

Davy’s implementation uses a nested dictionary to hold instances per entity type. This is done mainly to support RemoveAllInstancesOf<TEntity>(), a method that is unique to Davy’s DAL. The reasoning for that method is interesting:

When a user executes a custom DELETE statement, there is no way for us to know which entities were actually removed from the database. But if any of those deleted entities happen to remain in the SessionLevelCache, this could lead to buggy application code whenever a piece of code tries to retrieve a specific entity which has already been removed from the database, but is still present in the SessionLevelCache. In order to deal with this scenario, the SessionLevelCache has a ClearAll and a RemoveAllInstancesOf method which you can use from your application code to either clear the entire SessionLevelCache, or to remove all instances of a specific entity type from the cache.

Personally, I think this is using an ICBM to crack eggshells. But I am probably being unfair. NHibernate has much the same issue; if you issue a delete via SQL or HQL queries, NHibernate doesn’t have a way to track what was actually deleted and deal with it. With NHibernate, this doesn’t tend to be a problem for the session level cache, mostly because of usage habits more than anything else. A session rarely has to deal with already loaded entities that were deleted by such a query (and if it does, the user needs to handle that by calling Evict() on all the affected objects manually). NHibernate doesn’t try to support this scenario explicitly for the session cache. It does support this very feature for the second level cache, though.

It makes sense, though. With NHibernate, in the vast majority of cases deleting is going to be done using NHibernate itself, rather than a special query. With Davy’s DAL, the usage of SQL queries for deletes is going to be much higher.

Another interesting point in Davy’s post is the handling of queries:

When a custom query is executed, or when all instances are retrieved, there is no way for us to exclude already cached entity instances from the result of the query. Well, theoretically speaking you could attempt to do this by adding a clause to the WHERE statement of each query that would prevent cached entities from being loaded. But then you might have to add the cached entity instances to the resulting list of entities anyways if they would otherwise satisfy the other query conditions. Obviously, trying to get this right is simply put insane and i don’t think there’s any DAL or ORM that actually does this (even if there was, i can’t really imagine any of them getting this right in every corner case that will pop up).

So a good compromise is to simply check for the existence of a specific instance in the cache before hydrating a new instance. If it is there, we return it from the cache and we skip the hydration for that database record. In this way, we avoid having to modify the original query, and while we could potentially return a few records that we already have in memory, at least we will be sure that our users will always have the same reference for any particular database record.

This is more or less how NHibernate operates, and for much the same reasons. But there is a small twist: in order to ensure query coherency between the database queries and in-memory entities, NHibernate will optionally try to flush those changed items in the session level cache that may be affected by the query before executing it. A more detailed description of this can be found here. Davy’s DAL doesn’t do automatic change tracking, so this is not a feature that can be easily added, given that missing prerequisite.


A guide into OR/M implementation challenges: Hydrating Entities

Continuing to shadow Davy’s series about building your own DAL, this post is about hydrating entities.

Hydrating entities is the process of taking a row from the database and turning it into an entity, while de-hydrating is the reverse process: taking an entity and turning it into a flat set of values to be inserted/updated.

Here, again, Davy has chosen to parallel NHibernate’s way of doing things, which allows us to take a look at a very simplified version and see what the advantages of this approach are.

First, we can see how the session level cache is implemented, with the check being done directly in the entity hydration process. Davy has some discussion about the various options that you can choose at that point: whether to just use the values from the session cache, to update the entity with the new values, or to throw if there is a change conflict.

NHibernate’s decision at this point was to assume that the entity that we have is correct, and ignore any changes made in the meantime to the database. That turns out to be a good approach, because any optimistic concurrency checks that we might want will run when we commit the transaction, so there isn’t much difference from the result perspective, but it does simplify the behavior of NHibernate.

Next, there is the treatment of reference properties, what NHibernate calls many-to-one associations. Here is the relevant code (edited slightly so it can fit the blog width):

private void SetReferenceProperties<TEntity>(
	TableInfo tableInfo, 
	TEntity entity, 
	IDictionary<string, object> values)
{
	foreach (var referenceInfo in tableInfo.References)
	{
		if (referenceInfo.PropertyInfo.CanWrite == false)
			continue;
		
		object foreignKeyValue = values[referenceInfo.Name];

		if (foreignKeyValue is DBNull)
		{
			referenceInfo.PropertyInfo.SetValue(entity, null, null);
			continue;
		}

		var referencedEntity = sessionLevelCache.TryToFind(
			referenceInfo.ReferenceType, foreignKeyValue);
			
		if(referencedEntity == null)
			referencedEntity = CreateProxy(tableInfo, referenceInfo, foreignKeyValue);
								   
		referenceInfo.PropertyInfo.SetValue(entity, referencedEntity, null);
	}
}

There are a lot of things going on here, so I’ll take them one at a time.

You can see how the uniquing process is going on. If we already have the referenced entity loaded, we will get it directly from the session cache, instead of creating a separate instance of it.

It also shows something that Davy promised to cover in a separate post: lazy loading. I had an early look at his implementation and it is pretty, so I’ll skip that for now.

This piece of code also demonstrates something very interesting: the lazy loaded inheritance many to one association conundrum, which I’ll touch on in a future post.

There are a few other implications of hydrating entities in this fashion. For a start, we are working with detached entities this way; the entity doesn’t have to have a reference to the session (except to support lazy loading). It also means that our entities are pure POCOs; we handle it all completely externally to the entity itself.

It also means that if we would like to handle change tracking (which Davy’s DAL currently doesn’t do), we have a much more robust way of doing so, because we can simply dehydrate the entity and compare its current state to its original state. That is exactly how NHibernate does it. This turns out to be a far more robust approach, because it is safe in the face of methods modifying state internally, without going through properties or invoking change tracking logic.
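
A sketch of that comparison; Dehydrate is a hypothetical helper that turns an entity into its column values, mirroring the hydration code above:

private bool IsDirty(object entity, IDictionary<string, object> loadedSnapshot)
{
    IDictionary<string, object> current = Dehydrate(entity);
    // dirty if any column value differs from the snapshot taken at load time
    return loadedSnapshot.Any(pair => Equals(pair.Value, current[pair.Key]) == false);
}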

I also wanted to touch on a few things that make the NHibernate implementation of the same thing a bit more complex. NHibernate supports reflection optimization and multiple ways of actually setting the values on the entity; it also supports things like components and multi column properties, which means that there isn’t a neat ordering between properties and columns of the kind that makes Davy’s code so orderly.


A guide into OR/M implementation challenges: CRUD

Continuing the series about Davy’s build your own DAL, this post is talking about CRUD functionality.

One of the more annoying things about any DAL is dealing with the repetitive nature of talking to the data source. One of Davy’s stated goals in going this route is to totally eliminate that. CRUD functionality shouldn’t be something that you have to write for each entity; it is just something that exists and that you get for free whenever you use it.

I think he was very successful there, but I also want to talk about his method of doing so. Not surprisingly, Davy’s DAL takes a lot of concepts from NHibernate, simplifies them a bit and then applies them. His approach for handling CRUD operations is reminiscent of how NHibernate itself works.

He divided the actual operations into distinct classes, called DatabaseActions, so he has things like FindAllAction, InsertAction and GetByIdAction.

This architecture gives you two major things: maintaining the code is now far easier, and playing around with the action implementations is far easier as well. It is just a short step from Davy’s DatabaseActions to NHibernate’s events & listeners approach.
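
To give a feel for the shape (the class names follow the post; the bodies are stubs, not Davy’s actual code):

public abstract class DatabaseAction
{
    protected readonly IDbConnection Connection;

    protected DatabaseAction(IDbConnection connection)
    {
        Connection = connection;
    }
}

public class GetByIdAction : DatabaseAction
{
    public GetByIdAction(IDbConnection connection) : base(connection) { }

    public TEntity Get<TEntity>(object id)
    {
        // build "SELECT ... WHERE Id = @id" from the mapping, execute, hydrate one entity
        throw new NotImplementedException();
    }
}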

This is also a nice demonstration of the Single Responsibility Principle. It makes maintaining the software much easier than it would be otherwise.


A guide into OR/M implementation challenges: Mapping

Continuing my shadowing of Davy’s Build Your Own DAL series, this post is shadowing Davy’s mapping post. 

Looking at the mapping, it is fairly clear that Davy has (wisely) made a lot of design decisions along the way that drastically reduce the scope of the work he had to deal with. The mapping model is attribute based and essentially supports a simple one to one relation between classes and tables. This makes things far simpler to deal with internally.

The choice has been made to fix the following:

  • Only support SQL Server
  • Attributed model
  • Primary key is always:
    • Numeric
    • Identity
    • Single Key

Using those rules, Davy created a really slick implementation. About the only complaint that I can make about it is that he doesn't support having a limit clause on selects.
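To give you an idea of what a model under those constraints might look like, here is a sketch with made up attribute names (they are not necessarily the ones Davy uses):

using System;

[AttributeUsage(AttributeTargets.Class)]
public class TableAttribute : Attribute
{
    public string Name { get; set; }
}

[AttributeUsage(AttributeTargets.Property)]
public class ColumnAttribute : Attribute
{
    public string Name { get; set; }
    public bool IsPrimaryKey { get; set; }
}

[Table(Name = "Products")]
public class Product
{
    // the primary key is always numeric, identity and single column,
    // which removes a whole class of complications from the mapping code
    [Column(Name = "Id", IsPrimaryKey = true)]
    public int Id { get; set; }

    [Column(Name = "Name")]
    public string Name { get; set; }
}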

Take a look at the code he has; it should give you a good idea about what is involved in mapping between objects and tables.

The really fun part about redoing things that we are familiar with is that we get to ignore all the things we don't want to do, the very things that introduce complexity. Davy's solution works for his scenario, but I want to expand a bit on the additional features that NHibernate has at that layer. It should give you some understanding of the task that NHibernate is solving.

  1. Inheritance. Davy's DAL supports only Table Per Class inheritance; NHibernate supports four different inheritance models (plus mixed models that we won't get into here).
  2. Eager loading. Something that would add significantly to the complexity of the solution is the ability to load a Product with its Category. That requires that you be able to change the generated SQL dynamically and, more importantly, that you be able to read the results correctly. That is far from simple.
  3. Properties that span multiple columns. It seems like a simple thing, but in actuality it affects just about every part of the mapping layer, since it means that every property to column conversion has to take multiple columns into account. It is not so much complex as it is annoying, especially since you have to carry either the column names or the column indexes all over the place.
  4. Collections. They are complex enough on their own (try to think about the effort involved in syncing changes to a collection in an efficient manner), but the part that really kills you is trying to do eager loading with them. Oh, and I haven't even mentioned one to many vs. many to many, or the distinctions between the different types of collections.
  5. Optimistic concurrency. This is actually a feature that would be relatively easy to add, I think, at least if all you care about is a single type of versioning / optimistic concurrency; a sketch of the common version column approach follows this list. NHibernate supports several (detailed here).
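Here is the version column sketch I mentioned in the last item. This is my illustration of the general technique, not NHibernate's actual code:

using System;
using System.Data;

public class ConcurrencyException : Exception { }

public static class VersionedUpdate
{
    // the update succeeds only if the row still carries the version
    // we read when we loaded the entity
    public static void UpdateProductName(
        IDbConnection connection, int id, string name, int expectedVersion)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText =
                "update Products set Name = @name, Version = Version + 1 " +
                "where Id = @id and Version = @expectedVersion";
            AddParameter(command, "@name", name);
            AddParameter(command, "@id", id);
            AddParameter(command, "@expectedVersion", expectedVersion);

            if (command.ExecuteNonQuery() == 0)
            {
                // zero rows affected means someone else changed (or
                // deleted) the row since we loaded it
                throw new ConcurrencyException();
            }
        }
    }

    private static void AddParameter(IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}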

I could probably go on, but I think the point was made. As I said before, the problem with taking on something like this is that it either takes a day to get the basic functionality going or several months to really get it into shape to handle more than a single scenario.

This series of posts should give you a chance to appreciate what is going on behind the scenes, because you’ll have Davy’s effort at the base functionality, and my comments on what is required to take it to the next level.

A guide into OR/M implementation challenges: Reasoning

Davy Brion, one of NHibernate's committers, is running a series of posts discussing building your own DAL. The reasons for doing so vary; in Davy's case, he has clients who are adamantly against using anything but the naked CLR. I have run into similar situations before. Sometimes it is institutional blindness, sometimes it is legal reasons, and sometimes there are actually good reasons for it, although that is rare.

Davy's approach to the problem was quite sensible. Deprived of his usual toolset, he was unwilling to give up the advantages inherent in it, so he set out to build them himself. I did much the same when I worked on SvnBridge; I couldn't use one of the existing OSS containers, but I wasn't willing to build a large application without one, so I created one.

I have touched on this topic in more detail in the past. But basically, a single purpose framework is significantly simpler than a general purpose one. That sounds simple and obvious, right? But it is actually quite significant.

You might not get the same richness that you will get with the real deal, but you get enough to get you going. In fact, since you have only a single scenario to cover, it is quite easy to get the features that you need out the door. You are allowed to cheat, after all :-)

In Davy’s case, he made the decision that using POCO and not worrying about persistence all over the place are the most important things, and he set out to get them.

I am going to shadow his posts in the series, talking about the implications of the choices made and the differences between Davy's approach and NHibernate's. I think it would be a good way to show the challenges inherent in building an OR/M.

Why Defer Loading in Entity Framework isn’t going to work

When I finished reading this post I let out a heavy sigh. It is not going to work. Basically, the EF is going down the same path that NHibernate was on in NHibernate 1.0 (circa 2005!).

Let me show you how. In the post, the example given is:

public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
    public virtual List<Product> Products { get; set; }
    ...
}

This looks like it would work, right? The way the EF does this is by creating a proxy of your class (similar to the way that NHibernate does it) and intercepting the call to get_Products.

It has a few problems, however. Let us start with the most obvious one: program to an interface, not to an implementation. Exposing List<T> means that you have no control whatsoever over what is going on with the collection. It also means that they have to intercept access to the property, rather than usage of the collection itself. That is bad; it means that they are going to have to do eager loading in all too many cases where NHibernate can just ignore it.
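Compare that with an interface typed property, which leaves the ORM free to hand you its own collection implementation when it hydrates the entity (a sketch of the entity shape, not EF's or NHibernate's actual classes):

using System.Collections.Generic;

public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }

    // ICollection<Product> instead of List<Product>: the ORM can now
    // substitute a lazy implementation of the interface, and intercept
    // usage of the collection itself instead of just the property access
    public virtual ICollection<Product> Products { get; set; }
}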

Let us move on a bit, and talk about the inability to support any interesting scenario. If we look at collections with NHibernate, we can turn this:

Console.WriteLine(category.Products.Count);

Into this SQL:

select count(*) from Products
where CategoryId = 1

With the approach that the EF uses, they will have to load the entire collection.
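The reason NHibernate can do that is that the Products property holds a collection wrapper, so the count can be answered with a scalar query. A rough sketch of the idea, with a hypothetical session abstraction (this is not NHibernate's actual implementation):

using System.Collections.Generic;

public interface IQuerySession
{
    T ExecuteScalar<T>(string sql, object parameter);
}

public class LazyProductCollection
{
    private readonly int categoryId;
    private readonly IQuerySession session; // hypothetical abstraction
    private List<Product> loaded;           // stays null until someone enumerates;
                                            // the loading path is omitted here

    public LazyProductCollection(IQuerySession session, int categoryId)
    {
        this.session = session;
        this.categoryId = categoryId;
    }

    public int Count
    {
        get
        {
            if (loaded != null)
                return loaded.Count;
            // not loaded yet, so answer with a count query instead of
            // pulling the entire collection into memory
            return session.ExecuteScalar<int>(
                "select count(*) from Products where CategoryId = @id", categoryId);
        }
    }
}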

But all of those are just optimizations, not make-or-break issues for the feature. Here is a common scenario that is going to break it:

public class Category
{
    private List<Product> _products;
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
    public virtual List<Product> Products { get { return _products; } set { _products = value; } }

    // non virtual, so the proxy has no way to intercept this call
    public void AddProduct(Product p)
    {
        // do something interesting here

        // the field is accessed directly, bypassing the Products property,
        // so lazy loading is never triggered and _products may still be null
        _products.Add(p);
    }
    ...
}

There is a reason why the default proxies for NHibernate force all members of the entities to be virtual. It is not just because we think everything should be virtual (it should, but that is not a discussion for now). It is all about being able to allow the user to use a real POCO.

In the scenario outlined above, what do you think is going to happen?

AddProduct is a non virtual method call, so it cannot be intercepted. Accessing the _products field also cannot be intercepted.

The end result is a NullReferenceException that will seem to appear randomly, based on whether something else touched the property or not. And however nice auto properties are, there are plenty of reasons to use fields. Using auto properties just masks the problem, but it is still there.

Oh, and if we want to use our POCO class with EF, forget about the new operator; you have to use:

Category category = context.CreateObject<Category>();

Hm, this just breaks just about anything that relies on creating new instances of your classes. Want to use your entity as a parameter for an ASP.Net MVC action, to be automatically bound? Forget about it. You have to create your instances where you have access to the context, and that is a database specific thing, not something that you want to have all over the place.

And I really liked this as well:

The standard POCO entities we have talked about until now rely on snapshot based change tracking – i.e. the Entity Framework will maintain snapshots of before values and relationships of the entities so that they can be compared with current values later during Save. However, this comparison is relatively expensive when compared to the way change tracking works with EntityObject based entities.

Hm, this is how NHibernate and Hibernate have always worked. Somehow, I don’t see this showing up as a problem very often.

Final thought: why is the EF calling it Defer Loading? Everyone else calls it lazy loading.

Another Entity Framework opinion

I ran across this post, which I found interesting:

If someone give me a set of classes that doesn't bring a A to Z solution to a problem, sorry but I don't call it a Framework ... I call it a Base Class Library. I've been very supportive to the Entity Framework Team (I gave design feedback through multiple channels) but now I think I'm done.

This resonates very well with what Udi Dahan said in a comment on Frans' post:

Microsoft is a platform company.

They build technologies that partners can build stuff on top of.

It's what they've always done.

It's what they're continuing to do.

Unfortunately, many developers in the Microsoft community don't know/understand this, thinking that these technologies are supposed to be used to build applications directly. This often causes overly complex codebases.

The EF's team's decision is consistent with providing a platform for partners to build their own ORMs on.

That being said, I don't care very much for a platform - just as I wouldn't drive the chassis of a car.

Microsoft kills Linq to SQL

In a typical management speak post, the ADO.Net team has killed the Linq to SQL project. Regardless of how this affects NHibernate (more users for us, yeah!), I think this is a big mistake on Microsoft's part.

Doing something like this is spitting in the face of everyone who invested time and money in Linq to SQL, only to be left hanging in the wind with dead end software and a costly porting process if they ever want to see new features. Linq to SQL is a decent base level OR/M, and I have had several people tell me that they were willing to accept its current deficiencies, knowing that they would be fixed in the next version. Now there isn't going to be a next version, and that is really bad for Microsoft's reputation.

From my point of view, this is going to be the example that I will use whenever someone tries to sell me a half baked solution from Microsoft (just to note, I don't consider Linq to SQL half baked) and tells me to wait for vNext with all the features in the world.

It doesn't matter how I turn this decision around, I can't find any way in which it makes sense from Microsoft's perspective.