Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 189 words

This is in response to a question on Twitter:

image

In general, for applications, I would always use NHibernate. Mostly because I am able to do so much more with it.

For one-off utilities and such, where I just need to get the data and I don’t really care how, I would generally just use the EF wizard to generate a model and do something with it. Mostly, it is so I can get the code gen stuff ASAP and have the data working.

For example, the routine to import the data from the Subtext database to the RaccoonBlog database is using Entity Framework. Mostly because it is easier and it is a one-time thing.

I guess that if I was running any of the tools for generating an NHibernate model from the database, I would be using it, but honestly, it just doesn’t really matter at that point. I just don’t care for those sorts of tools.

time to read 4 min | 686 words

This article thinks so, and I was asked to comment on that. I have to say that I agree with a lot in this article. It starts by laying out what an anti pattern is:

  1. It initially appears to be beneficial, but in the long term has more bad consequences than good ones
  2. An alternative solution exists that is proven and repeatable

And then goes on to list some of the problems with OR/M:

  • Inadequate abstraction - The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details. The documentation of all the major ORM libraries is rife with references to SQL concepts.
  • Incorrect abstraction – …if your data is not relational, then you are adding a huge and unnecessary overhead by using SQL in the first place and then compounding the problem by adding a further abstraction layer on top of that.
    On the other hand, if your data is relational, then your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question.
  • Death by a thousand queries – …when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency. Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects.
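The "dozens of individual queries" problem is easier to see with a counter. The following is a minimal sketch, not taken from the article: `FakeDb` and its methods are hypothetical stand-ins for a database, and the only thing being measured is the number of round trips.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database that counts round trips.
public static class FakeDb
{
    public static int Queries;

    private static readonly Dictionary<int, List<string>> CommentsByPost =
        new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "first!", "nice post" } },
            { 2, new List<string> { "thanks" } },
            { 3, new List<string>() },
        };

    // Stands for: select Id from Posts
    public static List<int> LoadPostIds()
    {
        Queries++;
        return CommentsByPost.Keys.ToList();
    }

    // Stands for: select * from Comments where PostId = @id
    public static List<string> LoadComments(int postId)
    {
        Queries++;
        return CommentsByPost[postId];
    }

    // Stands for: select * from Comments where PostId in (...)
    public static Dictionary<int, List<string>> LoadCommentsForPosts(IEnumerable<int> postIds)
    {
        Queries++;
        return postIds.ToDictionary(id => id, id => CommentsByPost[id]);
    }
}

public static class Program
{
    public static void Main()
    {
        // Lazy loading: one query for the posts, then one more per post.
        var postIds = FakeDb.LoadPostIds();
        foreach (var id in postIds)
            FakeDb.LoadComments(id);
        Console.WriteLine(FakeDb.Queries); // 4 (1 + N, with N = 3 posts)

        // Batched loading: the related rows arrive in one extra query.
        FakeDb.Queries = 0;
        postIds = FakeDb.LoadPostIds();
        var commentsByPost = FakeDb.LoadCommentsForPosts(postIds);
        Console.WriteLine(commentsByPost[1].Count); // 2
        Console.WriteLine(FakeDb.Queries);          // 2
    }
}
```

With a thousand posts instead of three, the first approach costs 1,001 round trips and the second still costs two.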

If the article was about pointing out the problems in OR/M I would have no issues in endorsing it unreservedly. Many of the problems it points out are real. They can be mitigated quite nicely by someone who knows what they are doing, but that is beside the point.

I think that I am in a pretty unique position to answer this question. I have over 7 years of being heavily involved in the NHibernate project, and I have been living & breathing OR/M for all of that time. I have also created RavenDB, a NoSQL database, that gives me a good perspective about what it means to work with a non relational store.

And like most criticisms of OR/M that I have heard over the years, this article does only half the job. It tells you what is good & bad (mostly bad) in OR/M, but it fails to point out something quite important.

To misquote Churchill, Object Relational Mapping is the worst form of accessing a relational database, except all of the other options when used for OLTP.

When I see people railing against the problems in OR/M, they usually point out quite correctly problems that are truly painful. But they never seem to remember all of the other problems that OR/M usually shields you from.

One alternative is to move away from Relational Databases. RavenDB and the RavenDB Client API have been specifically designed by us to overcome a lot of the limitations and pitfalls inherent to OR/M. We have been able to take advantage of all of our experience in the area and create what I consider to be a truly awesome experience.

But if you can’t move away from Relational Databases, what are the alternatives? Ad hoc SQL or Stored Procedures? You want to call that better?

A better alternative might be something like Massive, which is a very thin layer over SQL. But that suffers from a whole host of other issues (no unit of work means aliasing issues, no support for eager loading means a better chance of SELECT N+1, no easy way to handle migrations, etc). There is a reason why OR/Ms have reached where they have. There are a lot of design decisions that simply cannot be made any other way without unacceptable tradeoffs.
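To make the aliasing point concrete, here is a minimal sketch. Everything in it is hypothetical: `FakeDb` stands in for a table, `ThinLayer` for a data access layer with no unit of work, and `Session` for a unit of work that keeps an identity map. None of it is Massive's or NHibernate's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity and in-memory "table"; not tied to any real library.
public class Person
{
    public int Id;
    public string Name;
}

public static class FakeDb
{
    public static readonly Dictionary<int, string> People =
        new Dictionary<int, string> { { 1, "Ayende" } };
}

// Thin data access: every load materializes a brand new object.
public static class ThinLayer
{
    public static Person Load(int id)
    {
        return new Person { Id = id, Name = FakeDb.People[id] };
    }
}

// Unit of work with an identity map: one object per row per session.
public class Session
{
    private readonly Dictionary<int, Person> identityMap = new Dictionary<int, Person>();

    public Person Load(int id)
    {
        Person person;
        if (!identityMap.TryGetValue(id, out person))
        {
            person = ThinLayer.Load(id);
            identityMap[id] = person;
        }
        return person;
    }
}

public static class Program
{
    public static void Main()
    {
        // Two independent loads of the same row: a change to one copy
        // is invisible to the other, which is the aliasing problem.
        var a = ThinLayer.Load(1);
        var b = ThinLayer.Load(1);
        a.Name = "Oren";
        Console.WriteLine(ReferenceEquals(a, b)); // False
        Console.WriteLine(b.Name);                // Ayende

        // Inside a session, both loads hand back the same instance.
        var session = new Session();
        var c = session.Load(1);
        var d = session.Load(1);
        c.Name = "Oren";
        Console.WriteLine(ReferenceEquals(c, d)); // True
        Console.WriteLine(d.Name);                // Oren
    }
}
```

Without the identity map, two parts of the same request can each hold a copy of the row, and whichever saves last silently wins.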

From my perspective, that means that if you are using Relational Databases for OLTP, you are most likely best served with an OR/M. Now, if you want to move away from Relational Databases for OLTP, I would be quite happy to agree with you that this is the right move to make.

time to read 3 min | 452 words

Fowler defines Active Record as:

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

Note that I am talking about the Active Record pattern and not a particular Active Record implementation.

A typical Active Record class will look like this:

image

By design, AR entities suffer from one major problem: they violate SRP, since the class is now responsible for its own persistence and for the business operations that relate to it. But that is an acceptable decision in many cases, because it leads to code that is much simpler to understand.

In most cases, that is.

The problem with AR is that it also treats each individual entity independently. In other words, its usage encourages code like this:

var person = Person.FindBy("Id", 1);
person.Name = "Oren";
person.Save();

This is a very procedural way of going about things, but more importantly, it is a way that focuses on work done on a single entity.

When I compare the abilities of the Active Record style to the abilities of a modern ORM, there is really nothing that can compare. Things like automatic change tracking or batching writes to the database are part of the package with an ORM.
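A rough illustration of what automatic change tracking and write batching mean in practice. This is a sketch under stated assumptions, not how any particular ORM implements it: `UnitOfWork`, `Load`, `Flush` and `RoundTrips` are hypothetical names, and the SQL is built by string concatenation only to keep the example short (real code would use parameters).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id;
    public string Name;
}

// Hypothetical unit of work: remembers the loaded state of each entity
// and, on Flush, sends every pending change in one batch.
public class UnitOfWork
{
    private readonly Dictionary<Person, string> snapshots = new Dictionary<Person, string>();

    public int RoundTrips { get; private set; }

    public Person Load(int id, string nameInDb)
    {
        var person = new Person { Id = id, Name = nameInDb };
        snapshots[person] = nameInDb; // original value, used for dirty checking
        return person;
    }

    public List<string> Flush()
    {
        // Dirty checking: only entities whose state differs from the snapshot.
        var statements = snapshots
            .Where(pair => pair.Key.Name != pair.Value)
            .Select(pair => "UPDATE People SET Name = '" + pair.Key.Name +
                            "' WHERE Id = " + pair.Key.Id)
            .ToList();

        if (statements.Count > 0)
            RoundTrips++; // the whole batch travels together

        // Refresh the snapshots so a second Flush finds nothing to do.
        foreach (var person in snapshots.Keys.ToList())
            snapshots[person] = person.Name;

        return statements;
    }
}

public static class Program
{
    public static void Main()
    {
        var uow = new UnitOfWork();
        var first = uow.Load(1, "Ayende");
        var second = uow.Load(2, "Rahien");
        uow.Load(3, "Arava"); // loaded but never modified

        first.Name = "Oren";
        second.Name = "Eini";

        var batch = uow.Flush();
        Console.WriteLine(batch.Count);       // 2
        Console.WriteLine(uow.RoundTrips);    // 1
        Console.WriteLine(uow.Flush().Count); // 0
    }
}
```

Note that the calling code never says Save(). With the Active Record style shown earlier, each of the two changes would be its own explicit call and its own round trip.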

The sort of applications we write today aren’t touching a single row in the database; we tend to work with entities (especially root aggregates) with operations that can touch many rows. Writing those sorts of applications using Active Record leads to complications, precisely because the pattern is meant to simplify data access. The problem is that our data access needs aren’t so simple anymore.

Active Record can still be useful for very simple scenarios, but I think that a major drive behind it was that it was so easy to implement compared to a more full featured solution. Since we don’t need to implement a full featured solution (it comes for free with your ORM of choice), that isn’t really a factor anymore. Using a full blown ORM is likely to be easier and simpler.

time to read 7 min | 1268 words

After making EF Prof work with EF Code Only, I decided that I might take a look at how Code Only actually works from the perspective of the application developer. I am working on my own solution based on the following posts:

But since I don’t like to just read things, and I hate walkthroughs, I decided to take that into a slightly different path. In order to do that, I decided to set myself the following goal:

image

  • Create a ToDo application with the following entities:
    • User
    • Actions (inheritance)
      • ToDo
      • Reminder
  • Query for:
    • All actions for user
    • All reminders for today across all users

That isn’t really a complex system, but my intention is to get to grips with how things work. And see how much friction I encounter along the way.

We start by referencing “Microsoft.Data.Entity.Ctp” & “System.Data.Entity”

There appears to be a wide range of options to define how entities should be mapped. This includes building them using a fluent interface, creating map classes or auto mapping. All in all, the code shows a remarkable similarity to Fluent NHibernate, in spirit if not in actual API.

I don’t like some of the API:

  • HasRequired and HasKey, for example, seem to be awkwardly named to me, especially when they are used as part of a fluent sentence. I have long advocated avoiding the attempt to create real sentences in a fluent API (StructureMap was probably the worst in this regard). Dropping the Has prefix would be just as understandable, and look better, IMO.
  • Why do we have both IsRequired and HasRequired? The previous comment applies, with the addition that having two similarly named methods that appear to be doing the same thing is probably not a good idea.

But aside from that, it appears very nice.

ObjectContext vs. DbContext

I am not sure why there are two of them, but I have a very big dislike of ObjectContext: the amount of code that you have to write to make it work is just ridiculous when you compare that to the amount of code you have to write for DbContext.

I also strongly dislike the need to pass a DbConnection to the ObjectContext. The actual management of the connection is not within the scope of the application developer. That is within the scope of the infrastructure. Messing with DbConnection in application code should be left to very special circumstances and require swearing an oath of nonmaleficence. The DbContext doesn’t require that, so that is another thing that is in favor of it.

Using the DbContext is nice:

public class ToDoContext : DbContext
{
    private static readonly DbModel model;

    static ToDoContext()
    {
        var modelBuilder = new ModelBuilder();
        modelBuilder.DiscoverEntitiesFromContext(typeof(ToDoContext));
        modelBuilder.Entity<User>().HasKey(x => x.Username);
        model = modelBuilder.CreateModel();
    }

    public ToDoContext():base(model)
    {
        
    }

    public DbSet<Action> Actions { get; set; }

    public DbSet<User> Users { get; set; }
}

Note that we can mix & match the configuration styles, some are auto mapped, some are explicitly stated. It appears that if you fully follow the builtin conventions, you don’t even need ModelBuilder, as that will be built for you automatically.

Let us try to run things:

using(var ctx = new ToDoContext())
{
    ctx.Users.ToList();
}

The connection string is specified in the app.config, by defining a connection string with the name of the context.

Then I just ran it, without creating a database. I expected it to fail, but it didn’t. Instead, it created the following schema:

image

That is a problem, DDL should never run as an implicit step. I couldn’t figure out how to disable that, though (but I didn’t look too hard). To be fair, this looks like it will only run if the database doesn’t exist (not only if the tables aren’t there). But I would still make this an explicit step.

The result of running the code is:

image

Now the time came to try executing my queries:

var actionsForUser = 
    (
        from action in ctx.Actions
        where action.User.Username == "Ayende"
        select action
    )
    .ToList();

var remindersForToday =
    (
        from reminder in ctx.Actions.OfType<Reminder>()
        where reminder.Date == DateTime.Today
        select reminder
    )
    .ToList();

Which resulted in:

image

That has been a pretty brief overview of Entity Framework Code Only, but I am impressed, the whole process has been remarkably friction free, and the time to go from nothing to a working model has been extremely short.

time to read 10 min | 1903 words

I recently had a fascinating support call, talking about how to optimize a very big model and an access pattern that basically required having the entire model in memory for performing certain operations.

A pleasant surprise was that it wasn’t horrible (when I get called, there is usually a mess), which is what made things interesting. In the space of two hours, we managed to:

  • Reduce the number of queries by 90%.
  • Reduce the size of the queries by 52%.
  • Increase responsiveness by 60%, even for a data set an order of magnitude larger.

My default answer whenever I am asked when to use NHibernate is: Whenever you use a relational database.

My strong recommendation at the end of that support call? Don’t use a relational DB for what you are doing.

The ERD just below has absolutely nothing to do with the support call, but hopefully it will help make the example. Note that I dropped some of the association tables, to make it simpler.

image

And the scenario we have to deal with is this one:

image

Every single table in the ERD is touched by this screen.  Using a relational database, I would need something like the following to get all this data:

SELECT * 
FROM   Users 
WHERE  Id = @UserID 

SELECT * 
FROM   Subscriptions 
WHERE  UserId = @UserId 
       AND GETDATE() BETWEEN StartDate AND EndDate 

SELECT   MIN(CheckedBooks.CheckedAt), 
         Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     CheckedBooks 
         JOIN Books 
           ON BookId 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    CheckedBooks.UserId = @UserId 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    BookId IN (SELECT BookID 
                    FROM   QueuedBooks 
                    WHERE  UserId = @UserId) 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    BookId IN (SELECT BookID 
                    FROM   RecommendedBooks 
                    WHERE  UserId = @UserId) 
GROUP BY BookId 

SELECT   Books.Name, 
         Books.ImageUrl, 
         AVG(Reviews.NumberOfStars), 
         GROUP_CONCAT(', ',Authors.Name), 
         GROUP_CONCAT(', ',Categories.Name) 
FROM     Books 
         JOIN BookToAuthors 
           ON BookId 
         JOIN Authors 
           ON AuthorId 
         JOIN Reviews 
           ON BookId 
         JOIN BooksCategories 
           ON BookId 
         JOIN Categories 
           ON CategoryId 
WHERE    Books.Name LIKE @search 
          OR Categories.Name LIKE @search 
          OR Reviews.Review LIKE @search 
GROUP BY BookId

Yes, this is a fairly simplistic approach, without de-normalization, and I would never perform searches in this manner, but… notice how complex things are getting. For bonus points, look at the fourth query, the queued books are ordered, try to figure out how we can get the order in a meaningful way. I shudder to think about the execution plan of this set of queries, even if we ignore the last one, which does full text searching in the slowest possible way. And this is just for bringing the data for a single screen, assuming that magically it will show up (you need to do a lot of manipulation at the app level to make this happen).

The problem is simple, our data access pattern and the data storage technology that we use are at odds with one another. While relational modeling dictates normalization, our actual data usage means that we don’t really deal with a single-row entity, with relatively rare access to associations, which is the best case for OLTP. Nor are we dealing with set based logic, which is the best case for OLAP / Relational based queries.

Instead, we are dealing with an aggregate that spans multiple tables, mostly because we have no other way to express lists and many to many associations in a relational database.

Let us see how we could handle things if we were using a document or key/value database. We would have two aggregates, User and Book.

GetUser(userId) –> would result in:

image

We can now issue another query, to bring the associated books. GetBooks(153, 1337) would result in:

image

Note that the entire data structure is different, we haven’t just copied the normalized relational model, we have a totally different model. An aggregate (similar to DDD’s aggregate) is a single entity that contains anything except other aggregates. References to other aggregates are allowed (from user to all the books), but most of the entity’s data is stored as a single value.
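As a rough sketch of the aggregate idea in code: the class shapes, field names and book titles below are assumptions made for illustration (they are not the documents from the post), and `DocumentStore` is a hypothetical in-memory stand-in for a document or key/value database. Only the ids 153 and 1337 and the `GetUser` / `GetBooks` calls come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical aggregate shapes; the field names are assumptions.
public class User
{
    public int Id;
    public string Name;
    // References to other aggregates are stored as ids.
    // The list order is the queue order, so no extra ordering column is needed.
    public List<int> QueuedBookIds = new List<int>();
}

public class Review
{
    public int NumberOfStars;
    public string Text;
}

public class Book
{
    public int Id;
    public string Name;
    // Authors, categories and reviews live inside the book aggregate.
    public List<string> Authors = new List<string>();
    public List<string> Categories = new List<string>();
    public List<Review> Reviews = new List<Review>();
}

// Hypothetical in-memory store that counts requests.
public class DocumentStore
{
    private readonly Dictionary<int, User> users = new Dictionary<int, User>();
    private readonly Dictionary<int, Book> books = new Dictionary<int, Book>();

    public int Requests { get; private set; }

    public void Put(User user) { users[user.Id] = user; }
    public void Put(Book book) { books[book.Id] = book; }

    public User GetUser(int id)
    {
        Requests++;
        return users[id];
    }

    public List<Book> GetBooks(params int[] ids)
    {
        Requests++; // one request, however many ids
        return ids.Select(id => books[id]).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new DocumentStore();
        // Titles and authors are made up for the example.
        store.Put(new Book { Id = 153, Name = "Book A", Authors = { "Author A" } });
        store.Put(new Book { Id = 1337, Name = "Book B", Authors = { "Author B" } });

        var user = new User { Id = 1, Name = "Ayende" };
        user.QueuedBookIds.AddRange(new[] { 1337, 153 });
        store.Put(user);

        var loaded = store.GetUser(1);
        var queue = store.GetBooks(loaded.QueuedBookIds.ToArray());
        Console.WriteLine(string.Join(", ", queue.Select(b => b.Name))); // Book B, Book A
        Console.WriteLine(store.Requests);                               // 2
    }
}
```

The ordered queue, which was awkward to express in the SQL version, falls out of the list order for free.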

That has several interesting implications. First, we need two queries to get the data for the screen. One to get the user’s data, and the second to get the books that we need to display. Reducing remote calls is something that you really care about, and simplifying the queries to mere query by ids is going to have a significant effect as well.

By changing the data storage technology, we also enforced a very rigid aggregate boundary. Transactions become much simpler as well, since most transactions will now modify only a single aggregate, which is a single operation, no matter how many actual operations we perform on that aggregate. And by tailoring the data structure that we use to match our needs, we have natural aggregate boundaries.

The end result is a far simpler method of working with the data. It may mean that we have to do more work upfront, but look at the type of work we would have to do in order to try to solve our problems using the relational model. I know what model I would want for this sort of a problem.

time to read 18 min | 3414 words

I get a lot of requests for what I term, the regex problem. Why the regex problem?

Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems.

Jamie Zawinski in comp.lang.emacs.

A case in point, which comes up repeatedly, is this question:

Can you show us an example for loading collections of collections.
How would you write a query and avoid a Cartesian product multiple levels deep ?

In this case, we have someone who wants to load a blog, all its posts, and all its comments, and do it in the most efficient manner possible. At the same time, they want to have the tool handle that for them.

Let us take a look at how two different OR/Ms handle this task, then discuss what an optimal solution is.

First, Entity Framework, using this code:

db.Blogs
    .Include("Posts")
    .Include("Posts.Comments")
    .Where(x => x.Id == 1)
    .ToList();

This code will generate:

SELECT   [Project2].[Id]             AS [Id],
         [Project2].[Title]          AS [Title],
         [Project2].[Subtitle]       AS [Subtitle],
         [Project2].[AllowsComments] AS [AllowsComments],
         [Project2].[CreatedAt]      AS [CreatedAt],
         [Project2].[C1]             AS [C1],
         [Project2].[C4]             AS [C2],
         [Project2].[Id1]            AS [Id1],
         [Project2].[Title1]         AS [Title1],
         [Project2].[Text]           AS [Text],
         [Project2].[PostedAt]       AS [PostedAt],
         [Project2].[BlogId]         AS [BlogId],
         [Project2].[UserId]         AS [UserId],
         [Project2].[C3]             AS [C3],
         [Project2].[C2]             AS [C4],
         [Project2].[Id2]            AS [Id2],
         [Project2].[Name]           AS [Name],
         [Project2].[Email]          AS [Email],
         [Project2].[HomePage]       AS [HomePage],
         [Project2].[Ip]             AS [Ip],
         [Project2].[Text1]          AS [Text1],
         [Project2].[PostId]         AS [PostId]
FROM     (SELECT [Extent1].[Id]             AS [Id],
                 [Extent1].[Title]          AS [Title],
                 [Extent1].[Subtitle]       AS [Subtitle],
                 [Extent1].[AllowsComments] AS [AllowsComments],
                 [Extent1].[CreatedAt]      AS [CreatedAt],
                 1                          AS [C1],
                 [Project1].[Id]            AS [Id1],
                 [Project1].[Title]         AS [Title1],
                 [Project1].[Text]          AS [Text],
                 [Project1].[PostedAt]      AS [PostedAt],
                 [Project1].[BlogId]        AS [BlogId],
                 [Project1].[UserId]        AS [UserId],
                 [Project1].[Id1]           AS [Id2],
                 [Project1].[Name]          AS [Name],
                 [Project1].[Email]         AS [Email],
                 [Project1].[HomePage]      AS [HomePage],
                 [Project1].[Ip]            AS [Ip],
                 [Project1].[Text1]         AS [Text1],
                 [Project1].[PostId]        AS [PostId],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C2],
                 CASE 
                   WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int)
                   ELSE CASE 
                          WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int)
                          ELSE 1
                        END
                 END AS [C3],
                 [Project1].[C1]            AS [C4]
          FROM   [dbo].[Blogs] AS [Extent1]
                 LEFT OUTER JOIN (SELECT [Extent2].[Id]       AS [Id],
                                         [Extent2].[Title]    AS [Title],
                                         [Extent2].[Text]     AS [Text],
                                         [Extent2].[PostedAt] AS [PostedAt],
                                         [Extent2].[BlogId]   AS [BlogId],
                                         [Extent2].[UserId]   AS [UserId],
                                         [Extent3].[Id]       AS [Id1],
                                         [Extent3].[Name]     AS [Name],
                                         [Extent3].[Email]    AS [Email],
                                         [Extent3].[HomePage] AS [HomePage],
                                         [Extent3].[Ip]       AS [Ip],
                                         [Extent3].[Text]     AS [Text1],
                                         [Extent3].[PostId]   AS [PostId],
                                         1                    AS [C1]
                                  FROM   [dbo].[Posts] AS [Extent2]
                                         LEFT OUTER JOIN [dbo].[Comments] AS [Extent3]
                                           ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1]
                   ON [Extent1].[Id] = [Project1].[BlogId]
          WHERE  1 = [Extent1].[Id]) AS [Project2]
ORDER BY [Project2].[Id] ASC,
         [Project2].[C4] ASC,
         [Project2].[Id1] ASC,
         [Project2].[C3] ASC

If you’ll look closely, you’ll see that it generates a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.

What about NHibernate? The following code:

var blogs = s.CreateQuery(
    @"from Blog b 
        left join fetch b.Posts p 
        left join fetch p.Comments 
    where b.Id = :id")
    .SetParameter("id", 1)
    .List<Blog>();

Will generate a much saner statement:

select blog0_.Id             as Id7_0_,
       posts1_.Id            as Id0_1_,
       comments2_.Id         as Id2_2_,
       blog0_.Title          as Title7_0_,
       blog0_.Subtitle       as Subtitle7_0_,
       blog0_.AllowsComments as AllowsCo4_7_0_,
       blog0_.CreatedAt      as CreatedAt7_0_,
       posts1_.Title         as Title0_1_,
       posts1_.Text          as Text0_1_,
       posts1_.PostedAt      as PostedAt0_1_,
       posts1_.BlogId        as BlogId0_1_,
       posts1_.UserId        as UserId0_1_,
       posts1_.BlogId        as BlogId0__,
       posts1_.Id            as Id0__,
       comments2_.Name       as Name2_2_,
       comments2_.Email      as Email2_2_,
       comments2_.HomePage   as HomePage2_2_,
       comments2_.Ip         as Ip2_2_,
       comments2_.Text       as Text2_2_,
       comments2_.PostId     as PostId2_2_,
       comments2_.PostId     as PostId1__,
       comments2_.Id         as Id1__
from   Blogs blog0_
       left outer join Posts posts1_
         on blog0_.Id = posts1_.BlogId
       left outer join Comments comments2_
         on posts1_.Id = comments2_.PostId
where  blog0_.Id = 1 /* @p0 */

While this is a saner statement, it will also generate a Cartesian product. There are no two ways about it, this is bad bad bad bad.

The way to do this efficiently is quite simple: don’t try to do it in a single query. Instead, we can break it up into multiple queries, each loading just a part of the graph, and rely on the Identity Map implementation to stitch the graph together. You can read the post about it here. Doing this may require more work on your part, but it will end up being much faster, and it is also something that would be much easier to write, maintain and work with.
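The stitching step can be imitated by hand to show what is going on. This is a minimal sketch, not NHibernate code: `GraphLoader.Stitch` is a hypothetical helper, and each of its arguments stands for the result of one narrow query. In a real session the identity map does this work for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Blog
{
    public int Id;
    public List<Post> Posts = new List<Post>();
}

public class Post
{
    public int Id;
    public int BlogId;
    public List<Comment> Comments = new List<Comment>();
}

public class Comment
{
    public int Id;
    public int PostId;
}

public static class GraphLoader
{
    // Each argument stands for the result of one narrow query:
    //   the blog with the given id,
    //   the posts of that blog,
    //   the comments on those posts.
    public static Blog Stitch(Blog blog, IEnumerable<Post> posts, IEnumerable<Comment> comments)
    {
        // Identity map: one Post instance per id, shared by everything below.
        var postsById = new Dictionary<int, Post>();
        foreach (var post in posts)
        {
            postsById[post.Id] = post;
            if (post.BlogId == blog.Id)
                blog.Posts.Add(post);
        }

        foreach (var comment in comments)
        {
            Post owner;
            if (postsById.TryGetValue(comment.PostId, out owner))
                owner.Comments.Add(comment);
        }

        return blog;
    }
}

public static class Program
{
    public static void Main()
    {
        var blog = new Blog { Id = 1 };
        var posts = new List<Post>
        {
            new Post { Id = 10, BlogId = 1 },
            new Post { Id = 11, BlogId = 1 },
        };
        var comments = new List<Comment>
        {
            new Comment { Id = 100, PostId = 10 },
            new Comment { Id = 101, PostId = 10 },
            new Comment { Id = 102, PostId = 11 },
        };

        var graph = GraphLoader.Stitch(blog, posts, comments);
        Console.WriteLine(graph.Posts.Count);                      // 2
        Console.WriteLine(graph.Posts[0].Comments.Count);          // 2
        Console.WriteLine(graph.Posts.Sum(p => p.Comments.Count)); // 3
    }
}
```

Each result set carries only its own columns, so the blog and post data are not repeated once per comment the way they are in the single joined statement.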

time to read 2 min | 348 words

One of the things that I am working on is another commercial extension to EF, a 2nd level cache. At first, I thought to implement something similar to the way NHibernate does this, that is, to create two layers of caching, one for entity data and the second for query results where I would store only scalar information and ids.

That turned out to be quite hard. In fact, it turned out to be hard enough that I almost gave up on that. Sometimes I feel that extending EF is like hitting your head against the wall, eventually you either collapse or the wall falls down, but either way you are left with a headache.

At any rate, I eventually figured out a way to get EF to tell me about entities in queries and now the following works:

// will hit the DB
using (var db = new Entities(conStr))
{
    db.Blogs.Where(x => x.Title.StartsWith("The")).FirstOrDefault();
}

// will NOT hit the DB, will use cached data for that
using(var db = new Entities(conStr))
{
   db.Blogs.Where(x => x.Id == 1).FirstOrDefault();
}

The ability to handle such scenarios is an important part of what makes the 2nd level cache useful, since it means that you aren’t limited to just caching a query, but can perform far more sophisticated caching. It means better cache freshness and a lot less unnecessary cache cleanups.
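The two-layer idea described earlier (entity data in one layer, query results stored as ids in the other) can be sketched in a few lines. This is an illustration under assumptions, not the extension's actual code: `SecondLevelCache`, `Query`, `GetById` and `DatabaseHits` are hypothetical names, and invalidation is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Blog
{
    public int Id;
    public string Title;
}

// Hypothetical two-layer cache: entity data keyed by id, and query
// results stored as lists of ids only.
public class SecondLevelCache
{
    private readonly Dictionary<int, Blog> entities = new Dictionary<int, Blog>();
    private readonly Dictionary<string, List<int>> queries = new Dictionary<string, List<int>>();

    public int DatabaseHits { get; private set; }

    public List<Blog> Query(string queryKey, Func<List<Blog>> runAgainstDb)
    {
        List<int> ids;
        if (queries.TryGetValue(queryKey, out ids) && ids.All(entities.ContainsKey))
            return ids.Select(id => entities[id]).ToList(); // both layers answered

        DatabaseHits++;
        var results = runAgainstDb();
        foreach (var blog in results)
            entities[blog.Id] = blog;                           // entity layer
        queries[queryKey] = results.Select(b => b.Id).ToList(); // query layer
        return results;
    }

    public Blog GetById(int id, Func<Blog> loadFromDb)
    {
        Blog blog;
        if (entities.TryGetValue(id, out blog))
            return blog; // served from the entity layer

        DatabaseHits++;
        blog = loadFromDb();
        entities[id] = blog;
        return blog;
    }
}

public static class Program
{
    public static void Main()
    {
        var cache = new SecondLevelCache();
        Func<List<Blog>> titleQuery =
            () => new List<Blog> { new Blog { Id = 1, Title = "The Blog" } };

        // Goes to the database and fills both layers.
        cache.Query("Title starts with 'The'", titleQuery);

        // A different question, answered by the entity layer alone.
        cache.GetById(1, () => new Blog { Id = 1, Title = "The Blog" });

        // The same query again, answered from the cached ids.
        cache.Query("Title starts with 'The'", titleQuery);

        Console.WriteLine(cache.DatabaseHits); // 1
    }
}
```

Because the query layer holds only ids, updating a single entity does not force every cached query that mentions it to be thrown away.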

Next, I need to handle partially cached queries, cached query invalidation and a bunch of other minor details, but the main hurdle seems to have been dealt with (I am willing to lay odds that I will regret this statement).

time to read 4 min | 768 words

This is a question that I get very frequently, and I always tried to dodge the bullet, but I get it so much that I feel that I have to provide an answer. Obviously, I am (not so) slightly biased toward NHibernate, so while you read it, please keep it in mind.

EF 4.0 has done a lot to handle the issues that were raised with the previous version of EF. Things like transparent lazy loading, POCO classes, code only, etc. EF 4.0 is much nicer than EF 1.0.

The problem is that it is still a very young product, and the changes that were added only touched the surface. I already talked about some of my problems with the POCO model in EF, so I won’t repeat that, or my reservations with the Code Only model. But basically, the major problem that I have with those two is that there seems to be a wall between the experience of the community and what Microsoft is doing. Both of those features show much of the same issues that we have run into with NHibernate and Fluent NHibernate. Issues that were addressed and resolved, but show up in the EF implementations.

Nevertheless, even ignoring my reservations about those, there are other indications that NHibernate’s maturity makes itself known. I ran into that several times while I was writing the guidance for EF Prof, there are things that you simply can’t do with EF, that are a natural part of NHibernate.

I am not going to try to do a point by point list of the differences, but it is interesting to look where we do find major differences between the capabilities of NHibernate and EF 4.0. Most of the time, it is in the ability to fine tune what the framework is actually doing. Usually, this is there to allow you to gain better performance from the system without sacrificing the benefits of using an OR/M in the first place.

Here is a small list:

  • Write batching – NHibernate can be configured to batch all writes to the database so that when you need to write several statements to the database, NHibernate will only make a single round trip, instead of going to the database per each statement.
  • Read batching / multi queries / futures – NHibernate allows you to batch several queries into a single round trip to the database, instead of a separate round trip for each query.
  • Batched collection loads – When you lazy load a collection, NHibernate can find other collections of the same type that weren’t loaded, and load all of them in a single trip to the database. This is a great way to avoid having to deal with SELECT N+1.
  • Collection with lazy="extra" – Lazy extra means that NHibernate adapts to the operations that you might run on top of your collections. That means that blog.Posts.Count will not force a load of the entire collection, but rather would create a "select count(*) from Posts where BlogId = 1" statement, and that blog.Posts.Contains() will likewise result in a single query rather than paying the price of loading the entire collection to memory.
  • Collection filters and paged collections – this allows you to define additional filters (including paging!) on top of your entities collections, which means that you can easily page through the blog.Posts collection, and not have to load the entire thing into memory.
  • 2nd level cache – managing the cache is complex, I touched on why this is important before, so I’ll skip it for now.
  • Tweaking – this is something that is critical whenever you need something that is just a bit beyond what the framework provides. With NHibernate, in nearly all the cases, you have an extension point, with EF, you are completely and utterly blocked.
  • Integration & Extensibility – NHibernate has a lot of extension projects, such as NHibernate Search, NHibernate Validator, NHibernate Shards, etc. Such projects not only do not exist for EF, but they cannot be written, for the most part, because EF has no extension points to speak of.
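To give a feel for the lazy extra behavior from the list above, here is a minimal sketch of the idea. It is not NHibernate's implementation: `ExtraLazyCollection<T>` is a hypothetical class, and the three delegates stand in for the count query, the contains query and the full collection load.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical "extra lazy" collection: cheap questions are answered by
// small targeted queries; only asking for the items loads every element.
public class ExtraLazyCollection<T>
{
    private readonly Func<int> countQuery;        // e.g. select count(*) ...
    private readonly Func<T, bool> containsQuery; // e.g. select 1 ... where Id = ?
    private readonly Func<List<T>> loadAll;       // full collection load
    private List<T> loaded;

    public ExtraLazyCollection(Func<int> countQuery, Func<T, bool> containsQuery, Func<List<T>> loadAll)
    {
        this.countQuery = countQuery;
        this.containsQuery = containsQuery;
        this.loadAll = loadAll;
    }

    public bool IsLoaded
    {
        get { return loaded != null; }
    }

    public int Count
    {
        get { return loaded != null ? loaded.Count : countQuery(); }
    }

    public bool Contains(T item)
    {
        return loaded != null ? loaded.Contains(item) : containsQuery(item);
    }

    public List<T> Items
    {
        get { return loaded ?? (loaded = loadAll()); }
    }
}

public static class Program
{
    public static void Main()
    {
        var queries = 0;
        var rows = new List<string> { "post 1", "post 2", "post 3" };

        var posts = new ExtraLazyCollection<string>(
            () => { queries++; return rows.Count; },
            item => { queries++; return rows.Contains(item); },
            () => { queries++; return new List<string>(rows); });

        Console.WriteLine(posts.Count);              // 3
        Console.WriteLine(posts.Contains("post 2")); // True
        Console.WriteLine(posts.IsLoaded);           // False
        Console.WriteLine(posts.Items.Count);        // 3
        Console.WriteLine(posts.IsLoaded);           // True
        Console.WriteLine(queries);                  // 3
    }
}
```

The first two calls each cost one small query and never pull the collection into memory. Only the explicit request for the items triggers the full load.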

On the other side, however:

  • EF 4.0 has a better Linq provider than the current NHibernate implementation. This is something being actively worked on, and NH 3.0 will close this gap.
  • EF is from Microsoft.

time to read 3 min | 488 words

My last post caused quite a bit of furor, and I decided that I wanted to respond to all the comments in a single place.

  • EF has a designer, NHibernate does not.
    This is actually false, NHibernate has multiple designers available for it. Active Writer (Free, OSS, integrated into VS), Visual NHibernate (Commercial, Beta) and LLBLGen (Commercial, forthcoming in early 2010). I would say that using a designer with NHibernate isn’t something that is very common, most people tend to either code gen the entire model and tweak that (minority) or hand craft the model & mapping. That isn’t for lack of options, it is because it is simply more efficient to do so in most cases.
  • EF is from Microsoft.
    Yes, it is. That is a good point, because it reflects on support & documentation. Unfortunately, the fact that this was one of the most prominent reasons quoted in the replies is also interesting. It says a lot about the relative quality of the products themselves. Another issue with a data access framework from Microsoft is that history teaches us that few data access methods from Microsoft survive the 2 years mark, and none survive the 5 years mark. RDO, ADO, ADO.Net, Object Spaces, Linq To Sql, Entity Framework – just to name a few.
  • Downstream integration.
    That was mentioned a few times, as in integration with data services, WCF RIA, etc. I want to divide my comment to that into two parts. First, everything that exists right now can be used with NHibernate. Second, EF is supposed to come with reporting / BI tools in the future. Currently, everything that came up was easily usable with NHibernate, so I am not really worried about it.
  • In the Future it will be awesome.
    Some people pointed out that Microsoft is able to allocate more resources for EF than an OSS project can. That is true to a point. One of the problems that Microsoft is facing is that it has to pay a huge amount of taxes on the way to creating a releasable product. That means that it is typically easier for an OSS project to release faster than a comparable Microsoft project.

So far, I find it really interesting that no one came up with any concrete feature that EF can do that NHibernate can’t. I am going to let you in on a secret, when EF was announced, there were exactly two things that it could do that NHibernate could not. Both were fixed before EF 1.0 shipped, just because that fact annoyed me.

Are you telling me that no such thing exists for the new version?

time to read 1 min | 103 words

I am trying to formulate my formal response to the “NH vs. EF” question, and while I have a pretty solid draft ready, I found out that my response is pretty biased. I am not happy with that, so I wanted to pose this as a real question.

So far, I came up with:

  • EF 4.0 has a better Linq provider than the current NHibernate implementation. This is something being actively worked on, and NH 3.0 will close this gap.

My sense of fairness says that this can’t be it, so please educate me.
