Ayende @ Rahien

Refunds available at head office

How to pay 3 times for the same flight ticket

My current mood is somewhere between mad, pissed off, frustrated and exhausted.

All I want to do is arrive home, but dark forces have conspired against me. It works like this:

  • I got a ticket both ways to London, and I left for London quite happily, taking the train to the airport. At that point things started going horribly wrong. An idiot committed suicide on the train tracks, leading to about 2 hours of delays, which, naturally, caused me to miss my flight. We will leave that aside and just point out that I somehow managed to get to London without being late to my course the next morning.
  • That is when I heard that since I was a no-show for the outgoing flight, the airline cancelled my flight back. I had to order a new one, but because I was busy actually giving the course, I asked the company to order it for me.
  • They did, and I went to the airport quite satisfied, except that on arrival, the stupid machine told me that my booking was invalid, which is the point where I discovered that the ticket was ordered under the name Ayende Rahien. Unfortunately, legally speaking, that guy doesn’t exist. The airline was quite insistent that they couldn’t put me on board under a different name. And they made me buy a new ticket.

That was the point (after I paid for the same bloody seat for the third time) that they told me that they had oversold the flight, and that they had put me on a waiting list.

I tried to explain that they could use the second ticket that I bought, and just give me the same seat, but they told me that they already sold that to someone else.

I am currently waiting to hear whether I’ll get to go home tonight or not.

Yours truly,

Pissed off,

Oren Eini (because being Ayende is currently dangerous, given my mood)

Find the bug: RavenDB HiLo implementation

The following code is part of RavenDB’s HiLo implementation:

        private long NextId()
        {
            long incrementedCurrentLow = Interlocked.Increment(ref currentLo);
            if (incrementedCurrentLow > capacity)
            {
                lock (generatorLock)
                {
                    if (Thread.VolatileRead(ref currentLo) > capacity)
                    {
                        currentHi = GetNextHi();
                        currentLo = 1;
                        incrementedCurrentLow = 1;
                    }
                }
            }
            return (currentHi - 1)*capacity + (incrementedCurrentLow);
        }

It contains a bug, can you see it? It took me a long time to figure it out, I am ashamed to say.

BTW, you can safely assume that GetNextHi is correct.
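
For context, here is a small worked example of how the return expression composes an id; the numbers are illustrative, not taken from RavenDB:

// Illustrative values only, to show the id composition.
long capacity = 32;               // ids handed out per hi range
long currentHi = 3;               // third range obtained from GetNextHi
long incrementedCurrentLow = 5;   // fifth id taken from that range

// (3 - 1) * 32 + 5 = 69; the hi = 3 range covers ids 65..96.
long id = (currentHi - 1) * capacity + incrementedCurrentLow;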

NHibernate course in Australia?

I am going to be visiting Australia in December, and I thought that I might take advantage of that to give my 3-day NHibernate course.

Before doing that, I thought that I should test the waters. Thoughts?

RavenDB: Includes

When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated, solution. Something that will do the right thing most of the time, and let you override that if you really want to.

One of the things that really drove the design was 6 – 7 years of experience in building applications based on RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was duplicating the possibility of error with RavenDB.

One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.

Let us take the following example:

 image

As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.

From the C# model, it looks like this:

public class Order
{
    public string Id { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.

So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.

Well, that is easy enough, isn’t it? All we need to do is load it:

var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);

This is simple, easy to read, easy to understand, and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.

I started to think about how I can make this scenario work better, and I came up with the following design.

Given the two documents

Then for GET /docs/orders/9432

image

And for GET /docs/orders/9432?include=Customer.Id

image 

Note that in the second case, we get the full customer data merged into the order document.

From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic in the mailing list.

Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.

On the mailing list, there were a lot of proposals raised; the one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:

[RootAggregate]
public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

[RootAggregate]
public class Customer
{
    public string Id { get; set; }
    [Denormalized]
    public string Name { get; set; }
    public string Email { get; set; }
}

And then make the client API smart enough to understand the attribute. The model above would generate the same documents as the previous model, but would allow for a much easier time when working on features such as this. This way, we can normally access the data that is embedded in the document, but also include the associated document when we need it.

There are several problems here:

  • This creates a misleading API, making people think that things are normalized when they aren’t.
  • It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
  • It goes directly against the way I believe you should work with a document database.

But I couldn’t think of any other way, nor could anyone else.

Until Frank Schwieterman came to our rescue:

Maybe rather than joining the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would do the joined load of the user object, and the user is returned in its original form. But now the session has the customer object preloaded, so when I try to load the customer object via the client API, no request is made to the server. From the caller's perspective, the only change to usage has been the preload hint passed in the original request.

Yes!

The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.

It took some interesting coding to get it to work properly, but essentially, it is just an application of the Futures approach for loading large object graphs in NHibernate. Now we can do:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

var customer = session.Load<Customer>(order.Customer.Id);

And this code will only go to the server once!
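
To illustrate why the second Load is free, here is a minimal sketch of the session-side bookkeeping; this is hypothetical, not RavenDB’s actual internals. Included documents that arrive with the response are parked in the session’s identity map, and Load checks there before going to the server:

using System.Collections.Generic;

public class SketchSession
{
    // Hypothetical identity map: document key -> materialized entity.
    private readonly Dictionary<string, object> entitiesByKey = new Dictionary<string, object>();

    // Called for each included document that came back with the response.
    public void RegisterIncluded(string key, object entity)
    {
        entitiesByKey[key] = entity;
    }

    public T Load<T>(string key) where T : class
    {
        object cached;
        if (entitiesByKey.TryGetValue(key, out cached))
            return (T)cached; // already preloaded, no remote call

        // Otherwise we would make a remote call here (omitted in this sketch).
        return null;
    }
}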

We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.

Playing with Entity Framework Code Only

After making EF Prof work with EF Code Only, I decided that I might take a look at how Code Only actually works from the perspective of the application developer. I am working on my own solution based on the following posts:

But since I don’t like to just read things, and I hate walkthroughs, I decided to take a slightly different path. In order to do that, I set myself the following goal:

image

  • Create a ToDo application with the following entities:
    • User
    • Actions (inheritance)
      • ToDo
      • Reminder
  • Query for:
    • All actions for user
    • All reminders for today across all users

That isn’t really a complex system, but my intention is to get to grips with how things work. And see how much friction I encounter along the way.

We start by referencing “Microsoft.Data.Entity.Ctp” & “System.Data.Entity”.
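
Here is a sketch of the model classes this walkthrough assumes; the class names follow the goal list above, while the properties are my own guesses (they match the queries shown later):

using System;
using System.Collections.Generic;

public class User
{
    public string Username { get; set; }
    public ICollection<Action> Actions { get; set; }
}

// Base class for the inheritance part of the goal.
public abstract class Action
{
    public int Id { get; set; }
    public string Title { get; set; }
    public User User { get; set; }
}

public class ToDo : Action
{
    public bool Completed { get; set; }
}

public class Reminder : Action
{
    public DateTime Date { get; set; }
}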

There appears to be a wide range of options to define how entities should be mapped. These include building them using a fluent interface, creating map classes, or auto mapping. All in all, the code shows a remarkable similarity to Fluent NHibernate, in spirit if not in actual API.

I don’t like some of the API:

  • HasRequired and HasKey, for example, seem awkwardly named to me, especially when they are used as part of a fluent sentence. I have long advocated avoiding the attempt to create real sentences in a fluent API (StructureMap was probably the worst in this regard). Dropping the Has prefix would be just as understandable, and look better, IMO.
  • Why do we have both IsRequired and HasRequired? The previous comment applies, with the addition that having two similarly named methods that appear to be doing the same thing is probably not a good idea.

But aside from that, it appears very nice.

ObjectContext vs. DbContext

I am not sure why there are two of them, but I have a very big dislike of ObjectContext; the amount of code that you have to write to make it work is just ridiculous when you compare it to the amount of code you have to write for DbContext.

I also strongly dislike the need to pass a DbConnection to the ObjectContext. The actual management of the connection is not within the scope of the application developer. That is within the scope of the infrastructure. Messing with DbConnection in application code should be left to very special circumstances and require swearing an oath of nonmaleficence. The DbContext doesn’t require that, so that is another thing that is in favor of it.

Using the DbContext is nice:

public class ToDoContext : DbContext
{
    private static readonly DbModel model;

    static ToDoContext()
    {
        var modelBuilder = new ModelBuilder();
        modelBuilder.DiscoverEntitiesFromContext(typeof(ToDoContext));
        modelBuilder.Entity<User>().HasKey(x => x.Username);
        model = modelBuilder.CreateModel();
    }

    public ToDoContext() : base(model)
    {
    }

    public DbSet<Action> Actions { get; set; }

    public DbSet<User> Users { get; set; }
}

Note that we can mix & match the configuration styles, some are auto mapped, some are explicitly stated. It appears that if you fully follow the built-in conventions, you don’t even need a ModelBuilder, as that will be built for you automatically.

Let us try to run things:

using(var ctx = new ToDoContext())
{
    ctx.Users.ToList();
}

The connection string is specified in the app.config, by defining a connection string with the name of the context.

Then I just ran it, without creating a database first. I expected it to fail, but it didn’t. Instead, it created the following schema:

image

That is a problem, DDL should never run as an implicit step. I couldn’t figure out how to disable that, though (but I didn’t look too hard). To be fair, this looks like it will only run if the database doesn’t exist (not only if the tables aren’t there). But I would still make this an explicit step.

The result of running the code is:

image

Now the time came to try executing my queries:

var actionsForUser = 
    (
        from action in ctx.Actions
        where action.User.Username == "Ayende"
        select action
    )
    .ToList();

var remindersForToday =
    (
        from reminder in ctx.Actions.OfType<Reminder>()
        where reminder.Date == DateTime.Today
        select reminder
    )
    .ToList();

Which resulted in:

image

That has been a pretty brief overview of Entity Framework Code Only, but I am impressed; the whole process has been remarkably friction free, and the time to go from nothing to a working model has been extremely short.

If I run Dev Div…

I found this post interesting, so I thought that I would try to answer it.

If I run Dev Div, I would:

  • Utilize the CodePlex Foundation to a much higher degree. Right now CPF manages about 6 projects, all of which seem to originate from Microsoft. None of them is actually useful for CPF in the sense of creating excitement around it. If I run Dev Div, and given the indications of the lack of will from within Microsoft to maintain the DLR, I would give the CPF the DLR, along with a small budget (1 million annually should more than cover that) and set it free. That 1 million? I would put it down as marketing, because the amount of good will that you’ll generate from doing that is enormous. And the level of publicity that you’ll get for CPF is huge.

 

  • Sit down with marketing and create a new set of personas, based on what is actually happening in the real world. Mort, Einstein and that Elvis dude might be good when you are building the product and designing the API, but there are other aspects to consider, and acceptance in the community is important. A lot of the public blunders that Microsoft has made recently have been just that, marketing mistakes more than anything else.

 

  • Build for each project a set of scenarios that appeal to each group. For the PHP crowd, point out how WebMatrix makes their life simple. For the ALT.Net crowd, show them how they can use IIS Express to run their unit tests without requiring anything installed. Right now what is going on is that pushing to one side alienates the other.
    Let me put this another way: when someone asks me why use NHibernate instead of Entity Framework, I am going to inform them that Microsoft continues their trend of a new data access framework every two years, and that they have gone retro with Microsoft.Data.

 

  • Kill duplicated effort before it goes out the door. Linq to SQL & Entity Framework are a great example of things that should have been developed concurrently (because either project had a chance of failure, so hedging your bets is good, if you can afford that), but they should have never been allowed to both see the light of day. The marketing damage from moving Linq to SQL to maintenance mode was huge.

 

  • Simplify, simplify, simplify! Microsoft does a lot of good things, but they tend to provide solutions that are wide scoped and complex. Often enough, so complex that people just give up on them. Case in point, if at all possible, I wouldn’t use WCF for communication. I would rather roll my own RPC stack, because that makes things easier. That is a Fail for Microsoft in many cases, and one of the reasons that they are leaking developers to other platforms.

 

  • Forbid developers from arguing with customers publicly about policy decisions. The reasoning is simple: devs don’t have a lot of (if any) impact on policy decisions; they can explain them, sure. But when there is disagreement, having an argument about it tends to end up with bad blood in the water. Such discussions should be done at a level that can actually do something. Case in point, the recent WebMatrix arguments on Twitter & blogs.

 

  • Focus some love & attention toward the non entry level segment of the market. There has been a worrying set of events recently that indicate a shift in focus toward the entry level market, and the rest of developers are feeling abandoned. That shouldn’t be allowed to remain this way for long.

If you’ll note, except for the suggestion about turning simplification into an art form and applying it ruthlessly, most of the suggestions are actually about marketing and positioning, not about product development.

Breaking the C# compiler, again

Take a look at the following code:

   1:  var docs = new dynamic[0];
   2:  var q = from doc in docs
   3:          where doc["@metadata"]["Raven-Entity-Name"] == "Cases"
   4:          where doc.AssociatedEntities != null
   5:          from entity in doc.AssociatedEntities
   6:          where entity.Tags != null // COMPILER ERROR HERE
   7:          from tag in entity.Tags
   8:          where tag.ReferencedAggregate != null
   9:          select new {tag.ReferencedAggregate.Id, doc.__document_id};        

It doesn’t compile, complaining about an error on line 6 (The property '<>h__TransparentIdentifier0' cannot be used with type arguments).

Leaving aside the issue with the error itself, which is about as useful as C++ template errors, I have a strong issue with this.

What do you think can fix this error? Well, if you remove line 8 (!), it just works:

   1:  var docs = new dynamic[0];
   2:  var q = from doc in docs
   3:          where doc["@metadata"]["Raven-Entity-Name"] == "Cases"
   4:          where doc.AssociatedEntities != null
   5:          from entity in doc.AssociatedEntities
   6:          where entity.Tags != null
   7:          from tag in entity.Tags
   8:          select new { tag.ReferencedAggregate.Id, doc.__document_id };

There are several things that I can’t figure out:

  • Why am I getting an error?
  • Why does removing a line after the error fix the error on the line before it?

What I suspect is that the error reporting (which in C# is usually damn good) is getting confused with the error reporting location.

To make things worse, this code will crash VS 2010 every single time. I am actually editing this in notepad!!!

Here is the set of extensions methods that I use here:

public static class LinqOnDynamic
{
    private static IEnumerable<dynamic> Select(this object self)
    {
        if (self == null)
            yield break;
        if (self is IEnumerable == false || self is string)
            throw new InvalidOperationException("Attempted to enumerate over " + self.GetType().Name);

        foreach (var item in ((IEnumerable) self))
        {
            yield return item;
        }
    }

    public static IEnumerable<dynamic> SelectMany(this object source,
                                                    Func<dynamic, int, IEnumerable<dynamic>> collectionSelector,
                                                    Func<dynamic, dynamic, dynamic> resultSelector)
    {
        return Enumerable.SelectMany(Select(source), collectionSelector, resultSelector);
    }

    public static IEnumerable<dynamic> SelectMany(this object source,
                                                    Func<dynamic, IEnumerable<dynamic>> collectionSelector,
                                                    Func<dynamic, dynamic, dynamic> resultSelector)
    {
        return Enumerable.SelectMany(Select(source), collectionSelector, resultSelector);
    }

    public static IEnumerable<dynamic> SelectMany(this object source,
                                                    Func<object, IEnumerable<dynamic>> selector)
    {
        return Select(source).SelectMany<object, object>(selector);
    }

    public static IEnumerable<dynamic> SelectMany(this object source,
                                                                    Func<object, int, IEnumerable<dynamic>> selector)
    {
        return Select(source).SelectMany<object, object>(selector);

    }
}

WTF?!


Financial analysis

I decided to spend some time with Excel trying to do a look back at how things are doing. I am not going to give you numbers, but the trends in the data are interesting. All the data is relevant to this year only.

image 

image

Tell me that the top one doesn’t remind you of a rude gesture. As you can see, NHibernate rules the roost here; on the bottom, it is even clearer. I think that a lot of that is because I am so closely associated with NHibernate.

image

image

Looking at the data over time gives me a very interesting perspective on the introduction of subscriptions. That was expected, but I don’t think that I had really grokked it. Looking at the bottom image, I can tell you that subscriptions are pretty big from the point of view of license numbers, but they are much weaker from the point of view of money in the bank. I knew that, I even wanted that, since the whole point of subscriptions was to get a sustainable revenue stream rather than money today. But I wasn’t really ready for that.

image

This one is particularly annoying, because there is a dry spot around the 20th every single month, and I get abandonment anxiety at that time.

That is it for now :-)

NHibernate Tooling Review: LLBLGen Pro 3.0

This is part of what looks to be a long series of posts about tooling related to NHibernate.

In this case, I want to talk about LLBLGen Pro.

LLBLGen Pro 3.0 is an OR/M tool that also comes with an NHibernate option. Because what I am testing is a GUI tool for managing NHibernate mappings, I am going to show a lot of screenshots. Also, please note that I am using this without any foreknowledge, without reading the docs, or anything of the sort. I am also not going to do an in-depth discussion, just use it to see how those things work.

After installation, we start a new project:

image

Which gives us the following (scary) dialog. Note all the tabs at the bottom. There is a huge number of options and knobs that you can tweak.

image

The next step was to see how it handles an existing data model, so I decided to add the test model that I use for NHibernate:

image image

This takes about zero time and gives us the entities. It properly recognizes 1:m associations, but I had to define the m:n associations myself. I am not sure if this is what I am supposed to do or if I just missed something. What is impressive is that the schema in question is actually being generated from NHibernate through the SchemaExport tool. So we went full cycle: NHibernate mapping, schema, extracting everything into the following model:

model

I wonder if I can get LLBLGen Pro to read a set of NHibernate mappings as well, because this view of the mapping can be very useful.

But enough talking about the UI; there are a lot of tools that have a great UI. I want to see what code it generates.

As an aside, I like this part of the code generation:

image

Not so much because of what is there, but because it implies how the code generation process itself is structured.

I was surprised when I saw this:

image image

It brought back flashbacks of 1.0, but they are empty; I assume that this is a vestigial remnant of the template.

I don’t like that CategoriesPost is there, but I couldn’t figure out how to remove it; this isn’t an entity, it is an association table.

Looking at the code, there is one thing that really bothers me, take a look at the property definitions. How do you set that?

image

I am pretty sure that I did something to cause that, but I really don’t have any idea what.

Update: (This is written before the post is published, but after I had time to review what I did.) It appears that I played with too many settings; there is an option called EmitFieldSetter that I accidentally set to false. The idea behind this setting is that you may want to allow setting properties via methods, not via direct property setters.

I’ll ignore that for now, and look at the rest of the code. In addition to the Model project, LLBLGen Pro also generates the Persistence project:

image

I was quite surprised to see that it uses class maps, but then I looked at the code:

/// <summary>Represents the mapping of the 'Blog' entity, represented by the 'Blog' class.</summary>
public class BlogMap : ClassMap<Blog>
{
    /// <summary>Initializes a new instance of the <see cref="BlogMap"/> class.</summary>
    public BlogMap()
    {
        Table("[dbo].[Blogs]");
        OptimisticLock.Version();
        LazyLoad();

        Id(x=>x.Id)
            .Access.CamelCaseField(Prefix.Underscore)
            .Column("Id")
            .GeneratedBy.Identity();
        Map(x=>x.AllowsComments).Access.CamelCaseField(Prefix.Underscore);
        Map(x=>x.CreatedAt).Access.CamelCaseField(Prefix.Underscore);
        Map(x=>x.Subtitle).Access.CamelCaseField(Prefix.Underscore);
        Map(x=>x.Title).Access.CamelCaseField(Prefix.Underscore);

        HasMany(x=>x.Posts)
            .Access.CamelCaseField(Prefix.Underscore)
            .Cascade.AllDeleteOrphan()
            .Fetch.Select()
            .AsSet()
            .Inverse()
            .LazyLoad()
            .KeyColumns.Add("BlogId");
        HasManyToMany(x=>x.Users)
            .Access.CamelCaseField(Prefix.Underscore)
            .Cascade.AllDeleteOrphan()
            .Fetch.Select()
            .AsSet()
            .Table("[dbo].[UsersBlogs]")
            .ParentKeyColumns.Add("BlogId")
            .ChildKeyColumns.Add("UserId");

    } 
} 

That is highly readable, and a lot of people prefer that, and I guess it is easier than generating the XML.

Did you see the SessionManager class? It also handles all the wiring of NHibernate, so you can start using it by just getting the session:

SessionManager.OpenSession()
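
A quick usage sketch, assuming the generated SessionManager shown above; the entity and property names come from the Blog model we imported:

using (var session = SessionManager.OpenSession())
using (var tx = session.BeginTransaction())
{
    var blog = session.Get<Blog>(1);
    blog.Subtitle = "Edited through the generated mappings";
    tx.Commit();
}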

Overall, very impressive. From installation to working code (except that annoying no-setter thingie), it took mere minutes, and the code is clean. That is one of the things that I always hated with code generators, they generally produce crappy code, so I was very happy to see how the code looked.

Now, after the whirlwind tour, let me see if I can look at more of the options.

Oh, it looks like it can generate HBM files, let us give that a try:

<class name="Blog" table="[dbo].[Blogs]" optimistic-lock="version" >
    <id name="Id" column="Id" access="field.camelcase-underscore" >
        <generator class="identity"/>
    </id>
    <property name="AllowsComments" column="AllowsComments" access="field.camelcase-underscore"/>
    <property name="CreatedAt" column="CreatedAt" access="field.camelcase-underscore"/>
    <property name="Subtitle" column="Subtitle" access="field.camelcase-underscore"/>
    <property name="Title" column="Title" access="field.camelcase-underscore"/>
    <set name="Posts" access="field.camelcase-underscore" cascade="all-delete-orphan" inverse="true" fetch="select">
        <key>
            <column name="BlogId"/>
        </key>
        <one-to-many class="Post"/>
    </set>
    <set name="Users" access="field.camelcase-underscore" table="[dbo].[UsersBlogs]" cascade="all-delete-orphan" fetch="select">
        <key>
            <column name="BlogId"/>
        </key>
        <many-to-many class="User">
            <column name="UserId"/>
        </many-to-many>
    </set>
</class>

There is one main thing to note here, this looks pretty much like it was written by a human. There are still some things to improve, but compare that to the output of Fluent NHibernate, and you’ll see that this is generating something that is much easier to work with.

Some things that still bothers me:

  • Move all the access=”field.camelcase-underscore” attributes to a default-access on the <hibernate-mapping> element; that would remove a lot of duplication.
  • The table=”[dbo].[UsersBlogs]” should be table=”`UsersBlogs`” schema=”`dbo`”, and I would skip the schema if it is dbo already.
  • The class contains an optimistic-lock=”version”, but there is no version column, so this should go away as well. The same error appears in the Fluent NHibernate mapping as well.

Let us see what else I can play with.

There is something called Typed Views, which I didn’t understand at first:

image

It appears that instead of generating an entity (which has associations, collections, references, etc), this builds just a bare-bones class to hold the data. The purpose, I believe, is to use this for reporting/binding scenarios, in which case a view model is much more appropriate. This is one feature that the CQRS guys are probably going to like.

Oh, I found the m:n option:

image

With that, I get:

image

I probably need to change the name generation pattern, though :-)

I would suggest running the labels on the UI through PascalCaseToSentence, though. That is, instead of having to read AutoAddManyToManyRelationships, turn that to “Auto add many to many relationships”.

There is another feature, called TypedList, which gives you a query designer for building NHibernate queries (they are translated to HQL). It looked nice, but I haven’t looked at it too closely. The way I usually work, queries are too context sensitive to allow creating the sort of query library that would make this feature useful.

This has been a very short run through LLBLGen Pro in its NHibernate mode. Overall, it looks very impressive, and it generates the best code that I have seen yet out of a tool. Good enough to take and use as is.

The only detraction that I could find is the name (yes, I know that I talked about this before); LLBLGen Pro just doesn’t roll off the tongue.

LumenQ: API

I am posting this because I really have no choice, I don’t want to build another product right now, but the design is literally killing me, bouncing in my head all the time.

The following is the low level API that you would use with LumenQ, a Silverlight queuing framework.

var postOffice = new PostOffice("http://my-server/lumenq/messages")
    .RegisterForTopic("/customers/939192")
    .RegisterForTopic("/notifications")
    .OnMessageArrived(HandleMessageFromServer)
    .Start();

postOffice.Send(new MyRecommendations
{
    UserId = 9393
});

Note that OnMessageArrived is a low level API, there is also an API similar to the Rhino Service Bus, which allows you to inherit from ConsumerOf<T> and gain strongly typed access to what is going on there.
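
For completeness, here is a sketch of what that strongly typed API might look like; the ConsumerOf<T> shape is borrowed from the Rhino Service Bus convention, and the exact LumenQ interface is still an assumption at this point:

// Hypothetical interface, following the Rhino Service Bus convention.
public interface ConsumerOf<T>
{
    void Consume(T message);
}

// The message type from the example above.
public class MyRecommendations
{
    public int UserId { get; set; }
}

public class MyRecommendationsConsumer : ConsumerOf<MyRecommendations>
{
    public void Consume(MyRecommendations message)
    {
        // Handle the message on the Silverlight client.
    }
}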

Messages to this client will include:

  • Any message on the /customers/939192 and /notifications topics
  • Any message posted specifically to this client

I am still struggling to decide what sort of infrastructure support I want to give the library, things like error handling are drastically different than in the Rhino Service Bus world, for example.

Why am I tired? Answer for week of 1 – 6 Aug, 2010

  • Blog:
    • 23 posts written
    • 258 comments.
    • 32 of said comments by me.
  • RavenDB documentation: 17 entries created / updated.
  • RavenDB: 46 commits.
  • NH Prof: 31 commits.
  • Email: Over 600 threads (a lot more in actual emails).

My head feels like it has been overheating for a while. Time to quit this infernal machine and watch some TV, maybe I’ll lose some IQ points.

Data access is contextual, a generic approach will fail

In my previous post, I discussed the following scenario, a smart client application for managing Machines, Parts & Maintenance History:

image_thumb

From the client perspective, Machine, Parts & Maintenance History were all part of the same entity, and they wanted to be able to return that to the user in a single call. They ran into several problems in doing that, mostly, from what I could see, because they tried to do something that I strongly discourage: sending entities on the wire so they could be displayed in the smart client application. The client uses CSLA, so the actual business logic is sitting in Factory objects (I discussed my objection to that practice here), but the entities are still doing triple duty here. They need to be saved to the database, serve as DTOs on the wire and also handle being presented in the UI.

When I objected to that practice, I was asked: “Are we supposed to have three duplicates of everything in the application? One for the database, one for the wire and one for the UI? That is insane!”

The answer to this question is:

This is the symbol for Mu, which is the answer you give when the question is meaningless.

The problem with the question as posed is that it contains an invalid assumption. The assumption is that the three separate models in the application are identical to one another. They aren’t, not by a long shot.

Let me see if I can explain it better with an example, given the Machine –>> Part –>> Maintenance History model and an application to manage those as our backdrop. We will take two sample scenarios to discuss: the first is recording the replacement of a part in a machine, and the second is looking at a list of machines that need fixing.

For the first scenario, we need to show the user the Maintenance screen for the machine, which looks like this:

image

And on the second one, machines requiring fixes, we have:

image

Now, both of those screens are driven by data in the Machine –>> Part –>> Maintenance History model. But they have very different data requirements.

In the first case, we need the Machine.Name, Machine.NextMaintenanceDate, and a list of Part.SKU, Part.Name, Part.Broken, Part.Notes.

In the second case, we need Machine.Owner, Machine.Car, Machine.Age, Machine.Preferred.

The data sets that they require are completely different; not only that, but sending the full object graph to each of those screens is a waste. If you noticed, neither screen actually needed Maintenance History at all. It is likely that Maintenance History will only rarely be needed, but by saying that there is only one model that needs to serve all those different scenarios, we are tying ourselves into knots, because we need to serve Maintenance History every time that we are asked for a machine, even though it is not needed.

The way I would design the system, I would have:

Entities: Machine, Part, Maintenance History

This is the full data model that we use. Note that I am not discussing issues like CQRS here, because the application doesn’t warrant it. A single model for everything would work well here.

DTO: MachineHeaderDTO, MachineAndPartsDTO, PartAndMaintenanceHistoryDTO, etc.

We have specific DTOs for each scenario, giving us only the data that we need, so we can spare a lot of effort on the server side in deciding how to fill the data.
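
To make that concrete, here is a hypothetical sketch of two of those DTOs; the fields simply mirror the data requirements listed for the two screens:

using System;
using System.Collections.Generic;

// For the "machines requiring fixes" list screen.
public class MachineHeaderDTO
{
    public string Owner { get; set; }
    public string Car { get; set; }
    public int Age { get; set; }
    public bool Preferred { get; set; }
}

// For the maintenance screen: the machine plus its parts, nothing more.
public class MachineAndPartsDTO
{
    public string Name { get; set; }
    public DateTime NextMaintenanceDate { get; set; }
    public List<PartDTO> Parts { get; set; }
}

public class PartDTO
{
    public string SKU { get; set; }
    public string Name { get; set; }
    public bool Broken { get; set; }
    public string Notes { get; set; }
}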

View Models

The view models would be built on top of the DTOs, usually providing some additional services like INotifyPropertyChanged, calculated properties, client side validation, etc.

I am in favor of each screen having its own View Models, because that leads to a self-contained application, but I wouldn’t have a problem with common view models shared among the entire application.

Sample application

And because I know you’ll want to see some code, the Alexandria sample application demonstrates those concepts quite well: http://github.com/ayende/alexandria

LightSwitch: The Return Of The Secretary

Microsoft LightSwitch is a new 4GL tool from Microsoft; it is another in the series of “you don’t have to write any code” tools that I have seen.

Those are the tools that will give the secretary the ability to create applications and eliminate the need for coders. The industry has been chasing those tools since the 80s (does anyone remember the promises of the CASE tools?). We have seen many attempts at doing this, and all of them have run into a wall pretty quickly.

Oh, you can build a tool that gives you UI on top of a data store pretty easily. And you can go pretty far with it, but eventually your ability to point & click hits its limit, and you have to write code. And that is where things totally break down.

LightSwitch is not yet publicly available, so I have to rely on the presentation that Microsoft published. And I can tell you that I am filled with dread, based on what I have seen.

First of all, I strongly object to the following slide. Because I have the experience to know that working with a tool like that is akin to doing back flips with a straitjacket on.

image

The capabilities of the tools that were shown in the presentation have strongly underwhelmed me in terms of newness, complexity or applicability.

Yeah, a metadata-driven UI. Yeah, it can do validation on a phone number automatically (really, what happens with my Israeli phone number?), etc. What is worse, even through the demo, I get the very strong feeling that the whole thing is incredibly slow; you can see multi-second delays between screen repaints in the presentation.

Then there are things like “it just works as a web app or windows app” which is another pipe dream that the industry has been chasing for a while. And the only piece of code that I have seen is this guy:

image 

Which makes me want to break down and cry.

Do you know why? Because this is going to be the essence of a SELECT N+1 in any system, because this code is going to run once for each row in the grid. And when I can find bugs just from watching a presentation, you know that there are going to be more issues.

So, just for fun’s sake, since I don’t have the bits and can rely only on the presentation, I decided to make a list of all the things that are likely to be wrong with LightSwitch.

I’ll review it when it comes out, and if it does manage to do everything that it promises and still be a tool usable by developers, I’ll have to eat crow (well, Raven :-) ), but I am not overly worried.

Here are a few areas where I am feeling certain things will not work right:

  • Source control – how do I diff two versions of the app to see what changed? Are all changes diffable?
  • Programmatic modifications:
    • what happens when I want to write some code to do custom validation of a property (for instance, calling a web service)?
    • what happens when I want to put a custom control on the screen (for instance, a Google Maps widget)?
  • Upsizing – when it gets to 1,000 users and we need a full-blown app, how hard is it to do?
  • Performance – as I said, I think it is slow, judging from the demo.
  • Data access behavior – from what I have seen so far, I am willing to bet that it hits its data store pretty heavily.

I fully recognize that there is a need for such a tool, make no mistake. And giving users the ability to do that is important. What I strongly object to is the notion that it would be useful for developers writing real apps, not forms over data. To put it simply, simple forms over data is a solved problem. There is a large number of tools out there to do that. From Access to Oracle Apex to FoxPro. Hell, most CRM solutions will give you just that.

My concern is that there seems to be an emphasis on that being useful for developers as well, and I strongly doubt that.

ALT.NET Israel Autumn 2010

It is my great pleasure to invite you to the fourth ALT.NET Israel unconference, which will take place in September, Thursday-Friday (2-3/09/2010).

You can register for the conference here.


NHibernate is lazy, just live with it

At a client site, I found the following in all the mapping files:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" default-lazy="false">
    <class name="Machine" table="Machines" lazy="false">
        <id name="Id">
            <generator class="identity"/>
        </id>
        <property name="Name"/>
        <set name="Parts" table="Parts" lazy="false">
            <key column="BlogId"/>
            <one-to-many class="Part"/>
        </set>
    </class>
</hibernate-mapping>

I think that you can figure out by now what is wrong. It is too bad that the <blink/> tag has gone out of use, because that would be an accurate reflection of how I responded when I saw that. After I finished frothing at the mouth, I explained that you never, ever do things like that, because it leads to pain, mayhem and tears.

The lead dev heard me out, then explained that their situation wasn’t a standard web application, their server had some business functionality, but it served mainly as a way for clients to come in and get some data. The architecture looks like this:

image

As you can imagine, NHibernate is sitting in the Application tier, but a lot of the work is done on the smart client application.

The lead dev explained that in their scenario, they needed to send the full entity to the smart client app, so they didn’t want lazy loading. Their scenario also precluded the need to do lazy loading over the network, so all was good, and they made the decision to use lazy=”false” intentionally. Their model was pretty good about not having deeply connected object graphs, too.

I hit Google and searched for the relevant posts:

* Googling that made me very nervous.

I include the search terms as a reminder to myself that post titles are important, because that is how I search for them after the fact. And I got some really strong (and strange) looks when I did that.

After we discussed that, we put it aside and moved on to other topics. Until we came to the part where we profiled the system.

We immediately found some trouble spots that we needed to resolve. One of them was getting a list of machines that required fixing.

The problem was that the code looked something like this:

public IEnumerable<Machine> GetMachinesRequiringFixes()
{
    return (from machine in session.Linq<Machine>()
           where machine.Status == Statuses.Broken
           select machine).Take(10);
}

Looks easy, right?

Except that this seemingly innocent piece of code was responsible for 400 queries.

What went wrong? How could loading 10 machines result in that many queries?

Well, let us look at what NHibernate did, shall we?

select top 10 * from Machines where Status = 'Broken'

This gives NHibernate ten machine instances, except that when NHibernate investigates the mapping for a Machine, it realizes that it needs to load the Parts collection, since it was marked lazy=”false”. So NHibernate will execute ten queries, one to load each Machine’s Parts.

select * from Parts where MachineId = 40
select * from Parts where MachineId = 41
select * from Parts where MachineId = 42
select * from Parts where MachineId = 43
select * from Parts where MachineId = 44
select * from Parts where MachineId = 45
select * from Parts where MachineId = 46
select * from Parts where MachineId = 47
select * from Parts where MachineId = 48
select * from Parts where MachineId = 49

But each Part also has an association to a MaintenanceHistory, which was also marked lazy=”false”, so NHibernate had to load those as well. (I’ll spare you that one :-) ).

From the client perspective, Machine, Parts & Maintenance History were all part of the same entity, and they wanted to be able to return that to the user in a single call. (More on that fallacy in my next post.)

The problem is that when you set lazy=”false”, you literally tie NHibernate’s hands. Now, you could start playing with the fetch mode option, changing it to join or subselect, but that only works when you have a single collection association (in the entire graph), because otherwise you run into severe Cartesian product issues.

Now, NHibernate has the tools to handle just that scenario, but by setting lazy=”false”, you prevent NHibernate from having the chance to actually utilize them.

There is a good reason why lazy is set to true by default, and while I am sure that there is a limited number of scenarios where lazy=”false” is the appropriate choice, it isn’t for your scenario.
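
For contrast, here is a sketch of what lazy-by-default allows: keep the mappings lazy, and eagerly fetch Parts only in the one query that actually needs them, using standard HQL join fetch (the entity names follow the example above; Transformers lives in NHibernate.Transform):

// With lazy mappings, only this query pays for loading Parts;
// every other Machine query stays cheap.
var brokenMachines = session
    .CreateQuery("from Machine m left join fetch m.Parts where m.Status = :status")
    .SetParameter("status", Statuses.Broken)
    // A join fetch returns one row per part, so collapse duplicates
    // back to distinct Machine instances.
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List<Machine>();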


Microsoft.Data and Positioning

I just read a very insightful post from Evan Nagle about the topic. In it, Evan hit a key point:

Technically (and commercially) speaking, Microsoft has a slew of perfectly legitimate reasons for splattering the landscape with these "watered down" technologies. But, culturally speaking, it's painful to the professional Microsoft developer. It is. Because the Microsoft developer's personal and professional identity is tied up in the Microsoft stack, and that stack is now catering to a bunch of cat ladies and acne-laden teenagers. That's the real issue here.

If Microsoft thinks that they can get the market for bad developers, good for them. But that needs to be highly segmented. It should be clear that going that route leads you to a walled garden and that writing real apps this way is not good. The best comparison I can think of is Access apps in the 90s. There was a clear separation between “just wanna write some forms over data app over the weekend” and “I want to build a real application”. When you built an app in Access, you had a very clear idea about the limitations of the application, and you knew that if you wanted something more, it would be a complete re-write.

That was a good thing, because it meant that you could do the quick & dirty things, but when things got messy, you knew that you had to make the jump.

The problem with things like Microsoft.Data is that there is no such line in the sand. And when you call it “Microsoft.*” you give it a seal of approval for everything. And when you have a piece of code that is explicitly designed to support bad coding practices, it is like peeing in the pool. If there is only one pool, it is going to affect everyone. There wouldn’t be nearly as much objection if it was called WebMatrix.Data, because that would clearly put it in someone else’s pool, and if that turns into a putrid swamp, I don’t really care.

There is another issue here, and it is not just that the community is afraid of inheriting horrible Microsoft.Data projects. It is much more than that.

(Salary data is from the UK, because that is the first site I found with the numbers.)

Now, I really can’t think of a good technical reason why VB.Net programmers are paid so much less, but those data points match what I have seen about salaries for developers in both positions.

In other words, VB.Net developers are getting paid a lot less for doing the same job.

Now, why is that? Is it because the legacy of VisualBasic still haunts the VB guys? Because there is still the impression that VB is a toy language? I don’t know, but I think that at least in part, that is true. And that is what happens when a platform gets a reputation for being messy. People know in their bones that building stuff there is going to be costly, because maintaining this stuff is so hard.

Microsoft has repeatedly stated that they are after the people who currently go to PHP. Let me do the same comparison:

I don’t like those numbers. I really don’t like them.

Put simply, if Microsoft attempts to make the .NET platform more approachable for the PHP guys, it is also going to devalue the entire platform. I am pretty sure that this is something that they don’t want. Having a lot of hobbyist programmers but fewer professional ones is going to hurt the Microsoft ecosystem.

Microsoft, if you want to create a PHP haven in the .NET world, that is great, but make sure that it is completely isolated, because otherwise you are going to hurt everyone who has a commercial stake in your platform.

I think that there is a lot of sense, from a commercial point of view, in WebMatrix, but it should be clear that this isn’t .NET programming, this is WebMatrix programming. So if Microsoft succeeds in gaining market share for this, it won’t be the .NET developers who suddenly look at 30% less money; it would be WebMatrix developers who would have to carry that stigma.


Microsoft.Data, because the 90s were so good, we want to do them again

I just saw this post, and I had to double check the date of the post twice to be sure that I am reading about something that is going to come out soon, rather than something that came out in 2002.

This is the code that is being proudly presented as an achievement:

using (var db = Database.Open("Northwind")) {
    foreach (var product in db.Query("select * from products where UnitsInStock < 20")) {
        Response.Write(product.ProductName + " " + product.UnitsInStock);
    }
}

Allow me to give you the exact statements used:

The user doesn’t have to learn about connection strings or how to create a command with a connection and then use a reader to get the results. Also, the above code is tied to Sql Server since we’re using specific implementations of the connection, command, and reader(SqlConnection, SqlCommand, SqlDataReader).

Compare this with code below it. We’ve reduced the amount of lines required to connect to the database, and the syntax for accessing columns is also a lot nicer, that’s because we’re taking advantage of C#’s new dynamic feature.

I really don’t know where to start. Yes, compared to raw ADO.Net I guess that this is an improvement. But being beaten only thrice a week instead of daily is also an improvement.

I mean, seriously, are you freaking kidding me? Are you telling me that you are aiming to make the life of people writing code like this easier?

We have direct use of Response.Write – because it is so hard to create maintainable code, we decided to just give up from the get go. Beyond that, putting SQL directly in the code like that, including the parameters, is an open invitation for SQL injection.
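
For contrast, here is the boring-but-safer classic ADO.Net shape, where the value travels as a parameter rather than as part of the SQL text; connectionString is assumed to be defined elsewhere:

// connectionString is assumed to come from configuration.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "select ProductName, UnitsInStock from Products where UnitsInStock < @max",
    connection))
{
    command.Parameters.AddWithValue("@max", 20);
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine("{0} {1}", reader["ProductName"], reader["UnitsInStock"]);
    }
}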

Next, let us talk seriously, okay. Anyone who had to write ADO.Net code already wrapped it up. Anyone who didn’t isn’t going to.

But given all the times that we have heard “no resources to fix XYZ” from Microsoft, are you telling me that creating a framework that is intended to let really bad programmers slide down the maintainability scale more easily is a valuable use of resources?

Writing code like that is actively harmful, period. There is really no need to deal with such low level stuff in this day and age.


Abstracting the persistence medium isn’t going to let you switch persistence abstractions

This came in as a comment for my post about encapsulating DALs:

Just for the sake of DAL - what if I want to persist my data in another place, different than a relational database? For instance, in a object-oriented database, or just serialize the entities and store them somewhere on the disk... What about then ? NHibernate cannot handle this situation, can it ? Wouldn't I be entitled to abstract my DAL so that it can support such case ?

The problem here is that it assumes that such a translation is even possible.

Let us take the typical Posts & Comments scenario. In a relational model, I would model this using two tables. Now, let us say that I have the following DAL operations:

  • GetPost(id)
  • GetPostWithComments(id)
  • GetRecentPostsWithCommentCount(id)

The implementation of this should be pretty obvious on a relational system.

But when you move to a document database, for example, suddenly this “abstract” API becomes an issue.

How do you store comments? As separate documents, or embedded in the post? If they are embedded in the post, GetPost will now have the comments, which isn’t what happens with the relational implementation. If they are separate documents, you complicate the implementation of GetPostWithComments, and GetRecentPostsWithCommentCount now has a SELECT N+1 problem.

Now, to be absolutely clear, you can certainly do that, there are fairly simple solutions for that, but this isn’t how you would build the system if you started from a document database standpoint.

What you would end up with in a document database design is probably just:

  • GetPost(id)
  • GetRecentPosts(id)

Things like methods for eagerly loading stuff just don’t happen there, because if you need to eagerly load stuff, it is embedded in the document.
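
A minimal sketch of the document-oriented model with the comments embedded; loading a post brings everything with it in one call, which is why a separate eager-loading method never comes up:

using System.Collections.Generic;

public class Post
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    // Embedded in the post document, not stored as separate documents.
    public List<Comment> Comments { get; set; }
}

public class Comment
{
    public string Author { get; set; }
    public string Text { get; set; }
}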

Again, you can make this work, but you end up with a highly procedural API that is very unpleasant to work with. Consider the Membership Provider API as a good example of that. The API is a PITA to work with compared to APIs that don’t need to work to the lowest common denominator.

When talking about data stores, that usually means limiting yourself to key/value store operations, and even then, there is stuff that leaks through. Let me show you an example that you might be familiar with:

image

The MembershipProvider is actually a good example of an API that tries to fit several different storage systems. It has a huge advantage from the get go because it comes with two implementations for two different storage systems (RDBMS and ActiveDirectory).

Note the API that you have: not only is it highly procedural, but it can also assume nothing about its implementation. It makes for code that is highly coupled to this API and to this form of working with data in a procedural fashion. This makes it much harder to work with compared to code using a system that enables automatic change tracking and data store synchronization (pretty much every ORM).

That is one problem of such APIs: they greatly increase the cost of working with them. Another problem is that stuff still leaks through.

Take a look at the GetNumberOfUsersOnline method. That is a method that can only be answered efficiently on a relational database. The ActiveDirectoryMembershipProvider just didn’t implement this method, because it isn’t possible to answer that question from ActiveDirectory.

By that same token, you can’t implement FindUsersByEmail, FindUsersByName on a key/value store (well, you can maintain a reverse index, of course), nor can you implement GetAllUsers.
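
That reverse index workaround looks something like this; a toy in-memory sketch, where a second entry keyed by email points back at the primary user key:

using System.Collections.Generic;

public class UserKeyValueStore
{
    private readonly Dictionary<string, string> store = new Dictionary<string, string>();

    public void PutUser(string userKey, string email, string userDocument)
    {
        store[userKey] = userDocument;
        store["users-by-email/" + email] = userKey; // the reverse index entry
    }

    // Two key lookups instead of a scan. FindUsersByName would need yet
    // another index, and GetAllUsers still has no efficient answer.
    public string FindUserByEmail(string email)
    {
        string userKey;
        if (!store.TryGetValue("users-by-email/" + email, out userKey))
            return null;
        return store[userKey];
    }
}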

You can’t abstract away the nature of the underlying data store that you use; it is going to leak through. The issue isn’t so much whether you can make it work, you usually can; the issue is whether you can make it work efficiently. A typical problem that shows up is that you try to map concepts that simply do not exist in the underlying data store.

Relations are a good example of something that is very easy to do in SQL but very hard to do using almost anything else. You can still do it, of course, but now your code is full of SELECT N+1 and you can wave goodbye to your performance. If you build an API for a document database and try to move it to a relational one, you are going to get hit with “OMG! It takes 50 queries to save this entity on a database, and 1 request to save it on a DocDb”, leaving aside the problem that you are also going to have problems reading it from a relational database (the API would assume that you give back the full object).

In short, abstractions only work as long as you can assume that what you are abstracting is roughly similar. Abstracting the type of the relational database you use is easy, because you are abstracting the same concept. But you can't abstract away different concepts. Don’t even try.


Package management for .NET: Nu

Package management for .NET has long been an issue. I won’t rehash the story, but you can read about a lot of the problems here.

There have been several attempts to replicate the success of Ruby’s Gems, but none has really taken hold. Dru Sellers & Rob Reynolds decided to take a slightly different approach. Instead of trying to replicate Ruby’s gems, just use them.

That is how Nu was created (and I really want to know why that name was picked; I nearly titled this post Dr. No). Before you can use Nu, you need to install it, which means that you have to perform the following one-time steps:

  • Install Ruby (you can also use Iron Ruby, but the Ruby installer is just sweet & painless).
  • gem install nu

Once this is done, all you need to do to install a piece of software is just write:

  • nu install rhino.mocks
  • nu install nhibernate

And Nu will get the latest version (including resolving dependencies) and put them in your lib folder.

After executing the two commands above, I have the following contents in my lib directory.

  • castle.core
  • castle.dynamicproxy2
  • log4net
  • nhibernate
  • rhino.mocks

I like it, it is going to make things so much simpler!