Ayende @ Rahien

It's a girl

How would I build MsBuild...

In the Open Letter to Scott Guthrie, David talks about copying features from OSS projects, and brings up MsBuild as an example. MsBuild is a project that has reached feature parity with NAnt; there is basically no technical reason that I know of to prefer one over the other. NAnt has more tasks, but MsBuild has quite a few as well.

The problem is that feature parity is just that: not enough. Both NAnt and MsBuild are XML-based programming languages. Read that last sentence again, and you can probably tell what I don't like about both. A much better approach would have been to keep the same targets & actions model, but to skip the XML and go with an imperative language for the actual implementation.

XML is nice because you can parse it easily, but it is not something that is meant for human consumption. A better alternative would have been something like Rake, but on .Net. (About two years ago, I played with porting NAnt to Boo. Not the project itself, but the syntax. I got it to the point where it was very readable and then left it; I don't think that there is anything left of the project anymore, although the sources should be up somewhere.) At any rate, this would have been a significant improvement over the existing technology, not just another feature-parity product.

Test Responsibilities

Jeremy Miller talks about lines of code in tests vs. production, and he raises an important issue with regard to the tests:

I say that mainly because unit test code is composed almost entirely of stand alone methods.  They don't interrelate or interract with each other.  The responsibility of a unit test is almost always the same:  setup data, do something, check the results.  The hardest thing about coding is deciding what to code, not the mechanical act of typing. 

This is worth repeating. The tests are not a good place to show off how smart you can be with regard to design patterns and triple-indirect factories. About the only refactoring that I would do in the tests (aside from renaming methods) is to refactor the creation of the test data. Even then, the names are something like CreateCustomer_WithTwoOrders_ButWithNoCredit(); I am asserting on the data from this method, so it had better be clear and easy to find.

A good test, assuming that I understand the technology being tested and the testing/mocking framework, should reveal its intent, preferably via its name, but through the code as well.
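To make this concrete, here is a hedged sketch of a test in this style (NUnit-flavored C#; Customer, Order, and the method names are my own stand-ins, not types from the post):

```csharp
using System.Collections.Generic;
using NUnit.Framework;

public class Order { }

public class Customer
{
    private readonly List<Order> orders = new List<Order>();
    public decimal Credit;
    public void AddOrder(Order order) { orders.Add(order); }
    public bool CanPlaceNewOrder() { return Credit > 0; }
}

[TestFixture]
public class CustomerTests
{
    [Test]
    public void CustomerWithNoCredit_CannotPlaceNewOrder()
    {
        // setup data, do something, check the results - nothing fancier
        Customer customer = CreateCustomer_WithTwoOrders_ButWithNoCredit();
        Assert.IsFalse(customer.CanPlaceNewOrder());
    }

    // the one refactoring worth doing in tests: pull test data creation
    // into a method whose name says exactly what it returns
    private static Customer CreateCustomer_WithTwoOrders_ButWithNoCredit()
    {
        Customer customer = new Customer();
        customer.Credit = 0;
        customer.AddOrder(new Order());
        customer.AddOrder(new Order());
        return customer;
    }
}
```

The test name and the factory name together tell the whole story, so the assertion needs no explanation.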

The Fallacy of Shared Entity Model

What works for the database doesn't work for an entity, and what works for an entity is a really bad idea if you are trying to call a web service; web service data is really not the right thing when you want a report. Your customer has a Birthday field, which I couldn't care less about. I really want to track the last time a user signed in, but you don't have that information available to give me. You really don't want to give the employee salary data to the employee listing grid, etc.

I see a lot of discussion about trying to have a single Entity (in the pure sense: something with business meaning) that maps to every representation and every scenario. The needs, constraints and requirements for each of those are very different. Trying to make them all fit in one representation makes for a very bad representation.

Let us consider a simple entity, the Employee. It contains data such as name, date of birth, hire date, salary, etc. On the database side, it may be represented as a temporal table for each property, linked together by the employee id. In the business logic, I have an EmployeeSnapshot, which is not temporal, but is valid for a certain date only. On the UI, I have only a limited view of the employee (with salary information, for instance), and a web service exposes additional employee information (hierarchies), to be consumed by the CRM (for permissions).

For this simple entity, I already have four different representations. I might be able to satisfy them all within a single class, but what does that give me? A heavyweight hybrid that I can't really change without affecting all parts of the system.

In my current project, I have:

  • Policy - Active Record Entity - Business logic
  • AjaxianPolicy - Simple view of the fields that I want to show on the screen - Strictly DTO

When I get to the part that I need web services for, I will also add PolicyMessage class.

On the surface, this seems to break the DRY principle. I have now exposed the fact that Policy has an Id property in several places. I don't think that DRY applies here; it may look like repetition and more work, but I end up with a simpler model to work with, and I can optimize each individual case independently. And by optimize I do not mean performance, I mean ease of use, compatibility, etc.
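As a rough illustration of the split (the property names here are my invention, not the actual project code), the two shapes might look like:

```csharp
// Active Record entity: business logic and everything the domain needs
public class Policy
{
    public int Id;
    public string HolderName;
    public decimal Premium;   // the domain needs this; the screen does not
}

// strictly a DTO: only what the screen shows, nothing more
public class AjaxianPolicy
{
    public int Id;
    public string HolderName;

    public static AjaxianPolicy FromPolicy(Policy policy)
    {
        // yes, Id shows up in both shapes; in exchange,
        // each shape can now evolve on its own
        AjaxianPolicy dto = new AjaxianPolicy();
        dto.Id = policy.Id;
        dto.HolderName = policy.HolderName;
        return dto;
    }
}
```

The mapping method is trivial and boring, which is exactly the point: the cost of the "repetition" is one obvious method per representation.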

An Open Letter to Scott Guthrie

Speaking and working with developers who live daily with your tools, I find confusion and concern over your relationship to the open source community that has grown up around the .Net platform. While Sun, IBM, other platform providers, and ISVs clearly see the open source community as a complimentary codebase to their products, Microsoft tends to use the best and brightest work from the community space as a feature map for .Net.

See the full post.

(Via Hammet)

MonoRail: Judged By Its Views

Jeremy Boyd posted about his experience with MonoRail. He used it in a project (along with the WebForms views), but he had this comment about it:

When I was doing my build I took a look at Brail and NVelocity as view engines and to be honest I found them an amazing step backwards from ASP.NET. Maybe I’m just lazy, but I do find that you get a lot of benefit (largely from the automatic eventing and management of control state) inside the ASP.NET framework and it feels like going back to classic ASP in some cases (ignoring some of the control/binding support you get).

This automatic eventing and control state management comes at a very high cost. I copied the link to Jeremy's blog from his site; here is the ID for the link: "ctl00___ctl00___ctl00_ctl00_bcr_ctl00___Comments___Comments_ctl06_NameLink". Now try to build an Ajax site with this kind of client-side nonsense.

This high cost is not only in terms of a more complex client-side experience, but in terms of a more complex model to work with. The Page Life Cycle is a burden; you need to do so much to get so little. Remember, at the most basic level, we are talking about generating text.

MonoRail's views are templates; they are designed, on purpose, to be as near the generated content as possible. This reduces the amount of work that you have to do. You get to work with a single model all the way through. Something like the Ajax Generators is either not possible or completely non-trivial to build in WebForms. And the number of ways you can extend MonoRail to fit your needs is amazing.

Web development is not hard. It is a simple model, and shouldn't be complicated. What is complicated is meeting the business needs, producing a pleasant-looking and usable UI, and ending up with a maintainable application.

Now, to the crux of Jeremy's problem. The templates are HTML mixed with minimal control flow, and they do look similar to the way classic ASP worked. The main difference is that the templates are not doing anything beyond presentation logic, and a little of that anyway. MonoRail has the concept of ViewComponents, which are similar to the controls in WebForms. Those will usually take care of any complex presentation logic needed for the application.

The end result is that you have a maintainable application, with easy-to-follow, easy-to-understand code paths and usages.

Real Developers find the correct abstraction level

Bob Lewis (oops!) Bob Grommes is talking about Real Developers. Since he touches on some of the things that I have talked about recently, I want to make something clear. I don't think that WebForms is bad because you are not manually parsing the query string and form data yourself (into an unprotected buffer, in assembly).

I like a lot of the things that ASP.Net has to offer. I don't like the attempt to put a Shambling Facade in front of the way things work, making the easy scenarios demoable and the hard scenarios harder.

Hibernate Shards

The Hibernate project has just made a big release, and it includes a completely new project called Hibernate Shards. This project aims to provide a good abstraction over horizontally partitioned data sets.

A nice example would be Blogger.com, which has some millions of users. At this level of traffic and data set size, normal databases break. You really want to do horizontal partitioning of the data, so all the data for users whose names start with A-G is on Db1, and all the rest, H-Z, is on Db2. There was an article from O'Reilly on this subject about 6 months ago, talking with the big players in this market; they all ended up going with this approach.
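To make the idea concrete, here is a toy sketch of name-based shard routing (my own illustration of the concept, not the Hibernate Shards API):

```csharp
using System;

// picks the database that holds a given user's data, based on the
// first letter of the user name (A-G on Db1, everything else on Db2)
public static class ShardRouter
{
    public static string SelectShard(string userName)
    {
        if (string.IsNullOrEmpty(userName))
            throw new ArgumentException("userName must not be empty", "userName");

        char first = char.ToUpperInvariant(userName[0]);
        return first >= 'A' && first <= 'G' ? "Db1" : "Db2";
    }
}
```

The whole point of a shard abstraction is that this routing decision, and the fan-out of queries across shards, stops living in your application code.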

Hibernate Shards is still in its early stages, but the idea is sound, and it looks like it can be an easy way to scale out with Hibernate, without a lot of application modifications.

Processing invalid patches

I just got a patch file that SVN couldn't handle. I handled the issue with this piece of code, here for future reference:

import System.IO

file = "random.patch"
lines = File.ReadAllLines(file)
currentFile as TextWriter
for line in lines:
    if line.StartsWith("+++"): # new file
        currentFile.Dispose() if currentFile is not null
        fileName = line.Replace("+++", "").Replace("(revision 0)", "").Trim()
        currentFile = File.CreateText(fileName)
        print fileName
        continue
    continue if not line.StartsWith("+")
    currentFile.WriteLine(line.Substring(1))

I love Boo!

Linq For NHibernate: Ordering and Paging

Bobby Diaz has implemented ordering and paging support for Linq for NHibernate, so this works:

(from c in nwnd.Customers select c.CustomerID)
        .Skip(10).Take(10).ToList();

As well as this:

var query = from c in nwnd.Customers
   where c.Country == "Belgium"
   orderby c.Country descending, c.City descending
   select c.City;

Thanks again, Bobby, and the new code is on SVN.

WebForms and lies

I have made the comment several times that I feel that WebForms lies to me, and Joe Young asks what I mean by that. Here is a small reproduction that I just created that demonstrates the problem. Put this in an ASP.Net page, run it, select another value, and see what you get.

<script runat="server">
    protected void Page_Load(object sender, EventArgs e)
    {
        Test.DataSource = new string[] { "London", "Paris", "Tel Aviv" };
        DataBind();
    }

    protected void SelectedIndexChanged(object sender, EventArgs e)
    {
        string msg = string.Format(@"alert('What the control says: {0}; What the request says: {1}');",
            Test.SelectedValue, Request[Test.ClientID]);
        ClientScript.RegisterStartupScript(GetType(), "blah", msg, true);
    }
</script>

<asp:DropDownList ID="Test" EnableViewState="false" AutoPostBack="true" AppendDataBoundItems="true"
    runat="server" OnSelectedIndexChanged="SelectedIndexChanged">
    <asp:ListItem>nothing selected</asp:ListItem>
</asp:DropDownList>

Before you jump in to explain to me why this is happening, assume that I have spent a bit of time on this issue; I understand what the page life cycle is and can figure out what happens when.

Nevertheless, as it stands, here is a control that flat out lies to me about its state. I can go and query the Request and get the correct result back. For more fun and games, remove the AppendDataBoundItems and see what happens then.

I run into these kinds of things fairly often, from view state that arrives at the wrong control (and what fun it was to find that out), to control naming rules, to 101 little things that, if you don't get just right, will silently break the code.

I will try to post about the rest of Joe's comment (and Jeremy Boyd's post) shortly.

Repositories 101

I got this question in an email today, and I think that it would make a good post. I broke it into two parts:

We are moving from Delphi (RAD way) to  real OO in C# + NHibernate.  I've reading about DDD. Now I'm kind of lost about Repositories + NHibernate. I found code that uses a Repository<T> class, which I liked at first, but after a while I wondered... how can i test it ? or, tomorrow we can change from nhibernate to other thing...

Let us talk about my Repository<T>. It is a static class, so it can't be mocked, but it is fully testable, since it uses Windsor to get the implementation of the IRepository<T> interface, of which there are several. Now consider the following code:

public ICollection<Customer> GetPreferredCustomers()
{
  return Repository<Customer>.FindAll(Where.Customer.IsPreferred == true);
}

Is it testable? Yes, I can replace the internal implementation of the Repository<T> with my own, in this case, a mock. The test would look something like:

[Test]
public void GetPreferredCustomerWillForwardQueryToRepository()
{
   // already set up with a fresh mock that it will return
   IRepository<Customer> customersRepository = IoC.Resolve<IRepository<Customer>>();
   Expect.Call(customersRepository.FindAll(null, null))
     .Constraints(Criteria.Contains("IsPreferred = True"), Is.Anything)
     .Return(null);
   mocks.ReplayAll();
   service.GetPreferredCustomers();
   mocks.VerifyAll();
}

As you can see, it is not hard, but neither is it trivial. And I am not testing that the query works. For these types of situations, I use an in-memory database and test against known data. I feel that this gives me as much value, and it doesn't hinder the tests.
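For reference, wiring NHibernate to an in-memory database for such tests looks roughly like this (a hedged sketch; the property keys follow the NHibernate 1.2-era configuration style and the SQLite driver is my choice of example, so treat the exact values as assumptions):

```csharp
// configure NHibernate against an in-memory SQLite database, so each
// test run gets a fresh, fast, throwaway schema to assert against
Configuration cfg = new Configuration()
    .SetProperty("hibernate.dialect", "NHibernate.Dialect.SQLiteDialect")
    .SetProperty("hibernate.connection.driver_class", "NHibernate.Driver.SQLite20Driver")
    .SetProperty("hibernate.connection.connection_string",
        "Data Source=:memory:;Version=3;New=True");
```

The schema is then exported into the in-memory database at test setup, known data is inserted, and the real query runs against it.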

How can I build a nice repository that could allow  change its "core", well, from nhibernate to what else.. I thought about an IRepository<T> interface, and than have a NHRepository : IRepository<T>, and in my code I referece to IRepository, and not to NHRepository...   am in in the right way ?

If you really want to, you can do it by taking an interface like my IRepository<T> and using that, just stripping out all the NHibernate-specific parts. The reason I don't like it is that it is a false promise. You won't be able to easily switch to another ORM. Each ORM does its magic differently, and trying to move from NHibernate to LLBLGen (or vice versa), to take a simple example, is not going to be "let us just change the repository implementation" and that is it.

That would actually be the easy part; the far harder issue is logic that relies on the existence of a Unit of Work (or automatic wiring of reverse dependencies, to take something that NHibernate does not do). You can limit yourself to simple CRUD operations, in which case the differences are negligible in most scenarios, but that is missing out on a lot, in my opinion.

Linq for Entities: Abstractions

Jeremy has a great post summarizing the MVP summit, and he includes:

Linq for Entities is much more than an O/R Mapper.  It potentially provides us with a unified data access strategy over heterogeneous data sources (web services, xml, non relational databases, etc). 
  • Web service calls (remote calls - extranet/internet)
  • Non relational databases
  • Hierarchical data stores
  • Relational databases

Now consider the following query:

Customer customer = (from c in someHeterogeneousDataSource.Customers
   where c.Name == "Ayende" select c).First();
  • If the web service exposes a GetCustomerByName(), this would be a good candidate; if not, the implementation would need to call GetAllCustomers() and filter in memory.
  • For non-relational databases, I am aware of object databases, flat files, temporal and hierarchical ones (covered separately) - each of which has its own tradeoffs. I am not familiar enough with object databases to say what the tradeoffs are there, but for a flat file, it is going to be a linear scan. The query is not even a valid one for a temporal database (unless there is an implicit CurrentTime).
  • For hierarchical data stores, this query would need to iterate over all the customers and compare their names to the query.
  • A relational database would think that this is too easy, and quit.

And this is just about the simplest query possible. I can't begin to guess what will happen if I want a join between customers and orders.

I get the feeling that I am missing something here, because it sure isn't heterogeneous to me.

Plain old .Net classes

Frans Bouma commented on my post about persistence ignorance, and I feel his comment deserves a post in reply.

First, let me define the term POCO or PONO: it means Plain Old C# (or .NET) Object. The term comes from the Java world, where it is used to describe objects that are not encumbered by frameworks (like EJB, for instance). A good example of a non-POCO class would be the WebForm1 class; it cannot be usefully utilized outside the ASP.Net environment. Persistence ignorance means not having the entity class involved in its own persistence (or in persistence at all).

Persistence ignorance is not always nice to have. Castle Active Record, which I am using in my current project, is certainly not persistence ignorant; it explicitly puts the responsibility for persistence on the class (and handles it in the base class). Active Record makes it much easier to work with NHibernate. The important thing about Active Record is that I can decide that I want this and that class to be ignorant of their persistence, and handle it separately, even to the point where their assembly does not reference the Active Record assembly. This is no minor thing.

Now let me get into Frans' comment (note: I edited the comment to concentrate on the stuff that I want to bring to discussion and to reply to. You can read the full comment here):

Persistence ignorance is as useless as POCO is as a term. A POCO class isn't persistable unless some post-compile or runtime-magic is added to the class. This is often overlooked by some of the vocal 'POCO or bust' people, and it gets really annoying. Because, it really matters WHAT is added to the POCO class: O/R mapper A adds different magic than O/R mapper B. [...]

So swapping O/R mappers will change the behavior of your application [...]
So is this 'persistence ignorance' really existing? No. Not only is the database part of your application, if you want it or not, it also makes up a huge part of your application's execution time, so your application spends a lot of time inside the DB. Ignoring that doesn't make it go away. In fact, ignoring that makes you vulnerable for bad performing applications which you could have avoided with simple thinking and proper architectural decisions.

I agree with everything that is said here, except the bolded part (emphasis mine). Because to my way of thinking, even with all the conditions that Frans mentions, persistence ignorance is important. It is important because my business logic isn't cluttered with other responsibilities; it is important because we don't have mixins in .Net, and I get only one base class, and I am already using it. It is important because I may use types from other assemblies, including stuff that the author never thought would ever be persisted.

Everything that Frans says is true, but it is not true for the object itself. The object is blissfully unaware that it is persisted, or how it is persisted. Good runtime magic is mostly transparent, and does not require much thinking about in most cases. I was surprised to learn by just how much, to tell you the truth.

Persistence ignorance has another point in its favor: if your classes are open for persistence ignorance, they are open for other extensibility points as well. Validation is the keystone here, but I have done some interesting things with this recently, NHibernate Search, for instance.

If a POCO class ends up with 60% plumbing code for databinding, xml, helper code for producing strongly typed filters, prefetch paths etc. and 40% BL code, what's so great about having to write that 60% of the code by hand? Isn't that tying your code to some persistence framework as well?

If a POCO class is 60% plumbing, it is not a POCO, period.

I mean: dyn. proxy using POCO frameworks also force you to write your code in a given way.

Yes, you are forced to use virtual methods. I would have loved to remove this restriction, but the CLR won't let me. I consider this a failing of the CLR. That said, it is not a burden, IMO. The most insightful part is this:

After all, the core issue with this is that what POCO people really want is a place to write their own code without having to comply to a set of rules forced upon them by a 3rd party library they use the POCO classes with. If that can be solved, one way or the other, you've solved the problem.

And that is exactly why I like having the POCO option. Because it means that the designers of the tools let me have a lot more freedom than I would have otherwise. Again, take a look at WebForm1 as an example of a non-POCO class; just try to work with it without also having the UI instantiated, and try having it outside a request, etc.

Linq for NHibernate Updates

I got a couple of great patches yesterday and today. Bobby Diaz sent a patch to add support for Count() and DataContext-like behavior. Jon Stelly went and cleaned up my code, turning it from a sleep-deprived hack into a respectable project. You can see his full list here. Thanks to their efforts, this now works:

(from user in orm.Users select user).Count();

I changed the implementation of NHibernateLinqQuery to use the visitor pattern, which is the natural way that I think about walking through an AST. It is still a very primitive one, but it does the job that we need right now.

The new code is in the repository:

Bobby and Jon, thanks again. Everyone else, you are invited to the party as well.

Linq: Possibilities

Jeff Brown commented on my Linq Options post:

Pity they didn't shoot for lexically-scoped blocks a la SmallTalk (or Ruby)...  This approach with Expressions has the same control-flow limitations as anonymous delegates but you can omit the curly braces sometimes.

Linq is not just anonymous delegates. (I should be clear that I am mostly thinking about the abilities of Expression rather than the Language Integrated Query here.) It means that I can start doing some really interesting stuff. For what it's worth, there is such a thing as the ExecutionScope for Linq, but I am not sure what it is supposed to do; as far as I can see, it is the entire lexical scope for the expression.

Here is a trivial example that shows what you can do with it. Assume that I have this work item (and saved action):

[Serializable]
public class WorkItem
{
    string name;
    string action;
    string on;

    public WorkItem(string name, string action, string on)
    {
        this.name = name;
        this.action = action;
        this.on = on;
    }

    public WorkItem() {}

    public void DoAction()
    {
        Console.WriteLine(name + " " + action + " " + on);
    }
}

[Serializable]
private class SavedAction
{
   public object target;
   public string method;
}

And I have this code:

public delegate void Act();

static void Main(string[] args)
{
    WorkItem wi = new WorkItem("Ayende,", "write", "blog post");
    Save("Temp.action",() => wi.DoAction());
    Act act = Load("Temp.action");
    act();
}

What is going on here? I am saving the lambda into a file in the Save(), then loading and executing it in the next two statements. Sadly, Linq's Expression<T> is not serializable, which I consider a huge minus, but for this example, I worked around it a bit. Here is the code for the Load, which isn't really interesting:

private static Act Load(string file)
{
    SavedAction action;
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.OpenRead(file))
        action = (SavedAction)bf.Deserialize(s);
    return (Act)Delegate.CreateDelegate(typeof(Act), action.target,
        action.target.GetType().GetMethod(action.method));
}

The Save() is where the real magic begins. I compile the expression, extract the target, extract the method that it was about to call, and save them for later processing in the Load.

private static void Save(string file, Expression<Act> actionToSave)
{
    Act act = actionToSave.Compile();
    ExecutionScope scope = (ExecutionScope)act.Target;
    SavedAction action = new SavedAction();
    MethodCallExpression l = (MethodCallExpression)actionToSave.Body;
    action.target = Expression.Lambda(l.Object).Compile().DynamicInvoke();
    action.method = l.Method.Name;
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Create(file))
        bf.Serialize(s, action);
}

I am very excited about these capabilities. Yes, I can do it today, but the interface I would have to expose is wholly unnatural, while Linq provides for a much nicer alternative.

Linq Options

Here are a few things that Linq enable:

ExecuteInUI( txtBox => txtBox.Text = "new val" );

This is just nicer syntax for something that we have had for a long time.

ExecuteInRemoteServer( a => a.LongCalculation() );

This is much more interesting, because I can literally walk the tree of dependent assemblies, then ship the entire code base to a remote server and continue the operation. Continuations are also an interesting concept in this regard.
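The reason this works at all is that the argument arrives as an expression tree rather than an opaque delegate, so the callee can inspect it before deciding where to run it. A minimal sketch of that idea (Remote and the messages are mine, purely illustrative):

```csharp
using System;
using System.Linq.Expressions;

public static class Remote
{
    public static void ExecuteInRemoteServer<T>(T target, Expression<Action<T>> work)
    {
        // because the lambda is an expression tree, we can look inside it:
        // which method is being called, on what, with which arguments
        MethodCallExpression call = (MethodCallExpression)work.Body;
        Console.WriteLine("would ship '{0}' to the remote server", call.Method.Name);

        // locally we can always fall back to compiling and invoking it
        Action<T> compiled = work.Compile();
        compiled(target);
    }
}
```

A real implementation would serialize the tree and the dependent assemblies instead of printing the method name, but the inspection step is the part a plain delegate cannot give you.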

My Boss' Guideline to interfacing with external code...

I just had to post this.

When you are dealing with external code, you have to assume the worst. Lesson 4 from CS110: a well-written function is divided into:

  • Preconditions - checking the state of the object and the state of the arguments
  • Have your way with the code (it was much worse in Hebrew).
  • Validate the postconditions
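A hedged sketch of the guideline in C# (the method and its checks are my own illustration):

```csharp
using System;

public static class SafeMath
{
    public static int Average(int[] values)
    {
        // preconditions: the state of the arguments
        if (values == null)
            throw new ArgumentNullException("values");
        if (values.Length == 0)
            throw new ArgumentException("values must not be empty", "values");

        // have your way with the code
        long sum = 0;
        int min = values[0], max = values[0];
        foreach (int value in values)
        {
            sum += value;
            if (value < min) min = value;
            if (value > max) max = value;
        }
        int average = (int)(sum / values.Length);

        // validate the postconditions: an average must lie within the range
        if (average < min || average > max)
            throw new InvalidOperationException("average fell outside the input range");
        return average;
    }
}
```

The postcondition looks paranoid for such a trivial function, which is the point: when external code is involved, you want the failure at the boundary, not three layers deeper.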

Googlize your entities: NHibernate & Lucene.NET Integration

I need to do full text searches on my entities, across many different fields, and with many different parameters. A while ago I heard about Hibernate Search(TM), and became envious. It took me about two days*, but it is (almost) here. Here is how I search my blog's posts at the moment:

using (ISession session = sessionFactory.OpenSession())
using (IFullTextSession ftSession = Search.CreateFullTextSession(session))
{
    IList<Post> posts = ftSession.CreateFullTextQuery<Post>("Text:NHibernate and Title:Performance")
        .List<Post>();
    Assert.AreNotEqual(0, posts.Count);
}

And yes, I probably do have seamless integration with updating the indexes, so I won't have to worry about it except for the first time. I strongly suggest that you read the documentation above, since it is almost identical to my implementation.

Unlike most of my stuff, I am not going to release it just yet, it still needs more testing and some beating before I will let it see the light of day.

* And just to point out, no sleep, for real. I am running on caffeine and sheer inertia at the moment.

Adding additional information to XML files conforming to a schema

Lame title, but bear with me.

I have a set of XML files that conform to an XSD schema, specifically this one. I want to add additional elements and attributes to the documents without breaking the schema, and hopefully without changing the code that reads the schema.

Basically, I have this:

<class name="Blog" table="Blogs">
    <search:indexed-entity index="indexes/blogs"/>
    <property name="Subtitle">
        <search:field index="tokenized"/>
    </property>
</class>

The search:* elements are the parts that I want to add to the document. Any suggestions?

Performance: Multiple Collections Fetch With NHibernate

We are back to the traditional model of Blog -> Posts -> Comments. Now, we want to load all the blogs, posts and comments, and index them to improve search performance. A naive implementation would use this code:

foreach (Blog blog in session.CreateQuery("from Blog").List<Blog>())
{
    Index(blog, "Blog");
    foreach (Post post in blog.Posts)
    {
        Index(post, "Post");
        foreach (Comment comment in post.Comments)
        {
            Index(comment, "Comments");
        }
    }
}

This code is going to produce an ungodly amount of database queries: one for the blogs, one per blog for its posts, and one per post for its comments, so 50 blogs with 100 posts each means over 5,000 round trips. We want to grab the data from the database in as few queries as possible. Obviously we want to use some sort of eager loading, but what can we use? Let us analyze our options for solving this.

The easiest route is to tell NHibernate to load the entire object graph in one go. For a long time, NHibernate had a limitation of a single eagerly loaded collection per query. This meant that you couldn't load both the Posts collection and the Comments collection. Sergey has recently removed this limitation, so let us replace the HQL query with:

from Blog blog
    left join fetch blog.Posts post
    left join fetch post.Comments
This query results in this SQL (I cleaned up the names a bit):

select  blog.Id, blog.Title, blog.Subtitle, blog.AllowsComments, blog.CreatedAt,
        post.Id, post.Title, post.Text, post.PostedAt, post.BlogId, post.UserId,
        comment.Id, comment.Name, comment.Email, comment.HomePage, comment.Ip,
        comment.Text, comment.PostId,
        post.BlogId, post.Id, comment.PostId, comment.Id
from    Blogs blog
        left outer join Posts post on blog.Id = post.BlogId
        left outer join Comments comment on post.Id = comment.PostId

A word of warning, though: there is a non-trivial Cartesian product here. Each blog row is duplicated for every one of its posts, and each post row for every one of its comments. If you know that you are working on small sets, that is fine, but be wary of using this technique over large sets of data.

Implementing Linq for NHibernate: A How To Guide - Part 1

There is an appalling lack of documentation about how to implement Linq providers. The best resource that I could find is Fabrice's post about Linq To Amazon. The Linq in Action book may provide additional information, but from what I have seen it is about using Linq, not building a provider yourself. Since I would really like people to pitch in and help with the implementation of Linq for NHibernate (not that this is a hint or anything), I decided to document what I found out while building Linq for NHibernate.

I strongly suggest that you get the VPC image and use it to explore what is possible. The code, to remind you, is at:

svn co https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/experiments/NHibernate.Linq/

My goal at the moment is feature parity with the 101 examples that Microsoft has published.

I am not going to explain in detail how Linq works (you can go to other sites to find that out), but here is a short introduction. Linq is an extension to the compiler that turns certain keywords into method calls. The interesting part is that the compiler can give you the AST (Abstract Syntax Tree) for the query instead of executable code. This is important, because then you are free to take actions based on the AST (for instance, make a database call).

The whole concept revolves around two main ideas, IQueryable<T> and Expression, which are tightly linked together. The compiler will output an Expression tree that will be passed to the IQueryable<T> implementation. Of course, there needs to be an IQueryable<T> implementation, and that is where extension methods come into play. I implemented the core functionality by adding an extension method to ISession, like this:

public static class LinqForNHibernate
{
    public static IQueryable<T> Linq<T>(this ISession session)
    {
        return new NHibernateLinqQuery<T>(session);
    }
}

Now I can execute this query, and NHibernateLinqQuery gets to intercept the expression tree:

var query = (from user in session.Linq<User>() select user).ToList();

The rest of this post is going to focus mostly on the NHibernateLinqQuery implementation. I have chosen to base the Linq implementation on the criteria API. This makes the task a lot simpler, since I can let a mature API handle a lot of the underlying query generation. The criteria API does not expose 100% of the functionality offered by NHibernate, but it offers most of it, and saves me the need to handle query generation myself. Where I need features that do not currently exist in the criteria API, I can add them.
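For reference, here is the target the provider is aiming for: a query like [from user in session.Linq<User>() where user.Name == "ayende" select user] would, under this scheme, boil down to roughly the following criteria call (a sketch, using the NHibernate.Expression namespace of that era):

```csharp
// What the provider ultimately needs to build from the expression tree:
// a criteria query with an Eq restriction matching the Linq where clause.
IList<User> users = session.CreateCriteria(typeof(User))
    .Add(NHibernate.Expression.Expression.Eq("Name", "ayende"))
    .List<User>();
```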

Here are the fields on NHibernateLinqQuery:

ISession session;
ICriteria rootCriteria;
IDictionary<ExpressionType, Action<System.Linq.Expressions.Expression>> actions;
Stack<IList<NHibernate.Expression.ICriterion>> criterionStack = new Stack<IList<NHibernate.Expression.ICriterion>>();
IList<TResult> result;

The session and rootCriteria fields are probably obvious, but actions requires an explanation. Each Expression has an ExpressionType, and the actions dictionary contains the matching methods that can handle them. Basically, each ExpressionType is handled by a Process[ExpressionType] method on the NHibernateLinqQuery. Here is an example of the method that handles ExpressionType.Lambda:

public void ProcessLambda(Expression expr)
{
    LambdaExpression lambda = (LambdaExpression)expr;
    ProcessExpression(lambda.Body);
}

Where ProcessExpression is implemented as:

public void ProcessExpression(Expression expr)
{
    actions[expr.NodeType](expr);
}

Basically a visitor pattern with the actions dictionary serving as the dispatcher.
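The post does not show how the actions dictionary gets populated; a plausible sketch (my assumption, not the actual source) is to wire it up in the constructor, one entry per supported node type:

```csharp
// Hypothetical constructor wiring: map each ExpressionType to its handler.
// ProcessAndAlso and ProcessEqual are presumed counterparts, not shown
// in the original post.
actions = new Dictionary<ExpressionType, Action<Expression>>();
actions[ExpressionType.Lambda] = ProcessLambda;
actions[ExpressionType.OrElse] = ProcessOrElse;
actions[ExpressionType.AndAlso] = ProcessAndAlso; // hypothetical
actions[ExpressionType.Equal] = ProcessEqual;     // hypothetical
```

An unsupported node type would surface as a KeyNotFoundException from the dispatcher, which is a reasonable failure mode while the provider is still incomplete.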

The criterionStack contains all the current predicates about the query. It is a stack of lists of ICriterion, and the idea is that I can push a new list onto the stack, have it gather the criterions for part of the expression, and then pop the list and use the processed items. Let us see this in code. We have CurrentCriterions, which all the Process[ExpressionType] methods use, and which is simply:

public IList<NHibernate.Expression.ICriterion> CurrentCriterions
{
    get { return criterionStack.Peek(); }
}

Once we have both of those, we can use them for complex expressions, like handling [user.Name == "ayende" || user.Name == "rahien"]:

public void ProcessOrElse(Expression expr)
{
    BinaryExpression or = (BinaryExpression) expr;
    criterionStack.Push(new List<NHibernate.Expression.ICriterion>());
    ProcessExpression(or.Left);
    ProcessExpression(or.Right);
    IList<NHibernate.Expression.ICriterion> ors = criterionStack.Pop();

    NHibernate.Expression.Disjunction disjunction = new NHibernate.Expression.Disjunction();
    foreach (var crit in ors)
    {
        disjunction.Add(crit);
    }
    CurrentCriterions.Add(disjunction);
}

We push a new list onto the stack, process the left and right expressions of the or, pop the current criterion list, and then combine them into a disjunction, which we add to the original criterion list. This way we don't have to worry about complex expressions; they mostly handle themselves.
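The AndAlso case is not shown in the post, but by symmetry it would presumably use a Conjunction instead of a Disjunction; a sketch of what that counterpart might look like:

```csharp
// Presumed counterpart to ProcessOrElse (not in the original post):
// same push/process/pop dance, combined into a Conjunction.
public void ProcessAndAlso(Expression expr)
{
    BinaryExpression and = (BinaryExpression) expr;
    criterionStack.Push(new List<NHibernate.Expression.ICriterion>());
    ProcessExpression(and.Left);
    ProcessExpression(and.Right);
    IList<NHibernate.Expression.ICriterion> ands = criterionStack.Pop();

    NHibernate.Expression.Conjunction conjunction = new NHibernate.Expression.Conjunction();
    foreach (var crit in ands)
    {
        conjunction.Add(crit);
    }
    CurrentCriterions.Add(conjunction);
}
```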

Now that I have talked about how I am parsing the expression tree, let us talk about how the query is handled. Again, this isn't documented anywhere that I have seen, so I am mainly describing what I discovered. The first thing that happens is that the IQueryable<T>.Expression property is called; basically, the query is asked what sort of expression should handle it. I chose to handle the query in the same IQueryable<T> implementation, so I am returning a reference to this:

public System.Linq.Expressions.Expression Expression
{
    get { return System.Linq.Expressions.Expression.Constant(this); }
}

Then, the CreateQuery<TElement> method is called. It is important to understand the difference between the <T>'s here. The query itself is a generic type, NHibernateLinqQuery<TResult>, where TResult is the entity that we are querying. The result of the query may be different from TResult, because we may use a projection to get only some of the values, or select a child item's values.

Therefore, we need to return a new IQueryable<TElement>, which is why I am creating a new instance of the same class, passing it the current state of the objects, and continuing to parse the expression tree.

As you can see, I am only handling the Select and Where methods at the moment. My naming convention is that "HandleXyzCall" handles a query method, while "ProcessXyz" processes an expression type.

public IQueryable<TElement> CreateQuery<TElement>(System.Linq.Expressions.Expression expression)
{
    MethodCallExpression call = (MethodCallExpression) expression;
    switch (call.Method.Name)
    {
        case "Where":
            HandleWhereCall(call);
            break;
        case "Select":
            HandleSelectCall(call);
            break;
    }

    foreach (var crit in CurrentCriterions)
    {
        rootCriteria.Add(crit);
    }
    return new NHibernateLinqQuery<TElement>(session, rootCriteria);
}

The HandleWhereCall is very simplistic; it just processes all the arguments passed to the where clause:

private void HandleWhereCall(MethodCallExpression call)
{
    foreach (Expression expr in call.Arguments)
    {
        ProcessExpression(expr);
    }
}

The select method is a bit more complex, since it needs to handle projections and various other interesting features. I am not going to show it here, because it is over 50 lines of code and very state-machine-like. Not really interesting.

On the sad state of NHibernate code generation

I wanted to generate NHibernate mappings & objects from the Northwind schema, to be able to run the same queries as the Linq samples. I evaluated several MyGeneration templates as well as a commercial product (GenWise). None of them had an output that I would deem acceptable.

What do I mean by that?

  • Singularizing the table names when creating an entity.
  • Property names that follow the .NET Naming Guidelines.
  • The ability to generate a many-to-one <set>.

I don't like code generation in general, but I know that there are many people who swear by it. And come on, it is Northwind, the database equivalent of Hello World. Are there any reasonable code generation tools for NHibernate?

If there aren't, I am willing to trade: a Linq implementation for a code generator that can handle the Northwind database.