Ayende @ Rahien

It's a girl

Public Service Announcement: Git master repositories for the Rhino Tools projects

There have been some changes, and it seems that it is hard to track them. Here are where you can find the master repositories for the rhino tools projects:

On PSake

James Kovacks introduced psake ( a power shell based build system )over a year ago, and at the time, I gave it a glance and decided that it was interesting, but not worth further investigation.

This weekend, as I was restructuring my Rhino Tools project, I realized that I need to touch the build system as well. The Rhino Tools build system has been through several projects, and was originally ported from Hibernate. It is NAnt based, complex, and can do just about everything that you want expect be easily understandable.

It became clear to me very quickly that it ain’t going to be easy to change the way it works, nor would it be easy to modify that to reflect the new structure. There are other issues with complex build systems, they tend to create zones of “there be dragons”, where only the initiated go, and even they go with trepidation. I decided to take advantage of the changes that I am already making to get a simpler build system.

I had a couple of options open to me: Rake and Bake.

Bake seemed natural, until I remember that no one touched it in a year or two. Beside, I can only stretch NIH so far :-). And while I know that people rave about rake, I did not want to introduce a Ruby dependency on my build system. I know that it was an annoyance when I had to build Fluent NHibernate.

One thing that I knew that I am not willing to go back to was editing XML, so I started looking at other build systems, ending up running into PSake.

There are a few interesting things that reading about it brought to mind. First, NAnt doesn’t cut it anymore. It can’t build WPF applications nor handle multi targeting well. Second, I am already managing the compilation part of the build using MSBuild, thanks to Visual Studio.

That leave the build system with executing msbuild, setting up directories, executing tests, running post build tools, etc.

PSake handles those well, since the execution environment is the command line. The syntax is nice, just enough to specify tasks and dependencies, but everything else is just pure command line. The following is Rhino Mocks build script, using PSake:

properties { 
  $base_dir  = resolve-path .
  $lib_dir = "$base_dir\SharedLibs"
  $build_dir = "$base_dir\build" 
  $buildartifacts_dir = "$build_dir\" 
  $sln_file = "$base_dir\Rhino.Mocks-vs2008.sln" 
  $version = ""
  $tools_dir = "$base_dir\Tools"
  $release_dir = "$base_dir\Release"

task default -depends Release

task Clean { 
  remove-item -force -recurse $buildartifacts_dir -ErrorAction SilentlyContinue 
  remove-item -force -recurse $release_dir -ErrorAction SilentlyContinue 

task Init -depends Clean { 
    . .\psake_ext.ps1
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks $version" `
        -version $version `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks.Tests\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks Tests $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks Tests $version" `
        -version $version `
        -clsCompliant "false" `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks.Tests.Model\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks Tests Model $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks Tests Model $version" `
        -version $version `
        -clsCompliant "false" `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
    new-item $release_dir -itemType directory 
    new-item $buildartifacts_dir -itemType directory 
    cp $tools_dir\MbUnit\*.* $build_dir

task Compile -depends Init { 
  exec msbuild "/p:OutDir=""$buildartifacts_dir "" $sln_file"

task Test -depends Compile {
  $old = pwd
  cd $build_dir
  exec ".\MbUnit.Cons.exe" "$build_dir\Rhino.Mocks.Tests.dll"
  cd $old        

task Merge {
    $old = pwd
    cd $build_dir
    Remove-Item Rhino.Mocks.Partial.dll -ErrorAction SilentlyContinue 
    Rename-Item $build_dir\Rhino.Mocks.dll Rhino.Mocks.Partial.dll
    & $tools_dir\ILMerge.exe Rhino.Mocks.Partial.dll `
        Castle.DynamicProxy2.dll `
        Castle.Core.dll `
        /out:Rhino.Mocks.dll `
        /t:library `
        "/keyfile:$base_dir\ayende-open-source.snk" `
    if ($lastExitCode -ne 0) {
        throw "Error: Failed to merge assemblies!"
    cd $old

task Release -depends Test, Merge {
    & $tools_dir\zip.exe -9 -A -j `
        $release_dir\Rhino.Mocks.zip `
        $build_dir\Rhino.Mocks.dll `
        $build_dir\Rhino.Mocks.xml `
        license.txt `
    if ($lastExitCode -ne 0) {
        throw "Error: Failed to execute ZIP command"

It is about 50 lines, all told, with a lot of spaces and is quite readable.

This handles the same tasks as the old set of scripts did, and it does this without undue complexity. I like it.

The complexity of unity

This post is about the Rhino Tools project. It has been running for a long time now, over 5 years, and amassed quite a few projects in it.

I really like the codebase in the projects in Rhino Tools, but secondary aspects has been creeping in that made managing the project harder. In particular, putting all the projects in a single repository made it easy, far too easy. Projects had an easy time taking dependencies that they shouldn’t, and the entire build process was… complex, to say the least.

I have been somewhat unhappily tolerant of this so far because while it was annoying, it didn’t actively create problems for me so far. The problems started creeping when I wanted to move Rhino Tools to use NHibernate 2.1. That is when I realized that this is going to be a very painful process, since I have to take on the entire Rhino Tools set of projects in one go, instead of dealing with each of them independently. the fact that so many of the dependencies where in Rhino Commons, to which I have a profound dislike, helped increase my frustration.

There are other things that I find annoying now, Rhino Security is a general purpose library for NHibernate, but it makes a lot of assumptions about how it is going to use, which is wrong. Rhino ETL had a dependency on Rhino Commons because of three classes.

To resolve that, I decided to make a few other changes, taking dependencies is supposed to be a hard process, it is supposed to make you think.

I have been working on splitting the Rhino Tools projects to all its sub projects, so each of them is independent of all the others. That increase the effort of managing all of them as a unit, but decrease the effort of managing them independently.

The current goals are to:

  • Make it simpler to treat each project independently
  • Make it easier to deal with the management of each project (dependencies, build scripts)

There is a side line in which I am also learning to use Git, and there is a high likelihood that the separate Rhino Tools projects will move to github. Suversion’s patching & tracking capabilities annoyed me for the very last time about a week ago.

Common infrastructure? Don't make me laugh

I am currently in the process of retiring the IRepository<T> from Rhino.Commons. I am moving it to its own set of projects, and I am not going to give it much attention in the future.

Note: Backward comparability is maintained, as long as you reference the new DLLs.

But why am I doing that? If there is anything that I have been thought in the last couple of years is that there is a sharp and clear distinction between technological infrastructure (web frameworks, IoC containers, OR/M) and application infrastructure (layer super type, base services, conventions). The first can and should be made common, but trying to make the application infrastructure shared between multiple projects that aren't closely matched is likely to cause a lot of issues.

To take the IRepository<T> example, it currently have over 50 methods. If that isn't a violation of SRP, I don't know what is. Hell, you can even execute a stored procedure using the IRepository<T> infrastructure. That is too much.

What will I use instead? A project focused infrastructure. A repository interface reflect the application that is using it, and it changes from project to project and even in the same project as our understanding of how the project is built improve.

I am still considering what to do with the other tidbits that I have there, such as the EntitiesToRepositories implementation. Ideally, I want to keep Rhino Commons focused on only session management, and nothing else beside.

Rhino Queues

One of the things that often come up in the NServiceBus mailing list is the request for an xcopy, zero administration, queuing service. This is especially the case when you have smart clients or want to have queues over the Internet.

I decided to try to build such a thing, because it didn't seem such a hard problem. I turned out to be wrong, but it was an interesting experiment. Actually, the problem isn't that it is hard to do this, the problem was that I wanted durable queuing, and that led me to a lot of technologies that weren't suitable for my needs.

You can get the bits here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/branches/rhino-queues-1.0

What it is:

  • XCopyable, Zero Administration, Embedded, Async queuing service
  • Robust in the face of networking outages
  • System.Transactions support
  • Fast
  • Works over HTTP

What it isn't:

  • Durable queuing service
  • Wetted by production use

Broadly, using Rhino Queues you get async queues over HTTP. But, it keeps all the data in memory, so if you restart the application, it will lose all waiting messages. It also tries its best (but does not guarantee) to ensure message ordering and ensure delivery.

Let us take a look at the code, and then discuss the implementation details from there.

Server usage:

using(var factory = new Configuration("server")

using (var queue = factory.OpenQueue("echo")) { var message = queue.Recieve(); var str = (string)message.Value; var rev = new string(str.Reverse().ToArray()); using(var remoteQueue = factory.OpenQueue(message.Source)) { Console.WriteLine("Handled message"); remoteQueue.Send(rev); } } }

And the client usage:

using(var factory = new Configuration("client")


	using (var tx = new TransactionScope())
	using (var serverQueue = factory.OpenQueue("echo@server"))
		Console.WriteLine("Sending 'hello there'");
		serverQueue.Send("Hello there").Source = new Destination("echo-reply@client");
	using (var clientQueue = factory.OpenQueue("echo-reply"))
		var msgText = clientQueue.Recieve();

As you can see, we map a nice name to an endpoint, so we can send message to [queue]@[endpoint nice name], sending a message to just [queue] send it to the local machine. It is expected that the nice name for an endpoint would be the machine name, which alleviate the need of syncing names across all endpoints.

You can also see the usage of System.Transaction and without System.Transactions.

One thing that would probably raise questions is why Rhino Queues doesn't have a durable mode, after all, I just spent some time building a durable queue infrastructure. The reason for that is very simple, I spent too much time on that, and I can't spend more time on this at the moment. So I am putting both Rhino Queues and Rhino Queues Storage Disk out there, and I'll let the community bring them together.

A word of warning, Rhino Queues is a cool project, but it is not a replacement for such things as MSMQ. If you can, you probably want to make use of MSMQ, specifically because there is a lot more production time using it.

The purpose of Rhino Commons

Nathan has posted about utility libraries and he includes Rhino Commons there as well.

I don't see Rhino Commons as a utility library. At least not anymore. It certainly started its life as such, but it has grown since then.

Rhino Commons represent my default architecture. This is my base when I am building applications. It has some utility classes, sure, but it contains a lot more foundation and infrastructure components than anything else.

Answer: How many tests?

Two days ago I asked how many tests this method need:

///Get the latest published webcast 
public Webcast GetLatest();

Here is what I came up with:

public class WebcastRepositoryTest : DatabaseTestFixtureBase
	private IWebcastRepository webcastRepository;

	public void TestFixtureSetup()
			"windsor.boo", MappingInfo.FromAssemblyContaining<Webcast>());

	public void Setup()
		webcastRepository = IoC.Resolve<IWebcastRepository>();

	public void Teardown()

	public void Can_save_webcast()
		var webcast = new Webcast { Name = "test", PublishDate = null };
		With.Transaction(() => webcastRepository.Save(webcast));
		Assert.AreNotEqual(0, webcast.Id);

	public void Can_load_webcast()
		var webcast = new Webcast { Name = "test", PublishDate = null };
		With.Transaction(() => webcastRepository.Save(webcast));

		var webcast2 = webcastRepository.Get(webcast.Id);
		Assert.AreEqual(webcast.Id, webcast2.Id);
		Assert.AreEqual("test", webcast2.Name);

	public void When_asking_for_latest_webcast_will_not_consider_any_that_is_not_published()
		var webcast = new Webcast { Name = "test", PublishDate = null };
		With.Transaction(() => webcastRepository.Save(webcast));


	public void When_asking_for_latest_webcast_will_get_published_webcast()
		var webcast = new Webcast { Name = "test", PublishDate = null };
		With.Transaction(() => webcastRepository.Save(webcast));
		var webcast2 = new Webcast { Name = "test", PublishDate = DateTime.Now.AddDays(-1) };
		With.Transaction(() => webcastRepository.Save(webcast2));

		Assert.AreEqual(webcast2.Id, webcastRepository.GetLatest().Id);

	public void When_asking_for_latest_webcast_will_get_the_latest_webcast()
		var webcast = new Webcast { Name = "test", PublishDate = DateTime.Now.AddDays(-2) };
		With.Transaction(() => webcastRepository.Save(webcast));
		var webcast2 = new Webcast { Name = "test", PublishDate = DateTime.Now.AddDays(-1) };
		With.Transaction(() => webcastRepository.Save(webcast2));

		Assert.AreEqual(webcast2.Id, webcastRepository.GetLatest().Id);

	public void When_asking_for_latest_webcast_will_not_consider_webcasts_published_in_the_future()
		var webcast = new Webcast { Name = "test", PublishDate = DateTime.Now.AddDays(-2) };
		With.Transaction(() => webcastRepository.Save(webcast));
		var webcast2 = new Webcast { Name = "test", PublishDate = DateTime.Now.AddDays(2) };
		With.Transaction(() => webcastRepository.Save(webcast2));
		Assert.AreEqual(webcast.Id, webcastRepository.GetLatest().Id);

And the implementation:

public class WebcastRepository : RepositoryDecorator<Webcast>, IWebcastRepository
	public WebcastRepository(IRepository<Webcast> repository)
		Inner = repository;

	public Webcast GetLatest()
		var publishedWebcastsByDateDesc =
			from webcast in Webcasts
			where webcast.PublishDate != null && webcast.PublishDate < SystemTime.Now()
			orderby webcast.PublishDate descending 
			select webcast;

		return publishedWebcastsByDateDesc.FirstOrDefault();

	private static IOrderedQueryable<Webcast> Webcasts
		get { return UnitOfWork.CurrentSession.Linq<Webcast>(); }

I think it is pretty sweet.

Adaptive Domain Models with Rhino Commons

Udi Dahan has been talking about this for a while now. As usual, he makes sense, but I am working in different enough context that it takes time to assimilate it.

At any rate, we have been talking about this for a few days, and I finally sat down and decided that I really need to look at it with code. The result of that experiment is that I like this approach, but am still not 100% sold.

The first idea is that we need to decouple the service layer from our domain implementation. But why? The domain layer is under the service layer, after all. Surely the service layer should be able to reference the domain. The reasoning here is that the domain model play several different roles in most applications. It is the preferred way to access our persistent information (but they should not be aware of persistence), it is the central place for business logic, it is the representation of our notions about the domain, and much more that I am probably leaving aside.

The problem here is there is a dissonance between the requirements we have here. Let us take a simple example of an Order entity.

image As you can see, Order has several things that I can do. It can accept an new line, and it can calculate the total cost of the order.

But those are two distinct responsibilities that are based on the same entity. What is more, they have completely different persistence related requirements.

I talked about this issue here, over a year ago.

So, we need to split the responsibilities, so we can take care of each of them independently. But it doesn't make sense to split the Order entity, so instead we will introduce purpose driven interfaces. Now, when we want to talk about the domain, we can view certain aspect of the Order entity in isolation.

This leads us to the following design:


And now we can refer to the separate responsibilities independently. Doing this based on the type open up to the non invasive API approaches that I talked about before. You can read Udi's posts about it to learn more about the concepts. Right now I am more interested in discussing the implementation.

First, the unit of abstraction that we work in is the IRepository<T>, as always.

The major change with introducing the idea of a ConcreteType to the repository. Now it will try to use the ConcreteType instead of the give typeof(T) that it was created with. This affects all queries done with the repository (of course, if you don't specify ConcreteType, nothing changes).

The repository got a single new method:

T Create();

This allows you to create new instances of the entity without knowing its concrete type. And that is basically it.

Well, not really :-)

I introduced two other concepts as well.

public interface IFetchingStrategy<T>
	ICriteria Apply(ICriteria criteria);

IFetchingStrategy can interfere in the way queries are constructed. As a simple example, you could build a strategy that force eager load of the OrderLines collection when the IOrderCostCalculator is being queried.

There is not complex configuration involved in setting up IFetchingStrategy. All you need to do is register your strategies in the container, and let the repository do the rest.

However, doesn't this mean that we now need to explicitly register repositories for all our entities (and for all their interfaces)?

Well, yes, but no. Technically we need to do that. But we have help, EntitiesToRepositories.Register, so we can just put the following line somewhere in the application startup and we are done.

	typeof (NHRepository<>),
	typeof (IOrderCostCalculator).Assembly);

And this is it, you can start working with this new paradigm with no extra steps.

As a side benefit, this really pave the way to complex multi tenant applications.

Dealing with hierarchical structures in databases

I have a very simple requirement, I need to create a hierarchy of users' groups. So you can do something like:

  • Administrators
    • DBA
      • SQLite DBA

If you are a member of SQLite DBA group, you are implicitly a member of the Administrators group.

In the database, it is trivial to model this:


Except that then we run into the problem of dealing with the hierarchy. We can't really ask questions that involve more than one level of the hierarchy easily.  Some databases has support for hierarchical operators, but that is different from one database to the next. That is a problem, since I need it to work across databases, and without doing too much fancy stuff.

We can work around the problem by introducing a new table:


Now we move the burden of the hierarchy from the query phase to the data entry phase.

From the point of view of the entity, we have this:


Please ignore the death star shape and concentrate on the details :-)

Here is how we are getting all the data in the tree:

public virtual UsersGroup[] GetAssociatedUsersGroupFor(IUser user)
    DetachedCriteria directGroupsCriteria = DetachedCriteria.For<UsersGroup>()
        .CreateAlias("Users", "user")
        .Add(Expression.Eq("user.id", user.SecurityInfo.Identifier))

    DetachedCriteria allGroupsCriteria = DetachedCriteria.For<UsersGroup>()
        .CreateAlias("Users", "user", JoinType.LeftOuterJoin)
        .CreateAlias("AllChildren", "child", JoinType.LeftOuterJoin)
            Subqueries.PropertyIn("child.id", directGroupsCriteria) ||
            Expression.Eq("user.id", user.SecurityInfo.Identifier));

    ICollection<UsersGroup> usersGroups = 
        usersGroupRepository.FindAll(allGroupsCriteria, Order.Asc("Name"));
    return Collection.ToArray<UsersGroup>(usersGroups);

Note that here we don't care whatever we are associated with a group directly or indirectly. This is an important consideration in some scenarios (mostly when you want to display information to the user), so we need some way to chart the hierarchy, right?

Here is how we are doing this:

public virtual UsersGroup[] GetAncestryAssociation(IUser user, string usersGroupName)
    UsersGroup desiredGroup = GetUsersGroupByName(usersGroupName);
    ICollection<UsersGroup> directGroups =
    if (directGroups.Contains(desiredGroup))
        return new UsersGroup[] { desiredGroup };
    // as a nice benefit, this does an eager load of all the groups in the hierarchy
    // in an efficient way, so we don't have SELECT N + 1 here, nor do we need
    // to load the Users collection (which may be very large) to check if we are associated
    // directly or not
    UsersGroup[] associatedGroups = GetAssociatedUsersGroupFor(user);
    if (Array.IndexOf(associatedGroups, desiredGroup) == -1)
        return new UsersGroup[0];
    // now we need to find out the path to it
    List<UsersGroup> shortest = new List<UsersGroup>();
    foreach (UsersGroup usersGroup in associatedGroups)
        List<UsersGroup> path = new List<UsersGroup>();
        UsersGroup current = usersGroup;
        while (current.Parent != null && current != desiredGroup)
            current = current.Parent;
        if (current != null)
        // Valid paths are those that are contains the desired group
        // and start in one of the groups that are directly associated
        // with the user
        if (path.Contains(desiredGroup) && directGroups.Contains(path[0]))
            shortest = Min(shortest, path);
    return shortest.ToArray();

As an aside, this is about as complex a method as I can tolerate, and even that just barely.

I mentioned that the burden was when creating it, right? Here is what I meant:

public UsersGroup CreateChildUserGroupOf(string parentGroupName, string usersGroupName)
    UsersGroup parent = GetUsersGroupByName(parentGroupName);
    Guard.Against<ArgumentException>(parent == null,
                                     "Parent users group '" + parentGroupName + "' does not exists");

    UsersGroup group = CreateUsersGroup(usersGroupName);
    group.Parent = parent;
    return group;

We could hide it all inside the Parent's property setter, but we still need to deal with it.

And that is all you need to do in order to get it cross database hierarchical structures working.

Rhino Igloo – MVC Framework for Web Forms

How many times have the documentation of a product told you that you really should use another product? Well, this is one such case. Rhino Igloo is an attempt to make an application that was mandated to Web Forms more palatable to work with. As such, it is heavily influenced by MonoRail and its ideas.

I speak as the creator of this framework, if at all possible, prefer (mandate!) the use of MonoRail, it will be far easier, all around.

Model View Controller is a problematic subject in Web Forms, because of the importance placed on the view (ASPX page) in the Web Forms framework.

When coming to build an MVC framework on top of Web Forms, we need to consider this limitation in our design. For that reason, Rhino Igloo is not a pure MVC framework, and it works under the limitations of the Web Forms framework.

Along with the MVC framework, Rhino Igloo has tight integration to the Windsor IoC container, and make extensive use of it in its internal operations.

It is important to note that while we cover the main chain of events that occurs when using Rhino Igloo, we are not going into the implementation details here, just a broad usage overview.

I took to liberty of flat out ignoring implementation details that are not necessary to understanding how things flow.

The Views

Overall, the Rhino Igloo4framework hooks into the Web Forms pipeline by utilizing a common set of base classes ( BasePage, BaseMaster, BaseScriptService, BaseHttpHandler ).  This set of base classes provide transparent dependency injection capabilities to the views.

As an example, to the right you can see the Login page. As you can see that the login page has a Controller property, of type LoginController. The controller property is a simple property, which the base page initializes with an instance of the LoginController.

It is important to note that it is not the responsibility of the Login page to initialize the Controller property, but rather its parent, the BasePage. As an example, here is a valid page, which, when run, will have its Controller property set to an instance of a LoginController:

public partial class Login : BasePage


    private LoginController controller;


    public LoginController Controller


        get { return controller; }

        set { controller = value; }



As a result of that, the view no longer needs to be aware of the dependencies that the LoginController itself has. Those are automatically supplied using Windsor itself.

The controller is the one responsible for handling the logic of the particular use case that we need to handle. The view's responsibilities are to call the controller when needed. Here is the full implementation of the above Login_Click method:

public void Logon_Click(object sender, EventArgs e)


    if (!Controller.Authenticate(Username.Text, Password.Text))


      lblError.Text = Scope.ErrorMessage;




As you can see, the view calls to the controller, which can communicate back to the view using the scope.

The Scope

The scope class is the main way controllers get their data, and a common way to pass results from the controller to the view.

The scope class contains properties such as Input and Session, which allows an abstract access to the current request and the user's session.

This also allows replacing them when testing, so we can test the behavior of the controllers without resorting to loading the entire HTTP pipeline.

Of particular interest are the following properties:

·         Scope.ErrorMessage

·         Scope.SuccessMessage

·         Scope.ErrorSummary

These allow the controller to pass messages to the view, and in the last case, the entire validation failure package, as a whole.

Communication between the controller and the view is intentionally limited, in order to ensure separation of concerns between the two.

The controllers

All controllers inherit from Base Controller, which serves as a common base class and a container for a set of utility methods as well.

From the controller, we can access the current context, which is a nicer wrapper around the Http Context and operations exposed by it.

While the controllers are POCO classes, which are easily instantiated and used outside the HTTP context, there are a few extra niceties that are supplied for the controllers.

The first is the Initalize() method, which allows the controller to initialize itself before processing begins.

As an extension to this idea, and taking from MonoRail DataBind approach, there are other niceties:


Inject is an attribute that can be used to decorate a property, at which point the Rhino Igloo framework will take a value from the request, convert it to the appropriate data type and set the property when the controller is first being created. Here is an example:


public virtual Guid CurrentGuid


    Get { return currentGuid;  }

    set { currentGuid = value; }


The framework will search for a request parameter with the key "CurrentGuid", convert the string value to a Guid, and set its value.


InjectEntity performs the same operation as Inject does, but it takes the value and queries the database an entity with the specified key.

[InjectEntity(Name = Constants.Id, EagerLoad = "Customer")]

public Order Order


    get { return order; }

    set { order = value; }


This example shows how we can automatically load an order from the database based on the "Id" request parameter, and eager load the Customer property as well.

The model

Rhino Igloo has integration to Rhino Common's Unit Of Work style, but overall, has no constraints on the way the model is built.

Running on the trunk: Building Rhino Commons

Well, it looks like I have to share the big secret of how to keep Rhino Commons in sync with both NHibernate & Castle. The secret is never opening Visual Studio and doing it all from the command line. Here is the magic formula:

D:\OSS>cd nhibernate
D:\OSS\nhibernate>svn up
D:\OSS>cd Castle
D:\OSS\Castle>svn up
D:\OSS\Castle>copy ..\nhibernate\build\NHibernate-2.0.0.Alpha1-debug\bin\net-2.0\*.* build\net-2.0\debug /y
D:\OSS\Castle>copy ..\nhibernate\build\NHibernate-2.0.0.Alpha1-debug\bin\net-2.0\*.* ..\rhino-tools\SharedLibs\NHibernate /y
D:\OSS\Castle>copy build\net-2.0\debug\*.* ..\rhino-tools\SharedLibs\Castle\*.* /y
D:\OSS>cd rhino-tools
D:\OSS\rhino-tools>msbuild BuildAll.build

Rhino ETL: Thinking about Joins & Merges

Well, I think that I have a solid foundation with the engine and syntax right now, I still have error conditions to verify, but that is something that I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me equality operation, I can't handle sets very well, I have to compare each row vs. each row, and I would like to do it better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, will solved that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily.That is a lot harder to read, though.

Any suggestions?

Both on the syntax and implementation strategies...

Rhino ETL: First Code Drop

First, let me make it clear, it is not ready yet.

What we have:

  • 99% complete on the syntax
  • Overall architecture should be stable
  • The engine works - but I think of it as a spike, it is likely to change significantly.

What remains to be done:

  • Parallelising the work inside a pipeline
  • Better error messages
  • More logging
  • More tests
  • Transforms over sets of rows

Here are a few works about how it works. The DSL is compromised of connection, source, destination and transform, which has one to one mapping with the respective Connection, DataSource, DataDestination and Transform class. In some cases, we just fill the data in (Connection), in some cases we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination), and sometimes we subclass the class to add the new behavior (transform).

A pipeline is a central concept, and is compromised of a set of pipeline associations, which connect the input/output of components.

Places to start looking at:

  • EtlContextBuilder - Compile the DSL and spits out an instance of:
  • EtlConfigurationContext - the result of the DSL, which can be run using:
  • ExecutionPackage - the result of building the EtlConfigurationContext, this one manages the running of all the pipelines.

There is an extensive set of tests (mostly for the syntax), and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate happy in the execution, it is anonymous delegate that calls anonymous delegate, etc, which is probably too complex for what we need here.

I am putting the source out for review, while it can probably handle most simple things, it very bare bone and subject to change.

You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL

But it needs references from the root, so it would be easiest to just do:

svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

Rhino.ETL: Full Package Syntax

Okay, here is the full package syntax that I have now, which is enough to express quite a bit, I am now getting started on working on the engine itself, I am going to try the message passing architecture for now, since it is much more flexible.

	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;"

source Northwind, Connection="NorthwindConnection":
	Command: "SELECT * FROM Orders WHERE RequiredDate BETWEEN @LastUpdate AND @CurrentDate"
		@LastUpdate = date.Today.AddDays(-1)
		@CurrentTime = ExecuteScalar("NorthwindConnection", "SELECT MAX(RequiredDate) FROM Orders")

transform ToLowerCase:
	for column in Parameters.Columns:
		Row[column] = Row[column].ToLower() if Row[column] isa string

destination Northwind, Connection = "NorthwindConnection":
	Command: """
INSERT INTO [Orders_Copy]
	[CustomerID], [EmployeeID], [OrderDate], [RequiredDate], [ShippedDate],[ShipVia],

pipeline CopyOrders:
	Sources.Northwind >> ToLowerCase(Columns: ['ShipCity','ShipRegion'])
	ToLowerCase >> Destinations.Northwind 

Rhino.ETL: Turning Transformations to FizzBuzz tests

Tobin Harris has asked some questions about how Rhino.ETL will handle transformations.  As you can see, I consider this something as trivial as a FizzBuzz test, which is a Good Thing, since it really should be so simple. Tobin's questions really show the current pain points in ETL processes.

  • Remove commas from numbers
  • transform RemoveCommas:
      for column in row.Columns:
    	if row[column] isa string:
    		row[column] = row[column].Replace(",","")
  • Trim and convert empty string to null
  • transform TrimEmptyStringToNull:
    	for column in row.Columns:
    		val = row[column]
    		if val isa string:
    			row[column] = null if val.Trim().Length == 0
  • Reformat UK postcodes - No idea from what format, and to what format, but let us say that I have "SW1A0AA" and I want "SW1A 0AA"
  • transform IntroduceSpace:
    	row.PostalCode = row.PostalCode.Substring(0,4) +' ' + row.PostalCode.Substring(4)
  • Make title case and Derive title from name and drop into column 'n':
  • transform  MakeTitleCase:
    	row.Title = row.Name.Substring(0,1).ToUpper() + row.Name.Substring(1)
  • Remove blank rows - right now, you would need to check all the columns manually ( here is a sample for one column that should suffice in most cases ), if this is an important, it is easy to add the check in the row class itself, so you can ask for it directly.
  • transform RemoveRowsWithoutId:
    	RemoveRow() if not row.Id
  • Format dates - I think you already got the idea, but never the less, let us take "Mar 04, 2007" and translate it to "2007-03-04", as an aside, it is probably easier to keep the date object directly.
  • transform TranslateDate:
    	row.Date = date.Parse(row.Date).ToString("yyyy-MM-dd")
  • Remove illegal dates
  • transform RemoveBadDate:
    	tmp as date
    	row.Date = null if not date.TryParse(row.Date, tmp)

Things that I don't have an implementation of are:

  • Remove repeated column headers in data - I don't understand the requirement.
  • Unpivot repeated groups onto new rows, Unpivot( startCol, colsPerGroup, numberOfGroups) - I have two problems here, I never groked pivot/unpviot fully, so this require more research, but I have a more serious issue, and that is that this is a transformation over a set of rows, and I can't thing of a good syntax for that, or the semantics it should have.
    I am opened for ideas...

Rhino.ETL: Providing Answers

It would be easier to me to answer a few of the questions that has cropped up regarding Rhino.ETL.

Boo vs. Ruby: Why I choose to go with Boo rather than Ruby. Very simple reasoning, my familiarity with Boo. I can make Boo do a lot of stuff already, I would have to start from scratch on Ruby. I don't see any value in one over the other, frankly, is there a reason behind the preference?

NAnt ETL Tasks: The main problem I have with such an endeavor is that it is back to XML again, if I want to build complex processes, I want them to be easy to follow, and that exclude XML.

Active Warehouse: Interesting idea, but that is using the imperative approach, I want to do something a little more declarative, and I really want it to be on the .Net platform (hence, much more familiar & debuggable). I also in a position where I believe that it would actually take me less time to build the tool than learn a tool in a new language.

Other OSS ETL tools: There are quite a few OSS ETL tools that has been raised, they all share one problem from my perspective, they are not .Net and they are all visual / XML oriented.

I should also mention that I am building this project as preemptive step against the next project ETL's requirements, so I have both time to build it, and I have the craziest itch to scratch after dealing with SSIS in this project. The last time I was this excited about something, Rhino Mocks came out :-)

Framework building: Rhino.ETL Status Report

I am currently working on making this syntax possible, and letting ideas buzz at the back of my head regarding the implementation of the ETL engine itself. This probably requires some explanation. My idea about this is to separate the framework into two distinct layers. The core engine, which I'll talk about in a second, and the DSL syntax.

One of the basic design decisions was that the DSL would be declarative, and not imperative. How does this follow, when I have something like this working:

source ComplexGenerator:
		if Environment.GetEnvironmentVariable("production"):
			return "SELECT * FROM Production.Customers"
			return "SELECT * FROM Test.Customers"

This certainly looks like an imperative language to me... (And no, this isn't really an example of something that I would recommend doing, it is here just make the principal).

The idea is that the DSL is used to build the object graph, then we can execute that object graph. But building it in a two stage fashion make it a lot easier to deal with such things as validation, visualization, etc.

Now, let us more to the core engine, and see what I have been thinking about. Core concepts:

  • Connection - The details about how to get the IDbConnection instance, including such things as number of concurrent connection, etc...
  • DataSource - Contains the details about how to get the data. Command to execute, parameters, associated connection, etc.
  • DataDestination - Contains the details about how to write the data, command / action to execute, parameters, connection, etc.
  • Row - A single row. A simple key <-> value structure with a twist that it can also contain other rows (from a merge/join)
  • Transform - Transform the current row
  • RowSet - a set of rows, obviously, useful for aggregation, lookup, etc. Not really sure how it should come into play yet.

The architecture of the whole thing is based on the pipeline idea, obviously. Now, there are several implementation decisions that should be considered from there.

  • Destination as the driver. The destination is the driver behind this architecture, it request the next row from the pipeline, which starts things rolling. Implementation can be as simple as:
    foreach(Row row in Pipeline.NextRow())
    This has the side affect of making the entire pipeline single threaded per destination, it makes it much easier to implement, and would make it easier to see the flow of things. Parallelism can be managed by multiple pipelines and/or helper threads. The major benefit in parallelism is with the data read/write, and those are limited to a pipeline at any rate.
    It does bring up the interesting question of how to deal with something like merge join, which requires multiply inputs, you would need to manage the different inputs in the merge, but I think that this is mandatory anyway.
  • Message passing architecture. In this architecture, each component (source, transform, destination) is basically an independent object with input/output channels, they all operate without reliance on each other. This is more complex because you can't do the simplest thing of just giving each component a thread, so you need to manage yielding and concurrency to a much higher degree.
    A bigger issue is that it puts a higher burden on writing components.

Right now I am leaning toward going to the single threaded pipeline idea, any comments?

Test driving Rhino.ETL

Here is the first test:

public void EvaluatingScript_WithConnection_WillAddDataSourceToContext() 
	EtlConfigurationContext configurationContext = EtlContextBuilder.FromFile(@"Connections\connection_only.retl");
	Assert.AreEqual(3, configurationContext.Connections.Count, "should have three connections");

There is quite a bit of information just in this test, we introduced the EtlConfigurationContext class, decided that we will create it from a factory, and that we have something that is called a connection. Another decision made was the “retl” extension (Rhino ETL), but that is a side benefit.

The source for this is:

	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;",
	ConcurrentConnections: 5
	ConnectionType: OracleConnection,
	ConnectionStringName: "SouthSand"

	ConnectionType: OracleConnection,
	ConnectionStringGenerator: { System.Environment.GetEnvironmentVariable("MyEnvVar") }

You may have wondered about the last one, what does this do? Well, it allows you to do runtime evaluation of something, in this case, it get the value from an env-var, but that has a lot of potential. Here it a test that demonstrate the capabilities:

public void DataSources_ConnectionStringGenerator_CanUseEvnrionmentVariables()


	Environment.SetEnvironmentVariable("MyEnvVar", "2");



Dependency Injection in Web Forms MVC

David Hayden has a post about the issue that you face when you are trying to use dependency injection in Web Forms MVC. I talked about similar issues here.

He points out that this type of code is bad:

    protected Page_PreInit(object sender, EventArgs e)
            // Constructor Injection of Data Access Service and View
            ICustomerDAO dao = Container.Resolve<ICustomerDAO>():
            _presenter = new AddCustomerPresenter(dao, this);
            // Property Injection of Logging
            ILoggingService logger = Container.Resolve<ILoggingService>():
            _presenter.Logger = logger;

This type of code a Worst Practice in my opinion. It means that the view is responsible for setting up the presenter, that is a big No! right there.

He gives the example of WCSF & Object Builder way of doing it, but I don't think that this is a good approach:

public partial class AddCustomer : Page, IAddCustomer
    private AddCustomerPresenter _presenter;

    public AddCustomerPresenter Presenter
            this._presenter = value;
            this._presenter.View = this;
    // ...

The problems that I have with this approach is that the view suddenly makes assumptions about the life cycle of the controller, which is not something that I want it to do. I may want a controller per conversation, for instance, and then where would I be? Another issue is that the view is responsible for injecting itself to presenter, which is not something that I would like to see there as well.

Here is how I do it with Rhino.Igloo:

public partial class AddCustomer : BasePage, IAddCustomer
    private AddCustomerPresenter _presenter;

    public AddCustomerPresenter Presenter
            this._presenter = value;
    // ...

The BijectionFacility will notice that we have a settable property of type that inherit from BaseController, and will get it from the container and inject that in. I don't believe in explicit Controller->View communication, but assuming that I needed that, it would be very easy to inject that into the presenter. Very easy as in adding three lines of code to ComponentRepository's InjectControllers method:

PropertyInfo view = controller.GetType().GetProperty("View");
	view.SetValue(controller, instance);

Rhino Commons, Repository and Unit Of Work

Rhino Commons is a great collection of stuff that I gathered along the way, but never documented. There is a sample application (https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/SampleApplications/Exesto), but not much more.  I want to spend a few minutes talking about the way the data access part of it works. This is post about how it works, not how to make it work (in other words, very little code here).

Before I start, I want to mentions that Rhino Commons is (highly) opinionated software, unlike Castle or NHibernate. It is a separate place where I take what Castle & NHibernate gives me, add a mix of my own best practices and let it run.

The data access part in Rhino Commons revolves around the Unit Of Work, Unit Of Work Factory and the Unit Of Work Application. The main abstraction that Rhino Commons provides in terms on data access is the IRepository<T> interface, which is accessible via the static Repository<T> accessor class. The Unit Of Work class and the IRepository<T> works together to simplify data access code in most cases.

It started as a set of wrapper methods and sort of grew from there. I find that this is very useful for intent revealing code when used in conjunction with the NHibernate Query Generator. Another useful tidbit is that it also serve to handle the differences between NHibernate & Active Record models fairly transparently this allows me to work against the NHibernate model (me likey) without having to define any XML (me likey more!) :-)


As you can see, the IRepository<T> is serving as a way to query NHibernate very easily. In a DDD environment I'll probably inherit from it and add additional methods to it, like CustomersThatTheUserIsAllowedToView(User user), etc.

Another thing that the use of the Repository gives me is the ability to do cross cutting concerns with queries, things like With.Caching, With.Transaction (although I prefer the Automatic Transaction Management Facility more), etc. It is important to note that the default Flush Mode for the session using this approach is Commit only, so Transactions play an important role here.

After the IRepository<T>, we have the Unit Or Work itself, which is basically responsible to manage the NHibernate session / Active Record Scope. Unit Of Work Factory is used to initialize NHibernate / Active Record and to create new Units Of Work.


Note that while you can create Unit Of Work using the IUnitOfWorkFactory, you query it using the Repository. The idea is that most of the time, you are only dealing with the Repository, and dealing with the Unit Of Work is left to a higher level code. I am a big believer in context being king, and this is one case of many where I am using this approach.

If the management of the Unit Of Work is relegated to a higher level code, who is responsible for managing it?

That is the job of the UnitOfWorkApplication, which handles the session per request pattern. This is an HttpApplication rather than the usual Http Module since HttpApplication.Application_start is guaranteed to run once and only once, while Http Modules can be created/disposed based on load.


Notice that the UnitOfWorkApplication is also responsible to create the container, after which it is available to the rest of the application.

Rhino Mocks 2.9.6 Released

Hi, over a month without a release for Rhino Mocks is a Big Thing, I think :-)

Anyway, this is a fix to a problem with the documentation message, which only work on the default expectation (because I was lazy when I wrote this?). The basic issue was that this didn't give the correct message:

IAppLock mockAppLock = this.MockFactory.CreateMock<IAppLock>();
//rig up Dispose() method
   .Message("IAppLock should be disposed.");

Now it will report to you both the fact that you didn't call Dispose, as well as why you should have called dispose, which is a lot more interesting, IMO.

As usual, source and binaries are here.

Rhino Mocks 2.9.5: Extending Rhino Mocks...

After getting a request for implementing a feature that I don't want to have in Rhino Mocks (very narrow scenario, very confusing semantics), I decided to follow my own advice and open up the API.

In this case, it involved making MockRepository.CreateMockObject() and the MockRepository.CreateMockState delegate protected. Now, users of the library can extend it with their own recorders and replayers.

Of course, using this require a bit of understanding of how Rhino Mocks Internals work. Basically, when Rhino Mocks is returning a type, its state is set to RecordMockState. This means that any call to the mock object will be recorded for later time. When Replay or ReplayAll are called, the mock state is queried for its next state.

This architecture allows plugging in additional mock state, and cutomize behaviors as much as you wish. This was what enabled me to add dynamic mocks, and later on delegate mocks and partial mocks. Now this is opened for you, to do as you please. In 95% of the cases, I don't think that you will need this, but it is good to have for the 5% chance.

You can see an example of this extensability here.

I also fixed BackToRecordAll(), making sure that it would erase all state from the mocked object, including when it is on the recording phase.

A more radical change is that now Rhino Mocks will now throw when you try to verify whose expectations failed at some point. This is done to aid valid mocking when the code under test need to handle all exception cases, and so catch the ExpectationViolationException. I lost the mail that request this feature, so I am using this as a way to inform whom ever it was :-)

As usual, code and source are here.

Happy Mocking...

Rhino Tools Worth $1,000,000 ?!

I registered Rhino Tools with Ohloh (which, in Hebrew, means "On no!"), you can find the project site here. Ohloh is a project assesment site, you point it at a repository, and it does its thinks and push a nice graph in the end.

Appernatly, it would take 18 years and nearly million dollars to create Rhino Tools. I'm also very surprised by the amount of code that it found, 75 thousands lines of code?

(Image from clipboard).png

I told it to estimate Castle, I wonder what it would be...

The New Repository: Rhino Tools

Following the advice from the comments, I have managed to consolidate my various OSS SVN repositories into a single one, hosted at source forge.

I was quite suprised that it was so much, to tell you the truth.

The repository URL is:


To checkout, execute:

There is no password necceary for reads. I'll be update the various links on the site to point to the correct location soon.

PluralizingNamingStrategy for NHibenrate

One of the nicest parts of developing with NHibernate is that you can get NHibernate to generate the table from the mapping. This is extremely useful during development, when changes to the domain model and the data model are fairly common. This is even more important when you want to try something on a different database. (For instnace, you may want to use a SQLite database during testing / development, and SQL Server for production, etc). I intend to post soon about unit testing applications that uses NHibernate, and this functionality is a extremely important in those cases.

This functionality has just one issue, unless you explicitly specify the table name, it uses the name of the class as the name of the table, this tends to give me ticks. Mostly because I am used to thinking about tables in plurals. A table named Employee is an anatema, a table named Employees is all right.

Take a look at this simple model:

(Image from clipboard).png

Creating the mapping for it is not hard, just tedious at times. Creating the mapping and then creating the tables is just boring, and the default NHibernate naming rules are not acceptable for me. I decided to take a leaf from Ruby On Rail (actually, I robbed the MonoRail generator for the source code, and converted it from Boo to Ruby) and create an Inflector and a pluralizing naming strategy. Now, when I generate the table structure from the mapping, I get the following database structure:

(Image from clipboard).png

Now this is much nicer.