Ayende @ Rahien


Ayende’s Razor

This is a response to a comment on another post:

Oren, in all seriousness, I thought that problems that were "(a) complex, (b) hard to understand (c) hard to optimize" were the kinds that folks like you and I get paid to solve...

Given two solutions that match the requirements of the problem, the simpler one is the better one.

Your queries are going to fail in production

Reading Davy Brion’s post There Is No Excuse For Failing Queries In Production had me twitching slightly. The premise is that since you have tests for your queries, they are not going to fail unless some idiot goes to the database and messes things up, and when that happens, you are going to trip a circuit breaker and fail fast. Davy’s post is a response to a comment thread on another post:

Bertrand Le Roy: Would it make sense for the query to implement IDisposable so that you can put it in a using block? I would be a little concerned about the query failing and never getting disposed of.
Davy Brion: well, if a query fails the underlying SqlCommand object will eventually be garbage collected so i wouldn’t really worry about that. Failed queries shouldn’t happen frequently anyway IMO because they typically are the result of incorrect code.

The problem with Davy’s reasoning is that I can think of several very easy ways to get queries to fail in production even if you go through testing and put your DB under lock & key. Probably the easiest problems to envision are timeouts and transaction deadlocks. Imagine the following scenario: your application is running in production, and the load is big enough (and the request pattern is such) that transaction deadlocks begin to happen. Deadlocks in the DB are not errors. Well, not in the usual sense of the word. They are transient and should be dealt with separately.

Using Davy’s approach, we would either have the circuit breaker tripping because of a perfectly normal occurrence or we may have a connection leak because we are assuming that all failures are either catastrophic or avoidable.
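
To make that concrete, here is a minimal sketch (not from Davy’s code, and the class name is mine) of treating a deadlock as the transient error it is: retry the unit of work a few times when SQL Server reports error 1205, instead of counting it against a circuit breaker.

using System;
using System.Data.SqlClient;

public static class DeadlockRetry
{
    public static void Execute(Action unitOfWork)
    {
        const int maxRetries = 3;
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                unitOfWork();
                return;
            }
            catch (SqlException ex)
            {
                // 1205 is SQL Server's "chosen as deadlock victim" error: transient, so retry
                if (ex.Number != 1205 || attempt >= maxRetries)
                    throw;
            }
        }
    }
}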

How I found a memory leak

I got a report of a memory leak when using Rhino Queues; after a bit of back and forth just to ensure that it wasn’t the user’s fault, I looked into it. Sadly, I was able to reproduce the error on my machine (I say sadly, because it proved the problem was with my code). This post shows the analysis phase of tracking down this leak.

That meant that all I had to do was find it. The problem with memory leaks is that they are so insanely hard to track down, since by the time you see their results, the actual cause is long gone.

So I fired up my trusty dotTrace and let the application run for a while with the following settings:

[image: dotTrace profiling settings]

Then I dumped the results and looked at anything suspicious:

[image: dotTrace snapshot overview]

And look what I found: obviously I am holding buffers for too long. Maybe I am pinning them by making async network calls?

Let us dig a bit deeper and look only at arrays of bytes:

[image: dotTrace view filtered to byte arrays]

I then asked dotTrace to look at who is holding those buffers.

[image: dotTrace showing who holds the buffers]

And that is interesting: all those buffers are actually held by a garbage collector handle. But I had no idea what that was. Googling is always a good idea when you are lost, and it brought me to the GCHandle structure. But who is calling this? I certainly am not doing that. GCHandle is only useful when you want to talk to unmanaged code. My suspicion that this is something related to the network stack seems to be confirmed.

Let us take a look at the actual objects, and here I got a true surprise.

[image: the suspicious buffers, all 4 bytes in size]

4 bytes?! That isn’t something that I would usually pass to a network stream. My theory about bad networking code seems to be fragile all of a sudden.

Okay, what else can we dig out of here? Someone is creating a lot of 4 byte buffers. But that is all I know so far. dotTrace has a great feature for tracking such things, tracking the actual allocation stack, so I moved back to the roots window and looked at that.

[image: dotTrace allocation stack trace for the buffers]

What do you know, it looks like we have a smoking gun here. But it isn’t where I expected it to be. At this point, I left dotTrace (I didn’t have the appropriate PDB for the version I was using for it to dig into it) and went to Reflector.

[image: Reflector view of the calling code]

RetrieveColumnAsString and RetrieveColumnAsUInt32 are the first things that make me think of buffer allocation, so I checked them first.

[image: RetrieveColumnAsString and RetrieveColumnAsUInt32 in Reflector]

Still following the breadcrumb trail, I found:

[image: the next method down the call chain]

Which leads to:

[image: the method containing the GCHandle.Alloc() call]

And here we have a call to GCHandle.Alloc() without a corresponding Free().

We have found the leak.
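
For reference, this is the general shape of the problem and its fix. This is an illustrative sketch, not the actual Esent Interop code: a pinned GCHandle keeps the buffer alive until Free() is called, so an Alloc() without a matching Free() leaks both the handle and the buffer it pins.

using System;
using System.Runtime.InteropServices;

public static class PinnedBufferSample
{
    public static void PassToUnmanagedCode(byte[] buffer)
    {
        // Pin the buffer so the GC cannot move it while unmanaged code holds a pointer to it
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr pointer = handle.AddrOfPinnedObject();
            // ... hand 'pointer' to the unmanaged API here ...
        }
        finally
        {
            handle.Free(); // the call that was missing in the leaking version
        }
    }
}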

Once I did that, of course, I headed to the Esent project to find that the problem has long been solved and that the solution is simply to update my version of Esent Interop.

It makes for a good story, at the very least :-)

Avoid Soft Deletes

One of the annoyances that we have to deal with when building enterprise applications is the requirement that no data shall be lost. The usual response to that is to introduce a WasDeleted or an IsActive column in the database and implement deletes as an update that would set that flag.

Simple, easy to understand, quick to implement and explain.
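
For illustration, a minimal sketch of that approach with plain ADO.NET (the table and column names are made up): the delete becomes an update that flips the flag.

using System.Data;

public static class SoftDelete
{
    public static void DeleteOrder(IDbConnection connection, int orderId)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            // instead of: DELETE FROM Orders WHERE Id = @id
            command.CommandText = "UPDATE Orders SET IsDeleted = 1 WHERE Id = @id";
            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = "@id";
            parameter.Value = orderId;
            command.Parameters.Add(parameter);
            command.ExecuteNonQuery();
        }
    }
}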

It is also, quite often, wrong.

The problem is that deletion of a row or an entity is rarely a simple event. It affects not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. Let us consider the following model:

public class Order
{
    public virtual ICollection<OrderLine> OrderLines { get;set; }
}

public class OrderLine
{
    public virtual Order Order { get;set; }
}

public class Customer
{
    public virtual Order LastOrder { get;set; }
}

Let us say that we want to delete an order. What should we do? That is a business decision, actually. But it is one that is enforced by the DB itself, keeping the data integrity.

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

Now, to be fair, if you are using NHibernate there are very easy ways to handle that, which actually handle cascades properly. It is still a more complex solution.

For myself, I would much rather drop the requirement for soft deletes in the first place and head directly to why we care about that. Usually, this is an issue of auditing. You can’t remove data from the database once it is there, or the regulator will have your head on a silver platter and the price of the platter will be deducted from your salary.

The fun part is that it is so much easier to implement that with NHibernate. For that matter, I am not going to show you how to implement that feature, I am going to show how to implement a related one and leave building the auditable deletes as an exercise for the reader :-)

Here is how we implement audit trail for updates:

public class SeparateTableLogEventListener : IPreUpdateEventListener
{
    public bool OnPreUpdate(PreUpdateEvent updateEvent)
    {
        var sb = new StringBuilder("Updating ")
            .Append(updateEvent.Entity.GetType().FullName)
            .Append("#")
            .Append(updateEvent.Id)
            .AppendLine();

        for (int i = 0; i < updateEvent.OldState.Length; i++)
        {
            if (Equals(updateEvent.OldState[i], updateEvent.State[i])) continue;

            sb.Append(updateEvent.Persister.PropertyNames[i])
                .Append(": ")
                .Append(updateEvent.OldState[i])
                .Append(" => ")
                .Append(updateEvent.State[i])
                .AppendLine();
        }

        var session = updateEvent.Session.GetSession(EntityMode.Poco);
        session.Save(new LogEntry
        {
            WhatChanged = sb.ToString()
        });
        session.Flush();

        return false;
    }
}

It is an example, so it is pretty raw, but it should make clear what is going on.
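
For the listener to actually run, it needs to be registered with the configuration before the session factory is built. A minimal sketch, prepending it to whatever listeners are already configured:

using System.Linq;
using NHibernate;
using NHibernate.Cfg;
using NHibernate.Event;

public static class ListenerRegistration
{
    public static ISessionFactory BuildSessionFactory()
    {
        var configuration = new Configuration().Configure();
        // put our audit listener in front of any existing pre-update listeners
        configuration.EventListeners.PreUpdateEventListeners =
            new IPreUpdateEventListener[] { new SeparateTableLogEventListener() }
                .Concat(configuration.EventListeners.PreUpdateEventListeners)
                .ToArray();
        return configuration.BuildSessionFactory();
    }
}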

Your task, should you choose to accept it, is to build a similar audit listener for deletes. This message will self destruct whenever it feels like it.

On PSake

James Kovacs introduced psake (a PowerShell based build system) over a year ago, and at the time I gave it a glance and decided that it was interesting, but not worth further investigation.

This weekend, as I was restructuring my Rhino Tools project, I realized that I needed to touch the build system as well. The Rhino Tools build system has been through several projects, and was originally ported from Hibernate. It is NAnt based, complex, and can do just about everything that you want, except be easily understandable.

It became clear to me very quickly that it ain’t going to be easy to change the way it works, nor would it be easy to modify that to reflect the new structure. There are other issues with complex build systems, they tend to create zones of “there be dragons”, where only the initiated go, and even they go with trepidation. I decided to take advantage of the changes that I am already making to get a simpler build system.

I had a couple of options open to me: Rake and Bake.

Bake seemed natural, until I remembered that no one had touched it in a year or two. Besides, I can only stretch NIH so far :-). And while I know that people rave about Rake, I did not want to introduce a Ruby dependency into my build system. I know that it was an annoyance when I had to build Fluent NHibernate.

One thing that I knew I was not willing to go back to was editing XML, so I started looking at other build systems, and ended up running into PSake.

There are a few interesting things that reading about it brought to mind. First, NAnt doesn’t cut it anymore. It can’t build WPF applications nor handle multi targeting well. Second, I am already managing the compilation part of the build using MSBuild, thanks to Visual Studio.

That leaves the build system with executing msbuild, setting up directories, executing tests, running post build tools, etc.

PSake handles those well, since the execution environment is the command line. The syntax is nice, just enough to specify tasks and dependencies, but everything else is just pure command line. The following is the Rhino Mocks build script, using PSake:

properties { 
  $base_dir  = resolve-path .
  $lib_dir = "$base_dir\SharedLibs"
  $build_dir = "$base_dir\build" 
  $buildartifacts_dir = "$build_dir\" 
  $sln_file = "$base_dir\Rhino.Mocks-vs2008.sln" 
  $version = "3.6.0.0"
  $tools_dir = "$base_dir\Tools"
  $release_dir = "$base_dir\Release"
} 

task default -depends Release

task Clean { 
  remove-item -force -recurse $buildartifacts_dir -ErrorAction SilentlyContinue 
  remove-item -force -recurse $release_dir -ErrorAction SilentlyContinue 
} 

task Init -depends Clean { 
    . .\psake_ext.ps1
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks $version" `
        -version $version `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
        
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks.Tests\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks Tests $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks Tests $version" `
        -version $version `
        -clsCompliant "false" `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
        
    Generate-Assembly-Info `
        -file "$base_dir\Rhino.Mocks.Tests.Model\Properties\AssemblyInfo.cs" `
        -title "Rhino Mocks Tests Model $version" `
        -description "Mocking Framework for .NET" `
        -company "Hibernating Rhinos" `
        -product "Rhino Mocks Tests Model $version" `
        -version $version `
        -clsCompliant "false" `
        -copyright "Hibernating Rhinos & Ayende Rahien 2004 - 2009"
        
    new-item $release_dir -itemType directory 
    new-item $buildartifacts_dir -itemType directory 
    cp $tools_dir\MbUnit\*.* $build_dir
} 

task Compile -depends Init { 
  exec msbuild "/p:OutDir=""$buildartifacts_dir "" $sln_file"
} 

task Test -depends Compile {
  $old = pwd
  cd $build_dir
  exec ".\MbUnit.Cons.exe" "$build_dir\Rhino.Mocks.Tests.dll"
  cd $old        
}

task Merge {
    $old = pwd
    cd $build_dir
    
    Remove-Item Rhino.Mocks.Partial.dll -ErrorAction SilentlyContinue 
    Rename-Item $build_dir\Rhino.Mocks.dll Rhino.Mocks.Partial.dll
    
    & $tools_dir\ILMerge.exe Rhino.Mocks.Partial.dll `
        Castle.DynamicProxy2.dll `
        Castle.Core.dll `
        /out:Rhino.Mocks.dll `
        /t:library `
        "/keyfile:$base_dir\ayende-open-source.snk" `
        "/internalize:$base_dir\ilmerge.exclude"
    if ($lastExitCode -ne 0) {
        throw "Error: Failed to merge assemblies!"
    }
    cd $old
}

task Release -depends Test, Merge {
    & $tools_dir\zip.exe -9 -A -j `
        $release_dir\Rhino.Mocks.zip `
        $build_dir\Rhino.Mocks.dll `
        $build_dir\Rhino.Mocks.xml `
        license.txt `
        acknowledgements.txt
    if ($lastExitCode -ne 0) {
        throw "Error: Failed to execute ZIP command"
    }
}

It is about 50 lines, all told, with a lot of spaces and is quite readable.

This handles the same tasks as the old set of scripts did, and it does this without undue complexity. I like it.

The complexity of unity

This post is about the Rhino Tools project. It has been running for a long time now, over 5 years, and has amassed quite a few projects in it.

I really like the codebase in the projects in Rhino Tools, but secondary aspects have been creeping in that made managing the project harder. In particular, putting all the projects in a single repository made things easy, far too easy: projects had an easy time taking dependencies that they shouldn’t, and the entire build process was… complex, to say the least.

I have been somewhat unhappily tolerant of this so far because, while it was annoying, it didn’t actively create problems for me. The problems started creeping in when I wanted to move Rhino Tools to use NHibernate 2.1. That is when I realized that this was going to be a very painful process, since I would have to take on the entire Rhino Tools set of projects in one go, instead of dealing with each of them independently. The fact that so many of the dependencies were on Rhino Commons, for which I have a profound dislike, only increased my frustration.

There are other things that I find annoying now. Rhino Security is a general purpose library for NHibernate, but it makes a lot of assumptions about how it is going to be used, which are wrong. Rhino ETL had a dependency on Rhino Commons because of three classes.

To resolve that, I decided to make a few other changes; taking dependencies is supposed to be a hard process, it is supposed to make you think.

I have been working on splitting the Rhino Tools project into its sub projects, so each of them is independent of all the others. That increases the effort of managing all of them as a unit, but decreases the effort of managing them independently.

The current goals are to:

  • Make it simpler to treat each project independently
  • Make it easier to deal with the management of each project (dependencies, build scripts)

There is a side line in which I am also learning to use Git, and there is a high likelihood that the separate Rhino Tools projects will move to GitHub. Subversion’s patching & tracking capabilities annoyed me for the very last time about a week ago.

How do you manage to blog so much?

In a recent email thread, I was asked:

How come that you manage to post and comment that much? I bet you spend really loads of time on your blog.

The reason why I ask is because roughly a month ago I've decided to roll out my own programming blog. I've made three or four posts there
(in 5 days) and abandoned the idea because it was consuming waaaaay too much time (like 2-3 hours per post)

The only problem is time. If I'm gonna to post that much every day (and eventually also answer to comments), it seems that my effective working time would be cut by 3+ hours. Daily.  So here comes my original question. How come that you manage to post and comment that much?

And here is my answer:

I see a lot of people in a similar situation. I have been blogging for 6 years now (wow, how the time flies), and I have gone from a blog that had zero readers to one that has a respectable readership. There are only two things that I do that are in any way unique. First, my "does it fit to be blogged about?" bar is pretty low. If it is interesting (to me), it will probably go on the blog.

Second, I don't mind pushing a blog post that requires fixing later. It usually takes me ten minutes to put out a blog post, so I can literally have a thought, post it and move on, without really noticing it hurting my workflow. And the mere act of putting things in writing for others to read is a significant one; it allows me to look at things in writing in a way that is hard to do when they are just in my head.

It does take time, make no mistake about that. The 10 minute blog post is about 30% of my posts; in a lot of cases, it is something that I have to work on for half an hour. In some rare cases, it goes to an hour or two. There have been several dozen posts that took days. But while it started out as a hobby, it has become part of my work now. The blog is my marketing effort, so to speak. And it is an effective one.

Right now, I have set it up so that about once a week I spend four or five hours crunching out blog posts and future-posting them. Afterward, I can blog whenever I feel like it, and that takes a lot of the pressure off. It helps that I try hard to schedule most of them day after day. So I get a lot of breathing room, but there is new content every day.

That doesn’t mean that I actually blog once a week, though. I push stuff to the blog all the time, but it is usually short notes, not posts that take a lot of time.

As for comments, take a look at my commenting style: I generally comment only when I actually have something to add to the conversation, and I very rarely have long comments (if I do, they turn into posts :-) ).

It also doesn’t take much time to reply to most of them, and it creates a feedback cycle that means more people are participating in and reading the blog. It is rare that I post a topic that really stirs people up and that I feel obligated to respond to all/most of the comments. That does bother me, because it takes too much time. In those cases, I’ll generally close the comment thread with a note about that.

One final thought, the time I spend blogging is not wasted. It is well spent. Because it is an investment in reputation, respectability and familiarization.

A guide into OR/M implementation challenges: Custom Queries

Continuing to shadow Davy’s series about building your own DAL, this post is about executing custom queries.

The post does a good job of showing how you can create a good API on top of what is pretty much raw SQL. If you do have to create your own DAL, or need to use SQL frequently, please use something like that rather than using ADO.NET directly.

ADO.NET is not a data access library, it is the building blocks you use to build a data access library.

A few comments on queries in NHibernate. NHibernate uses a more complex model for queries, obviously. We have a set of abstractions between the query and the generated SQL, because we are supporting several query options and a lot more mapping options. But while a large amount of effort goes into translating the user’s desires into SQL, there is an almost equivalent amount of work going into hydrating the queries. Davy’s DAL supports only a very flat model, but with NHibernate, you can specify eager load options, custom DTOs or just plain value queries.

In order to handle those scenarios, NHibernate tracks the intent behind each column, and knows whether a column set represents an entity, an association, a collection or a value. That goes through a fairly complex process that I’ll not describe here, but once the first stage hydration process is done, NHibernate has a second stage available. This is why you can write queries such as “select new CustomerHealthIndicator(c.Age,c.SickDays.size()) from Customer c”.

The first stage is recognizing that we aren’t loading an entity (just fields of an entity), the second is passing them to the CustomerHealthIndicator constructor. You can actually take advantage of the second stage yourself; it is called a result transformer, and you can provide your own and set it on a query using SetResultTransformer(…).
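
As a quick sketch of providing your own second stage (the CustomerHealthIndicator class and its constructor are assumed here, they are not part of NHibernate):

using System.Collections;
using NHibernate.Transform;

public class CustomerHealthIndicatorTransformer : IResultTransformer
{
    public object TransformTuple(object[] tuple, string[] aliases)
    {
        // tuple[0] is c.Age and tuple[1] is c.SickDays.size() in the query below
        return new CustomerHealthIndicator((int)tuple[0], (int)tuple[1]);
    }

    public IList TransformList(IList collection)
    {
        return collection; // nothing to do for the list as a whole
    }
}

// Usage:
// var indicators = session
//     .CreateQuery("select c.Age, c.SickDays.size() from Customer c")
//     .SetResultTransformer(new CustomerHealthIndicatorTransformer())
//     .List<CustomerHealthIndicator>();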


A guide into OR/M implementation challenges: Lazy loading

Continuing to shadow Davy’s series about building your own DAL, this post is about lazy loading. Davy’s lazy loading support is something that I had great fun reading; it is simple, elegant and beautiful to read.

I don’t have much to say about the actual implementation; NHibernate does things in much the same way (we can quibble about minor details such as who holds the identifier and other stuff, but they aren’t really important). A major difference between Davy’s lazy loading approach and NHibernate’s is that Davy doesn’t support inheritance. Inheritance and lazy loading play some nasty games with NHibernate’s implementation of lazy loaded entities.

While Davy can get away with loading the values into the same proxy object that he is using, NHibernate must load them into a separate object. Why is that? Let us say that we have a many to one association from AnimalLover to Animal. That association is lazy loaded, so we put an AnimalProxy (which inherits from Animal) as the value of the Animal property. Now, when we want to load the Animal property, NHibernate has to load the entity, and at that point it discovers that the entity isn’t actually an Animal, but a Dog.

Obviously, you cannot load a Dog into an AnimalProxy. What NHibernate does in this case is load the entity into another object (a Dog instance) and, once that instance is loaded, direct all method calls to the new instance. It sounds complicated, but it is actually quite elegant and transparent from the user’s perspective.
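
A hand-rolled sketch of that idea (this is not NHibernate’s generated proxy code, just the shape of it): the proxy loads the real instance on first use, which may well be a Dog, and forwards every call to it.

using System;

public class Animal
{
    public virtual string MakeSound() { return "..."; }
}

public class Dog : Animal
{
    public override string MakeSound() { return "Woof"; }
}

public class AnimalProxy : Animal
{
    private readonly object id;
    private readonly Func<object, Animal> loader; // stands in for the session's load logic
    private Animal target;

    public AnimalProxy(object id, Func<object, Animal> loader)
    {
        this.id = id;
        this.loader = loader;
    }

    private Animal Target
    {
        // the loader may hand back a Dog, which could never be loaded "into" this proxy
        get { return target ?? (target = loader(id)); }
    }

    public override string MakeSound()
    {
        return Target.MakeSound(); // every member forwards to the real, loaded instance
    }
}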

NHibernate tips & tricks: Efficiently selecting a tree

I ran into the following in a code base that is currently being converted to use NHibernate from a hand rolled data access layer.

private void GetHierarchyRecursive(int empId, List<Employee> emps) {
    List<Employee> subEmps = GetEmployeesManagedBy(empId);
    foreach (Employee c in subEmps) {
        emps.Add(c);
        GetHierarchyRecursive(c.EmployeeID, c.ManagedEmployees);
    }
}

GetHierarchyRecursive is a method that hits the database. In my book, a method that calls the database in a loop is guilty of a bug until proven otherwise (and even then I’ll look at it funny).

When the code was ported to NHibernate, the question of how to implement this came up, and I wanted to avoid having the same pattern repeat itself. The fun part with NHibernate is that it makes such things so easy.

session.CreateQuery(
		"select e from Employee e join fetch e.ManagedEmployees"
	)
.SetResultTransformer(new DistinctRootEntityResultTransformer())
.List<Employee>();

This will load the entire hierarchy in a single query. Moreover, it will build the organization tree correctly, so now you can traverse the entire graph without hitting empty spots or causing lazy loading.

WPF tripwires, beware of uncommon culture infos

I got a few bug reports about NH Prof giving an error that looks somewhat like this one:

System.Windows.Markup.XamlParseException: Cannot convert string '0.5,1' in attribute 'EndPoint' to object of type 'System.Windows.Point'. Input string was not in a correct format. 

It took a while to figure out exactly what was going on, but I finally was able to track it down to this hotfix (note that this hotfix only takes care of the list separator, while the problem exists for the decimal separator as well). Since I can’t really install the hotfix for all the users of NH Prof, I was left with having to work around it.

I whimpered a bit when I wrote this, but it works:

private static void EnsureThatTheCultureInfoIsValidForXamlParsing()
{
	var numberFormat = CultureInfo.CurrentCulture.NumberFormat;
	if (numberFormat.NumberDecimalSeparator == "." && 
		numberFormat.NumberGroupSeparator == ",") 
		return;
	Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("en-US");
	Thread.CurrentThread.CurrentUICulture = CultureInfo.GetCultureInfo("en-US");
}

I wonder when I’ll get the bug report about NH Prof not respecting the user’s UI culture…

NH Prof: The ten minutes feature

15:51 - I have about ten more minutes before starting a presentation, and I thought that I might as well make use of the time and do some work on NH Prof.

The feature is filtering of sessions by URL, and I don’t expect it to be very hard.

16:01 - Manual testing is successful, writing a test for it

16:02 – Test passed, ready to commit, but don’t have network connection to do so.

The new feature is integrated into the application, in the UI, filtering appropriately, the works:

[image: NH Prof filtering sessions by URL]

Just for fun, I did the feature with the projector on, in front of the waiting crowd. I love NH Prof’s architecture.

A guide into OR/M implementation challenges: The Session Level Cache

Continuing to shadow Davy’s series about building your own DAL, this post is about the Session Level Cache.

The session cache (first level cache in NHibernate terms) exists to support one main scenario: in a single session, a row in the database is represented by a single instance. That means that the session needs to track all the instances that it loads and be able to search through them. Davy does a good job covering how it is used, and the implementation is quite similar to the way it is done in NHibernate.
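
A minimal sketch of such a session level cache (an identity map) could look like the following; the member names mirror the ones mentioned in the post, but the code is illustrative rather than Davy’s actual implementation:

using System;
using System.Collections.Generic;

public class SessionLevelCache
{
    // one dictionary of instances (keyed by id) per entity type
    private readonly Dictionary<Type, Dictionary<object, object>> cache =
        new Dictionary<Type, Dictionary<object, object>>();

    public object TryToFind(Type entityType, object id)
    {
        Dictionary<object, object> instances;
        object instance;
        if (cache.TryGetValue(entityType, out instances) &&
            instances.TryGetValue(id, out instance))
            return instance;
        return null;
    }

    public void Store(Type entityType, object id, object instance)
    {
        Dictionary<object, object> instances;
        if (cache.TryGetValue(entityType, out instances) == false)
            cache[entityType] = instances = new Dictionary<object, object>();
        instances[id] = instance;
    }

    public void RemoveAllInstancesOf(Type entityType)
    {
        cache.Remove(entityType);
    }

    public void ClearAll()
    {
        cache.Clear();
    }
}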

Davy’s implementation is to use a nested dictionary to hold instances per entity type. This is done mainly to support RemoveAllInstancesOf<TEntity>(), a method that is unique to Davy’s DAL. The reasoning for that method is interesting:

When a user executes a custom DELETE statement, there is no way for us to know which entities were actually removed from the database. But if any of those deleted entities happen to remain in the SessionLevelCache, this could lead to buggy application code whenever a piece of code tries to retrieve a specific entity which has already been removed from the database, but is still present in the SessionLevelCache. In order to deal with this scenario, the SessionLevelCache has a ClearAll and a RemoveAllInstancesOf method which you can use from your application code to either clear the entire SessionLevelCache, or to remove all instances of a specific entity type from the cache.

Personally, I think this is using an ICBM to crack eggshells. But I am probably being unfair. NHibernate has much the same issue: if you issue a delete via SQL or HQL queries, NHibernate doesn’t have a way to track what was actually deleted and deal with it. With NHibernate, it doesn’t tend to be a problem for the session level cache, mostly because of usage habits more than anything else. Sessions that are used that way rarely have to deal with entities they have loaded being deleted by the query issued (and if they do, the user needs to handle that by calling Evict() on the affected objects manually). NHibernate doesn’t try to support this scenario explicitly for the session cache. It does support this very feature for the second level cache.

It make sense, though. With NHibernate, in the vast majority of cases deleting is going to be done using NHibernate itself, rather than a special query. With Davy’s DAL, the usage of SQL queries for deletes is going to be much higher.

Another interesting point in Davy’s post is the handling of queries:

When a custom query is executed, or when all instances are retrieved, there is no way for us to exclude already cached entity instances from the result of the query. Well, theoretically speaking you could attempt to do this by adding a clause to the WHERE statement of each query that would prevent cached entities from being loaded. But then you might have to add the cached entity instances to the resulting list of entities anyways if they would otherwise satisfy the other query conditions. Obviously, trying to get this right is simply put insane and i don’t think there’s any DAL or ORM that actually does this (even if there was, i can’t really imagine any of them getting this right in every corner case that will pop up).

So a good compromise is to simply check for the existence of a specific instance in the cache before hydrating a new instance. If it is there, we return it from the cache and we skip the hydration for that database record. In this way, we avoid having to modify the original query, and while we could potentially return a few records that we already have in memory, at least we will be sure that our users will always have the same reference for any particular database record.

This is more or less how NHibernate operates, and for much the same reasoning. But there is a small twist. In order to ensure query coherency between the database queries and the in memory entities, NHibernate will optionally try to flush all the items in the session level cache that have been changed and may be affected by the query. A more detailed description of this can be found here. Davy’s DAL doesn’t do automatic change tracking, so this is not a feature that can be easily added without that prerequisite.
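
A small sketch of what that looks like in practice (the Customer entity and its Name property are assumed, they are not part of NHibernate): with the default FlushMode.Auto, NHibernate flushes pending in-memory changes that might affect a query before executing it.

using NHibernate;

public static class QueryCoherencySample
{
    public static void Run(ISessionFactory sessionFactory)
    {
        using (ISession session = sessionFactory.OpenSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            session.FlushMode = FlushMode.Auto; // the default, shown here for clarity

            var customer = session.Get<Customer>(1);
            customer.Name = "New name";

            // the pending UPDATE is flushed before this query runs, so the result
            // is coherent with the in-memory change
            var matches = session
                .CreateQuery("from Customer c where c.Name = :name")
                .SetString("name", "New name")
                .List<Customer>();

            tx.Commit();
        }
    }
}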


Progressive.NET event in Stockholm

I think that it is better to forget to blog about an event than to forget to show up for the event, so I am improving.

I am currently in Stockholm, Sweden, for the Progressive.NET event that starts tomorrow. I am going to do an Intro to NHibernate and two runs of my Advanced NHibernate workshop.

Should be fun :-)

A guide into OR/M implementation challenges: Hydrating Entities

Continuing to shadow Davy’s series about building your own DAL, this post is about hydrating entities.

Hydrating entities is the process of taking a row from the database and turning it into an entity, while de-hydrating is the reverse process, taking an entity and turning it into a flat set of values to be inserted/updated.

Here, again, Davy has chosen to parallel NHibernate’s method of doing so, which allows us to take a look at a very simplified version and see what the advantages of this approach are.

First, we can see how the session level cache is implemented, with the check being done directly in the entity hydration process. Davy has some discussion about the various options that you can choose at that point: whether to just use the values from the session cache, to update the entity with the new values, or to throw if there is a change conflict.

NHibernate’s decision at this point was to assume that the entity that we have is correct, and to ignore any changes made to the database in the meantime. That turns out to be a good approach, because any optimistic concurrency checks that we might want will run when we commit the transaction, so there isn’t much difference from the result perspective, but it does simplify the behavior of NHibernate.

Next, there is the treatment of reference properties, what NHibernate calls many-to-one associations. Here is the relevant code (edited slightly so it fits the blog width):

private void SetReferenceProperties<TEntity>(
	TableInfo tableInfo, 
	TEntity entity, 
	IDictionary<string, object> values)
{
	foreach (var referenceInfo in tableInfo.References)
	{
		if (referenceInfo.PropertyInfo.CanWrite == false)
			continue;
		
		object foreignKeyValue = values[referenceInfo.Name];

		if (foreignKeyValue is DBNull)
		{
			referenceInfo.PropertyInfo.SetValue(entity, null, null);
			continue;
		}

		var referencedEntity = sessionLevelCache.TryToFind(
			referenceInfo.ReferenceType, foreignKeyValue);
			
		if(referencedEntity == null)
			referencedEntity = CreateProxy(tableInfo, referenceInfo, foreignKeyValue);
								   
		referenceInfo.PropertyInfo.SetValue(entity, referencedEntity, null);
	}
}

There are a lot of things going on here, so I’ll take them one at a time.

You can see how the uniquing process is going on. If we already have the referenced entity loaded, we will get it directly from the session cache, instead of creating a separate instance of it.

It also shows something that Davy promised to touch on in a separate post: lazy loading. I had an early look at his implementation and it is pretty, so I’ll skip that for now.

This piece of code also demonstrates something that is very interesting: the lazy loaded inheritance many to one association conundrum, which I’ll touch on in a future post.

There are a few other implications of the choice of hydrating entities in this fashion. For a start, we are working with detached entities this way; the entity doesn’t have to have a reference to the session (except to support lazy loading). It also means that our entities are pure POCOs; we handle it all completely externally to the entity itself.

It also means that if we would like to handle change tracking (which Davy’s DAL currently doesn’t do), we have a much more robust way of doing so, because we can simply dehydrate the entity and compare its current state to its original state. That is exactly how NHibernate does it. This turns out to be a far more robust approach, because it is safe in the face of methods modifying state internally, without going through properties or invoking change tracking logic.
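
A tiny sketch of the idea (the names are mine, not NHibernate’s internals): dehydrate the entity to a flat array of values and compare it against the snapshot taken at load time.

using System.Collections.Generic;

public static class DirtyChecking
{
    // Returns the indexes of the properties whose values changed since the entity was loaded.
    public static int[] FindDirtyProperties(object[] loadedState, object[] currentState)
    {
        var dirty = new List<int>();
        for (int i = 0; i < loadedState.Length; i++)
        {
            if (Equals(loadedState[i], currentState[i]) == false)
                dirty.Add(i);
        }
        return dirty.ToArray();
    }
}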

I also wanted to touch on a few things that make the NHibernate implementation of the same thing a bit more complex. NHibernate supports reflection optimization and multiple ways of actually setting the values on the entity; it also supports things like components and multi column properties, which means that there isn’t a neat ordering between properties and columns like the one that makes Davy’s code so orderly.


Concepts & Features: A concept cover the whole range, a feature is constrained

A while ago I mentioned the idea of Concepts and Features, I expounded it a bit more in the Feature by Feature post. Concepts and Features is the logical result of applying the Open Closed and Single Responsibility Principles. It boils down to a single requirement:

A feature creation may not involve any design activity.

Please read the original post for details about how this is actually handled, and the specific example about filtering with NH Prof.

But while I put constraints on what a feature is, I haven’t talked about what a concept is. Oh, I talked about it being the infrastructure, but not much more.

The point about a concept implementation is that it contains everything that a feature must do. To take the filtering support in NH Prof as an example, the concept is responsible for finding all available filters, creating and managing the UI, showing the filter definition to the user when a filter is active, saving and loading the filter definition when the application is closed/started, performing the actual filtering, etc. The concept is also responsible for defining the appropriate conventions for the features in this particular concept.

As you can see, a lot of work goes into building a concept. But that work pays off the first time that you can service a user request in a rapid manner. To take NH Prof again, for most things, I generally need about half an hour to put out a new feature within an existing concept.

Concepts & Features in NH Prof: Filtering

A while ago I mentioned the idea of Concepts and Features, I expounded it a bit more in the Feature by Feature post. Concepts and Features is the logical result of applying the Open Closed and Single Responsibility Principles. It boils down to a single requirement:

A feature creation may not involve any design activity.

Please read the original post for details about how this is actually handled. In this case, I wanted to show you how this works in practice with NH Prof. Filtering in NH Prof is a new concept, which allows you to limit what you are seeing in the UI based on some criteria.

The ground work for this concept includes the UI for showing the filters, changing the UI to support the actual idea of filtering, and other related stuff. I can already see that we would want to be able to serialize the filters to save them between sessions, but that is part of what the concept is. It is the overall idea.

But once we have the concept outlined, thinking about features related to it is very easy. Let us see how we can build a new filter for NH Prof, one that filters out all the cached statements.

I intentionally chose this filter because it doesn’t really have any options you need a UI for, which makes my task easier. Here is the UI, FilterByNotCachedView.xaml:

<UserControl
	xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
	xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
    <Grid>
		<TextBlock>Uncached statements</TextBlock>
    </Grid>
</UserControl>

And the filter implementation:

[DisplayName("Not Cached")]
public class FilterByNotCached: StatementFilterBase
{
	public override bool IsValid
	{
		get { return true; }
	}


	public override Func<IFilterableStatementSnapshot, bool> Process
	{
		get
		{
			return snapshot => snapshot.IsCached == false;
		}
	}
}

Not hard to figure out what this is doing, I think.

This is all the code required; NH Prof knows to pick it up from the appropriate places and make use of it. This is why, after adding these two files, I can now run NH Prof and get the following:

[image: NH Prof showing the new Not Cached filter]

I don’t think that I can emphasize enough how important this is to enable creating consistent and easy to handle solutions. Moreover, since most requests tend to be for features, rather than concepts, it is extremely easy to put up a new feature.

The end result is that I have a smart infrastructure layer, where I am actually implementing the concepts, and on top of it I am building the actual features. Just to give you an idea, NH Prof currently has just 6 concepts, and everything else is built on top of them.

A guide into OR/M implementation challenges: CRUD

Continuing the series about Davy’s build your own DAL, this post is talking about CRUD functionality.

One of the more annoying things about any DAL is dealing with the repetitive nature of talking to the data source. One of Davy’s stated goals in going this route is to totally eliminate that. CRUD functionality shouldn’t be something that you have to work on for each entity; it is just something that exists and that you get for free whenever you are using it.

I think he was very successful there, but I also want to talk about his method of doing so. Not surprisingly, Davy’s DAL takes a lot of concepts from NHibernate, simplifies them a bit and then applies them. His approach to handling CRUD operations is reminiscent of how NHibernate itself works.

He divided the actual operations into distinct classes, called DatabaseActions, and he has things like FindAllAction, InsertAction, GetByIdAction.
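
To make that concrete, here is a rough sketch of the shape of such a design (the class names follow the ones mentioned above, but the members are illustrative, not Davy’s actual code):

using System;
using System.Data;

public abstract class DatabaseAction
{
    protected readonly IDbConnection Connection;

    protected DatabaseAction(IDbConnection connection)
    {
        Connection = connection;
    }
}

public class GetByIdAction : DatabaseAction
{
    public GetByIdAction(IDbConnection connection) : base(connection) { }

    public TEntity Get<TEntity>(object id) where TEntity : class
    {
        // build the SELECT from the mapping metadata, execute it and hydrate TEntity
        throw new NotImplementedException("left out of the sketch");
    }
}

public class InsertAction : DatabaseAction
{
    public InsertAction(IDbConnection connection) : base(connection) { }

    public void Insert(object entity)
    {
        // dehydrate the entity into an INSERT statement and execute it
        throw new NotImplementedException("left out of the sketch");
    }
}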

This architecture gives you two major things: maintaining things is now far easier, and playing around with the action implementations is also far easier. It is just a short step from Davy’s database actions to NHibernate’s events & listeners approach.

This is also a nice demonstration of the Single Responsibility Principle. It makes maintaining the software much easier than it would be otherwise.


A guide into OR/M implementation challenges: Mapping

Continuing my shadowing of Davy’s Build Your Own DAL series, this post is shadowing Davy’s mapping post. 

Looking at the mapping, it is fairly clear that Davy has (wisely) made a lot of design decisions along the way that drastically reduce the scope of work he had to deal with. The mapping model is attribute based and essentially supports a simple one to one relation between the classes and the tables. This makes things far simpler to deal with internally.

The choice has been made to fix the following:

  • Only support SQL Server
  • Attributed model
  • Primary key is always:
    • Numeric
    • Identity
    • Single Key

Using those rules, Davy created a really slick implementation. About the only complaint that I can make about it is that he doesn’t support having a limit clause on selects.
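
As an illustration of what such an attributed, one-class-per-table model can look like (the attribute names here are made up, not Davy’s actual ones):

using System;

[AttributeUsage(AttributeTargets.Class)]
public class TableAttribute : Attribute
{
    public TableAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

[AttributeUsage(AttributeTargets.Property)]
public class ColumnAttribute : Attribute
{
    public ColumnAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

[Table("Products")]
public class Product
{
    [Column("Id")]
    public int Id { get; set; }       // numeric, identity, single primary key

    [Column("Name")]
    public string Name { get; set; }
}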

Take a look at the code he has; it should give you a good idea about what is involved in dealing with the mapping between objects and tables.

The really fun part about redoing things that we are familiar with is that we get to ignore all the other things that we don’t want to do which introduce complexity. Davy’s solution works for his scenario, but I want to expand a bit on the additional features that NHibernate has at that layer. It should give you some understanding of the task that NHibernate is solving.

  1. Inheritance: Davy’s DAL supports only Table Per Class inheritance. NHibernate supports 4 different inheritance models (+ mixed models that we won’t get into here).
  2. Eager loading: something that would add significantly to the complexity of the solution is the ability to load a Product with its Category. That requires that you be able to change the generated SQL dynamically, and more importantly, that you be able to read it correctly. That is far from simple.
  3. Properties that span multiple columns: it seems like a simple thing, but in actuality it affects just about every part of the mapping layer, since it means that all property to column conversion has to take multiple columns into account. It is not so much complex as it is annoying, especially since you have to either carry the column names or the column indexes all over the place.
  4. Collections: they are complex enough on their own (try to think about the effort involved in syncing changes in a collection in an efficient manner), but the part that really kills you is trying to do eager loading with them. Oh, and I forgot about one to many vs. many to many. And let us not get into the distinctions between different types of collections.
  5. Optimistic concurrency: this is actually a feature that would be relatively easy to add, I think. At least, if all you care about is a single type of versioning / optimistic concurrency. NHibernate supports several (detailed here).

I could probably go on, but I think the point was made. As I said before, the problem with taking on something like this is that it either takes a day to get the basic functionality going or several months to really get it into shape to handle more than a single scenario.

This series of posts should give you a chance to appreciate what is going on behind the scenes, because you’ll have Davy’s effort at the base functionality, and my comments on what is required to take it to the next level.

NHibernate rarities: SessionFactoryObjectFactory

I had a discussion about session factory management in NHibernate just now, and I was asked why NHibernate isn’t managing that internally.

Well… it does, actually. As you can guess, SessionFactoryObjectFactory is the one responsible for that, and you can use GetNamedInstance() to get a specific session factory.

It isn’t used much (as a matter of fact, I have never seen it used), and I am not quite sure why. I think that it is easier to just manage it ourselves. There are very rare cases where you would need more than one or two session factories, and when you run into them, you usually want to do other things as well, such as lazy session factory initialization and de-initialization.


A guide into OR/M implementation challenges: Reasoning

Davy Brion, one of NHibernate’s committers, is running a series of posts discussing building your own DAL. The reasons for doing so vary; in Davy’s case, he has clients who are adamantly against using anything but the naked CLR. I have run into similar situations before, sometimes it is institutional blindness, sometimes it is legal reasons, and sometimes there are actually good reasons for it, although that is rare.

Davy’s approach to the problem was quite sensible. Deprived of his usual toolset, he was unwilling to give up the advantages inherent to them, so he set out to build them himself. I did much the same when I worked on SvnBridge: I couldn’t use one of the existing OSS containers, but I wasn’t willing to build a large application without one, so I created one.

I touched on this topic in more detail in the past. But basically, a single purpose framework is significantly simpler than a general purpose one. That sounds simple and obvious, right? But it is actually quite significant.

You might not get the same richness that you will get with the real deal, but you get enough to get you going. In fact, since you have only a single scenario to cover, it is quite easy to get the features that you need out the door. You are allowed to cheat, after all :-)

In Davy’s case, he made the decision that using POCOs and not worrying about persistence all over the place are the most important things, and he set out to get them.

I am going to shadow his posts in the series, talking about the implications of the choices made and the differences between Davy’s approach and NHibernate’s. I think it is a good way to show the challenges inherent in building an OR/M.

NHibernate Documentation

I know that some people complain about the lack of documentation for NHibernate. That is something that I never truly could grasp. Leaving aside the excellent book, right there on NHibernate’s site we have a huge amount of documentation.

I’ll admit that the plot in the documentation is weak, but I think that reading this document is essential. It covers just about every aspect of NHibernate, and more importantly, it allows you to understand the design decisions and constraints that were used when building it.

Next time you wonder about NHibernate documentation, just head over there and check. For that matter, I strongly suggest that you read the whole thing start to finish if you are doing anything with NHibernate. It is an invaluable resource.

Authors Review: David Weber & John Ringo

Well, this is going to be a tad different from my usual posts; instead of a technical post, or maybe a SF book review, I am going to talk about two authors that I really like.

David Weber is the author of the Honor Harrington series, the Prince Roger series (in conjunction with Ringo), the Dahak series and the Safehold series, as well as other assorted books.

John Ringo is the author of the Prince Roger series, the Posleen series, the Council Wars series and a bunch of other stuff.

Both are really good authors, although I much prefer Weber’s books to Ringo’s. Their Prince Roger series of books was flat out amazing, and it is only after reading a lot more of their material that I can truly grasp how much each author contributed to them.

Ringo is way better at portraying the actual details of the military, especially marines, SpecOps, etc. Small teams with a lot of mayhem attached. Unfortunately, he seems to be concentrating almost solely on having stupid opponents. I am sorry, but fighting enemies whose tactic is to shout Charge! isn’t a complex task. He is also way too attached to fighting scenes, and a large percentage of his books are dedicated to them.

Well, he is a military SF writer, after all, but I think that he is not dedicating enough time to other stuff related to war. And his characters are sometimes unbelievable. The entire concept he bases a lot of the Posleen series on is unbelievable in the extreme. No, not because it is SF. Because it goes against human nature to do some of the things he portrays them doing. The end of the Posleen war, for example, was one such case. The fleet comes back home, violating the orders of supposedly friendly alien masters that want to see Earth destroyed by another bunch of aliens.

The problem is not that the fleet comes home in violation of orders, the problem is that it didn’t do so much sooner than that. Humans are not wired for something like that, especially since it was made clear that long before the actual event the fleet was well aware of what was going on. I spent 4 years in a military prison, orders be damned; I know exactly how far you can stretch that. And you can’t stretch it far at all. Not on a large scale, with psychologically normal humans.

Or when one race of aliens is trying to subvert the war effort to help another race kill more humans. That is believable. What isn’t believable is that the moment it became widespread knowledge, they weren’t all exterminated. Instead, Ringo made them rulers. It makes for a good story, but I just didn’t find it believable at all. The books are still good, but the suspension of disbelief required to go on with the story is annoying.

On a more personal note, I think Ringo is also a right-winged, red-necked nutcase. A great author, admittedly, but I find it hard sometimes not to get annoyed by some of the perspectives that I see in the books.

Weber, on the other hand, is great at portraying navies. And I love reading his fight scenes, mostly because he knows where to put them and how much to stretch them. He also puts an amazing amount of depth into the worlds he creates, in surprisingly few brush strokes.

He does have a few themes that I also find fairly annoying. Chief among them, while not as annoying as having stupid enemies (which he has to some extent as well), is having the “good” side possess amazingly good information about the other, or having one side significantly better armed than the other. Sure, it makes it easy to make the good guys win, but I like a more realistic scenario.

His recent books in the Honor Harrington universe have portrayed exactly such a scenario and have been a pleasure to read. Beyond anything else, he knows how to give depth to his universe, and his characters are well polished and likable. I can’t think of a scenario where a character has behaved in a way that I would consider wrong.

Weber is currently my favorite author, and I am eagerly waiting for Torch of Freedom in November.

But hands down, the best series is the Prince Roger series, on which they collaborated. This is a tale that has both Weber’s depth in creating a universe and Ringo’s touch for portraying military people. I wish there were more books there.

NHibernate Perf Tricks

I originally titled this post NHibernate Stupid Perf Tricks, but decided to remove that. The purpose of this post is to show some performance optimizations that you can take advantage of with NHibernate. This is not a benchmark; the results aren’t useful for anything except comparing to one another. I would also like to remind you that NHibernate isn’t intended for ETL scenarios; if you need those, you probably want to look into ETL tools, rather than an OR/M developed for OLTP scenarios.

There is a wide scope for performance improvements outside what is shown here; for example, the database was not optimized, the machine was in use throughout the benchmark, etc.

To start with, here is the context in which we are working. This will be used to execute the different scenarios that we will run.

The initial system configuration was:

<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <property name="dialect">NHibernate.Dialect.MsSql2000Dialect</property>
    <property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
    <property name="connection.connection_string">
      Server=(local);initial catalog=shalom_kita_alef;Integrated Security=SSPI
    </property>
    <property name='proxyfactory.factory_class'>
	NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle
     </property>
    <mapping assembly="PerfTricksForContrivedScenarios" />
  </session-factory>
</hibernate-configuration>

The model used was:

[image: the User entity model]

And the mapping for this is:

<class name="User"
			 table="Users">
	<id name="Id">
		<generator class="hilo"/>
	</id>

	<property name="Password"/>
	<property name="Username"/>
	<property name="Email"/>
	<property name="CreatedAt"/>
	<property name="Bio"/>

</class>

And each new user is created using:

public static User GenerateUser(int salt)
{
	return new User
	{
		Bio = new string('*', 128),
		CreatedAt = DateTime.Now,
		Email = salt + "@example.org",
		Password = Guid.NewGuid().ToByteArray(),
		Username = "User " + salt
	};
}

Our first attempt is to simply check serial execution speed, and I wrote the following (very trivial) code to do so:

const int count = 500 * 1000;
var configuration = new Configuration()
	.Configure("hibernate.cfg.xml");
new SchemaExport(configuration).Create(false, true);
var sessionFactory = configuration
	.BuildSessionFactory();

var stopwatch = Stopwatch.StartNew();

for (int i = 0; i < count; i++)
{
	using(var session = sessionFactory.OpenSession())
	using(var tx = session.BeginTransaction())
	{
		session.Save(GenerateUser(i));
		tx.Commit();
	}

}

Console.WriteLine(stopwatch.ElapsedMilliseconds);

Note that we create a separate session for each element. This is probably the slowest way of doing things, since it means that we significantly increase the number of connections we open and close and the number of transactions that we need to handle.

This is here to give us a baseline on how slow we can make things, to tell you the truth. Another thing to note is that this is simply serial. This is just another example of how this is not a true representation of how things happen in the real world. In real world scenarios, we are usually handling small requests, like the one simulated above, but we do so in parallel. We are also using a local database vs. the far more common remote DB approach, which skews the results even further.

Anyway, the initial approach took 21.1 minutes, or roughly a row every two and a half milliseconds, about 400 rows / second.

I am pretty sure most of that time went into connection & transaction management, though.

So the first thing to try was to see what would happen if I did that using a single session, which would remove the issue of opening and closing the connection and creating lots of new transactions.

The code in question is:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Save(GenerateUser(i));
	}

	tx.Commit();
}

I expect that this will be much faster, but I have to explain something. It is usually not recommended to use the session for doing bulk operations, but this is a special case. We are only saving new instances, so the flush does no unnecessary work and we only commit once, so the save to the DB is done in a single continuous stream.

This version ran for 4.2 minutes, or roughly 2 rows per millisecond, about 2,000 rows / second.

Now, the next obvious step is to move to a stateless session, which is intended for bulk scenarios. How much would this take?

using (var session = sessionFactory.OpenStatelessSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Insert(GenerateUser(i));
	}
	tx.Commit();
}

As you can see, the code is virtually identical. And I expected the performance to be slightly improved, but on par with the previous version.

This version ran at 2.9 minutes, about 3 rows per millisecond and close to 2,800 rows / second.

I am actually surprised; I expected it to be somewhat faster, but it turned out to be much faster than that.

There are still performance optimizations that we can make, though. NHibernate has a rich batching system that we can enable in the configuration:

<property name='adonet.batch_size'>100</property>

With this change, the same code (using stateless sessions) runs in 2.5 minutes, at 3,200 rows / second.

This doesn’t show as much improvement as I hoped it would. This is an example of how a real world optimization fails to show its promise in a contrived example. The purpose of batching is to create as few remote calls as possible, which dramatically improves performance. Since we are running on a local database, it isn’t as noticeable.

Just to give you some idea about the scope of what we did, we wrote 500,000 rows and 160MB of data in a few minutes.

Now, remember, those aren’t numbers you can take to the bank; their only usefulness is to know that with a few very simple acts we improved performance in a really contrived scenario by 90% or so. And yes, there are other tricks that you can utilize (preparing commands, increasing the batch size, parallelism, etc.). I am not going to try to outline them, though, for the simple reason that this performance should be quite enough for anyone who is using an OR/M. That brings me back to my initial point: OR/Ms are not about bulk data manipulation; if you want to do that, there are better methods.

For the scenario outlined here, you probably want to make use of SqlBulkCopy, or the equivalent, instead. Just to give you an idea about why, here is the code:

var dt = new DataTable("Users");
dt.Columns.Add(new DataColumn("Id", typeof(int)));
dt.Columns.Add(new DataColumn("Password", typeof(byte[])));
dt.Columns.Add(new DataColumn("Username"));
dt.Columns.Add(new DataColumn("Email"));
dt.Columns.Add(new DataColumn("CreatedAt", typeof(DateTime)));
dt.Columns.Add(new DataColumn("Bio"));

for (int i = 0; i < count; i++)
{
	var row = dt.NewRow();
	row["Id"] = i;
	row["Password"] = Guid.NewGuid().ToByteArray();
	row["Username"] ="User " + i;
	row["Email"] = i + "@example.org";
	row["CreatedAt"] =DateTime.Now;
	row["Bio"] =  new string('*', 128);
	dt.Rows.Add(row);
}

using (var connection = ((ISessionFactoryImplementor)sessionFactory).ConnectionProvider.GetConnection())
{
	var s = (SqlConnection)connection;
	var copy = new SqlBulkCopy(s);
	copy.BulkCopyTimeout = 10000;
	copy.DestinationTableName = "Users";
	foreach (DataColumn column in dt.Columns)
	{
		copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
	}
	copy.WriteToServer(dt);
}

And this ends up taking 49 seconds, or about 10,000 rows / second.

Use the appropriate tool for the task.

But even so, getting to 1/3 of the speed of SqlBulkCopy (the absolute top speed you can get to) is nothing to sneeze at.


How to lose respect and annoy people

This post is going to be short, because I don’t think that in my current state of mind I should be posting anything that contains any opinions. This is a reply to a blog post from X-tensive titled: Why I don't believe Oren Eini, which is a reply to my posts about the ORMBattle.NET nonsense benchmarks.

I will just say that even if I was willing to give X-tensive the benefit of the doubt about the reason they ran the benchmarks the way they did, the post has made it all too clear that such doubt shouldn’t be entertained.

Here are a few quotes from the post that I will reply to.

I confirm this is achieved mainly because of automatic batching of CUD sequences. It allows us to outperform frameworks without this feature by 2-3 times on CUD tests. So I expect I will never see this feature implemented in NHibernate.

That is interesting; NHibernate has had batching support since 2006. The tests that X-tensive did never even bothered to configure NHibernate to use batching. For that matter, they didn’t even bother to do the most cursory check. Googling NHibernate Batching gives you plenty of information about this feature.

Currently (i.e. after all the fixes) NHibernate is 8-10 times slower than EF, Lightspeed and our own product on this test.

Yes, for a test that was explicitly contrived to show that.

Since we track performance on our tests here, we'll immediately see, if any of these conditions will be violated. And if this ever happen, I expect Oren must at least publicly apologize for his exceptional rhino obstinacy ;)

Um, no, I don’t think so. Moreover, not that I think that it would do much good, but I still think that personal attacks are NOT cool. And no, a smiley isn’t going to help, nor is something like this:

P.S. Guys, nothing personal.