Ayende @ Rahien

Refunds available at head office

Ideal deployment disaster

Today we botched the deployment to production. We discovered an unforeseen complication related to the production network topology which means that (to simplify) transactions wouldn't work.

I am happy about this. Not because of the failed production deployment, but because of how it happened. We worked on deploying the app for half an hour, found the problem, then spent the next few hours trying to figure out the exact cause and options to fix it. We actually came up with four ways of resolving the issue, from the really hackish to what I believe would be appropriate, but which would take some time to do using the proper process for production changes.

So, why am I happy?

We spent time on that from noon to 3 PM (look Ma, no midnight deployment), and we actually aren't going live any time soon. The reason for deploying to production was to practice, to see what strange new things would show up in production. They did show up, and we will fix them, at our leisure, without trying to solve a hairy problem while staring at the screen with bloodshot eyes and caffeine poisoning. At no time was there someone breathing down our neck, or IT guys to argue with, or an emergency change request, or any of the usual crap that comes with deploying to production if you aren't being careful.

Practice the way you play, and since pushing to production is usually such a mess anyway, you want to practice this as soon as possible. And not just deploying to another environment (we already did that last week) to test your deployment process. You want to deploy to the actual production environment, so you can get permissions, firewalls, subnets and all the rest sorted out up front.

Looking for PHP Developer

My client needs a PHP developer to work on a Cake PHP application.

This is probably a full time position, but I think that a part time position is also available.

If you are interested, please contact me.

Solving problems with messaging: Creating a new user

Another example, which is naturally asynchronous, is the way most web sites create a new user. This is done in a two stage process. First, the user fills in their details and initial validation is done on the user information (most often, that we have no duplicate user names).

Then, we send an email to the user and ask them to validate their email address. It is only when they validate their account that we will create a real user and let them log in to the system. If a certain period of time elapses (24 hours to a few days, usually), we need to undo any actions that we performed and make that user name available again.

When we want to solve the problem with messaging, we run into an interesting problem. The process of creating the user is a multi-message process, in which we have to maintain the current state. Not only that, but we also need to deal with the timing issues built into this problem.

It gets a bit more interesting when you consider the cohesiveness of the problem. Let us consider a typical implementation.

First, we have the issue of the CreateUser page:

image

Then we have the process of actually validating the user:

image

And, lest we forget, we have a scheduled job to remove expired user account reservations:

image

We have the logic for this single feature scattered across three different places, which execute at different times, and likely reside in different projects.

Not only that, but if we want to make the experience pleasant for the user, we have a lot to deal with. Sending an email is slow. You don't want to put this as part of a synchronous process, if only because of the latency it will add to showing a response to the user. It is also an unreliable process. And we haven't even started to discuss error handling yet.

For that matter, sending an email is not something that you should just new an SmtpClient for. You have to make sure that someone doesn’t use your CreateUser page to bomb someone else’s mailbox, you need to keep track of emails for regulatory reasons, you need to detect invalid emails (from SMTP response), etc.

Let us see how we can do this with async messaging. First, we will tackle registering the user and sending an email to validate their email address:

image

When the user clicks on the link in their email, we have the following set of interactions:

image

And, of course, we need to expire the reserved username:

image

In the diagrams, everything that happens in the App Server is happening inside the context of a single saga. This is a single class that manages all the logic relating to the creation of a new user. That is good in itself, but I gain a few other things from this approach.
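
To make that concrete, here is a minimal sketch of such a saga. The bus interface and the message types below are illustrative assumptions, not the actual Rhino Service Bus API:

using System;

// Illustrative message types (assumptions, not taken from the post)
public class RegisterUser { public string UserName; public string Email; }
public class SendValidationEmail { public Guid SagaId; public string Email; }
public class UserEmailValidated { public Guid SagaId; }
public class ValidationTimedOut { public Guid SagaId; }

// Illustrative bus abstraction
public interface IBus
{
    void Send(object message);
    void DelaySend(TimeSpan delay, object message);
}

// All of the state and logic for creating a new user lives in this single class
public class CreateUserSaga
{
    private readonly IBus bus;

    public Guid Id { get; set; }
    public string UserName { get; set; }
    public bool Completed { get; set; }

    public CreateUserSaga(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(RegisterUser message)
    {
        UserName = message.UserName; // the reservation is simply saga state
        bus.Send(new SendValidationEmail { SagaId = Id, Email = message.Email });
        // ask to be woken up if the user never validates
        bus.DelaySend(TimeSpan.FromHours(24), new ValidationTimedOut { SagaId = Id });
    }

    public void Handle(UserEmailValidated message)
    {
        // create the real user account here, then complete the saga
        Completed = true;
    }

    public void Handle(ValidationTimedOut message)
    {
        if (Completed)
            return; // the user validated in time, nothing to expire
        // release the reserved user name here
        Completed = true;
    }
}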

Robustness from errors and fast response times are one thing, of course, but there are other things that I am not touching here. In the previous example, I have shown a very simplistic approach to handling the behavior, where everything is happening inline. This is done because, frankly, I didn’t have the time to sit and draw all the pretty boxes for the sync example.

In the real world, we would want to have pretty much the same separation in the sync example as well. And now we are running into even more “interesting” problems. Especially if we started out with everything running locally. The sync model makes it really easy to ignore the fallacies of distributed computing. The async model puts them in your face and makes you deal with them explicitly.

The level of complexity that you have to deal with in async messaging remains more or less constant when you bring in the issues of scalability, fault tolerance and distribution. The same most certainly cannot be said for the sync model.

Another advantage of this approach is that we are using the actor model, which makes it very explicit who is doing what and why, and allows us to work on all of those in an independent fashion.

The end result is a system composed of easily disassembled parts. It is easy to understand what is going on because the interactions between the parts of the system are explicit, understood and not easily bypassed.

NHibernate HQL AST Parser

Steve from iMeta has been doing a lot of work on the HQL AST Parser. For a long time, that has been a really troublesome pain point for us, since this is a prerequisite for a lot of other features. It is also one of two parts of NHibernate that really need significant refactoring, because the way they are built right now makes it hard to do stuff (the second being the semantic model for the mapping, which the Fluent NHibernate guys are working on).

Just to give you two features that should make you drool, both of which depend on this work:

  • Full Linq implementation
  • Set based DML operations on top of the domain model
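
To make the second item concrete, here is what set based DML over the domain model looks like in Hibernate, and what NHibernate stands to gain once the AST parser lands. This is a sketch, not something you can run on NHibernate today; the Customer entity and sessionFactory are assumed to be mapped and configured elsewhere:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // a single UPDATE statement issued against the database,
    // expressed in terms of the mapped domain model rather than tables
    session.CreateQuery("update Customer c set c.IsPreferred = :flag where c.TotalPurchases > :amount")
        .SetBoolean("flag", true)
        .SetDecimal("amount", 10000m)
        .ExecuteUpdate();

    tx.Commit();
}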

In true Open Source manner, you can view the work being done right now: http://unhaddins.googlecode.com/svn/trunk. Check out the ANTLR-HQL project.

This is something that several members of the NHibernate project have tried doing in the past, but the scope of the work is very big, and requires full time work for an extended period of time. iMeta has been sponsoring Steve's work on NHibernate, which makes this possible.

I have been going through the code, and I am literally jumping up and down in excitement. It is still very early, but it is already clear that Steve has taken us much farther than before, and it is possible to see the end. The most significant milestone has been reached, and we are currently able to execute (a very simple) query and get the results back:

image

Yes, it also integrates with NH Prof :-)

5 Books that I am waiting for…

Well, technically I am no longer waiting for this one, since it is already out.

image

One of the best sources for information about NHibernate is finally available!

Moving on to books that aren't out yet, but that I am really looking forward to, and that have nothing to do with technology (well, they are both SF, but other than that):

image image

Both are really good, and I have been reading (well, listening to) the series for a long time. Don't expect to hear much from me the next day or two after each of them is out.

And, of course, this book is on Amazon as well:

image

Although you can read it right now, it feels more real now that you can see it on Amazon.

I am still trying to convince Manning that Oren Eini is not a fiction, and that the name should really appear on the cover as well.

And this is not a book, but I am very excited about this:

image

I should have this next week, and then my time will get even busier, because the Kindle is the ultimate impulse-read device.

Progressive.NET Event

Skills Matter is co-organising The Progressive .NET Tutorials 2009 with some of us. This will be a really exciting 3 day long event with 2 tracks featuring half day and full day tutorials by me, David Laribee, Gojko Adzic, Hammett, Ian Cooper, Mike Hadlow, Scott Bellware and Sebastien Lambla.
For the full programme and description of the tutorials, check out: http://skillsmatter.com/event/open-source-dot-net/progressive-dot-net-exchange

If you book before Feb 28th, just put SM1368-622459-33L in the Promo Code field and pay just £350 (normal price £1000). I hear that the tickets are going fast, so if you would like to secure a place and claim your discount, do hurry up!

As a bonus, you can stay for the next week and enroll in the NHibernate course that I am also giving.

The last event where we did something similar was Kaizen Conf, and we are going to try to reproduce the same thing at the Seattle ALT.Net conference in a week as well.

I can’t talk about what will happen next week, but the previous workshops were a tremendous success. We are going to run 12 sessions during this event, and I am finding myself in the problematic position of wanting to be in at least three places at the same time, while I can only manage two (maybe I need to let Barid out as well*).

* Bonus points for anyone who manages to track down that reference.

Dictionary Puzzler

A while ago in the NHibernate mailing list we got a report that NHibernate is making use of a dictionary with an enum as the key, and that is causing a big performance issue.

The response was almost unanimous, I think: “what?! how can that be?!!?”. Several people dug in and tried to figure out what is going on there. The answer is totally non-obvious: Dictionary<K,V> forces boxing for any value type that is used as the key.

That sounds completely contradictory to what you would expect; after all, one of the major points of generics was the elimination of boxing, so what happened?

Well, the issue is that Dictionary<K,V> has to compare the keys, and for that, it must make some assumptions about the actual key. This is abstracted into EqualityComparer, and that is where the actual problem starts. EqualityComparer has some special cases for the common types (anything that is IEquatable<T>, which most of the usual suspects implement) to speed this up.

The problem is that the fallback is to an ObjectComparer, and that, of course, will box any value type.

And enum does not implement IEquatable<T>…
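
The usual way around this is to hand the dictionary a comparer that knows about the concrete enum. A minimal sketch of the idea (the enum here is hypothetical, and this is not Omer's actual code):

using System.Collections.Generic;

public enum OrderStatus { Pending, Shipped, Cancelled } // hypothetical enum, for illustration

// Equals and GetHashCode stay on the value type, so no boxing takes place
public sealed class OrderStatusComparer : IEqualityComparer<OrderStatus>
{
    public bool Equals(OrderStatus x, OrderStatus y)
    {
        return x == y; // plain enum comparison
    }

    public int GetHashCode(OrderStatus obj)
    {
        return (int)obj; // plain cast
    }
}

// usage (inside a method):
// var lookup = new Dictionary<OrderStatus, string>(new OrderStatusComparer());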

Omer has good coverage of the subject, with really impressive results. Take a look:

image

I am not going to steal his thunder, but I suggest going over and reading the code, it is very elegant.

Unmanaged API “fun”, finding out MSMQ subqueue names in C#

I needed to get a list of the subqueues of a queue, and there is no way of actually doing that in managed code. I decided that I could still use P/Invoke and just make the appropriate unmanaged calls. In hindsight, it would have been significantly simpler to just build a DLL in C++/CLI, but I was already too deep, and I can never remember how to properly compile a C++ app anymore.

I finally had to slap myself a couple of times and remind myself that I was a C++ developer for a long time, and I damn well should remember how to treat memory like it was something both sacred and abused. I tested this on a 64-bit Windows 2008 machine. During this project, I also found out that there is basically no information at all about this sort of thing, so I am putting the code here.

public class MsmqUtil
{
    internal const int PROPID_MGMT_QUEUE_SUBQUEUE_NAMES = 27;
    internal const ushort VT_NULL = 1;
 
    public static unsafe string[] GetMsmqSubQueueNames(string queueFormatName)
    {
        // We ask for exactly one management property: the list of subqueue names
        var props = new MQMGMTPROPS {cProp = 1};
        var variant = new MQPROPVariant {vt = VT_NULL};
        try
        {
            // the id of the property that MSMQ should fill in
            props.aPropID = Marshal.AllocHGlobal(sizeof (int));
            Marshal.WriteInt32(props.aPropID, PROPID_MGMT_QUEUE_SUBQUEUE_NAMES);
 
            // the variant that MSMQ will fill with a vector of subqueue names
            props.aPropVar = Marshal.AllocHGlobal(Marshal.SizeOf(typeof (MQPROPVariant)));
            Marshal.StructureToPtr(new MQPROPVariant {vt = VT_NULL}, props.aPropVar, false);
 
            // the per-property status code
            props.status = Marshal.AllocHGlobal(sizeof (int));
            Marshal.WriteInt32(props.status, 0);
 
            var result = MQMgmtGetInfo(null, "queue=" + queueFormatName, ref props);
            if (result != 0)
                throw new Win32Exception(result);
 
            if (Marshal.ReadInt32(props.status) != 0)
            {
                return null;
            }
 
            variant = (MQPROPVariant) Marshal.PtrToStructure(props.aPropVar, typeof (MQPROPVariant));
 
            var subQueues = new List<string>();
            for (int i = 0; i < variant.Val.calpwstr.cElems; i++)
            {
                var item = new string(variant.Val.calpwstr.pElems[i]);
                subQueues.Add(item);
            }
            return subQueues.ToArray();
        }
        finally
        {
            if (variant.vt != VT_NULL)
            {
                for (var i = 0; i < variant.Val.calpwstr.cElems; i++)
                {
                    MQFreeMemory(variant.Val.calpwstr.pElems[i]);
                }
                MQFreeMemory(variant.Val.calpwstr.pElems);
            }
            Marshal.FreeHGlobal(props.aPropID);
            Marshal.FreeHGlobal(props.aPropVar);
            Marshal.FreeHGlobal(props.status);
        }
    }
 
    [DllImport("mqrt.dll")]
    internal static extern int MQMgmtGetInfo([MarshalAs(UnmanagedType.BStr)] string computerName,
                                             [MarshalAs(UnmanagedType.BStr)] string objectName,
                                             ref MQMGMTPROPS mgmtProps);
 
    [DllImport("mqrt.dll")]
    internal static extern unsafe int MQFreeMemory(void* queue);
 
    [StructLayout(LayoutKind.Sequential)]
    internal struct CALPWSTR
    {
        public uint cElems;
        public unsafe char** pElems;
    }
 
    [StructLayout(LayoutKind.Sequential)]
    internal struct MQMGMTPROPS
    {
        public uint cProp;
        public IntPtr aPropID;
        public IntPtr aPropVar;
        public IntPtr status;
    }
 
    [StructLayout(LayoutKind.Sequential, Size = 16)]
    internal struct MQPROPVariant
    {
        public ushort vt;
        public ushort wReserved1;
        public ushort wReserved2;
        public ushort wReserved3;
        public UnionedVariant Val; //8
    }
 
    [StructLayout(LayoutKind.Explicit, Size = 8)]
    internal struct UnionedVariant
    {
        [FieldOffset(0)] public uint ulVal; /* VT_UI4    */
        [FieldOffset(0)] public CALPWSTR calpwstr; /* VT_VECTOR | VT_LPWSTR  */
    }
}
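
For completeness, calling it would look something like this. The format name below is just an illustrative value; the method expects a queue format name, not a queue path:

var subQueues = MsmqUtil.GetMsmqSubQueueNames(@"DIRECT=OS:.\private$\my_queue");
if (subQueues != null) // null means MSMQ reported a non-zero status for the property
{
    foreach (var subQueue in subQueues)
        Console.WriteLine(subQueue);
}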

Challenge: The sample legacy database from hell

I need a legacy database for my NHibernate course. I can take some of the creepy ones in the NHibernate test suite, but I think that would be evil to the students.

Can anyone come up with a good (um, bad) legacy database that is also small enough to be a good example?

Challenge: write a self extracting program

I had to do this yesterday, and it was fun enough to post about it. Not only that, but the solution that I came up with is enough of a twist that I think it would make this hard for many people.

Here is the challenge:

Given a zip file, produce an executable that, when run on another machine, will unzip the content of the zip file on that machine. (nitpicker corner: yes, there are many ways to cheat this, but don't try to cheat). You may use any library and process that you want, but the end result must be a standalone file that can be copied to a separate machine and, when executed, will extract the content of the zip file.

You can assume that on that separate machine you will have the .Net framework installed, but you cannot depend on the presence of external libraries. I'll note that the default compression streams in the .Net framework are not applicable here, since they can't handle zip files (which contain a directory structure); they can only handle raw compressed streams.

It took me roughly 30 minutes, most of which were spent dealing with the ICSharpCode.SharpZipLib API, which is not something that I had dealt with before.

Production Errors Best Practices

I have been practicing deployment lately, and today we got everything squared away very nicely, only to be confronted with a failure when we actually ran the app.

The error was: KeyNotFoundException, because we forgot to move some data to the QA database. We fixed the error, but I also changed the error message that we were getting.

Now, we get the following error for this scenario:

Rhino.ServiceBus.Exceptions.InvalidDeploymentException: ERROR_ORDER_ATTRIB_NOT_FOUND: Could not find order attribute with the name: CorrelationId
This is usually an indication that the database has not been properly prepared for the deployment.
You need to add the order attribute to the appropriate table.
Important! If the order attribute 'CorrelationId' is missing, it is likely that other required order attributes are also missing.
Please refer to the deployment guide about the required attributes. If you don't find the required attributes in the deployment guide, please shout at the developer, preferably at 2 AM.

There are a couple of important things that I want to discuss about this error:

  • It is very explicit about what is going on.
  • It contains the information about how to fix the error.
  • It warns about additional problems that this may relate to.
  • It contains a uniquely identifying string that can be used to clearly communicate the error, placed in Google or put in a knowledge base.
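
For illustration, the guard clause that produces an error like the one above is nothing fancy; a sketch (the lookup helper and the surrounding code are assumed, only the message text comes from the real error):

var attribute = FindOrderAttribute("CorrelationId"); // hypothetical lookup against the database
if (attribute == null)
{
    throw new InvalidDeploymentException(
        "ERROR_ORDER_ATTRIB_NOT_FOUND: Could not find order attribute with the name: CorrelationId" + Environment.NewLine +
        "This is usually an indication that the database has not been properly prepared for the deployment." + Environment.NewLine +
        "You need to add the order attribute to the appropriate table." + Environment.NewLine +
        "Important! If the order attribute 'CorrelationId' is missing, it is likely that other required order attributes are also missing." + Environment.NewLine +
        "Please refer to the deployment guide about the required attributes.");
}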

Resolving graphics problems with NH Prof

Christopher has good coverage of how we isolated and solved a problem that we ran into with NH Prof. In essence, the issue was messed up text being reported by some of the users.

To be frank, it was a very frustrating issue, because none of us could come up with a repro, and the only thing that sometimes seemed to help was updating the graphics drivers. Having to update your graphics drivers to use NH Prof isn't something that I consider appropriate. It reminded me of the time that, in order to deploy a system that I built, we had to replace the organization's domain controller. It is… not a place where you want to be.

We had accidentally reproduced it in a SharedView session, and I got to see Christopher and Rob get to the root of the issue in a matter of minutes. It is not an area that I have any sort of expertise in, and it was fascinating to watch.

And, of course, we got rid of the problem that NH Prof had.

Reducing the cost of getting a stack trace

image

I am trying to find ways to reduce the cost of the stack trace used in NH Prof. Access to the stack trace is extremely valuable, but there is a significant cost to using it, so we need a better way of handling this. I decided to run a couple of experiments.

All experiments were run 5,000 times, on a stack trace of 7 levels.

  • new StackTrace(true) - ~600ms
  • new StackTrace(false) - ~150ms

So right there, we have a huge cost saving, but let us continue a bit.

  • throwing exception - ~400ms

That is not so good, I have to say.

Well, when in doubt, cheat!

Using Reflector and some really nasty stuff, I came up with this:

// grab the internal helper type and method that StackTrace itself uses under the covers
var stackFrameHelperType = typeof(object).Assembly.GetType("System.Diagnostics.StackFrameHelper");
var GetStackFramesInternal = Type.GetType("System.Diagnostics.StackTrace, mscorlib").GetMethod("GetStackFramesInternal", BindingFlags.Static | BindingFlags.NonPublic);
 
// emit a tiny method that news up a StackFrameHelper and asks the runtime to fill it,
// skipping all the extra work that the public StackTrace constructor does on top of that
var method = new DynamicMethod("GetStackTraceFast", typeof(object), new Type[0], typeof(StackTrace), true);
 
var generator = method.GetILGenerator();
generator.DeclareLocal(stackFrameHelperType);
generator.Emit(OpCodes.Ldc_I4_0);   // false
generator.Emit(OpCodes.Ldnull);     // no target thread
generator.Emit(OpCodes.Newobj, stackFrameHelperType.GetConstructor(new[] { typeof(bool), typeof(Thread) }));
generator.Emit(OpCodes.Stloc_0);
generator.Emit(OpCodes.Ldloc_0);
generator.Emit(OpCodes.Ldc_I4_0);
generator.Emit(OpCodes.Ldnull);
generator.Emit(OpCodes.Call, GetStackFramesInternal); // let the runtime capture the frames into the helper
generator.Emit(OpCodes.Ldloc_0);
generator.Emit(OpCodes.Ret);
// getTheStackTrace is a Func<object> field that holds the generated delegate
getTheStackTrace = (Func<object>)method.CreateDelegate(typeof(Func<object>));

Calling getTheStackTrace 5,000 times with a depth of 7 frames takes… 54ms. And now that is a horse of a different color indeed.

And the best part is, I can use the StackFrameHelper as a key into cached stack traces.

And yes, I am aware that if anyone from the CLR team is reading this, a ninja team will be dispatched to… discuss with me the notion of supported vs. unsupported operations.

Pull vs. Push models for UI updates

I thought that I would give this topic a bit of discussion, because it seems to me that a lot of people are making false assumptions here.

Let us consider a push model for an application UI. Assuming that data binding works, and we have implemented INotifyPropertyChanged and INotifyCollectionChanged, the push model is actually the easiest to work with. Change the model, and the UI is updated automatically.

Very often, this can look like magic. And it is a damn simple model to work with.

It also breaks down completely when the number of updates that you have to deal with goes up beyond what is humanly perceivable.

Let us imagine that we have to build a system that needs to provide the user with a real time view of a system that generates 5,000 events per second.

Guess what? There is a serious issue in the actual problem definition. No human can deal with 5,000 events per second. More than that, even if you are displaying an aggregation of the data, it is still not going to work. Having to deal with 5,000 UI updates will usually kill the system. It also means that the system is doing way too much work. Just processing those UI events is going to take a lot of time. Time that we could dedicate to actually making the UI work for the user.

A better model in this scenario is to split the work. We have a backend that deals with the data aggregation, and we have the UI poll the backend for a snapshot of the information periodically (every 0.2 - 0.5 seconds).

The advantage is that now we have far less UI work to do. More than that, we don't have to go through all the work to show the current state.

Let us assume that we have 5,000 events per second, that we update the UI every .5 seconds and that only the last 500 events are actually relevant for the UI.

Using the push model, we would have to actually process all 5,000 changes. Using the pull model, every time we pull the new snapshot, we get the already processed data. So in this case, we throw away all the data that is not relevant to us (everything not in the last 500 events that showed up since the last batch). Trying to implement the same for a push model requires a fairly complex system, especially if you have a rich model, and you don't have a good way of telling what the implications of a change would be.
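
A minimal sketch of that pull model, assuming a WPF UI (DispatcherTimer lives in System.Windows.Threading; the backend and viewModel objects are hypothetical stand-ins for the aggregation service and the UI model):

var timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(500) };
timer.Tick += delegate
{
    // the backend has already aggregated the events; we only fetch the latest snapshot
    var snapshot = backend.GetCurrentSnapshot();
    // a single UI update per poll, instead of one per event
    viewModel.ApplySnapshot(snapshot);
};
timer.Start();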

The reason that this matters is that UI operations are expensive, while backend operations tend to be very cheap.

Add that to the human perception time lag, and you get a smoother user experience. And since the UI work that needs to be done is now so much less, and since the polling happens when the UI isn't busy doing other things, we also get a more responsive UI.

Typemock Open AOP

I just stumbled upon Typemock Open AOP. I know about as little as it is possible to know about this, but I am going to find out more, fast.

It looks like the ideal solution for a number of problems that I am facing at the moment. I have wanted the power that TypeMock has for a very long time, and it certainly looks like the ideal AOP story. From talking with Eli in 2007, the TypeMock approach imposes an overhead of about 5%, which is quite reasonable in most scenarios.

On Hadoop

Yesterday or the day before that I read the available chapters for Hadoop in Action. Hadoop is a Map Reduce implementation in Java, and it includes some very interesting ideas.

The concept of Map Reduce isn't new, but I liked seeing the actual code examples, which made it so much easier to follow what is actually going on. As usual, an In Action book has a lot of stuff in it that relates to getting things started, and since I don't usually work in Java, those parts were of little interest to me. But the core ideas are very interesting.

It does seem to be limited to a very small set of scenarios, needing to, in essence, index large sets of data. Some of the examples in the book made sense as theoretical problems, but I think that I am still missing the concrete "order to cash" scenario, seeing how we take a business problem and turn that into a set of technical challenges that can be resolved by utilizing Map Reduce in some part of the problem.

As I said, only the first 4 chapters are currently available, and I was reading the early access version, so it is likely that this will be fixed as more chapters come in.

Core NHibernate: Persistence with NHibernate

I can't believe that I didn't mention this before. I am partnering with Skills Matter to create a course about using NHibernate.

You can see the full details here.

The first run is in London in May, followed by Norway in June and London again in August. After that I stopped tracking, but it is very frequent.

This is a three day course that is meant for developers who want to get into NHibernate. I am going to assume zero NHibernate knowledge and take you to the spot where you can build NHibernate applications on your own.

Rhino Persistent Hash Table

I spoke about the refactoring for Rhino DHT in this post. Now, I want to talk about what I actually did.

Before, Rhino DHT was independent, and had no dependencies outside a Windows machine. But because of the need to support replication, I decided to make some changes to the way it is structured. Furthermore, it became very clear that I am going to want to use the notion of a persisted hash table in more than just the DHT.

The notion of a persisted hash table is very powerful, and one that I had already given up on when I found out about Esent. So I decided that it would make sense to make an explicit separation between the notion of the persistent hash table and the notion of the distributed hash table.

So I moved all the code relating to Esent into Rhino.PersistentHashTable, while keeping the same semantics as before (allowing conflicts by design and letting the client resolve them, pervasive read caching), while adding some additional features, like multi value keys (bags of values), which are quite useful for a number of things.

The end result is a very small interface, which we can use to persist data with as little fuss as you can imagine:

image

This may seem odd to you at first: why do we need two classes for this? Here is a typical use case:

using (var table = new PersistentHashTable(testDatabase))
{
	table.Initialize();

	table.Batch(actions =>
	{
		actions.Put(new PutRequest
		{
			Key = "test",
			ParentVersions = new ValueVersion[0],
			Bytes = new byte[] { 1 }
		});

		var values = actions.Get(new GetRequest { Key = "test" });
		actions.Commit();

		Assert.Equal(1, values[0].Version.Number);
		Assert.Equal(new byte[] { 1 }, values[0].Data);
	});
}

There are a few things to note in this code.

  1. We use Batch as a transaction boundary, and we call Commit to, well, commit the transaction. That allows us to batch several actions into a single database operation.
  2. Most of the methods accept a parameter object. I found that this makes versioning the API much easier.
  3. We have only two sets of API that we actually care about:
    1. Get/Put/Remove - for setting a single item in the PHT.
    2. AddItem/RemoveItem/GetItems - for adding, removing or querying lists. Unlike single value items, there is no explicit concurrency control over lists. That is because with lists, there is no real need for concurrency control, since each of the operations can easily be made safe for concurrent use. In fact, that is why the PHT provides an explicit API for this, instead of providing it on top of the single value API.
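
A sketch of what using the list API might look like, following the same Batch pattern as above (the request type and member names here are assumptions based on the description, not the actual Rhino.PersistentHashTable API):

table.Batch(actions =>
{
	// no ParentVersions here; list operations do not need explicit concurrency control
	actions.AddItem(new AddItemRequest
	{
		Key = "pending-messages",
		Data = new byte[] { 1, 2, 3 }
	});

	var items = actions.GetItems(new GetItemsRequest { Key = "pending-messages" });

	actions.Commit();
});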

I am planning on making Rhino PHT the storage engine for persistent data in NH Prof.

NH Prof: Performance implications for the profiled application

I got a couple of complaints about this, and I think that this is a really important issue. Using NH Prof slows down the profiled application. That is to be expected, since we require the application to do more work, but it is not as good as we would hope.

Here is the test scenario, running against the Northwind database (thanks Davy for supplying it):

class Program
{
	static void Main(string[] args)
	{
		var sp = Stopwatch.StartNew();

		HibernatingRhinos.NHibernate.Profiler.Appender.NHibernateProfiler.Initialize();
		
		ISessionFactory sessionFactory = CreateSessionFactory();

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		using (var session = sessionFactory.OpenSession())
		{
			var products = session.CreateCriteria(typeof(Product))
				.SetCacheable(true).List<Product>();
		}

		Console.WriteLine(sp.ElapsedMilliseconds);
	}

	private static ISessionFactory CreateSessionFactory()
	{
		Configuration configuration = new Configuration()
			.Configure()
			.AddAssembly("nhprof_experiment");

		return configuration.BuildSessionFactory();
	}
}

With NH Prof active, this takes: 13 - 16 seconds to run.

Without NH Prof running, this takes: 12 seconds to run.

The reason for that difference is probably that NH Prof is doing some work when new events come in, so it steals some CPU time from the other process.

Disabling NH Prof (commenting out the NHibernateProfiler.Initialize() call), this takes 4 - 6 seconds to run. This is a huge difference.

I decided to see whether this is something that I can actually do something about, and turned the NHibernateAppender into a no-op. So the next run would measure just the time that it takes NHibernate to generate the events that I care about, and not whatever processing I am doing on the profiled application side.

The result both surprised and thrilled me: 4.8 seconds!

Why was I happy about this?

Because it meant that I was actually at fault here. And if I was actually at fault, it means that I can fix things. I pulled up the trusty dotTrace Profiler, and quickly identified a hot spot:

image

Well, in this case, you need to understand a bit about how NH Prof gets the data from the profiled application. We gather the event stream and send it in batches to the actual profiler. Those sends are RPC calls, and it is expected that they would be slow. Except, they are not happening on the same thread as the actual application, so they should have little effect on the actual performance of the test scenario that we have above.

I decided to try it out again; this time, we will change FlushMessagesToProfiler and SendStatisticInformation to no-ops and see what happens.

Running the application again, we got a runtime of 13 seconds. That confirmed my original suspicion: while those actions take a long time, they do not contribute to the actual perf problem in the application code.

I ran the code again under the profiler, looking deeper. That led me to this discovery:

image

Well, I knew that getting a stack trace was slow, but I sincerely thought that I was doing a better job of reducing the number of times that I had to call it. Calling it 4,178 times is obviously wrong.

We track the stack trace for the following event sources:

  • NHibernate.SQL
  • NHibernate.Event.Default.DefaultLoadEventListener
  • NHibernate.Cache.StandardQueryCache
  • NHibernate.Transaction.AdoTransaction

The problem is that some things, like DefaultLoadEventListener, are generating a lot of events that I don't really care about. I started approaching the problem by applying filters, so I would only ever create the stack trace for the events I truly care about. That brought the cost down significantly, so now it took 6 - 7 seconds. We shaved 5 - 10 seconds off the total running time, awesome.
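
The filter itself is nothing clever. A sketch of the idea (the surrounding code is illustrative, and the real filter presumably also looks at which specific event was raised, not just the logger name):

private static readonly HashSet<string> loggersWorthAStackTrace = new HashSet<string>
{
    "NHibernate.SQL",
    "NHibernate.Cache.StandardQueryCache",
    "NHibernate.Transaction.AdoTransaction",
};

// only pay for a stack trace when the event is one that we will actually show to the user
private static StackTrace TryGetStackTrace(string loggerName)
{
    if (loggersWorthAStackTrace.Contains(loggerName) == false)
        return null;
    return new StackTrace(false); // skip the file info, that is the expensive part
}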

But I think that we can still do better than that. Let us see what dotTrace is saying:

image

dotTrace is saying that we have now reduced the share of time this takes from 31% to 3.5%. We have better things to do than to look at this piece of code again, then.

Let us focus on something that will give us more benefit, then:

image

This is actually the initial RPC call between the appender and the profiler. It establishes the TCP connection, among other things, so it takes a while to process. Almost 0.7 seconds is a lot of time. I changed that code to run on a background thread, further improving the startup time of the application.
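
The change itself is essentially a one liner; a sketch (the connection object and method name are illustrative):

// establish the initial connection to the profiler off the application's startup path
ThreadPool.QueueUserWorkItem(delegate
{
    profilerConnection.EstablishInitialConnection(); // hypothetical name for the expensive first RPC
});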

Time to execute the code now? 5.8 seconds.

Time to go to dotTrace again, I think.

image

And we can see that what dominates the actual time for the test is the startup concerns of NHibernate. But I think that in a real scenario, the time for generating a stack trace would again become meaningful for performance.

There are probably additional things that I can do to optimize this, but I think that this is a great step. To remind you, the initial costs were:

  • No profiler: 4 - 6 seconds
  • With profiler: 13 - 16 seconds

After the changes, we are at 5 - 7 seconds, which I consider entirely reasonable.

And for a few hours of performance tuning, reducing the time by 8-9 seconds and by over 60%, I think I did good.

The code passes all the tests and was committed, so in about ten minutes you will be able to download it from the NH Prof site.

Why startup times of applications MATTER

image

Very often I see people trading startup performance for overall performance. That is a good approach, generally. But it has huge implications when it gets out of hand.

There is one type of user that pays the startup cost of the system over and over and over again. It is YOU!

The developer is the one that keeps restarting the application, and an overly long startup procedure will directly affect the way developers work with the application.

Not only that, but since startup times can get ridiculous quickly, developers start working around the system to be able to avoid paying the startup costs. I have seen a system that had a ~3 minute startup time. And the developers had moved a lot of the actual application logic outside the system (XML files, mostly) in order to avoid having to start the system.

This is something that you should keep in mind when you trade off overall performance for startup performance. It is usually preferable to either do things in the background, or softly page things in, as demands require it.

This is usually something that requires a more complex solution, though.

NH Prof Stats

I am just doing some work on NH Prof, and I ran into the line counter button, and I thought the numbers were interesting. Here is the breakdown of line counts in the application:

  • Appender - 818 lines of code
    The Appender is responsible for communication with the profiled application
  • Backend - 3,548 lines of code
    The Backend is responsible for translating the event stream from the Appender into a coherent picture, applying analysis and sending the results to the UI.
  • Client - 6,430 lines of code
    The Client is where all the UI work is done. It contains the view model, which is a large part of this assembly, reports, user interface work, and other such things that directly relate to what the user is doing with the application. Note that this includes things like error reporting, actually executing queries, and other things that are not strictly UI, but are directly related to the way the client works.

This includes only C# files, not XAML files.

Tracking an NH Prof performance problem with an aggressively active application

I mentioned before that NH Prof has a problem dealing with a lot of activity from the profiled application. In essence, until it catches up with what the application is doing, there isn't much that you can do with the profiler.

I consider this a bug, and I decided to track that issue with some verifiable results, and not just rely on hunches. The results were... interesting. The actual problem was in the place that I expected it to be, but it was more complex than expected. Let me try to explain.

The general architecture of NH Prof is shown here:

image

And the actual trouble from the UI perspective is in the last stage, actually updating the UI. UI updates are done directly on a (very) rich view model, which supports a lot of the functionality that NH Prof provides.

Because, as usual, we have to synchronize all access to the UI, we actually update the UI through a dispatcher class. The relevant code is this:

private void SendBatchOfUpdates()
{
	while (true)
	{
		int remainingCount;
		Action[] actions;
		lock (locker)
		{
			actions = queuedActions
				.Take(100)
				.ToArray();

			for (int i = 0; i < actions.Length; i++)
			{
				queuedActions.RemoveAt(0);
			}
			remainingCount = queuedActions.Count;
		}

		if (actions.Length == 0)
		{
			Thread.Sleep(500);
			continue;
		}

		dispatcher.Invoke((Action)delegate
		{
			using(dispatcher.DisableProcessing())
			{
				var sp = Stopwatch.StartNew();
				foreach (var action in actions)
				{
					action();
				}
				Debug.WriteLine(remainingCount + "\t" + sp.ElapsedMilliseconds);
			}
		});
	}
}

We queue all the operations on the view model, batch them and execute them against the UI. This is the behavior as it stands right now (that is, with the problem). Looking at the code, it should be obvious to you that there is one glaring issue. (Well, it should be obvious; it wasn't to me until I dug deeper into it.)

If the number of queued actions is greater than the batch size, we are going to try to drain the queue, but as long as we add more actions to the queue faster than we can process them, we are going to keep hammering the UI. We basically keep the UI busy with updates all the time, and never give it time to process user input.

Note the debug line at the end; from that, I can generate the following graph:

image

For the duration line, the number is the number of milliseconds it took to process a batch of 100 items; for the red line, it is the number of items remaining in the queue that we need to drain. You can clearly see the spikes where more work was added to the queue before we could deal with it.

You can also see that doing the actual UI work is slow. In many cases, it takes hundreds of milliseconds to process, and at times, it takes seconds to run. That is because, again, we have a very rich view model, and a lot of stuff is bound to that.

Before I can go forward, I need to describe a bit how NH Prof actually talks to the UI. Talking to the UI is done by raising events when interesting things happen. The problem that I initially tried to solve was that my input model is a stream of events, and it mapped nicely to updating the UI when I got the information.

Let us say that we opened a session, executed a single SQL statement, and then closed the session. What are the implications for UI marshalling?

  1. Create new session
  2. Add new statement
  3. Add alert, not running in transaction
  4. Update statement with parsed SQL
  5. Update statement with row count
  6. Add alert, unbounded row set
  7. Update statement duration
  8. Close session
  9. Update session duration

So for the simplest scenario, we have to go to the UI thread 9 times. Even when we batch those calls, we still end up going to the UI thread too often, and updating the UI frequently. That, as I mentioned, can take time.

Now, a very easy fix for the problem is to move the thread sleep above the if statement. That would introduce a delay in talking to the UI, giving the UI time to catch up with user input. This would keep the application responsive during periods when there is a backlog of actions to perform in the UI.
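
Sketched, that naive fix is just making the sleep happen on every iteration, not only when the queue is empty:

private void SendBatchOfUpdates()
{
	while (true)
	{
		Thread.Sleep(500); // always give the UI a breather before sending the next batch

		Action[] actions;
		lock (locker)
		{
			actions = queuedActions
				.Take(100)
				.ToArray();

			for (int i = 0; i < actions.Length; i++)
			{
				queuedActions.RemoveAt(0);
			}
		}

		if (actions.Length == 0)
			continue;

		dispatcher.Invoke((Action)delegate
		{
			using (dispatcher.DisableProcessing())
			{
				foreach (var action in actions)
				{
					action();
				}
			}
		});
	}
}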

The downside is that while this works, it means a slow stream of updates into the application. That is actually an acceptable solution, except that this is the same situation that we find ourselves in when we are processing an offline profiling session. When those are big, it can take a really long time to process them.

Therefore, this is not a good solution.

A much better solution would be to reduce the number of times that we go to the UI. We can reduce the number of trips to the UI significantly by changing the process to:

  1. Create new session
  2. Add new statement (with alerts, parsed SQL, row count and duration)
  3. Close session (with duration)

That requires some changes to the way that we work, but those are actually good changes.

This is not quite enough, though. We can do better than that when we realize that we actually only show a single session at a time. So most often, we can reduce the work to:

  1. Create new session
  2. Close Session (with duration, number of statements, alerts aggregation, etc)

If we also add batching to the mix, we can reduce it further to:

  1. Create new session & close it (with duration, number of statements, alerts aggregation, etc)

We have to support both of them, because we need to support both streaming & debugging scenarios.

We don't need to let the UI know about the statements; we can give them to the UI when it asks for them, when you select a session in the sessions list.

And now, when we are looking at a session, we can raise an event only for that session, when a statement is added to that session. As an added bonus, we are going to batch that as well, so we will update the UI about this every .5 seconds or so. At that point, we might as well do an aggregation of the changes, and instead of sending just the new statement, we can send a snapshot of the currently viewed session. The idea here is that if we added several statements, we can reduce the amount of UI work we have to do by making a single set of modifications to the UI.

Thoughts?

Oh, and do you care that I am doing detailed design for NH Prof problems on the blog?

Challenges when building NH Prof

Right now, we have one challenge with building NH Prof: making the UI scale to what people are actually using.

The problem that we face is:

  • A profiled application is generating a lot of events.
  • Many of those events are relevant to the profiler and so result in UI Model changes.
  • UI Model changes require UI refresh.

There is a limit on how often you can refresh the UI. It is not just a matter of the UI being incredibly slow. (Try it, the UI is about as slow as remote system calls!) That problem we already solved with batching and aggregation.

The problem is deeper than that: how do you keep a good UI experience when you have so many changes streaming into the UI? If my recent statements list is updated 5,000 times in the space of one minute, there is no value in actually showing anything there, because you don't get to see any of it.

We are thinking about various ways of handling that, but I would appreciate additional input. How do you think we should deal with this issue?

Ayende's Open Source Project Maturity Model

From my point of view, there is a very easy model for the maturity of an Open Source project. Look at the answers in the project's mailing list. The questions do not matter.

The model is simple: the higher the percentage of answers given by non-project members, the more mature the project is.