Ayende @ Rahien

It's a girl

NH Prof: Query Duration

One of the more popular requests for NH Prof was to be able to track query duration not only for select statements, but for all statements. It took a bit of work, I’ll admit, far more than you would probably assume from the screen shot, but it is here:

image

What you can see now is that the duration is divided into the time a query took in the database and the total time a query took. For the most part, there wouldn’t be much of a difference between them, but it is occasionally interesting to see the difference. It is also a far more accurate measure than the one that we used to have.

This is available from build #349 when using NHibernate’s trunk or the 2.1.x branch.

An end to the Mac era

Well, it is about time that I get myself a new machine. My laptop is 1.5 years old, and it is starting to creak around the edges, not to mention my envy when I see what new systems can do.

But the most important decision I made was that I don’t intend to get a Mac as my new laptop. I own two Macs, and I am a very happy user of them, but after using them for so long, I can no longer really justify the Mac tax in comparison to the feature set that I am getting from it. This is particularly important when I consider the next laptop that I want to have.

This time, I am going to play the spoiled child and get myself a real monster:

  • Quad Core CPU
  • 8 GB RAM
  • 256 GB SSD

I am still considering whether to get a 17” or a 15” screen. I am leaning both ways at the moment, but I’ll probably end up with the 15”.

One thing that I would really like to have is a touchpad that is similar to the Mac’s. I find that this is about the most natural way of working with the mouse now, and trying to use other touchpads, especially without two finger scrolling, is just painful.

Can anyone put in a good recommendation?

NH Prof: Exposing the data that you already have

One of the things that drove me mad with NH Prof is that at times it was hard to see visually whether a session went to the database a lot or hit the cache frequently. I mean, you most certainly had that information if you drilled into the session itself, but there is really no excuse for not giving that information to the user directly.

Hence, the following screen shot:

image

I am back to working on NH Prof full time, and I love the way everything is set up. Time to feature is ridiculously low.

NH Prof: Getting big, and bigger

Well, at least now I have proof positive that all the architecture work that we did in order to ensure that NH Prof would scale better actually works. The following screenshot shows an example that used to crash NH Prof: loading an 80 MB file with over 200,000 individual events to scan and process.

image

The fun part is that this is run through the entire pipeline, but without involving the UI, so we get the ability to process all those events without having to wait for the UI. If you remember, we had to invest a lot of time making sure that this sort of scenario would work properly.

Now it takes 85 seconds for this to load, which I consider pretty reasonable. That comes out to about 2,350 events per second, which isn’t a high number, but it is plenty fast enough overall, considering the amount of work that I have to do in order to calculate alerts, format the SQL, and do all the other things that NH Prof has to do.

NHibernate – <natural-id/>

A natural id is a way to refer to a unique field of an object as a substitute for the real entity identifier. A good (and typical) example of that would be with the User entity. We have the user name and the user id; both are unique, but the user id is usually something that is generated by our application and has no meaning to a human being. In other words, user #123814 doesn’t mean anything to us, while user ‘ayende’ does.

There are many reasons for choosing this approach, but the most common one is that we want a small primary key, since it is duplicated to all related tables (especially important for things like user, which is associated with just about everything).

So far, it doesn’t seem really interesting, right? Why am I boring you with talking about this?

Well, it is all about the second level cache and optimizing access to it. Let us take a look at how we set it up:

<class name="User"
		 table="Users">
	<cache usage="nonstrict-read-write"/>
	
	<id name="Id">
		<generator class="identity"/>
	</id>
	
	<natural-id mutable="false">
		<property name="Username"/>
	</natural-id>
</class>

And now we are going to query it:

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
	s.CreateCriteria<User>()
		.Add(Restrictions.NaturalId()
			.Set("Username", "Ayende"))
		.SetCacheable(true)
		.UniqueResult();

	tx.Commit();
}

<natural-id/> is used almost exclusively for documentation purposes. That means that internally the query above will be translated to:

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
	s.CreateCriteria<User>()
		.Add(Restrictions.Eq("Username", "Ayende"))
		.SetCacheable(true)
		.UniqueResult();

	tx.Commit();
}

Well, almost. There is one subtle difference between the two. When querying on <natural-id/>, we also bypass the usual expiration check in the second level cache implemented by NHibernate.

That means that even if we changed the underlying table, NHibernate will not automatically expire the results in the cache for this query; it will still consider them valid, because it assumes that, as with primary keys, there isn’t really a reason to perform an expiration check on natural ids.

NH Prof: NHibernate Profiler vs. Sql Profiler

I recently got a question, asking why someone should pay for NH Prof if SQL profiler is available for (effectively) free?

That question is actually very easy to answer. I ran the following scenario through both NH Prof and SQL Profiler:

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
	s.CreateCriteria<User>()
		.Add(Restrictions.NaturalId()
			.Set("Username", "Ayende"))
		.SetCacheable(true)
		.UniqueResult();

	tx.Commit();
}

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
	s.CreateCriteria<User>()
		.Add(Restrictions.NaturalId()
			.Set("Username", "Ayende"))
		.SetCacheable(true)
		.UniqueResult();

	tx.Commit();
}

Here is what NH Prof gave me:

image

And here is what I got from SQL Profiler:

image

I think that I have rested my case, pretty conclusively.

NH Prof: Architectural Overview

I am pretty fond of the way NH Prof is architected.  At a very high level, it looks like this:

image

The first part isn’t really that interesting; the main challenge is to get the data from your application to the profiler in as speedy and performant a way as possible. We are using a binary protocol based on Protocol Buffers, with TCP sockets as the communication mechanism.
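
Just to make the idea concrete, here is a rough sketch of what length-prefixed Protocol Buffers messages over a TCP socket can look like using protobuf-net. This is only an illustration of the approach, not NH Prof’s actual wire format; the envelope type and its members are made up.

[ProtoContract]
public class ProfilerLogMessage // hypothetical envelope type, for illustration only
{
    [ProtoMember(1)] public string LoggerName { get; set; }
    [ProtoMember(2)] public string Message { get; set; }
    [ProtoMember(3)] public long TimestampTicks { get; set; }
}

public static class ProfilerWire
{
    public static void Send(TcpClient client, ProfilerLogMessage msg)
    {
        // the Base128 length prefix tells the reader where each message ends,
        // so many messages can be streamed over a single socket
        Serializer.SerializeWithLengthPrefix(client.GetStream(), msg, PrefixStyle.Base128);
    }

    public static ProfilerLogMessage Receive(TcpClient client)
    {
        return Serializer.DeserializeWithLengthPrefix<ProfilerLogMessage>(
            client.GetStream(), PrefixStyle.Base128);
    }
}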

Next we have the listener and the message processors. That is where a lot of the magic actually happens: we do first level analysis on the incoming log messages and transform them into higher level concepts. At this point we are only dealing with a single message, so we have no overall picture, but a lot of work actually happens at this layer.

It is there that I normalize the differences between all the NHibernate versions that I support and make sure that the other layers work on a common object model.

The next stage is the model building, where I actually take all the information and start putting it together. It is there that concepts such as sessions, and statements within a session, start making sense, and it is there that I apply the analytics that provide you with the alerts. Taken together it is a pretty complex piece of code, but each individual part is pretty simple.

Next, we have the view model, which is another representation of the model, in a format that is suitable for showing to the user. I have spoken before about moving from a push to a pull model in order to keep the UI performant; that has been one of the major challenges in this area. Well, that and making sure that we have good UX.

The interesting part about this architecture is that a new feature is usually something that happens either in the listeners and the model building (when it matches the current model that we have) or as a vertical slice across all the layers, when we need to express some new concept.

Politics: Iran

Notice: I am not going to respond to comments to this post.

Just an interesting observation, there are Bad Things currently happening in Iran. And overall, the only response that we see is a mumble here and there. Compare it to the reaction of the world to any incident in Israel in the last four decades, and I think that you can see the problem.

As far as the world is concerned, it looks like the Iranians are second class people.

What happens in DevTeach...

Tends to leave lasting impressions.

You might have noticed that I have been sort of quiet lately. I was at DevTeach last week, and I more or less had zero time to do anything that wasn't actually DevTeach. I can honestly say that this is, hands down, one of the conferences that I most enjoyed, period.

We spent so much time in the Hotel's bar that the bartenders knew us all by name.

I don't think that I can really do justice to the last week with any recap; let me just point out that most of it was recorded on Twitter, and that there really isn't any way to explain it if you weren't there.

It was busy, full of fun and exhausting, and I was half grateful and half sad when it was over.

The fallacies of parallel computing

Notes from alt.net parallel session.

  • Locality doesn't matter
  • Locks / synchronization are cheap
  • Higher parallelism equates to faster code
  • All actors see the same state
  • Parallel programming is easy

NHibernate – Automatic change tracking for aggregate roots in DDD scenarios

Recently I had a long discussion with Greg Young about the need for this feature, and after spending some time talking to him face to face we were able to reach an understanding on what exactly is required to make this work. Basically, the feature that Greg would like to see is being able to write code like this and have NHibernate take care of optimistically locking the aggregate.

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var post = s.Get<Post>(postId);
    post.AddComment(new Comment
    {
        Email = "foo@bar.z",
        Text = "first!",
        Name = "foo",
    });
    tx.Commit();
}

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var post = s.Get<Post>(postId);
    var comment = post.Comments.First();
    comment.Text += " some change";

    tx.Commit();
}

In this case, Post is the aggregate and Comment is a contained entity. As it turned out, the solution (assuming that you are willing to accept some intentional limitations) is ridiculously easy. Those limitations are more or less inherent in the way people use DDD in the first place, so they don’t add additional restrictions on you. The limitations are:

  • The relation between aggregates and contained entities must be explicit.
  • The association between a contained entity and its aggregate must be direct. That is, any contained entity must have a direct reference to its aggregate, not going via intermediaries.
  • You never access a contained entity except by traversing the object graph from its aggregate.

Given all of that, the only thing that we are left with is to formalize the association between aggregates and contained entities:

public interface ICanFindMyAggregateRoot
{
    IAggregateRoot MyRoot { get;  }
}

public interface IAggregateRoot
{
}
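
To make this concrete, here is a sketch of how the Post and Comment entities from the example above might implement these interfaces. The original code for these classes isn’t shown in this post, so the details here are my assumptions:

public class Post : IAggregateRoot
{
    public virtual int Id { get; set; }
    // mapped with <version/>, so that LockMode.Force has something to increment
    public virtual int Version { get; set; }
    public virtual ICollection<Comment> Comments { get; set; }

    public virtual void AddComment(Comment comment)
    {
        comment.Post = this;
        Comments.Add(comment);
    }
}

public class Comment : ICanFindMyAggregateRoot
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual string Email { get; set; }
    public virtual string Text { get; set; }

    // the direct reference to the aggregate, as the limitations above require
    public virtual Post Post { get; set; }

    public virtual IAggregateRoot MyRoot
    {
        get { return Post; }
    }
}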

This implies that all your entities must implement either IAggregateRoot or ICanFindMyAggregateRoot. Following Greg’s allow-no-errors policy, we verify this using:

private static void ProtectMyDomainModelFromDomainDrivenDesignIgnorance(Configuration configuration)
{
    foreach (var clazz in configuration.ClassMappings)
    {
        if(typeof(IAggregateRoot).IsAssignableFrom(clazz.MappedClass) == false &&
            typeof(ICanFindMyAggregateRoot).IsAssignableFrom(clazz.MappedClass) == false)
            throw new InvalidOperationException("DDD Violation " + clazz.MappedClass);
    }
}

And now that we have finished setting everything up, what are we left with?

public class ForceRootAggregateUpdateListener : IPreUpdateEventListener, IPreInsertEventListener
{
    public bool OnPreUpdate(PreUpdateEvent updateEvent)
    {
        var rootFinder = updateEvent.Entity as ICanFindMyAggregateRoot;
        if (rootFinder == null)
            return false;
        
        updateEvent.Session.Lock(rootFinder.MyRoot, LockMode.Force);

        return false;

    }

    public bool OnPreInsert(PreInsertEvent insertEvent)
    {
        var rootFinder = insertEvent.Entity as ICanFindMyAggregateRoot;
        if (rootFinder == null)
            return false;
        
        insertEvent.Session.Lock(rootFinder.MyRoot, LockMode.Force);

        return false;
    }
}

I have spoken about Listeners in the past, and they are the preferred way to extend NHibernate. In this case, we need to register them, and then forget about it:

<listener type='pre-update' class='Ex1_Querying.ForceRootAggregateUpdateListener, Ex1-Querying'/>
<listener type='pre-insert' class='Ex1_Querying.ForceRootAggregateUpdateListener, Ex1-Querying'/>

As you can see, this is really pretty simple: we check whether we are currently updating a contained entity, and if so, force a version increment on its aggregate root. There is a slight problem in that we may generate several updates per transaction, but I am not worried about that overly much; it is fairly simple to resolve (by keeping track of the root and not updating it if we already did so in the current transaction), as sketched below.
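
Here is a minimal sketch of that de-duplication (my addition, not part of the original solution): remember which roots we already version-bumped, and skip the redundant Lock calls.

public class DedupingForceRootAggregateUpdateListener : IPreUpdateEventListener, IPreInsertEventListener
{
    [ThreadStatic]
    private static HashSet<object> rootsAlreadyForced;

    public bool OnPreUpdate(PreUpdateEvent updateEvent)
    {
        ForceRootUpdate(updateEvent.Entity, updateEvent.Session);
        return false;
    }

    public bool OnPreInsert(PreInsertEvent insertEvent)
    {
        ForceRootUpdate(insertEvent.Entity, insertEvent.Session);
        return false;
    }

    private static void ForceRootUpdate(object entity, IEventSource session)
    {
        var rootFinder = entity as ICanFindMyAggregateRoot;
        if (rootFinder == null)
            return;

        if (rootsAlreadyForced == null)
            rootsAlreadyForced = new HashSet<object>();

        // only force a single version bump per root; in a real implementation
        // this set would also be cleared when the transaction completes
        if (rootsAlreadyForced.Add(rootFinder.MyRoot))
            session.Lock(rootFinder.MyRoot, LockMode.Force);
    }
}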

The end result is that this code:

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var post = s.Get<Post>(postId);
    var comment = post.Comments.First();
    comment.Text += " some change";

    tx.Commit();
}

Results in:

image

And this code:

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var post = s.Get<Post>(postId);
    post.AddComment(new Comment
    {
        Email = "foo@bar.z",
        Text = "first!",
        Name = "foo",
    });
    tx.Commit();
}

Will result in:

image

As you can see, we generate two update statements for Post here. The first is for the change in the associated Comments collection, the second is for the change (insert) of a contained entity. We could avoid that duplicate update by adding the additional constraint that all contained entities must be directly attached to the aggregate root (so it contains a reference to everything that it uses), but I feel that this is too heavy a limitation, and a single superfluous update is just all right with me.

So here you go, fully automated aggregate change tracking in any part of the graph.

NHibernate – query only properties

With most applications, there is some difference between the domain model and the data model. Probably the most common scenario that I run into can be expressed in the usual Blog & Posts example. In our domain model, we don’t want to have a Blog.Posts collection, we want to have only a Post.Blog.

However, in the data model, all associations are bidirectional, but that doesn’t mean that we want to have the same in the domain model. The problem is that we still want to query on that. That is a bummer, right? Because now we have to change our domain model to fit the query model that we want.

Not really. It is not well known, because we don’t generally think about this as an important concept that should be externalized to the users, but while we usually portray NHibernate working on the following model:

image

This is not actually accurate. What is closer to the way NHibernate works is this model:

image

And the reason that this is important is that the entity model doesn’t have to match the POCO model.

Let us see how this works:

<set name="Posts"
     access="noop">
    <key column ="BlogId"/>
    <one-to-many class="Post"/>
</set>

As you can see, we specify a no-op access strategy (with NH 2.1 you can also use access="none", which sounds nicer) to do so. Now we can execute a query:

var blog = 
    s.CreateQuery("from Blog b where size(b.Posts) > 5")
    .List<Blog>();

Remember, Blog doesn’t have a Posts property; it is a query-only property. This code results in:

image

This is actually most useful when you want to have different projections on the same thing. Let us take a look:

<set name="RecentPosts"
     where="(PostedAt >= (getdate() - 30) )"
     access="noop">
    <key column ="BlogId"/>
    <one-to-many class="Post"/>
</set>

And using this just as before:

var blog =
       s.CreateQuery("from Blog b where size(b.RecentPosts) > 5")
       .List<Blog>();

And that will result in this:

image

I think that this is the key usage scenario for this feature: being able to create query-only associations that are traversed only when querying.

Avoid externalizing decisions from your domain model

I recently saw an entity that looked something like this:

public class Prisoner
{
    public virtual bool CanBePutInIsolation() { ... }
    public virtual bool IsEligibleForVacations() { ... }
    public virtual bool CanSendToWork() { ... }
}

I won’t show the logic in those methods, but it was fairly involved and very business focused. It also, incidentally, caused my rhino sense to tingle. It didn’t surprise me to find out that the UI looked like this:

image

Well, the logic is well encapsulated, and it clearly is business logic. Why should I care where I am actually putting it? Isn’t the place of domain logic in the… domain?

Yes & no. The problem that I have here is that those are query methods that are only going to be used in the UI. There is no business logic in the application that actually uses them; it is only the UI that will call them to make presentation decisions.

There are actually several issues here. First, and I want to make this clear, this is not an issue of mixing presentation logic into the entities. Next, the reason that my rhino sense tingled is the presence of a Boolean query method on the entity. Every time that I see one I become very suspicious. Boolean methods worry me because of their implications: if you have a Boolean method here, it means that somewhere else you have an if statement that works based on this method.

And at that point, you really have to ask yourself if this is an appropriate decision to make. Usually, I find, you can avoid it by moving the responsibility into the domain. But what about this case? I can’t really move the responsibility for creating the UI into the entity, after all. And somewhere in the application I must have this knowledge so I can build my UI.

I am not going to try to answer this question at this moment. I have my own thoughts about the subject, but I would like to have additional feedback about this from the community.

Analyzing a performance problem – Is a prisoner dangerous?

Recently I ran into a performance problem in an application that I was reviewing, and I thought that it would make a great post. Since I can’t use the actual application model, I decided that I am tired of using the same old online shop model and turned to the one domain in which I am a domain expert. Prisons.

Let us imagine that this is part of the Prison 10’s* Dashboard. It looks pretty simple, right?

image

Let us talk about this as SQL, ignoring all layers in the middle. We can express this as:

SELECT TOP 10 Prisoner.Id, Prisoner.FirstName, Prisoner.LastName, Prisoner.Status
FROM Prisoner JOIN Imprisonment on Prisoner.Id = Imprisonment.Prisoner
WHERE Imprisonment.Prison = 6 AND Imprisonment.IsCurrent = 1
ORDER BY Imprisonment.ArrivalDate DESC

Seems simple enough, right? But notice that we don’t have a Prisoner.Dangerous column in the result. That is because there is no such column in the table. So what does our good developer do? He goes to the domain expert and asks him where he is supposed to get the data from. The domain expert’s answer is that this is not an attribute of the prisoner itself; it is a decision based on many factors, any of which can flag a prisoner as dangerous. He then gives the developer a list of a few common ones:

  • If the prisoner is not convicted
  • If the prisoner has a disciplinary notice in the last 2 months
  • If the prisoner has more than one disciplinary notice in the last 6 months
  • If the prisoner is charged with violent crime
  • If the prisoner is currently in withdrawal
  • If the prisoner request for vacation was denied in the last 3 months

As you can see, the list goes on (and on, and on). Our developer starts working on the problem. Since he is working on a small local database, he starts out with a spike that does everything with in-memory filters. It is beautifully Object Oriented, and it looks like this:

public class NotConvicted : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.Status != PrisonerStatus.Convicted;
    }
}

public class DisciplinaryNoticeInLast2Months : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.DisciplinaryNotices
                .Any(x=>x.Date > DateTime.Today.AddMonths(-2));
    }
}

public class MorethanOneDisciplinaryNoticeInLast6Months : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.DisciplinaryNotices
                .Where(x=>x.Date > DateTime.Today.AddMonths(-6))
                .Count() > 1;
    }
}

public class ChargedWithViolentCrime : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.Charges
                .Any(x=>x.ChargeStatus.IsViolent);
    }
}

public class CurrentlyInDrugWithdrawal : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.MedicalRecord
                .Withdrawals
                .Any(x=>x.IsCurrent);
    }
}

public class VacationRequestDeniedInLast3Months : IPrisonerDangerousEvaluator
{
    public bool Eval(Prisoner p)
    {
        return p.Requests
                .Where(x=>x.RequestType == RequestType.Vacation && x.Date > DateTime.Today.AddMonths(-3))
                .Any(x=>x.Status.Approved == false);
    }
}

Now, this works, and from an OO perspective, it works just great. From a performance perspective, this is horrible. Let us do the math of how many queries this is going to generate, okay?

  • Get all prisoners – 1
    • For each prisoner – N(Prisoners)
      • NotConvicted – 0 (load data in the prisoner entity, which was already loaded)
      • DisciplinaryNoticeInLast2Months – 1 (load DisciplinaryNotices)
      • MorethanOneDisciplinaryNoticeInLast6Months – 0 (DisciplinaryNotices already loaded)
      • ChargedWithViolentCrime – 1 (load Charges)
      • CurrentlyInDrugWithdrawal – 2 (load Medical Record, load Withdrawals)
      • VacationRequestDeniedInLast3Months – 2 (load Requests, load Status)

This isn’t SELECT N+1! This is so much worse. What we have is actually: SELECT 1 + N(1+1+2+2)

Or, in other words, assuming we want to display the first ten inmates, showing that little grid up above is going to result in 61 queries!

This actually shows us several important worst practices:

  • In-memory filtering should be avoided if possible.
  • Trying to ignore the realities of data access is going to bite you, hard.
  • There is a good reason for the call for aggregate roots and the avoidance of highly connected object graphs.

So, what is the obvious approach? We can change the way that we load the data. Instead of trying to go through the object model, we can query the data directly from the database. It is going to look something like this:

SELECT TOP 10 Prisoner.Id, Prisoner.FirstName, Prisoner.LastName, Prisoner.Status,
(
    SELECT 1 
    WHERE Prisoner.Status != 'Convicted'
    OR EXISTS (SELECT 1 FROM DisciplinaryNotices 
        WHERE DisciplinaryNotices.Prisoner = Prisoner.Id
        AND   DisciplinaryNotices.Date > getdate()-60)
    OR 1 < (SELECT COUNT(*) FROM DisciplinaryNotices 
        WHERE DisciplinaryNotices.Prisoner = Prisoner.Id
        AND   DisciplinaryNotices.Date > getdate()-180)
    OR  EXISTS (SELECT 1 FROM Charges
        WHERE Charges.Prisoner = Prisoner.Id
        AND Charges.IsViolent = 1)
    OR  EXISTS (SELECT 1 FROM MedicalRecords JOIN Withdrawals ON MedicalRecords.Id = Withdrawals.Record
        WHERE MedicalRecords.Prisoner = Prisoner.Id
        AND Withdrawals.IsCurrent = 1)
    OR  EXISTS (SELECT 1 FROM Requests JOIN RequestsStatus ON Requests.Id = RequestsStatus.Request
        WHERE Requests.Prisoner = Prisoner.Id
        AND   Requests.Date > getdate() - 90
        AND   RequestsStatus.Approved = 0
        AND   Requests.Type = 'Vacation')
) as Dangerous 
FROM Prisoner JOIN Imprisonment on Prisoner.Id = Imprisonment.Prisoner
WHERE Imprisonment.Prison = 6 AND Imprisonment.IsCurrent = 1
ORDER BY Imprisonment.ArrivalDate DESC

From the point of view of database access, we are in a much better position; we now have only a single query to execute. For that matter, we can change the IPrisonerDangerousEvaluator implementations to perform their evaluation in the database, retaining the nice OO model that we have. It would look something like this:

public class NotConvicted : IPrisonerDangerousEvaluator
{
    public void AddSelectionCriteria(DetachedCriteria crit)
    {
        crit.Add(Restrictions.Not(Restrictions.Eq("Status", PrisonerStatus.Convicted)));
    }
}

public class DisciplinaryNoticeInLast2Months : IPrisonerDangerousEvaluator
{
    public void AddSelectionCriteria(DetachedCriteria crit)
    {
        crit.CreateCriteria("DisciplinaryNotices")
            .Add(Restrictions.Gt("Date", DateTime.Today.AddMonths(-2)));
    }
}

Note that doing something like this requires a fairly interesting infrastructure, to make sure that the results are consistent and that the rules aren’t stepping on each other’s toes.
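
One possible shape for that infrastructure, as a sketch (my illustration, not code from the reviewed application): instead of letting each rule mutate a shared criteria, have each rule produce its own criterion, and OR them all together so that the whole rule set becomes a single query. Rules that need to look at other tables could be expressed with Subqueries.Exists.

public interface IPrisonerDangerousCriterion
{
    // each rule knows how to express itself as a criterion
    ICriterion BuildCriterion();
}

public static class DangerousPrisonersQuery
{
    public static DetachedCriteria Build(IEnumerable<IPrisonerDangerousCriterion> rules)
    {
        // a prisoner is dangerous if any single rule matches
        var anyRuleMatches = Restrictions.Disjunction();
        foreach (var rule in rules)
            anyRuleMatches.Add(rule.BuildCriterion());

        return DetachedCriteria.For<Prisoner>()
            .Add(anyRuleMatches);
    }
}

public class NotConvictedCriterion : IPrisonerDangerousCriterion
{
    public ICriterion BuildCriterion()
    {
        return Restrictions.Not(Restrictions.Eq("Status", PrisonerStatus.Convicted));
    }
}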

Problem solved?

Not… really.

Take a look at the SQL that we have above. This query is hitting 8 tables, and please remember that I explicitly stated that I kept the number of things that make a prisoner dangerous limited; there are about forty rules relating to this, not half a dozen. But even with just half a dozen, trying to optimize the above query is going to be… problematic.

This is where we figure out something really interesting. In most systems, the number of reads far outweighs the number of writes. The most basic optimization that we can do is to move work from the read phase of the application to the write phase, since the write phase executes far less often.

It seems like it would be a very simple solution for the application to execute the rules on save and set an IsDangerous property, right? Well, it would, except that there are so many different places in the application that can affect this. Remember, there are a lot of rules, and just about anything can change the result. So we would need to execute the rules whenever we change almost anything in the application.

Here, again, we see the importance of aggregates, because instead of spreading that logic all over the system, we can define Prisoner as the aggregate, and force all changes to it to occur through the entity. When saving a prisoner, we can then execute the rules.

However, there is another aspect: this can cause unacceptable performance when saving, since we expect the number of rules to grow and only become more complex over time. This looks like a good opportunity to shift work to a background process. There is also the matter that we have to process those rules every day anyway, to make sure that we re-evaluate any rules that depend on time.

I would build such a system using the following structure:

image

The web server would send a notification to the batch processing server whenever a prisoner aggregate root is updated. The batch server will then process all the rules for that prisoner. Once a day, the batch server will re-evaluate the rules for all prisoners. This solves the performance problem that we have when updating a prisoner. It introduces a new one, though: a problem of consistency. Now there is going to be some delay between updating a prisoner and updating the IsDangerous status. In practice, I would expect the window of inconsistency to be on the order of the time it takes to process all the rules. I would be extremely surprised if it was more than a second, and probably a lot less.

In the interim, we can show in the UI that the prisoner’s danger status is unknown. The UI can then query the application periodically, using Ajax, until the danger status becomes known again. From the user experience point of view, there would be almost no change, but in cases where the evaluation takes long enough for the user to notice, it will be made explicit what is going on.

Yes, it is a more complex solution, but it is also one that neatly avoids all the complications that we have run into during this post.

* I am sorry, but this post is going to be chock-full of jokes that only I will probably understand.

NHibernate – Get Thou Out Of My Database – 2nd Edition

Following up on my previous post, the customer has complained about table names like [tbl_-1434067361]; apparently they felt that this was abusing their naming policy. I told them that while I understood their feelings, it did meet their naming policy. I then got a new naming policy that stated that numbers are not allowed in column or table names and (showing forethought) that table names must be composed of valid English words.

I, of course, decided that if this is what they wanted, they would get just that. And created this:

image

The words.txt file was taken from this URL: http://www.puzzlers.org/pub/wordlists/pocket.txt
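
Since the code itself is shown only as an image, here is a rough sketch of the shape such a thing might take. This is my reconstruction, not the actual code from the screenshot: an NHibernate naming strategy that deterministically hashes every table and column name into a word from words.txt.

public class EnglishWordsNamingStrategy : INamingStrategy
{
    private readonly string[] words = File.ReadAllLines("words.txt");

    public string ClassToTableName(string className)
    {
        return "tbl_" + WordFor(className);
    }

    public string PropertyToColumnName(string propertyName)
    {
        return "col_" + WordFor(propertyName);
    }

    public string TableName(string tableName)
    {
        return "tbl_" + WordFor(tableName);
    }

    public string ColumnName(string columnName)
    {
        return "col_" + WordFor(columnName);
    }

    // the remaining INamingStrategy members are omitted here for brevity

    private string WordFor(string name)
    {
        // a stable mapping from the original name to a dictionary word
        return words[Math.Abs(name.GetHashCode()) % words.Length];
    }
}

// wired up before building the session factory:
// configuration.SetNamingStrategy(new EnglishWordsNamingStrategy());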

The result is exactly per their specification:

create table [tbl_colonel] (
  [col_verbiage] INT IDENTITY NOT NULL,
   [col_unsparing] NVARCHAR(255) null,
   [col_indomitable] NVARCHAR(255) null,
   [col_bulldog] BIT null,
   [col_thank] DATETIME null,
   primary key ([col_verbiage])
)
create table [tbl_stump] (
  [col_upheaval] INT IDENTITY NOT NULL,
   [col_promissory] NVARCHAR(255) null,
   [col_predecessor] NVARCHAR(255) null,
   [col_chafer] NVARCHAR(255) null,
   [col_unyoke] INT null,
   [col_vise] NVARCHAR(255) null,
   PostId INT null,
   primary key ([col_upheaval])
)

create table [tbl_reprieve] (
  [col_wherewith] INT IDENTITY NOT NULL,
   [col_wolf] VARBINARY(8000) null,
   [col_legendary] NVARCHAR(255) null,
   [col_ago] NVARCHAR(255) null,
   [col_carabineer] DATETIME null,
   [col_referee] NVARCHAR(255) null,
   primary key ([col_wherewith])
)

The fun part about this approach is that you still get a great level of security, while maintaining the naming convention. Even more than that, you get queries like:

SELECT col_verbiage, col_indomitable, col_bulldog, col_wolf, col_legendary, col_referee
FROM tbl_colonel JOIN tbl_reprieve ON tbl_reprieve.col_referee = tbl_colonel.col_unsparing
WHERE col_ago = 'Fun@house.at'

It is obvious that we are getting users & blogs with a specific email, right?

This is actually much harder than just numeric values, because this is going to really mess with your mind.

Nitpicker corner: Yes, this is a humorous post, don’t take it seriously, and please don’t really do it to unsuspecting customers / DBAs. Unleash it on silly integration teams only.

If stupidity was money, I would be rich

Here is a crash report that I got from NH Prof error reporting:

System.NullReferenceException: Object reference not set to an instance of an object.
  at HibernatingRhinos.NHibernate.Profiler.Client.Model.Sessions.RecentStatements.set_SelectedStatement(IStatementModel value)
  at HibernatingRhinos.NHibernate.Profiler.Client.Model.Sessions.RecentStatements.Clear()
  at HibernatingRhinos.NHibernate.Profiler.Client.Model.SessionPresenter.Clear()
  at HibernatingRhinos.NHibernate.Profiler.Client.Model.ApplicationModel.Clear()
  at HibernatingRhinos.NHibernate.Profiler.Client.Commands.Clear.Execute(Object parameter)

This seems pretty standard, doesn’t it? Let us look at SelectedStatement:

image

There isn’t a possibility of a null reference exception, even the RaisePropertyChanged is safe from NRE.

So, what was going on? I started by dismissing this as a ghost story, but I got several reports of it. Then I remembered that I am not the only one working on the profiler (well, that is a lie, I got a ‘someone else updated this file’ when I tried to commit) and updated. Here is the updated version:

image

And there it was, right on the third line of the setter.

Lesson learned? Don’t be stupid, svn update.

Connection Pooling: Implementation

Given the following contract:

/// <summary>
/// Thread Safety - This is NOT a thread safe connection
/// Exception Safety - After an exception is thrown, it should be disposed and not used afterward
/// Connection Pooling - It is expected that this will be part of a connection pool
/// </summary>
public class DistributedHashTableStorageClient

I decided that I really didn’t want to pass that responsibility to the client, and that I wanted to handle it inside my library. Here is what I came up with:

public class DefaultConnectionPool
{
    private static readonly ILog log = LogManager.GetLogger(typeof (DefaultConnectionPool));
    readonly object locker = new object();

    private readonly Dictionary<NodeEndpoint, LinkedList<PooledDistributedHashTableStorageClientConnection>> pooledConnections =
        new Dictionary<NodeEndpoint, LinkedList<PooledDistributedHashTableStorageClientConnection>>();

    public IDistributedHashTableStorage Create(NodeEndpoint endpoint)
    {
        PooledDistributedHashTableStorageClientConnection storage = null;
        lock (locker)
        {
            LinkedList<PooledDistributedHashTableStorageClientConnection> value;
            if (pooledConnections.TryGetValue(endpoint, out value) && value.Count > 0)
            {
                storage = value.First.Value;
                value.RemoveFirst();
            }
        }
        if (storage != null)
        {
            if (storage.Connected == false)
            {
                log.DebugFormat("Found unconnected connection in the pool for {0}", endpoint.Sync);
                try
                {
                    storage.Dispose();
                }
                catch (Exception e)
                {
                    log.Debug("Error when disposing unconnected connection in the pool", e);
                }
            }
            else
            {
                return storage;
            }
        }
        log.DebugFormat("Creating new connection in the pool to {0}", endpoint.Sync);
        return new PooledDistributedHashTableStorageClientConnection(this, endpoint);
    }

    private void PutMeBack(PooledDistributedHashTableStorageClientConnection connection)
    {
        lock (locker)
        {
            LinkedList<PooledDistributedHashTableStorageClientConnection> value;
            if (pooledConnections.TryGetValue(connection.Endpoint, out value) == false)
            {
                pooledConnections[connection.Endpoint] = value = new LinkedList<PooledDistributedHashTableStorageClientConnection>();
            }
            value.AddLast(connection);
        }
        log.DebugFormat("Put connection for {0} back in the pool", connection.Endpoint.Sync);
    }

    class PooledDistributedHashTableStorageClientConnection : DistributedHashTableStorageClient
    {
        private readonly DefaultConnectionPool pool;

        public PooledDistributedHashTableStorageClientConnection(
            DefaultConnectionPool pool,
            NodeEndpoint endpoint) : base(endpoint)
        {
            this.pool = pool;
        }

        public bool Connected
        {
            get { return client.Connected; }
        }

        public override void Dispose()
        {
            if(Marshal.GetExceptionCode() != 0)//we are here because of some sort of error
            {
                log.Debug("There was an error during the usage of pooled client connection, will not return it to the pool (may be poisoned)");
                base.Dispose();
            }
            else if(Connected == false)
            {
                log.Debug("The connection was disconnected, will not return connection to the pool");
                base.Dispose();
            }
            else
            {
                pool.PutMeBack(this);
            }
        }
    }
}
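
For completeness, usage might look something like this (a sketch, assuming that IDistributedHashTableStorage is disposable, which the contract comment implies; the address is made up):

var pool = new DefaultConnectionPool();
var endpoint = new NodeEndpoint
{
    Sync = "net.tcp://localhost:2201/storage" // a made-up address
};

// either reuses a pooled connection or opens a new one
using (var storage = pool.Create(endpoint))
{
    // ... use the storage client here ...
    // Dispose returns the connection to the pool, unless an exception is
    // in flight or the socket has disconnected
}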

I think that should pretty much cover everything I need.

Thoughts?

NHibernate – Get Thou Out Of My Database

image

There are evil people in this world, and some of them want access to my database. Unfortunately, they are often part of that nasty integration team and they try to integrate directly into my database. I tried beating them with clubs and lobbing arguments about letting other people mess with my implementation details, but they have been persistent. That is when I reached for a technological solution to the problem.

I want to emphasize that this is the nuclear option and you really want to consider it before going down that route.

We are going to use NHibernate to do that, naturally. Here is how:

image 

Which results in the following schema:

create table [tbl_-1434067361] (
   [col_287061521] INT IDENTITY NOT NULL,
   [col_4699698] INT not null,
   [col_-1966747349] NVARCHAR(255) null,
   [col_-649855325] NVARCHAR(255) null,
   [col_-649855326] NVARCHAR(255) null,
   [col_775690683] NVARCHAR(255) null,
   [col_-2127361396] NVARCHAR(255) null,
   [col_-1334581412] NVARCHAR(255) null,
   primary key ([col_287061521])
)

To make sure that we follow procedure, we are even using the naming convention of the organization! Lovely, isn’t it? It is obvious that this is my people table, right?

All arguments against this schema can be answered using: “It is more secured”.

It might be the nuclear option, but it tends to work :-)

Nitpicker corner: No, I don’t suggest you would do this, that is why the code is an image. This is just an example of how to programmatically modify the configuration and mapping. I only used it once, during a demo to the integration people, to get them to leave my DB alone. If you push something like that for production you should probably be shot.

WCF works in mysterious ways

Here is the result of about two hours of trying to figure out what WCF is doing:

class Program
    {
        static private readonly Binding binding = new NetTcpBinding
        {
            OpenTimeout = TimeSpan.FromMilliseconds(500),
            CloseTimeout = TimeSpan.FromMilliseconds(250),
            ReaderQuotas =
                {
                    MaxArrayLength = Int32.MaxValue,
                    MaxBytesPerRead = Int32.MaxValue,
                    MaxNameTableCharCount = Int32.MaxValue,
                    MaxDepth = Int32.MaxValue,
                    MaxStringContentLength = Int32.MaxValue,
                },
            MaxReceivedMessageSize = Int32.MaxValue,
        };
        static void Main()
        {
            try
            {
                var uri = new Uri("net.tcp://" + Environment.MachineName + ":2200/master");
                var serviceHost = new ServiceHost(new DistributedHashTableMaster(new NodeEndpoint
                {
                    Async = uri.ToString(),
                    Sync = uri.ToString()
                }));
                serviceHost.AddServiceEndpoint(typeof(IDistributedHashTableMaster),
                                               binding,
                                               uri);

                serviceHost.Open();

                var channel =
                    new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri))
                        .CreateChannel();
                channel.Join();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }

        }
    }

    [ServiceBehavior(
        InstanceContextMode = InstanceContextMode.Single,
        ConcurrencyMode = ConcurrencyMode.Single,
        MaxItemsInObjectGraph = Int32.MaxValue
        )]
    public class DistributedHashTableMaster : IDistributedHashTableMaster
    {
        private readonly Segment[] segments;

        public DistributedHashTableMaster(NodeEndpoint endpoint)
        {
            segments = Enumerable.Range(0, 8192).Select(i =>
                                                        new Segment
                                                        {
                                                            AssignedEndpoint = endpoint,
                                                            Index = i
                                                        }).ToArray();
        }

        public Segment[] Join()
        {
            return segments;
        }
    }

    [ServiceContract]
    public interface IDistributedHashTableMaster
    {
        [OperationContract]
        Segment[] Join();
    }

    public class NodeEndpoint
    {
        public string Sync { get; set; }
        public string Async { get; set; }
    }

    public class Segment
    {
        public Guid Version { get; set; }

        public int Index { get; set; }
        public NodeEndpoint AssignedEndpoint { get; set; }
        public NodeEndpoint InProcessOfMovingToEndpoint { get; set; }

        public int WcfHatesMeAndMakeMeSad { get; set; }
    }

The problem? On my machine, executing this results in:

Maximum number of items that can be serialized or deserialized in an object graph is '65536'. Change the object graph or increase the MaxItemsInObjectGraph quota.

The freaky part? Do you see the WcfHatesMeAndMakeMeSad property? If I comment that one out, the problem goes away. Since MaxItemsInObjectGraph is set to int.MaxValue, I don’t know what else to do, and frankly, I am getting mighty tired of WCF doing stuff in unpredictable ways.
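
A likely explanation, for what it is worth: the MaxItemsInObjectGraph on the ServiceBehavior only covers the service side, while the client proxy created by the ChannelFactory has its own serializer quota that stays at the 65,536 default, and 8,192 segments with several members each is apparently enough to cross that line. Raising the quota on the client side would look something like this (a sketch of the commonly suggested workaround, untested here):

var factory = new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri));

foreach (var operation in factory.Endpoint.Contract.Operations)
{
    // the client side DataContractSerializer has a quota of its own,
    // independent of the ServiceBehavior attribute on the service class
    var serializerBehavior = operation.Behaviors.Find<DataContractSerializerOperationBehavior>();
    if (serializerBehavior != null)
        serializerBehavior.MaxItemsInObjectGraph = Int32.MaxValue;
}

var channel = factory.CreateChannel();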

Protocol Buffers & TcpClient, here I comes.

NHibernate – Beware of inadvisably applied caching strategies

One of the usual approaches for performance problems with most applications is to just throw caching at the problem until it goes away. NHibernate supports a very sophisticated caching mechanism but, by default, it is disabled. Not only that, but there are multiple levels of opt-ins that you have to explicitly go through before you can benefit from caching.

Why is that?

The answer is quite simple: caching is an incredibly sensitive topic, involving such things as data freshness, target size, repetitive requests, etc. Each and every time I have seen caching used as a hammer, it ended up in tears, with a lot of micro management of the cache and quite a bit of frustration.

I want to give you an example, using the simple Blog->>Posts model, of what happens when we display a blog and its posts. The code could look like this:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var blog = session.Get<Blog>(2);
    foreach (var post in blog.Posts)
    {
        Console.WriteLine(post.Title);
    }
    tx.Commit();
}

And the mappings are:

<class name="Blog"
         table="Blogs">
    <cache usage="read-write"/>
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title"/>
    <property name="Subtitle" />
    <property name="AllowsComments" />
    <property name="CreatedAt" />
    <bag name="Posts" table="Posts" inverse="true">
        <cache usage="read-write"/>
        <key column="BlogId"/>
        <one-to-many class="Post"/>
    </bag>
</class>

<class name="Post"
             table="Posts">
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title" />
    <many-to-one name="Blog"
                             column="BlogId"/>
</class>

Can you see the horrible issue here? You probably can’t yet, but you will in a moment. Let us see what is going to happen in the first run of this code:

image

That is about as good as you can make it. But what about the second time?

image

Ouch!

What just happened?!

Well, we loaded the blog from the cache, and then we loaded the Blog’s Posts collection from the cache. So far, it is working really nicely for us. However, the next thing we see is a huge SELECT N+1, and we end up with a lot more queries in the cached scenario than in the non-cached one.

The problem is that when we cache a collection, we aren’t caching the data in that collection. We are only caching the ids, which means that NHibernate gets the collection of ids and then tries to resolve them one by one. Remember that I said that the mapping above has a horrible problem? While the Posts collection is cached, the Posts themselves are not, requiring NHibernate to go to the database for each and every one of them.

Have I said ouch already? Be careful what you cache, and make sure that you aren't doing caching in a way that will actively harm you.

The same applies to the query cache as well: if you have a cached query that loads entities, you want to make sure that the entities are also cached.
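
In this example, the fix is to make the Post entity itself cacheable as well, so that resolving those ids hits the second level cache instead of the database. Something along these lines, mirroring the Blog mapping above:

<class name="Post"
             table="Posts">
    <!-- caching the entity, not just the collection of ids -->
    <cache usage="read-write"/>
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title" />
    <many-to-one name="Blog"
                             column="BlogId"/>
</class>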

NHibernate Mapping – <map/>

I am not going to talk about all the options that NHibernate has for collections, I already did that for <set/>, and most of the options are pretty similar. Instead, I am going to show off the unique features of NHibernate’s <map/>, and then show how crazy you can get with it.

Let us start with the simplest possible scenario, a key/value strings collection, which looks like this:

public class User
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IDictionary<string, string> Attributes { get; set; }
}

And the mapping is pretty simple as well:

<class name="User" table="Users">
    <id name="Id">
        <generator class="hilo"/>
    </id>
    <property name="Name"/>

    <map name="Attributes" table="UserAttributesAreBoring">
        <key column="UserId"/>
        <index column="AttributeName" type="System.String"/>
        <element column="Attributevalue" type="System.String"/>
    </map>
</class>

How are we using this? Let us start by writing it to the database:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    id = session.Save(new User
    {
        Name = "a",
        Attributes = new Dictionary<string, string>
        {
            {"height", "1.9cm"},
            {"x","y"},
            {"is_boring","true, very much so"}
        },
    });
    tx.Commit();
}

Which gives us:

image

And when we want to read it, we just use:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    Console.WriteLine(user.Name);

    foreach (var kvp in user.Attributes)
    {
        Console.WriteLine("\t{0} - {1}", kvp.Key,
            kvp.Value);
    }

    tx.Commit();
}

And the SQL:

image

This simple mapping is quite boring, so let us try something a bit more interesting and map a complex value type:

public virtual IDictionary<string, Position> FavoritePlaces { get; set; }

// Position is a value type, defined as:
public class Position
{
    public decimal Lang { get; set; }
    public decimal Lat { get; set; }

    public override string ToString()
    {
        return string.Format("Lang: {0}, Lat: {1}", Lang, Lat);
    }
}

Which we can map using:

<map name="FavoritePlaces" table="UsersFavoritePlaces">
    <key column="UserId"/>
    <index column="PlaceName" type="System.String"/>
    <composite-element class="Position">
        <property name="Lang"/>
        <property name="Lat"/>
    </composite-element>
</map>

Using composite-element, we can map value types (types that have no identifiers and only exist as part of their parent). We can use it with the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    id = session.Save(new User
    {
        Name = "a",
        FavoritePlaces = new Dictionary<string, Position>
        {
            {"home", new Position{Lang = 10,Lat = 94.4m}},
            {"vacation house", new Position{Lang = 130,Lat = 194.4m}}
        },
    });
    tx.Commit();
}

And that gives us:

image

By now you are probably already familiar with what reading the FavoritePlaces collection would look like, so we won’t bother. Instead, let us look at a more complex example: what happens if we want the key of the map to be a complex value type as well? Let us look at this:

public virtual IDictionary<FavPlaceKey, Position> ComplexFavoritePlaces { get; set; }

// FavPlaceKey is another value type
public class FavPlaceKey
{
    public virtual string Name { get; set; }
    public virtual string Why { get; set; }

    public override string ToString()
    {
        return string.Format("Name: {0}, Why: {1}", Name, Why);
    }
}

We can map this collection as:

<map name="ComplexFavoritePlaces" table="UsersComplexFavoritePlaces" >
    <key column="UserId"/>
    <composite-index class="FavPlaceKey">
        <key-property  name="Name"/>
        <key-property name="Why"/>
    </composite-index>
    <composite-element class="Position">
        <property name="Lang"/>
        <property name="Lat"/>
    </composite-element>
</map>

And using this is pretty simple as well:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    id = session.Save(new User
    {
        Name = "a",
        ComplexFavoritePlaces = new Dictionary<FavPlaceKey, Position>
        {
            {
                new FavPlaceKey
                {
                    Name = "home",
                    Why = "what do you mean, why?"
                },
                new Position
                {
                    Lang = 123,
                    Lat = 29.3m
                }
            }
        },
    });
    tx.Commit();
}

Which results in:

image

But we still aren’t done, I am happy to say. So far we have dealt only with value types; what about using entities? You can probably guess:

id = session.Save(new User
{
    Name = "a",
    CommentorsOnMyPosts = new Dictionary<User, Post>
    {
        {anotherUser, post1}
    }
});

With the mapping:

<map name="CommentorsOnMyPosts" table="UserCommentorsOnPosts">
    <key column="UserId"/>
    <index-many-to-many column="CommentorUserId" class="User"/>
    <many-to-many class="Post" column="PostId"/>
</map>

Giving us:

image

But how are we going to read it?

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    Console.WriteLine(user.Name);

    foreach (var kvp in user.CommentorsOnMyPosts)
    {
        Console.WriteLine("\t{0} - {1}", kvp.Key.Name,
            kvp.Value.Title);
    }

    tx.Commit();
}

The resulting SQL is quite interesting:

image

When we load the commentors, we join directly to the posts (the value), but we have to use a separate query to load the users (the key). This is something that you should be aware of when using a <map/> with an entity key.

I would like to point out that I have still not covered all the options for <map/>; there are even more (although IDictionary<K, IList<V>> is not something that is possible to map, because of the problematic SQL statements that would be required to do so).

And, of course, you are free to mix all types as you would like, entity key and value type, or a string to an entity, or a complex value type to an entity.

Happy (n)hibernating… :-)


NHibernate Mapping – <list/>

I am not going to talk about all the options that NHibernate has for collections, I already did that for <set/>, and most of the options are pretty similar. Instead, I am going to show just the unique stuff about NHibernate’s <list/>. While <set/> is an unordered collection of unique elements, a <list/> is a collection of elements where the order matters and duplicate elements are allowed (because there is a difference between item A at index 0 and at index 4).

Let us look at what it means by looking at the class itself:

public class User
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<Phone> EmergencyPhones { get; set; }
}

And the mapping:

<class name="User" table="Users">
    <id name="Id">
        <generator class="hilo"/>
    </id>
    <property name="Name"/>
    <list name="EmergencyPhones" table="UsersToEmergencyPhones" cascade="all">
        <key column="UserId"/>
        <index column="Position"/>
        <many-to-many column="PhoneId" class="Phone"/>
    </list>
</class>

Note that I am using a <many-to-many/> mapping; if I wanted to use a <one-to-many/> mapping, the association columns would be located on the Phones table. Now, let us see how we work with it:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    session.Save(new User
    {
        Name = "a",
        EmergencyPhones = new List<Phone>
        {
            new Phone{Name = "Pop", Number = "123-456-789"},
            new Phone{Name = "Mom", Number = "456-789-123"},
            new Phone{Name = "Dog", Number = "789-456-123"},
        }
    });
    tx.Commit();
}

This will produce the following SQL:

image

And now, what happens if we want to read it? Here is the code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    foreach (var emergencyPhone in user.EmergencyPhones)
    {
        Console.WriteLine(emergencyPhone.Number);
    }
    tx.Commit();
}

And the generated SQL:

image 

What happens if we update the list? Let us look at the code (anyone reminded of CS 101?):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    user.EmergencyPhones.Add(new Phone{Name = "Cat", Number = "1337-7331"});
    var temp = user.EmergencyPhones[2];
    user.EmergencyPhones[2] = user.EmergencyPhones[0];
    user.EmergencyPhones[0] = temp;
    tx.Commit();
}

Which results in:

image

Beware of holes in the list! One thing that you should be careful about is that removing an entry from the list without adjusting the indexes of all the other entries would cause… problems.

If you were using Hibernate (Java), you would get an exception and that would be it. NHibernate, however, is much smarter than that, and will give you a list back, with one caveat. Let us say that we have the following data in the database:

image

Note that we don’t have a row in position #2. When we load the list, we will get: [ phone#3, phone#2, null, phone#4 ]

A lot of the time, we are processing things like this:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    foreach (var emergencyPhone in user.EmergencyPhones)
    {
        Console.WriteLine(emergencyPhone.Number);
    }
    tx.Commit();
}

When this code encounters a hole in the list, it is going to explode in an unpleasant way. You should be aware of this possibility and either protect yourself against it or make sure that whenever you delete from a list, you rearrange it so there are no holes.
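
If you cannot guarantee that, a defensive version of the read loop above can simply skip the holes:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var user = session.Get<User>(id);
    // holes in the list come back as nulls, so filter them out
    foreach (var emergencyPhone in user.EmergencyPhones.Where(p => p != null))
    {
        Console.WriteLine(emergencyPhone.Number);
    }
    tx.Commit();
}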

NHibernate Queries – Should I use HQL or Criteria?

This is a common question that I get asked: what is better? What should I use? The actual answer is complex and, at least in part, depends on personal preference. More than that, it also depends on the types of queries that you have.

In general, all queries fall into one of the following distinctions:

  • Select by Id
    • select * from Blogs where id = @id
  • Select by fixed criteria (parameters may change, but the actual query itself is fixed)
    • select top 10 * from Posts where PublishDate <= @today order by PublishDate desc
  • Select by dynamic criteria
    • select * from SupportTickets where Title like @title
    • select * from SupportTickets join Resolutions on … where SupportTickets.Title = @title and Resolutions.Status = ‘Resolved’

The first is usually best served by using Get or Load, since they also check NHibernate’s caches and are intended to handle just this scenario.

The second is by far the most common; it represents performing some kind of query (not by id) using fixed logic. We may change the parameter values, but the query shape itself is fixed and cannot be changed. This is usually best served by using HQL. HQL is how NHibernate exposes most of its power, and it allows you to operate on your domain model using a set-based, object-oriented language. Its similarity to SQL means that it is often very intuitive to write and read.
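
For example, the second query in the list above might look like this in HQL (a sketch; the entity and property names are my assumptions):

using (var s = sf.OpenSession())
{
    // the "top 10 posts published before today" query from the list above
    var recentPosts = s.CreateQuery(
        "from Post p where p.PublishDate <= :today order by p.PublishDate desc")
        .SetDateTime("today", DateTime.Today)
        .SetMaxResults(10) // the TOP 10 part
        .List<Post>();
}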

HQL’s biggest weakness, however, is when you want to perform queries using some dynamic criteria. For example, if you want to have a search page with multiple optional search fields. For that scenario, we have the Criteria API, which allows us to compose queries dynamically, painlessly and easily.

There is also a slightly different side to this: since queries are composed dynamically, the Criteria API also allows us to create queries programmatically, which is very useful for things like query composition, dynamically enhancing queries, etc.
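
As a sketch of what that looks like for the support ticket search above (the names are my assumptions), notice how each optional field only adds to the query when it is actually used:

using (var s = sf.OpenSession())
{
    // title and searchResolvedOnly come from the search form
    ICriteria criteria = s.CreateCriteria<SupportTicket>();

    if (string.IsNullOrEmpty(title) == false)
        criteria.Add(Restrictions.Like("Title", title, MatchMode.Anywhere));

    if (searchResolvedOnly)
        criteria.CreateCriteria("Resolutions")
            .Add(Restrictions.Eq("Status", "Resolved"));

    var tickets = criteria.List<SupportTicket>();
}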

Beyond those rough guidelines, you have to consider that HQL exposes by far the most of NHibernate’s capabilities, though the Criteria API is almost as complete. The rest is just a matter of personal taste.