Ayende @ Rahien

Refunds available at head office

A horrible, HORRIBLE, horrible test

I think, without question, that this is one of the most horrible nasty tests that I have seen. It has been ported as is from the java code, and you can literally feel the nastiness in reading it:

[Test]
public void IndexedEmbeddedAndCollections()
{
	Author a = new Author();
	a.Name = "Voltaire";
	Author a2 = new Author();
	a2.Name = "Victor Hugo";
	Author a3 = new Author();
	a3.Name = "Moliere";
	Author a4 = new Author();
	a4.Name = "Proust";

	Order o = new Order();
	o.OrderNumber = "ACVBNM";

	Order o2 = new Order();
	o2.OrderNumber = "ZERTYD";

	Product p1 = new Product();
	p1.Name = "Candide";
	p1.Authors.Add(a);
	p1.Authors.Add(a2); //be creative

	Product p2 = new Product();
	p2.Name = "Le malade imaginaire";
	p2.Authors.Add(a3);
	p2.Orders.Add("Emmanuel", o);
	p2.Orders.Add("Gavin", o2);


	ISession s = OpenSession();
	ITransaction tx = s.BeginTransaction();
	s.Persist(a);
	s.Persist(a2);
	s.Persist(a3);
	s.Persist(a4);
	s.Persist(o);
	s.Persist(o2);
	s.Persist(p1);
	s.Persist(p2);
	tx.Commit();

	s.Clear();

	IFullTextSession session = Search.CreateFullTextSession( s );
	tx = session.BeginTransaction();

	QueryParser parser = new MultiFieldQueryParser( new string[] { "name", "authors.name" }, new StandardAnalyzer() );

	Lucene.Net.Search.Query query = parser.Parse( "Hugo" );
	IList result = session.CreateFullTextQuery( query ).List();
	Assert.AreEqual( 1, result.Count, "collection of embedded ignored" );

	//update the collection
	Product p = (Product) result[0];
	p.Authors.Add( a4 );

	//PhraseQuery
	query = new TermQuery( new Term( "orders.orderNumber", "ZERTYD" ) );
	result = session.CreateFullTextQuery( query).List();
	Assert.AreEqual( 1, result.Count, "collection of untokenized ignored" );
	query = new TermQuery( new Term( "orders.orderNumber", "ACVBNM" ) );
	result = session.CreateFullTextQuery( query).List();
	Assert.AreEqual( 1, result.Count, "collection of untokenized ignored" );

	tx.Commit();

	s.Clear();

	tx = s.BeginTransaction();
	session = Search.CreateFullTextSession( s );
	query = parser.Parse( "Proust" );
	result = session.CreateFullTextQuery( query ).List();
	//HSEARCH-56
	Assert.AreEqual( 1, result.Count, "update of collection of embedded ignored" );

	// Tidy up
	s.Delete(a);
	s.Delete(a2);
	s.Delete(a3);
	s.Delete(a4);
	s.Delete(o);
	s.Delete(o2);
	s.Delete(p1);
	s.Delete(p2);
	tx.Commit();

	s.Close();
}

Points to anyone who wants to start a list of how many good test metrics this test violates.

Oh, and right now, this test fails.

Buying the pot as a way of winning the database wars?

The news is just out: Oracle is buying Sun.

This is especially interesting since Oracle has previously bought InnoDB (a key component for MySQL in the enterprise) and Sun has bought MySQL. This means that, off the top of my head, Oracle is now the owner of the following database products:

  • Oracle DB
  • BerkeleyDB
  • MySQL & InnoDB

I am pretty sure that they have others, but even so, that is quite a respectable list, I should think.

Of course, with Sun, Oracle is also getting Java…

Where is the ROI?

Justin has a post called: Keep your IQueryable in check, in which he basically says that while he finds value in the Query Object pattern that I recently talked about, he is concerned by the fact that I am returning an IQueryable (or ICriteria or IQuery) from the query object. The problem is that the UI layer can then modify the query in potentially undesirable ways. The example that he gave is:

latePayingCustomQuery  
  .Where(c => c.LastName.StartsWith("E"))  
  .Skip((page - 1) * itemCount)  
  .Take(10);  

Which performs a query against an unindexed string column, which is slow.

He suggests limiting what the user can do by providing an IPagable and ISortable abstraction that would limit the UI layer's ability to abuse the query.
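To make the tradeoff concrete, such a constrained surface might look something like the sketch below; the interface and its name are hypothetical, not Justin's actual proposal, but they capture the idea that the UI layer can page and sort, and nothing else.

// Hypothetical sketch of the kind of restricted abstraction being proposed:
// the UI can page and sort, but cannot add arbitrary Where clauses.
public interface IPagedSortableQuery<T>
{
	IPagedSortableQuery<T> OrderBy(string propertyName, bool descending);
	IPagedSortableQuery<T> Page(int pageNumber, int pageSize);
	IList<T> List();
}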

My response to that is: And…?

I am not in favor of doing things to protect developers, but there is also another side to that.

What is the worst thing that can possibly happen? That part of the application will be slow. We will find it during testing, or (heaven forbid) in production.

And…?

How is this in any way different than anything else that we are doing? Presumably the developer who wrote the code didn’t write it just because he felt like it, there was a business reason behind it. At that point, we treat this issue as a standard perf issue, and fix it.

Compare the cost of fixing that in the rare cases where it will happen with the cost of creating the limited abstraction, forcing developers to deal with it, and so on. More than that, I can assure you that you will need to break out of the straightjacket at some point, so why put it on in the first place?

This is a question that I routinely ask my clients. Why do you want that? “To stop developers from…”

To which I reply: “What is the cost of doing it this way? What is the cost of a mistake? What is the cost of fixing that mistake? What is the difference between that and just another bug?”

There is a reason that I chose the ALT.Net logo to be running with scissors. Power tools are here to be used. Yes, there are things that you want to hide, but those are by far the exception, not the norm.

NHibernate Mapping - &lt;join/&gt;

We have previously explored the one-to-one mapping, which lets you create a 1:1 association in the database, but there is actually another way to map several tables into the object model. We aren’t constrained by the database model, and we can merge several tables into a single entity.

We do that using the <join/> element:

<join
        table="tablename"                        (1)
        schema="owner"                           (2)
        catalog="catalog"                        (3)
        fetch="join|select"                      (4)
        inverse="true|false"                     (5)
        optional="true|false">                   (6)

        <key ... />

        <property ... />
        ...
</join>

Let us explore this a bit, assuming that we have the following database model:

image

And what we want is to map this to the following object model:

image

We can do this with the following mapping:

<class name="Person"
	 table="People">

	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="Name" />

	<join table="Addresses">
		<key column="PersonId"/>
		<property name="Line1"/>
		<property name="Line2"/>
		<property name="City"/>
		<property name="Country"/>
		<property name="ZipCode"/>
	</join>
</class>

And getting a Person will now result in the following SQL:

image

NHibernate will now take care of mapping everything to its place, and making sure that everything works just right.

By now you should be able to figure out what fetch means, and inverse should also be familiar. Optional is interesting: what it basically says is that we should always use an outer join to get the values from the Addresses table, and that if all the mapped properties are null, NHibernate won’t create a new row in the Addresses table.
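Assuming optional="true" is set on the join above, here is a quick sketch of what that means in practice:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	// with optional="true", saving a person whose address properties are all null
	// results in an insert into People only, and no row in Addresses
	var person = new Person { Name = "ayende" };
	session.Save(person);
	tx.Commit();
}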

On the face of it, it looks like a nice way to merge tables together, but that isn’t actually the primary reason for this feature. You can use it for that, for sure, but that is mostly useful in legacy databases. In most cases, your object model should be more granular than the database model, not the other way around.

The really interesting part about this feature is that it allows us to mix &amp; match inheritance strategies. It lets us create a table per hierarchy that stores all the extra fields in another table, for example. That significantly reduces the disadvantages of using table per hierarchy or table per subclass, since we can tell very easily what type we are dealing with, and act appropriately.

NHibernate Mapping - &lt;one-to-one/&gt;

In the database world, we have three kinds of associations: 1:m, m:1, m:n.

However, occasionally we want to have a one to one relationship. We could simulate it easily enough on the database side using two many-to-one relations, but that would require us to add the association column to both tables, and things get… tricky when it comes time to insert or update the database, because of the cycle that this creates.

NHibernate solves the problem by introducing a one-to-one mapping association, which allows you to define the two relationships based on a single column in the database, which controls the two-way association.

<one-to-one
        name="PropertyName"                                (1)
        class="ClassName"                                  (2)
        cascade="all|none|save-update|delete"              (3)
        constrained="true|false"                           (4)
        fetch="join|select"                                (5)
        property-ref="PropertyNameFromAssociatedClass"     (6)
        access="field|property|nosetter|ClassName"         (7)
/>

1, 2, 3, 6, 7 were all discussed elsewhere, so I’ll skip them and move directly to showing how this can be used.

We have the following object model:

image

And the database model:

image

Note that while in the object model we have a bidirectional mapping, in the database we have only a single reference in the Employees table. In the relational model, all associations are naturally bidirectional, but that is not true in the object model. In order to bridge this inconsistency, we map them as:

<class name="Employee"
		table="Employees">

	<id name="Id">
		<generator class="native"/>
	</id>
	
	<property name="Role"/>

	<many-to-one name="Person"
		unique="true"
		column="Person"/>
</class>

<class name="Person"
		table="People">

	<id name="Id">
		<generator class="native"/>
	</id>
	<property name="Name" />
	<one-to-one name="Employee"
			class="Employee"/>
</class>

We have a unique many-to-one association from Employee to Person, but a one to one from Person to Employee. This will reuse the many-to-one association defined in the Employee mapping.

Let us see how this works for saving and loading the data:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = new Person
	{
		Name = "test",
	};
	var employee = new Employee
	{
		Person = person,
		Role = "Manager"
	};
	person.Employee = employee;
	session.Save(person);
	session.Save(employee);
	tx.Commit();
}

// person to employee
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var employee = session.Get<Person>(1).Employee;
	Console.WriteLine(employee.Role);
	tx.Commit();
}

// employee to person
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Employee>(1).Person;
	Console.WriteLine(person.Name);
	tx.Commit();
}

And the SQL that would be generated would be:

image

This is quite interesting. We can see that we insert the entities as we expect, but when we pull a person out, we do a join to the employee, to get the one-to-one association. For that matter, even in the second scenario, we do a join to get the associated employee.

The reason that we have to do it is quite interesting as well. NHibernate makes some guarantees about the way the object model and the database model map to one another. And one of those guarantees is that if there is no association in the database, we will get back a null in the object model.

Generally, this works very well, since we can tell whether an association exists or not using the value in the table (for many-to-one associations). But for one-to-one associations, if we want to keep this guarantee, we have to check the associated table to verify whether we need to have a null or a proxy there. That is somewhat annoying, but we can get around it by specifying constrained=”true”. This tells NHibernate that in this case, whenever there is a Person, there must also be a matching Employee. We can specify it like this:

<one-to-one name="Employee" 
	constrained="true"
	foreign-key="none"
	class="Employee"/>

Something else to note is that we must specify this with foreign-key=”none”, because otherwise NHibernate’s Schema Export feature would create two foreign keys for us, which would create a circular reference that wouldn’t allow us to insert anything into the database.

When setting this, we can see that there is a dramatic change in NHibernate’s behavior:

image

Instead of generating joins, NHibernate now uses standard selects to get the data. And we don’t have to pre-populate the information when loading the entity; we can delay that, as we usually do with NHibernate.

And the last thing that we will explore for <one-to-one/> is the fetch attribute. It defaults to select, so we have already seen how that works, but when we set fetch=”join”, we get an interesting flashback. Well, almost:

image

Again, we use a join to get the value upfront, but since we are now using constrained=”true”, we can use an inner join instead of a left outer join, which is more efficient in most cases.

How Microsoft should release guidance?

Phil Haack has a post about code sample taxonomy, in which he asks how Microsoft can ship high quality sample apps:

Obviously, this [shipping high quality samples] is what we should be striving for, but what do we do in the meantime? Stop shipping samples? I hope not.

Again, I don’t claim to have the answers, but I think there are a few things that could help. One twitter response made a great point:

a reference app is going to be grilled. Even more if it comes from the mothership. Get the community involved *before* it gets pub

Getting the community involved is a great means of having your code reviewed to make sure you’re not doing anything obviously stupid. Of course, even in this, there’s a challenge. Jeremy Miller made this great point recently:

We don't have our own story straight yet.  We're still advancing our craft.  By no means have we reached some sort of omega point in our own development efforts. 

In other words, even with community involvement, you’re probably going to piss someone off.

Um, no.

I feel that Jeremy’s point has been pulled out of context. While we may have disagreements about what constitutes the future direction of software development, that isn’t actually what caused the community to reject the bad samples from Microsoft.

No one said a word about Kobe because it didn’t use NHibernate, or wasn’t built with messaging or using DDD.

What we reject in those sample apps is that they are, put simply, bad code. There is nothing amorphous about this; we aren’t talking about subjective feelings, we are talking about very concrete, objective and real metrics that we can apply.

Bad Code

Not some omega point.

Bad Code

Not lack of understanding of the platform because the developers writing that were new to ASP.Net MVC.

Bad Code

That would have been bad code if it was written 5 years ago or 15 years ago.

That is the problem.

And I think that there is quite a simple answer for that. Stop shipping official guidance packages without involvement from the community.

Create a special CodePlex project where you are being explicit about putting things for review, not for publishing. After the guidance has been published, it is probably too late already.

Get some feedback from the community, then you can publish this in the usual places.

The DAL should go all the way to UI

My recent post caused quite a few comments, many of them about two major topics. The first is the mockability of my approach and the second is regarding the separation of layers. This post is about the second concern.

A typical application architecture usually looks like this:

image

 

This venerable structure is almost sacred for many people. It is also, incidentally, wrong.

The main problem is that data access concerns don’t end at the business layer; there are presentation concerns that affect them as well. Let us take a look at a common example. I want to show the invoices for a user:

image

Given that requirement, I quickly build my interface, implement it and throw it over the wall to the UI team to deal with it*.

Here is the interface that I came up with:

  • GetInvoicesForUser(userId)

Great, isn’t it? It satisfies all the business requirements that I have.

Except, my UI can’t actually work with this. We have to have paging there as well, and the only way to do paging using this API is to do that on the client side, which is probably bad. I grumble a little bit but change the interface to:

  • GetInvoicesForUser(userId, pageNum, pageSize)

Done, right?

Well, not really. I have a new UI requirement now: the user wants to be able to sort the grid by any column. Now I grumble even more, because this is harder, but I create the following interface:

  • GetInvoicesForUser(userId, pageNum, pageSize, orderByCol, orderByDesc)

And then they want to order by multiple columns, and then they…

Do you notice a theme here? A lot of the data access concerns that I have are not actually concerns that the layer above me has, they are one layer removed.

But there is a more important problem here, I am violating (repeatedly) the Open Closed Principle. As a reminder:

software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification

My data access is not open for extension, and it is certainly not closed for modification.

The problem is that I am trying to create a hard isolation between two pieces of the system that need to work very closely together, and as long as we are going to insist on this strict separation, we are going to have problems.

My current approach is quite different. I define queries, which contain the actual business logic for the query, but I pass that query onward to higher levels of my application. Any additional processing on the query (projections, paging, sorting, etc.) can be done there. In that way, my queries are closed for modification and open for extension, since they can be further manipulated by higher level code.

A good example of that would be:

public class LatePayingCustomerQuery
{
	public DateTime? DueDate {get;set;}
	public decimal? MinAmount {get;set;}
	public decimal? MaxAmount {get;set;}
	
	public IQueryable<Customer> GetQuery(ISession session)
	{
		var payments = from payment in session.Linq<Payment>()
					   where payment.IsLate
					   select payment;
	
		if(DueDate != null)
			payments = payments.Where( payment => payment.DueDate >= DueDate.Value );
			
		if(MinAmount != null)
			payments = payments.Where( payment => payment.Amount >= MinAmount.Value );
		
		if(MaxAmount != null)
			payments = payments.Where( payment => payment.Amount <= MaxAmount.Value );
		
		return from customer in session.Linq<Customer>()
			   where payments.Any( payment => customer == payment.Customer )
			   select customer;
	}
}
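
And here is a sketch of how a higher layer might consume it, layering its own concerns (ordering, paging) on top without touching the query logic itself; pageNumber and pageSize are assumed to come from the UI:

var query = new LatePayingCustomerQuery
{
	DueDate = DateTime.Today.AddDays(-30),
	MinAmount = 500m
};

// presentation concerns are applied by the caller, not baked into the query object
IList<Customer> customers = query.GetQuery(session)
	.OrderBy(customer => customer.Name)
	.Skip((pageNumber - 1) * pageSize)
	.Take(pageSize)
	.ToList();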

I hope that this clears the air a bit about how I approach data access using NHibernate.

* it is a joke, nitpickers will be shot.

Mocking NHibernate

My recent post caused quite a few comments, many of them about two major topics. The first is the mockability of my approach and the second is regarding the separation of layers.

This post is about the first concern. I don’t usually mock my database when using NHibernate. I use an in memory database and leave it at that. There are usually two common behaviors for loading data from the database. The first is when you need to load by primary key, usually using either Get or Load, the second is using a query.

If we are using Get or Load, this is extremely easy to mock, so I won’t touch that any further.
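For completeness, here is a minimal sketch of that, using Moq as the isolation framework (any would do); Customer and CustomerNameService are hypothetical stand-ins for the class under test:

// Stub ISession.Get so the code under test never touches a database.
var customer = new Customer { Id = 1, Name = "ayende" };

var session = new Mock<ISession>();
session.Setup(s => s.Get<Customer>(1)).Returns(customer);

var service = new CustomerNameService(session.Object); // hypothetical class under test
Assert.AreEqual("ayende", service.GetCustomerName(1));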

If we are using a query, I categorically don’t want to mock that. Querying is a business concern, and should be treated as such. Setting up an in memory database to be used with NHibernate is easy; you’ll have to wait until the 28th for the post about that to be published, but it is basically ten lines of code that you stick in a base class.
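To give a rough idea of what that base class looks like, here is a sketch assuming the SQLite in-memory provider; the exact SchemaExport overload varies a bit between NHibernate versions:

// Configuration / Environment come from NHibernate.Cfg, SchemaExport from NHibernate.Tool.hbm2ddl.
public abstract class InMemoryDatabaseTest : IDisposable
{
	private static Configuration configuration;
	private static ISessionFactory sessionFactory;

	protected readonly ISession session;

	protected InMemoryDatabaseTest(Assembly mappingsAssembly)
	{
		if (sessionFactory == null)
		{
			configuration = new Configuration()
				.SetProperty(Environment.ReleaseConnections, "on_close")
				.SetProperty(Environment.Dialect, typeof(SQLiteDialect).AssemblyQualifiedName)
				.SetProperty(Environment.ConnectionDriver, typeof(SQLite20Driver).AssemblyQualifiedName)
				.SetProperty(Environment.ConnectionString, "data source=:memory:")
				.AddAssembly(mappingsAssembly);
			sessionFactory = configuration.BuildSessionFactory();
		}

		session = sessionFactory.OpenSession();
		// the schema has to be created on the session's own in-memory connection
		new SchemaExport(configuration).Execute(false, true, false, session.Connection, null);
	}

	public void Dispose()
	{
		session.Dispose();
	}
}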

As such, I have no real motivation to try to abstract that away; I gain a lot, and lose nothing.

Posts vs. Comments

Just some interesting correlation:

image

Yesterday was the second most active day comments-wise, eclipsed only by my single foray into politics.

Another interesting statistic:

# Posts / Day    # of Days
      1             336
      2             287
      3             200
      4             165
      5             104
      6              64
      7              28
      8              32
      9              11
     10              11
     11               5
     12               7
     13               3
     14               1
     17               1
     22               1

This table shows the relation between the number of posts per day and the number of days on which I posted that many times.

So, for example, you can see that my record is (once and never again!) 22 posts per day, and that I quite frequently post more than a single post per day.

Repository is the new Singleton

I mentioned in passing that I don’t like the Repository pattern much anymore, and I have gotten a lot of responses to that. This is the answering post, and yes, the title was chosen to get a rise out of you.

There are actually two separate issues that need to be handled here. One of them is my issue with the actual pattern and the second is how the pattern is used. The most commonly used definition for Repository is the one in Patterns of Enterprise Application Architecture:

A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper, that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic.

A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes. Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.

Note that while the actual pattern descriptions in PoEAA and DDD are very nearly identical, the reasoning behind them is different, and the DDD repository is limited to aggregate roots only.

So, what is the problem with that?

The problem with this pattern is that it totally ignores the existence of mature persistence technologies, such as NHibernate. NHibernate already provides an illusion of in memory access; in fact, that is its sole reason for existing. Declarative queries, check. OO view of the persistence store, check. One way dependency between the domain and the data store, check.

So, what do I gain by using the repository pattern when I already have NHibernate (or similar, most OR/M have matching capabilities by now)?

Not much, really, except an additional abstraction. More than that, the details of persistence storage are:

  • Complex
  • Context sensitive
  • Important

Trying to hide that behind a repository interface usually leads us to a repository that has methods like:

  • FindCustomer(id)
  • FindCustomerWithAddresses(id)
  • FindCustomerWith..

It gets worse when you have complex search criteria and complex fetch plans. Then you are stuck either creating a method for each combination that you use or generalizing that. Generalizing that only means that you now have an additional abstraction that usually maps pretty closely to the persistent storage that you use.

From my perspective, that is additional code that doesn’t have to be written.

Wait, I can hear you say, but repositories encapsulate queries, and removing query logic duplication is one of the reasons for them in the first place.

Well, yes, but the repository isn’t the right place for that encapsulation. Queries are complex, and you want to encapsulate them in their own objects. In most cases, I have something like this:

image

GetQuery takes an ISession and returns an ICriteria, which means that my code gets the chance to set paging, ordering, fetching strategies, etc. That is not the responsibility of the query object, and trying to hide it only adds an additional abstraction that doesn’t actually give me anything.
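
Here is a rough sketch of what such a query object looks like (the names are illustrative):

public class LatePayingCustomerQuery
{
	public DateTime? DueDate { get; set; }

	public ICriteria GetQuery(ISession session)
	{
		// the query logic lives here; paging, ordering and fetching stay with the caller
		ICriteria customers = session.CreateCriteria(typeof(Customer));

		ICriteria payments = customers.CreateCriteria("Payments")
			.Add(Restrictions.Eq("IsLate", true));

		if (DueDate != null)
			payments.Add(Restrictions.Ge("DueDate", DueDate.Value));

		return customers;
	}
}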

I mentioned that I have two problems with the repository pattern, the second being the way it is being used.

Quite frankly, and here I fully share the blame, the Repository pattern is popular. A lot of people use it, mostly because of the DDD association. I am currently of the opinion that DDD should be approached with caution: if you don’t actually need it (and have the prerequisites for it, such as a business expert to work closely with, or an app that can actually benefit from it), trying to use DDD is probably going to be more painful than going without it.

More than that, the way that most people use a Repository more closely follows the DAO pattern, not the Repository pattern. But Repository sounds cooler, so they call it that.

My current approach for data access now is:

  • When using a database, use NHibernate’s ISession directly
  • Encapsulate complex queries into query objects that construct an ICriteria query that I can get and manipulate further
  • When using something other than a database, create a DAO for that, respecting the underlying storage implementation
  • Don’t try to protect developers

Let us see how many calls for my lynching we get now…

What do I expect from guidance?

Phil Haack has a post about code sample taxonomy. He suggests the following taxonomy:

  • Prototype
  • Demo
  • Sample
  • Production
  • Reference

I am not sure that I agree with his division. From my point of view, demo code needs to focus on one single aspect, mostly because we are going to talk around that code, and doing so means that I don’t have the time to deal with anything but the aspect that I need to talk about.

Sample code is a more fleshed out example.

I don’t make any distinction between reference and sample apps, maybe the size is different but that is about it.

What is important to follow is that for all of them, except maybe prototype (and I was bitten by that before), I have the same code quality expectations.

Note that I am saying code quality vs. production readiness. Most of the code that you see in demos or sample apps is not production ready. And that is fine.

Production readiness includes such things as:

  • Parameter validation
  • Security
  • Error handling
  • Failover & recovery

And a lot more “taxes” that we need to deal with.
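To make the distinction concrete, here is a trivial sketch of the same method at demo quality and with one of those taxes (parameter validation) paid:

// Demo quality: shows the idea, and nothing else.
public static decimal ConvertNaive(decimal amount, decimal rate)
{
	return amount * rate;
}

// The same idea with one production readiness "tax" paid: parameter validation.
// Security, error handling and failover are more of the same kind of work.
public static decimal ConvertValidated(decimal amount, decimal rate)
{
	if (amount < 0)
		throw new ArgumentOutOfRangeException("amount", "amount cannot be negative");
	if (rate <= 0)
		throw new ArgumentOutOfRangeException("rate", "rate must be positive");

	return amount * rate;
}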

That does not include code quality. Code quality is a metric of its own, and it is one that you cannot give up. I would be personally ashamed to put out something that didn’t meet my own code quality metrics.

Sometimes, being production ready means that you have to give up on code quality (here is one example); that is understood and acceptable in certain circumstances.

What is not acceptable is to say that this is an "Xyz" type of code sample, and that therefore code quality doesn’t matter.

And code quality isn’t really some amorphous thing. It is a very well defined metric that can be automatically checked. FxCop and NDepend are good tools to have, and Simian will give you a lot of value all on its own.

When you put something out as an official Microsoft guidance release, the bar is higher, because whatever you do will be used by other people.

Kobe – Architectural Overview

Sometimes, you have to make a distinction between the system architecture and the system implementation. It is possible to have a great architecture and have the implementers butcher it by not paying attention at some points.

Sometimes, the problem in the code implementation is so bad that it becomes an architectural problem. Kobe is one such example.

We can start with the pervasive Primitive Obsession that is all over the map: the common use of “string thisIsReallyGuid”, copy & paste programming, horrible exception handling and a total lack of respect for the fundamentals of OO.

All of those tie back to the lack of architectural oversight over the project. I have the feeling that at no point did someone stop and say: “there has got to be a better way of doing this.”

DRY is a true principle in development; it transcends programming languages and paradigms, and it is about the only one that I can think of that does so. If you violate DRY, you are in for a world of trouble, sooner rather than later.

SOLID is a set of principles used to build good OO applications. If you are using an OO language and paradigms and fail to pay attention to SOLID, it will hurt you at some point. Not as soon as if you had violated DRY, but soon enough.

8.5% of Kobe is copy & pasted code. And that is with the sensitivity dialed high; if we set the threshold to 3, which is what I commonly do, it goes up to 12.5%. A lot of the duplicated code relates to exception handling and caching. Those are classic cases for using an aspect to handle cross cutting concerns; that would reduce a lot of the duplicated code.
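To illustrate, here is a hypothetical sketch (these are not Kobe’s actual types; IAdService, ICache and Ad are stand-ins) of handling the caching concern once, in a decorator, instead of copy & pasting it into every service method; interception based AOP tooling would let you do the same without writing the decorator by hand:

public interface IAdService
{
	Ad GetAdById(string adId);
}

public interface ICache
{
	T Get<T>(string key) where T : class;
	void Add(string key, object value, TimeSpan slidingExpiration);
}

public class CachingAdServiceDecorator : IAdService
{
	private readonly IAdService inner;
	private readonly ICache cache;

	public CachingAdServiceDecorator(IAdService inner, ICache cache)
	{
		this.inner = inner;
		this.cache = cache;
	}

	public Ad GetAdById(string adId)
	{
		// the caching policy lives here, once; the real service knows nothing about it
		var key = "Ads-" + adId;
		var ad = cache.Get<Ad>(key);
		if (ad == null)
		{
			ad = inner.GetAdById(adId);
			if (ad != null)
				cache.Add(key, ad, TimeSpan.FromMinutes(10));
		}
		return ad;
	}
}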

The other problem that I have with Kobe is its total lack of respect for SOLID principles. You know what, let us throw SOLID out the window; I am going to focus on a single principle, the Single Responsibility Principle. Just about any method that you care to name does more than one thing, and in many cases they do a lot more than one thing.

This is absolutely wrong.

The “repositories” in Kobe have so many responsibilities that they are just shy of having a singularity around them.

Overall, Kobe feels very amateurish, as if it was written without a true understanding of what is expected of quality code, and without really understanding the underlying platform, frameworks or infrastructure that it is using.

I don’t like it at all, and it is a horrible sample application. In my considered opinion, it should be pulled from the site, rewritten, and only then published again.

Kobe – an example of exception handling done wrong

In Kobe, pretty much all exception handling is this:

catch (Exception ex)
{
    bool rethrow = ExceptionPolicy.HandleException(ex, "Service Policy");
    if (rethrow && WebOperationContext.Current == null)
    {
        throw;
    }
    return null;
}

These lines appear in just about any method in the application.

They are wrong. In fact, they are wrong on more than one level. Violating DRY is a big offence, but even ignoring that, they are still wrong.

Exception handling is not something that you should take lightly, and there is a strong tie between the way that you write the code and the way errors are handled. The whole purpose of this exception handling is to avoid throwing an exception by setting some configuration parameter.

The problem with this is that this purpose is wrong. This is a case of writing the code without actually trying to understand what the implications are. You cannot do this sort of thing, because the calling code never expects a null; it expects a valid result or an exception.

This approach brings us back to the bad old days of error codes. Not to mention that even if we actually checked for a null return value, what the hell is the error that actually caused this to happen? Oh, it was put in the log, so now we need to go and look at the log file and try to match timestamps (if we even can).

Exception handling should appear in exactly two places:

  • When an error is expected (making a web request call, for example) and there is some meaningful behavior to be done in the case of a failure (such as retrying after some delay)
  • On a system boundary, in which case you need to make a decision about how you are going to expose the error to the outside world.
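
A minimal sketch of the first case (my code, not Kobe’s): the failure is expected and there is a meaningful recovery, so we handle it; once we run out of retries, the exception propagates toward the system boundary.

public static string DownloadWithRetry(string url, int retries)
{
	for (int attempt = 1; ; attempt++)
	{
		try
		{
			using (var client = new WebClient())
			{
				return client.DownloadString(url);
			}
		}
		catch (WebException)
		{
			if (attempt >= retries)
				throw; // out of retries; let it reach the system boundary
			Thread.Sleep(TimeSpan.FromSeconds(attempt)); // back off, then try again
		}
	}
}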

It is important to note that I explicitly used the term system boundary, instead of service boundary, which is the more commonly used term. Exception handling within a system should not try to hide errors. Errors should propagate all the way to the system boundary, because otherwise we lose a lot of important information.

System boundary is defined as whatever is visible to the actual user or other systems.

Oh, and exception handling at the system boundary is a system infrastructure problem; it is not something that you have to write code for all over the place.

Kobe – When the documentation is the only delivery that matters

When comparing the Kobe code base to the Kobe documentation, I am starting to get a sinking feeling. The documentation is really nice, the code is anything but.

When you put out guidance, the documentation is nice, but it is the code that people are going to look at, learn from, and try to emulate. And when you put out something that doesn’t meet basic parameters for good software, that is a huge problem.

With Oxite, Microsoft had the excuse of it being slipped out without anyone noticing. This time it is official, it has passed the proper review procedure, and is explicitly marketed as “intended to guide you with the planning, architecting, and implementing of Web 2.0 applications and services.”

Sorry, that is entirely unacceptable.

Putting something like that out there as guidance is actively doing harm to the community as a whole. It means that people who would look at that and try to use what they see there are going to get bitten by bad practices, bad architecture and overly complex and redundant approaches.

Phil Haack said that: “I was part of a high-level review and I felt most of the app that I saw was reasonably put together.”

I submit that a high level review of such guidance packages is absolutely not sufficient. You have to get people to do a full review cycle, which includes the code, the documentation, the reasoning and anything else that is going out there.

Kobe – Data Access done wrong

From the Kobe documentation:

The Repository pattern is applied to mediate access to the data store through a Repository provider.

From PoEAA:

A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper (165), that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic.

I think that Repository is a fad, which is why I don’t use it much anymore myself, and looking at the usage semantics of the “repositories” in Kobe, they certainly don’t fit the bill.

Repositories are there to encapsulate query logic; that is not the case with Kobe. Here is a partial view of some of the repositories that it has:

image

In general, looking at any of the repositories in detail, I really want to cry. I mean, just take a look:

image

More to the point, let us look at GroupRepository.AcceptGroupInvite (I removed all the copy & paste crap to concentrate on the actual code):

GroupInvite groupInvite = _context.GroupInviteSet
                                 .Where(gi => gi.GroupInviteId == groupInviteId)
                                 .FirstOrDefault();

if (groupInvite == null)
    throw new DataException("Invalid Invite Id", null);

groupInvite.InvitationStatuses = _context.InvitationStatusSet
                                        .Where(s => s.InvitationStatusName == "Accepted")
                                        .FirstOrDefault(); ;

_context.SaveChanges(true);

There are so many things wrong with this approach that I don’t even know where to start.

You can see how the architects or developers resented having to use an ORM. If they couldn’t have stored procedures in the database, then by Jove and his keyboard, they will write stored procedures in code.

Kobe uses Entity Framework for data access, and they treat it as if it were a dataset. As an OR/M aficionado, I am insulted. This shows a complete lack of understanding of how Entity Framework works; it is doing a lot more work than it should, a brute force & big hammer approach.

From an architectural perspective, most of what Kobe is doing is calling down to the repository to perform the data access details. Business logic, such as there is, is spread between the data access code, the controllers and the service layer, in such a way that makes it very hard to get a good idea about what is actually happening in the application.

Data access details are quite pervasive throughout the application, and the whole feeling of the application is of a circa 2001 sample app using stored procedures and hand rolled data access layers.

Kobe – In the nuts & bolts and don’t really liking it

There is another ASP.Net MVC sample app; this time it is official, it has passed the proper review procedure, and it is explicitly marketed as “intended to guide you with the planning, architecting, and implementing of Web 2.0 applications and services.”

I am saying all of that in order to distinguish it from Oxite, which was none of those things. There have been a couple of reviews of Kobe already. Frankly, I don’t really care for them, mostly because I think that they dealt too much with nitty gritty details of the app that don’t really matter much. I don’t much care for extra using statements or the use of System.Int32 vs. int, the naming convention used or even what sort of HTML formatting they used. I mostly care about the code and architecture. So I decided to take a look myself.

This post is going to go over the nuts & bolts, looking at low level coding details. I am going to have another that is going to look at the architecture and probably a few others that talk about some additional concerns that I have.

Let us start by looking at the code via some tools. Simian reports:

image

And that is when the similarity threshold is 6 lines; I usually run it with three. If we try that, we get:

Found 5138 duplicate lines in 873 blocks in 67 files

Next, the most important code metric to me in most cases, Cyclomatic Complexity. Two methods literally jump out of the solution and beg for mercy.

  • HomeController.Discovery with CC index of 35(!)
  • GroupController.Profile with CC index of 22

I am going to show them both in all their glory, and let you be the judge of them:

public ActionResult Discovery(string query)
{
    query = query.Trim();

    query = Server.HtmlEncode(query);

    List<PresentationSummary> presentationsAll = null;
    List<PresentationSummary> presentationsMostPopular = null;
    List<PresentationSummary> presentationsMostViewed = null;
    List<PresentationSummary> presentationsMostDownload = null;
    List<PresentationSummary> presentationsNew = null;

    List<UserSummary> activeusers = null;
    List<UserSummary> allusers = null;
    List<Group> activegroups = null;
    List<Group> allgroups = null;

    int iNewMemberPageCount = 0;
    int iNewGroupPageCount = 0;
    int ipresentationsAll = 0; 
    int ipresentationsMostPopular =  0;
    int ipresentationsMostViewed = 0;
    int ipresentationsMostDownload = 0;
    int ipresentationsNew = 0;

    UserSummary user = _userService.GetUserByName(query);
    if (user != null)
    {
        presentationsAll = _userService.GetAuthoredPresentations(user.UserName);
        presentationsMostPopular = presentationsAll.OrderByDescending(p => p.Favorites).ToList();
        presentationsMostViewed = presentationsAll.Where(p => p.Views > 0).OrderByDescending(p => p.Views).ToList();
        presentationsMostDownload = presentationsAll.Where(p => p.Downloads > 0).OrderByDescending(p => p.Downloads).ToList();
        presentationsNew = presentationsAll.OrderByDescending(p => p.InsertedDate).ToList();

        ipresentationsAll = decimal.Divide(presentationsAll.Count, 10).ToInt();
        ipresentationsMostPopular = decimal.Divide(presentationsMostPopular.Count, 10).ToInt();
        ipresentationsMostViewed = decimal.Divide(presentationsMostViewed.Count, 10).ToInt();
        ipresentationsMostDownload = decimal.Divide(presentationsMostDownload.Count, 10).ToInt();
        ipresentationsNew = decimal.Divide(presentationsNew.Count, 10).ToInt();

        activeusers = new List<UserSummary> { user };
        allusers = new List<UserSummary> { user };
        iNewMemberPageCount = decimal.Divide(allusers.Count, 8).ToInt();

        allgroups = _userService.GetGroups(user.UserName);
        activegroups = allgroups.OrderByDescending(g => g.InsertedDate).ToList();
        iNewGroupPageCount = decimal.Divide(allgroups.Count, 8).ToInt();
    }
    else
    {
        Group group = _groupService.GetGroupByName(query);
        if (group != null)
        {
            presentationsAll = _groupService.GetPresentations(group.GroupName);
            presentationsMostPopular = presentationsAll.OrderByDescending(p => p.Favorites).ToList();
            presentationsMostViewed = presentationsAll.Where(p => p.Views > 0).OrderByDescending(p => p.Views).ToList();
            presentationsMostDownload = presentationsAll.Where(p => p.Downloads > 0).OrderByDescending(p => p.Downloads).ToList();
            presentationsNew = presentationsAll.OrderByDescending(p => p.InsertedDate).ToList();

            ipresentationsAll = decimal.Divide(presentationsAll.Count, 10).ToInt();
            ipresentationsMostPopular = decimal.Divide(presentationsMostPopular.Count, 10).ToInt();
            ipresentationsMostViewed = decimal.Divide(presentationsMostViewed.Count, 10).ToInt();
            ipresentationsMostDownload = decimal.Divide(presentationsMostDownload.Count, 10).ToInt();
            ipresentationsNew = decimal.Divide(presentationsNew.Count, 10).ToInt();

            allusers = _groupService.GetMembers(group.GroupName);
            activeusers = allusers.OrderByDescending(u => u.DateOfJoining).ToList();
            iNewMemberPageCount = decimal.Divide(allusers.Count, 8).ToInt();

            allgroups = new List<Group> { group };
            activegroups = new List<Group> { group };
            iNewGroupPageCount = decimal.Divide(allgroups.Count, 8).ToInt();
        }
        else
        {
            presentationsAll = _presentationService.GetAllPresentationsByKewordTimeLine(query, "Day", "0", 10, 1);
            presentationsMostPopular = _presentationService.GetMostPopularPresentationsByKewordTimeLine(query, "Day", "0", 10, 1);
            presentationsMostViewed = _presentationService.GetMostViewedPresentationsByKewordTimeLine(query, "Day", "0", 10, 1); 
            presentationsMostDownload = _presentationService.GetMostDownloadedPresentationsByKewordTimeLine(query, "Day", "0", 10, 1); 
            presentationsNew = _presentationService.GetNewPresentations(query, "Day", "0", 10, 1);

            ipresentationsAll = decimal.Divide(_presentationService.GetAllPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostPopular = decimal.Divide(_presentationService.GetMostPopularPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostViewed = decimal.Divide(_presentationService.GetMostViewedPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostDownload = decimal.Divide(_presentationService.GetMostDownloadedPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsNew = decimal.Divide(_presentationService.GetNewPresentations(query, "Day", "0").Count, 10).ToInt();

            activeusers = _userService.GetMostActiveUsers(query, 8, 1);
            allusers = _userService.GetAllUsers(query, 8, 1);
            iNewMemberPageCount = decimal.Divide(_userService.GetMostActiveUsers(query).Count,8).ToInt();

            activegroups = _groupService.GetMostActiveGroupByKeyword(query, 8, 1);
            allgroups = _groupService.GetAllGroupByKeyword(query, 8, 1);
            iNewGroupPageCount = decimal.Divide(_groupService.GetMostActiveGroupByKeyword(query).Count, 8).ToInt();
        }
    }

    ViewData.Add("membersList-mostactive", activeusers);
    ViewData.Add("membersList-all", allusers);
    ViewData.Add("groupsList-mostactive", activegroups);
    ViewData.Add("groupsList-all", allgroups);

    ViewData.Add("presentations-all",presentationsAll);
    ViewData.Add("presentations-mostpopular",presentationsMostPopular);
    ViewData.Add("presentations-mostviewed",presentationsMostViewed);
    ViewData.Add("presentations-mostdownload",presentationsMostDownload);
    ViewData.Add("presentations-new",presentationsNew);
   
    ViewData.Add("Query", query);
    //ViewData.Add("Presentations", presentations);

    ViewData.Add("members-totalcount", iNewMemberPageCount);
    ViewData.Add("groups-totalcount", iNewGroupPageCount);

    ViewData.Add("presentations-alltotalcount", ipresentationsAll);
    ViewData.Add("presentations-mostpopulartotalcount", ipresentationsMostPopular);
    ViewData.Add("presentations-mostviewedtotalcount", ipresentationsMostViewed);
    ViewData.Add("presentations-mostdownloadtotalcount", ipresentationsMostDownload);
    ViewData.Add("presentations-newtotalcount", ipresentationsNew);

    return View();
}

This is… a very busy method, I must say. But in a way, the Profile method is much worse:

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Profile(string gname, string type, string section, string subSection, string page)
{
    if (string.IsNullOrEmpty(gname))
        return new ContentResult { Content = "" };

    if (type != "widget")
        return new ContentResult { Content = "" };

    Group group = null;

    try
    {
        group = _groupService.GetGroupByName(gname);
    }
    catch (Exception)
    {
        return new ContentResult { Content = "" };
    }

    if (group == null)
    {
        return new ContentResult { Content = "" };
    }

    string groupName = group.GroupName;

    AddUserLevelToViewData(groupName);

    int pageNo = 1;
    Int32.TryParse(page, out pageNo);
    if (pageNo == 0)
        pageNo = 1;

    if (section == "div-GroupPresentations")
    {
        List<PresentationSummary> presentations = null;

        switch (subSection)
        {
            case "div-GroupPresentations-RecentltAdded":
                presentations = _groupService.GetRecentlyAddedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-MostViewed":
                presentations = _groupService.GetMostViewedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-MostDownloaded":
                presentations = _groupService.GetMostDownloadedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-All":
                presentations = _groupService.GetPresentations(groupName, 5, pageNo);
                break;
        }

        return View("PresentationsList", presentations);
    }
    else if (section == "div-GroupWall")
    {
        switch (subSection)
        {
            case "div-GroupWall-Messages":
                ViewData["GroupMessages"] = _groupService.GetMessages(groupName, 5, pageNo);
                return View("GroupMessageList", ViewData["GroupMessages"]);
            case "div-GroupWall-MemberRequests":
                ViewData["GroupJoiningRequests"] = _groupService.GetGroupJoiningRequests(groupName, 5, pageNo);
                return View("GroupJoiningRequestList", ViewData["GroupJoiningRequests"]);
        }
    }
    else if (section == "div-GroupInfoExtended")
    {
        switch (subSection)
        {
            case "div-GroupInfoExtended-GroupMembers":
                ViewData["GroupMembers"] = _groupService.GetMembers(groupName, 4, pageNo);
                return View("MembersList", ViewData["GroupMembers"]);
        }
    }

    return new ContentResult { Content = "" };
}

Just look at the code. I thought that the whole point of MVC was to separate the logic from the view. Having the view strongly tied to the controller output is fine by me, but having the controller strongly tied to the HTML format of the page? That isn’t right.

Another thing that isn’t right is HomeController.Index():

public ActionResult Index()
{
    GetFeaturedPresentations(); //*** Dummy call the the Database to activate the Connection.
    List<PresentationSummary> featured = _presentationService.GetFeaturedPresentations();
    List<PresentationSummary> beingViewed = _presentationService.GetPresentationRecentlyViewed();
    List<PresentationSummary> mostDownloaded = _presentationService.GetMostDownloadedPresentation();
    PresentationSummary presentationOfDay = _presentationService.GetPresentationOfDay();

    ViewData.Add("FeaturedPresentations", featured.ToArray());
    ViewData.Add("RecentlyViewedPresentations", beingViewed.ToArray());
    ViewData.Add("MostDownloadedPresentations", mostDownloaded.ToArray());
    ViewData.Add("presentationsOfDay", presentationOfDay);
    ViewData["Tags"] = _presentationService.GetPopularTags();

    return View();
}

Notice the first call?

private void GetFeaturedPresentations()
{
    try {
        //*** Dummy call to the Presentation Service to get the Featured presentations,
        //*** this call is place because, an exception thrown from the Data layered on first hit to the DB (failed to open DB) 
        //*** and the second hit to the DB gets success.
        _presentationService.GetFeaturedPresentations();
    }
    catch (Exception)
    { /*do nothing with this exception*/ }
}

I really like it when you work around a bug instead of actually fixing it.

Moving on, let us look at the service layer. Most of it looks like this:

public ADs GetAdById(string adId)
{
    try
    {
        string key = "Ads-" + adId;
        ADs data = cacheService.Get<ADs>(key);
        if (data == null)
        {
            data = provider.GetAdById(adId.ToGuid());
            if (data != null && data.Image != null)
            {
                cacheService.Add(key, data as Object, null, DateTime.MaxValue, new TimeSpan(0, 10, 0), System.Web.Caching.CacheItemPriority.Normal, null);
            }
        }
        return data;

    }
    catch (Exception ex)
    {
        bool rethrow = ExceptionPolicy.HandleException(ex, "Service Policy");
        if (rethrow && WebOperationContext.Current == null)
        {
            throw;
        }
        return null;
    }
}

I am not joking about all of them looking alike, by the way; this code has obviously been cut & pasted a lot.

Nitpick, “data as Object” – that is not something you often see. But this is even better (from CacheService):

image

I like how we have a cacheService but we coupled its interface with System.Web.Caching, or the fact that most of this code is just a very long winded way of calling GetAdById.

But wait, I spared you the method documentation, which is a real masterpiece:

/// <summary>
/// Returns Advertisment.
/// </summary>
/// <param name="adId"> GUID of an Advertisment</param>
/// <returns>Ads</returns>
public ADs GetAdById(string adId)

Yes, the adId is a string which is a guid. We are so lucky to work with VARIANT again, right?

Let us take another method, just to see what is going on in a typical method inside the project. I intentionally avoid the ones that we already looked at. I took a peek at CommunityController and found the Index method:

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Index(string type, string section, string subSection, string page)
{
    if (type != "widget")
        return new ContentResult { Content = "" };

    int pageNo = 1;
    Int32.TryParse(page, out pageNo);
    if (pageNo == 0)
        pageNo = 1;

    if (section == "members")
    {
        List<UserSummary> users = null;

        switch (subSection)
        {
            case "members-new":
                users = _communityService.GetNewUsers(8, pageNo);
                break;
            case "members-mostactive":
                users = _communityService.GetMostActiveUsers(8, pageNo);
                break;
            case "members-all":
                users = _communityService.GetAllUsers(8, pageNo);
                break;
            default:
                users = _communityService.GetAllUsers(8, pageNo);
                break;
        }

        return View("MembersList", users);
    }
    else if (section == "groups")
    {
        List<Group> groups = null;

        switch (subSection)
        {
            case "groups-new":
                groups = _communityService.GetNewGroups(8, pageNo);
                break;
            case "groups-mostactive":
                groups = _communityService.GetMostActiveGroups(8, pageNo);
                break;
            case "groups-all":
                groups = _communityService.GetAllGroups(8, pageNo);
                break;
            default:
                groups = _communityService.GetAllGroups(8, pageNo);
                break;
        }

        return View("GroupsList", groups);
    }
    else if (section == "favourites")
    {
        List<PresentationSummary> favouritePresentation = _communityService.GetCommunityFavorite(10, pageNo);
        return View("PresentationsView", favouritePresentation.ToArray());
    }


    return new ContentResult { Content = "" };
}

Let me see how many things I can find in a cursory examination:

  • Hard coding galore
  • Long method
  • Complex method
  • Controller method returns several different views

And note that I am still not even trying for the architectural concepts or code quality metrics. That I’m going to leave to another post.

Frankly, I am seeing way too many bad things in the code to cover all of them. I am going to stop with a goodie, though.

Let us explore GroupRepository.GetGroup, shall we?

 private Group GetGroup(Guid groupId, KobeEntities Context)
 {
     try
     {
         Group group = Context.GroupSet
                          .Where(g => g.GroupId == groupId)
                          .FirstOrDefault();

         if (group == null)
             throw new DataException("Invalid Group Id", null);

         return group;
     }
     catch (Exception ex)
     {

         bool rethrow = ExceptionPolicy.HandleException(ex, "Data Policy");
         if (rethrow)
         {
             throw;
         }
         return null;
     }
 }

On the face of it, except for the repeated stupid error handling, there doesn’t seem to be anything wrong here, right?

Take note of the different parameter casing on GetGroup, though. Why is the KobeEntities parameter PascalCased as Context? Well, that is because there is also a property called Context on the GroupRepository that you might otherwise use by accident. So, what is this Context parameter all about? GetGroup is a private method, so who is calling it?

Here is one such callsite:

 public void AddGroupInterests(Guid groupId, string[] interests, Guid userId)
 {
     try
     {
         KobeEntities _context = Context;

         Group group = GetGroup(groupId, _context);
         User user = GetUser(userId, _context);

So, we take the Context property, put it in a _context local variable. Then we pass it to GetGroup, which uses it.

I must say that I am at a loss to understand what was going on in the mind of whoever wrote it. Was he trying to optimize the number of times we access the property?

As I said, I am currently writing a few other posts about Kobe; this one was just to look at the code itself, so we have an impression of its quality.

I am… not impressed.

NHibernate Mapping – Named queries and &lt;sql-query/&gt;

Queries are business logic; as such, they can be pretty complex, and they also tend to be pretty perf sensitive. You usually want to have good control over any complex queries. You can do that by extracting your queries into the mapping, so they don’t reside, hardcoded, in the code:

<query name="PeopleByName">
	from Person p
	where p.Name like :name
</query>

And you can execute it with:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	session.GetNamedQuery("PeopleByName")
		.SetParameter("name", "ayende")
		.List();
	tx.Commit();
}

PeopleByName is a pretty standard query, and executing this code will result in:

image

Now, let us say that we discovered a performance problem in this query, and we want to optimize it. But the optimization is beyond what we can do with HQL; we have to drop down to database specific SQL for that. Well, that is not a problem, &lt;sql-query/&gt; is coming to the rescue.

All you need is to replace the query above with:

<sql-query name="PeopleByName">
	<return alias="person"
					class="Person"/>
	SELECT {person.*}
	FROM People {person} WITH(nolock)
	WHERE {person}.Name LIKE :name
</sql-query>

And you are set. You don’t need to make any changes to the code, but the resulting SQL would be:

image

Fun, isn’t it?

NHibernate Mapping - <database-object/>

I, like many, have grown used to NHibernate’s schema generation capabilities. Those make working with databases such a pleasure that I cannot imagine trying to work without them.

However, at some point even NHibernate’s smarts reach their end, and such an occasion requires dropping to SQL to manipulate the database directly. A good example of that would be:

<!-- SQL Server needs this index -->
<database-object>
	<create>
	CREATE INDEX PeopleByCityAndLastName ...
	</create>
	<drop>
	DROP INDEX PeopleByCityAndLastName 
	</drop>
	<dialect-scope name="NHibernate.Dialect.MsSql2000Dialect"/>
	<dialect-scope name="NHibernate.Dialect.MsSql2005Dialect"/>
	<dialect-scope name="NHibernate.Dialect.MsSql2008Dialect"/>
</database-object>

<!-- Oracle needs only these statistics -->
<database-object>
	<create>
	CREATE STATISTICS PeopleByCityAndLastName ...
	</create>
	<drop>
	DROP STATISTICS PeopleByCityAndLastName 
	</drop>
	<dialect-scope name="NHibernate.Dialect.OracleDialect"/>
	<dialect-scope name="NHibernate.Dialect.Oracle9Dialect"/>
</database-object>

As you can see, this allows us to execute database-specific SQL, scoped to the relevant dialects. It is not a commonly needed feature, but it can be incredibly useful.
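
These auxiliary objects are created and dropped as part of NHibernate’s schema generation. A minimal sketch of triggering that from code (assuming you already have a populated Configuration instance named cfg):

// Writes the generated script to the console and executes it against the
// database; any <database-object> whose dialect-scope matches the configured
// dialect is created along with the regular tables.
new NHibernate.Tool.hbm2ddl.SchemaExport(cfg).Create(true, true);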

NHibernate Mapping - Concurrency

NHibernate has several concurrency models that you can use:

  • None
  • Optimistic
    • Dirty
    • All
  • Versioned
    • Numeric
    • Timestamp
    • DB timestamp
  • Pessimistic

We will explore each of those in turn.

None basically means that we fall back to the transaction semantics that we use in the database. The database may throw us out, but aside from that, we don’t really care much about things.

Optimistic is more interesting. It basically states that if we detect that the entity has changed behind our back, we cannot update it. Let us see a simple example of using optimistic locking on the dirty (changed) fields only:

<class name="Person"
			 optimistic-lock="dirty"
			 dynamic-update="true"
			 table="People">

Using this with this code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1);
	person.Name = "other";
	tx.Commit();
}

Will result in:

image

Note that we have to specify dynamic-update="true" as well; the dirty optimistic locking strategy requires dynamic updates. Be aware that dynamic updates will generally cause a much greater number of query plans to exist in the database’s query plan cache.

Setting optimistic-lock to all would result in:

image

If the update fails because the row was changed behind our back, we will get a StaleObjectStateException. Like any NHibernate exception, this makes the session unusable, and you would have to create a new session to handle it.
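
To recover from that, you discard the failed session entirely and retry against a fresh one. Here is a minimal sketch (the retry itself is my addition, not something NHibernate does for you):

try
{
	using (var session = sessionFactory.OpenSession())
	using (var tx = session.BeginTransaction())
	{
		var person = session.Get<Person>(1);
		person.Name = "other";
		tx.Commit(); // throws StaleObjectStateException if the row changed
	}
}
catch (StaleObjectStateException)
{
	// The old session is unusable now; open a new one, reload the entity
	// to see its current state, and decide whether to re-apply the change.
	using (var session = sessionFactory.OpenSession())
	using (var tx = session.BeginTransaction())
	{
		var person = session.Get<Person>(1);
		person.Name = "other";
		tx.Commit();
	}
}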

Usually a better strategy is to use an explicit version column. We can do it by specifying <version/>:

<version name="Version" column="Version"/>

And that would result in:

image

As you can probably guess, if the version doesn’t match, we will get a StaleObjectStateException.

Instead of using numeric values, we can use a timestamp:

<version name="Version" column="Version" type="timestamp"/>

In this case, the property type should be DateTime, and the resulting SQL would be:

image

This is, of course, a less safe way of doing things, and I recommend that you use a numeric version instead.

Another option is to use the database’s own facilities to handle that. In MS SQL Server, this is the timestamp column, an 8 byte binary value that is changed any time the row is updated.

We do this by changing the type of the Version property to byte array, and changing the mapping to:

<version name="Version"
				 generated="always"
				 unsaved-value="null"
				 type="BinaryBlob">
	<column name="Version"
					not-null="false"
					sql-type="timestamp"/>
</version>
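
On the class itself, that means the Version property is just a byte array; something along these lines (my sketch of the entity, the rest of the class is unchanged):

public class Person
{
	public virtual int Id { get; set; }
	public virtual string Name { get; set; }

	// Filled by SQL Server's timestamp column; because generated="always",
	// NHibernate reads it back from the database after every insert or update.
	public virtual byte[] Version { get; set; }
}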

Executing the code listed above will result in:

image

We use the value of the timestamp to ensure that we aren’t overwriting the row data after it was changed. The database will ensure that the row timestamp will change whenever the row itself is updated. This plays well with systems where you may need to update the underlying tables outside of NHibernate.

Pessimistic concurrency is also exposed by NHibernate, using the overloads that take a LockMode. This is done in a database independent way, using each database’s own facilities and syntax.

For example, let us examine the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1,LockMode.Upgrade);
	person.Name = "other";
	tx.Commit();
}

This will result in the following SQL:

image

We can also issue a separate command to the database to obtain a lock on the row representing the entity:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1);
	session.Lock(person, LockMode.Upgrade);
	person.Name = "other";
	tx.Commit();
}

The Get() would generate a standard select, without the locks, but the Lock() method would generate the following SQL:

image

The behavior on conflict in this case is very simple: we wait. If we wait for too long, the timeout will expire and we will get a timeout exception, because we could not obtain the lock.

That is consistent with how we would use pessimistic concurrency elsewhere.

Haven’t we been tortured enough?!

This has nothing to do with technology. It has to do with books. In particular the Wheel of Time books.

Go and read this announcement. I want to cry!

For crying out loud, I have been reading this series of books for the last decade. I spent most of my high school years re-reading them, and I consider them a big reason for why I am able to understand English at the level that I do.

Hell, I own several copies of some of the books, but for crying out loud, another three books? And ones that would be basically cut in the middle?

Gnashing of teeth describes my current status quite nicely.

NHibernate Mapping - <set/>

And now it is time to take the <set/> and explore it. Most of the collections in NHibernate follow much the same rules, so I am not going to go over all of them in detail:

<set
    name="propertyName"                                         (1)
    table="table_name"                                          (2)
    schema="schema_name"                                        (3)
    lazy="true|false"                                           (4)
    inverse="true|false"                                        (5)
    cascade="all|none|save-update|delete|all-delete-orphan"     (6)
    sort="unsorted|natural|comparatorClass"                     (7)
    order-by="column_name asc|desc"                             (8)
    where="arbitrary sql where condition"                       (9)
    fetch="select|join|subselect"                               (10)
    batch-size="N"                                              (11)
    access="field|property|ClassName"                           (12)
    optimistic-lock="true|false"                                (13)
    outer-join="auto|true|false"                                (14)
>

    <key .... />
    <one-to-many .... />
</set>

1) name is the collection property name, just like the name of a <property/> or <many-to-one/> is the value property name.

2) table is obviously the table name in which the values for this association exist.

3) schema is the schema in which that table lives.

4) lazy controls whether this collection will be lazily loaded or not. By default it is set to true. Let us see how this works:

<set name="Posts" table="Posts">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

With the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.Get<Blog>(1);
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

This produces the following statements:

image

image

We need two select statements to load the data.

However, if we change the set definition to:

<set name="Posts" table="Posts" lazy="false">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

We… would get the exact same output. Why is that?

The answer is quite simple: lazy only controls whether the collection will be loaded lazily or not. It does not control how we load it. The default is to use a second select for that, because that tends to be more efficient in the general case, since it avoids the possibility of a Cartesian product. There are other options, of course.

If we just set lazy to false, it means that when we load the entity, we load the collection. The reason that we see the same output from the SQL perspective is that the SQL alone doesn’t show us when each statement was issued. With lazy set to true, the collection will only be loaded in the foreach. With lazy set to false, the collection will be loaded on the Get call.
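
If you want to actually observe the difference, you can ask NHibernate whether the collection has been initialized right after the Get (the check below is my addition; NHibernateUtil.IsInitialized is the relevant helper):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.Get<Blog>(1);
	// Prints False with lazy="true" and True with lazy="false", even though
	// the SQL statements issued end up being the same two selects.
	Console.WriteLine(NHibernateUtil.IsInitialized(blog.Posts));
	tx.Commit();
}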

You are probably interested in outer-join, which we can set to true, giving us:

<set name="Posts" table="Posts" outer-join="true">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And would result in the following SQL:

image

Here we get both the blog and its posts in a single query to the server.

The reason that lazy is somewhat complicated is that there are quite a few options to select from when choosing the fetching strategy for the collection, and in general, it is suggested that you not set this in the mapping, because that is usually too generic. It is preferred to control this at a higher level, when you are actually making use of the entities.
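
For example, we can keep the mapping lazy and eagerly fetch the collection only for the queries that need it; a sketch using the criteria API:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	// The mapping stays lazy; the join fetch applies to this query only.
	var blog = session.CreateCriteria(typeof(Blog))
		.SetFetchMode("Posts", FetchMode.Join)
		.SetMaxResults(5)
		.List<Blog>()[0];
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}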

5) inverse is something that I talk about extensively here, so I’ll not repeat this.

6) cascade is also something that I have already talked about.

7) sort gives you a way to sort the values in the collection, by providing a comparator. Note that this is done in memory, not in the database. The advantage is that it will keep things sorted even for values that you add to the collection in memory.

8) order-by gives you the ability to sort the values directly from the database.

Note that both 7 & 8 do not work with generic sets, and that in general you don’t want to rely on those ordering attributes; you want to use the natural properties of the selected collection type. Sets are, by definition, unordered collections of unique elements. But generic sorted bags do work:

<bag name="Posts" table="Posts" order-by="Title ASC">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</bag>

And would produce the following SQL:

image 

9) where allows us to use an arbitrary SQL expression to limit the values in the collection. Usually this is used to filter out things like logically deleted rows. Here is a silly example:

<set name="Posts" table="Posts"
		  where="(len(Title) > 6)">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

Which would result in:

image

Note that there is an important subtlety here: if you intend to use this collection with eager loading, you must make sure that your where clause can handle null values appropriately (in the case of an outer join).

10) fetch controls how we get the values from the database. There are three values, select, join and subselect. The default is select, and you are already familiar with it. Setting it to join would result in:

<set name="Posts" table="Posts" fetch="join">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(5)
		.List<Blog>()[0];
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Will give us:

image

Setting it to subselect will show something quite a bit more interesting:

image

We have two queries, the first to load the blogs, and the second one:

image

In this case, we load all the related posts using a subselect. This is probably one of the more efficient ways of doing this. We load all the posts for all the blogs in a single query. That assumes, of course, that we actually want to use all those posts. In the code seen above, this is actually a waste, since we only ever access the first blog Posts collection.

11) batch-size is another way of controlling how we load data from the database. It is similar to fetch, but it gives us more control. Let us see how it actually works in action before we discuss it.

<set name="Posts" table="Posts" batch-size="5">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And this code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blogs = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(30)
		.List<Blog>();
	foreach (var post in blogs.SelectMany(x=>x.Posts))
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Produces:

image

First we load the blogs, and we load 30 of them. Now, when we access any of the unloaded collections, something very interesting is going to happen. NHibernate is going to search for up to batch-size unloaded collections of the same type and try to load them all in a single query. The idea is that we take a SELECT N+1 situation and turn that into a SELECT N/batch-size + 1 situation.

In this case, it will turn a 31 query situation into a 7 query situation (1 for the blogs plus 30/5 = 6 for the posts), and we can increase the batch size a bit to reduce this even further. As usual, we have to balance local and global optimizations: if we make batch-size too large, we load too much data; if we make it too small, we still have too many queries.

This is one of the reasons that I consider those fancy options important, but not as important as setting the fetching strategy for each scenario independently. That is usually a much better strategy overall.

12) access was already discussed elsewhere.

13) optimistic-lock was already discussed elsewhere.

14) outer-join was discussed above, when we talked about lazy.

NHibernate Mapping - <dynamic-component/>

Like the <component/> mapping, <dynamic-component/> allows us to treat part of the entity’s table in a special way. In this case, it allows us to push properties from the mapping into a dictionary, instead of having to define matching properties on the entity itself.

This is very useful when we need to build dynamically extended entities, where the client can add columns on the fly.

Let us take this entity as an example:

image

And this table:

image

Where we want to have the SSN accessible from our entity, but without modifying its structure. We can do this using <dynamic-component/>:

<class name="Person"
		table="People">

	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="Name" />

	<dynamic-component name="Attributes">
		<property name="SSN"
			type="System.String"/>
	</dynamic-component>
</class>
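
The entity itself just exposes a dictionary for the dynamic part; something like this (my reconstruction, since the class is only shown as an image above):

using System.Collections;

public class Person
{
	private IDictionary attributes = new Hashtable();

	public virtual int Id { get; set; }
	public virtual string Name { get; set; }

	// NHibernate fills this from the columns declared inside
	// <dynamic-component/>; the keys are the property names, e.g. "SSN".
	public virtual IDictionary Attributes
	{
		get { return attributes; }
		set { attributes = value; }
	}
}

Reading or writing person.Attributes["SSN"] then goes straight to the SSN column.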

And the query just treats this as yet another column in the table:

image

NHibernate Mapping – Inheritance

I wanted to explore a few options regarding the way we can map inheritance using NHibernate. Here is the model that we are going to use:

image
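
The model is only shown as an image, but judging from the mappings below it boils down to something like this (my reconstruction):

public abstract class Party
{
	public virtual int Id { get; set; }
}

public class Person : Party
{
	public virtual string FirstName { get; set; }
}

public class Company : Party
{
	public virtual string CompanyName { get; set; }
}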

And the code that we are going to execute:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	session.CreateCriteria(typeof(Party)).List();
	session.CreateCriteria(typeof(Company)).List();
	session.CreateCriteria(typeof(Person)).List();
	tx.Commit();
}

From now on we are going to simply play with the mapping options to see what we can come up with. We will start with a very simple discriminator based mapping (table per hierarchy):

<class name="Party"
			 abstract="true"
			 table="Parties">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<discriminator column="Discriminator"
			not-null="true"
			type="System.String"/>

	<subclass
		name="Person"
		discriminator-value="Person">
		<property name="FirstName"/>
	</subclass>

	<subclass
		name="Company"
		discriminator-value="Company">
		<property name="CompanyName"/>
	</subclass>
</class>

Which results in the following table structure:

image

And the SQL that was generated is:

Select Party

image

Select Company

image

Select Person

image

But that is just one option. Let us see what happens if we try the table per concrete class option:

<class name="Person"
	table="People">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="FirstName"/>
</class>

<class name="Company"
	table="Companies">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="CompanyName"/>
</class>

Which results in the following table structure:

image

And the following queries:

Select Party

image

image

No, that is not a mistake, we issue two SQL queries to load all possible parties.

Select Company

image

Select Person

image

The next inheritance strategy is table per subclass:

<class name="Party"
		abstract="true"
		table="Parties">
	<id name="Id">
		<generator class="identity"/>
	</id>

	<joined-subclass
		table="People"
		name="Person">
		<key column="PartyId"/>
		<property name="FirstName"/>
	</joined-subclass>

	<joined-subclass
		table="Companies"
		name="Company">
		<key column="PartyId"/>
		<property name="CompanyName"/>
	</joined-subclass>
</class>

Which results in the following table structure:

image

And the queries:

Select Party

image

This is slightly tricky; basically, we get the class based on whether we have a row in the appropriate table.

Select Company

image

Select Person

image

The final option is using unioned subclasses, which looks like this:

<class name="Party"
		abstract="true"
		table="Parties">
	<id name="Id">
		<generator class="hilo"/>
	</id>

	<union-subclass
		table="People"
		name="Person">
		<property name="FirstName"/>
	</union-subclass>

	<union-subclass
		table="Companies"
		name="Company">
		<property name="CompanyName"/>
	</union-subclass>
</class>

Note that it is not possible to use identity with union-subclasses, so I switched to hilo, which is generally much more recommended anyway.

The table structure is similar to what we have seen before:

image

But the querying is drastically different:

Select Party

image

Select Company

image

Select Person

image

The benefit over standard table per concrete class is that in this scenario, we can query over the entire hierarchy in a single query, rather than having to issue a separate query per class.

An allegory of an optimization story

I got a call today from a teammate about a piece of software that we wrote. We initially planned it to work with data sizes of hundreds to thousands of records, and considering that we wrote it in a day, I considered it a great success. We didn’t pay one bit of attention to optimization, but the performance was great for what we needed.

However, a new requirement came up which required us to handle a hundred thousand records, and our software was working… well under the new constraints. As a matter of fact, it would take about 0.4 seconds to compute a result when the data size was a hundred thousand records. That was probably within the acceptable range, but it had me worried: 0.4 seconds may be acceptable for this single computation, but it was a CPU bound computation, and it was probably going to kill us once we started hammering the server with multiple such requests.

Moreover, adding 0.4 seconds to the clients of the system would be an unacceptable delay. So we sat down with dotTrace and tried to figure out what we were doing that could be done better.

We spent several hours on that. At some point, a simple extraction of a variable out of a loop shaved 7% off the total execution time, but we weren’t really interested in the small fish. We soon identified an O(N) operation in the code, which was taking about 20% of the total time of the calculation, and focused on that.

By changing the way that we worked with the data, we turned this from an O(N) operation into an O(1) one. We did it by introducing indexes and doing intersections of those indexes. The usual trade of memory for time has saved us again.
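
The actual code isn’t something I can show, but the idea looks roughly like this (all the names and fields here are made up for illustration):

using System.Collections.Generic;
using System.Linq;

public class Record
{
	public int Id;
	public string City;
	public string LastName;
}

public class RecordIndex
{
	// One dictionary per indexed field, mapping a value to the set of ids that have it.
	private readonly Dictionary<string, HashSet<int>> byCity;
	private readonly Dictionary<string, HashSet<int>> byLastName;

	public RecordIndex(IEnumerable<Record> records)
	{
		var all = records.ToList();
		byCity = all.GroupBy(r => r.City)
			.ToDictionary(g => g.Key, g => new HashSet<int>(g.Select(r => r.Id)));
		byLastName = all.GroupBy(r => r.LastName)
			.ToDictionary(g => g.Key, g => new HashSet<int>(g.Select(r => r.Id)));
	}

	// Instead of scanning every record per query (O(N)), we intersect two
	// pre-built id sets, paying memory up front to make each lookup cheap.
	public HashSet<int> Match(string city, string lastName)
	{
		HashSet<int> cityIds;
		HashSet<int> nameIds;
		if (!byCity.TryGetValue(city, out cityIds) ||
			!byLastName.TryGetValue(lastName, out nameIds))
			return new HashSet<int>();

		var result = new HashSet<int>(cityIds);
		result.IntersectWith(nameIds);
		return result;
	}
}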

Current time for computing a result? 20 – 50 ms, and that includes network time.

I think that I’ll find that acceptable.