Ayende @ Rahien

Refunds available at head office

NHibernate, polymorphic associations and ghost objects

image

One of the more interesting points of my posts about Entity Framework & NHibernate is the discovery of things that Entity Framework can do that NHibernate cannot. In fact, if you’ll read the posts, instead of the comments, you can see that this is precisely what I asked, but people didn’t really read the text.

I wanted to dedicate this post to ghost objects, and how NHibernate deals with them.

Before we start, let me explain what ghost objects are. Let us say that you have a many to one polymorphic association, such as the one represented as Comment.Post.

A post may be either a Post or an Article, and since NHibernate by default will lazy load the association, NHibernate will generate a proxy object (also called a ghost object). That, in turn, result in several common issues: Leaking this and the inability to cast to the proper type are the most common ones.

In practice, this is something that you would generally run into when you are violating the Liskov Substitution Principle, so my general recommendation is to just fix your design.

Nevertheless, since the question pop up occasionally, I thought that I might write a bit more details on how to resolve this. Basically, the main issue is that at the point in time where we are loading the Comment entity, we don’t have enough information to know what the actual entity type is. The simplest way to work around this issue is to tell NHibernate to load the associated entity as part of the parent entity load.

In the case of the comment, we can do it like this:

<many-to-one name="Post" 
             lazy="false"
             outer-join="true"
             column="PostId"/>

The lazy=”false” tell NHibernate to load the association eagerly, while the outer-join will add a join to load it in a single query. One thing to note, however, is that (by design) HQL queries will ignore any hints in the mapping, so you would have to specify join fetch explicitly in the mapping, otherwise it would generate a separate query for that.

Since we eagerly load the associated entity, and we know its type, we don’t have to deal with any proxies, and can avoid the ghost objects problem completely.

Comments

Stephen
01/09/2010 02:54 PM by
Stephen

Do you ever wonder if learning all these abstraction leaks (and this isn't specifically a fault with nhibernate but imo that an orm like nhibernate is really trying to do something that cannot be done without leaks).. causes more problems than the orm solves?

I've fallen out with orms recently because I feel that I'm substituting one problem for another, these days I'm using a conceptual model for queries, where the loading requirements are specified by a specific query (vs the relationship type, which imo is important because rarely is a relationships loading manner consistent in my experience).

Perhaps orms come into their own on gigantic solutions where you want to just try and generalize your domain as much as possible to save time..

Chris Wright
01/09/2010 05:22 PM by
Chris Wright

You didn't explicitly state what a ghost object is. It looks like you intended to.

Chris Wright
01/09/2010 05:38 PM by
Chris Wright

HAHA DISREGARD THAT, I CAN'T READ

Szymon Kulec
01/09/2010 06:42 PM by
Szymon Kulec

By fixing the design, you mean creating IPost and IArticle interfaces, implemented respectively by Post and Article classes - creating proxies for them. Does it make sense in terms of your post (or maybe article ;-)).

Ajai Shankar
01/09/2010 07:29 PM by
Ajai Shankar

Also curious as to how to fix the design - is this not proper design?

By bad design are you referring to if the code needs to cast comment.Post to an Article?

Frank
01/09/2010 07:54 PM by
Frank

Lets get an example where you have a base class called Employee and for managers a Manager class, which is derived from Employee. Employees get an annual bonus of a 100 euro and managers get 200 euro.

public class EmployeeService

{

public decimal CalculateAnnualBonus(Employee employee)

{

if (employee is Manager)

  return 200;

return 100;

}

}

The solution above will have a problem with gost classes. It is also an improper design with regards to object orientation. A more correct solution would be:

public class Employee

{

public virtual decimal CalculateAnnualBonus()

{

  return 100;

}

}

public class Manager : Employee

{

public override decimal CalculateAnnualBonus()

{

  return 200;

}

}

This won't have problems with ghost classes, because we placed the behavior where it belongs to be.

Stephen
01/09/2010 09:16 PM by
Stephen

@Frank, Thats very arguable and imo wrong- stood off behavior is expected for extension to systems (and really in general for making systems in components), and so you should expect the (imo better designed) original code to work..

Stephen
01/09/2010 09:24 PM by
Stephen

(just to add, my argument isn't the polymorphism, but how data is handled, and how you this rule makes open/closed impossible to do)

Alex Yakunin
01/09/2010 11:06 PM by
Alex Yakunin

Oren, your imagination of Liskov Substitution principle is wrong: this principle implies that subclasses must behave at least as superclass, so any instance of subclass can be used instead of superclass.

But in described case you might get a proxy of superclass instead of a proxy of subclass, and this isn't related to Liskov's principle at all:

  • This principle does not imply that superclass must behave properly if it the data (fields) it contains is a data of one of instances of subclass. E.g. if I have Shape and Circle, an instance of Shape filled with a part of Circle's data won't be able to calculate its Area anyway (that's the reason for making such methods abstract).

  • Moreover, it does not imply that any code requiring superclass must properly deal with such a "cutout" instance of its proxy . As it was just shown, such code may rely on fact that e.g. Area method always returns shape's area (because in real world its subclasses either implement Area, or they're abstract, and thus there are no instances of these types).

So actually I don't see a relation to Liskov's principle here. To make the example more dramatic, let's imagine we have just one hierarchy of persistent types in our application, base type there is called MyBaseObject; it contains no real data, but shares just Id (e.g. GUID) and e.g. CreateDate\ModifyDate. So following your version of Liskov's principle, the code I write must be fully ready to deal with MyBaseObject proxy (that actually can't do anything, neither even provide any data) instead of actual type everywhere, or, equally, porting the same case to .NET, any .NET code must be ready to accept a dummy Object instance (new object() ) instead of any other everywhere - obviously, that's a huge difference with actual Liskov's principle ( en.wikipedia.org/.../Liskovsubstitutionprinciple ).

Frank
01/09/2010 11:26 PM by
Frank

@Stephen, I have to disagree with you. Checking for types is not better designed code in my opinion.

Steve Py
01/10/2010 12:06 AM by
Steve Py

@Alex : Yeah, I was reading that and I didn't see any link to LSP.

@Ayende : I too would be interested in how the design in that case could be more ORM-friendly without resorting to eager-fetching.

I don't believe it is an issue from the "1" side. Retrieving an article or a post, then accessing it's comments collection should work. It's only an issue if you are sifting through Comments then want to access their associated article or post.

Ayende Rahien
01/10/2010 06:30 AM by
Ayende Rahien

Ajai,

I am refering to code that try the cast, yes.

In general, down casting means that you are violating Liskov

Ayende Rahien
01/10/2010 06:31 AM by
Ayende Rahien

Stephen,

Sorry, I am not following you. How does following Liskov makes the open/closed impossible?

Ayende Rahien
01/10/2010 06:34 AM by
Ayende Rahien

Alex,

Yes, is does. Usually, it goes the other way, a subclass should keep the contract of the superclass, but the direction that I refer to is that if you accept an argument of some type, downcasting it is an indication that you should accept the downcasted version instead.

Imagine the type of code that your logic lead to, a huge SWITCH statement with type checking. That is why we have "replace conditional with polymorphism"

Alex Yakunin
01/10/2010 08:42 AM by
Alex Yakunin

Oren, what you just described is absolutely correct (i.e. unnecessary genericity must be eliminated - e.g. ReSharper automatically highlights such cases), but this has no relation to Liskov substitution principle.

Btw, I also do not understand how this is related to such a behavior of NHibernate: if you mean I must always deal just with an exact type, then this makes usage of inheritance almost useless. I.e. replacing inheritance with aggregation brings nearly the same effect.

And I'm curious, what really happen in this case:

a) Polimorphic query returns Shape proxy instead of Circle proxy because it couldn't recognize the actual instance type

b) But later we execute a query (e.g. against Circle type), that it must return Circle object for the same identifier.

Problems:

1) If the last query completes successfully, now we have two different objects belonging to the same hierarchy and sharing the same key in session. So o1!=o2, but o1.Id==o2.Id and we know their Ids belongs to the same primary key (because they have a common base type declaring the key). Thus we get identity violation.

2) What ISession.Get(typeof(Share), shapeId) will return for such an Id in case part a) is completed? I suspect, Shape proxy, even if actual type there is Circle. But what the same .Get will return after part b) is completed? ISession already knows that actual type of an object with this Id is at least Circle, not Shape. But:

  • If returns new Circle proxy, we loose predicativity: ISession.Get return different results dependently on ISession state (= dependently on history of operations with it). That's especially not good for applications built from loosely coupled modules (i.e imagine parts a) and b) are executed by a code that is external to my own module - there are just read-only operations, but I get different results in my code).

  • If it returns old Shape proxy, it must return it under any query like ISession.Get(typeof(Share), ...) - in particular, if only part b) is executed w/o previous execution of part a), otherwise we loose the predicativity again. But this makes ISession.Get completely non-polymorphic; moreover, ISession must maintain a set of ghost objects for each actual one (one per each its base type) in worst case, that seems completely wrong.

Note that so far I was talking just about read-only scenarios. Read-write scenarios are interesting as well - e.g. what happens if a ghost object (i.e. Shape proxy with the same Id as some Circle proxy) gets modified together with original object, and modifications are conflicting with each other (= only one can be persisted)?

For me even the possibility to see actually the same object under different "angles" and in different states dependently on previous history of interaction with ISession is frightening: the applications we (I mean my company) develop are built of hundreeds of classes (e.g. > 500 in MEScontrol), so such an inpredicativity (= necessity to keep this behavior in mind) makes developer's life much more complex.

Alex Yakunin
01/10/2010 08:59 AM by
Alex Yakunin

That is why we have "replace conditional with polymorphism"

That's the problem: with such a behavior you can't do this successfully - in fact, using a base type is risky (= leads to inpredictable result) in queries and fetch operations (= data access operations), thus all data access patterns in my application must not be polimorphic, if I'd like to avoid a chance of getting a proxy. How they must look like then? Well, a big switch statement, or an equivalent sequence of "if (thisIsGhost) letsTryAnotherWay" is what comes to my mind first. So I can't replace conditional with polimorphism - on contrary, I think how to replace the polimorphism with conditional here.

Btw, as I said, I don't see much reasons for having such a behavior at all: replacing the inheritance with aggregation (Circle.Shape instead of Circle : Shape) leads to the same effect from the point of data access patterns. If you need something like virtual methods here (= polimorphic behavior), you can use interfaces.

alwin
01/10/2010 10:31 AM by
alwin

Would it work with

public decimal CalculateAnnualBonus(Employee employee)

{

if (employee.IsManager) <-- prop that gets overridden by Manager class to return true

{

return 200;

}

return 100;

}

When you access the IsManager property, doesn't the proxy load the real entity and thereby get the correct class?

Stephen
01/10/2010 10:54 AM by
Stephen

@frank, ayende the point wasnt that polymorphism is bad but in an system open for extension you cannot be expected to modify a base class in order to create new behavior.. meaning stood off behavior should work

Ayende Rahien
01/10/2010 11:14 AM by
Ayende Rahien

Alwin,

Yes, it would work. I don't like this sort of code.

Ayende Rahien
01/10/2010 11:15 AM by
Ayende Rahien

Alwin,

Yes, it would work. I don't like this sort of code.

alwin
01/10/2010 11:21 AM by
alwin

Ah ok. Yeah I don't like it either, but is an easy/dirty fix for those having problems with "if (employee is Manager)".

Frank
01/10/2010 01:05 PM by
Frank

@Stephen, I can see your point, but can you give a more concrete example of what you actually mean? I am having a hard time coming up with a good reason why I would have to do it that way.

newbie
01/10/2010 02:34 PM by
newbie

So, EF does not suffer from ghost objects? It is because does not support polymorphism or lazy loading? it is a bug or a feature? someone explain please

Stephen
01/10/2010 02:47 PM by
Stephen

@frank, ayende-

The scenario is, you are adding behavior to another system, or adding behavior that isn't specifically yours, to your system.. for example, 'GetsSpecialHolidayHat'.. do i modify the base class of employee thus changing the system.. or do I have a service that takes an employee and tells me if they get a hat or not..

Polymorphism is fantastic, but in systems like this, just because I'm making decisions based upon type doesn't mean I HAVE to use a polymorphic route, because writing extensions to the system would then require me to modify the original code, vs extend it.

My entire point is that BOTH examples should be valid, and personally I find that when talking about data specific objects, stood off (where the behavior is determined externally to the entity) tends to be a better model for consistency than polymorphism..

Also and this isn't so much of a serious point, but since when do you ask an employee what their bonus is going to be.. I would ask 'someone' what an employees bonus is going to be (much more like example one).

Cherian Thomas
01/14/2010 04:50 PM by
Cherian Thomas

Oren,

You might want to bold "hints in the mapping". Too many people make mistakes in this area and think hbm mapping are the final say..

Comments have been closed on this topic.