Answer: The lazy loaded inheritance many to one association OR/M conundrum
Update: It appears that I am wrong, and NHibernate can support this functionality by eagerly loading the association at load time. You can do that by specifying lazy="false" (and optionally, outer-join="true") on the many to one association.
Yesterday I presented an interesting problem that pops up with any OR/M that supports inheritance and lazy loading.
Let us say that we have the following entity model:
Backed by the following data model:
As you can see, we map the Animal hierarchy to the Animals table, and we have a polymorphic association between AnimalLover and his/her animal. Where does the problem start?
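The model diagram does not survive in this text, so here is a rough sketch of what the entity model might look like. The property names are assumptions inferred from the SQL and the code in this post, not the original class diagram:

```csharp
using System;

// Hypothetical reconstruction of the entity model described in the text.
var lover = new AnimalLover { Id = 1, Name = "Sam", Animal = new Dog { Id = 7, Name = "Rex" } };
Console.WriteLine(lover.Animal is Dog); // True when the reference is a plain, non-proxied Dog

public class Animal
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

public class Dog : Animal { }
public class Cat : Animal { }

public class AnimalLover
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual Animal Animal { get; set; }   // polymorphic many-to-one
}
```

The members are virtual because a proxy-based OR/M typically needs to override them; that detail matters for the rest of the discussion.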
Well, let us say that we want to load the animal lover. We do that using the following SQL:
SELECT Name, Animal, Id FROM AnimalLover WHERE Id = 1 /* @p0 */
And now we have an animal lover instance:
var animalLover = GetAnimalLoverById(1);
var isDog = animalLover.Animal is Dog;
var isCat = animalLover.Animal is Cat;
Can you guess what would be the result of this code?
The answer is that both isDog and isCat would be… false.
But how is that?
To answer that question, let us take a look at the SQL that was used to load the animal lover, and let us take a look at a typical example of hydrating entities. I am using Davy’s DAL here to show off the problem, because the code is simple and it demonstrates that the problem is not unique to a particular OR/M, but is shared among all of them (Davy’s DAL doesn’t even support inheritance, for example).
private void SetReferenceProperties<TEntity>(
    TableInfo tableInfo, TEntity entity, IDictionary<string, object> values)
{
    foreach (var referenceInfo in tableInfo.References)
    {
        if (referenceInfo.PropertyInfo.CanWrite == false)
            continue;

        object foreignKeyValue = values[referenceInfo.Name];

        // No foreign key value: the reference is simply null
        if (foreignKeyValue is DBNull)
        {
            referenceInfo.PropertyInfo.SetValue(entity, null, null);
            continue;
        }

        // Reuse the instance if this session has already loaded it
        var referencedEntity = sessionLevelCache.TryToFind(
            referenceInfo.ReferenceType, foreignKeyValue);

        // Otherwise create a lazy loading proxy for the referenced entity
        if (referencedEntity == null)
            referencedEntity = CreateProxy(tableInfo, referenceInfo, foreignKeyValue);

        referenceInfo.PropertyInfo.SetValue(entity, referencedEntity, null);
    }
}
Take a look at what the code is doing: we are currently processing the Animal property on the AnimalLover class, and we try to find an already loaded Animal whose primary key matches the value of the Animal column in the AnimalLovers table.
When we can’t find it, we have to create a lazy loading proxy for the referenced entity. And here is where the conundrum comes into play. When we have inheritance, we have a real problem here. What is the type of the referenced entity?
From the model, we know that it must be a derivation of Animal of some sort, and we have its PK, but we have no way of knowing which without going to the database for it.
So what are we going to do? Because we don’t have enough information to create a lazy loading proxy of the appropriate type, we actually generate a lazy loading proxy of the type that we do know about, Animal.
But what about when it is being loaded?
Well, that is where the lack of #become in .NET becomes painful: we already have an instance, and we can’t change its type. And we can’t replace the reference on the AnimalLover, because someone might have grabbed a reference to the animal before the lazy load.
The way to handle it is by turning the lazy loading proxy into a real one. We load a new instance that represents the entity, now with the correct type, since we query the DB to find out what it is (along with the rest of the entity’s data).
And the lazy loading proxy that we originally used is now loaded, and any call made on it will be forwarded to the new instance that was loaded.
animalLover.Animal stays an AnimalProxy, and cannot be cast to a Dog or a Cat, even if the actual row it is pointing to is a Dog or a Cat.
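To make the forwarding behavior concrete, here is a minimal sketch with a hand-written stand-in for the generated proxy. AnimalProxy, its constructor, and the load delegate are hypothetical names for illustration; a real OR/M generates the proxy at runtime and this is not how any particular one implements it:

```csharp
using System;

// Hand-rolled stand-in for a generated lazy loading proxy (all names are hypothetical).
Animal reference = new AnimalProxy(id: 7, load: id => new Dog { Id = id, Name = "Rex" });

Console.WriteLine(reference is Dog);   // False: the proxy derives from Animal, not Dog
Console.WriteLine(reference.Name);     // Rex: first access loads the real Dog and forwards to it
Console.WriteLine(reference is Dog);   // False: the reference itself never changes type

public class Animal
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

public class Dog : Animal { }

public class AnimalProxy : Animal
{
    private readonly int id;
    private readonly Func<int, Animal> load;   // stands in for the database query
    private Animal target;                     // the correctly typed instance, once loaded

    public AnimalProxy(int id, Func<int, Animal> load)
    {
        this.id = id;
        this.load = load;
    }

    // The key is known up front, so reading it does not trigger a load.
    public override int Id { get => id; set { } }

    // Any other member forces the load and forwards to the real instance.
    public override string Name
    {
        get => (target ??= load(id)).Name;
        set => (target ??= load(id)).Name = value;
    }
}
```

The proxy holds a correctly typed Dog internally after the first access, but callers only ever see the AnimalProxy reference they were originally given.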
Comments
Your "the answer" sentence confuses me. If the proxy only knows being an Animal, then checking "is Dog" or "is Cat" would both return false. In which case you should write that "both isDog and isCat would be… false". Your sentence implies that both return true.
I suppose that the only way to have the true specializations is to say lazy=false in this case?
That is a mistake on my part, got the double negative confused.
And yes, if you want to avoid this behavior, you can simply specify lazy=false
A few days ago I left a comment about a similar possible mistake on his blog: davybrion.com/.../#comment-22337
Check out Davy's answer ;)
Alex,
This isn't a mistake, this is a by-design feature.
I understood this. I mentioned it here to show that this is actually bad (such a design, I mean), since you faced the issue almost immediately.
Btw, how does NH handle this? I'll describe in my blog how we deal with this issue.
Why not just lazy load the Animal object? You'll have to hit the database to load it when it's first accessed so just create the right proxy when it's first accessed.
This is not cheap. Although you're right: if there is no choice, this is the only way. I just described how DO4 handles this: blog.dataobjects.net/.../...ties-with-unknown.html
Alex,
NH handles this in the usual manner.
We give the user this behavior as default, because it is the safest.
If the user wants to override it, and accept the perf hit associated with it, they specify lazy=false and it goes away
P,
Look at the generated SQL, we want to avoid hitting the other table at load time for the animal lover instance.
This is a very important optimization because otherwise you very quickly have 17-way joins.
If you “Program to an interface and not to an implementation” (like fabiomaulo.blogspot.com/.../...han-few-is-gof.html), you can avoid this problem?
andres
Yes
Err... NH will return the wrong type of entity by default? Why don't you e.g. throw an exception if this happens (that's what is really safe)? In fact, returning the wrong object type seems not safe at all...
I'm curious, what will happen then if the user modifies this object?
Or what happens if the reference type is an abstract one?
Alex,
NH doesn't return the wrong type.
It just works.
That's actually the worst issue with such LL (prefetch)... You really can't join any table from the hierarchy.
On the other hand, if there is a type discriminator, it's enough to join the root & get its value.
So what will really happen, if type is unknown? I see just few options:
Return base type (= return wrong type)
Throw an exception ("Can't materialize the entity because its type isn't known")
Transparently fetch its type & materialize it properly. Slow, but safe.
NH isn't limited to discriminator inheritance
And even with discriminator, unless you put the type in the parent table, you have to join
Return the base type, which isn't wrong, when type is loaded, forward all calls to the actual instance.
"Forward all calls" = "forward by proxy" or "start returning new entity by key resolver"?
Concerning "this isn't wrong" - well, it's clear that "by design" can't be "wrong" in any framework ;)
Btw, I didn't say "this is wrong" - I said "return wrong type", which is correct: the type you will get is really wrong ;)
"...that pop up with any OR/M that supports inheritance and lazy loading..." "...it demonstrate that the problem is not unique to a particular OR/M, but is shared among all of them..."
As Roger pointed out yesterday, this is rather misleading, since there are solutions in other OR/Ms. Namely, loading the data and type when the property is accessed instead of using a lazy proxy.
You may not agree with the performance rationale, but it is a valid alternative paradigm which also solves other issues.
I don't see any hindrance in the architecture that would prevent Hibernate from optionally supporting that in the future as well.
Yes... I'd simply fix this...
It is interesting whether it is possible to do unsafe memory-patching to achieve this (basically rewrite the base object with an inherited one without modifying the pointer).
Sebastian,
Look at the SQL that I put in the post, I was very specific.
And if you don't want this feature, all you need to do is specify lazy=false and avoid it
Alex,
It isn't the wrong type, it is the type specified in the mapping and matching the DB configuration.
Also, please be aware that based on our previous discussion I am aware that this can be a long thread, and I am not going to invest any time in this
Andrey,
Not very likely
Alex and Sebastian: the NHibernate way can seem strange, but if you work on a real application (real meaning not a toy example) you will discover that the NHibernate way is the most useful one; with it you can build a graph of objects with zero trips to the DB.
If somebody uses lazy loading on an object, it is because in that case it is fundamental to reduce the trips to the DB, which is what NH optimizes. If reducing the trips to the DB is not fundamental, you set lazy=false.
Yes, we had zero roundtrips to the DB in NPersist too and still used the correct type on the lazy instance.
How?
We simply sacrificed mapping capabilities instead of performance and correctness.
We had the option to use animal id + animal type in the relation.
And thus we did not have to hit the DB in order to instance the correct object type.
Ayende did specify that the animal type was not part of the foreign key, so this optimization is not possible under his criteria for the previous blog post, and it is not a valid option in this scenario.
However, in every case where you do not create a join, you may get incorrect data back.
E.g. let's say that we use Ayende's first example, and let's also assume that a specific animal ID is missing in the animal table.
NH (and NP) will instantiate the ghost proxy in the relation, since it didn't join the data.
If you then run code like:
bool HasAnimal = lover.Animal != null;
You would not only get the incorrect type back, but you would also get incorrect data..
Well enough from me :-)
I hope you don't take offence for my mindless ramblings Ayende.
I just think it is an extremely interesting subject.
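The missing-row scenario described in this comment can be sketched as follows. The dictionary standing in for the Animals table, the AnimalProxy class, and the exception type are all assumptions made for illustration, not the behavior of any specific OR/M:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the Animals table; there is no row with Id 42.
var animalsTable = new Dictionary<int, string> { [7] = "Rex" };

// The AnimalLover row carries Animal = 42, so hydration creates a proxy without touching Animals.
var lover = new AnimalLover { Animal = new AnimalProxy(42, animalsTable) };

Console.WriteLine(lover.Animal != null);   // True, although no such animal exists

try
{
    Console.WriteLine(lover.Animal.Name);  // the missing row only surfaces on first real access
}
catch (InvalidOperationException e)
{
    Console.WriteLine(e.Message);
}

public class Animal
{
    public virtual string Name { get; set; }
}

public class AnimalLover
{
    public Animal Animal { get; set; }
}

public class AnimalProxy : Animal
{
    private readonly int id;
    private readonly IDictionary<int, string> table;

    public AnimalProxy(int id, IDictionary<int, string> table)
    {
        this.id = id;
        this.table = table;
    }

    public override string Name
    {
        get => table.TryGetValue(id, out var name)
            ? name
            : throw new InvalidOperationException($"No row with Id {id} in Animals");
        set => throw new NotSupportedException("read-only sketch");
    }
}
```

Whether this can happen in practice depends on the schema: with an enforced foreign key constraint the database itself rules out the dangling reference.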
Roger,
Of course I don't take offence :-)
With NHibernate, you can make use of the exact same method by defining the association as [any/].
An any association encodes both the type of the association and the association id in the parent table, so you can get the info (and the appropriate type) in a single call.
The problem is that [any/] relations are pretty rare in the DB world
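As a rough illustration of why storing the type next to the key removes the ambiguity, consider the sketch below. The column names AnimalId and AnimalType, the proxy classes, and ProxyFactory are hypothetical, and this does not show NHibernate's actual implementation of the any mapping:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AnimalLover row that stores the animal's type next to its key.
var row = new Dictionary<string, object> { ["AnimalId"] = 7, ["AnimalType"] = "Dog" };

Animal animal = ProxyFactory.Create((string)row["AnimalType"], (int)row["AnimalId"]);

Console.WriteLine(animal is Dog);   // True, and the Animals table has not been queried
Console.WriteLine(animal.Id);       // 7

public class Animal { public virtual int Id { get; set; } }
public class Dog : Animal { }
public class Cat : Animal { }

// Stand-ins for generated lazy proxies; a real one would load the other columns on demand.
public class DogProxy : Dog { }
public class CatProxy : Cat { }

public static class ProxyFactory
{
    public static Animal Create(string type, int id)
    {
        switch (type)
        {
            case "Dog": return new DogProxy { Id = id };
            case "Cat": return new CatProxy { Id = id };
            default: throw new ArgumentException($"Unknown animal type '{type}'");
        }
    }
}
```

The cost is in the schema: the parent table has to carry an extra column, which is the trade-off discussed in this part of the thread.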
Animal is a foreign key; it's the DB that guards against data corruption.
About your question: with that option you have 2 columns, one with the FK and one with the .NET type? (I'm not sure I have understood.)
Ayende, I see, you mean that it should use that SQL query ONLY.
Marco, I have used this practice a lot in real applications and it's not always as simple as that. For example, a reference to a ghost object may not exist: var proxy = session.get(missingID); or many-to-one references that have since been removed, etc. There is added value to it, but there's also a useful middle ground between lazy false and lazy true.
The premise here is that I need to retrieve the Animal property without actually doing anything with it. Most of the time that I visit the Animal property I will want to do something with it. Then it's ok for me to load it during the "property get".
The exception is when I have things like lover.Animal.ID or lover.Animal.Ancestor.Ancestor.GetDNA();
As for reading the ID field without loading the entity, I tend to do that using session.GetIdentifier(entity) in my architectural pattern or by mapping the relationship as a Dictionary where the ID is the key.
If I'm traversing multiple nodes down a graph, for me, this usually means I have a projection coming so I'd use a query instead.
For the exception where I do need lazy graph building I'd happily opt-in, in my mapping.
So I'd love to see an option to map many-to-one properties as "eager getter" or something.
Sorry for hogging the thread. :)
This is definitely impossible:
If you already got a reference, its type can be cached somewhere. And you don't know where.
Moreover, there is likely simply no free space after the object's area in RAM. If you are aware of how .NET allocates memory and how memory compaction actually runs, it's clear that the only case where there is additional space after your object is when a "dead" object sits there. But only the GC is able to recognize this.
Oren, you know, you're right! That's just one more "feature" :)
Me too ;)
Marco,
Yes, that is one option
Sebastian,
NH doesn't do eager getter, if you want that, use the any option
Hi,
We came across the same issue in our system and came up with a way of solving it. Not sure if there is some underlying problem with the solution that we haven't spotted, but here goes:
All our domain objects inherit from an object called BaseDomainObject. BaseDomainObject has a property on it called UnProxied and all this does is returns the current object:
public virtual BaseDomainObject UnProxied
{
    // Returns the current object, as described above
    get { return this; }
}
So for any proxied domain object you can just access the UnProxied property and be given the actual object.
Would there be any NHibernate issues with this approach?
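A minimal sketch of why this trick can work, assuming the proxy forwards virtual calls to the loaded instance. The hand-written AnimalProxy below is a hypothetical stand-in for a generated proxy, and the target is passed in already loaded to keep the example short:

```csharp
using System;

BaseDomainObject reference = new AnimalProxy(new Dog());

Console.WriteLine(reference is Dog);             // False: still the proxy
Console.WriteLine(reference.UnProxied is Dog);   // True: the getter ran on the real instance

public class BaseDomainObject
{
    // Runs on whichever object finally receives the call, so "this" is the real entity.
    public virtual BaseDomainObject UnProxied => this;
}

public class Animal : BaseDomainObject { }
public class Dog : Animal { }

// Hypothetical hand-written proxy standing in for a generated one.
public class AnimalProxy : Animal
{
    private readonly Animal target;
    public AnimalProxy(Animal target) => this.target = target;

    // Forwarding the virtual call is what makes the trick work.
    public override BaseDomainObject UnProxied => target.UnProxied;
}
```

One thing to keep in mind with this sketch is that the proxy and the unproxied object are two distinct references, so reference equality between them is false.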
If you had mappings for all the subclasses, with the animaltype field being the discriminator value to distinguish between the types - would it not return a proxy of a Cat, Dog, etc instead?
Josh,
Only if it was on the AnimalLover table, otherwise you still don't have enough info to tell when you select just from AnimalLovers
Right and wrong isn't easy here, because:
From an OO perspective, "var isDog = animalLover.Animal is Dog;" is not so good, there are better alternatives like visitors etc. Or you could even do "var isDog = animalLover.Animal.AnimalKind == AnimalKinds.Dog;". Since relying on casting isn't good, there is no need to support it at the cost of performance. (And yes you can do lazy=false in NH)
But from a new user perspective with maybe less OO knowledge, "var isDog = animalLover.Animal is Dog;" is perfectly valid, and he expects it to work. I can understand if with a commercial ORM, users want this to work and the ORM vendor wants to please its customers, and so this gets implemented.
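The AnimalKind alternative mentioned above can be sketched like this. The AnimalKinds enum values and the hand-written AnimalProxy are assumptions for illustration; the point is only that a virtual member is forwarded to the real instance while a type check on the reference is not:

```csharp
using System;

Animal reference = new AnimalProxy(() => new Dog());

Console.WriteLine(reference is Dog);                        // False
Console.WriteLine(reference.AnimalKind == AnimalKinds.Dog); // True

public enum AnimalKinds { Unknown, Dog, Cat }

public class Animal
{
    public virtual AnimalKinds AnimalKind => AnimalKinds.Unknown;
}

public class Dog : Animal
{
    public override AnimalKinds AnimalKind => AnimalKinds.Dog;
}

// Hypothetical stand-in for a generated lazy loading proxy.
public class AnimalProxy : Animal
{
    private readonly Func<Animal> load;   // stands in for the database query
    private Animal target;

    public AnimalProxy(Func<Animal> load) => this.load = load;

    // Forwarded to the real instance, which is loaded on first use.
    public override AnimalKinds AnimalKind => (target ??= load()).AnimalKind;
}
```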
"Look at the generated SQL, we want to avoid hitting the other table at load time for the animal lover instance.
This is a very important optimization because otherwise you very quickly have 17 ways joins."
You don't have to look at the Animal table at all until the Animal property itself is accessed. The AnimalLover class loads as a proxy. The Animal property is lazy loaded so that the first time it's accessed it hits the table. It already needs to do that to support lazy loading, so there are no extra hits on the database, yet it actually performs as developers expect.
P,
That doesn't work if the AnimalLover itself is non lazy, though.
Right...but so what? Lazy loading is the default behavior for references in most ORMs. Since AnimalLover is returned from the persistence layer, it's easy to return a proxy that overrides the Animal property and loads it as needed.
This just doesn't seem like that complicated a problem.
The only time it becomes an issue is when you're querying a bunch of AnimalLovers at the same time that each love a different kind of animal.
P,
NHibernate doesn't support just a single use case, and there are reasons to want non lazy entities.
Your remark about 'it's not specific for any O/RM' is wrong, ours does this properly (it returns the proper entity with the right type, not a proxy). The only o/r mappers which do this wrong are the ones which use proxies for Lazy loading and/or return proxies.
It's easy to solve really: just call the polymorphic fetch method you already have for fetching polymorphic single entities (e.g. you want animal with ID 3, if it's a cat you return a cat instance etc.) and you're done. This isn't difficult, actually it's re-using code that's already there. The only change you have to make is that the animallover proxy knows this. But then again, I'm not a NH expert.
However saying this is a thing that all o/r mappers suffer from is misleading, some do the proper thing.
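The load-on-access alternative argued for in this comment might look roughly like the sketch below. The AnimalId property, the fetchAnimal delegate, and the printed SQL are hypothetical and only illustrate the idea of resolving the concrete type when the property is first read; this is not the code of any particular mapper:

```csharp
using System;

var lover = new AnimalLover(animalId: 7, fetchAnimal: id =>
{
    Console.WriteLine($"SELECT ... FROM Animals WHERE Id = {id}");
    return new Dog { Id = id };   // pretend the discriminator column said "Dog"
});

Console.WriteLine(lover.AnimalId);       // 7, no query: the FK value lives on AnimalLover
Console.WriteLine(lover.Animal is Dog);  // the query runs here, then True
Console.WriteLine(lover.Animal is Dog);  // True again, no second query

public class Animal { public int Id { get; set; } }
public class Dog : Animal { }

public class AnimalLover
{
    private readonly Func<int, Animal> fetchAnimal;   // stands in for a polymorphic fetch
    private Animal animal;

    public AnimalLover(int animalId, Func<int, Animal> fetchAnimal)
    {
        AnimalId = animalId;
        this.fetchAnimal = fetchAnimal;
    }

    public int AnimalId { get; }

    // The association is resolved on first access, when the concrete type can be read.
    public Animal Animal => animal ??= fetchAnimal(AnimalId);
}
```

The trade-off is that merely touching the Animal property costs a query, whereas the proxy approach defers the query until a member other than the key is used.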
Frans,
Given the tables:
create table AnimalLovers ( Id, Name, AnimalId )
create table Animals (Id, Name, Type )
And the following code:
var animalLover = GetById(1);
Console.WriteLine(animalLover.Animal.Id);
Console.WriteLine(animalLover.Animal.Name);
What SQL would be generated and at which line?
I have to add that the article assumes that lazy loading is done, but that's not really done, is it? You return an instance from the property which doesn't contain data. But isn't that odd, as accessing the property will trigger lazy loading, go to the db, load the related instance and return the instance with the right type (at least that would be the proper action).
Frans,
I am sorry, I don't get the last comment. Lazy loading is done here.
at this line: var animalLover = GetById(1);
you'll get the SELECT ... FROM AnimalLover WHERE ID=@id
at this line: Console.WriteLine(animalLover.Animal.Id);
you'll get the SELECT ... FROM Animal WHERE ID = @id -- @id is the 'Animal' value from the Animal lover. This is because you touch the lazy loaded property. What happens here is a decision in the O/R mapper: some will indeed trigger lazy loading, load the related entity and be done with it, but ones which don't support FK fields have the FK field 'AnimalLover.Animal' stored in the animalLover.Animal.Id value, and thus have a problem which isn't really solvable. That doesn't mean it's true for all o/r mappers, just the ones which store FK field values that way.
At the line: Console.WriteLine(animalLover.Animal.Name);
you don't get any sql as the instance was already loaded.
It's a decision for the ORM to support FK fields or not, and some will say it's stupid to support them at the entity level, while others say it's wise to do so. 7 years ago I made the decision to support them as it makes things much easier than when you ignore them. That turned out to be a good decision as it solves a lot of problems like the ones you presented here. It also makes things less 'OO', as an FK field is simply a copy of the PK of the related entity, so storing that twice sounds smelly.
I guess, it depends on how you want to work with the entity instances (data) in memory when they're stored inside entity class instances (objects): not supporting FK field values (and thus use the pk of the related entity) makes things more OO, but gives problems, supporting them makes things less OO but solves problems. (and gives some others, like you have to actively dereference entities when fk field values change, something you don't have to do) In the end, I think (but that's a personal perspective I think) it depends on whether you want to abstract away the relational aspect of a database or not. I definitely don't want to do that, as it has advantages as well, and others think differently and think it gets in the way. Both have good/bad points, so it's up to the user whether fk field values belong to the entity they're in or not. Personally, I think they belong to the entity, as they're the only evidence in an entity it is related to another entity, but opinions on that differ :)
then I don't see why you can't return the right typed object. The reason for this is that the discriminator value is in the data you fetch from the DB, so the session can create the right instance based on the discriminator value. So 'is' will work (IMHO, but I'm not an expert on NH's proxy system).
Frans,
Okay, I see how you do it.
I don't like this approach, because of its data focus, but I agree that it is a choice that you can make.
With NH, just accessing the property will not cause a load (see the discussion on the advantages that it gives NH in the previous post comments).
One major implication of that is that we can support non-proxied root classes.
Frans,
The difference is when you are generating the proxy. NH is generating the proxy on hydration of the AnimalLover, you seem to generate it on accessing the Animal property.
There are different data sets that you have at each point.
Yes I agree that there are two approaches and if you pick one you have the advantages of one but the disadvantages as well. Your advantage, that accessing the property in comparisons for example doesn't load the entity is good for some situations (as it avoids unnecessary queries perhaps), but it can be a burden as well in others (as roger pointed out in the other thread).
So 'it depends on the ORM used' if this fails or not. Which is better than 'it is true for any orm' ;)
Would it really be that hard to give people "a choice" about the proxying behavior (admittedly an advanced choice)? Why do you want to keep things "pure"?
I understand both decisions to make the proxying behave like it does. In commercial software it's either a strategic design choice or what the customer asks for. In opensource I think it's purely a strategic choice.
I would argue that it would be preferable to
1) have the AnimalType type discriminator field be part of the primary key of the Animals table
2) Also (therefore) have an AnimalType field in the AnimalLover table, which is also part of the foreign key pointing to the Animals table.
When you have this setup, the problem goes away, since you can know when loading the proxy what the correct type is.
When the type discriminator field is absent from the AnimalLover table, I'd argue that the proper behavior is to join in the type discriminator from the Animals table so that you have the information when you need it - even if lazy loading had been turned on for the AnimalLover.Animal reference property (the rest of the Animals columns should then not be loaded), since this is what has to be done for correctness.
/Mats
Also...with NH's current approach, wouldn't you need an additional round-trip to the database (when the proxy is accessed, fetching the type discriminator) in order to determine if the object is already in the identity map (In which case, is the rest of the data loaded redundantly)? Or can you find the object in the identity map without knowing the type discriminator somehow ?
/Mats
Mats,
No, the identity map is checked for the type first.
but how do you know the type without access to the type discriminator ??
doh, strike that - when the type discriminator is not part of the Animals table PK (the case I tend to assume) then ofc there will be only one object with that id in the type hierarchy!