Ayende @ Rahien

It's a girl

NHibernate Mapping -

And now it is time to go to the <set/> and explore it. Most of the collections in NHibernate follow much the same rules, so I am not going to go over them in details:

<set
    name="propertyName"                                         (1)
    table="table_name"                                          (2)
    schema="schema_name"                                        (3)
    lazy="true|false"                                           (4)
    inverse="true|false"                                        (5)
    cascade="all|none|save-update|delete|all-delete-orphan"     (6)
    sort="unsorted|natural|comparatorClass"                     (7)
    order-by="column_name asc|desc"                             (8)
    where="arbitrary sql where condition"                       (9)
    fetch="select|join|subselect"                               (10)
    batch-size="N"                                              (11)
    access="field|property|ClassName"                           (12)
    optimistic-lock="true|false"                                (13)
    outer-join="auto|true|false"                                (14)
>

    <key .... />
    <one-to-many .... />
</set>

1) is the collection property name, just like <property/> or <many-to-one/> are the value property names.

2) table is obviously the table name in which the values for this association exists.

3) schema is the schema in which that table lives.

4) lazy controls whatever this collection will be lazy loaded or not. By default it is set to true. Let us see how this work:

<set name="Posts" table="Posts">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

With the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.Get<Blog>(1);
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

This produces the following statements:

image

image

We need two select statements to load the data.

However, if we change the set definition to:

<set name="Posts" table="Posts" lazy="false">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

We… would get the exact same output. Why is that?

The answer is quite simple, lazy only control whatever the collection will be loaded lazily or not. It does not control how we load it. The default is to use a second select for that, because that tend to be more efficient in the general case, since this avoid the possibility of a Cartesian product. There are other options, of course.

If we just set lazy to false, it means that when we load the entity, we load the collection. The reason that we see the same output from SQL perspective is that we don’t have a time perspective of that. With lazy set to true, the collection will only be loaded in the foreach. With lazy set to true, the collection will be loaded on the Get call.

You are probably interested in outer-join, which we can set to true, which will give us:

<set name="Posts" table="Posts" outer-join="true">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And would result in the following SQL:

image

Here we get both the blog and its posts in a single query to the server.

The reason that lazy is somewhat complicated is that there are quite a bit of options to select from when choosing the fetching strategy for the collection, and in general, it is suggested that you would not set this in the mapping, because that is usually too generic. It is preferred to control this at a higher level, when you are actually making use of the entities.

5) inverse is something that I talk about extensively here, so I’ll not repeat this.

6) cascade is also something that I already talked about

7) sort gives you a way to sort the values in the collection, by providing a comparator. Note that this is done in memory, not in the database. The advantage is that it will keep thing sorted even for values that you add to the collection in memory.

8) order-by gives you the ability to sort the values directly from the database.

Note that both 7 & 8 does not work with generic sets and that in general, you don’t want to rely on those ordering properties, you want to use the natural properties of the selected collection. Sets are, by definition, unordered set of unique elements. But generic sorted bags does work:

<bag name="Posts" table="Posts" order-by="Title ASC">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</bag>

And would produce in the following SQL:

image 

9) where allow us to use some arbitrary SQL expression to limit the values in the collection. Usually this is used to filter out things like logically deleted rows. Here is a silly example:

<set name="Posts" table="Posts"
		  where="(len(Title) > 6)">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

Which would result in:

image

Note that there is important subtlety here, if you intend to use this collection with eager loading, you must make sure that your where clause can handle null values appropriately (in the case of an outer join).

10) fetch controls how we get the values from the database. There are three values, select, join and subselect. The default is select, and you are already familiar with it. Setting it to join would result in:

<set name="Posts" table="Posts" fetch="join">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(5)
		.List<Blog>()[0];
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Will give us:

image

Setting it to subselect will show something quite a bit more interesting:

image

We have two queries, the first to load the blogs, and the second one:

image

In this case, we load all the related posts using a subselect. This is probably one of the more efficient ways of doing this. We load all the posts for all the blogs in a single query. That assumes, of course, that we actually want to use all those posts. In the code seen above, this is actually a waste, since we only ever access the first blog Posts collection.

11) batch-size is another way of controlling how we load data from the database. It is similar to fetch, but it gives us more control. Let us see how it actually work in action before we discuss it.

<set name="Posts" table="Posts" batch-size="5">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And this code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blogs = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(30)
		.List<Blog>();
	foreach (var post in blogs.SelectMany(x=>x.Posts))
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Produces:

image

Fist we load the blogs, and we load 30 of them. Now, when we access any of the unloaded collections, something very interesting is going to happen. NHibernate is going to search for up to batch-size unloaded collections of the same type and try to load them all in a single query. The idea is that we take a SELECT N+1 situation and turn that into a SELECT N/batch-size + 1 situation.

In this case, it will turn a 31 queries situation into a 7 queries situation. And we can increase the batch size a bit to reduce this even further. As usual, we have to balance the difference between local and global optimizations. If we make batch-size too large, we load too much data, if we make it too small, we still have too many queries.

This is one of the reasons that I consider those fancy options important, but not as important as setting the fetching strategy for each scenario independently. That is usually a much better strategy overall.

12) access was already discussed elsewhere.

13) optimistic-lock was already discussed elsewhere.

14) outer-join was discussed above, when we talked about lazy.

Comments

Toloma&#252;s
04/13/2009 08:50 PM by
Tolomaüs

"

If we just set lazy to true, it means that when we load the entity, we load the collection. The reason that we see the same output from SQL perspective is that we don’t have a time perspective of that. With lazy set to false, the collection will only be loaded in the foreach. With lazy set to true, the collection will be loaded on the Get call.

"

Isn't it the other way around?

Otherwise an excellent read again so far!

Ayende Rahien
04/13/2009 08:54 PM by
Ayende Rahien

Tolomaüs,

Yes, thank you. Fixed.

Toloma&#252;s
04/13/2009 09:01 PM by
Tolomaüs

I agree on your subselect comment. This is my default fetch value for one-to-manies.

I have noted however that even when setting lazy to false, the subselect is not immediately executed. Instead I have to do an initialize on one of the subcollections after loading the initial collection.

Steve
04/13/2009 09:43 PM by
Steve

Let me try to understand this better, as I must admit, the options can be confusing :)

I tend to have scenarios where I need to do as your example - get a parent and then loop through the child collections - ie. let's say you want to display a list of those posts in some sort of DataGrid.

Would subselect be the way to go then ?

Ayende Rahien
04/13/2009 09:46 PM by
Ayende Rahien

Steve,

No way to tell without knowing more about your scenario.

If this is a parent - children scenario, probably you want to eager load the children.

I suggest doing this locally and not in the mapping, though.

Steve
04/13/2009 09:50 PM by
Steve

ok gotcha

My scenario would be: I have an average of 5-10 'posts' for each 'blog'.

I tend to always need to be doing something with posts - so I decide I want to pull all the posts for each blog anytime I get a blog.

Currently I'm doing what I think you are suggesting - I'm using ICriteria and using a fetch join on posts.

huey
04/14/2009 12:22 PM by
huey

I'm a bit confused with outer-join and fetch. I don't think I've ever used outer-join in my mappings but have done fetch=join. Does it have to do with lazy? Eg:

"lazy=false, fetch=join" == "outer-join=true" ?

Also:

"Note that there is important subtlety here, if you intend to use this collection with eager loading, you must make sure that your where clause can handle null values appropriately (in the case of an outer join)."

Not to be dense, but I'm not 100% sure I follow this :(


Is there an opinion on having multiple collections for the same relationship but with different where clauses versus using selection criteria in your repository? Like if I have a customer with orders, and I have say 3 order 'sets' one for completed orders, one for unprocessed orders, and one for in process orders. I hadn't really thought about using where clause and defining the one-to-many relationship three times, but it seems sort of cool. I can just ask for the customer and then go over the sets without having to ask my repository to do anything. Is this a bad idea, or pretty common and I'm just realizing it :)

Frank Quednau
04/14/2009 02:47 PM by
Frank Quednau

@huey

In my current test, the statement being generated with lazy=false, fetch=join contains a left outer join. Such a join may produce DB Nulls for child items, in this example it will return Blogs with no Posts. here all Columns to Post will be filled with Null.

Frank Quednau
04/14/2009 02:57 PM by
Frank Quednau

You rightfully note that joins generate a cartesian product. What it also does for me is creating an instance of "parent" for every "child" returned to it (which, however, each contain the correct "childs") . The inadequacies of cartesian products aside, is there any way to avoid this or would it be my job to reduce the list to those entities not equal to each other?

On a "minor" note: Combining fetch mode "subselect" with pagination (set maxresults, etc.) against an Oracle DB results in a broken SQL statement because some statement does not get its parameter that checks whether rownum stays below the max result value.

Jan Willem
04/15/2009 10:41 AM by
Jan Willem

Thanks for the posts.

It should be noted that the fetching strategy is ignored when using HQL. That is not obvious (at least for me it wasn't).

Ayende Rahien
04/15/2009 10:48 AM by
Ayende Rahien

Jan,

That is explicitly mentioned in the CreateQuery docs.

Noam Gal
04/16/2009 06:25 AM by
Noam Gal

I am also confused regarding the differences between

"lazy=false, fetch=join" and "outer-join=true", like Huey wrote. Is there some major difference between the two options (apart from the count select into the formula there)?

I vaguely remember from Hibernate that one option was getting obselete. Is this the case here as well? Is one just here for backwards compatibility?

Ayende Rahien
04/21/2009 07:32 AM by
Ayende Rahien

huey & Noam,

outer-join is deprecated in Hibernate, IIRC, probably should be in NH as well.

huey,

regarding the where clause, since we can use this in a join, it is possible that the value (even if it is non nullable) be null, and we should take that into account.

Comments have been closed on this topic.