Ayende @ Rahien

It's a girl

NH Prof Alerts: Unbounded result set

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

An unbounded result set occurs when we perform a query without explicitly limiting the number of returned results (using SetMaxResults() with NHibernate, or TOP or LIMIT clauses in raw SQL). Usually, this means that the application assumes a query will only return a few records. That works well in development and testing, but it is a time bomb in production.

The query suddenly starts returning thousands upon thousands of rows, and in some cases millions. This leads to more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out of memory errors.

Here is one example of a query that will trigger the unbounded result set warning:

session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
       .SetParameter("id", orderId)
       .List();

If the order has many line items, we are going to load all of them, which is probably not what we intended. A very easy fix for this issue is to add pagination:

session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
	.SetParameter("id", orderId)
	.SetFirstResult(0)
	.SetMaxResults(25)
	.List();

Now we are assured that we only need to handle a predictable number of rows, and if we need to work with all of them, we can page through the records as needed. But there is another common cause of unbounded result sets: directly traversing the object graph, as in this example:

var order = session.Get<Order>(orderId);
DoSomethingWithOrderLines(order.OrderLines); 

Here, again, we are loading the entire set (in fact, it is identical to the query we issued before) without regard to how big it is. NHibernate does provide robust handling of this scenario, using filters.

var order = session.Get<Order>(orderId);
var orderLines = session.CreateFilter(order.OrderLines, "")
	.SetFirstResult(0)
	.SetMaxResults(25)
	.List();
DoSomethingWithOrderLines(orderLines);

This allows us to page through a collection very easily, and saves us from having to deal with unbounded result sets and their consequences.

Comments

Udi Dahan
12/29/2008 10:03 AM

Yet there are OLTP scenarios (not showing data to a user) where you actually want the full list.

I'm not saying that that code could not benefit, performance-wise, from being rewritten in a more denormalized fashion, or possibly using straight SQL, but that domain models will continue to have regular "parent.children.etc.etc.so_forth".

What is the NH Prof story for OLTP?

Ayende Rahien
12/29/2008 12:33 PM

Udi,

If you want to read the full list, you need to explicitly acknowledge that, although I think that even when you need the full list, you should try to process it in batches rather than as one full list (which may be hundreds of thousands of items).

I am not sure that I understand what you mean about the NH Prof story for OLTP.

Sjaaky
12/29/2008 02:50 PM

Sometimes we need to update all records in one table of the database. We wanted to keep the ISession small, since large ISessions seem to be a performance killer. So I wrote this function:

    public static IEnumerable<T> PagedResults<T>(ICriteria criteria, int pagesize)
    {
        int i = 0;
        while (true)
        {
            IList<T> results = criteria
                .SetFirstResult(i)
                .SetMaxResults(pagesize)
                .List<T>();

            if (results == null || results.Count == 0) break;

            foreach (T item in results)
            {
                yield return item;
                i++;
            }

            NHibernateHelper.Session.Current.Flush();
            NHibernateHelper.Session.Current.Clear();
        }
    }

It would be much, much harder for us to do updates without the domain model.

Ayende Rahien
12/29/2008 02:54 PM

Sjaaky,

This is actually something that you would want to use the Stateless Session for.
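The stateless session Ayende points to keeps no first-level cache and does no change tracking, so there is nothing to Flush() or Clear() between batches. A minimal sketch of how Sjaaky's bulk update might look with it (the sessionFactory variable and the order.Process() call are assumptions for illustration, not part of the original discussion):

```csharp
using (IStatelessSession stateless = sessionFactory.OpenStatelessSession())
using (ITransaction tx = stateless.BeginTransaction())
{
    int i = 0;
    while (true)
    {
        // Page through the table; the stateless session keeps no
        // first-level cache, so memory usage stays flat.
        IList<Order> page = stateless.CreateQuery("from Order")
            .SetFirstResult(i)
            .SetMaxResults(25)
            .List<Order>();
        if (page.Count == 0) break;

        foreach (Order order in page)
        {
            order.Process();          // hypothetical domain operation
            stateless.Update(order);  // issues an UPDATE immediately
        }
        i += page.Count;
    }
    tx.Commit();
}
```

Note that a stateless session also bypasses lazy loading, cascades, and interceptors, so it fits flat, row-by-row processing rather than rich object graph traversal.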

configurator
12/29/2008 10:33 PM

If this is a common scenario, shouldn't there be an overload for CreateFilter(object) that assumes the query is String.Empty?
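NHibernate does not ship such an overload, but it is easy to add one as an extension method. This is a hypothetical convenience wrapper, not part of the NHibernate API:

```csharp
public static class SessionExtensions
{
    // Hypothetical overload: an empty filter query string matches every
    // element of the collection, leaving only the paging to apply.
    public static IQuery CreateFilter(this ISession session, object collection)
    {
        return session.CreateFilter(collection, string.Empty);
    }
}
```

With it, the earlier example shortens to session.CreateFilter(order.OrderLines).SetMaxResults(25).List().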

Steve Wagner
01/02/2009 05:41 PM

Could a limit on all queries not also be a time bomb?

If I assume a table has at most 25 records because it currently has only 10, but three years later someone who does not know of this limitation adds 30, I also end up with an application which does not work correctly.

Wouldn't it be better if I checked the result set count and sent warnings to the developers when a threshold is reached?

Ayende Rahien
01/04/2009 01:34 AM

Steve,

No, it wouldn't be. Because when you enter paging into the mix, you also start handling things in batches.

Sure, if you blindly slap a limit clause on everything, this is a problem, but you shouldn't do that anyway.

Ross Beehler
01/05/2009 05:18 PM

Can you list the factors that may lead you to use "batches" when dealing with a large set of entities that you know you will have to process in full?

In our specific scenario, I have a little tougher time seeing it, as from a database server perspective, we've noticed that the sum of all "page" queries is more expensive than a single query to get all data. On our app server(s), we're using 64 bit, so running out of memory is less of a concern. Also, there's usually a pretty big pipe between the database and the application server.

Anything else I'm not considering?

Ayende Rahien
01/05/2009 07:57 PM

The question is what you are trying to optimize.

If you are trying to optimize throughput of a single action, you might be better off with a single large process. Even then, a streaming approach would work much better than any other.

If you have several processes running at the same time, the batching approach is far more performant, because you don't have a single big process sucking out all the system resources.

Even if you have a lot of memory and a lot of bandwidth, we have more data than that.
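The streaming approach mentioned above can be sketched with NHibernate's IQuery.Enumerable(), which hydrates entities as they are iterated instead of materializing the full list first (the OrderLine type and the ProcessLine call are assumptions for illustration):

```csharp
// Stream rows as they are consumed instead of loading them all up front.
var lines = session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
    .SetParameter("id", orderId)
    .Enumerable<OrderLine>();

foreach (OrderLine line in lines)
    ProcessLine(line); // hypothetical per-row handler
```

The streamed entities still accumulate in the session's first-level cache, so for very large sets a periodic session.Clear(), as in Sjaaky's function, remains a good idea.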
