NH Prof Alerts: Unbounded result set
This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.
An unbounded result set occurs when you perform a query without explicitly limiting the number of returned results (using SetMaxResults() with NHibernate, or TOP or LIMIT clauses in the SQL). Usually, this means that the application assumes that the query will only return a few records. That works well in development and testing, but it is a time bomb in production.
The query suddenly starts returning thousands upon thousands of rows, and in some cases millions. This puts more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out-of-memory errors.
Here is one example of a query that will trigger the unbounded result set warning:
session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
    .SetParameter("id", orderId)
    .List();
If the order has many line items, we are going to load all of them, which is probably not what we intended. A very easy fix for this issue is to add pagination:
session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
    .SetParameter("id", orderId)
    .SetFirstResult(0)
    .SetMaxResults(25)
    .List();
Now we are assured that we only need to handle a predictable number of rows, and if we need to work with all of them, we can page through the records as needed. But there is another common source of unbounded result sets: directly traversing the object graph, as in this example:
var order = session.Get<Order>(orderId);
DoSomethingWithOrderLines(order.OrderLines);
Here, again, we are loading the entire set (in fact, it is identical to the query we issued before) without regard to how big it is. NHibernate does provide robust handling of this scenario, using filters.
var order = session.Get<Order>(orderId);
var orderLines = session.CreateFilter(order.OrderLines, "")
    .SetFirstResult(0)
    .SetMaxResults(25)
    .List();
DoSomethingWithOrderLines(orderLines);
This allows us to page through a collection very easily, and saves us from having to deal with unbounded result sets and their consequences.
More posts in "NH Prof Alerts" series:
- (31 Dec 2008) Use statement batching
- (30 Dec 2008) Too many database calls per session
- (29 Dec 2008) Excessive number of rows returned
- (29 Dec 2008) Unbounded result set
- (28 Dec 2008) Use of implicit transactions is discouraged
- (28 Dec 2008) Select N + 1
Comments
Yet there are OLTP scenarios (not showing data to a user) where you actually want the full list.
I'm not saying that that code could not benefit, performance-wise, from being rewritten in a more denormalized fashion, or possibly using straight SQL, but that domain models will continue to have regular "parent.children.etc.etc.so_forth".
What is the NH Prof story for OLTP?
Udi,
If you want to read the full list, you need to explicitly acknowledge that, although I think that even when you need the full list, you should try to process that in batches rather than the full list (which may be hundreds of thousands of items).
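Processing in batches, as suggested here, might look something like this sketch (the batch size, the entity name, and the Process helper are illustrative, not from the post):

    // Sketch: walk a large table in fixed-size pages instead of
    // loading everything into memory at once.
    const int batchSize = 500;
    for (int page = 0; ; page++)
    {
        var lines = session.CreateQuery("from OrderLines lines")
            .SetFirstResult(page * batchSize)
            .SetMaxResults(batchSize)
            .List<OrderLines>();
        if (lines.Count == 0)
            break;
        foreach (OrderLines line in lines)
            Process(line); // hypothetical per-item work
        session.Clear(); // keep the session small between batches
    }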
I am not sure that I understand what you mean about NH Prof story for OLTP
Sometimes we need to update all records in one table of the database. We wanted to keep the ISession small, since large ISessions seem to be a performance killer. So I wrote this function:
IEnumerable<T> PagedResults<T>(ICriteria criteria, int pageSize)
{
    for (int page = 0; ; page++)
    {
        var results = criteria
            .SetFirstResult(page * pageSize)
            .SetMaxResults(pageSize)
            .List<T>();
        if (results.Count == 0) yield break;
        foreach (T result in results) yield return result;
    }
}
It would be much, much harder for us to do updates without the domain model.
Sjaaky,
This is actually something that you would want to use the Stateless Session for.
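A sketch of what that could look like with a stateless session (the sessionFactory variable, the entity, and the property are illustrative; IStatelessSession bypasses the first-level cache, so the "session" never grows as you update rows):

    using (IStatelessSession session = sessionFactory.OpenStatelessSession())
    using (ITransaction tx = session.BeginTransaction())
    {
        var lines = session.CreateQuery("from OrderLines").List<OrderLines>();
        foreach (OrderLines line in lines)
        {
            line.Quantity += 1; // hypothetical update
            session.Update(line); // issued directly, nothing is tracked in memory
        }
        tx.Commit();
    }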
If this is a common scenario, shouldn't there be an overload for CreateFilter(object) that assumes the query is String.Empty?
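The overload the comment asks for could be approximated today as an extension method (this extension is hypothetical, not part of the NHibernate API):

    public static class SessionExtensions
    {
        // Filter a collection with an empty query string, i.e. "give me
        // the collection itself", ready for SetFirstResult/SetMaxResults.
        public static IQuery CreateFilter(this ISession session, object collection)
        {
            return session.CreateFilter(collection, string.Empty);
        }
    }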
Couldn't a limit on all queries also be a time bomb?
If I assume a table has at most 25 records because it currently has only 10, but after 3 years someone who does not know about this limitation adds 30, I also end up with an application which does not work correctly.
Wouldn't it be better if I check the result set count and send warnings to the developers when a threshold is reached?
Steve,
No, it wouldn't be. Because when you enter paging into the mix, you also start handling things in batches.
Sure, if you blindly slap a limit clause on anything, this is a problem, but you shouldn't do that anyway.
Can you list the factors that may lead you to use batches when dealing with a large set of entities that you know you will have to process in full?
In our specific scenario, I have a little tougher time seeing it, as from a database server perspective, we've noticed that the sum of all "page" queries is more expensive than a single query to get all data. On our app server(s), we're using 64 bit, so running out of memory is less of a concern. Also, there's usually a pretty big pipe between the database and the application server.
Anything else I'm not considering?
The question is what you are trying to optimize.
If you are trying to optimize the throughput of a single action, you might be better off with a single large process. Even then, a streaming approach would work much better than any other.
If you have several processes running at the same time, the batching approach is far more performant, because you don't have a single big process sucking up all the system resources.
Even if you have a lot of memory and a lot of bandwidth, we have more data than that.