Ayende @ Rahien

It's a girl

NHibernate: Streaming large result sets

Note: I am not feeling very well for the past week or so, which is why I am posting so rarely.

NHibernate is meant to be used in an OLTP system, as such, it is usually used in cases where we want to load a relatively small amount of data from the database, work with it and save it back. For reporting scenarios, there are better alternatives, usually (and before you ask, any reporting package will do. Right tool for the job, etc).

But there are cases where you want to do use NHibernate in reporting scenarios nonetheless. Maybe because the reporting requirements aren’t enough to justify going to a separate tool, or because you want to use what you already know. It is in those cases where you tend to run into problems, because you violate the assumptions that were made while building NHibernate.

Let us imagine the following use case, we want to print a list of book names to the user:

using (ISession s = OpenSession())
{
    var books = s.CreateQuery("from Book")
        .List<Book>();

    foreach (var book in books)
    {
        Console.WriteLine(book.Name);
    }
}

There are several problems here:

  • We query on a large result set without a limit clause.
  • We read a lot of data into memory.
  • We only start processing the data after it was completely read from the database.

What I would like to see is something like this:

while(dataReader.Read())
{
     Console.WriteLine(dataReader.GetString("Name"));
}

This still suffer from the problem of reading a large result set, but we will consider this a part of our requirements, so we’ll just have to live with it. The data reader code has two major advantages, it uses very little memory, and we can start processing the data as soon as the first row loads from the database.

How can we replicate that with NHibernate?

Well, as usual with NHibernate, it is only a matter of finding the right extension point. In this case, the List method on the query also has an overload that accepts an IList parameter:

image

That make it as simple as implementing our own IList implementation:

public class ActionableList<T> : IList
{
    private Action<T> action;

    public ActionableList(Action<T> action)
    {
        this.action = action;
    }

    public int Add(object value)
    {
        action((T)value);
        return -1;
    }

    public bool Contains(object value)
    {
        throw new NotImplementedException();
    }

    // ...
}

And now we can call it:

using (ISession s = OpenSession())
{
    var books = new ActionableList<Book>(book => Console.WriteLine(book.Name));
    s.CreateQuery("from Book")
        .List(books);

}

This will have the exact same effect as the pervious NHibernate code, but it will start printing the results as soon as the first result loads from the database. We still have the problem of memory consumption, though. The session will keep track of all the loaded objects, and if we load a lot of data, it will eventually blow out with an out of memory exception.

Luckily, NHibernate has a ready made solution for this, the stateless session. The code now looks like this:

using (IStatelessSession s = sessionFactory.OpenStatelessSession())
{
    var books = new ActionableList<Book>(book => Console.WriteLine(book.Name));
    s.CreateQuery("from Book")
        .List(books);

}

The stateless session, unlike the normal NHibernate session, doesn’t keep track of loaded objects, so the code here and the data reader code are essentially the same thing.

Comments

orip
06/27/2010 10:22 AM by
orip

awesome - I guess you could also create a list that creates an IEnumerator that supplies the results one by one. Would the effect be similar?

var books = new IterableList

<book ();

s.CreateQuery("from Book").List(books);

foreach (var book in books)

  Console.WriteLine(book.Name);
gs
06/28/2010 07:40 PM by
gs

__For reporting scenarios, there are better alternatives, usually (and before you ask, any reporting package will do. Right tool for the job, etc).

I'll bite, what's everyone use? I've been using Quartz.Net to run heavy reporting jobs at night and as sort of a load balancer (so really I use NHibernate, as Quartz.net just triggers the report). I wasn't aware of any report specific tools. Of course searching ".NET Reporting" brings up a bunch of garbage ...

robsosno
06/28/2010 08:07 PM by
robsosno

Partial solution would be also to use Enumerable (if books table is not extra large):

using (ISession s = OpenSession())

{

var books = s.Enumerable("from Book")

    .List

<book();

foreach (var book in books)

{

    Console.WriteLine(book.Name);

}

}

However there is performance penalty:

  • (select Id from Book ) * 1

  • (select * from Book Where Id = :id) * number of records

Ideally I would like to see in NHibernate Iterable interface which would work like this:

using (ISession s = OpenSession())

{

var books = s.Iterable("from Book")

    .List

<book(); // DB not touched; books contains only proxy

foreach (var book in books) // fetch from collection actually from Db

{

    Console.WriteLine(book.Name);

}

}

I don't know if this is technically possible. I know that for my particular needs it would be very convenient - calculating complex fees for each account periodically. Currently we are using stored procedures for things like this.

Steve Py
06/28/2010 10:20 PM by
Steve Py

Effectively an NHibernate-friendly cursor.

I nice demonstration of consuming large amounts of data responsively without allocating the memory to store the entire result set.

Alexey Romanov
06/29/2010 08:12 PM by
Alexey Romanov

On one hand I want to say: "That violates semantics of IList completely!" On the other, I don't see a better solution.

NC
06/29/2010 09:33 PM by
NC

Wheres NH3! Heard no news about NH in ages.

Ajai Shankar
06/29/2010 10:23 PM by
Ajai Shankar

Hi Ayende

Is that really true?

I remember quite some while back I wanted to stream results and posting to nhusers made it seem that was not the case:

groups.google.com/.../d9423d9a45ff9c62

One of the findings in that thread was:

The SqlDataReader is wrapped in a Driver.NDataReader which loads

everything into an internal list!

Maybe like you said all that was missing is the correct extension point?

Ajai

Hendry Luk
06/30/2010 02:21 AM by
Hendry Luk

This is precisely what Rx is for. When you created that ActionableList, you actually just reinvented a push-model (akin to IObservable in Rx). I for one would be interested to see Rx (and other .net features) getting built into NH that would enrich it from its java ancestry

Hendry Luk
06/30/2010 03:38 AM by
Hendry Luk

Not to mention that if we use Rx (instead of reinventing another ActionableList), we'll also get all the nice works those guys have been working hard on at Redmond, e.g. Linq support, parallelism. Hence..

// not only caters for simplest use-cases

IObservable books = s.CreateQuery("from Book")

.Stream(); //Extension method?

books.Subscribe(book=> Console.Writeline(book.Name));

// but also fancy stuff

IObservable customers = s.CreateQuery("from Customer").Stream();

var mayLike = from book in books

join cus in customers on book.Genre == cus.FavouriteGenre

select new {Book = book, Customer = cus};

mayLike.Subscribe(x=>

Console.WriteLine("Customer {0} may like {1}", 

    x.Book.Title, x.Customer.FullName)};

Maybe a bad example, but thats beside the point.

I thought it's a really neat abstraction for a "push-model queryable cursor" :)

Ayende Rahien
06/30/2010 06:36 AM by
Ayende Rahien

Ajai,

The use of NDataReader depends on the exact mix of what you are doing, the driver that you use and the configuration option.

Yes, if it is wrapped, it isn't good, but it is controllable.

Nabil
07/07/2010 10:22 AM by
Nabil

Does this work with Crtieria API?

Frank
07/19/2010 04:01 AM by
Frank

Doesn't this break NHibernate.Linq which only work under ISession and In essence breaks people's generic repository pattern?

Ayende Rahien
07/20/2010 11:10 AM by
Ayende Rahien

@Frank,

Probably, but I don't really like that attempt anyway

Frank
07/21/2010 04:27 PM by
Frank

So what are your suggestions or pointers? To me, it doesn't make sense to have IStatelessSession and ISession not inherit from some other common interface and I believe there's already a suggested patch (NH-2211) which doesn't look like it made it into the trunk (3.0).

Ayende Rahien
07/22/2010 07:08 AM by
Ayende Rahien

@Frank,

The problem is that there really aren't is-a between stateless and regular session.

They serve different purposes.

Frank
07/22/2010 03:57 PM by
Frank

I guess the folks who did the NHibernate.Linq thought otherwise. Whether or not you're reading data from ISession or IStatelessSession, the behavior is the same, it's just one does not do any tracking and the when the foundation is not designed properly, we start seeing things like Linq not working on IStatelessSession. I guess we can agree to disagree but I thank you for your input.

Ayende Rahien
07/22/2010 04:58 PM by
Ayende Rahien

the behavior is the same

Not really, no.

Sure, on the surface it looks like that is all it is doing, but they produce quite a different behavior.

Bahador Nooraei B.
07/22/2010 07:14 PM by
Bahador Nooraei B.

Is there a nice way to terminate data fetching when we use this approach?

I mean, say, we need 100 objects that meet a criteria that can only be enforced in-memory.

Thilo R
08/16/2010 02:48 PM by
Thilo R

I tried this approach while streaming a few MBs over UMTS - it takes over 55 seconds. Unfortunately no performance impact was measureable. It seems that the streaming starts as soon as all the data was loaded from the database. Is there something I am missing or is this method not working when streaming from the database directly row by row (like the DataReader)?

Ayende Rahien
08/16/2010 03:31 PM by
Ayende Rahien

Thilo,

There are several factors at play here.

a) you may be using an ADO.Net provider that will only push results to NHibernate after a significant amount has been reached.

b) you may be using a buffered output.

The value of streaming is reducing the time to first byte, and you need to ensure that you deal with this in both ends.

Comments have been closed on this topic.