A question of an untenable situation
One of the most common issues that I run into in my work is getting all sorts of questions that sound really strange. For example, I recently got a question that went something like this:
What is the impact of reflection on NHibernate’s performance?
I started to answer that there isn't one you would notice; even setting aside the major optimizations that NH has in that area, you are accessing a remote database, which is far more expensive. But then they told me that they had profiled the application and found reflection showing up in the results.
I asked them what their scenario was, and the exchange went like this:
Well, when we load a million rows…
And that is your problem…
To be fair, they actually had a legitimate reason to want to do that. I disagree with the solution they chose, but it was a reasonable approach to the problem at hand.
Comments
That's why we have CQRS. Loading a million rows via NHibernate is insane (been there, done that, failed)
I hate the fact that reflection in .NET still has this reputation for being a huge problem - there was a problem in 1.0 / 1.1, some seven years ago, when it wasn't as optimised as it should have been - but in subsequent releases of the framework, reflection has been really well tuned. People always seem to jump in and point the finger at reflection, rather than profile and measure their own code / logic / process. I can guarantee their code has gone through a lot less scrutiny and optimisation than the CLR / BCL.
And I guess NH generates code to make typical reflection accesses faster, like materialization or extracting all of an object's property values into an array. The difference can only be measured in micro-benchmarks.
Why should loading a million rows be a problem? It's not that many; and anyway, the need to deal with massive amounts of data is not that unusual outside of the typical customer-orders pet store. Fortunately, there are frameworks around (like BLToolkit) that give you the flexibility to do whatever you want efficiently, without saying "you're doing it wrong!" or "you won't notice the performance hit".
Andrew,
A million rows is generally an indication that you aren't doing business logic. You are generally doing reporting, ETL, etc.
That is something that requires different approaches than OLTP.
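To make the distinction concrete, here is a minimal sketch of a reporting-style read in NHibernate that projects straight into a flat DTO instead of materializing full entities in the session; the entity and DTO names are made up for illustration.

```csharp
// Sketch: project a reporting query into a DTO via HQL and a result
// transformer, so no tracked entities pile up in the session.
using System.Collections.Generic;
using NHibernate;
using NHibernate.Transform;

// Hypothetical mapped entity; the real model is whatever the app maps.
public class Order
{
    public virtual int Id { get; set; }
    public virtual decimal Total { get; set; }
}

// Flat DTO used only for reporting output.
public class OrderSummary
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public static class ReportingQueries
{
    public static IList<OrderSummary> LoadSummaries(ISession session)
    {
        return session
            .CreateQuery("select o.Id as Id, o.Total as Total from Order o")
            .SetResultTransformer(Transformers.AliasToBean<OrderSummary>())
            .List<OrderSummary>();
    }
}
```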
Or doing a mass update that is much more efficiently handled via an update statement or a stored proc (see the sketch below). An ORM is a good hammer, but that doesn't mean you have to build an application without a screwdriver or a saw.
I've seen it plenty of times: loading huge numbers of objects just to set some relevant state and save them again... because it's consistent that way.
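A minimal sketch of what that kind of mass update looks like as a single HQL DML statement, assuming a hypothetical Invoice entity with a Processed flag:

```csharp
// Sketch: one server-side UPDATE instead of loading every object,
// flipping a flag, and saving it back.
using NHibernate;

// Hypothetical mapped entity; the flag name is made up.
public class Invoice
{
    public virtual int Id { get; set; }
    public virtual bool Processed { get; set; }
}

public static class BulkOperations
{
    public static int MarkAllAsProcessed(ISession session)
    {
        return session
            .CreateQuery("update Invoice i set i.Processed = :done where i.Processed = :pending")
            .SetParameter("done", true)
            .SetParameter("pending", false)
            .ExecuteUpdate(); // returns the number of rows affected
    }
}
```

Because the update runs directly against the database, entities already loaded in a session will not reflect the change until they are reloaded.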
Exactly the situation on a project I've just joined.
NHibernate + the 2nd level cache are used for a financial ETL + reporting engine, where we have to deal with millions of rows generated daily and where the process takes over 6 hours to complete.
NHibernate is clearly better suited to OLTP scenarios, with units of work kept as small as possible.
And no, CQRS doesn't necessarily help for pure ETL scenarios.
Possibly RavenDB could be suited to this kind of scenario, since it handles streams of data that get processed over a period of time through map/reduce functions.
Daniel
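For the ETL side, one common way to keep the units of work small is to flush and clear the session every few hundred or thousand rows. A rough sketch; the batch size and the generic save shape are assumptions, not anything from the comment above:

```csharp
// Sketch: write items in small batches so the session's first-level
// cache never holds more than BatchSize entities at a time.
using System.Collections.Generic;
using NHibernate;

public static class EtlBatcher
{
    private const int BatchSize = 1000; // hypothetical batch size

    public static void Save<T>(ISessionFactory factory, IEnumerable<T> items)
    {
        using (ISession session = factory.OpenSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            var count = 0;
            foreach (var item in items)
            {
                session.Save(item);
                if (++count % BatchSize == 0)
                {
                    session.Flush(); // push pending inserts to the database
                    session.Clear(); // evict tracked entities from the session
                }
            }
            tx.Commit();
        }
    }
}
```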
Talking about a million rows...
Is it insane to use NHibernate to work on huge amounts of data like this?
Scenario:
I convert a DB table with exactly 1,000,000 rows in code, maybe with some joins :)
I use LINQ to Entities with the "no change tracking" option, so memory usage stays low. The complete DB schema is imported, with no changes to the definitions.
A foreach() over all the source rows converts all the records.
These are basically customer-specific data conversions (imports) that are too complex to do with XML. The 'ORM' mapper is just used as a more comfortable & discoverable DbDataReader.
Entity Framework seems good for that use case. Is it a good idea to do the same with NHibernate + some generator?
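A rough sketch of that kind of no-tracking read, using the Entity Framework DbContext API; the context, entity, and conversion delegate here are hypothetical, not the commenter's actual code:

```csharp
// Sketch: AsNoTracking() streams rows without registering them in the
// change tracker, so memory stays flat while iterating the whole table.
using System;
using System.Data.Entity;

public class SourceRow
{
    public int Id { get; set; }
    public string Payload { get; set; }
}

public class SourceContext : DbContext
{
    public DbSet<SourceRow> SourceRows { get; set; }
}

public static class Conversion
{
    public static void Run(Action<SourceRow> convert)
    {
        using (var db = new SourceContext())
        {
            foreach (var row in db.SourceRows.AsNoTracking())
            {
                convert(row); // customer-specific conversion logic goes here
            }
        }
    }
}
```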
Daniel,
That is when you use NHibernate's Stateless Session
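A minimal sketch of what that might look like; the target entity and the shape of the import are assumptions, and how the source rows are produced is left out:

```csharp
// Sketch: IStatelessSession has no first-level cache, no change tracking
// and no events, so inserting very large numbers of rows stays cheap.
using System.Collections.Generic;
using NHibernate;

// Hypothetical target entity for the converted data.
public class TargetRow
{
    public virtual int Id { get; set; }
    public virtual string Payload { get; set; }
}

public static class BulkImport
{
    public static void Run(ISessionFactory factory, IEnumerable<TargetRow> convertedRows)
    {
        using (IStatelessSession session = factory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (var row in convertedRows)
            {
                session.Insert(row); // issued directly, nothing is kept in memory
            }
            tx.Commit();
        }
    }
}
```

Note that a stateless session also bypasses the second-level cache and does not cascade to associations, so it only fits this kind of mechanical bulk work.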