Optimizing NHibernate
Aaron (Eleutian) is talking about some issues that he has with optimizing with NHibernate.
So in short, I feel NHibernate (and any ORM for that matter) needs the following features to really be optimization friendly:
- Lazy field initialization
- Querying for partial objects: select u(Username, Email) from User u
- Read-only queries that do not get flushed.
- Join qualifiers (on in T-SQL)
Let me try to take this in order.
Lazy Field Initialization:
On the surface, it looks very good, because you can do something like:
Customer customer = session.Load<Customer>(15);
Console.Write(customer.Name);
And the OR/M would generate this SQL:
SELECT Name FROM Customers where Id = @p0; @p0 = 15;
That sounds fine, until you realize that the database roundtrip is far more expensive than loading a single row, even if you load all its columns (leaving BLOBs aside for now). This means that it is usually much more efficient to load the entire row in a single go, rather than piecemeal.
I tried to toy a bit with the way you would do that, and I can't really think of a good way to handle it without causing major management issues for the OR/M user.
That said, it is a feature that can be very valuable if you make it optional, on a per-field basis. That way, you can have a customer object that contains a Photo BLOB column, and have it loaded only when needed, yet keep the natural programming model.
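To make the per-field idea concrete, here is a minimal sketch of the programming model only. It is not NHibernate API: the Customer class, the loadPhoto delegate and the IsPhotoLoaded flag are all hypothetical, and the delegate merely stands in for whatever second query an OR/M would issue for the BLOB column.

```csharp
using System;

// Hypothetical entity, for illustration only (not an NHibernate mapping).
public class Customer
{
    private readonly Lazy<byte[]> photo;

    public Customer(int id, string name, Func<byte[]> loadPhoto)
    {
        Id = id;
        Name = name;
        // Assumption: loadPhoto stands in for a second SELECT that
        // fetches only the Photo column for this customer.
        photo = new Lazy<byte[]>(loadPhoto);
    }

    // Cheap columns are loaded eagerly with the rest of the row.
    public int Id { get; private set; }
    public string Name { get; private set; }

    // True once the Photo column has actually been fetched.
    public bool IsPhotoLoaded
    {
        get { return photo.IsValueCreated; }
    }

    // First access runs the loader; later accesses reuse the cached bytes.
    public byte[] Photo
    {
        get { return photo.Value; }
    }
}
```

Reading customer.Name never touches the loader; only the first read of customer.Photo does, which is the "natural programming model" part.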
Partial object queries
Well, NHibernate has that:
select new UserSummaryDetails(u.Username, u.Email) from User u
It will return a list of UserSummaryDetails that you can use. As an aside, they are also not tracked by NHibernate.
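For the query above to work, UserSummaryDetails needs a constructor whose arguments line up with the select list. A minimal sketch of such a class, assuming both columns are strings (the original does not say what the property types are):

```csharp
using System;

// Plain summary object filled by "select new UserSummaryDetails(u.Username, u.Email)".
// Assumption: Username and Email are both strings.
public class UserSummaryDetails
{
    // Argument order must match the order of the select list in the query.
    public UserSummaryDetails(string username, string email)
    {
        Username = username;
        Email = email;
    }

    public string Username { get; private set; }
    public string Email { get; private set; }
}
```

Since it is just a plain object with no identity in the session, there is nothing for NHibernate to dirty check on flush.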
Read-Only Queries
Those are actually fairly simple to implement; you just need something like:
IList results;
// Share the current connection, but never flush the temporary session.
using (ISession tempSession = factory.OpenSession(currentSession.Connection))
{
    tempSession.FlushMode = FlushMode.Never;
    results = tempSession.CreateCriteria(typeof(Customer)).List();
}
Abstract that to a helper function and you are set.
Another option is to use SetResultTransformer to inject a transformer that will evict each instance from the session (which has its own issues).
Join Qualifiers
From a few experiments that I have done, there is no difference in the query plan whether you use ANSI joins or where-clause joins. This is one solution to the problem. Another issue would be what syntax to choose. NHibernate would need to map that to all the relevant databases, which may not always support ANSI joins.
No simple answer there, but the HQL Parser that we are building should make it more accessible to developers to go in and change it.
Comments
Re: Lazy Field Initialization --
In your example, loading everything but the Photo BLOB, this could be done by creating a separate CustomerPhoto table+entity and setting it up as a one-to-one. This has a secondary benefit at the db layer in that you may want to put such data into a secondary filegroup.
I'd have to have a really good reason to do it, but you could always bypass NHibernate for iBatis.Net to regain full control over the SQL for optimization or trickery. I think it's more repetitive work to use iBatis than Hibernate, but you get full control. Of course it's setting you on a slippery slope to procedural, data-driven code hell.
Here's my reply:
http://blog.eleutian.com/2007/08/26/OptimizingNHibernatePt2.aspx
As an aside, we just had a horrendous issue with having a Photo blob in our user table. Any time we loaded several users, the flush would take a good 5 seconds as it compared every byte in every photo. Eek.
We've been wanting to move photos to a separate table, and we're doing that now.
Just for clarity (SQL Server): ANSI joins and where clause joins differ when, for outer joins, you add more clauses to the join than just the A.key = B.key clause. E.g. "left outer join B on A.key=B.key and B.type = 1" is not the same as adding these clauses to the where statement, as the latter removes all rows from the result set that do not have B.type = 1, while the former just returns NULL for all columns from B for those rows.
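The difference can be reproduced in memory with LINQ to Objects. This is only a sketch of the semantics, not of what SQL Server or NHibernate does internally; RowA, RowB and JoinDemo are made-up names, and pre-filtering B before the join is used as the equivalent of putting B.type = 1 in the "on" clause of a left outer join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for tables A and B.
public class RowA { public int Key; }
public class RowB { public int Key; public int Type; }

public static class JoinDemo
{
    // left outer join B on A.key = B.key and B.type = 1
    // Every row of A survives; B's columns are NULL when nothing matches.
    public static List<string> FilterInOn(IEnumerable<RowA> a, IEnumerable<RowB> b)
    {
        return (from x in a
                join y in b.Where(r => r.Type == 1) on x.Key equals y.Key into matches
                from m in matches.DefaultIfEmpty()
                select x.Key + ":" + (m == null ? "NULL" : m.Type.ToString())).ToList();
    }

    // left outer join B on A.key = B.key where B.type = 1
    // The where clause runs after the join, so rows with a NULL or
    // non-matching B.type are removed from the result.
    public static List<string> FilterInWhere(IEnumerable<RowA> a, IEnumerable<RowB> b)
    {
        return (from x in a
                join y in b on x.Key equals y.Key into matches
                from m in matches.DefaultIfEmpty()
                where m != null && m.Type == 1
                select x.Key + ":" + m.Type).ToList();
    }
}
```

With A holding keys 1, 2, 3 and B holding (key 1, type 1) and (key 2, type 2), FilterInOn keeps all three A rows, while FilterInWhere keeps only the row for key 1.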
Lazy field initialization is pretty slow in general, so it's a perfect example of a feature requested by someone who has no real clue how an o/r mapper works. Sure, if you just want to fetch 1 or 2 fields from a single entity, it might be more optimal, but then again, you can also use a projection.
"select new UserSummaryDetails(u.Username, u.Email) from User u"
I think he wants:
select new User(u.Username, u.Email) from User u
so he can exclude fields for fetching, e.g. an Order entity which has a blob field with an offer .doc, an employee entity which has a photo field etc.
Non-ANSI joins have problems with some left/right join setups, where they lead to different results than you'd expect.
Any o/r mapper should offer the ability to specify the 'on' clause for a join. Otherwise there are blind spots in what the o/r mapper can do, and the user (the developer) is forced to bypass the o/r mapper core, which is always a pain.
If you read my reply, I try to clarify that I'm not at all advocating the use of lazy field (or even collection) initialization. I think it's a smell. That doesn't mean it shouldn't be there to protect the developer and to enable more advanced scenarios, but it should set off a red light when it's used. Or, with adaptive fetching strategies, the field would simply be included the next time the query was made.
But yes, you're right about how I want projections to work, but it doesn't have anything to do with excluding large fields. It has everything to do with excluding fields I'm not going to use so that I can actually take advantage of indexes that cover those fields (an index on username that includes email for example) and cut down on bandwidth, hydration overhead and flush overhead while not introducing new anemic single use objects.
@Ayende - On the partial object queries part, do you know whether the fact that these objects don't end up in the session causes a noticeable speed improvement with NHibernate?
Colin,
Usually, that is not meaningful. It only comes into play on the rare occasions where you load several thousand entities.
Lazy field initialization is indeed key to many advanced optimization scenarios, such as:
1) Loading only the fields you're going to need for a use case
2) Avoiding loading fields twice
3) Enabling lazy loading of large fields such as blobs
Projections are (unless they are updatable) only really useful for read only use cases.