Is OR/M an anti pattern?
This article thinks so, and I was asked to comment on it. I have to say that I agree with a lot of what it says. It starts by laying out what an anti pattern is:
- It initially appears to be beneficial, but in the long term has more bad consequences than good ones
- An alternative solution exists that is proven and repeatable
It then goes on to list some of the problems with OR/M:
- Inadequate abstraction - The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details. The documentation of all the major ORM libraries is rife with references to SQL concepts.
- Incorrect abstraction – …if your data is not relational, then you are adding a huge and unnecessary overhead by using SQL in the first place and then compounding the problem by adding a further abstraction layer on top of that. On the other hand, if your data is relational, then your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question.
- Death by a thousand queries – …when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency. Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects.
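A minimal sketch of the "30 columns when you only need 3" point, in Python with the standard-library sqlite3 module. The wide customers table and its columns are made up for illustration; all it shows is that an entity-style SELECT * drags every column back, while a projection returns only what the screen needs.

```python
import sqlite3

# Hypothetical wide table; the "notes" column stands in for data a list screen never shows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "notes TEXT, address TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email, notes, address, phone) VALUES (?, ?, ?, ?, ?)",
    [(f"c{i}", f"c{i}@example.com", "x" * 1000, "somewhere", "555") for i in range(1000)],
)

def load_full_entities(conn):
    # Entity-style load: every column comes back, whether the screen needs it or not.
    return conn.execute("SELECT * FROM customers").fetchall()

def load_projection(conn):
    # Projection: ask only for the columns the screen actually displays.
    return conn.execute("SELECT id, name, email FROM customers").fetchall()

full = load_full_entities(conn)
slim = load_projection(conn)
print(len(full[0]), len(slim[0]))  # 6 3
```

Most OR/Ms can express the second form too (as one of the comments below points out); the cost only appears when the whole entity is loaded by default.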
If the article were only about pointing out the problems in OR/M, I would have no issue endorsing it unreservedly. Many of the problems it points out are real. They can be mitigated quite nicely by someone who knows what they are doing, but that is beside the point.
I think I am in a fairly unique position to answer this question. I have been heavily involved in the NHibernate project for over 7 years, and I have been living & breathing OR/M for all of that time. I also created RavenDB, a NoSQL database, which gives me a good perspective on what it means to work with a non-relational store.
And like most criticisms of OR/M that I have heard over the years, this article does only half the job. It tells you what is good & bad (mostly bad) in OR/M, but it fails to point out something quite important.
To misquote Churchill, Object Relational Mapping is the worst form of accessing a relational database, except for all the other options, when used for OLTP.
When I see people railing against the problems in OR/M, they usually, and quite correctly, point out problems that are truly painful. But they never seem to remember all the other problems that OR/M usually shields you from.
One alternative is to move away from Relational Databases. RavenDB and the RavenDB Client API have been specifically designed by us to overcome a lot of the limitations and pitfalls inherent to OR/M. We have been able to take advantage of all of our experience in the area and create what I consider to be a truly awesome experience.
But if you can’t move away from Relational Databases, what are the alternatives? Ad hoc SQL or Stored Procedures? You want to call that better?
A better alternative might be something like Massive, which is a very thin layer over SQL. But that suffers from a whole host of other issues (no unit of work means aliasing issues, no support for eager loading means a better chance of SELECT N+1, no easy way to handle migrations, etc.). There is a reason OR/Ms have ended up where they are. There are a lot of design decisions that simply cannot be made any other way without unacceptable tradeoffs.
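To illustrate the aliasing issue, here is a minimal Python sketch using sqlite3. The Session class and its get_customer method are hypothetical, not the API of NHibernate or Massive. Without an identity map, two loads of the same row produce two independent objects that can silently disagree; with one, the session hands back the same instance for the same primary key. In a full OR/M the identity map is part of the unit of work.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    id: int
    name: str

def load_without_identity_map(conn, customer_id):
    # Thin-layer style: every call builds a fresh, independent object.
    row = conn.execute("SELECT id, name FROM customers WHERE id = ?", (customer_id,)).fetchone()
    return Customer(*row)

class Session:
    """Hypothetical minimal identity map: one in-memory object per database row."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # (type, primary key) -> already-loaded object

    def get_customer(self, customer_id):
        key = (Customer, customer_id)
        if key not in self.identity_map:
            row = self.conn.execute(
                "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
            ).fetchone()
            self.identity_map[key] = Customer(*row)
        return self.identity_map[key]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")

a = load_without_identity_map(conn, 1)
b = load_without_identity_map(conn, 1)
a.name = "Acme Ltd"
print(a is b, b.name)  # False Acme  (two aliases of one row now disagree)

session = Session(conn)
c = session.get_customer(1)
d = session.get_customer(1)
c.name = "Acme Ltd"
print(c is d, d.name)  # True Acme Ltd
```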
From my perspective, that means that if you are using Relational Databases for OLTP, you are most likely best served with an OR/M. Now, if you want to move away from Relational Databases for OLTP, I would be quite happy to agree with you that this is the right move to make.
Comments
Actually, SQL is bad... if it weren't, people wouldn't use NHibernate (or other OR/Ms). The real problem is the lack of a "true" object database with a new object query language, a LINQ provider, stored procedures written in a language like C#, etc.
You are right that the author makes some good points, but some of his criticisms are only skin deep, especially the one about getting the whole object when you only need three columns. Most OR/Ms support that today.
On the surface it seems like he's making good points, but after re-reading it, I realize it is just the criticism of someone who hasn't learned the details of a new technology and has held onto the ghosts of the past.
I'd sum up his article as "Hey, that supermodel has a wart, gross! I'm staying single for now." Well, when you progress to something new, the road isn't always smooth and hassle-free.
Good article. Raises a lot of interesting points. I would have to agree that ORMs are not the solution to the object-relational mismatch and in most cases add more complexity than they remove. They are generic solutions that people try to use in very specific ways, which almost always leads to pain.
A generic tool will never be as simple or as fit for purpose as something built for the job. The other issue with tools like NHibernate is that they try to cater for all the edge cases and, as a result, are very complicated beasts to master. So in the end, do they really save you any time?
The points about HQL are also interesting. I've always found HQL querying a bit convoluted as it mixes object and relational concepts to form a bastardisation of SQL.
Just my 2 cents, I think we are far from solving the data persistence problem.
The problem with ORM is that you really need to know and understand what is going on under the hood in order to avoid shooting yourself in the foot, e.g. with SELECT N+1 problems.
Unfortunately the goal of ORM is to abstract away all the SQL so you can just concentrate on your domain model and business logic without having to worry about it. So the requirement to know and understand everything that is going on under the hood compromises the purpose of ORM in the first place.
Having said that, once you understand what is going on, and what you can and can't do, ORM is pretty useful.
So I wouldn't describe it as an "antipattern," so much as a "leaky abstraction."
There is also the question of whether object databases are better. They exist: DB4O. Unfortunately, performance destroys the dream of transparency, because you still need efficient queries. You just cannot traverse the object graph naturally, the way you would on an in-memory model. You need to load eagerly.
@James - I agree you need to understand, to a certain extent, what is going on under the hood. Too many people blindly use ORMs and never even look at the generated SQL; they assume that because they get the data, it works.
Example: http://www.philliphaydon.com/2011/08/nhibernate-work-around-is-not-really-a-work-around/
LINQ in NHibernate is a joke, even the latest and greatest implementation.
For a start, I don't believe the author is correct in their opinion that ORMs are meant to completely abstract away the relational side of the mapping and act as if you are just dealing with objects. I think the clue is in the name.
I've found that the problems a lot of users seem to have when dealing with ORMs fall on the read model side (especially when using a LINQ provider) rather than the write model side. For the read-side I just tend to use something like Dapper in addition to NHibernate where necessary.
For the write model side I find automated dirty-tracking, cascading, transactional write-behind and identity map management increase my productivity and my code's readability immensely.
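A rough Python sketch of two of the write-side features just listed, automated dirty-tracking and transactional write-behind, using sqlite3. The UnitOfWork class and its load/commit methods are hypothetical and far simpler than what NHibernate does: each loaded entity is snapshotted, nothing is written until commit, and commit issues one UPDATE per changed entity, covering only the changed columns, inside a single transaction.

```python
import sqlite3

class UnitOfWork:
    """Hypothetical sketch: snapshot rows on load, flush only changed fields on commit."""

    def __init__(self, conn):
        self.conn = conn
        self.loaded = {}     # id -> dict the application works with
        self.snapshots = {}  # id -> copy of the values as originally loaded

    def load(self, product_id):
        row = self.conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        entity = {"id": row[0], "name": row[1], "price": row[2]}
        self.loaded[product_id] = entity
        self.snapshots[product_id] = dict(entity)
        return entity

    def commit(self):
        executed = []
        with self.conn:  # one transaction: commits on success, rolls back on error
            for pid, entity in self.loaded.items():
                changed = {k: v for k, v in entity.items() if v != self.snapshots[pid][k]}
                if not changed:
                    continue  # dirty tracking: untouched entities cost no UPDATE
                # Column names come from our own loaded keys, never from user input.
                assignments = ", ".join(f"{col} = ?" for col in changed)
                sql = f"UPDATE products SET {assignments} WHERE id = ?"
                self.conn.execute(sql, (*changed.values(), pid))
                executed.append(sql)
                self.snapshots[pid] = dict(entity)
        return executed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(1, "Widget", 9.5), (2, "Gadget", 20.0)])

uow = UnitOfWork(conn)
widget = uow.load(1)
gadget = uow.load(2)       # loaded but never modified
widget["price"] = 11.0
executed = uow.commit()
print(executed)  # ['UPDATE products SET price = ? WHERE id = ?']
```

Writing this by hand for every entity type is the "rewriting significant portions of an ORM anyway" the comment describes.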
I'm sure these things are possible using raw SQL or stored procedures while still using a relational database but I get the feeling I'd be rewriting significant portions of an ORM anyway.
ORMs just add another layer of unnecessary abstraction and confusion and do nothing for large enterprise applications except get in the way. They are a tool to help non-SQL developers work against a database... just learn your database and SQL.
@shep, learning SQL and your database does not necessarily solve the problems being discussed. At some point you need to do object-relational mapping, and I believe that with these tools, many people have thought hard about this problem and about the best time to do these things. When I started developing systems, for example, nobody in the company where I worked thought about the best time to load an object (eager or lazy loading) or to map it (our Data Access Objects did the mapping). Often, non-object-oriented solutions were used for real problems that these tools try to solve, when a pure OO solution could have been used.
Actually, you do it without thinking about it. You load the data from the database at the point you need it, thereby only using resources at that moment. You also map that data at that point in time and only keep it as long as needed for edit/update/removal. That, I think, is "lazy" loading: only load the data when you need it. Developers do it daily without thinking about it, even when not using an ORM tool.
My point is that these ORM tools add another layer of complexity and of understanding that, to me anyhow, is unnecessary, as well as an unnecessary drain on resources.
Every additional layer of processing degrades performance in an application. Not to mention that you now have to be versed in yet another tool so as not to sink your application if you do things poorly.
Sometimes there is something to be said for simplicity. :-)
@shep - it depends on the kind of application you have to build. For a data-intensive application with over 100 tables, it's a pain in the ass to manually write all the SQL just to perform data retrieval and storage, or to manually write hundreds of stored procedures, one for each different query you need.
Adding to this, an O/RM eases refactoring when a column or table name changes, provides strongly-typed query capabilities (a misspelled column name in hand-written SQL is usually discovered only at runtime) and, to some extent, hides the differences between various database servers (in some rare cases an application has to support different database servers, and without an O/RM this is very hard to accomplish due to differences in SQL syntax between vendors).
Sure, somebody, given enough time, can build a custom persistence solution optimized for the problem at hand (perhaps using code generation), but most companies don't have that option and have to use a pre-built framework.
Hey Oren - with regards to Massive there is less chance of N+1 since the objects returned are purely dynamic - ExpandoObjects to be specific. So you can't loop and execute a query.
What this means is there's a tradeoff - you have to explicitly ask for the data you want, when you want it. Some might not like that in concept, but I find that in practice it works out just fine.
In terms of "Unit of Work" - indeed there is one. It's a literal transaction that is invoked when required. If you shove a bunch of objects into a Save method ... well it's a transaction.
In terms of aliasing... you know how to write SQL yes?
Rob, think about the Order->Order Lines->Products scenario. I can just about guarantee you that no matter what data access method you use, the first implementation of the Order page will have at least one SELECT N+1 in it.
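The Order->Order Lines->Products case can be sketched in Python with sqlite3 to make the query counts visible. The schema and the run helper, which simply counts round trips in place of an SQL log, are assumptions for illustration. The lazy version issues one query for the lines plus one per line for its product; the eager version does the same work in a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, product_id INTEGER, qty INTEGER);
    INSERT INTO orders (id) VALUES (1);
    INSERT INTO products (id, name) VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO order_lines (order_id, product_id, qty) VALUES (1, 1, 2), (1, 2, 1), (1, 3, 5);
""")

query_count = 0

def run(sql, params=()):
    """Execute a query and count round trips (a stand-in for watching the SQL log)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def order_page_n_plus_1(order_id):
    # 1 query for the lines, then 1 more per line to "lazy load" its product.
    lines = run("SELECT product_id, qty FROM order_lines WHERE order_id = ? ORDER BY id", (order_id,))
    return [(run("SELECT name FROM products WHERE id = ?", (pid,))[0][0], qty) for pid, qty in lines]

def order_page_eager(order_id):
    # Eager load: a single join brings the lines and their products back together.
    return run(
        "SELECT p.name, l.qty FROM order_lines l JOIN products p ON p.id = l.product_id "
        "WHERE l.order_id = ? ORDER BY l.id",
        (order_id,),
    )

query_count = 0
lazy = order_page_n_plus_1(1)
n_plus_1_queries = query_count   # 4: 1 for the lines + 3 for the products

query_count = 0
eager = order_page_eager(1)
eager_queries = query_count      # 1
```

Both functions return the same rows; only the number of round trips differs, and with N lines the lazy version costs N + 1.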
Yes, SELECT N+1 is sometimes a problem, but my experience is that in most OLTP scenarios you can quite happily live with it. First of all, we are usually talking about OLTP systems for medium-size companies with a few hundred workers and a reasonable number of customers (if you're Amazon, you probably won't be using an RDBMS with an ORM). In a typical OLTP system most of the data is 'dead' (historical) and only a small portion of it is alive: the data users are working on. A medium-sized company will have no more than a few thousand active processes (orders, shipments or whatever), and all of this data will easily fit in the ORM's L2 cache. The GUI will usually page lists, so even if you have a SELECT N+1 query, your N will be no more than 50; in the worst case you'll execute no more than 50 SQL queries per page of data displayed. With the L2 cache enabled this number will be much smaller, because the data that must be lazily fetched from the database is very likely to be in the L2 cache already. Of course, this still assumes that users are working only on the live data.
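The arithmetic in this comment can be checked with a small Python sketch. A plain dict stands in for the ORM's second-level cache; it is not how NHibernate's L2 cache is actually implemented. The page size of 50 and the assumption that the page only touches 10 distinct "live" products are illustrative numbers, not measurements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(i, f"product {i}") for i in range(1, 11)])

db_hits = 0
second_level_cache = {}  # hypothetical process-wide cache: product id -> name

def load_product(product_id, use_cache):
    """Lazy-load one product, optionally checking the cache before the database."""
    global db_hits
    if use_cache and product_id in second_level_cache:
        return second_level_cache[product_id]
    db_hits += 1
    name = conn.execute("SELECT name FROM products WHERE id = ?", (product_id,)).fetchone()[0]
    if use_cache:
        second_level_cache[product_id] = name
    return name

# One page of 50 order lines that reference only 10 distinct live products.
page = [(line % 10) + 1 for line in range(50)]

for pid in page:
    load_product(pid, use_cache=False)
uncached_hits = db_hits   # 50: one lazy load per line on the page

db_hits = 0
for pid in page:
    load_product(pid, use_cache=True)
cached_hits = db_hits     # 10: only the first load of each product reaches the database
```

The N in N+1 stays bounded by the page size, and the cache shrinks it further as long as the working set of live data stays small.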
This topic comes popping up every other year...
ORM is bad because OO doesn't match RDBMS features; the new aspect of this article is that NoSQL, instead of OODBMS, is presented as the alternative.
Listen, the problem with an OODBMS is that it isn't an RDBMS, and NoSQL has the same problem. I can't remember the last time the functional requirements actually mandated an RDBMS or its features; it's just that at some point a decision maker simply states: "You won't store data anywhere other than SQL Server, right? We have DBAs who ensure your data is safe there..."
Back to ORM: Hand-written SQL might be faster, and SP even more, but I have a project to finish. And if I need to answer a special question with complicated joins etc., I create a view and map it to a (view) model class.
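A minimal Python sketch of the "create a view and map it to a (view) model class" approach, using sqlite3. The schema, the order_summaries view and the OrderSummary class are all hypothetical. The complicated join and aggregation live in the database as a view, and the application reads it like any other table into a read-only model.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderSummary:
    """Read-only view model: one instance per row of the order_summaries view."""
    order_id: int
    customer: str
    line_count: int
    total: float

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, qty INTEGER, unit_price REAL);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1);
    INSERT INTO order_lines VALUES (10, 2, 5.0), (10, 1, 7.5);

    -- The complicated join lives in the database as a view...
    CREATE VIEW order_summaries AS
    SELECT o.id AS order_id, c.name AS customer,
           COUNT(l.order_id) AS line_count, SUM(l.qty * l.unit_price) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN order_lines l ON l.order_id = o.id
    GROUP BY o.id, c.name;
""")

def load_summaries(conn):
    # ...and the application maps it exactly like a table.
    rows = conn.execute(
        "SELECT order_id, customer, line_count, total FROM order_summaries ORDER BY order_id"
    )
    return [OrderSummary(*row) for row in rows]

summaries = load_summaries(conn)
print(summaries)  # [OrderSummary(order_id=10, customer='Acme', line_count=2, total=17.5)]
```

With an OR/M the same idea usually means mapping the view as a read-only entity, so the write model is left untouched.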
Oren, why did you feed this obvious troll post from Seldo? :)
One thing that still annoys me about OR/M is that it tightly couples the DB schema to the application tier.
Sprocs at least provide a facade pattern rather than a fine grained interface.
@David - the 'mapper' part of an O/RM, if used properly, allows the DB schema and the domain model to be decoupled.
Stored procedures are needed sometimes (very complex queries, performance reasons etc.), but most of the time just add to the complexity: somebody has to write or generate them, they make refactoring harder (due to the lack of refactoring tools in the db-side, at least until recently), no compile-time checking, procedural style of programming etc..
@Tudor - There are still quite a few cases where you need an extra step to go from a DB schema to the domain model, even when using an ORM.
For example, take the StackOverflow question -> tags relationship. In the DB model, tags are not a separate entity but just a string field delimited by some character (I forget which exactly). However, it's probably best to represent tags in the domain model as an IList<string> property on the domain class.
You would still need a data transfer object in order to decouple the DB schema from the domain model in cases like this.
Matthew, only if you are working with a very bad OR/M. With NHibernate, for example, you can do that directly inside the ORM.
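In NHibernate this kind of column-to-collection conversion is usually written as a custom user type (IUserType). The Python sketch below does not use that API; it only illustrates the idea with two hypothetical conversion functions applied inside hand-written save/load code, so the domain class sees a list while the table stores a single delimited string. The "|" delimiter is an assumption, since the comment above doesn't say which character is used.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List

TAG_DELIMITER = "|"  # hypothetical delimiter, chosen for illustration only

def tags_to_column(tags: List[str]) -> str:
    """Domain -> database: ['orm', 'nhibernate'] becomes 'orm|nhibernate'."""
    return TAG_DELIMITER.join(tags)

def tags_from_column(value: str) -> List[str]:
    """Database -> domain: split the stored string back into a list ('' becomes [])."""
    return value.split(TAG_DELIMITER) if value else []

@dataclass
class Question:
    id: int
    title: str
    tags: List[str] = field(default_factory=list)

def save(conn, question):
    conn.execute(
        "INSERT OR REPLACE INTO questions (id, title, tags) VALUES (?, ?, ?)",
        (question.id, question.title, tags_to_column(question.tags)),
    )

def load(conn, question_id):
    row = conn.execute(
        "SELECT id, title, tags FROM questions WHERE id = ?", (question_id,)
    ).fetchone()
    return Question(row[0], row[1], tags_from_column(row[2]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
save(conn, Question(1, "Is OR/M an anti pattern?", ["orm", "nhibernate"]))

loaded = load(conn, 1)
stored = conn.execute("SELECT tags FROM questions WHERE id = 1").fetchone()[0]
print(loaded.tags, stored)  # ['orm', 'nhibernate'] orm|nhibernate
```

When the conversion lives in the mapping layer, no separate data transfer object is needed, which is the point being made in reply to Matthew.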
Well I'm using EF4 and I haven't found a good way to do that (though I'm not impressed with EF4 for various other reasons).
@Matthew Why do you have this "relationship" anyway? It violates 1NF. Performance?
This is usually done for performance reasons. There's no reason to use a separate table for these kinds of tags if they usually need to be retrieved along with the parent entity. Unless the number of tags per parent is massive, it can easily be cheaper to return a single table than to do one or more joins (more if you use a many-to-many tag system) for data that is usually just one text word with no additional data.
"Many of the problems it points out are real. They can be mitigated quite nicely by someone who knows what they are doing, but that is beside the point."
No, it's not. I don't like this endless discussion about ORM and the argument about what developers need to know to prevent problems. Granted, I don't code very often anymore, but coding is just using one huge abstraction after another. Whether it's knowing about HTTP verbs or C#'s yield behavior and IEnumerable, understanding lazy loading, SELECT N+1 and session-per-request, or knowing the cost of reflection vs. inspecting an expression tree, there is always something underneath you need to understand.
The argument is lame, development is hard, I get paid good money by enterprise companies. I fuck up regularly (ORM got in my way), usually not too bad. We offer support and everyone is happy.
Or development is simple, you get paid less, or the same but you are much faster to market. You still fuck up (wish you had an ORM), usually not too bad. You offer support and everyone is happy.
I don't get it: why do we discuss the same issue with so few original arguments? It's endless; the Vietnam article said it all. I'm tired, I shouldn't even have commented.
Oh well there is always BLToolkit :)
Also a very thin, non-leaky, no-learning-curve abstraction over SQL, but with a very good LINQ provider, which means no strings. So it's more productive than a micro-ORM.
Anyway a short and funny video about ORMs comparing them to cake-mixes :)
http://vimeo.com/28885655