25 Reasons Not To Write Your Own Object Relational Mapper
Not long ago I was asked to evaluate what features are needed to build a robust data access layer. The choices were:
Build / Code Gen a DAL
Use an existing Object Relational Mapper
Build an Object Relational Mapper specific to this application.
DataSets were not appropriate for this application, so the first and last choices were essentially the same. The database schema is not one that allows code generation of a DAL, so it came down to building a DAL or using an existing one. I recommended that the application use NHibernate (of course), but the initial thought was to build a simple DAL on top of stored procedures, and manually unpack the result sets into collections of objects.
This is mostly really simple code, and it can be code-generated fairly easily. The problem with this approach is that for a complex DAL (as in this case) you aren't really going to be able to handle all the functionality that most applications will eventually need. It is highly procedural in nature, and requires you to track a lot of things in the application code.
I sat down and compiled a list of features that I made use of in a recent project, which used NHibernate for data access. Some of them are basic O/RM features, some are convenience features, some are advanced stuff that I'm using. Some of those features overlap, some require other features to be present, some are a stepping stone to interesting things.
This list is based on NHibernate, and it is not complete in any sense. I'm probably using some other stuff that I'm not thinking about right now. There are fully featured O/RMs out there that don't support all of these features, and there are some really nice features that NHibernate does not support.
This also assumes a Unit Of Work pattern that drives the O/RM (NHibernate, DLinq, etc.). There are other ways to handle this (Active Record, Disconnected Objects, etc.), but I find that I really like the Unit Of Work style, since it frees me from manually tracking what changed and what didn't. It also maps closely to the way that web applications work (a web request is a unit of work), and can be adapted to a windows application pattern using real units of work.
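To make the idea concrete, here is a minimal sketch of the pattern in Python. The names are illustrative only and have nothing to do with NHibernate's actual API; the point is that the unit of work snapshots each attached object and works out what changed at commit time.

```python
# Minimal Unit of Work sketch: the unit of work snapshots each attached
# object's state and finds what changed at commit, so application code
# never marks objects dirty by hand. All names here are illustrative.

class Person:
    def __init__(self, name):
        self.name = name

class UnitOfWork:
    def __init__(self):
        self._tracked = []            # (object, snapshot of state at attach)

    def attach(self, obj):
        self._tracked.append((obj, dict(vars(obj))))

    def commit(self):
        # A real O/RM would emit UPDATE statements here; this sketch just
        # reports which attached objects were modified since attach().
        return [obj for obj, snap in self._tracked if vars(obj) != snap]

uow = UnitOfWork()
p = Person("alice")
uow.attach(p)
p.name = "bob"                        # no manual "mark dirty" call needed
assert uow.commit() == [p]
```

The key property is the last line: the object was modified through plain property access, and the unit of work still knew about it.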
In no particular order, here they are:
Transactions:
Mandatory for doing any work at all. Needed to support updates across several tables / rows / objects in a consistent way, and to handle several users working on the same set of objects.
Concurrency Control:
Mandatory for doing any work at all.
Optimistic Concurrency:
Notification (usually an exception) when saving a set of changes that has been changed by a different user since the data was last read.
I prefer an implementation that relies on a field in the object that is used to version the changes. I would use a TIMESTAMP column on SQL Server, and try to use something similar when I'm using other databases.
Another way to do this is to check all the columns in the table to see if they have changed. This is not as good, since it may cause granularity problems (DateTime sometimes does not round-trip well, for instance), and it has performance implications as well. One thing to note if this approach is used is null handling. It also requires maintaining the original state of the object.
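Here is a sketch of the version-field approach, using SQLite for brevity. The table and column names are made up; NHibernate generates the equivalent UPDATE from a version mapping.

```python
import sqlite3

# Optimistic concurrency via a version column: the UPDATE succeeds only
# if the row still carries the version we originally read. Table and
# column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'alice', 1)")

def save(conn, pk, name, read_version):
    cur = conn.execute(
        "UPDATE person SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (name, pk, read_version))
    if cur.rowcount == 0:             # somebody else updated the row first
        raise RuntimeError("stale data: row changed since it was read")

save(conn, 1, "bob", read_version=1)       # first writer wins
try:
    save(conn, 1, "carol", read_version=1)  # stale: the version is now 2
except RuntimeError:
    pass                                    # this is the concurrency violation
```

The second writer read version 1, but the row is already at version 2, so the UPDATE matches zero rows and the conflict surfaces as an exception.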
Built-in Caching (per unit of work & per application):
Caching greatly improves the performance of an application. Per unit of work caching means that if a single request needs a certain piece of data twice in the same unit of work, the framework is smart enough to serve the data from memory instead of hitting the database again.
Per application caching means that if the same piece of data is requested from two different units of work, only the first request hits the database; the second one is served from the cache. Ideally this should be transparent to the code. The application cache should be smart enough to handle updates / deletes / inserts without the application code needing to handle them explicitly.
Maintaining Object Identity:
Requesting a certain object twice in the same unit of work returns the same object instance. This is important since it means that any changes already made to this object in this unit of work are retained. Otherwise you may get concurrency violations within a single unit of work.
A really nice side effect of this is that you can perform object equality tests using normal OO semantics. Otherwise you would need to compare PK and maybe versioning fields.
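A minimal identity map looks something like this. The load_row function stands in for the actual database read; all names are illustrative.

```python
# Identity map sketch: the session hands out the same instance for the
# same primary key, so in-session changes are retained and identity
# comparison works. load_row stands in for a database read.

class Person:
    def __init__(self, pk, name):
        self.pk, self.name = pk, name

class Session:
    def __init__(self, load_row):
        self._load_row = load_row
        self._identity_map = {}       # primary key -> already-loaded instance

    def get(self, pk):
        if pk not in self._identity_map:
            self._identity_map[pk] = self._load_row(pk)
        return self._identity_map[pk]

session = Session(lambda pk: Person(pk, "alice"))
a = session.get(1)
a.name = "bob"
b = session.get(1)                    # second request: same instance, no reload
assert a is b and b.name == "bob"
```

Without the map, the second get would hand back a fresh instance still named "alice", and saving both would produce exactly the in-session concurrency violation described above.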
Cache Invalidation Policies:
Assuming that per application caching is enabled, you need a way to invalidate the cache when the database is changed by another application.
Querying by Criteria:
Allows querying for a single object or a set of objects based on their properties, preferably with an object API. This is needed to support such things as search forms, which are notoriously hard to do otherwise. It is also very useful when you want to do reporting and / or some smart data manipulation.
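The core of such an object API is building the WHERE clause from property constraints instead of concatenating SQL by hand. A toy sketch, not NHibernate's ICriteria API:

```python
# Criteria-query sketch: turn property constraints into a parameterized
# WHERE clause. Real O/RM criteria APIs also handle operators, joins,
# and mapping property names to columns; this shows only the core idea.

def build_query(table, **constraints):
    clauses = " AND ".join(f"{col} = ?" for col in constraints)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += f" WHERE {clauses}"
    return sql, tuple(constraints.values())

sql, params = build_query("person", name="alice", city="Oslo")
assert sql == "SELECT * FROM person WHERE name = ? AND city = ?"
assert params == ("alice", "Oslo")
```

This is exactly the shape a search form needs: whatever fields the user filled in become constraints, and the ones left blank simply never appear in the query.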
Associations:
The data in the database is not limited to a single row. There are associations between the different tables, and those associations should translate to the object level as well. Preferably, you will want support for generics, and for sets and maps at a minimum.
The associations that are needed are:
One to Many - Person ->> Addresses
Many to Many - Users <<-->> Roles
Many to One - Address -> Person
(Variant of many to one) Many to Any - Rule -> IHasRules
(Variant of one to many) Any to Many - IHasRules ->> Rules
One to One - Person <-> Consort
Batching:
This is very important for increasing throughput. If you can send several statements to the database in one go, it will significantly reduce the number of network round trips that you have.
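For illustration, sqlite3's executemany plays the same role here that batched ADO.NET commands play for an O/RM: many rows, one call.

```python
import sqlite3

# Batching sketch: send many INSERTs in one call instead of one
# statement per row. The table is made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (person_id INTEGER, city TEXT)")

rows = [(1, "Oslo"), (1, "Bergen"), (2, "Paris")]
conn.executemany("INSERT INTO address VALUES (?, ?)", rows)   # one batch

count = conn.execute("SELECT count(*) FROM address").fetchone()[0]
assert count == 3
```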
Polymorphic Queries:
Query for all objects of a type and its sub-types. For instance, get all the rules for an object, where each rule is part of an object hierarchy.
Change Tracking:
Using a Unit Of Work pattern means that I don't need to track my modified objects. If they are associated with the current unit of work (retrieved from it, or attached to it), the unit of work tracks them and saves them when I decide to commit the current unit of work.
Undo:
Sets the state of an object back to its state in the database.
Lazy Loading:
Mandatory if you are using associations. Loads only the current object, without its collections, and loads related objects the first time the association is accessed.
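The usual trick is a stand-in collection that only hits the database on first access. A sketch, where the loader function is a hypothetical stand-in for the real query:

```python
# Lazy loading sketch: the collection is a proxy that triggers the
# "database" load (here a plain function) the first time it is
# iterated, and never again afterwards.

class LazyCollection:
    def __init__(self, loader):
        self._loader, self._items = loader, None

    def __iter__(self):
        if self._items is None:       # first access: load now
            self._items = self._loader()
        return iter(self._items)

loads = []
def load_addresses():
    loads.append("hit")               # record the database round trip
    return ["Oslo", "Bergen"]

addresses = LazyCollection(load_addresses)
assert loads == []                    # creating the proxy costs nothing
assert list(addresses) == ["Oslo", "Bergen"]
list(addresses)                       # second iteration: no new round trip
assert loads == ["hit"]
```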
Flexible Eager Fetch:
Allows selectively overriding the lazy loading behavior when needed. Solves the Select N+1 problem.
Cascading Updates / Deletes / Inserts:
A change to a parent object means a change to its child objects. The change can be a delete (which can also be handled by a cascading delete in the DB) or an update (which can be handled by a trigger in the DB), but it should also be possible to follow the association paths to find new objects that were added to an existing object, and insert them into the database.
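A sketch of the insert side of this: walk the association paths and save anything not already known to the database. The children attribute and the saved set are illustrative stand-ins for real mapping metadata.

```python
# Cascade sketch: saving a parent follows its associations and saves
# any object not already persisted. "saved" plays the role of the
# database's knowledge of existing rows.

class Node:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def cascade_save(obj, saved, save):
    if obj not in saved:
        save(obj)                     # an INSERT in a real O/RM
        saved.add(obj)
    for child in obj.children:        # follow the association path
        cascade_save(child, saved, save)

inserted = []
existing = Node("existing")           # already in the database
root = Node("parent", [existing, Node("new-child")])
cascade_save(root, {existing}, lambda o: inserted.append(o.name))
assert inserted == ["parent", "new-child"]   # existing child untouched
```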
Debuggability:
When (not if) something goes wrong, there should be a way to follow what is going on until I find out what happened. This precludes certain types of smart-ass tricks. (Ever had the pleasure of debugging run-time generated code?)
This can be provided by logs as well, and is usually not an issue. I mention it because I have seen some fairly hair-raising implementations that I wouldn't wish to debug.
Safe For Multithreading:
The framework should be usable from multiple threads. How this is done is not really important, but it means that it cannot rely on thread affinity, since ASP.Net can and will move you from one thread to another if it feels like it.
Life Cycle Events:
Allows an object to take action when certain operations (usually Create / Update / Delete / Load) happen. This is very useful for fetching data from other locations as well (Active Directory, a Web Service, etc.).
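A sketch of the mechanism: the session keeps a registry of callbacks per event and fires them around each operation. The event names are illustrative, not NHibernate's interceptor API.

```python
# Life cycle event sketch: callbacks registered per event name are
# invoked when the session performs the matching operation.

class Session:
    def __init__(self):
        self._listeners = {}          # event name -> list of callbacks

    def on(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def fire(self, event, obj):
        for callback in self._listeners.get(event, []):
            callback(obj)

log = []
session = Session()
session.on("load", lambda obj: log.append(f"loaded {obj}"))
session.on("delete", lambda obj: log.append(f"deleted {obj}"))
session.fire("load", "person#1")      # e.g. enrich from Active Directory here
session.fire("delete", "person#1")
assert log == ["loaded person#1", "deleted person#1"]
```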
Exception Policy:
A well defined policy for what happens when an exception occurs. (Which exceptions are considered fatal, for instance? What should happen if exception Foo occurs? etc.)
Loading data without loading the object:
Grab a few properties of an object without loading the whole object. This is very useful if you want to load just the necessary data for a list box, where you usually want just the PK and some sort of description.
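In SQL terms this is just a projection. A sketch with SQLite (the table layout is made up):

```python
import sqlite3

# Projection sketch: fetch only the primary key and a display column
# for a list box, instead of hydrating full objects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'alice', 'a very long biography...')")
conn.execute("INSERT INTO person VALUES (2, 'bob', 'another long biography...')")

# Only two columns travel over the wire; the bio stays in the database.
items = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
assert items == [(1, "alice"), (2, "bob")]
```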
Composite Primary Keys:
They are not something that I like, but they are needed in many cases.
Create, Update, Delete, Load:
Not much to say about this, is there?
Insert Ordering:
If I insert two objects, and one of them has a foreign key to the other, they should be inserted in the correct order to avoid errors.
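This amounts to a topological sort over the foreign-key dependencies. A sketch, where the depends_on attribute is a made-up stand-in for the FK metadata a mapper would hold:

```python
# Insert-ordering sketch: visit each object's dependencies (the rows it
# references via foreign keys) before the object itself, so referenced
# rows are always inserted first.

class Row:
    def __init__(self, name, depends_on=()):
        self.name, self.depends_on = name, list(depends_on)

def insert_order(objects):
    ordered, seen = [], set()
    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for dep in obj.depends_on:    # insert what we reference first
            visit(dep)
        ordered.append(obj)
    for obj in objects:
        visit(obj)
    return ordered

person = Row("person")
address = Row("address", depends_on=[person])   # address has an FK to person
assert [r.name for r in insert_order([address, person])] == ["person", "address"]
```

Note that the result is correct even though address was handed in first; the order the application created the objects in no longer matters.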
Paging:
You are going to need it, and I would rather have it at the database level than on the application side.
Custom Types Support:
It should be possible to map more than just the basic types. The simplest example is mapping an XML column to an XmlDocument.
Aggregation:
Allows executing queries that retrieve such things as count(), sum(), avg(), etc. These are needed frequently for a lot of things, and it shouldn't be hard to do them.
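A sketch of pushing the aggregation to the database instead of loading every object (SQLite, with a made-up table):

```python
import sqlite3

# Aggregation sketch: count / sum / avg computed by the database in one
# query, rather than by hydrating every row into an object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(10.0,), (20.0,), (30.0,)])

count, total, avg = conn.execute(
    "SELECT count(*), sum(total), avg(total) FROM orders").fetchone()
assert (count, total, avg) == (3, 60.0, 20.0)
```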
All of this points to the fact that building a robust data access layer is not something that can be done easily. It requires quite a bit of thinking, and I don't think that any of the features I mentioned here are trivial. Getting them all into a coherent package is going to be hard. It is also not something that you can delegate to a junior programmer, in my opinion.
There are too many places where you need to know what you are doing. Again, this is just a partial list of features that I used in a recent project. Some of them are deal breakers (transactions, concurrency, paging, aggregation), some are simply convenient (undo, loading an object's properties), and some are required for performance (lazy loading, caching).
If you plan to use the Unit Of Work pattern (most O/RMs do, but not all), take a look at this list and consider what you think you'll need for the application, and then consider whether you want to do it yourself or use an existing tool that can do it for you. Personally, I really hate duplicating existing functionality for no reason except NIH Syndrome.
I heard something that I really liked in DotNetRocks a while ago that is somewhat related: