Soft Deletes aren’t an Append Only Model
There seems to be some confusion regarding my post about soft deletes; in particular, people brought up the idea of append only models.
I had the chance to work on both types of systems, and I can tell you that I would much rather work with an append only model than with soft deletes. An append only model means that you can only ever insert, never delete or update.
Think about the way your bank account works. If I had the clerk transfer money from one account to another, and he made a typo and sent ten times the amount that I wanted, the bank would not “delete” the transaction. Instead, it would record a separate transaction canceling the first one.
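To make the bank analogy concrete, here is a minimal Python sketch of an append only ledger. The `Ledger`, `Entry`, and `reverse` names, and storing amounts as integer cents, are assumptions for illustration, not a real banking API. A mistake is fixed by appending a compensating entry that points at the one it cancels, never by deleting the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One immutable ledger row (hypothetical); never updated or deleted once appended."""
    entry_id: int
    account: str
    amount: int                      # assumed: integer cents, positive = credit
    reverses: Optional[int] = None   # id of the entry this one cancels, if any


class Ledger:
    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, account: str, amount: int, reverses: Optional[int] = None) -> Entry:
        entry = Entry(len(self._entries) + 1, account, amount, reverses)
        self._entries.append(entry)
        return entry

    def transfer(self, src: str, dst: str, amount: int) -> list[Entry]:
        return [self.append(src, -amount), self.append(dst, amount)]

    def reverse(self, entry_id: int) -> Entry:
        """Cancel a mistaken entry by appending its negation, not by deleting it."""
        original = self._entries[entry_id - 1]
        return self.append(original.account, -original.amount, reverses=entry_id)

    def balance(self, account: str) -> int:
        return sum(e.amount for e in self._entries if e.account == account)

    def history(self) -> tuple[Entry, ...]:
        return tuple(self._entries)


ledger = Ledger()
wrong = ledger.transfer("alice", "bob", 1000_00)  # typo: ten times the intended amount
for e in wrong:
    ledger.reverse(e.entry_id)                     # compensating entries
ledger.transfer("alice", "bob", 100_00)            # the transfer that was meant
print(ledger.balance("alice"), ledger.balance("bob"))  # -10000 10000
```

The mistaken transfer is still visible in `history()`, together with the entries that cancel it, which is exactly the audit trail the bank relies on.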
There are several reasons for going with this approach, which Jim has brought up in his post:
- automatic audit logging, since nothing is ever UPDATE'd or DELETE'd, you've got a constant trail of changes
- automatic support for infinite undo/roll-back support of data, as you simply load a prior version and then save as usual
- automatic support for labeling of versions, much like in source/version control systems, at an individual record level, table level, "aggregate root level", or database level
- automatic support for "back querying" a system, in search of what the situation looked like last month, last year, etc. (though raising this "aspect", as in AOP, to the ORM level would be crucial)
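A rough sketch of the undo and "back querying" points, assuming every save appends a new version stamped with the time it was saved. The `VersionedStore` class and its `save`/`load` methods are hypothetical, kept in memory for brevity: a point-in-time query just picks the highest version saved on or before a date, and an undo loads an old version and saves it again as a new one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    record_id: int
    version: int
    saved_at: datetime
    data: dict[str, Any]


class VersionedStore:
    """Hypothetical append-only store: save() always adds a row, never overwrites."""

    def __init__(self) -> None:
        self._rows: list[Version] = []

    def save(self, record_id: int, data: dict[str, Any], saved_at: datetime) -> Version:
        prior = [r.version for r in self._rows if r.record_id == record_id]
        row = Version(record_id, max(prior, default=0) + 1, saved_at, dict(data))
        self._rows.append(row)
        return row

    def load(self, record_id: int, as_of: Optional[datetime] = None,
             version: Optional[int] = None) -> Optional[Version]:
        """Latest version overall, as of a point in time, or a specific version."""
        candidates = [r for r in self._rows
                      if r.record_id == record_id
                      and (as_of is None or r.saved_at <= as_of)
                      and (version is None or r.version == version)]
        return max(candidates, key=lambda r: r.version, default=None)


store = VersionedStore()
store.save(1, {"name": "Acme", "city": "Oslo"}, datetime(2009, 1, 5))
store.save(1, {"name": "Acme", "city": "Bergen"}, datetime(2009, 3, 1))
print(store.load(1, as_of=datetime(2009, 2, 1)).data["city"])  # Oslo
# "Undo": load an old version and save it again as a brand new version.
store.save(1, store.load(1, version=1).data, datetime(2009, 3, 2))
print(store.load(1).version, store.load(1).data["city"])        # 3 Oslo
```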
As I said, this makes things much simpler in many respects. It does mean that you have a more complex data model (because every association now has to resolve to Id + max(version)), but that is manageable.
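Here is one way the Id + max(version) rule might look in SQL, using Python's built-in `sqlite3` module so it runs anywhere. The table and view names are hypothetical. The association (`orders.customer_id`) points at the logical id, and a view resolves it to the row with the highest version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_versions (        -- hypothetical append only table
    id      INTEGER NOT NULL,
    version INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    PRIMARY KEY (id, version)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL        -- logical id, not a specific version
);
-- Current state: for each id, the row with max(version).
CREATE VIEW current_customers AS
SELECT cv.id, cv.version, cv.name
FROM customer_versions AS cv
WHERE cv.version = (SELECT MAX(version) FROM customer_versions WHERE id = cv.id);
""")

conn.executemany("INSERT INTO customer_versions VALUES (?, ?, ?)",
                 [(1, 1, "Acme"), (1, 2, "Acme Ltd"), (2, 1, "Globex")])
conn.execute("INSERT INTO orders VALUES (100, 1)")

row = conn.execute("""
SELECT o.order_id, c.name, c.version
FROM orders AS o JOIN current_customers AS c ON c.id = o.customer_id
""").fetchone()
print(row)  # (100, 'Acme Ltd', 2)
```

Every read of an association pays for that extra "latest version" lookup, which is the added complexity mentioned above.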
But, as I said, there is a distinct difference between that and soft deletes. Soft deletes, as I use the term, refer to IsDeleted columns that perform a logical deletion in the database. I don’t really like those, and I explained my reasoning in my previous post.
Append Only models introduce some complexity in managing things, but in general, they force you to think in a very different fashion than CUD models. For one thing, you are almost always going to have a separate reporting model, instead of trying to query the append only model directly (which gets complicated).
There is one thing that I want to emphasize: using an Append Only model should be reflected in your API. Trying to abstract that away is going to lead to a world of pain.
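One way to see the difference, sketched with `sqlite3` and hypothetical table names: a soft delete is still an UPDATE in a CUD model, so a rename followed by a deletion overwrites the row in place, while the append only table records the creation, the rename, and the deletion as separate inserted facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Soft delete: one mutable row per entity, plus an IsDeleted flag.
CREATE TABLE soft_customers (
    id        INTEGER PRIMARY KEY,
    name      TEXT    NOT NULL,
    IsDeleted INTEGER NOT NULL DEFAULT 0
);
-- Append only: rows are only ever inserted; deletion is just another fact.
CREATE TABLE customer_events (
    id      INTEGER NOT NULL,
    version INTEGER NOT NULL,
    event   TEXT    NOT NULL,   -- hypothetical event names
    name    TEXT,
    PRIMARY KEY (id, version)
);
""")

# Soft delete: each change is an UPDATE, so the old name is lost.
conn.execute("INSERT INTO soft_customers (id, name) VALUES (1, 'Acme')")
conn.execute("UPDATE soft_customers SET name = 'Acme Ltd' WHERE id = 1")
conn.execute("UPDATE soft_customers SET IsDeleted = 1 WHERE id = 1")

# Append only: each change is an INSERT; nothing is overwritten.
conn.executemany("INSERT INTO customer_events VALUES (?, ?, ?, ?)",
                 [(1, 1, "created", "Acme"),
                  (1, 2, "renamed", "Acme Ltd"),
                  (1, 3, "deleted", None)])

print(conn.execute("SELECT * FROM soft_customers").fetchall())
# [(1, 'Acme Ltd', 1)]
print(conn.execute("SELECT version, event, name FROM customer_events ORDER BY version").fetchall())
# [(1, 'created', 'Acme'), (2, 'renamed', 'Acme Ltd'), (3, 'deleted', None)]
```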
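As an illustration of an API that admits it is append only, here is a hypothetical repository in Python. There is deliberately no `update()` or `delete()`: records are frozen dataclasses, and the only way to change something is `append_change()`, which returns a new version and leaves the earlier ones untouched. The names are made up for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    version: int
    name: str
    city: str


class CustomerHistory:
    """Hypothetical append-only repository: note there is no update() or delete()."""

    def __init__(self) -> None:
        self._versions: dict[int, list[CustomerVersion]] = {}

    def create(self, customer_id: int, name: str, city: str) -> CustomerVersion:
        if customer_id in self._versions:
            raise ValueError(f"customer {customer_id} already exists")
        first = CustomerVersion(customer_id, 1, name, city)
        self._versions[customer_id] = [first]
        return first

    def current(self, customer_id: int) -> CustomerVersion:
        return self._versions[customer_id][-1]

    def append_change(self, customer_id: int, **changes: str) -> CustomerVersion:
        """Record a change as a new version; earlier versions are never touched."""
        if "customer_id" in changes or "version" in changes:
            raise ValueError("identity and version are managed by the repository")
        latest = self.current(customer_id)
        new = replace(latest, version=latest.version + 1, **changes)
        self._versions[customer_id].append(new)
        return new

    def history(self, customer_id: int) -> tuple[CustomerVersion, ...]:
        return tuple(self._versions[customer_id])


repo = CustomerHistory()
repo.create(1, "Acme", "Oslo")
repo.append_change(1, city="Bergen")
print([v.city for v in repo.history(1)])  # ['Oslo', 'Bergen']
```

Because `CustomerVersion` is frozen, assigning to a field raises `dataclasses.FrozenInstanceError`, so callers cannot quietly fall back to update-in-place semantics.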
A great post as always :)
Sometimes an append only approach can't be avoided, especially if a financial solution is involved and each record in the database means real money, but as always, DB size and performance are my concerns with this kind of approach. I'd like to know if there are any guidelines, best practices, or advice on designing such databases.
As far as I can tell, Microsoft uses separate tables for processing instances and completed instances (in BizTalk, for example), although I'm not sure whether they use an append only approach.
How about storing only mutations? You would still only do inserts, but instead of the whole record, you would insert only what changed.
It's a paradox that a record is "more" deleted by editing out all its fields than by deleting it in an application that uses an isDeleted column.
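A minimal sketch of that idea, assuming records are plain dictionaries: each insert stores only the fields that changed, and the current state is rebuilt by replaying the deltas in order. The `diff` and `rebuild` helpers are hypothetical, and this version does not handle removed fields.

```python
from typing import Any


def diff(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields that are new or changed (removed keys are ignored here)."""
    return {k: v for k, v in new.items() if k not in old or old[k] != v}


def rebuild(deltas: list[dict[str, Any]]) -> dict[str, Any]:
    """Replay the stored deltas in insertion order to get the current state."""
    state: dict[str, Any] = {}
    for delta in deltas:
        state.update(delta)
    return state


v1 = {"name": "Acme", "city": "Oslo", "phone": "555-0100"}
v2 = {"name": "Acme", "city": "Bergen", "phone": "555-0100"}
deltas = [diff({}, v1), diff(v1, v2)]  # still only inserts, never updates
print(deltas[1])                       # {'city': 'Bergen'}
print(rebuild(deltas) == v2)           # True
```

The trade-off is that rows get smaller, but reading the current state means replaying every delta for the record (or keeping periodic full snapshots to replay from).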
Greg Young has a good talk about this:
It's very interesting stuff.
Taking CQS to the architecture level is something I find very interesting but also daunting. It would make code so much cleaner, but the level of infrastructure is rarely required for the simple things I do -- more than CRUD, less than DDD.
I've been working in this area for a while. I call this technique "historic modeling". I've written a set of rules, a walkthrough, and a library in support of this idea.
Since you brought it up, Oren, can you shed some light, from your experience, on how you approach the 'Append Only' model? And how do you use NHibernate to tackle it (if you use NHibernate at all)? You mentioned that reporting should be done differently; how should we do that without all the labor of ADO.NET / Stored Procedures / SQL ...
That Greg Young talk above talks about it some. Udi Dahan has a talk that also addresses this:
(I watched the NDC 2009 video, not sure if this is all the same stuff).
Anyway, the root of most of this is working with Domain Events. Moving to events allows you to have multiple handlers, each (potentially) with its own tailor-made model (domain, reporting, logging, etc.).
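A small in-process sketch of that idea in Python. The `EventBus`, the `OrderPlaced` event, and both handlers are hypothetical: every published event is appended to a log, and each subscribed handler maintains its own model shaped for its own purpose (a sales report and an audit trail here).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable

Handler = Callable[[Any], None]


@dataclass(frozen=True)
class OrderPlaced:  # hypothetical domain event
    order_id: int
    customer: str
    total: int


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)
        self.log: list[Any] = []  # append-only record of everything that happened

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Any) -> None:
        self.log.append(event)
        for handler in self._handlers[type(event)]:
            handler(event)


# Each handler owns a model tailored to its own job.
sales_by_customer: dict[str, int] = defaultdict(int)
audit_lines: list[str] = []


def update_sales_report(event: OrderPlaced) -> None:
    sales_by_customer[event.customer] += event.total


def write_audit(event: OrderPlaced) -> None:
    audit_lines.append(f"order {event.order_id} placed by {event.customer}")


bus = EventBus()
bus.subscribe(OrderPlaced, update_sales_report)
bus.subscribe(OrderPlaced, write_audit)
bus.publish(OrderPlaced(1, "acme", 100))
bus.publish(OrderPlaced(2, "acme", 200))
bus.publish(OrderPlaced(3, "globex", 50))
print(dict(sales_by_customer))  # {'acme': 300, 'globex': 50}
```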
Udi Dahan is a good source on this topic: www.udidahan.com/.../domain-events-salvation/
Immutable rows... I'm wondering what it would be like if such constraints could be modeled at the database level, and what kind of optimization could be applied to the database engine to favor last version reading for example, or efficient and transparent storage. Also wondering about whether the transaction log has aspects of a different implementation of something similar. Vague questions, I know, thinking out loud and way out of my area of expertise...
In the DB level, you can't really use constraints.
But something that would be easy to do is to have a separate reporting model (physical one) where you do updates.
You only read from it to show stuff.
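A minimal sketch of that split using `sqlite3`, with hypothetical table names: `entries` is the append only source of truth, and `balances` is a physical reporting table that is updated in place and only read for display. Updating both in the same transaction is an assumption of this sketch; the reporting side could just as well be updated asynchronously from events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (          -- append only: the source of truth
    entry_id INTEGER PRIMARY KEY,
    account  TEXT    NOT NULL,
    amount   INTEGER NOT NULL
);
CREATE TABLE balances (         -- reporting model: updated in place, only read
    account TEXT PRIMARY KEY,
    balance INTEGER NOT NULL
);
""")


def record(account: str, amount: int) -> None:
    """Append the fact and update the reporting row in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO entries (account, amount) VALUES (?, ?)",
                     (account, amount))
        conn.execute("INSERT OR IGNORE INTO balances (account, balance) VALUES (?, 0)",
                     (account,))
        conn.execute("UPDATE balances SET balance = balance + ? WHERE account = ?",
                     (amount, account))


record("alice", 100)
record("alice", -30)
record("bob", 30)
print(conn.execute("SELECT account, balance FROM balances ORDER BY account").fetchall())
# [('alice', 70), ('bob', 30)]
```

Since `balances` is derived data, it can always be thrown away and rebuilt with `SELECT account, SUM(amount) FROM entries GROUP BY account`.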
I'll have a post about it.
Append Only also allows different systems to collect information about the same entity.
This is the approach taken by openEHR (openehr.org), which allows any number of systems to collect information about a person without fear of conflicts when consolidating records. Not a standard SQL Data Model though.
Ah yes, Append-Only with NHibernate... This is something I'm tinkering with, but I have been running into a few hurdles. I'm trying to avoid having to do things like Evicts. So far, meaningless PKs are a must, but this raises issues: how to tie in locking and make sure you aren't appending to an already appended object, and how to make sure cached references reliably reflect the latest entities. I'm looking forward to that post.
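This is not NHibernate, but one common way to keep two writers from appending on top of the same version is sketched below in plain SQL through `sqlite3`, with hypothetical names. A composite primary key on (id, version) lets the database itself reject the second append, which the sketch turns into a `ConcurrencyError`. It does not address the cached-reference problem.

```python
import sqlite3


class ConcurrencyError(Exception):
    """Raised when someone else already appended the version we tried to write."""


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE doc_versions (
    id      INTEGER NOT NULL,
    version INTEGER NOT NULL,
    body    TEXT    NOT NULL,
    PRIMARY KEY (id, version)   -- at most one row per (id, version)
)""")


def append(doc_id: int, based_on_version: int, body: str) -> int:
    """Append a new version on top of the version the caller last read."""
    new_version = based_on_version + 1
    try:
        with conn:
            conn.execute("INSERT INTO doc_versions VALUES (?, ?, ?)",
                         (doc_id, new_version, body))
    except sqlite3.IntegrityError:
        raise ConcurrencyError(
            f"doc {doc_id} already has version {new_version}") from None
    return new_version


append(1, 0, "draft")            # version 1
# Two writers both read version 1 and try to append on top of it.
append(1, 1, "edit from A")      # version 2, succeeds
try:
    append(1, 1, "edit from B")  # would also be version 2
except ConcurrencyError as err:
    print(err)                   # doc 1 already has version 2
```

The losing writer has to reload the latest version and decide whether its change still applies, which is the usual optimistic concurrency trade-off.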
Building audit trails after the fact with NHibernate around a CRUD system is a really, really messy operation. (Been there, debugged that.)
I have a post about that which will show up on the 16th.
Been a busy week... anyway, thanks for posting this Oren, as obviously I couldn't agree more. And I'm definitely looking forward to the post on the 16th that describes how you would tackle this problem via NHibernate.
Here's hoping that your spending some time writing about this subject will actually bring it closer to the forefront, as I'd love to see!