Ayende @ Rahien

Refunds available at head office

Append Only Models with NHibernate

I mentioned that using an append only model isn’t just an infrastructure choice, it has a big effect on your API and your general approach to working with your model.

Let us look at a typical example of change the martial status of an employee:

// transaction is opened before method by the infrastructure
public void Consume(ChangeMaritalStatus msg)
{
var emp = Session.Get<Employee>(msg.EmployeeId);
emp.ChangeMaritalStatus(msg.NewMaritalStatus, msg.MaybeNewSurname);
}
/*
transaction is committed here and changes are flushed to the database
by the infrastructure
*/

As you can see, using this approach, this will issue an update statement from NHibernate. This is a typical model for using NHibernate.

Next, let us look at the same action, using an append only model:

// transaction is opened before method by the infrastructure
public void Consume(ChangeMaritalStatus msg)
{
var emp = Session.GetLatest<Employee>(msg.EmployeeId);
var newEmpVersion = emp.ChangeMaritalStatus(msg.NewMaritalStatus, msg.MaybeNewSurname);
Session.Save(newEmpVersion)
}
/*
transaction is committed here and changes are flushed to the database
by the infrastructure
*/

Notice what is going on in here. We changed two major things, first, we moved from just using a Get<Employee>(), an NHibernate method, to using GetLatest<Employee>(), which is an extension method. Second, where before we relied on NHibernate’s change tracking to do the deed for us, now we get a new version from the method and we save it explicitly.

If it reminds you of the functional model, this is accurate, in the append only model, data truly may not change, any update is just a copy of the old data plus whatever changes you wish to make.

GetLatest<TEntity> implementation is going to depend on how you actually manage the data in the database. I would usually recommend something like the following table structure:

CREATE TABLE Employees
(
Id bigint not null,
Version int not null,
MaritalStatus int not null,
/*
other columns
*/
PreviousVersion int null,
PRIMARY KEY (Id, Version)
FOREIGN KEY (Id, PreviousVersion) REFERENCES Employees(Id, Version)
)

Some people prefer to use dates for marking the differences between versions, but there are some significant advantages for using a version number. You have no fear of collisions or inaccuracy, which may happen with dates. It is also easier to read something like Employee # 12312/4 rather than Employee # 12312 / 2008-08-01T10:04:49.

You can just turn all queries in the database to something like:

SELECT * FROM Employees this
WHERE Id = 12312 AND Version = (
SELECT max(Version) FROM Employees version
WHERE version.Id = this.Id
)

But this has some problems, it forces a scan of at least part of the table (although it will be at least an index scan, if the DBA is worth anything at all) which is a lot of work. We can probably get far better results by using:

CREATE TABLE LatestVersions
(
Type nvarchar(255) NOT NULL,
Id bigint NOT NULL,
Version int NOT NULL
PRIMARY KEY (Type, Id)
)

SELECT * FROM Employees
WHERE Id = 12312 and Version = (
SELECT Version FROM LatestVersions
WHERE Type = 'Employees' AND LatestVersions.Id = Employees.Id
)

I haven’t run any comparison tests on this, but this is likely to be faster. Regardless, I consider it more elegant.

The GetLatest<TEntity> simply use HQL or Criteria API to make NHibernate perform this query.

I usually also install a pre update event listener that just throws when it encounters an update (which is not allowed in the system).

And yes, beyond these two changes, everything else just flows. You never update, you always get by latest, and that is about it. Well, almost, you also need to make sure that NHibernate creates the primary key properly, by incrementing the version and updating the LatestVersions table. This can be done using an IIDentityGenerator.

There is a side topic that I am not going to discuss right now relating to how you are going to handle the reporting model. This type of model is great for a lot of reasons, versioning, auditing, etc. What is it not great at is reading data.

The problem is that you generally have a model that doesn’t work that well for reading from the database or for showing on the screen. What usually happen is that there is a separate process that is responsible for taking this data and turning that into a more palatable from for reporting purposes (and remember that I consider reporting to be also show me the details screen of employee #12312).

In particular, queries on this type of model are awkward, slow and tend to be annoying in general. I heartily recommends using a separate reporting model.

Comments

Steve Py
09/15/2009 02:49 AM by
Steve Py

I can't say I'm a big fan of concatenated keys.

Question: How do you ensure in mappings that when you resolve Many-to-One references or bags and such that you get the latest version of each object?

Set
09/15/2009 05:50 AM by
Set

Nice lil typo : the martial status :o

Rafal
09/15/2009 06:33 AM by
Rafal

This model is good if you want to know previous versions of data, but it's not optimized for most typical uses like access to current status and querying. Steve's question is very important - you didn't touch the subject of joins and many-to-one/many-to-many references. In your model you would have either to update the references in all referring objects on each version change or to do some tricks with joins to get the latest version of referred objects. None of these is supported 'out of the box' by NH and requires manual work. I would rather build a normal read/write data model and use append-only where it is strictly necessary (where business requirements state clearly that we must store previous versions).For all the remaining updates I'd build a 'history' table where each field modification would be stored. This could be done automatically using update triggers (either database-level or in the ORM).

Dave Mertens
09/15/2009 07:12 AM by
Dave Mertens

We have a totally different approach to append only tables. Each table also has a 'history' table to holds the previous versions. Records are moved to the history table by triggers. Each record also contains the database user who issues the update (part of the audit trail).

NHibernate does not have a clue about the append-only model. Our databases are used by multiple applications so in my humble opinion the database should now these business rules and not the 7 applications that use the database.

I don't want to think about the problems I would face if these rules were part of each application and a new application doesn't follow the append-only model rules and issues a delete statement.

liviu
09/15/2009 07:41 AM by
liviu

Hi,

SELECT * FROM Employees

WHERE Id = 12312 and Version = (

SELECT Version FROM LatestVersions    

WHERE Type = 'Employees' AND LatestVersions.Id = Employees.Id 

The aggregate table with Latest versions is a disaster for long running update transactions... every reader of latest versions will be blocked if using read committed transactions...

We have come to a workaround for this, but i would expected you to have seen the big problem ...

liviu
09/15/2009 07:46 AM by
liviu

..And of course, all writers to the same entity type will be SERIALIZED

Ollie Riches
09/15/2009 08:42 AM by
Ollie Riches

@Dave Mertens

7 applications that use the database!

ROFL....

chrisb
09/15/2009 08:59 AM by
chrisb

@liviu:

Its been a while since I read up on any of this stuff, but would enabling and using row version snapshots in sql 2005+ get you out of that particular hole without really needing to change much?

Frank Quednau
09/15/2009 09:18 AM by
Frank Quednau

Our current project is pretty much an "Append Only" application. Here, all relevant "Objects" carry an object state, either current, under revision or obsolete. Finding the latest object is a matter of saying "select from x where id = y and state = current". However, this leads to one update of the previous current, which becomes superseded. Also, the application must take care of updating fk references /inserting new ones (for m:n relationships) between objects since currents may only reference other currents.

Either way, this is a very special domain, since we are also forced to do denormalizations as superseded objects must remain completely frozen in their state. We are doing this very DB-centric (speak DataSets) for mostly non-technical reasons, which is why I was looking forward to this post. One day I'd love to see how this could have been done with NHibernate, but that's a pleasure I'll have to set up myself..:)

Mr_Simple
09/15/2009 10:40 AM by
Mr_Simple

@Dave Mertens

Now that is a pattern Mr_Simple gives gold stars for, so one gold * for you.

Eyston
09/15/2009 12:22 PM by
Eyston

I can't imagine going to a composite key model by choice. I used one for the past year with NH, and while NH makes it work, it is not a good experience.

The cool stuff starts to happen when you have multiple consumers of ChangeMaritalStatusMsg. I must say I'm eager to try out Domain Event style programming that leads to multiple, specifically targeted data stores.

V
09/15/2009 02:40 PM by
V

On other way of doing this is instead of inserting the newest Employee you update the current and insert the Old...

this way you keep all the references. 2 changes to the database but not having to worry about updating FKs references is a bless.

public void Consume(ChangeMaritalStatus msg)

{

var emp = Session.GetLatest

<employee(msg.EmployeeId);

var oldEmp = emp .Clone();

emp .ChangeMaritalStatus(msg.NewMaritalStatus, msg.MaybeNewSurname);

Session.Save(emp) //Update

Session.Save(oldEmp) //Insert Older Version

}

Daniel Auger
09/15/2009 03:39 PM by
Daniel Auger

@Ollie - 7 applications using the same DB is laughable to many developers, but you have to remember that "one true database" is still the mantra in DBA circles.

Ollie Riches
09/15/2009 03:53 PM by
Ollie Riches

@Daniel Auger

That's not an acceptable way to do 'Goldern Source'...

It's an excuse from the same book as 'This is to complicated to write tests'...

it's also a sign that DBAs have to much control over the development process...

Daniel Auger
09/15/2009 03:59 PM by
Daniel Auger

@Ollie. I do agree on all points you have there :)

Ollie Riches
09/15/2009 04:10 PM by
Ollie Riches

@Daniel Auger

I currently 'live' in such an environment :(

Scott White
09/15/2009 04:21 PM by
Scott White

Why not just set mutable to false?

Kurt Harriger
09/15/2009 05:18 PM by
Kurt Harriger

I would be very interested in learning more about implementing the append only model with NHibernate.

I could envision the GetLatest extension method, but my understanding the <version mapping attribute updates the version column rather than inserting a new row, how did you change this behavior?

How would you handle loading relationships as Steve Py mentioned?

Is it possible to have NHibernate call the Get method on my Repository to perform the lazy loads rather than directly retrieving it from session (this would be huge for me for lots of reasons)?

Perhaps you could maybe map entities to with a where clause or filter that joins latest version table, thus easily enable or disable the inclusion of historical records? But again latest version table isn't standard behavior?

SteveoReno
09/15/2009 06:48 PM by
SteveoReno

I would expect the append only to intercept 'update' calls where the object exists and perform an 'insert' instead.

That is what I consider 'append only'

And with a, ie., one to many relationship, each object in the object graph would need that logic.

Too bad this can't just be set in the mapping files...

liviu
09/15/2009 07:49 PM by
liviu

@Frank Quednau:

row version snapshots in sql 2005+ put a heavy burden on tempdb in oltp scenarios but are generally ok for reading data. In mixed read/update scenarios in a transaction they are useless because you check db constraints against stale data.

Ayende Rahien
09/15/2009 07:59 PM by
Ayende Rahien

Rafal,

I explicitly stated that this model requires a reporting model to do most queries from.

I didn't touch references because they just work, but I guess that I should have explicitly mentioned that each entity is a point in time.

Its references are also to a particular point in time.

It doesn't make sense to be able to update an entity association without also updating the entity in those type of models.

A good example would be with contracts. I may update the contract, but the employee's employement contract doesn't change because the contract was updated.

There is an explicit step required to change the employee from one version of a contract to another.

Ayende Rahien
09/15/2009 08:01 PM by
Ayende Rahien

V,

That requires that you would still update things, and that violate the append only model.

That type of behavior is for audit trails, not append only. I think I have a post about how to implement those.

Ayende Rahien
09/15/2009 08:02 PM by
Ayende Rahien

Kurt,

ChangeMaritalStatus will return a NEW INSTANCE of the employee, there is no update going on.

Ayende Rahien
09/15/2009 08:03 PM by
Ayende Rahien

SteveoReno,

I often hear this, but it tend to fail in real life.

Too leaky an abstraction

Ayende Rahien
09/15/2009 08:09 PM by
Ayende Rahien

Scott,

That would just make NH not save them. I want to force an error if you try

Ayende Rahien
09/15/2009 08:11 PM by
Ayende Rahien

Liviu,

Huh? Why would it be a problem for read committed?

I can see it being a problem for repeatable read or up, but read committed doesn't lock rows it reads.

Steve Py
09/15/2009 11:08 PM by
Steve Py

@Dave - That's a snapshot more than append-only. It's perfectly suited to audit history tracking.

One goal of append-only models is that references to previous revisions of data can be maintained, while new data references current revisions. As Ayende pointed out regarding his Contracts example.

Along with Ayende's example there is the option to use FK-2-Self references to ancestors or descendants. The benefits of these approaches is that you avoid combined keys. FK-2-Ancestor works well with NHibernate, though there is a bit of work needed if you want to archive records at a certain cut-off point. (Need to clear a FK off the new tail, and then take steps to merge back into the chain when that record itself gets dumped to the archive.) FK-2-Descendant handles the archiving scenario a bit more gracefully (needing only a proxy in the archive that is overwritten) but doesn't keep ancestry "pure" from updates as ancestors need to be updated with associations to their new descendants.

In any case there still appears to be a bit of baby-sitting needed in applications to ensure that the latest revision of a given object is maintained, and in my experience NHibernate's caching can be a bit of an obstacle. My question to Ayende is what techniques might be employed with/around NHibernate to detect, and refresh revisions to objects? I'm thinking that FK-2-Descendant may have a benefit because it "dirties" the record when it is replaced...

Ayende Rahien
09/16/2009 12:28 AM by
Ayende Rahien

Steve,

The problem with trying to refresh to the same instance from different version (which is actually a different row) is that you are missing the issue of the PK to instance mapping (identity map).

NH is built around the idea of identity map, and you won't get far trying to break it.

Steve Py
09/16/2009 01:27 AM by
Steve Py

Yeah, I definitely don't want to break it. After further work with the problems over the last couple days there aren't any show-stoppers with managing FK between objects. Whether those are updated to new versions or not is the responsibility of the application(s). Trying to automatically "change" those references behind the scenes would be a painful exercise.

But I do want to be sure that when I go to fetch data based on search criteria (that hasn't necessarily changed with the revision) that I get back the current reference, not a cached copy of a potentially stale object. I can see how you might address that with the lastest version table, however I'd like to see that applied with mapping/DAL-isms.

Using fetch by example for instance with the FK2Ancestor I need to tell NHibby to ignore fields like Revision/Version as I don't know what the latest version is. Using Linq2NHibby I've resorted to SortDescending + FirstOrDefault. With FK2Descendant I'm interested in rows that have no descendant unless I explicitly state otherwise. (or pull references by ID for historical associations) I think I can trigger my app to ensure that Descendant isn't present so that when my descendant reference is refreshed due to a new replacement I can dump the object and reload it to get the current revision.

Heh, I'd love to hook up NH Prof to it just to see what's really going on behind the scenes. It does work, but... But I'm dealing with a relatively small data pool and I don't want this biting me in the ass when dealing with larger databases if this turns out to be doing more than index scans. (I guess SQL Prof can tell me that...) sigh This pot just keeps boiling over with more stuff to try...

Frank Quednau
09/16/2009 05:07 PM by
Frank Quednau

liviu,

row version snapshots? we are not using this feature in any way...considering that we are doing this mayhem on an Oracle Database it would make it pretty difficult, too :)

Chris Nicola
09/17/2009 03:37 PM by
Chris Nicola

Oren, sorry if this is a little off topic but I am curious how the transaction (and perhaps the session) is handled by the infrastructure in a case like this.

I am doing something that looks very similar. A message is passed and the consume() method changes something in the domain. For me the session and transaction are opened and closed by the consume method.

I'm sure there is a better approach than opening a session each time consume is called. I also don't like my message handler having a dependency on SessionFactory.

Ayende Rahien
09/17/2009 07:53 PM by
Ayende Rahien

Chris,

I do that in the message module

Comments have been closed on this topic.