Append Only Models with NHibernate

time to read 5 min | 992 words

I mentioned that using an append only model isn’t just an infrastructure choice, it has a big effect on your API and your general approach to working with your model.

Let us look at a typical example of change the martial status of an employee:

// transaction is opened before method by the infrastructure
public void Consume(ChangeMaritalStatus msg)
{
var emp = Session.Get<Employee>(msg.EmployeeId);
emp.ChangeMaritalStatus(msg.NewMaritalStatus, msg.MaybeNewSurname);
}
/*
transaction is committed here and changes are flushed to the database
by the infrastructure
*/

As you can see, using this approach, this will issue an update statement from NHibernate. This is a typical model for using NHibernate.

Next, let us look at the same action, using an append only model:

// transaction is opened before method by the infrastructure
public void Consume(ChangeMaritalStatus msg)
{
var emp = Session.GetLatest<Employee>(msg.EmployeeId);
var newEmpVersion = emp.ChangeMaritalStatus(msg.NewMaritalStatus, msg.MaybeNewSurname);
Session.Save(newEmpVersion)
}
/*
transaction is committed here and changes are flushed to the database
by the infrastructure
*/

Notice what is going on in here. We changed two major things, first, we moved from just using a Get<Employee>(), an NHibernate method, to using GetLatest<Employee>(), which is an extension method. Second, where before we relied on NHibernate’s change tracking to do the deed for us, now we get a new version from the method and we save it explicitly.

If it reminds you of the functional model, this is accurate, in the append only model, data truly may not change, any update is just a copy of the old data plus whatever changes you wish to make.

GetLatest<TEntity> implementation is going to depend on how you actually manage the data in the database. I would usually recommend something like the following table structure:

CREATE TABLE Employees
(
Id bigint not null,
Version int not null,
MaritalStatus int not null,
/*
other columns
*/
PreviousVersion int null,
PRIMARY KEY (Id, Version)
FOREIGN KEY (Id, PreviousVersion) REFERENCES Employees(Id, Version)
)

Some people prefer to use dates for marking the differences between versions, but there are some significant advantages for using a version number. You have no fear of collisions or inaccuracy, which may happen with dates. It is also easier to read something like Employee # 12312/4 rather than Employee # 12312 / 2008-08-01T10:04:49.

You can just turn all queries in the database to something like:

SELECT * FROM Employees this
WHERE Id = 12312 AND Version = (
SELECT max(Version) FROM Employees version
WHERE version.Id = this.Id
)

But this has some problems, it forces a scan of at least part of the table (although it will be at least an index scan, if the DBA is worth anything at all) which is a lot of work. We can probably get far better results by using:

CREATE TABLE LatestVersions
(
Type nvarchar(255) NOT NULL,
Id bigint NOT NULL,
Version int NOT NULL
PRIMARY KEY (Type, Id)
)

SELECT * FROM Employees
WHERE Id = 12312 and Version = (
SELECT Version FROM LatestVersions
WHERE Type = 'Employees' AND LatestVersions.Id = Employees.Id
)

I haven’t run any comparison tests on this, but this is likely to be faster. Regardless, I consider it more elegant.

The GetLatest<TEntity> simply use HQL or Criteria API to make NHibernate perform this query.

I usually also install a pre update event listener that just throws when it encounters an update (which is not allowed in the system).

And yes, beyond these two changes, everything else just flows. You never update, you always get by latest, and that is about it. Well, almost, you also need to make sure that NHibernate creates the primary key properly, by incrementing the version and updating the LatestVersions table. This can be done using an IIDentityGenerator.

There is a side topic that I am not going to discuss right now relating to how you are going to handle the reporting model. This type of model is great for a lot of reasons, versioning, auditing, etc. What is it not great at is reading data.

The problem is that you generally have a model that doesn’t work that well for reading from the database or for showing on the screen. What usually happen is that there is a separate process that is responsible for taking this data and turning that into a more palatable from for reporting purposes (and remember that I consider reporting to be also show me the details screen of employee #12312).

In particular, queries on this type of model are awkward, slow and tend to be annoying in general. I heartily recommends using a separate reporting model.