Multi Tenancy - Extensible Data Model

time to read 5 min | 839 words

Continuing on the multi tenancy theme. We have the following problem, I now want to focus on the data model, and how you can create an extensible data model.

In particular, I am thinking about the ability to define something like a base line Customer entity, and allow a tenant to extend that with new fields and behavior. Note that we are talking about both data and behavior here, not on just extending the data. But let us try to take a look at that as well, just to see what solutions we can come up with.

The scenario that we have is that the tenant would like to define a new set of properties on the customer, such as non smoking preferences and the mother tongue. Let us see how we can deal with it. We will start by focusing solely on the underlying data model, without considering the actual entities.

Non RDBMS store

The main problem of storing additional properties in an entity is that we usually map that to an DB, and those really would like to know, upfront, about all the columns and what is the type of each and every column, Going with a non RDBMS store frees us from this limitation, because we can choose a back end store that is can accept any format. A document store like Lucene is a good example, or Google's BigTable or a few others that are out there. This means that you can very easily store tuples in the data store, without requiring fixed schema. Extending teh schema for a customer just means giving them the UI to do so (more on that in a future post).

The ExtraData Column

Since there are a lot of benefits for a database, we really want to try to deal with hat problem in this context as well. The easiest solution, probably, is having ExternalData XML (or NVARCHAR(MAX) column in all the tables, into which we can tack the specific tenant data. This has the advantage of being very simple to implement, but it has a lot of issues with regards to the resulting data model, being able to query on the data, that the user has added (and is presumedly important. Therefor, we need to consider other options.

The ExtraInt1, ExtraInt2, ExtraDate1,ExtraDate2, ExrtraStr1, ExtraStr2 Columns

With this approach you define a few extra columns at the end of each entity's table. And allow the user to map his own concepts to those extra columns. This is really nice from the simple perspective of fixed data model, but is a really bad idea from the perspective o understanding what the hell is going on with the system. Because on tenant used ExtraInt1 to store a language Id and another for storing customer weight, trying to understand what this means is extremely hard. Understanding queries, or optimizing them, building meaningful reports, all are going to be very hard to do.

The Key Value Table

Using this approach, we have an AdditionalValues table, with the following structure (EntityId, ValueTypeId, StrValue, IntValue, DateValue) as well as ValuesDefinitions table (ValueTypeId, EntityType, Name, DataType). Using this, we can "extend" the data model for an entity by simply defining a ValueDefinition and writing the values. It does tend to make things quite a bit more complicated from the querying side, and reporting on top of this mode is hard, to say the least. Beyond that, it is very flexible, and the last solution that keep us with a fixed schema.

Extending the customer table

By this I mean that if we want an additional value, we can simply add the new column to the table, and we are done. It has some complexity with regards to how you intend to manage that (must allow safe changes only, etc), but from the point of view of usage, that is extremely easy. Querying and reporting are a snap, and the only problem looming in the horizon is the next version.

Let us say that a customer has defined the NaturalLanguage extended column, and marked it as int, with values (2,3,6,8), which are how they represent the values in the rest of the organization. The next version of the product also have NaturalLanguage column, and it is a string with the language code. Problem.

The Customer_Extended table

We can deal with this issue by separating the customizations into a separate table, which we will give us a good place to put the new columns, with the same benefits of extending the actual table, but without the versioning issues.

For myself, I tend to favor the last one, extended data model using either an extension table or the actual table. It is the easiest to manage & grasp, and it has major benefits for the simplicity of the system on the long term.

So far we have talked only about the actual data model, not how we can extend this with behaviors. I think that I'll leave that one for a later post, and only say tat the storage options that you have will affect your ability to get a clean model at the application layer.