Multi Tenancy - Extensible Data Model
Continuing on the multi tenancy theme, I now want to focus on the data model, and on how you can create an extensible data model.
In particular, I am thinking about the ability to define something like a baseline Customer entity, and allow a tenant to extend it with new fields and behavior. Note that we are talking about both data and behavior here, not just about extending the data. But let us take a look at the data side as well, just to see what solutions we can come up with.
The scenario is that the tenant would like to define a new set of properties on the customer, such as a non smoking preference and the customer's mother tongue. Let us see how we can deal with that. We will start by focusing solely on the underlying data model, without considering the actual entities.
Non RDBMS store
The main problem with storing additional properties on an entity is that we usually map it to a relational DB, and those really want to know, upfront, about all the columns and the type of each and every column. Going with a non RDBMS store frees us from this limitation, because we can choose a back end store that can accept any format. A document store like Lucene is a good example, as are Google's BigTable and a few others that are out there. This means that you can very easily store tuples in the data store, without requiring a fixed schema. Extending the schema for a customer then just means giving them the UI to do so (more on that in a future post).
The ExtraData Column
Since there are a lot of benefits to using a database, we really want to try to deal with this problem in that context as well. The easiest solution, probably, is having an ExtraData XML (or NVARCHAR(MAX)) column in all the tables, into which we can tack the tenant specific data. This has the advantage of being very simple to implement, but it has a lot of issues with regards to the resulting data model and with being able to query on the data that the user has added (and which is presumably important). Therefore, we need to consider other options.
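As a rough sketch of the problem (the table and column names here are illustrative, not from the post): the column itself is trivial to add, but querying a tenant specific field means cracking the XML open on every row.

```sql
-- Hypothetical baseline Customer table with a catch-all column for tenant data
CREATE TABLE Customers
(
    Id        INT IDENTITY PRIMARY KEY,
    Name      NVARCHAR(200) NOT NULL,
    Email     NVARCHAR(200) NULL,
    ExtraData XML NULL   -- tenant specific properties get tacked in here
);

-- Querying on a tenant defined field (e.g. MotherTongue) means XPath over every row,
-- which is awkward to write and hard for the database to optimize or index well.
SELECT Id, Name
FROM   Customers
WHERE  ExtraData.value('(/customer/motherTongue)[1]', 'nvarchar(50)') = N'French';
```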
The ExtraInt1, ExtraInt2, ExtraDate1, ExtraDate2, ExtraStr1, ExtraStr2 Columns
With this approach you define a few extra columns at the end of each entity's table, and allow the user to map his own concepts to those extra columns. This is really nice from the perspective of keeping a fixed data model, but it is a really bad idea from the perspective of understanding what the hell is going on in the system. Because one tenant used ExtraInt1 to store a language id and another used it to store customer weight, trying to understand what the data means is extremely hard. Understanding queries, optimizing them, and building meaningful reports are all going to be very hard to do.
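A minimal sketch of the shape this takes (names are illustrative): the schema stays fixed, but nothing in it tells you what the spare columns mean for a given tenant.

```sql
-- Hypothetical Customer table with pre-allocated spare columns
CREATE TABLE Customers
(
    Id         INT IDENTITY PRIMARY KEY,
    Name       NVARCHAR(200) NOT NULL,
    ExtraInt1  INT NULL,        -- tenant A: language id, tenant B: customer weight
    ExtraInt2  INT NULL,
    ExtraDate1 DATETIME NULL,
    ExtraStr1  NVARCHAR(400) NULL,
    ExtraStr2  NVARCHAR(400) NULL
);

-- A query like this is meaningless without out-of-band knowledge of what
-- ExtraInt1 means for this particular tenant:
SELECT Id, Name FROM Customers WHERE ExtraInt1 = 3;
```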
The Key Value Table
Using this approach, we have an AdditionalValues table with the following structure (EntityId, ValueTypeId, StrValue, IntValue, DateValue), as well as a ValuesDefinitions table (ValueTypeId, EntityType, Name, DataType). Using this, we can "extend" the data model for an entity by simply defining a ValueDefinition and writing the values. It does tend to make things quite a bit more complicated on the querying side, and reporting on top of this model is hard, to say the least. Beyond that, it is very flexible, and it is the last solution that keeps us with a fixed schema.
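Here is a rough sketch of those two tables, following the column lists above, plus the kind of query you end up writing against them (the join shape is my own illustration):

```sql
CREATE TABLE ValuesDefinitions
(
    ValueTypeId INT IDENTITY PRIMARY KEY,
    EntityType  NVARCHAR(100) NOT NULL,  -- e.g. 'Customer'
    Name        NVARCHAR(100) NOT NULL,  -- e.g. 'MotherTongue'
    DataType    NVARCHAR(20)  NOT NULL   -- 'String', 'Int' or 'Date'
);

CREATE TABLE AdditionalValues
(
    EntityId    INT NOT NULL,            -- the Customer (or other entity) row
    ValueTypeId INT NOT NULL REFERENCES ValuesDefinitions(ValueTypeId),
    StrValue    NVARCHAR(400) NULL,
    IntValue    INT NULL,
    DateValue   DATETIME NULL
);

-- "Customers whose mother tongue is French": every extended property you touch
-- adds another join against AdditionalValues, which is what makes querying and
-- reporting on this model painful.
SELECT c.Id, c.Name
FROM   Customers c
JOIN   AdditionalValues  av ON av.EntityId = c.Id
JOIN   ValuesDefinitions vd ON vd.ValueTypeId = av.ValueTypeId
WHERE  vd.EntityType = N'Customer'
  AND  vd.Name = N'MotherTongue'
  AND  av.StrValue = N'French';
```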
Extending the customer table
By this I mean that if we want an additional value, we can simply add a new column to the table, and we are done. It has some complexity with regards to how you intend to manage that (you must allow safe changes only, etc.), but from the point of view of usage, it is extremely easy. Querying and reporting are a snap, and the only problem looming on the horizon is the next version.
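In its simplest form this is nothing more than issuing DDL on behalf of the tenant. A sketch, using the NaturalLanguage column from the scenario below:

```sql
-- The tenant adds a NaturalLanguage field to Customer; the application issues the DDL for them.
ALTER TABLE Customers ADD NaturalLanguage INT NULL;

-- From then on, querying and reporting look exactly like they do for any built-in column.
SELECT Id, Name FROM Customers WHERE NaturalLanguage = 3;
```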
Let us say that a tenant has defined the NaturalLanguage extended column and marked it as an int, with the values (2, 3, 6, 8), which is how they represent languages in the rest of their organization. The next version of the product also has a NaturalLanguage column, but it is a string holding the language code. Problem.
The Customer_Extended table
We can deal with this issue by separating the customizations into their own table, which gives us a good place to put the new columns, with the same benefits as extending the actual table, but without the versioning issues.
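A minimal sketch of that separation (the 1:1 key and column names are assumptions of mine): the tenant's columns live in their own table, keyed to the customer row, so a future product version can add columns to the real table without colliding with them.

```sql
-- Tenant specific columns live alongside, not inside, the product's Customers table.
CREATE TABLE Customer_Extended
(
    CustomerId      INT PRIMARY KEY REFERENCES Customers(Id),  -- 1:1 with Customers
    NaturalLanguage INT NULL,         -- tenant defined
    NonSmoking      BIT NULL          -- tenant defined
);

-- Querying is still a single, cheap join.
SELECT c.Id, c.Name, e.NaturalLanguage
FROM   Customers c
LEFT JOIN Customer_Extended e ON e.CustomerId = c.Id
WHERE  e.NonSmoking = 1;
```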
For myself, I tend to favor the last one: an extended data model, using either an extension table or the actual table. It is the easiest to manage and grasp, and it has major benefits for the simplicity of the system in the long term.
So far we have talked only about the actual data model, not about how we can extend it with behaviors. I think that I'll leave that for a later post, and only say that the storage options you choose will affect your ability to get a clean model at the application layer.
Comments
The best solution I've seen (and one that's lasted more than 15 years now) is the "Extending the customer table" approach, but prefixing customer columns with something (e.g. "x_NaturalLanguage") to avoid collisions.
Great series!
Are you doing a code review or consulting on a multi-tenant app?
Another option is a table prefix, similar to multiple databases. We have a table-prefix based app and would never do that again.
You could also consider a hybrid of Customer.CustomColumn1 and Customer_Extended.
Keep it up, can't wait to read more of your insights.
Are you suggesting an extended table where the end user is actually creating custom fields (which is usually the use case for this feature)? It seems weird to give an application the power to modify the database definition on the fly.
These articles are really interesting to read. The shop I'm working with at the moment is designing a kind of multi-tenant CMS system.
The proposed design is to use an RDBMS for storing data. But rather than have strongly typed entities mapped to tables, we'll have tables like EntityType, Entity, Relationship, etc. The properties of the entity are stored in an XML definition in the EntityType table, and then each Entity will also have a field containing XML with the actual values. Queries will involve XML and XPath, and possibly Lucene.
The perceived benefits are that we can add new types without compilation, and that the editing GUI can be designed to automatically allow for new types and changes to types without coding (instead, a kind of editor GUI mapping file will be created for each entity).
Personally, I think this is a kind of smart design, but it throws me so far out of my comfort zone that I'm having a really hard time accepting it :)
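Roughly, the design described above might look something like this (the table and column names are my guesses, not the actual schema):

```sql
-- The type/instance split described: property definitions as XML on the type,
-- property values as XML on each instance.
CREATE TABLE EntityType
(
    Id            INT IDENTITY PRIMARY KEY,
    Name          NVARCHAR(100) NOT NULL,
    PropertiesXml XML NOT NULL          -- definition of the properties for this type
);

CREATE TABLE Entity
(
    Id           INT IDENTITY PRIMARY KEY,
    EntityTypeId INT NOT NULL REFERENCES EntityType(Id),
    ValuesXml    XML NOT NULL           -- the actual property values
);

-- Every query against the data has to go through XPath, e.g. "pages titled 'Home'":
SELECT e.Id
FROM   Entity e
WHERE  e.ValuesXml.value('(/values/title)[1]', 'nvarchar(200)') = N'Home';
```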
Nice post - I'd definitely go with the last option - extending the customer table with a new table with a 1:1 mapping. This creates an inheritance hierarchy which makes sense to me - I'd map those customers to a derived class (e.g. ExtendedCustomer) where all the new properties and behaviour go (probably putting the derived class in another assembly).
Honestly, I only glanced over the first in this series and skipped the second. I wasn't sure where you were going and didn't think it applied. But this one hit the "I've been there" button.
I'm working with an app that has a rather complex (due to necessity) data structure. It supports user and 3rd party customizations down to the database level. We have a well normalized schema, and allow what we call dynamic tables to extend our entities. For instance, our Contact entity is defined by data from a contact table plus additional info in tables like Address and Phones. The tables that define or extend the entity are tracked by an entity definition table that contains information regarding the system and extended tables that belong to an entity.
I have an add-on that we develop for the product which adds two additional fields to the contact entity, stored in their own table. We maintain a contact view that rebuilds as needed based on information in the entity definition table.
Our model classes are mutable types that are partly defined by mutable field descriptors.
In our case, the app or add-on does modify the database schema on the fly, but it is abstracted to help keep accessing the database straightforward. Maybe I'm misunderstanding, but it sounds like what you describe as the extended table approach. It works, and the performance in the database and framework is pretty decent.
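The rebuilt view might look roughly like this (a sketch with assumed names; the real one is generated programmatically from the entity definition table):

```sql
-- Rebuilt whenever the entity definition metadata changes, so consumers keep querying
-- one flat "contact" shape even as add-ons bolt extra tables onto the entity.
CREATE VIEW ContactView AS
SELECT  c.Id,
        c.FirstName,
        c.LastName,
        a.City,
        x.AddOnField1,   -- the two fields contributed by the add-on's own table
        x.AddOnField2
FROM    Contact c
LEFT JOIN Address      a ON a.ContactId = c.Id
LEFT JOIN AddOnContact x ON x.ContactId = c.Id;
```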
Chad,
This is just a variation on the extended table, as far as I can see.
Mike D,
For some reason I have gotten a lot of contacts about multi tenancy stuff lately, and it is an interesting subject.
Eric,
Yes and no.
The "end user" in this case is the tenant administrator, not just any random joe user.
Tobin,
I would strongly recommend against this approach. It has horrible performance issues when you try to scale it, the complexity is huge, joining is not possible, etc.
If you want unstructured data, don't use a DB.
I think that sort of customization (a custom extension for a particular tenant) occurs a lot less frequently than the end user (the customer) adding fields to their entities. Something like Salesforce.com comes to mind as an example where this is done all over the place.
Eric,
Even with Salesforce, it is the tenant admin that does that.
I, as a standard user, do not do that.
Thanks, from where I sit it's nice to see a dev as smart as yourself arriving at the same conclusions...
Keep this topic going if you have the time, hopefully we can hire you for a code review.
Tobin, I agree with Ayende. I've been down the XML-in-the-database road several times. It's mostly not worked out. I have a specific case that I'm using it for now, but it's very limited and has nothing to do with extensibility.
Ok, so now that we are on the same page on terminology =). My question is: do you think it is a good idea to have an action by a customer (be it an admin or just a regular user) do something that ends up executing a DDL statement? I'm not saying that this is a bad thing, but I would approach it with some skepticism.
Of the options you listed "Customer_Extended" is the one I've seen work best over the years. Of the options you haven't listed, going fully table-generational on a per-tenant-per-custom-field basis has tended to work even better still, though occasionally at some cost due to the overhead of larger numbers of joins. That said, the join overhead has almost always been offset by better data locality, lower data page count, options for more effective indexing (and lower re-indexing overhead in high insert scenarios), etc.
@Ayende / @josh
Thanks for the feedback.
As you say, this model won't perform well. That was anticipated, and the solution is to export data to an optimized read-only runtime db to get realistic live web-site performance.
It's the complexity that I'm most uncomfortable with. Delivering understandable & maintainable code is hard enough (for me at least!). I fear attempting to do it on top of a meta-layer will just make it harder. The meta layer does add value, but it comes at a cost.
Also, the exported live database will need to be mapped into C# objects to give us any kind of maintainable & understandable way of using the CMS data in web pages. So we need to write a mapping layer to generate the live db from CMS data, then generate/write objects, and then generate a mapping layer to get them out of the live db. Then we need to find a home for the small(ish) amount of business logic in the apps.
I've been questioning whether the uber-flexible CMS back-end is worth the complexity of all this extra work, but I'm not communicating well enough and haven't managed to collect and express my arguments successfully. I'll keep at it...
Thanks again
P.S. - On the subject of unstructured data, I quite like the direction these guys are taking:
http://strokedb.com
WARNING: That link may result in you seeing Ruby code :)
Jeremy,
Can you explain what this model is?
Apologies, I could have been clearer. The approach I mentioned involves programmatically generating a new table for each custom field, rather than programmatically issuing alter table statements against a single Customer_Extended (or what have you) table. It takes a bit more metadata tracking, because you need to keep mapping information around so that each custom field can be mapped to its generated table.
That said, it can be a net win in cases where row counts are high and alter table statements would cause too much work in the db, and/or if the custom fields are often null but your database server's packing of unused columns in rows sucks (Oracle handles this far better than SQL Server, for example), and/or if you want to spread custom field tables (and/or logs and/or indices) around different spindles, etc.
This model can be extended into multi-tenant scenarios quite nicely, and in fact gains a lot of leverage there, especially if you weren't planning on programmatically building and altering per-tenant Customer_Extended tables.
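A sketch of that table-per-custom-field shape (all names here are illustrative): each custom field gets its own generated table plus a row of mapping metadata, instead of an ALTER TABLE against a shared extension table.

```sql
-- Mapping metadata: which generated table holds which tenant's custom field.
CREATE TABLE CustomFieldMap
(
    TenantId   INT NOT NULL,
    EntityType NVARCHAR(100) NOT NULL,   -- e.g. 'Customer'
    FieldName  NVARCHAR(100) NOT NULL,   -- e.g. 'NaturalLanguage'
    TableName  SYSNAME NOT NULL          -- e.g. 'CF_T42_Customer_NaturalLanguage'
);

-- One narrow table generated per tenant per custom field.
CREATE TABLE CF_T42_Customer_NaturalLanguage
(
    CustomerId INT PRIMARY KEY REFERENCES Customers(Id),
    Value      INT NOT NULL
);

-- Each custom field referenced in a query costs one extra join:
SELECT c.Id, c.Name, nl.Value AS NaturalLanguage
FROM   Customers c
LEFT JOIN CF_T42_Customer_NaturalLanguage nl ON nl.CustomerId = c.Id;
```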
Eric,
Yes, I think it is alright to do this.
@Tobin,
Huh? Why are you going with all this complexity?
The flexibility you need is achievable, but the way you are going about it is going to end in tears, from everything that I have seen.
Jeremy,
I understand what you are saying, but the cost of the joins would be prohibitive to me. Adding ten columns == ten joins seems very costly.
@Ayende
I agree!
I meant to say I'm failing to communicate this to the rest of the team I'm working with.
Again, it depends on the situation. I've worked with databases where the extra joins were feared as overhead but ended up being the optimal performing/scaling/maintaining/etc. option for a variety of reasons. This would appear to be yet another of those cases where everyone is right. ;)
Jeremy, how does this solution differ from having a partitioned ATTRIBUTE table with [attribute_id, attribute_type_id, owner_entity_id, value]?
In most cases I've seen, people tend to choose "Extending the customer table" (with all new columns either allowing NULL, having a proper default, or, in more advanced scenarios, getting default values from certain queries run once when creating the additional column), and changing from an INT to a STRING language code would be handled by a migration plan that feels "normal".
Typically the migration would look like this: create another column with a temp name, run a query to set the new column's values based on the corresponding old column's values, and finally delete the old column and rename the new column to the old name if necessary. Clearly a downtime thing, which people sometimes avoid by letting the TWO columns live on, probably forever, or until the next version (with its acceptable amount of downtime) finally comes.
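A sketch of that migration for the NaturalLanguage example from the post (column names and the int-to-code mapping are invented for illustration):

```sql
-- 1. Create the replacement column under a temporary name.
ALTER TABLE Customers ADD NaturalLanguage_New NVARCHAR(10) NULL;
GO

-- 2. Populate it from the old int column (the id -> code mapping here is made up).
UPDATE Customers
SET    NaturalLanguage_New = CASE NaturalLanguage
                                 WHEN 2 THEN N'en'
                                 WHEN 3 THEN N'fr'
                                 WHEN 6 THEN N'de'
                                 WHEN 8 THEN N'he'
                             END;
GO

-- 3. Drop the old column and rename the new one into its place.
ALTER TABLE Customers DROP COLUMN NaturalLanguage;
GO
EXEC sp_rename 'Customers.NaturalLanguage_New', 'NaturalLanguage', 'COLUMN';
```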
I've used the "Key Value" option for a multi tenant e-commerce system (separate databases btw...). We have a CustomAttributeDefinition class which contains metadata about properties that can be set up against a particular type of entity. CustomAttributes are the actual realisation of the property for an entity.
I would emphasise the importance of having a rich metadata layer for defining behaviour, rules and constraints for these custom properties.
CustomAttributeDefinitions are entities in their own right. Subclasses of CustomAttributeDefinition define behaviour for a range of property types and contain things like descriptions and tooltips displayed to users when editing the values, constraints on the values that can be entered, ranges of values that can be selected etc.
This supports things like:
Values (String, Boolean, Int, DateTime etc)
Lists of values (as above)
MultiChoice (select from a range of options defined in CustomAttributeDefinition, with rules for min / max number of selections)
Measurement (amount / unit type)
Dimension (width, height, depth, unit type)
A parallel hierarchy of CustomAttributeEditControls makes the editing experience within the UI a lot more pleasant, e.g. checkboxes for booleans (no more "Y/N, hmm or is it 1/0" in a text box), validation according to datatype etc.
For custom fields, validation is also a consideration.
I render my custom fields onto the web page depending on their definition.
The best I have come up with for validation is to allow a regular expression to be associated with the field.
Actually changing the ddl sounds like a bad plan.
Any comment on whether the new SQL Server 2008 XML features change anyone's mind about using XML for data extensibility? I still see it as a reasonable approach compared to the complexity inherent in most of the alternatives.
Joseph,
I am not keeping up, what is great with XML in 2008?
I don't know about 'great' as it's not an approach devoid of complexity itself, but there are enhancements to SQL Server's XML capabilities - the summary from the white paper on the enhancements:
Microsoft® SQL Server™ 2008 builds on the extensive support for XML by extending support for XML schema validation and XQuery, and by enhancing the xml data type
-XML Schema Validation Enhancements
-Lax Validation Support
-Full xs:datetime Support
-Union and List Types
-XQuery Enhancements
-XML DML Enhancements
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new-xml.aspx
I would say these enhancements make taking this approach a somewhat more reasonable proposition - if you're up for an XML-centric solution to begin with. That said, I'm certainly of the mind that data extensibility is a complex problem space with no simple solutions. All in all, it almost seems more a matter of shuffling the complexity around into a form that is comfortable for any given developer, rather than finding a solution which actually reduces it.
It would be interesting to hear the thoughts of someone on the MS Flight Simulator / game crews on the subject, as I suspect they deal with this on a fairly regular basis. Anyone who has contacts might want to consider asking them to chime in...
Why do you think they need extensible data support in those games?
From http://www.fsinsider.com/developers/Pages/default.aspx
"As Extensible As It Gets"
The success of the Flight Simulator franchise is due in large part to the efforts of our third-party development community. Those efforts are made possible by our commitment to "extensibility."
The main idea behind this concept is to provide customers with the power to customize their experiences. Over our 25-year history the names and faces on our team have changed, but our dedication to this concept has remained.
I'm not from that world, but it makes sense to me that extensibility would be a key element of keeping games interesting to gamers and game platforms interesting to game developers and third parties...