Multi Tenancy - The Physical Data Model
Continuing on the multi tenancy theme, let us talk about the actual physical structure of the data store. In general, you have two basic options.
- The TenantId column and separated data stores. The TenantId is simply adding a column to all the columns, showing what tenant the row belongs to.
- The separate data store calls for a much higher level of separation, where there is no shared storage for the tenants.
There are variation on those, such as having separation in the schema level instead of DB level, etc, but they all end up being down to one of those choices.
The TenantId approach allows you to use a single DB for the entire application, potentially reducing costs for managing the environment. It is also the easiest to explain, we just filter by TenantId and that is it!
It comes with a whole set of problems that is related to that decision, however:
- Isolation - You have to actively do filtering all the time. This has major issues on your code. And now, tacking TenantId = @CurrentTenant at each query is not going to be sufficient. Consider the case of joins in such an environment, you are likely to get a filter only on the end result, after it was joined already. And that is just one example.
- Backup - The backup strategy that you have for this scenario sucks. Basically, you have to roll your own.
- Customer copy of the data - Going back to the backup strategy? You can't just hand the customer the backup. You have to dump the data out, and write something that writes it back in, and it has to be fast.
- Security - you have to let all tenants access the data, security is in the app logic level.
- Extensibility - in a shared environment it is hard to do a lot of extensibility in the data, because each client is stepping on each other toes.
As you probably gathered, I strongly suggest going with the separate model. Now our management strategy is dealing with the connection string. Isolation is built in and doesn't require any , for backup you have the vendor tools, and giving the customer its own data is trivial, just send him the backup file. Security is easy, a tenant can access only their DB, and extending this is much easier, because each tenant has their own walled garden to play with.
Comments
Amen to separate databases. If I were to put data for multiple tenants in the same db some sort of data encryption with separate key per tenant would be mandatory. Imagine an evil bug exposing one tenants data for another tenant... Nightmare. Wonder if people do encrypt per tenant when they do this?
The separate model works fine for a limited number of tenants, but not for a large scale web app with thousands of tenants. If your app has the potential to grow in this dimension, I suggest that you implement the TenantId approach from the beginning. Even with it you can still have separate databases if needed.
If you use LINQ to SQL in your data layer you can easily filter by TenantId on selected entities and database queries - and to switch it off if needed:
IQueryable<T> AddTenantFilter<T>(IQueryable<T> query) where T:IHasTenantId
{
if (CurrentTenantId != Guid.Empty)
{
}
return query;
}
Olivier,
I am sorry, but I truly don't see the scaling issue that you point to.
Thousands of tenants gets thousands of DBs, so ?
Separate database pro-s always beat the possible cons. Actually, I don't event see any cons except true requirement of upgrade automation and that is actually a good thing in itself.
Yep, it's all true.
But then you need to ask yourself if you will be forever content with plain multi tenancy or will your customers/managers demand to have more.
Because, you know the next logical step will be information sharing (information is power).
In our case the next step was getting requirements, for:
central management,
data sharing between parent/manager to child "tenants",
integrated reports for the SaaS itself.
So, I have to say I'm pretty glad we didn't take separate DB approach.
Eyal,
There is nothing here that prevents that
Prevent - No.
But it goes back to the matter of ease of implementation and management you've mentioned before.
Merging data from different DBs is just that much harder to implement and it's harder to reuse the code you wrote to handle the single tenant case. (granted there's security considerations - whether that's even a good idea to reuse in this case).
If we take the agile approach here then:
Unless you're writing a banking app or something similarly sensitive where security and isolation is a fundamental requirement, then taking the filtering approach will help you deliver your app faster.
The other issues you have raised, while you are bound to bump into them - may be lower on the priority list then you initially think.
For example - while most customer would not want to compromise data security, backup and customer copy may be low on the priority list.
seperate databases are the way to go... no question there.
i think the real question lies with the company you are working for. sure, i can sit in a planning meeting and state that seperate databases are how they should proceed but when they look at their purse strings and potential increase in staff... the mighty dollar has a way of trumping all
my biggest hurdle with seperate databases has always been with provisioning a new customer and getting them "on line" quickly enough. budgeting for a new server because of potential sales in the pipeline is a hard sale to the finance folks
there is also the time and cost of writing a provisioning application for your sales, legal, or support users to use. in a shared database approach... add a new row in your Tenant table, get the TenantId, and move the customer along
if you are working for a well funded company... this conversation becomes a moot point. if your not and you need to show ROI... you start to bend the rules and propose a shared databases model with the understanding that seperate databases are needed later down the line.
my two cents... for what it's worth (not much but...)
Shared Database is better for systems like www.ning.com (Create Your Own Social Network). If the app solves more dynamic business rules then one DB per tenant should be appropriate.
Eyal,
Eyal,
I completely disagree that the filtering approach is faster to develop.
Furthermore, I don't think that you understand the mindset of the customer.
A multi tenant application contains highly critical business data. Plans for marketing campaign may not sound like a highly sensitive item, but they are critical in the eyes of the business. The option of exposing one customer's data to the other is simply not acceptable.
Meisinger,
You are going to need to deal with this anyway, at some point.
This just forces you to deal with it upfront.
Provisioning a new customer is easy.
CREATE DATABASE Customer_14412
// execute scripts
// done!
Oren,
I've worked in both data models, and I must say that I prefer the filter model better.
We currently have thousands of tenants in a large web-farm with millions of users... the filter by TenantId model works beautifully for us. The TenantID is a required field in all Data Access layer methods, every stored procedure has it as a required field, and every table has the TenantID indexed.
We play some games with UDFs to ensure that the TenantID is filtered against first, Add to that an effective dated data model, and we have a significant data access layer that manages the database.
Maintenance is a snap, all tenants get new features made available to them at the same time, and they can be turned on and off through a TenantFeatures table. In the multi-database model, maintenance of the multiple databases without a very skilled DBA could become a nightmare quickly. If some customers want new versions of the software that others are not interested in, you are then maintaining multiple versions of your data layer... Not pretty....
Having seen all of the possible combinations I have to say that this is one of the situations where everyone is right: When you don't need multi-tenancy you benefit from not having it and suffer from having it. When you do need multi-tenancy, you suffer from not having it and benefit from having it.
The trick is to make a good enough call for your situation when things are just getting started, and to at least leave the door open (if not necessarily planning much in advance) to add or remove multi-tenancy in the future if things change.
Jeff, you make the all-customers-get-update-in-a-snap a strong positive for a single database model and bring out the negative side of multiple databases and multiple data layers. I wonder how you achieve latter with a single database... I.e. if you have multiple databases, no way you can still have configuration per client? :)
I think one sees the negatives of multiple databases if one hasn't used to think of the requirement of "productizing" in one's solution. That mostly means automation and rules to follow. I have dealt with similar kind of questions in a product we have (old thing, also uses SP-s) and passing common identifier all the time was a true nightmare. Data model was rather complex so procedures did the weirdest joins to have no data inconsistencies and if got really messy in general. In the end we just moved to multiple database model and it has worked like a charm. No kung-fu DBA-s required if proper patching/versioning system was in place.
Jeff ,
re: multi versions - that is very common scenario in big systems.
re: management - you need to automate the management of the DB, not a particularily hard task, mind you
Oren,
I have repeatedly said that security issue is the main consideration that will appear for almost any system you build and that this is the main point in favor of the separate DBs model.
However, there are ways to mitigate that threat as suggested by others in this thread - so I don't think that's a killer argument.
But, perhaps I'm missing a technical point here, so let's analyze - granted I could automate the creation of a DB per customer. It's slightly harder to maintain and will require to either give a higher set of permissions to the application (a bit of a security issue in itself). Or I could create an external "application" that will be run by the IT staff for every new customer (a maintenance issue). Let's say that up to this point the separate DBs schema is still better because security issues trump my arguments so far.
But, lets say I have several 1000s tenants in my system. How am I going to create a single report/view that can show/manipulate the relevant data for all related tenants?
Not to mention if I have a requirement to have a "managerial" view for every screen in my system that will show "child" tenants to the manager.
Am I missing an obvious solution here? Anything I could think of would be problematic in terms of performance and code maintainability.
In most multi tenant apps, there is no such concept of cross tenant information sharing.
If you do need this, and especially if you need this for reporting, an ETL process for a central DB can provide the information easily.
For sub tenants, that is not an issue that I have run into in the apps that I built, in most cases, tenant apps also container organization hierarchy, which means that a single tenant is also divided into departments, etc.
That is usually the isolation level inside a tenant.
re: DB maintenance.
You usually have a master app that manages that, with the web app being able to request provisioning on new resources from the master app. There are rarely humans involved in the common case.
Note that I think that isolation is what you get from this, security is part of it, but isolation also include a lot more freedom to do per tenant actions
@Eyal
I can see the sense in this. If you do want sharing of data (parent/child being a good example) then is TenantId approach not a good one? And I don't mean reporting here, I mean that there is true information sharing in configuration and in the domain.
Or maybe use a combination of it and the multiple DBs, so that if you need information sharing (e.g. because the tenants are grouped or have parent/child relationships) then they all live in one DB isolated from all other tenants.
@Ayende
Sounds of it you are handling this using an organization hierarchy which seems sensible. However I'm thinking that in some cases these could be considered subtenants with true information sharing? In such a case do you not need to pass around the "OrganizationId" or the "TenantId" anyway (or do in a more OO fashion ofcourse! :))?
Comment preview