Composite entities
In my previous post, I discussed some of the problems that you run into when you try to have a single source of truth with regard to an entity definition. The question here is: how do we manage something like a Customer across multiple applications / modules?
For the purpose of discussion, I am going to assume that all of the data is either:
- All sitting in the same physical database (common if we are talking about different modules in the same application).
- Spread across multiple databases, with some data being replicated to all databases (common if we are talking about different applications).
We will focus on the customer entity as an example, and we will deal with billing and help desk modules / applications. There are some things that everyone can agree on with regard to the customer. Most often, a customer has an id, which is shared across the entire system, as well as some descriptive details, such as a name.
But even things that you would expect to be easily agreed upon aren’t really that easy. For example, what about contact information? The person handling billing at a customer is usually different from the person that we contact for help desk inquiries. And that is the stuff that we are supposed to agree on. We have much bigger problems when we have to deal with things like a customer’s payment status vs. outstanding help desk calls this month.
The way to resolve this is to forget about trying to shove everything into a single entity. Or, to be rather more exact, we need to forget about trying to think about the Customer entity as a single physical thing. Instead, we are going to have the following:
There are several things to note here:
- There is no inheritance relationship between the different aspects of a customer.
- We don’t give in and try to put what appears to be shared properties (ContactDetails) in the root Customer. Those details have different meaning for each entity.
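As a rough sketch of that shape, each aspect can be a separate type that carries only the shared customer id plus its own details. The class names, field names and sample values below are assumptions made for illustration, not a definitive model:

```python
from dataclasses import dataclass


@dataclass
class ContactDetails:
    name: str
    email: str


@dataclass
class Customer:
    """Root customer: only what every module agrees on."""
    id: str
    name: str


@dataclass
class BillingCustomer:
    """Billing's view. It refers to the root by id; it does not inherit from it."""
    customer_id: str
    contact: ContactDetails  # the person who handles invoices
    payment_status: str


@dataclass
class HelpDeskCustomer:
    """Help desk's view. Its contact means something different from billing's."""
    customer_id: str
    contact: ContactDetails  # the person who files support calls
    outstanding_calls: int


root = Customer(id="customers/1", name="Northwind")
billing = BillingCustomer(root.id, ContactDetails("Ann", "ann@example.com"), "paid")
help_desk = HelpDeskCustomer(root.id, ContactDetails("Bob", "bob@example.com"), 2)
```

The only thing the three objects have in common is the id, so each module is free to reshape its own type.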
There are several ways to handle actually storing this information. If we are using a single database, then we will usually have something like:
The advantage of that is that it makes it very easy to actually look at the entire customer entity for debugging purposes. I say for debugging specifically because, for production usage, there really isn’t anything that needs to look at the entire thing; every part of the system only cares about its own details.
You can easily load the root customer document and your own customer document whenever you need to.
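One way this could look, assuming the module documents use ids that extend the root id (`customers/1`, `customers/1/billing`). The dictionary is a hypothetical stand-in for a real database, and both helper functions are illustrative names:

```python
# Hypothetical in-memory stand-in for a document database, keyed by document id.
documents = {
    "customers/1": {"name": "Northwind"},
    "customers/1/billing": {"payment_status": "paid"},
    "customers/1/help-desk": {"outstanding_calls": 2},
}


def load_for_module(store, customer_id, module):
    """Return (root document, module-specific document) for one customer."""
    root = store[customer_id]
    own = store[f"{customer_id}/{module}"]
    return root, own


def load_everything(store, customer_id):
    """Debugging aid: every document whose id is the root id or extends it."""
    prefix = customer_id + "/"
    return {
        key: doc
        for key, doc in store.items()
        if key == customer_id or key.startswith(prefix)
    }


root, billing = load_for_module(documents, "customers/1", "billing")
```

In production code a module would only ever call something like `load_for_module`; `load_everything` exists for the debugging scenario.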
More to the point, because they are different physical things, that solves a lot of the problems that we had with the shared model.
Versioning is not an issue: if billing needs to make a change, they can just go ahead and change things. They don’t need to talk to anyone, because no one else is touching their data.
Concurrency is not an issue: if you make a concurrent modification to billing and help desk, that is not a problem, because they are stored in two different locations. That is actually what you want, since it is perfectly all right to have those concurrent changes.
It frees us from needing everyone’s acceptance for any change, except for changes to the root document. But as you can probably guess, the amount of information that we put on the root is minimal, precisely to avoid those sorts of situations.
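To make the concurrency point concrete, here is a toy sketch that assumes the store tracks a version number per document (optimistic concurrency). `VersionedStore` and `ConcurrencyError` are hypothetical names, not the API of any particular database:

```python
class ConcurrencyError(Exception):
    pass


class VersionedStore:
    """Toy store in which every document has its own version counter."""

    def __init__(self):
        self._docs = {}  # document id -> (version, data)

    def load(self, doc_id):
        return self._docs.get(doc_id, (0, {}))

    def save(self, doc_id, expected_version, data):
        current_version, _ = self._docs.get(doc_id, (0, {}))
        if current_version != expected_version:
            raise ConcurrencyError(doc_id)
        self._docs[doc_id] = (current_version + 1, data)


store = VersionedStore()
store.save("customers/1/billing", 0, {"payment_status": "paid"})
store.save("customers/1/help-desk", 0, {"outstanding_calls": 2})

# Both modules read at the same time, then both write.
billing_version, billing_doc = store.load("customers/1/billing")
help_version, help_doc = store.load("customers/1/help-desk")

store.save("customers/1/billing", billing_version,
           {**billing_doc, "payment_status": "overdue"})
store.save("customers/1/help-desk", help_version,
           {**help_doc, "outstanding_calls": 3})
```

Both writes succeed because each one checks only the version of its own document. Had the two modules shared a single customer document, the second write would have been rejected as stale.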
This is how we handle things with a shared database, but what is going on when we have multiple applications, with multiple databases?
As you can expect, we are going to have one database which contains all of the definitions of the root Customer (or other entities), and from there we replicate that information to all of the other databases. Why not have them access two databases? Simple: it makes things so much harder. It is easier to have a single database to access and have replication take care of the rest.
What about updates in that scenario? Well, updates to the local part are easy, you just do that, but updates to the root customer details have to be handled differently.
The first thing to ask is whether there really is any need for any of the modules to actually update the root customer details. I can’t see any reason why you would want to do that (billing shouldn’t update the customer name, for example). But even if you have this, the way to handle that is to have a part of the system that is responsible for the root entities database, and have it do the update, from where it will replicate to all of the other databases.
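A minimal sketch of that flow, under the assumption that replication is one-way from the root database to each application database. `RootCustomerService` and `ModuleDatabase` are hypothetical names, and the synchronous loop stands in for whatever replication mechanism is actually in use:

```python
class ModuleDatabase:
    """Hypothetical per-application database."""

    def __init__(self, name):
        self.name = name
        self.root_copies = {}  # replicated root customers, never edited locally
        self.local = {}        # this module's own customer documents


class RootCustomerService:
    """The only component allowed to change root customer details."""

    def __init__(self, replicas):
        self._master = {}
        self._replicas = replicas

    def rename(self, customer_id, new_name):
        self._master[customer_id] = {"name": new_name}
        # One-way replication: root database -> every module database.
        for db in self._replicas:
            db.root_copies[customer_id] = dict(self._master[customer_id])


billing_db = ModuleDatabase("billing")
help_desk_db = ModuleDatabase("help-desk")
service = RootCustomerService([billing_db, help_desk_db])

# A root change goes through the owning service and fans out from there.
service.rename("customers/1", "Northwind Ltd")

# A local change needs no coordination with anyone.
billing_db.local["customers/1/billing"] = {"payment_status": "paid"}
```

Each module still reads from a single database, its own, which holds both its local documents and the replicated root data.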
Comments
These are interesting thoughts, Oren. I agree with most of them, except that I would denormalize the shared information (the customer's name and customer number) into every specialized customer. That way, it is much easier to read the data, because in almost every case I will need the customer's name in conjunction with all other related data. Otherwise, I would always have to query two tables/documents to get all the information that I need to populate a list, a details page or whatever.
Actually I'm already doing things that way, but we don't allow different parts of the system to update the name and customer number on their own. These are pieces of information that change so rarely that it is ok for us to have a DBA run a script for updating them. However, if this was something more common, we would probably have a webservice on the 'master' to do that.
I'm a tiny bit confused here, is the idea that you would have separate tables in the database for each of the customer types? So you would have dbo.Customers, dbo.HelpDeskCustomers, etc? Or did I miss something critical, because that sounds like an awful situation with lots of duplicate data.
@Wayne M, there shouldn't be any duplicate data if the tables mimicked the class design above. dbo.HelpDeskCustomers wouldn't have a Name, that would live on dbo.Customers.
Of course, if they were in different databases or you wanted to avoid the join, you could denormalise it by putting Name on dbo.HelpDeskCustomers and make a trade-off between the time it takes to list help desk customers and the relatively few times that you change a customer's name.
They don’t need to talk to anyone, because no one else is touching their data. (emphasis added)
That's a very important assumption, which unfortunately doesn't hold true in our environment.
Because companies change their name. Mine did - twice in as many years.
If not billing, then who?
What we have done in the past is abstract the data away behind a service. This service could provide for the root data. It could also provide for data by business function too, albeit owners of the root data likely would not support this as they are unfamiliar with it. Root data could be merged with business specific customer data in the specific business application, and caching strategies could be put in place as the system scales out. Likely the root data changing (like Name) doesn't happen on a regular basis and isn't of utter importance in the business application anyway.
There's also the possibility of implementing entities via mixins (as provided by re-mix). An entity can include mixins in order to integrate functionality defined by generic modules, and use-case specific modules can define mixins that extend the entity with use-case specific code.
Here's a longer explanation: https://www.re-motion.org/blogs/mix/2012/03/16/re-mix-the-mixin-based-composition-pattern/ .
For certain scenarios this sounds great... but I get really annoyed when I update my address with helpdesk and the next bill I get still has the old address... I want them to share that information... and of course with this architecture that is possible but much less likely (unless all devs have a good understanding of the business model which is not always the case). In my real life experience (not as a developer but from a customer/consumer point of view), multiple versions of the truth involve multiple phonecalls, multiple forms... and are a real pain when not done right.
I think one version of the truth is more painful for developer/business but better for the consumer... unless the multiple versions are managed very well (so that the 'real' truth is always reflected accurately)... in which case there is development pain regardless. Unless I am missing something (which is entirely possible ;) I don't see an easy solution here... maybe I am looking at this from too much of an rdbms perspective?
I don't think Oren is saying the billing business unit won't change the name, it's just that any changes to the name will go through a customer service. So there's only one way of making that change, in order to manage change propagation. If you have independent systems, most likely a pub/sub notifying of such a change.
A one way change propagation is relatively easy, it's harder to do a two way.
Rowen - In most businesses, to solve the problem of working independently they work entirely independently. So you have help desk maintaining an address, and billing maintaining an address.
Oren is suggesting here a way of managing that common data in a way that doesn't interfere greatly with development team independence.
That being said, many businesses buy a lot of software like billing and helpdesk off the shelf and integrating two third party products together is damn near impossible.
Isn't this a perfect scenario for CQRS? Billing and Helpdesk require different read models. Shared attributes like Customer Name will be propagated to all read models, but an attribute like OutstandingSupportCalls would only be needed for the Helpdesk read model.
Seems like standard composition over inheritance to me. You can still have a "Generic" customer with the domain associated data as a separate entity off your customer.
+1 Jay - YES, CQRS! I think as long as you have one model serving two masters (reads, writes) eventually there will be conflicts, trade-offs, and possibly mud.
How would you correlate between the different facets of a customer? We have a similar approach, and we are using RavenDB. The different facets cannot share Id, so AFAICT we need some kind of object/document to correlate between the facets. Or can this be done using a composite id somehow?
@Steve We don't integrate the products, but spend time integrating the data into a warehouse, which then serves as a single source of truth. Applications that own data can write their data to the warehouse, but other applications can only read it.
This is actually a very hard problem to solve for most big organizations that accumulate people, data and assets in a highly fragmented manner. That's why stuff like Master Data Management even exists: http://msdn.microsoft.com/en-us/library/bb190163.aspx#mdm04_topic3
This doesn't sound like a data modeling problem as much as a figuring out your business domain problem. I can see plenty of situations where either a customer is your only point of contact, or helpdesk personnel does need to change billing information.
How did Daniel's comment come through with a date of 01/17/2012 when this was just posted today on 3/16/2012? ;)
@Matt, Look at the bottom of the article, it was published on 3/16, but it was posted on 1/14. It is not uncommon for some people to have early access to posts. This was probably the case there.
Bpd, Sure, let us take this for example. What is a company name? It is pretty common to have a name that is used for things like marketing, and a separate name for legal reasons. It may simply be the difference between "Microsoft" and "Microsoft, INC", or it can be totally different names. Changing the legal name doesn't imply changing the public name, and vice versa.
Rowen, In our company, we have the main offices in one location, and all of the accounting is handled in a separate location. I actually DO need the billing address to be different from the help desk address. And that is pretty common for many organizations.
Thor, Look at the ids that I have there:
customers/1/billing
customer/1/help-desk
They share the ROOT id, but they have extensions based on their purpose
Fair point... often billing address will be different. If it makes sense in the domain, then I can see composite entities will help. Having thought about it some more, the problems you have to deal with here (data duplication/replication) are probably much less complex than the alternative. Again, some good stuff for me to consider, thanks.
This is a great example and is the exact issue addressed by DDD bounded contexts (BCs). There is one notion of a customer and the identity value is shared, however the model is different in each BC. The problem with BCs in DDD is that they aren't introduced until midway through the blue book and aren't sufficiently emphasized as a top level module mechanism. Furthermore, the pattern is relevant outside of DDD and applies in more data oriented applications just as well.
I am still a bit confused. I get how the connection is made in the database, but how would the access to customer name look in code using an OR/M? How would you use helpDeskCustomer.Name within the domain object without going through some sort of repository?
Is there more information on this, perhaps showing basic crud on the various entities and how they're stored in the db?
Matthew, This is just basically a convention for the naming of ids, nothing else. There is nothing special about it otherwise.
I'd add that this is probably a textbook example of where Role Objects could be useful (http://martinfowler.com/apsupp/roles.pdf)