Ask Ayende: Repository for abstracting multiple data sources?


With regard to my recommendation not to use repositories, Remyvd asks:

… if you have several kind of data sources in different technologies, then it would be nice if you have one kind of interface. Also when an object (like Customer) is combined from data out of different data sources, the repository is for me a good place to initialize the object and return it. How would you solve this cases?

My answer is: System.ArgumentException: Question assumes invalid state.

Put more fully, this is one of those cases where, in order to actually answer the question, we first have to correct it. Why do I say that?

Well, the question assumes that combining the customer entity out of different data stores is actually desirable. Having made that assumption, it proceeds to ask what the best way to do that is. I am not going to recommend a way to do it, because the underlying assumption is wrong.

If your Customer information is stored in multiple data stores, you have to ask yourself: is it actually the same thing in all places? For example, we may have the Customer entity in our main database, the Customer Billing History in the billing database, the Customer Credit Report accessible over a web service, and so on. Note what happens when we start drilling down into the entity design: it suddenly becomes clear that the information is in different data stores for a reason.

“These aren’t the droids you’re looking for” might be a good quote here. The fact that the information is split usually means that there is a reason for it. The information is handled differently, usually by different teams and applications, and it deals with different aspects of the entity.

Trying to abstract that away behind a repository layer loses that very important distinction. It also forces us to do a lot of additional work, because we have to load the customer entity from all of the different data stores every time we need it, even if most of that data is irrelevant to the operation at hand.
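To make that cost concrete, here is a minimal Python sketch of the single-interface repository the question describes. Everything in it is hypothetical: `CountingStore` is an in-memory stand-in for the main database, the billing database, and the credit-report web service, and it counts lookups so we can see how many stores a trivial request ends up touching.

```python
from dataclasses import dataclass, field


class CountingStore:
    """Hypothetical in-memory stand-in for one data store; counts every lookup."""

    def __init__(self, name, records):
        self.name = name
        self.records = records
        self.calls = 0

    def load(self, customer_id):
        self.calls += 1
        return self.records[customer_id]


@dataclass
class Customer:
    id: int
    name: str
    billing_history: list = field(default_factory=list)
    credit_score: int = 0


class CustomerRepository:
    """The single-interface repository from the question: it hides which
    store holds what, so it has to assemble the whole Customer every time."""

    def __init__(self, main_db, billing_db, credit_service):
        self.main_db = main_db
        self.billing_db = billing_db
        self.credit_service = credit_service

    def get(self, customer_id):
        return Customer(
            id=customer_id,
            name=self.main_db.load(customer_id),
            billing_history=self.billing_db.load(customer_id),
            credit_score=self.credit_service.load(customer_id),
        )


main_db = CountingStore("main", {1: "Alice"})
billing_db = CountingStore("billing", {1: [120, 80]})
credit_service = CountingStore("credit", {1: 710})
repo = CustomerRepository(main_db, billing_db, credit_service)

# The caller only wants the customer's name...
greeting = "Hello, " + repo.get(1).name
# ...but every store, including the remote credit service, was queried.
print(greeting, main_db.calls, billing_db.calls, credit_service.calls)  # Hello, Alice 1 1 1
```

Asking for nothing but a name still costs one lookup per store, and because the repository hides where each field comes from, the caller has no way to see or avoid that.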

It would be much easier, simpler, and more maintainable to expose the idea of multiple data stores to the application at large. You don’t end up with a leaky abstraction, and it is easy to see when and how you actually need to combine the different data stores, and what the implications are for the specific scenarios that require it.
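A rough sketch of that alternative, again in Python with hypothetical names (`MainDb`, `BillingDb`, `CreditService`, and the approval thresholds are illustrations, not a real API): each store exposes its own model, and each scenario receives only the stores it needs, so any combination is visible right at the call site.

```python
from dataclasses import dataclass

# Hypothetical models: each belongs to a different store and is a
# different thing, not a slice of one big Customer object.


@dataclass
class Customer:  # main database
    id: int
    name: str


@dataclass
class BillingHistory:  # billing database
    customer_id: int
    invoices: list


@dataclass
class CreditReport:  # external web service
    customer_id: int
    score: int


class MainDb:  # hypothetical in-memory stand-in
    def __init__(self, customers):
        self._customers = customers

    def customer(self, customer_id):
        return self._customers[customer_id]


class BillingDb:  # hypothetical in-memory stand-in
    def __init__(self, histories):
        self._histories = histories

    def history_for(self, customer_id):
        return self._histories[customer_id]


class CreditService:  # hypothetical stand-in for a remote service
    def __init__(self, reports):
        self._reports = reports
        self.calls = 0  # counts (simulated) remote calls

    def report_for(self, customer_id):
        self.calls += 1
        return self._reports[customer_id]


def greet(main_db, customer_id):
    # Needs only the main database, so that is all it receives.
    return "Hello, " + main_db.customer(customer_id).name


def billing_summary(main_db, billing_db, customer_id):
    # Combines two stores, and says so in its signature.
    customer = main_db.customer(customer_id)
    history = billing_db.history_for(customer_id)
    return f"{customer.name}: {len(history.invoices)} invoices, {sum(history.invoices)} total"


def approve_order(main_db, credit, customer_id, amount):
    # The only scenario that pays for the remote credit check.
    # The 650 / 100 thresholds are made up for illustration.
    customer = main_db.customer(customer_id)
    report = credit.report_for(customer_id)
    approved = report.score >= 650 or amount < 100
    return f"{customer.name}: {'approved' if approved else 'declined'}"


main_db = MainDb({1: Customer(1, "Alice")})
billing_db = BillingDb({1: BillingHistory(1, [120, 80])})
credit = CreditService({1: CreditReport(1, 710)})

print(greet(main_db, 1))                        # Hello, Alice
print(billing_summary(main_db, billing_db, 1))  # Alice: 2 invoices, 200 total
print(approve_order(main_db, credit, 1, 500))   # Alice: approved
print(credit.calls)                             # 1
```

Because `greet` never receives the credit service, it cannot trigger a remote call by accident; the parameter list of each function documents exactly which stores that scenario combines.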

More posts in the "Ask Ayende" series:

  1. (28 Feb 2012) Aggregates and repositories
  2. (31 Jan 2012) What about the QA env?
  3. (25 Jan 2012) Handling filtering
  4. (19 Jan 2012) Life without repositories, are they worth living?
  5. (17 Jan 2012) Repository for abstracting multiple data sources?