Ayende @ Rahien

Refunds available at head office

Application databases and external integration points

Dave has an interesting requirement in his project:

We're not in control of where the data is located, how it's stored and in what configuration. In most cases employees need to be retrieved from an Active Directory (there is no 'login'; the Windows identity determines what a user can or can't do). Customer contacts are usually handled by the helpdesk department and each contact moment is logged in a helpdesk database. The customer (account information) itself often needs to be retrieved from an IBM DB2 database.

What you have is not one application that needs to access different data sources. That would be the wrong way to think about this, because it introduces a whole lot of complexity into the application.

image

It is much better to structure the application as an independent application with each integration point made explicit. Instead of touching the DB2 database directly, you put a service in front of it and access that.

image

This isn’t just “oh, SOA nonsense again”; it is an important distinction. When you tie yourself directly to so many external integration points, you also ensure that whenever one of them changes, you are going to be impacted. When you put a service boundary between you and the integration point (even if you have to build the service yourself), the effect is much less noticeable.
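To make the boundary concrete, here is a minimal sketch in Python. Every name in it (`Customer`, `CustomerService`, `LegacyCustomerService`, the `CUST_NO` / `CUST_NM` columns) is made up for illustration and is not taken from Dave's project. The in-memory dict stands in for whatever the real service would query.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Customer:
    """The application's own view of a customer (hypothetical fields)."""
    customer_id: str
    name: str


class CustomerService(Protocol):
    """The explicit integration point: the only thing the application sees."""

    def get_customer(self, customer_id: str) -> Customer: ...


class LegacyCustomerService:
    """Stand-in for a service sitting in front of the external database.

    The dict of raw rows simulates the external schema. A real service
    would query the database or be reached over the network.
    """

    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self._rows = rows

    def get_customer(self, customer_id: str) -> Customer:
        row = self._rows[customer_id]
        # Knowledge of the external column names lives here,
        # not in the application.
        return Customer(customer_id=row["CUST_NO"], name=row["CUST_NM"].strip())


def greeting(service: CustomerService, customer_id: str) -> str:
    """Application code: depends only on the CustomerService contract."""
    return f"Hello, {service.get_customer(customer_id).name}"
```

If the column names or the storage engine change, only `LegacyCustomerService` has to follow; `greeting` and the rest of the application keep talking to the same contract.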

Also, did you notice the blue lines going from the databases? Those are background ETL processes, replicating data to/from the databases. They allow us to handle situations where the integration points are not available.
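As a rough illustration of why the replicated copy helps, the sketch below keeps a local copy that a background job refreshes. `LocalReplica` and the `extract` callable are hypothetical names, and treating `ConnectionError` as "the source is unavailable" is an assumption; a real ETL process would be considerably more involved.

```python
from typing import Callable, Dict, Optional


class LocalReplica:
    """A local copy of remote data, refreshed by a periodic ETL pass."""

    def __init__(self, extract: Callable[[], Dict[str, str]]) -> None:
        self._extract = extract          # pulls a snapshot from the source
        self._data: Dict[str, str] = {}  # what the application reads from

    def refresh(self) -> bool:
        """Run one ETL pass; keep the previous copy if the source is down."""
        try:
            snapshot = self._extract()
        except ConnectionError:
            return False
        self._data = dict(snapshot)
        return True

    def get(self, key: str) -> Optional[str]:
        """Reads are served locally and never touch the remote source."""
        return self._data.get(key)
```

A scheduler would call `refresh()` every so often. When the source is down, the refresh fails quietly and the application keeps reading the last good copy, at the cost of possibly stale data.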

In short, design your application so it doesn’t stick its nose into other people’s databases. If you need data from another database, put a service there, or replicate it. You’ll thank me when your app stays up.

Comments

08/20/2010 11:47 AM by Antonio Carlos Zegunis Filho

I fully agree with the idea of building services to expose data to whoever needs it in order to provide a protective abstraction.

However, I'm not comfortable at all with your suggestion to replicate the databases if services are not possible. I do some consultancy work for a company that thought that this (replication) would be the best idea.

What's the problem? Almost every database has 4 or 5 replicas on different servers, which means that it is now impossible to change the structure of any of the source databases.

The problem gets worse when someone replicates a database and adds more objects to it. As we go, things just get more complicated and messier.

Services are definitely the way to go!

08/20/2010 12:57 PM by scooletz

I'd add that these ETL processes can easily be created with a pub/sub solution, like NServiceBus.

08/20/2010 01:12 PM by Louis Haußknecht

@scooletz How would you use pub/sub for ETL??

For this to work, the database has to publish messages.

I think a (periodic) ETL e.g. with Rhino-ETL from the read-only sources would fit best.

08/20/2010 07:48 PM by scooletz

@Louis Haußknecht

What I meant was creating workers that read the DB and publish the data. Rhino-ETL, if only because of its name, should be able to do this work ;-)

08/22/2010 08:51 PM by Simone

As usual, I think it depends a lot on context. I specifically disagree with the statement "You’ll thank me when your app stays up". If the external sources on which you are relying are critical for your company, having your application down is the least of your problems when the data source goes down.

Also, what exactly do you mean by putting a service in between? If the data source changes, something has to change nonetheless. If you're talking about deployment boundaries then yes, I agree.

08/23/2010 09:48 AM by Ayende Rahien

@Simone,

Actually, having a way to stop propagating failures is crucial for systems to stay alive. Otherwise, a failure in system A can bring down system B, which brings down systems C, D, etc.

As for what happens when the data source changes, that is a much smaller problem, because the service remains the same. Usually the service is maintained by the same team that maintains the data source.

But even if you maintain it, you have to work on a much smaller scope.

08/23/2010 10:15 AM by Ayende Rahien

Antonio,

I agree that replication is bad in the sense that it ties you to the database schema, but sometimes it is the best you can do.

If the database owner won't give you a service on top of it, and insists on you talking to the DB directly, there isn't much you can do. And you are already coupled to the database anyway.

08/24/2010 10:24 AM by David

Ok, I understand the above, and agree with this post.

My question, which sits on top of this, is where you put your ESB. Would you have the following:

For request/reply (get employee holiday), this would be a direct WCF connection between the App and the Employee Service.

For pub/sub (added new customer), use the ESB to publish this information/event; the App could then subscribe to this message.

Also, would you need the ETL? If something did change, that would be published, and the interested systems would update themselves. Or is the ETL contextual to this issue, where you may not have a service?

08/25/2010 01:09 AM by Kyle

So what technology would you use to talk from the App Server to the HelpDesk Service? WCF?

08/25/2010 03:37 AM by Ayende Rahien

@David & @Kyle,

Don't assume WCF / request-response between the two.

Another option would be to hold a local copy of the data, kept up to date with replication.

Also known as caching :-)

ETL is just a case of getting data from one point to another. I would rather have an ETL process run periodically than make remote calls.
