SOA Data Access
Yesterday I posted about designing an SOA system. I identified the following service:
Available candidates - given a set of requirements, will attempt to find all verified candidates that match the requirements. Additional responsibilities include logging search history (important for noticing which requirements are common and desirable), and recording which requirements have no matching physicians in the system (these should be passed on for human inspection, to decide whether head hunting should begin or the gap should be ignored).
Bill Poole left a comment on this post, which included this statement:
For example, the "Available Candidates" service appears highly data centric, likely exposing search/data retrieval operations. CRUD interfaces are bad.
This is something that I have often heard, but I still don't understand it. I understand why CRUD interfaces are discouraged: you get better results by just using a DB. But in this case?
How do you get the data out of an SOA system? It seems to me like this is a natural requirement for the problem at hand, but I feel that I am missing something.
Comments
No, Oren, you're not missing something. CRUD operations are not bad in any outright sense.
If, as Bill might have implied, we were to ban all CRUD interface (members) we would either have just renamed them to something that doesn't sound like CRUD or we'd have a gaping hole in our applications through which a high percentage of user stories would fall to their deaths. That's just how dependent on CRUD our applications generally are. Here, I'll say it explicitly: most applications are mostly CRUD operations. ;)
Now, having said that, exposing raw CRUD when there is a higher-level domain-centric operation that can be captured and exposed, now that is bad. Raw CRUD where raw CRUD is most appropriate is, well, most appropriate.
We need to distinguish here between a business service and an application. A business service is an autonomous coarse grained unit of logic that performs a specific business function and communicates with other services by way of exchanging explicitly defined messages.
An application is a piece of software leveraged by a user (via a user interface) to perform a task of value to the user. A service may contain zero or more applications - zero if the service is fully automated. An application may span many services (in the case of a composite application).
The user interface tier of an application will very likely fire CRUD messages at the application tier - but the communication between the user interface tier and application server is an implementation detail of a business service. Both the UI and application server sit behind the service boundary.
You can read more about defining service boundaries here:
http://bill-poole.blogspot.com/2008/02/defining-service-boundary.html
What we want to avoid is CRUD interfaces exposed at the public service boundary. This is achievable by decentralising your data using publish-subscribe, which you can read more about here:
http://bill-poole.blogspot.com/2008/02/centralised-vs-decentralised-data.html
I describe the specific reasons why we want to avoid CRUD interfaces at the service boundary here:
http://bill-poole.blogspot.com/2008/02/crud-is-bad.html
Sorry, I forgot to answer your actual question as to how one gets data out of a service. The answer to that is we don't want to. All business logic performed on data held by a service must be performed by the service itself, otherwise we break encapsulation.
We do not retrieve data from a service, perform some logic elsewhere and then fire an update back. If our service contract permits this, then we will have the potential for business logic to leak outside the service boundary.
When we have our data decentralised, every service holds all the data it needs to perform its function locally - so there is no longer any need to go and pick up data from elsewhere.
You don't define a service boundary/API based on the CRUD it handles; you define its boundary based on meaningful business operations.
It's an extension of the logic from the "Tell, Don't Ask" principle. It greatly reduces coupling, helps with versioning, and prevents a lot of transaction-related messes from occurring.
A very poor CRUD service:
OrderDto GetOrder(xx);
void UpdateOrder(OrderDto);
A much better service (business operations):
void AddLineItems(int orderId, LineItemInfo[] lineitems);
void Settle(PaymentInfoDto dto);
void ConvertCartToOrder(long cartId);
You can think of it this way: the service boundary is much more similar to that of a Service Layer API than to a CRUD-based DAL API. CRUD-based boundaries require a transaction to flow across them, and they force a tight conceptual coupling to the consuming layers.
Build Business Facades--not Data DALs.
And make sure to decompose for change--not functionally. Functional decomposition is the antithesis of good design.
And, for the record, the methods I threw out above aren't exactly my preferred way of doing things. They are very statically typed things. I much prefer a layer of indirection where there is no method/type level coupling. Each service is only coupled to the actual message being sent, not the type and method that processes the message (a layer of indirection above what WCF exposes).
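To make that indirection concrete, here is a minimal Python sketch (not WCF, and not necessarily how Evan would build it). `Dispatcher`, `AddLineItems` and `OrderService` are names assumed purely for illustration. The sender depends only on the message; which method ends up processing it is decided by registration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddLineItems:
    """Hypothetical message: the only thing a sender is coupled to."""
    order_id: int
    items: tuple


class Dispatcher:
    """Routes a message to whichever handler is registered for its type."""

    def __init__(self):
        self._handlers = {}

    def register(self, message_type, handler):
        self._handlers[message_type] = handler

    def send(self, message):
        handler = self._handlers.get(type(message))
        if handler is None:
            raise LookupError(f"no handler for {type(message).__name__}")
        handler(message)


class OrderService:
    """Owns the order data; senders never reference its methods."""

    def __init__(self):
        self.orders = {}

    def handle_add_line_items(self, msg):
        self.orders.setdefault(msg.order_id, []).extend(msg.items)


dispatcher = Dispatcher()
orders = OrderService()
dispatcher.register(AddLineItems, orders.handle_add_line_items)

# The caller knows the message and the dispatcher, nothing else.
dispatcher.send(AddLineItems(order_id=7, items=("widget", "gadget")))
```

Renaming or splitting `handle_add_line_items` changes only the registration line; code that sends `AddLineItems` is untouched.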
Evan,
I need to get the data out of a service. How is that done?
To me, searching the available candidates is a business scenario. What I hear is that because I am returning data, this is a bad idea.
Evan is correct. It is an extension of the "Tell, Don't Ask" principle. In fact with SOA, it should be more like "Inform, Don't Ask", because command messages introduce coupling between services:
http://bill-poole.blogspot.com/2008/04/avoid-command-messages.html
Anyway, when you have a decentralised data model among your services, all the services that need the available candidates data will have it in their respective local databases. They just search it locally based on their own local schema / domain model based on their own local specific needs / requirements.
They do not need to go to another service to retrieve any data. Thus, there is no need for the CRUD interface.
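A rough Python sketch of what that could look like, assuming a toy in-memory bus. The `CandidateVerified` event, the `SchedulingService` consumer and their fields are all invented for illustration; a real system would use a durable message bus and a real local database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateVerified:
    """Hypothetical event published by the service that verifies candidates."""
    candidate_id: int
    name: str
    specialties: frozenset


class Bus:
    """Toy in-memory publish-subscribe bus (synchronous, single process)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class SchedulingService:
    """A consumer that keeps the candidate fields it needs in its own store."""

    def __init__(self, bus):
        self._candidates = {}  # local copy, in this service's own schema
        bus.subscribe(CandidateVerified, self._on_candidate_verified)

    def _on_candidate_verified(self, event):
        self._candidates[event.candidate_id] = (event.name, event.specialties)

    def find_by_specialty(self, specialty):
        # Searches the local copy; no other service is called.
        return sorted(
            name for name, specs in self._candidates.values() if specialty in specs
        )


bus = Bus()
scheduling = SchedulingService(bus)
bus.publish(CandidateVerified(1, "Dr. A", frozenset({"cardiology"})))
bus.publish(CandidateVerified(2, "Dr. B", frozenset({"cardiology", "surgery"})))
bus.publish(CandidateVerified(3, "Dr. C", frozenset({"pediatrics"})))
matches = scheduling.find_by_specialty("cardiology")
```

The trade-off is visible even in the toy version: the search is local and fast, but the data is duplicated and only as fresh as the last event received.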
OK, so I have an application that lets a user edit a schedule of calendar items. How do I display the calendar items? I have to read them from somewhere. According to everyone here, "GetCalendarItems" is a bad practice. So what am I supposed to do to please the SOA purists?
@Bill Poole
In other words, the Order Processing system should have all the inventory data from the Inventory System (processing thousands of inventory update events) just to be able to verify quantity on hand?
El Guapo - Where you have a client application interfacing with some back end system, "GetCalendarItems" is absolutely acceptable practice as the application and the back end both sit behind the service boundary. We just don't want CRUD interfaces exposed at the service boundary to be consumed by other services.
Alex Simkin - This is a good question. And to this one, the answer is "it depends". With an inventory system, we may not be able to tolerate delays between publishing event messages and their receipt by the order processing service because we run the risk of overselling (getting negative inventory).
In this circumstance, we may opt for the order processing service to publish an "Order Completed" event, to which the inventory service (and probably billing service) is subscribed. It would then attempt to allocate all the stock necessary to fill the order, after which it would publish a "Stock Allocated" event, to which the order processing service (and probably shipping service) would be subscribed.
If there was insufficient stock, depending on the business rules the system may attempt to allocate whatever stock is available and publish the event anyway. The order processing service would then know that only some items had been allocated.
Later, when new inventory arrived, the inventory service could then allocate to any existing outstanding orders and publish new "Stock Allocated" events. The order processing service and shipping service would then receive this notification and finalise the outstanding orders.
This is just one way in which we may go about this. The actual solution depends on the actual business requirements. The point is that we were able to achieve the desired outcome without the use of CRUD interfaces.
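The flow above can be sketched in Python roughly as follows. This is one reading of it under simplifying assumptions: a synchronous in-memory bus, single-SKU orders, and no event published when nothing at all can be allocated. The event names come from the description; the class and method names are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderCompleted:
    order_id: int
    sku: str
    quantity: int


@dataclass(frozen=True)
class StockAllocated:
    order_id: int
    sku: str
    quantity: int


class Bus:
    """Toy in-memory publish-subscribe bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class InventoryService:
    def __init__(self, bus, stock):
        self._bus = bus
        self._stock = dict(stock)
        self._outstanding = []  # [order_id, sku, quantity still owed]
        bus.subscribe(OrderCompleted, self._on_order_completed)

    def _on_order_completed(self, event):
        self._outstanding.append([event.order_id, event.sku, event.quantity])
        self._allocate()

    def receive_stock(self, sku, quantity):
        self._stock[sku] = self._stock.get(sku, 0) + quantity
        self._allocate()

    def _allocate(self):
        # Give each outstanding order whatever is available, oldest first.
        for entry in self._outstanding:
            order_id, sku, owed = entry
            granted = min(owed, self._stock.get(sku, 0))
            if granted > 0:
                self._stock[sku] -= granted
                entry[2] -= granted
                self._bus.publish(StockAllocated(order_id, sku, granted))
        self._outstanding = [e for e in self._outstanding if e[2] > 0]


class OrderProcessingService:
    def __init__(self, bus):
        self._bus = bus
        self._ordered = {}
        self.allocated = defaultdict(int)
        bus.subscribe(StockAllocated, self._on_stock_allocated)

    def complete_order(self, order_id, sku, quantity):
        self._ordered[order_id] = quantity
        self._bus.publish(OrderCompleted(order_id, sku, quantity))

    def _on_stock_allocated(self, event):
        self.allocated[event.order_id] += event.quantity

    def is_fully_allocated(self, order_id):
        return self.allocated[order_id] >= self._ordered[order_id]


bus = Bus()
inventory = InventoryService(bus, {"dvd": 3})
order_processing = OrderProcessingService(bus)

order_processing.complete_order(1, "dvd", 5)      # only 3 in stock
before_restock = order_processing.is_fully_allocated(1)
inventory.receive_stock("dvd", 10)                # remaining 2 get allocated
after_restock = order_processing.is_fully_allocated(1)
```

Note that the order processing service never reads a quantity on hand; it only learns what was allocated to its orders.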
@Bill Poole
Ok. So what would be the scenario if Inventory System doesn't know how to allocate stock to the order - the allocation algorithm is external to the system. Example: we need to send DVD's as give-aways to some event. We need 10% cartoons, 10% chick-flicks, etc. Inventory system has no notion of genre.
@El Guapo: Well, you would split it up into two events. The UI would fire a NeedCalendarEvent, and the service subscribing to that event would reply with a CalendarReadEvent. Then every time the calendar is updated, CalendarItemAdded/Removed etc. would be generated and the UI would respond accordingly.
Sounds great right? Yeah, until you have more than one service responding to I Need a Calendar. I can tell you from first hand experience that this is a BAD model, especially when dealing with inventory, as Alex mentions.
The company I work for uses an INFOR product (written in ProvideX) for warehouse management. It consists of a series of modules, such as accounts payable, accounts receivable, manufacturing control, inventory, sales, etc. Within each module there is a separate program for each function, i.e. AP has entering a bill, writing a check against a bill, writing a check with no bill, voiding a check, and so on.
The programs have the same philosophy as Bill suggested: each program maintains its own localized data silo. The item/warehouse inquiry only needs to know about the stocking unit of measure, whereas sales orders need to know about all of them. The data is stored in ProvideX, but there is an ODBC driver you can put on top of it.
The result is an ungodly mess. Data is quintuplicated all over the place. I would like to be able to read/write to these tables directly using NHibernate, but since things like 'quantity on sales order' are in 5 or 6 different places it would be impossible to be sure I got them all. Even the program gets some data out of sync with itself.
As for long term cost, the software vendors charge a lot of money to just do a .1 upgrade, because someone has to spend five hours writing a script that updates all the data files to conform to the new data spec.
I think a better approach is a hybrid system. The services should not be coupled to one another, but they should all be coupled to the domain model. Say the tax location you are in says that unlike before, all shipping and handling is subject to sales tax. Is it easier to change the business logic in many different services, or to change the total calculation within your one domain object?
Such a system is going to require some sort of CRUD architecture. I can see the problem of coupling your service to a CRUD interface where there is a separate method for every kind of query / domain object, such as GetCustomers(), InsertCustomer, UpdateCustomer. I don't see the problem of coupling your services to a GENERIC data access service, like Repository of T, using LSD - LINQ, Save, Delete.
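For readers who haven't seen the pattern, here is a small Python sketch of the "Repository of T" idea. It is an in-memory stand-in, not NHibernate or LINQ: `query` takes a plain predicate where LINQ would take an expression, and `Customer` and `key_of` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Hashable, Iterator, TypeVar

T = TypeVar("T")


class Repository(Generic[T]):
    """One generic data-access contract instead of a method per entity."""

    def __init__(self, key_of: Callable[[T], Hashable]):
        self._key_of = key_of
        self._items: Dict[Hashable, T] = {}

    def query(self, predicate: Callable[[T], bool] = lambda _: True) -> Iterator[T]:
        # Stand-in for a LINQ query: the caller supplies the filter.
        return (item for item in self._items.values() if predicate(item))

    def save(self, item: T) -> None:
        # Insert or update, keyed by the entity's identity.
        self._items[self._key_of(item)] = item

    def delete(self, item: T) -> None:
        self._items.pop(self._key_of(item), None)


@dataclass
class Customer:
    id: int
    name: str
    city: str


customers = Repository[Customer](key_of=lambda c: c.id)
customers.save(Customer(1, "Ada", "London"))
customers.save(Customer(2, "Linus", "Helsinki"))
customers.save(Customer(1, "Ada", "Cambridge"))  # same key, so this is an update

in_cambridge = [c.name for c in customers.query(lambda c: c.city == "Cambridge")]
customers.delete(Customer(2, "Linus", "Helsinki"))
remaining = [c.id for c in customers.query()]
```

The contract has three operations regardless of how many entity types exist, which is the appeal; whether that contract belongs at a service boundary is exactly what this thread is arguing about.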
@El Guapo
I find your derogatory use of the word "purist" to be offensive..if you want to just build a bunch of webservices, go for it.
@Evan
Settle down, Frances
João Bragança,
Okay, please explain to me the difference between:
and GetCalendarItems?
Why not use a request/reply model for this?
@Ayende:
I was totally being sarcastic there. I believe such a scenario REQUIRES the request / reply model, because if an arbitrary number of services can reply to a 'UI needs something event' you are going to end up with a mess. Sorry I didn't make that more clear in my post.
@João
So are you saying that it's a broken paradigm or a broken implementation?
I have a question. Let's say we need to re-order some products if they run out of stock. We can:
Fire a CreateReplenishmentOrder event and subscribe Order Management to it. In that case, if we add other systems requiring order creation, we will need to modify Order Management to be able to subscribe to other events.
Make Order Management understand a CreateOrder command and configure other systems to issue this command if they need an order to be created and go through the approval chain. In that case we do not need to change Order Management every time other systems change.
I am saying that it is a broken paradigm to be 100% event based, when things like user interfaces are involved. Can you make a monorail application work with a purely event based model?
Keep in mind that I think a pure request / reply system is a mistake too.
A few years ago eBay recognized this and implemented platform notifications. Pulling down the entire My eBay for a seller just to synchronize with the local database, which the UI then read from, is incredibly wasteful (which is what their sample SDK application did). But at the same time you couldn't base a UI solely on these events (as the desktop computer could be off, the machine serving up the web pages could crash, etc).
@João
As I was saying before, user interfaces sit BEHIND the service boundary. They are NOT an SOA concern. There is a full explanation of this concept here:
http://bill-poole.blogspot.com/2008/03/services-and-user-interfaces.html
You really shouldn't be firing events from your UI. The UI should interface with a back end system which can occur using synchronous request-reply command messages. This is occurring behind the service boundary. The service back end then executes some logic based on the request from the UI, and then may potentially publish one or more events onto the service bus.
Those events are what are exposed in the service contract at the service boundary. Those events are not being used as communication between the UI and service back end, but rather between different services in the enterprise.
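A small Python sketch of that split, under stated assumptions: `CalendarBackEnd` and the toy `Bus` are invented names, and `CalendarItemAdded` is borrowed from the earlier comment. The UI talks to the back end with plain synchronous calls; only the published event is visible to other services.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CalendarItemAdded:
    """Event exposed in the service contract, for other services."""
    item_id: int
    title: str


class Bus:
    """Toy in-memory publish-subscribe bus standing in for the service bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class CalendarBackEnd:
    """Application tier; sits behind the service boundary together with the UI."""

    def __init__(self, bus):
        self._bus = bus
        self._items = {}

    def get_calendar_items(self):
        # Synchronous request-reply for the UI. Not part of the service contract.
        return [self._items[item_id] for item_id in sorted(self._items)]

    def add_calendar_item(self, title):
        # Command from the UI; the resulting event goes out on the bus.
        item_id = len(self._items) + 1
        self._items[item_id] = title
        self._bus.publish(CalendarItemAdded(item_id, title))
        return item_id


bus = Bus()
back_end = CalendarBackEnd(bus)

# Some other service in the enterprise listens for the event.
seen_by_other_service = []
bus.subscribe(CalendarItemAdded, seen_by_other_service.append)

# The UI uses plain calls and never touches the bus.
new_id = back_end.add_calendar_item("Stand-up")
shown_in_ui = back_end.get_calendar_items()
```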
@Alex Simkin
Event messages are semantically notifications, not commands. So CreateReplenishmentOrder is a command, not an event. OrderReplenished for instance would be an event message to be published.
We do not want other services to send the Order service a CreateOrder command. Rather we want to subscribe the Order service to the relevant events published by other services to which the Order service must respond in some way - for instance creating an order.
The decision to create the order is that of the Order service, not that of any other service. This is an important but subtle distinction that is described in detail here:
http://bill-poole.blogspot.com/2008/04/avoid-command-messages.html
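The distinction can be sketched in Python like this. `StockRanOut` is a hypothetical event name (a past-tense notification, not an instruction), and the "skip discontinued products" rule is invented only to show where the decision lives.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StockRanOut:
    """Hypothetical event: states a fact, asks nobody to do anything."""
    sku: str


class Bus:
    """Toy in-memory publish-subscribe bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class OrderService:
    def __init__(self, bus, discontinued):
        self._discontinued = set(discontinued)
        self.orders = []
        bus.subscribe(StockRanOut, self._on_stock_ran_out)

    def _on_stock_ran_out(self, event):
        # The decision to create an order is made here, not by the publisher.
        if event.sku not in self._discontinued:
            self.orders.append(event.sku)


bus = Bus()
order_service = OrderService(bus, discontinued={"vhs"})

# The publisher only announces what happened.
bus.publish(StockRanOut("dvd"))
bus.publish(StockRanOut("vhs"))
```

With a CreateOrder command the publisher would have to know that "vhs" should not be reordered; with the event, that rule stays inside the Order service.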
@Bill Poole
Hi, so you assume an SOA system could be built purely in an "async" request/reply manner. I'm thinking of even the simplest applications that need some service layer, maybe a WCF layer. There, using just WCF over HTTP, where it's reasonable to use Duplex contracts, this kind of approach is not useful.
What do you suggest? I think in a situation like this it's more reasonable to expose a CRUD-like interface. If not, how can you decouple them?
@Bill Poole
"The decision to create the order is that of the Order service, not that of any other service."
Exactly the opposite. The decision to create an order is that of the Inventory System (to re-order) and the Sales Force (to ship). The Order Service is responsible for creating the order on request, running the approval workflow, sending the approved order for execution where it belongs, notifying Finance, etc.
In my DVD example we need to run ad hoc queries against another service's data store. How do we do that with events?
@Alex Simkin
With regards to (1), the Inventory service should be concerned only with inventory related matters. If it instructs the Order service to create an order, then it is concerning itself with part of the ordering business process.
One way of looking at it is that if you have a department concerned with ordering and another concerned with inventory in your organisation, you will find that the decision to re-order will lie with the ordering department, not the inventory department.
The inventory department will let the ordering department know when new stock has arrived and the ordering department would take it from there. The inventory department would very likely not instruct a worker in the ordering department to create a new order.
We mirror this business concept with events in SOA. It provides for a more loosely coupled design.
With regards to (2), you can run ad hoc queries against any database you need to. But a service should not run queries against another service. If a service requires data to do something, it should have that data locally.
However, you need to make sure that the right logic is put into the right service. In the example I provided, instead of the Order service doing inventory checks (and thus needing access to inventory data), we identified that as a responsibility of the Inventory service. That way, the Order service didn't need the inventory data locally.
@Bill Poole
"...the decision to re-order will lie with the ordering department".
No. All ordering decisions are those of other departments. The ordering department can tell you that you are over your budget or violating export restrictions. They also know how to get permissions, insurance, and special transport. The only thing they have no idea about is what chemicals our R&D will need next month or what type of computers and software our IT will need.
"you can run ad hoc queries against any database you need to"
except that you do not know that database's schema, or maybe it is not a database at all and the service is just a facade for other services.
@Bill
In our business the inventory system has all the information necessary to calculate the EOQ such as usage. We then periodically run a report that generates suggested orders / warehouse transfers.
However I can see the advantages of your approach. A sale is not the only way usage can occur. So the inventory system would be a subscriber to warehouse transfers, sales, manufacturing. Which would in turn generate a usage event which purchase orders subscribes to.
Still, I have a hard time seeing that it is not the inventory service that is saying 'I need this inventory in that warehouse.' The ordering department is simply communicating to the vendor / other warehouses what the inventory department needs, and then communicating back what can be delivered.
Also, how do you address the chattiness? I've had it pounded into my brain from day one that chunkiness was better than chattiness. Can you address that in a future blog post?
@João
I'm not an expert in this business area, but based on my limited understanding I would suggest that warehouse transfers are a direct concern of the Inventory service, and as such would not need to subscribe to warehouse transfer events. The Inventory service would just handle that internally.
You would not have one Inventory service per warehouse. There would be one Inventory service, potentially with components in each warehouse. From a business perspective, there is one inventory function, not one for each warehouse.
You may leverage publish-subscribe to keep databases between warehouses synchronised, but that would be messaging internal to your inventory service, and not the concern of any other service in your enterprise.
Coarse grained messages are indeed better than fine grained messages. Chattiness is bad. As long as the message semantics have business level relevance, then that should mean your granularity is correct.
In order to comment more definitively on your solution, I'd need to know more about how your organisation works and the business problem you are attempting to solve.
@Alex Simkin
"All ordering decisions are of other departments"
This sounds more like a Procurement service than an Ordering service. I was speaking in terms of an Ordering service being responsible for taking orders from your customers. For internal orders, this will very likely be a different business process.
If all ordering decisions are those of other departments in your organisation, then you may consider sending a command message to the Ordering/Procurement service. But you should be aware of the potential pitfalls and limitations of this approach to make sure it is appropriate:
http://bill-poole.blogspot.com/2008/04/soa-and-reuse.html
Either way though, you still haven't exposed a CRUD interface at your service boundary, which is what we really must avoid.
"except that you do not know that database schema or maybe it is not database at all and service is just a facade for other services"
In my experience, you always will know the database schema you are querying against. There are many BI tools out there to make this easy and self-service for management.
As far as a service being a facade for other services for the purpose of querying, you should consider using an entity aggregation service based on publish-subscribe. Udi Dahan wrote a good journal article on this that you can find here:
http://msdn2.microsoft.com/en-us/library/bb245672.aspx
this is...
by far...
the most...
^#$% ..khm... "annoying" blog comment series i've ever encountered!
That's what happens when there is "Design before Technology". We get middle-classers exiting casually onto the moon's surface wareing nothing but business suits, Jules Verne style.
bah, i shouldn't have commented at all. sorry, Aye.
p.s.: no, it is not a typo
@Jason
Stripes! classic...