Rhino Service BusConcurrency in a distributed world
I have talked about Rhino DHT at length, and the versioning story that exists there. What I haven’t talked about is why I built it. Or, to be rather more exact, the actual use case that I had in mind.
Jason Diamond had pointed out a problem with the way sagas work with Rhino Service Bus.
Are BaristaSaga objects instantiated per message? If so, can two different instances be consuming different messages concurrently?
The reason I ask is because it looks like handling the PrepareDrink message could take some time. Is it possible that a PaymentComplete message could come in before the PrepareDrink message is finished being handled?
If the two instances of BaristaSaga have their own instance of BaristaState, I can see the GotPayment value set by handling the PaymentComplete message getting lost.
If the two instances of BaristaSaga share the same instance of BaristaState, do I now have to worry about synchronizing changes to the state across all of the sagas? Also, wouldn't this prevent having multiple barista "servers" handling messages since they wouldn't be able to share instances across processes/machines.
The answer to that is that yes, a saga can execute concurrently. Not only that, but it can execute concurrently on different machines. That put us in somewhat of a problem regarding consistent state.
There are several options that we can use to resolve the issue. One of them is to ensure that this cannot happen by locking on a shared resource when executing the saga (commonly done by opening a transaction on the saga’s row). That can significantly limit the system scalability. Another option is to persist the saga’s state in a way that ensure that we have no conflicts. One way of doing that is to persist the actual state change itself, which allow us to replay the object to a consistent state. Concurrent updates don’t bother us because we aren’t actually modifying the data.
That might require some careful thinking, however, to avoid a case where a saga tat is concurrently executing step on its own feet without paying attention. I strongly dislike anything that require careful thinking. It is like saying that C++’s has no memory leaks issues, it just require some careful thinking.
For RSB, I wanted to be able to do better than that. I selected Rhino DHT at persistence store for the default saga’s state (you can still do other things, of course). That means that concurrency is very explicit. If you got to a point where there were two concurrently executing instances of the saga, their state is going to go to Rhino DHT. Since they are both going to be from the same version, Rhino DHT is going to keep both state changes around.
The next time that we need the state for that particular saga, we are actually going to get both states. At that point, we introduce the ISagaStateMerger:
1: public interface ISagaStateMerger<TState>2: where TState : IVersionedSagaState3: {
4: TState Merge(TState[] states);
5: }
This allow us to handle the notion of concurrency resolution in a very explicit manner. We get the appropriate state merger from the container and use that to merge the states back to a consistent state, which we then pass to the saga to continue its execution.
There is just one additional twist. A saga cannot complete until it is in a consistent state, so if the saga completes while it is in an inconsistent state, we will call the saga again (after resolving the conflict) and let it handle the final state conflict before perform the actual completion.
More posts in "Rhino Service Bus" series:
- (08 Aug 2009) DHT Saga Sate Persisters Options
- (21 Jan 2009) Concurrency Violations are Business Logic
- (19 Jan 2009) Concurrency in a distributed world
- (16 Jan 2009) Saga and State
- (15 Jan 2009) Field Level Security
- (14 Jan 2009) Understanding a Distributed System
- (14 Jan 2009) The Starbucks example
- (14 Jan 2009) Locality and Independence
- (14 Jan 2009) Managing Timeouts
Comments
You have an indentation problem in this post, where you don't break back out when you are done quoting Jason's point.
thanks, fixed
I just wanted to confirm given the retrying mechanism in RSB and the concurrency issues you've mentioned here, we need to design the consumers to be idempotent through-and-through?
Rishi,
Actually no, failure to process a message would cause the full action to be rolled back.
Would the roll back be per handler or all handlers? And if it is for all handlers, what if one handler did say shoot a message to some other service or write to a file without being part of the transaction, how would the roll-back be applicable?
Rishi,
You cannot send a message to someone without being part of a transaction.
Comment preview