Ayende @ Rahien

It's a girl

Rhino Service Bus: Concurrency Violations are Business Logic

Concurrency is a tough topic, fraught with problems, pitfalls and nasty issues. This is especially the case when you try to build distributed, inherently parallel systems. I am dealing with the topic quite a lot recently and I have create several solutions (none of them are originally mine, mind you).

There aren’t that many good solutions our there, most of them boil down to: “suck it up and deal with the complexity.” In this case, I want to try to deal with the complexity in a consistent fashion ( no one off solutions ) and in a way that I can deal without first meditating on the import of socks.

Let us see if I can come up with a good example. We have a saga that we use to check whatever a particular user has acceptable credit to buy something from us. The logic is that we need to verify with at least 2 credit card bureaus, and the average must be over 700. (This logic has nothing to do with the real world, since I just dreamt it up, by the way). Here is a simple implementation of a saga that can deal with those requirements:

   1: public class AccpetableCreditSaga : ISaga<AccpetableCreditState>,
   2:   InitiatedBy<IsAcceptableAsCustomer>,
   3:   Orchestrates<CreditCardScore>, 
   4:   Orchestrates<MergeSagaState>
   5: {
   6:   IServiceBus bus;
   7:   public bool IsCompleted {get;set;}
   8:   public Guid Id {get;set;}
   9:   
  10:   public AccpetableCreditSaga (IServiceBus bus)
  11:   {
  12:     this.bus = bus;
  13:   }
  14:   
  15:   public void Consume(IsAcceptableAsCustomer message)
  16:   {
  17:     bus.Send(
  18:       new Equifax.CheckCreditFor{Card = message.Card),
  19:       new Experian.CheckCreditFor{Card = message.Card),
  20:       new TransUnion.CheckCreditFor{Card = message.Card)
  21:       );
  22:   }
  23:   
  24:   public void Consume(CreditCardScore message)
  25:   {
  26:     State.Scores.Add(message);
  27:     
  28:     TryCompleteSaga();
  29:   }
  30:   
  31:   public void Consume(MergeSagaState message)
  32:   {
  33:     TryCompleteSaga();
  34:   }
  35:   
  36:   public void TryCompleteSaga()
  37:   {
  38:     if(State.Scores.Count <2)
  39:       return;
  40:      
  41:      bus.Publish(new CreditScoreAcceptable
  42:      {
  43:       CorrelationId = Id,
  44:       IsAcceptable = State.Scores.Average(x=>x.Score) > 700
  45:      });
  46:      IsCompleted = true;
  47:   }
  48: }

We have this strange MergeSagaState message, but other than that, it should be pretty obvious what is going on in here.It should be equally obvious that we have a serious problem here. Let us say that we get two reply messages with credit card scores, at the same time. We will create two instances of the saga that will run in parallel, each of them getting a copy of the saga’s state. But, the end result is that processing those messages doesn’t match the end condition for the saga. So even though in practice we have gotten all the messages we need, because we handled them in parallel, we had no chance to actually see both changes at the same time. This means that any logic that we have that requires us to have a full picture of what is going on isn’t going to work.

Rhino Service Bus solve the issue by putting the saga’s state into Rhino DHT. This means that a single saga may have several states at the same time. Merging them together is also something that the bus will take care off. Merging the different parts is inherently an issue that cannot be solved generically. There is no generic merge algorithm that you can use. Rhino Service Bus define an interface that will allow you to deal with this issue in a clean manner and supply whatever business logic is required to merge difference versions.

Here is an example of how we can merge the different versions together:

   1: public class AccpetableCreditStateMerger : ISagaStateMerger<AccpetableCreditState>
   2: {
   3:   public AccpetableCreditState Merge(AccpetableCreditState[] states)
   4:   {
   5:     return new AccpetableCreditState
   6:     {
   7:       SCores = states.SelectMany(x=>x.Scores)
   8:         .GroupBy(x=>x.Bureau)
   9:         .Select(x => new Score
  10:         {
  11:           Bureau = x.Key,
  12:           Score = x.Max(y=>y.Score)
  13:         }).ToList();
  14:     };
  15:   }
  16: }

Note that this is notepad code, so it may contain errors, but the actual intention should be clear. We accept an array of states that need to be merged, find the highest score from each bureau and return the merged state.

whenever Rhino Service Bus detects that the saga is in a conflicted state, it will post a MergeSagaState message to the saga. This will merge the saga’s state and call the Consume(MergeSagaState), in which the saga gets to decide what it wants to do about this (usually inspect the state to see if we missed anything). This also works for completing a saga, by the way, you cannot complete a saga in an inconsistent state, you will get called again with Consume(MergeSagaSate) to deal with that.

The state merger is also a good place to try to deal with concurrency compensating actions. If we notice in the merger that we perform some action twice and we need to revert one of them, for example. In general, it is better to be able to avoid having to do so, but that is the place for this logic.

Comments

Bill Pierce
01/21/2009 05:18 PM by
Bill Pierce

My recollection of another implementation from the NSB group is for the Saga State to be retrieved and updated transactionally. First simultaneous message updates the state, TryCompleteSaga returns, second simultaneous message updates the state, and fails because of a "stale state" (excuse the pun) exception, so the message is put back in the queue and processes successfully on the second try.

This assumes some sort of timestamp/version on the state and something like NH, but removes the dependency on DHT and state merges.

I have no implementation experience but I would presume the DHT and state merging would enable higher throughput than the transaction state.

Rafal
01/21/2009 07:48 PM by
Rafal

If this is dealing with complexity, I don't want to know how you deal with simplicity :)

I've got one question: suppose this is real system and you're checking the credit of millions of users (I assume we're dealing with millions - maybe it's Amazon.com?). Do you really need to keep multiple versions of a saga? Usually you'll have sagas related to different customers, so if you kept a single version of each saga and locked it properly to prevent concurrent updates it wouldn't have a negative impact on overall system performance. Or would it?

configurator
01/21/2009 08:53 PM by
configurator

<nitpicking
You have misaligned brackets in:

new Equifax.CheckCreditFor{Card = message.Card),

etc.

Ayende Rahien
01/21/2009 09:29 PM by
Ayende Rahien

Rafal,

Are you saying that this is complex or not?

And I would have a saga per customer yes, but locking is a very expensive operation.

If I lock I am holding a thread captive. I don't have that many threads.

And that is leaving aside the problem of trying to lock in a distributed env.

Ayende Rahien
01/21/2009 09:33 PM by
Ayende Rahien

I said it was notepad code.

Rafal
01/21/2009 10:20 PM by
Rafal

I was joking, but anyway, its complex, especially when you start considering some real world cases. What I was trying to say was that when you have millions of different sagas you get the parallelism from the fact that they are independent objects, not independent versions of the same object. And thus probability that two threads will be modifying the same object is low, so we can use pessimistic locking without taking too much risk.

BTW, do you have to group by bureau name when merging? Is it possible that one bureau sends two scores?

Ayende Rahien
01/21/2009 10:25 PM by
Ayende Rahien

Rafal,

The problem with pessimistic locking is that it doesn't scale very well.

If I have 10 machines, and each machines can handle 8 threads, I have 80 process handling threads.

Pessimistic locking take this thread out for a while, even in the case where we have no contention.

Using this approach, I am able to avoid any locking scenarios.

That is leaving aside the issue of this actually happening in the real world.

If we take the amazon sample, it is pretty common for me to browse in several tabs at the same time, and order at the same time or nearly so.

Rafal
01/22/2009 07:34 AM by
Rafal

I think I understand what you want to achieve: a 100% distributed environment with independent nodes processing incoming messages and with no centralized elements. It works nicely when messages are 'additive', but merging becomes too complex when messages are 'exclusive' (logic depends on processing order). I'm sure that with careful design, your approach would work very well for many scenarios, like workflow engines or billing data flows.

BTW, what is your approach to distributing incoming messages between processing nodes?

Sergey Shishkin
01/22/2009 08:40 PM by
Sergey Shishkin

I like the idea behind Rhino DHT, but what do you do if 2nd and 3rd score messages arrive simultaneously? Both instances of the saga will send the "accepted" or "not accepted" message (depending on the average scores in both instances).

You say that the saga can not complete in an inconsistent state, but what do you do with messages (possibly with opposite results) that already sent?

Ayende Rahien
01/22/2009 09:33 PM by
Ayende Rahien

That is why you have compensating actions.

Sergey Shishkin
01/23/2009 08:05 AM by
Sergey Shishkin

Some things are easier to avoid rather than to compensate their consequences. Imagine you've sent two messages: "customer accepted" and "customer not accepted". Somebody might have reacted already.

A possible solution might be to delay delivery of outgoing messages from a saga until the saga is persisted. And if it's inconsistent at that point, just throw the delayed messages away and give the saga an opportunity to send the right message. What do you think?

Rafal
01/23/2009 08:21 AM by
Rafal

Sergey, if saga persistence is transactional and sending messages is transactional, you can wrap everything in a distributed transaction and enjoy consistent behavior.

Sergey Shishkin
01/26/2009 12:59 PM by
Sergey Shishkin

Rafal, the joy of distributed transactions was the original reason to use DHT and versioning for sagas, afaik.

Comments have been closed on this topic.