I got into a discussion today about how we are dealing with concurrency, and I have had a few good examples that I think worth putting in writing. The first of them is the phone billing system. This is, by nature, a distributed and concurrent system, and it is pretty easy to understand, I think.
We store the billing information for each customer (keyed by the phone number) in the DHT. The initial state looks like this:
The balance is what the account has, the call & SMS are the actions on the account. For the purpose of discussion, sending SMS costs 2$ and 1 minute call cost 5$.
And then the following happens. A phone call is made at the same time that a couple of SMSes is sent and a bill is paid. You can see that in the following picture:
Each of those actions are handled by a different node. We will deal with them in sequence, because writing parallel hard be is.
A phone call is made, so we need to record that it happened. We get the current billing information from the DHT and add a new action:
At the same time, we also send a couple of SMS messages. Again, we get the current billing information (and we get version 42), add the action and saving it back. However, we don’t have the most current version, so the DHT accepts the update and now we have two versions for key 555-5421. This is expected and normal behavior.
You should also note that we have an overdraft charge, for going over our account balance.This is something that was added to the account as part of the business logic of processing those the call. Being a responsible adult, the bill is paid at the exact time to avoid an overdraft charge. That one is handled according to the same approach, get the billing information from the DHT (and again we get version 42), modify it and save.
Now we have the following situation:
First, I should mention that this is not a generic solution for all problems. There are likely to be problems that you’ll not be able to resolve using this approach.
One thing that you might have noticed is that each of the items is tagged with a number. In real life, it would be a guid, but no one can remember a guid by looking at it, so I made it a number that is easy to remember. This id can uniquely identify an item across multi machines and concurrent versions.
The algorithm for merging those three versions together is actually quite simple. It goes something like this:
RecalcuateCharges is responsible to add / remove overdraft charges based on the new information.
What we are basically doing is quite simple, we copy all the new information to the new state, and we know that it is new because we have a unique id that can identify each item. The only remaining bit of complexity is that we now need to recalculate the charges.
As you’ll see in a future post, “recalculating” isn’t really it, you usually have to perform some compensating actions as well, but that is beside the point for now.
Given the above code, we can safely merge the three versions, and make them into a single big version.
The DHT will notice that the new value is the child of all current valid versions, accept the update and remove all other versions.
As I said, it is not something that can fit any scenario, but it can fit a surprisingly wide area of them.