Almost as soon as we introduced concurrent subscriptions, we ran into a serious problem in their use: people wanted to do things serially. That was quite infuriating. We had spent so much time making things concurrent, and now we had to deal with making them serial again? What the hell?
Before I dive any further, it is probably best if I explain a bit more about the context of this very strange feature request.
Consider a system where the subscription is used to process commands, which may have relationships with one another. For example, consider the following commands (all of them belonging to the same “Commands” collection):
- EmployeePayroll – commands/40-A
- EmployeeBankAccountChange – commands/34-A
- EmployeeContractUpdate – commands/49-C
For each one of those commands (and many more), we want to run some logic. Some of it requires us to touch third-party services, which means that we are likely to be slow or stalled in some cases. That is exactly the scenario concurrent subscriptions are meant for.
The developers quickly jumped on the new system, setting the subscription mode to concurrent and running multiple workers. Things worked, latency was down and everyone was happy. Everyone, that is, except for George. The problem was that George had recently gotten married. Well, that wasn’t the actual problem. George is happily married. The problem is that George and his wife have a new joint bank account. George let the HR department know about the new bank account in advance, which resulted in an EmployeeBankAccountChange command being generated. Then payroll day hit, and we had an EmployeePayroll command as well.
This is where things started to get iffy. In terms of timing, the EmployeeBankAccountChange happened before the EmployeePayroll command. When the subscription was running in serial mode, it was guaranteed to always process the commands in the order in which they were modified. That meant that operations like changing the bank account and actually paying had a very natural order. If you made the change before payroll, it got processed beforehand; otherwise, it was processed afterward.
With concurrent subscriptions, this is no longer the case. We still work roughly in the order of modification, but we no longer guarantee it, so documents may be processed out of order.
RavenDB’s concurrent subscriptions ensure that you won’t have to worry about concurrent processing of a single document. In this case, however, there are different documents, so they can be processed concurrently. An EmployeeBankAccountChange may take a long time (verifying accounts, etc.) while EmployeePayroll just adds a line to an ACH file, so it is very likely that we’ll process the payroll before the account change. And that makes George very sad. Let’s see how we can avoid depressing the newlywed.
One option is to make use of another RavenDB feature, the compare exchange support. This allows you to use strongly consistent, cluster-wide values, which are suitable for distributed locks. I looked into what it would take to build this and quailed in fear. I don’t want to let things become this complicated.
The key issue here is that we want both concurrency and serial work. An interesting observation is that there is a scope for such things. Commands on the same employee should run in the same order they were issued, while commands on different employees are free to run in whatever order they like. How can we make this work without diving head first into the kind of complexity that will keep you up at night?
For the most part, we can assume that concurrent operations for the same employee are rare. Even when we have multiple commands for the same employee, we can expect that there won’t be many of them. Given that, we can change the way we model the commands themselves. Instead of creating a document per command, we’ll have a document per employee.
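To make the race concrete, here is a back-of-the-envelope model in Python. The timings are invented for illustration and are not measurements. The bank account change is issued first but is slow. The payroll command is issued a moment later and finishes quickly.

```python
# Invented timings for illustration only: (name, issued_at, duration) in seconds.
COMMANDS = [
    ("EmployeeBankAccountChange", 0.0, 5.0),  # e.g. verifying the account with a third party
    ("EmployeePayroll", 1.0, 0.5),            # e.g. appending a line to an ACH file
]


def serial_completion_order(commands):
    """A single worker: each command starts only after the previous one finishes."""
    clock = 0.0
    finished = []
    for name, issued_at, duration in sorted(commands, key=lambda c: c[1]):
        clock = max(clock, issued_at) + duration
        finished.append((clock, name))
    return [name for _, name in sorted(finished)]


def concurrent_completion_order(commands):
    """Enough workers that every command starts the moment it is issued."""
    finished = [(issued_at + duration, name) for name, issued_at, duration in commands]
    return [name for _, name in sorted(finished)]


print("serial:    ", serial_completion_order(COMMANDS))
print("concurrent:", concurrent_completion_order(COMMANDS))
```

With one worker, the account change completes at t=5.0 and payroll at t=5.5. With a worker per command, payroll completes at t=1.5, long before the new account is in place.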
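The requirement can be phrased as a partitioning rule. Group commands by a scope key (here, the employee), keep issue order inside each group, and let the groups run in parallel. The following is an in-memory Python sketch of that rule only. It uses a thread pool and hypothetical employee IDs, and it is not how RavenDB implements anything.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_partitioned(commands, handle, max_workers=4):
    """Run commands serially within a scope and concurrently across scopes.

    commands: list of (scope_key, command_name) in the order they were issued.
    handle:   callable invoked once per (scope_key, command_name).
    """
    # Group by scope key; list.append preserves issue order inside each group.
    groups = defaultdict(list)
    for command in commands:
        groups[command[0]].append(command)

    def run_group(group):
        for command in group:  # one thread walks the whole group, in order
            handle(command)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every group and re-raises any handler exception.
        list(pool.map(run_group, groups.values()))


log = defaultdict(list)
log_lock = threading.Lock()


def record(command):
    employee, name = command
    with log_lock:
        log[employee].append(name)


issued = [
    ("employees/1-A", "EmployeeBankAccountChange"),
    ("employees/2-A", "EmployeeContractUpdate"),
    ("employees/1-A", "EmployeePayroll"),
    ("employees/2-A", "EmployeePayroll"),
]
process_partitioned(issued, record)
print(dict(log))
```

The order *between* employees in the output depends on thread scheduling. The order *within* each employee always matches the issue order.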
Where before we had this model:
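Here is a rough sketch of that model as Python dictionaries. The `Type` and `Employee` field names, and which employee each command belongs to, are illustrative assumptions. Each command is its own document in the Commands collection:

```python
# Old model (sketch): one document per command, all in the "Commands" collection.
# The "Type" and "Employee" fields and the employee assignments are illustrative.
OLD_MODEL = {
    "commands/34-A": {
        "Type": "EmployeeBankAccountChange",
        "Employee": "employees/1-A",
        "@metadata": {"@collection": "Commands"},
    },
    "commands/40-A": {
        "Type": "EmployeePayroll",
        "Employee": "employees/1-A",
        "@metadata": {"@collection": "Commands"},
    },
    "commands/49-C": {
        "Type": "EmployeeContractUpdate",
        "Employee": "employees/2-A",
        "@metadata": {"@collection": "Commands"},
    },
}


def documents_for(model, employee_id):
    """IDs of every command document that belongs to one employee."""
    return sorted(doc_id for doc_id, doc in model.items() if doc["Employee"] == employee_id)


# George's two commands live in two separate documents, so a concurrent
# subscription is free to hand them to two different workers.
print(documents_for(OLD_MODEL, "employees/1-A"))
```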
We’ll now have the following model:
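Here is a matching sketch of the per-employee model, again as Python dictionaries. The `Commands` array and its entries are assumed field names. The ID scheme follows the commands/employees/1-A example in the text:

```python
# New model (sketch): one document per employee in the same "Commands" collection.
# The "Commands" array and its entries are illustrative field names.
NEW_MODEL = {
    "commands/employees/1-A": {
        "Employee": "employees/1-A",
        "Commands": [  # kept in the order the commands were issued
            {"Type": "EmployeeBankAccountChange"},
            {"Type": "EmployeePayroll"},
        ],
        "@metadata": {"@collection": "Commands"},
    },
    "commands/employees/2-A": {
        "Employee": "employees/2-A",
        "Commands": [{"Type": "EmployeeContractUpdate"}],
        "@metadata": {"@collection": "Commands"},
    },
}


def command_document_id(employee_id):
    """Derive the per-employee command document ID from the employee ID."""
    return f"commands/{employee_id}"


george = NEW_MODEL[command_document_id("employees/1-A")]
print([c["Type"] for c in george["Commands"]])
```

Because both of George’s commands now sit in one document, the concurrent subscription’s one-document-at-a-time guarantee covers them together.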
What does this give us? We now have a commands/employees/1-A document for the first employee, and all operations on that employee are handled as a single unit, guaranteed by the concurrent subscription (which never processes the same document concurrently). Let’s explore further how that works, okay?
With the previous model, to register a command, we just need to call:
All the commands were using the Commands collection, so the subscription worker looks like this:
And if we process this concurrently, we may process the commands for the same employee at the same time, leading to sadness in the household of George. Instead, with the new model, we can use the patching API to handle this. Here is what this looks like:
The idea in this case is that all commands for the same employee use the same document. If there isn’t already such a document, we’ll create a new one; otherwise, we’ll apply the patch script and add to it. The end result is that we can have multiple concurrent operations, and they will all be added to the same document in order of execution. However, so far this has nothing to do with concurrent subscriptions. What do we do from here? Here is what the subscription worker looks like after these changes:
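The original client code is not reproduced here. A minimal Python stand-in shows the shape of it: every call produces a brand-new document. `InMemoryStore`, `store_new` and `register_command` are hypothetical names, not the RavenDB client API, and the generated ID format is only illustrative.

```python
import itertools


class InMemoryStore:
    """Hypothetical stand-in for a document store; not the RavenDB client API."""

    def __init__(self):
        self.docs = {}
        self._next_id = itertools.count(1)

    def store_new(self, collection, document):
        # Mimic server-assigned IDs such as "commands/1-A" (format is illustrative).
        doc_id = f"{collection.lower()}/{next(self._next_id)}-A"
        self.docs[doc_id] = {**document, "@metadata": {"@collection": collection}}
        return doc_id


def register_command(store, employee_id, command_type, **data):
    """Old model: every command becomes a brand-new document."""
    return store.store_new("Commands", {"Type": command_type, "Employee": employee_id, **data})


store = InMemoryStore()
first = register_command(store, "employees/1-A", "EmployeeBankAccountChange")
second = register_command(store, "employees/1-A", "EmployeePayroll")
print(first, second)  # two independent documents for the same employee
```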
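Here is a rough Python equivalent of such a worker. It assumes that each delivered item is simply a (document ID, document) pair. A thread pool stands in for several concurrent subscription workers, each holding one of the documents. Neither is the RavenDB client API, and `handle_command` is a hypothetical placeholder for the real work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_command(doc_id, doc, log, lock):
    """Placeholder for the real work (third-party calls, ACH file, ...)."""
    with lock:
        log.append((doc["Employee"], doc["Type"], doc_id))


def process_batch_per_document(batch, max_workers=4):
    """Old model: each command document is handled on its own.

    Threads stand in for separate concurrent subscription workers. Nothing ties
    documents for the same employee together, so they may run at the same time
    and finish in any order.
    """
    log, lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handle_command, doc_id, doc, log, lock) for doc_id, doc in batch]
        for future in futures:
            future.result()  # surface any handler exception
    return log


batch = [
    ("commands/34-A", {"Type": "EmployeeBankAccountChange", "Employee": "employees/1-A"}),
    ("commands/40-A", {"Type": "EmployeePayroll", "Employee": "employees/1-A"}),
]
print(process_batch_per_document(batch))
```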
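The enqueue logic is an upsert: create the employee’s document if it is missing, otherwise append to it. Here is a minimal in-memory sketch of that behavior. `CommandDocuments` and its lock are assumptions standing in for the server applying each patch to a single document atomically. This is not the RavenDB patching API itself.

```python
import threading


class CommandDocuments:
    """In-memory stand-in for per-employee command documents (not the RavenDB API).

    The lock models the server applying each patch to a single document
    atomically; it is an assumption of this sketch.
    """

    def __init__(self):
        self.docs = {}
        self._lock = threading.Lock()

    def enqueue(self, employee_id, command):
        doc_id = f"commands/{employee_id}"
        with self._lock:
            doc = self.docs.get(doc_id)
            if doc is None:
                # No document yet: create it holding just this command.
                self.docs[doc_id] = {"Employee": employee_id, "Commands": [command]}
            else:
                # Document exists: "patch" it by appending to the end.
                doc["Commands"].append(command)
        return doc_id


queue = CommandDocuments()
queue.enqueue("employees/1-A", {"Type": "EmployeeBankAccountChange"})
queue.enqueue("employees/1-A", {"Type": "EmployeePayroll"})

# Many concurrent enqueues for one employee all land in the same document.
threads = [
    threading.Thread(
        target=queue.enqueue,
        args=("employees/2-A", {"Type": "EmployeeContractUpdate", "N": i}),
    )
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(queue.docs["commands/employees/2-A"]["Commands"]))  # 50
```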
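Here is a minimal sketch of the worker side, under the same in-memory assumptions. `EmployeeCommandsDoc`, `remove_first` and `process_employee_document` are hypothetical names. The worker executes the commands it was handed in order, then patches out exactly that many from the front of the list. It leaves alone anything appended in the meantime.

```python
import threading


class EmployeeCommandsDoc:
    """In-memory stand-in for one commands/employees/... document (hypothetical)."""

    def __init__(self, employee_id, commands=()):
        self.employee_id = employee_id
        self.commands = list(commands)
        self._lock = threading.Lock()  # models atomic single-document patches

    def append(self, command):
        with self._lock:
            self.commands.append(command)

    def snapshot(self):
        with self._lock:
            return list(self.commands)

    def remove_first(self, count):
        # Patch out only the commands this batch executed. New commands are
        # only ever appended at the end, so they survive for the next batch.
        with self._lock:
            del self.commands[:count]


def process_employee_document(doc, handle):
    """Execute the delivered commands in order, then patch them out."""
    pending = doc.snapshot()  # the version the subscription handed us
    for command in pending:   # serial within this employee
        handle(doc.employee_id, command)
    doc.remove_first(len(pending))
    return pending


george = EmployeeCommandsDoc("employees/1-A", ["EmployeeBankAccountChange", "EmployeePayroll"])
executed = []


def handle(employee_id, command):
    executed.append(command)
    if len(executed) == 1:
        george.append("EmployeeContractUpdate")  # simulate a command arriving mid-batch


print(process_employee_document(george, handle))  # ['EmployeeBankAccountChange', 'EmployeePayroll']
print(george.snapshot())                          # ['EmployeeContractUpdate']
print(process_employee_document(george, handle))  # ['EmployeeContractUpdate']
```

Removing by count relies on two properties. First, only one worker processes a given employee document at a time, which is the concurrent subscription’s guarantee. Second, new commands are only ever appended. If either did not hold, removing by a per-command identifier would be the safer choice.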
The idea is that when we enqueue a command, we register it in the document specific to the employee (the scope for serial work in a concurrent subscription). Then, when we process the commands in the subscription worker, we patch out all the commands that we have already executed.
This behavior guarantees that we can process commands serially within a concurrent worker. All commands for the same employee will be processed serially in the order they were submitted, while different employees will be processed concurrently. We even support adding additional commands to the employee document while the worker is processing commands; we’ll simply handle them in the next batch, after the current commands for that employee are done.
One thing that I’m not discussing here is what to do if we have concurrent modifications of the commands document on multiple nodes. That would generate a conflict, and by default RavenDB resolves it by selecting the latest version. You can configure RavenDB to resolve this properly; I talk about this at length here.
Aside from leaning on the new concurrent subscriptions feature, everything else we used in this post to solve the problem is a long-standing feature of RavenDB. Both conceptually and in practice, this gives us a great deal of simplicity in handling a non-trivial issue.
As usual, I would very much welcome your feedback.
More posts in "RavenDB 5.3 New Features" series:
- (26 Nov 2021) Revisions includes
- (25 Nov 2021) JSON Patch
- (24 Nov 2021) Studio & Query improvements
- (23 Nov 2021) TCP Compression
- (18 Nov 2021) Experimental PostgreSQL wire protocol
- (17 Nov 2021) Elasticsearch ETL
- (15 Nov 2021) Incremental time series & implementing lambda based accounting
- (12 Nov 2021) Incremental time series
- (11 Nov 2021) Concurrent Subscriptions & Serial operations
- (10 Nov 2021) Concurrent subscriptions