Architecture foresight: Put a queue on that
If you build any kind of non-trivial system, one of the best things you can do for its long-term health is to move all significant processing behind a queue. This is one of those decisions that pays massive dividends as the system grows. Basically, any time you want to do something that isn't "pull data and show it to the user" or "store the data immediately", throwing a queue at the problem will make things easier in the long run.
Yes, this is a bold statement, and I'm sure you can see how it may be abused. Nevertheless, if you follow this one rule, even if you misuse it, you are likely going to be better off than if you don't. I should probably explain.
When I talk about using a queue, I mean moving the actual processing of an operation out of the request handler (controller / action / web server) and into a consumer that pops messages from a queue and processes them. The actual queue implementation (SQS, Kafka, MSMQ, a ring buffer) doesn't matter. It also doesn't matter whether you write to the queue in the same process and machine or in a distributed system. What matters is that you create a break in the system between three very important aspects of command processing:
- Accepting a request.
- Processing the request.
- Sending the result of the request back.
A system without a queue will do all of that inline, in this manner:
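The inline flow can be sketched in a few lines (a minimal Python sketch; the function names and the simulated slow operation are hypothetical, not from the post):

```python
import time

def charge_credit_card(order):
    # Hypothetical stand-in for slow, expensive processing.
    time.sleep(0.01)
    return {"order": order, "status": "charged"}

def handle_request_inline(order):
    # Accept, process, and respond all inside the request handler.
    # The client connection stays open for the entire duration.
    result = charge_credit_card(order)
    return result  # the response is sent only after processing completes
```

The client waits for the whole thing; any slowness in `charge_credit_card` directly delays the response.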
What is the problem here? If processing the request is complex or takes some time, you have an inherent clock ticking. At some point, the client is going to time out on the request, which leads to things like this:
On the other hand, if you put a queue in the middle, this looks like this:
Note that the processing of the request is now separated from sending the "accepted" answer back to the customer.
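The same three-way split (accept, process, fetch the result) can be sketched with an in-process queue and a worker thread (a minimal Python sketch; the names and in-memory `results` store are illustrative assumptions):

```python
import queue
import threading
import uuid

work_queue = queue.Queue()
results = {}  # request id -> state; a real system would use durable storage

def accept_request(order):
    # Step 1: accept. Enqueue and answer immediately with a tracking id.
    request_id = str(uuid.uuid4())
    results[request_id] = {"status": "pending"}
    work_queue.put((request_id, order))
    return request_id  # returned to the client right away ("202 Accepted")

def worker():
    # Step 2: process. Runs independently of the request handler.
    while True:
        request_id, order = work_queue.get()
        results[request_id] = {"status": "done", "order": order}
        work_queue.task_done()

def check_status(request_id):
    # Step 3: the client fetches the result in a separate call.
    return results[request_id]

threading.Thread(target=worker, daemon=True).start()
```

The request handler now does almost no work; swapping the in-memory `Queue` for SQS or Kafka does not change the shape of the code.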
What is the impact of this change?
Well, it is a bit more complex to manage in the user interface. Instead of getting the response to the request immediately, we have to fetch it in a separate operation. I'm typically really strict about policing the number of remote calls, so why am I advocating for an architectural pattern that requires more of them?
The answer is that we build, from the very first step, the ability of the system to delay processing. The user interface no longer pretends that the system reacts instantly, and we gain far more freedom to change what we do behind the scenes.
Just putting the operation on a queue gives us the ability to shift the processing, which means that we can:
- Maintain speedy responses and a responsive system for the users.
- Easily absorb spikes in load by letting the queue flatten them.
- Scale up the processing of operations without needing to do anything in the front end.
- Go from a local to a distributed mechanism without changing the overall architecture (that holds even if you previously held the queue in memory and processed it with a separate thread).
- Monitor the size of the queue to get a really good indication about where we are at in terms of load.
- Gain the ability to push updates to the backend seamlessly.
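The scaling and monitoring points above follow directly from the shape of the code: consumers are interchangeable, and queue depth is an honest load metric. A minimal sketch (function names are illustrative assumptions, not from the post):

```python
import queue
import threading

jobs = queue.Queue()

def process(job):
    # Hypothetical per-message processing.
    pass

def drain():
    while True:
        job = jobs.get()
        process(job)
        jobs.task_done()

def start_workers(count):
    # Scaling the backend is just starting more consumers;
    # the front end that enqueues work does not change at all.
    for _ in range(count):
        threading.Thread(target=drain, daemon=True).start()

def queue_depth():
    # A cheap, direct load indicator: how far behind are we?
    return jobs.qsize()
```

During a spike, `queue_depth()` grows and then shrinks as the fixed pool of workers catches up; alerting on that one number covers most capacity questions.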
At more advanced levels:
- Copy messages to an audit log, which gives you great debugging abilities.
- Retry messages that failed.
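Both of those fall out of the fact that the message is a first-class object you can copy and re-enqueue. A minimal sketch of audit plus bounded retries with a dead-letter list (the names and `MAX_ATTEMPTS` policy are illustrative assumptions):

```python
import queue

MAX_ATTEMPTS = 3
work = queue.Queue()
audit_log = []     # a copy of every processing attempt, for debugging
dead_letters = []  # messages that exhausted their retries

def consume_one(handler):
    message, attempt = work.get()
    audit_log.append(message)  # audit: record what we tried to do
    try:
        handler(message)
    except Exception:
        if attempt + 1 < MAX_ATTEMPTS:
            work.put((message, attempt + 1))  # re-enqueue for a retry
        else:
            dead_letters.append(message)      # park it for manual review
    finally:
        work.task_done()
```

A failing message is retried a fixed number of times and then parked, while the audit log keeps the full history of attempts.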
There are whole patterns of message-based operations available to you, but the things I list above you get almost for free. The reason I think you should do this upfront is that your entire system will already be built around it. Your UI (and your users' expectations) will already be set to handle potential delays. That gives you a far better system down the line. And you can play games on the front end to present the illusion that operations are accepted by the server (in a pending status) without compromising the health of the system.
In short, for the health of your system, put a queue on that; your future self will thank you.
One final word of warning: this applies to operations, not queries. Don't bother putting queries through the queue unless they are intended to be very long-lived / complex ones.
I use Hangfire for exactly this. It would be awesome if a RavenDB backend for Hangfire could be developed. Then I can drop my SQL Server dependency!
We have an issue for that here: https://issues.hibernatingrhinos.com/issue/RavenDB-13912
If it gets enough votes, we'll prioritize.
That said, comparing Hangfire and RavenDB subscriptions, what is the difference?
In Raven we can only have one worker connected. Desperately asking/wishing for a competing-consumer scenario with RavenDB subscriptions on steroids :)
I think you already blogged about this after I sent an email to the Raven mailing list. I know it's probably not that easy to implement. Just saying, every single project I was on, we had a need for this, and in the end we fired up a Redis instance. Feels like Raven has everything needed to provide this in a very nice way!
That is actually something that we are working on right now. We are going to support concurrent subscriptions soon. Given that it will be supported, what else would you need?
RavenDB Subscriptions may give the underlying mechanism for a worker queue (once we have concurrent workers). But Hangfire has so much included: nice API, scheduled jobs, batches, continuations. The Hangfire management UI is very good too.
Yes, makes perfect sense
Even though the benefits of an async queue are indisputable, I will respectfully point out that you brush over or ignore the drawbacks. More precisely, you present this as a catch-all approach for any system in any domain, at any stage of evolution.

The main reason systems grow the wrong way is many people working on them over many years. I have learned that taking a conservative approach to accidental complexity of any kind, introducing it only when a valid reason exists (load, domain, team structure, etc.), and favoring readable, easy-to-reason-about systems will help keep delivering value in the long term. Existing accidental complexity does not get eliminated, only moved elsewhere. In the absence of more context on when to use this ("always" is implied), this looks like overengineering.

I think we agree that your sync code example is much easier to reason about than your async one. "Well, it is a bit more complex to manage in the user interface" and "And you can play games on the front end" hide a lot of complexity in the FE to accommodate async patterns. Your "At more advanced levels" section presents no real benefits; doing these things in a sync pattern is exactly the same as in async, the complexity is just moved to the infrastructure instead of the code.
Great comment, I answered it here:
This pattern has a name: asynchronous request-response.
I see another problem with a queue-backed system: lack of feedback after you send a message. What if the data you sent is not valid from the point of view of the processing system (consumer)? There's no way to know unless you put your validation in the producing application, which in my opinion kills most of the benefits of the queue idea, even more so if you have several consumers.
There are two types of validations in this context.
There is "the email address looks right to me" or "age must be >= 0 and < 120", and then there is "the credit card charge went through".
For the first type, you do the validation inline, because it is cheap. For the second type, the check is expensive, and you need to build some manner of reporting errors anyway.
Note that in this case, a lot of the complexity around this has been removed. You do minimal frontend validation, then post the message, then you simply check the state. The boundary between the systems is clear.
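To make the split concrete, a minimal Python sketch of the two tiers (the function names, the in-memory `states` record, and the `charge_card` callback are illustrative assumptions):

```python
import queue

commands = queue.Queue()
states = {}  # request id -> outcome, readable by the front end

def submit_charge(email, age, card):
    # Tier 1: cheap, local validation done inline, before enqueueing.
    if "@" not in email:
        raise ValueError("email address looks wrong")
    if not (0 <= age < 120):
        raise ValueError("age must be >= 0 and < 120")
    request_id = len(states)
    states[request_id] = "pending"
    commands.put((request_id, card))
    return request_id

def process_charge(charge_card):
    # Tier 2: the expensive check ("did the charge go through?")
    # happens in the consumer and is reported via the state record.
    request_id, card = commands.get()
    states[request_id] = "charged" if charge_card(card) else "failed"
    commands.task_done()
```

Cheap checks fail fast at the boundary; the expensive outcome surfaces through the same state record the client was already polling.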
Oren, that's precisely what I'm talking about. Lightweight validation can be done before sending the message, sure. You absolutely need feedback from all the downstream systems. Not sure I can easily put together the implementation for that, but still. Without that feedback, such an approach is a nightmare to troubleshoot. I've heard people say "just use a queue for everything, it's better architecture and the throughput is higher", which I totally disagree with.
Even if you have all the validation in the backend, that is still pretty good. The turnaround time for most validation is going to be minimal, but your code will be ready for delays when needed.