Okay, the NServiceBus Distributor deserves its own post. Broadly, the way it works is fairly simple.
Here is a simplified diagram, showing the workers and their queues, as well as the distributor and its queue. They are shown running on separate machines here, but they could just as well be separate processes on the same machine.
Now, let us zoom into the distributor a bit, shall we?
Hm, we actually have two queues for the distributor: one is for workers reporting for work (on the left), the second is for the applicative messages that need to be processed.
The reason that I think this is beautiful is quite simple: you submit work to the distributor queue, which forwards it to one of the workers. So far, pretty standard stuff. The fun part starts when you talk about managing the workers.
On startup, each worker sends T notifications to the distributor, where T is the number of threads it is configured to use (yes, a worker here is a thread on a machine, not the machine itself). When the distributor sends a message to a worker, it also takes that worker off the available list. When the worker is done, it tells the distributor that it is ready again, at which point it becomes available for more work.
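To make the bookkeeping concrete, here is a minimal sketch of what such an available list could look like. This is not the NServiceBus implementation: the class name `AvailableWorkerList` and the method `WorkerReady` are made up for illustration, and only `PopAvailableWorker` is a name that appears in the distributor code. The assumption is that every "ready" notification simply adds one entry for the worker's address.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not NServiceBus code: one entry per free worker thread.
public class AvailableWorkerList
{
    private readonly Queue<string> available = new Queue<string>();
    private readonly object sync = new object();

    // Called for every "ready" notification: T times at startup,
    // then once each time a thread finishes a message.
    public void WorkerReady(string workerAddress)
    {
        lock (sync)
        {
            available.Enqueue(workerAddress);
        }
    }

    // Returns the address of a free worker and removes that entry,
    // or null when every known thread is busy.
    public string PopAvailableWorker()
    {
        lock (sync)
        {
            return available.Count > 0 ? available.Dequeue() : null;
        }
    }
}
```

A worker configured with two threads would call `WorkerReady` twice with the same address, so two messages can be sent its way before `PopAvailableWorker` starts returning null for it.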
Very elegant solution. It is even more elegant when you look at the code to handle that:
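private void messageBusTransport_TransportMessageReceived(...)
{
    string destination = this.workerManager.PopAvailableWorker();
    if (destination == null)
        ...  // roll back; the statement is elided in this excerpt
    logger.Debug("Sending message to: " + destination);
    ...  // forward the message to the worker; elided in this excerpt
}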
Notice what happens if we don't have an available worker. We simply roll back our current action, which leaves the message in the distributor's queue, and move on. We will try again in a short while, and hopefully then we will have a worker to dispatch to.
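The roll-back-and-retry behavior can be sketched in isolation like this. Everything here is an assumption made for illustration: `DistributorSketch`, `Submit`, `WorkerReady`, `TryDispatchNext` and the `forward` callback are hypothetical names, and plain in-memory queues stand in for the transactional transport that the real distributor uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not NServiceBus code. In-memory queues stand in
// for the durable, transactional queues of the real transport.
public class DistributorSketch
{
    private readonly Queue<string> pendingMessages = new Queue<string>();
    private readonly Queue<string> availableWorkers = new Queue<string>();
    private readonly Action<string, string> forward; // (message, workerAddress)

    public DistributorSketch(Action<string, string> forward)
    {
        this.forward = forward;
    }

    public int PendingCount { get { return pendingMessages.Count; } }

    public void Submit(string message) { pendingMessages.Enqueue(message); }

    public void WorkerReady(string workerAddress) { availableWorkers.Enqueue(workerAddress); }

    // One receive attempt. Returns false when nothing was dispatched.
    public bool TryDispatchNext()
    {
        if (pendingMessages.Count == 0)
            return false; // nothing to do

        if (availableWorkers.Count == 0)
            return false; // "roll back": the message was never dequeued, so it is not lost

        string message = pendingMessages.Dequeue();
        string destination = availableWorkers.Dequeue();
        forward(message, destination);
        return true;
    }
}
```

Calling `TryDispatchNext` with a pending message and no free worker returns false and leaves `PendingCount` unchanged. Once a worker reports in, the same call succeeds.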
What about failure scenarios?
Well, at most a worker can "lose" a single message, since if it crashes, it will not report itself as available. If a machine crashes, then we might lose a bunch of messages (all the messages currently being worked on by the workers on that machine), but it doesn't hurt the overall system stability. When that machine comes back online, it will immediately start processing those messages again.
Hm, there is actually an issue with this scenario, since the workers will start working on their existing messages, but at the same time will report that they are ready for work. This just means that they will already have work queued by the time they finish (and then they will report that they are available again, of course). In general, I am okay with that.
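A tiny model of that restart makes the backlog visible. This is only an illustration under stated assumptions: `RestartScenario` and `QueuedAfterRestart` are hypothetical names, the in-flight messages are assumed to still be in the worker's queue after the crash, and the distributor is assumed to have enough pending work to answer every "ready" notification.

```csharp
using System.Collections.Generic;

// Hypothetical model of the restart scenario; not NServiceBus code.
public static class RestartScenario
{
    // Returns how many messages sit in the worker's queue right after a restart.
    public static int QueuedAfterRestart(int threads, int inFlightAtCrash)
    {
        var workerQueue = new Queue<string>();

        // Messages that were being processed when the machine went down
        // are assumed to still be in the worker's queue.
        for (int i = 0; i < inFlightAtCrash; i++)
            workerQueue.Enqueue("old-" + i);

        // On startup the worker sends one "ready" notification per thread,
        // regardless of what is already waiting for it.
        int readyNotifications = threads;

        // Assumption: the distributor forwards one new message per notification.
        for (int i = 0; i < readyNotifications; i++)
            workerQueue.Enqueue("new-" + i);

        return workerQueue.Count;
    }
}
```

With 3 threads and 3 messages in flight at the time of the crash, `QueuedAfterRestart(3, 3)` returns 6: each thread has one message waiting behind the one it is working on.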
What about the distributor itself? Again, crashing the distributor is generally not an issue. We are talking about using a durable transport here, so unprocessed messages will be saved and received when the distributor comes back again.
At the moment, I am not sure what happens if the distributor goes down for a lengthy period of time.