One of the major hurdles in distributed systems is trying to understand how they work. Different parts are running in different places and sometimes at different times. Standard debugging usually breaks down at this point, because no one has yet invented a non-sequential debugger that would make sense to humans.
We are left with trying to understand what is going on in the system based on a pretty old notion: the system logs. With Rhino Service Bus, this was one of the things that I really cared about, so I made it a first-class concept. And no, you don’t get to hunt through a 3GB text file. The idea is that each message (and message interaction) in the system can be captured.
The configuration for this is quite simple:
And once we have done that, each message is copied to the log queue. But it is not just the messages that arrived. The log also captures when a message arrived, how long it took to process it, why it failed, etc.
Using this approach, you can build tools that listen to the log queue and display the information in ways that make sense to humans. For example, we can reconstruct the flow of a saga or conversation, measure the time it takes to process certain messages, or detect SLA violations.
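To make that a bit more concrete, here is a rough sketch in Python of the kind of analysis such a tool might run over entries it has already read from the log queue. Everything in it is an assumption made for illustration: the `LogEntry` shape, its field names, and the helper functions are hypothetical and are not Rhino Service Bus types or APIs. Reading from the actual queue is left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


@dataclass
class LogEntry:
    # Hypothetical record shape, NOT the real Rhino Service Bus log message.
    message_id: str
    conversation_id: str
    message_type: str
    arrived_at: datetime
    processing_time: timedelta
    error: Optional[str] = None  # failure reason, if processing failed


def conversation_flows(entries: Iterable[LogEntry]) -> Dict[str, List[str]]:
    """Group message types by conversation, ordered by arrival time."""
    flows: Dict[str, List[str]] = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.arrived_at):
        flows[entry.conversation_id].append(entry.message_type)
    return dict(flows)


def average_processing_time(entries: Iterable[LogEntry]) -> Dict[str, timedelta]:
    """Average processing time per message type."""
    totals: Dict[str, timedelta] = defaultdict(timedelta)
    counts: Dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry.message_type] += entry.processing_time
        counts[entry.message_type] += 1
    return {name: totals[name] / counts[name] for name in totals}


def sla_violations(
    entries: Iterable[LogEntry], limits: Dict[str, timedelta]
) -> List[LogEntry]:
    """Entries whose processing time exceeded the limit for their type."""
    return [
        e
        for e in entries
        if e.message_type in limits and e.processing_time > limits[e.message_type]
    ]


def failures(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Entries that recorded a failure reason."""
    return [e for e in entries if e.error is not None]
```

The point of the sketch is that once every message interaction ends up as a structured record in one place, things like "show me this conversation" or "which messages blew their SLA" become simple queries over those records rather than a hunt through text files.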