Handling production errors in a messaging environment
So, today I got the first L2S Prof order. As you can imagine, I was pretty excited about that. However, it turned out that I had actually missed something when I built the backend for handling L2S Prof ordering. The details about what actually went wrong aren’t important (and are embarrassing).
So I logged into the server to check what was going on:
One of the major design criteria I had for Rhino Service Bus is that handling errors in production should be dead easy. As you can see in the screenshots, there are two messages of interest here: the first is the actual message, and the second is the error information, which includes the full stack trace. Using that, it was a piece of cake to isolate the problem, have the head-slapping moment, and deploy a new version. Once that was done, all I had to do was move the message back to the processing queue, and I was done.
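The recovery flow described above can be sketched as a toy, in-memory model. This is not Rhino Service Bus's API (which is .NET). The `Bus`, `Message` and `ErrorInfo` types and the `move_to_processing` method are hypothetical, and the message and its error details are stored as one pair rather than as two separate queue messages. It only illustrates the idea: a failed message is never lost. It is parked next to its stack trace until someone fixes the bug and re-queues it.

```python
import traceback
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical message type: an id plus an arbitrary payload.
    id: str
    body: dict


@dataclass
class ErrorInfo:
    # Hypothetical error record: which message failed, and the full stack trace.
    message_id: str
    stack_trace: str


@dataclass
class Bus:
    # Toy in-memory stand-ins for the processing queue and the error queue.
    processing: deque = field(default_factory=deque)
    errors: deque = field(default_factory=deque)  # holds (Message, ErrorInfo) pairs

    def send(self, msg: Message) -> None:
        self.processing.append(msg)

    def process_all(self, handler) -> None:
        """Run handler on each queued message; failures go to the error queue."""
        while self.processing:
            msg = self.processing.popleft()
            try:
                handler(msg)
            except Exception:
                # Keep the original message intact next to its error details,
                # so nothing is lost and it can be replayed after a fix.
                info = ErrorInfo(msg.id, traceback.format_exc())
                self.errors.append((msg, info))

    def move_to_processing(self, message_id: str) -> bool:
        """Take a failed message out of the error queue and re-queue it."""
        for i, (msg, _info) in enumerate(self.errors):
            if msg.id == message_id:
                del self.errors[i]
                self.processing.append(msg)
                return True
        return False


if __name__ == "__main__":
    handled = []

    def buggy_handler(msg):
        # Stand-in for whatever bug slipped into the original backend.
        raise KeyError("license_type")

    bus = Bus()
    bus.send(Message("order-1", {"product": "L2S Prof"}))
    bus.process_all(buggy_handler)
    print(bus.errors[0][1].stack_trace)  # inspect the full stack trace

    # ...fix the bug and deploy, then replay the failed message:
    bus.move_to_processing("order-1")
    bus.process_all(lambda msg: handled.append(msg.id))
    print(handled)  # ['order-1']
```

The point of the design is that the failure handling does not discard or retry blindly; it keeps enough context (the message plus the stack trace) for a human to diagnose the problem, and replaying is just a queue move.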
Just to give you an idea, here is what it looked like on my timeline:
I guarantee you that the customer in question didn’t have any idea that there was something wrong with his order processing.
I am loving it.