Ayende @ Rahien

Refunds available at head office

Errors are part of your experience

The NH Prof website is running on Rhino Service Bus. I decided that this was a great time to test Rhino Service Bus for real, so I built the website on top of it. Even the order processing is done on top of RSB.

But that isn't what I wanted to talk about today. I wanted to talk about errors, and how important they are. In this case, I screwed up when I built the error reporting capability for NH Prof, which meant that if you tried to report an error, you would get a nasty 404 and I would get nada as feedback.

The actual fault is not relevant, and it is fixed already, so hopefully you'll forgive me for having bugs in a beta product.

What I really wanted to talk about was how I fixed the issue. Remember, this is the production machine, and I have no debugger there. What I did was go to the MSMQ interface, go to the back end queue and take a peek in the errors sub queue, where I got to see the following:

[Image: the errors subqueue in the MSMQ interface, showing two messages]

The first message is the original one, the second is the error description for it. When I opened it, I could see that the error was "AyendeIsStupidException", which explained exactly what went wrong, and I was able to quickly resolve the issue by buying more IQ on eBay. I then moved the message back to the main queue, and got the email that I expected about the problem with NH Prof.
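To make the flow above concrete, here is a minimal in-memory sketch of the same idea: a failed message is parked in an errors subqueue together with a description of the failure, and is moved back to the main queue once the bug is fixed. This is only an illustration under stated assumptions. `ToyQueue`, `broken_handler` and `fixed_handler` are hypothetical names, and nothing here is the actual Rhino Service Bus or MSMQ API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for a queue with an errors subqueue."""
    main: deque = field(default_factory=deque)
    errors: deque = field(default_factory=deque)

    def send(self, message):
        self.main.append(message)

    def process_all(self, handler):
        """Run handler on every pending message; park failures in the errors subqueue."""
        handled = []
        while self.main:
            message = self.main.popleft()
            try:
                handled.append(handler(message))
            except Exception as exc:
                # Store the original message first, then a description of the error.
                self.errors.append(message)
                self.errors.append(f"{type(exc).__name__}: {exc}")
        return handled

    def replay_errors(self):
        """Move failed messages back to the main queue, dropping their descriptions."""
        while self.errors:
            message = self.errors.popleft()
            self.errors.popleft()  # discard the paired error description
            self.main.append(message)


def broken_handler(message):
    # Simulates the bug: every message fails.
    raise ValueError("cannot reach error report endpoint")


def fixed_handler(message):
    # Simulates the handler after the fix.
    return f"emailed: {message}"


queue = ToyQueue()
queue.send("error report #1")
queue.process_all(broken_handler)   # fails, so the message is parked
parked = list(queue.errors)         # original message followed by its error description
queue.replay_errors()               # after the fix, move it back to the main queue
results = queue.process_all(fixed_handler)
```

The point of the design is that the failed message and the reason it failed sit side by side, so no debugger or log search is needed to see what went wrong, and nothing is lost while the bug is being fixed.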

A few important points about this:

  • No debugging
  • No hunting through the logs
  • No custom tools. Rhino Service Bus Profiler doesn't exist (yet).
  • In your face, explicit and very quick troubleshooting experience.

Error handling was an explicit design goal when I set out to build Rhino Service Bus, and I am happy to be able to say that so far it is proving that it works, and it was definitely worth the time I spent on it.

Comments

Jason Whitehorn, 01/03/2009 01:56 PM

In the past I've used log4net for logging, which works OK. But log files often have a way of becoming burdensome to wade through, so more often than not important errors are not seen.

This is definitely an interesting take on the idea of logging. Are you (or have you thought about) using message priority to represent the error severity so that more critical errors bubble to the top?

Demis, 01/03/2009 02:25 PM

I've been using ActiveMQ for some time now to store error requests as well. I wrap them up in a SOAP Fault storing details about the error as well as the 'error request'.

The benefit of using MQs for error handling is that you can deal with your errors programmatically, i.e. once you have fixed the bug that caused the error, you can replay the 'error request' to ensure it works, and if it's a 'Store Message' it's still processed (i.e. store and forward), so you don't lose any important messages.

Steve, 01/03/2009 04:09 PM

I hope to see you continue to update the Rhino Service Bus and its sample (dummies like me need samples... lol) - it's a great setup and I'm looking forward to adding this capability into my applications.

Ayende Rahien, 01/03/2009 08:30 PM

Jason,

No, I am not.

A failure is a failure is a failure.

It depends on the semantics of the app to decide what to do with it.

Rhino Service Bus simply makes sure that the information is accessible and easy to view.

Ayende Rahien, 01/03/2009 08:32 PM

Demis,

Absolutely. I was able to replay all failed transactions!

Ayende Rahien, 01/03/2009 08:35 PM

Steve,

I intend to do so, but I am trying to juggle quite a few balls at the moment.

It will get there, but it will take time.
