Ayende @ Rahien


Production Errors Best Practices

I have been practicing deployment lately, and today we got everything squared away very nicely, only to be confronted with a failure when we actually ran the app.

The error was a KeyNotFoundException, because we forgot to move some data to the QA database. We fixed the error, but I also changed the error message that we were getting.

Now, we get the following error for this scenario:

Rhino.ServiceBus.Exceptions.InvalidDeploymentException: ERROR_ORDER_ATTRIB_NOT_FOUND: Could not find order attribute with the name: CorrelationId
This is usually an indication that the database has not been properly prepared for the deployment.
You need to add the order attribute to the appropriate table.
Important! If the order attribute 'CorrelationId' is missing, it is likely that other required order attributes are also missing.
Please refer to the deployment guide about the required attributes. If you don't find the required attributes in the deployment guide, please shout at the developer, preferably at 2 AM.

There are a couple of important things that I want to discuss about this error:

  • It is very explicit about what is going on.
  • It contains the information about how to fix the error.
  • It warns about additional problems that this may relate to.
  • It contains a uniquely identifying string that can be used to communicate the error clearly, searched for on Google, or filed in a knowledge base.
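
As a rough illustration of those four points, here is a minimal Python sketch. It is not the actual Rhino.ServiceBus code: the names `InvalidDeploymentError` and `get_order_attribute` are hypothetical, and a plain dictionary stands in for the order attribute table. The idea is simply to catch the bare lookup failure at the point where we know what it means and rethrow it with the identifying code, the likely cause, the fix, and the warning.

```python
class InvalidDeploymentError(Exception):
    """Hypothetical error type: the environment was not prepared for this deployment."""

    def __init__(self, code, problem, likely_cause, fix, warning):
        # A stable, searchable identifier that can go into a knowledge base.
        self.code = code
        message = "\n".join([f"{code}: {problem}", likely_cause, fix, warning])
        super().__init__(message)


def get_order_attribute(attributes, name):
    """Look up a required order attribute (assumed here to live in a dict).

    A bare KeyError says nothing about deployment, so it is translated into
    an error that explains what happened and how to fix it.
    """
    try:
        return attributes[name]
    except KeyError as exc:
        raise InvalidDeploymentError(
            code="ERROR_ORDER_ATTRIB_NOT_FOUND",
            problem=f"Could not find order attribute with the name: {name}",
            likely_cause=(
                "This is usually an indication that the database has not "
                "been properly prepared for the deployment."
            ),
            fix="You need to add the order attribute to the appropriate table.",
            warning=(
                f"Important! If the order attribute '{name}' is missing, it is "
                "likely that other required order attributes are also missing."
            ),
        ) from exc  # keep the original lookup failure as the cause
```

Calling `get_order_attribute({}, "CorrelationId")` raises the descriptive error, with the original `KeyError` still attached as its cause for anyone who wants the low-level detail.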

Comments

02/18/2009 01:36 AM by Duncan Godwin

It's certainly much better than many of the standard KeyNotFound, DuplicateKey and NullReference style exceptions!

02/18/2009 01:39 AM by configurator

I suggest you make the "not been properly prepared for the deployment" and "are also missing" bold and somehow highlighted because users may not read this rather long message...

Of course, that depends on what exactly causes the message.

02/18/2009 01:47 AM by Simon Labrecque

Did you really put that last sentence in the deployed application? I wouldn't. I'm so tired of working with code containing stuff that the author thought was funny. It's NEVER funny, especially when you get an exception. This also applies to comments. Today, I came across a comment (in the Linux kernel, mind you) which says "I see dead people", a reference to the movie "The Sixth Sense". Now, believe me, that part of the code actually needed a comment, and all I got was that lousy reference. I guess it made some sense to the author. Not to me. (OK, the code was actually de-allocating some "dead" structures... but still).

I believe I have a very good sense of humor, but I just don't find those types of "jokes" funny. Not anymore, anyway.

To be fair, the rest of the post is incredibly useful in listing how to make sure an exception actually communicates information instead of just barfing uselessness ;)

02/18/2009 10:57 AM by Rafal

Simon, I agree with you. Occasional 'developer jokes' can be fun, but not if every message wants to be funny. Also, let's think about who the intended reader of these messages is. If it's a developer or a technically aware admin, he will probably know that the deployment went wrong, and an ordinary 'missing column XXX in table YYY' message would be sufficient. For other exceptions, it's good practice to add some context information, for example the name of the file or database table where the error occurred. Error codes like ERROR_ORDER_ATTRIB_NOT_FOUND don't work, in my opinion, and it doesn't make sense to give a code to every possible application error, especially when there is no other information available. Look at the almost totally useless HRESULTs.

02/18/2009 01:37 PM by Ayende Rahien

Rafal,

You ignore one very important issue.

Being able to look up the error easily, and being able to trip it easily.

02/19/2009 08:26 PM by Konstantin

I would google for "Could not find order attribute with the name", a piece of the error message, not for a useless and almost meaningless ID.

Googling for pieces of the error message text gives you some flexibility in finding similar problems.

Also, I think that error IDs expressed as numbers, like 0x99334422, are better because you don't have to think about naming the ID (which is just another area for programmer jokes).

02/20/2009 12:10 AM by firefly

I think consistency is the most important thing. As long as users know exactly where to look up the error every time, they'll generally be happy.
