Beautiful errors

time to read 2 min | 322 words

The following code in a recent PR caused it to be rejected, can you figure out why?

image

The error clearly states that what the error is, but it fails to provide crucial details. Namely, which files have been corrupted. If I’m seeing an error like this in my logs, I need to be able to figure out what happened, and not hope & guess.

I’m picking up on this particular change because I found myself tallying in my head the number of comments I make on PRs from our team, and quite a large portion of that involve these kind of changes. What I’m looking for with error handling is not just to do it properly and handle all edge cases, but to also properly report it so the person who will end up reading this error message (very likely years from now) will have a clue about what they are supposed to do now.

Sometimes we go to great lengths to ensure that this is the case. We have an entirely different server mode dedicated to handling catastrophic errors, so when you try to get into RavenDB, you’ll get a meaningful error page that will at least try to give you an idea about how to fix this issue, for example. The sad part is that it is very easy to have a single error sour up a really good experience, because it doesn’t provide you with the right information to fix it.

We spend a lot of time just crafting errors properly. They go to the log, they are sometimes blocking the UI (if the server cannot start), we have dedicated alert system that handles error and alert distribution across the cluster so an admin can get into any node and see all the stuff that they need to know about across the entire cluster.