Fixing the heisenbug


I just ran the Rhino Service Bus test suite, and it hung. This is where it hangs most often:

   1: [Fact]
   2: public void A_message_that_fails_processing_should_go_back_to_queue_on_transactional_queue()
   3: {
   4:     TransactionalTransport.MessageArrived += ThrowOnFirstAction();
   6:     TransactionalTransport.Send(TransactionalTestQueueUri, DateTime.Today);
   8:     gotFirstMessage.WaitOne();
  10:     Assert.NotNull(transactionalQueue.Peek());
  12:     gotSecondMessage.Set();
  13: }

And it kept hanging on line 8. Now, it worked on other machines, and when I ran this test on its own, it worked just fine. I knew that the issue was probably a matter of test interaction, but how could I debug this?

When I ran the suite under the debugger, the hang didn’t reproduce.

It was pretty consistent in where it failed, however, and that gave me an opening. Many people are not aware that you can interact with the debugger from your own code, but the .NET Framework contains the System.Diagnostics.Debugger class, and that showed me the path.
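For readers who have not used the Debugger class, here is a minimal sketch of calling it from code. Debugger.IsAttached and Debugger.Launch() are real members of System.Diagnostics.Debugger; the DebugHook class, the LAUNCH_DEBUGGER environment variable, and the opt-in guard are assumptions made for illustration, not something the original test used.

```csharp
using System;
using System.Diagnostics;

public static class DebugHook
{
    // Hypothetical environment variable name, chosen only for this sketch.
    public const string EnvVar = "LAUNCH_DEBUGGER";

    // Pure decision logic: launch only when explicitly opted in
    // and no debugger is attached to the process yet.
    public static bool ShouldLaunch(bool debuggerAttached, string envValue)
    {
        return !debuggerAttached && envValue == "1";
    }

    // Intended to be called as the first line of the test that hangs.
    public static void LaunchIfRequested()
    {
        string envValue = Environment.GetEnvironmentVariable(EnvVar);
        if (ShouldLaunch(Debugger.IsAttached, envValue))
        {
            // Requests that a debugger be started and attached to this process.
            Debugger.Launch();
        }
    }
}
```

The guard is a design choice rather than a requirement: it keeps the call harmless if it is accidentally left in a test that later runs unattended, for example on a build server.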

I put Debugger.Launch() as the first line of the test and ran the tests without the debugger. When this particular test executed, I broke into the debugger and was able to check the current state of the system. As it turned out, I had another bus instance reading from the queue, so the message could be consumed before the test’s own handler ever saw it. That was because a few other tests weren’t disposing of their buses properly. I fixed that, and the test suite ran normally.
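The fix itself can be sketched roughly as follows. FakeBus and BusTests are hypothetical stand-ins, not Rhino Service Bus types; the sketch only shows the shape of the cleanup. It relies on xUnit.net creating a new instance of the test class for each test and calling Dispose on it afterwards when the class implements IDisposable.

```csharp
using System;

// Hypothetical stand-in for a bus that listens on a queue until disposed.
public sealed class FakeBus : IDisposable
{
    // Number of buses that are still "reading from the queue".
    public static int ActiveListeners;

    private bool disposed;

    public FakeBus()
    {
        ActiveListeners++;
    }

    public void Dispose()
    {
        if (disposed) return;   // make repeated Dispose calls harmless
        disposed = true;
        ActiveListeners--;
    }
}

// Hypothetical test class: the bus is created per test and released in Dispose,
// so no listener leaks into the next test.
public class BusTests : IDisposable
{
    private readonly FakeBus bus = new FakeBus();

    public void Some_test()
    {
        // ... exercise the bus here ...
    }

    public void Dispose()
    {
        bus.Dispose();
    }
}
```

With this shape, a leftover listener from one test cannot steal messages from the next, which is the kind of interaction that caused the hang described above.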