Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 08 | Comments: 18

filter by tags archive

The best multi threading debugging tool is Microsoft Excel

time to read 3 min | 412 words

In any system that gets to a certain size, especially one that is multi threaded, there are a certain class of bugs that are really a bitch to figure out.

They are usually boil down to some sort of a race condition. I have been doing that recently, trying to fix a bunch of race conditions in RavenDB. The problem with race conditions is that they are incredibly hard to reproduce, and even when you can reproduce them, you can’t really debug them.

I came up with the following test harness to try to narrow things down. Basically, create a test case that sometimes fails, and run it a thousand times.

Odds are that you’ll get the failure, but that isn’t the important bit. The important bit is that you setup logging properly. In my case, I set things up so the logging output to CSV and each test run had a different file.

This gives me output that looks like this:

23:54:34.5952,Raven.Database.Server.HttpServer,Debug,Request #  10: GET     -    12 ms - <default>  - 200 - /indexes/dynamic/Companies?query=&start=0&pageSize=1&aggregation=None,,11
23:54:34.5952,Raven.Client.Document.SessionOperations.QueryOperation,Debug,"Stale query results on non stale query '' on index 'dynamic/Companies' in 'http://reduction:8079/', query will be retried, index etag is: bcf0fed1-d975-43b4-bfb7-65221ef06b99",,1
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 10, for: TasksExecuter, will wait for additional work",,6
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,Incremented work counter to 11 because: WORK BY IndexingExecuter,,12
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: TasksExecuter, will wait for additional work",,6
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: ReducingExecuter, will wait for additional work",,21
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: IndexingExecuter, will wait for additional work",,12
23:54:34.6952,Raven.Client.Document.SessionOperations.QueryOperation,Debug,Executing query '' on index 'dynamic/Companies' in 'http://reduction:8079/',,1
23:54:34.8172,Raven.Database.Indexing.Index.Querying,Debug,Issuing query on index Temp/Companies for all documents,,11
23:54:34.8172,Raven.Storage.Esent.StorageActions.DocumentStorageActions,Debug,Document with key 'companies/1' was found,,11

Not really helpful in figure out what is going on, right? Except for one tiny thing, we load them to Excel, and we use conditional formatting to get things to look like this:

image

The reason this is so helpful? You can actually see the threads interleaving. This usually help me get to roughly the right place, and then I can add additional logging so I can figure out better what is actually going on.

I was able to detect and fix several race conditions using this approach.

And yes, I know that this is basically printf() debugging. But at least it is printf() debugging with pretty colors.


Comments

Frank Quednau

Just wondering, have you ever given CHESS (http://research.microsoft.com/en-us/projects/chess/) a spin, considering that kind of problem could be right up its alley?

Ayende Rahien

Frank, We found multiple bugs in CHESS when we tried to run it on RavenDB

tobi

CHESS is, unfortunately, abandoned. The approach this tool uses is one of the best available. So easy to use and so thorough.

Thomas Freudenberg

Have you ever tried log4view (http://www.log4view.com/)? It provides a nice UI to analyze log outputs produced by log4net, nlog or whatever. It has one or another memory leak, but except that it's great. Using log4view you could skip the CVSs but see the logs live using a TCP appender.

Iain

As a (IMHO, better) alternative to Excel for this sort of data analysis, you could try Google Refine http://code.google.com/p/google-refine/

The facet based interface allows you to quickly burrow into the data and filter out the irrelevant.

Chris

Sounds like the uncertainty principle at work!

Gokul

IMHO you can also try using logparserlizard. You may have to write a simple regex but once you get it going it's very helpful. http://www.lizard-labs.net/logparserlizard.aspx

magellings

Visual Studio 2010 has a lot of support for debugging multi-threaded code. It was one of the main selling points to the 2010 IDE. Perhaps it doesn't help with race conditions, but out of curiousity did you dig into what it all provides to see if it can?

 Delay

Another tool you might consider is TextAnalysisTool.NET: http://blogs.msdn.com/b/delay/archive/2007/06/21/powerful-log-file-analysis-for-everyone-releasing-textanalysistool-net.aspx

It allows you to easily filter lines in and out of view based on matching text, patterns, and markers. I wrote it a long time ago for very similar purposes. :)

Clinton

CHESS is open source: https://chesstool.codeplex.com/

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. Concurrent max value - 7 hours from now
  2. Production postmortem: The case of the memory eater and high load - 3 days from now
  3. Production postmortem: The case of the lying configuration file - 4 days from now
  4. Production postmortem: The industry at large - 5 days from now
  5. The insidious cost of allocations - 6 days from now

And 5 more posts are pending...

There are posts all the way to Sep 10, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    14 Aug 2015 - The case of the man in the middle
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats