Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,972 | Comments: 44,508

filter by tags archive

ChallengeFind the resource leak


I am not sure if I should be embarrassed or furious about this bug, but I spent a lot of time working on issue. Basically, I had a resource leak in my app. I knew that I had it, and even though it took a while to create a reliable reproduction, I managed to narrow it down to a simple reproduction. It was obvious that the issue is leaking resources during the shutdown phase of the application as a result of a race condition.

Now, usually we don’t care about such things because shutting down will, obviously, clear all of those nasty things anyway. Unfortunately, my scenario is to start & shutdown many times in a single process, as part of a unit test. So I had to figure out what was actually going on. The maddening part is that the code should work.

Here is a close approximation of the code in question:

public class QueueActions : IDisposable
{
    UnmanagedDatabaseConnection database;
    public string Name { get; private set; }

    public class QueueActions( UnmanagedDatabaseConnectionFactory factory)
    {
         database = factory.Create();
         database.Open(()=> Name = database.ReadName());
    }

   // assume proper GC finalizer impl

    public void Dispose()
    {
          database.Dispose();
    }
}

And the code using this:

using(var factory = CreateFactory())
{
   ThreadPool.QueueUserWorkItem(()=>
   {
          using(var actions = new QueueActions(factory))
          {
               actions.Send("abc");     
          }
    });
}

Note that this isn’t remotely like the actual code, it is highly simplified

And yes, there is an intentional race condition here.

Here are two pieces of information that are important to remember about this unmanaged library:

  • The factory will not close if there are opened connections (exception)
  • Trying to open a connection after the factory was closed will fail (exception)

How many problems does the code above have?

More posts in "Challenge" series:

  1. (28 Apr 2015) What is the meaning of this change?
  2. (26 Sep 2013) Spot the bug
  3. (27 May 2013) The problem of locking down tasks…
  4. (17 Oct 2011) Minimum number of round trips
  5. (23 Aug 2011) Recent Comments with Future Posts
  6. (02 Aug 2011) Modifying execution approaches
  7. (29 Apr 2011) Stop the leaks
  8. (23 Dec 2010) This code should never hit production
  9. (17 Dec 2010) Your own ThreadLocal
  10. (03 Dec 2010) Querying relative information with RavenDB
  11. (29 Jun 2010) Find the bug
  12. (23 Jun 2010) Dynamically dynamic
  13. (28 Apr 2010) What killed the application?
  14. (19 Mar 2010) What does this code do?
  15. (04 Mar 2010) Robust enumeration over external code
  16. (16 Feb 2010) Premature optimization, and all of that…
  17. (12 Feb 2010) Efficient querying
  18. (10 Feb 2010) Find the resource leak
  19. (21 Oct 2009) Can you spot the bug?
  20. (18 Oct 2009) Why is this wrong?
  21. (17 Oct 2009) Write the check in comment
  22. (15 Sep 2009) NH Prof Exporting Reports
  23. (02 Sep 2009) The lazy loaded inheritance many to one association OR/M conundrum
  24. (01 Sep 2009) Why isn’t select broken?
  25. (06 Aug 2009) Find the bug fixes
  26. (26 May 2009) Find the bug
  27. (14 May 2009) multi threaded test failure
  28. (11 May 2009) The regex that doesn’t match
  29. (24 Mar 2009) probability based selection
  30. (13 Mar 2009) C# Rewriting
  31. (18 Feb 2009) write a self extracting program
  32. (04 Sep 2008) Don't stop with the first DSL abstraction
  33. (02 Aug 2008) What is the problem?
  34. (28 Jul 2008) What does this code do?
  35. (26 Jul 2008) Find the bug fix
  36. (05 Jul 2008) Find the deadlock
  37. (03 Jul 2008) Find the bug
  38. (02 Jul 2008) What is wrong with this code
  39. (05 Jun 2008) why did the tests fail?
  40. (27 May 2008) Striving for better syntax
  41. (13 Apr 2008) calling generics without the generic type
  42. (12 Apr 2008) The directory tree
  43. (24 Mar 2008) Find the version
  44. (21 Jan 2008) Strongly typing weakly typed code
  45. (28 Jun 2007) Windsor Null Object Dependency Facility

Comments

Demis Bellot

Looks like you could get both exceptions:

1) the factory could dispose before the connection is open and

2) the connection could be open and 'Send()'ing while the factory is trying to close.

Not sure why you would spawn a thread at that point though.

Rafal

One of problems is the use of FactoryFactoryFactory pattern.

Felix

You can have:

An exception on closing the factory

An exception on the costructor of QueueAction

An exception on the send

An exception disposing the QueueAction

So at a first glance there is more possible exception than lines ;)

Making a singleton thread safe Factory does not solve all the problem ?

Daniel
*  The factory will not close if there are opened connections (exception)

* Trying to open a connection after the factory was closed will fail (exception)

Both of these can happen, and I don't see any other possible problems (assuming the Unmanaged* classes are thread-safe -- if they aren't, then anything might happen).

The 'send' call cannot fail since the factory won't close if there are open connections. Disposing the connection also shouldn't fail according to your description.

So I'd say this code will always do one of the following:

1) crash when opening the connection (factory.Create() call)

2) crash when closing the factory (factory.Dispose() call)

3) run successfully

Where's the resource leak?

Well obviously, if closing the factory fails with an exception, it'll stay open until the GC calls the finalizer. If that doesn't correctly handle the case of a failed close attempt, you might end up with a permanent resource leak.

Bert Van Steen

The QueueActions (and associated database connection) won't be disposed if an exception is thrown in the constructor.

It's also not a best practice to do expensive calls (like opening connections) in a constructor.

Justice

@Ayende ... you gave it all away!

The code

  1. creates a factory

  2. queues a thread to

2a. create a connection

2b. close the connection

  1. closes the factory

But the factory could easily be closed before the queued thread is started or the thread attempts to create a connection, resulting in an exception on creating a connection. Additionally, the factory could easily be closed after the queued thread creates a connection, resulting in an exception on closing the factory.

The solution is either to put the creation of the factory into the queued thread, or to join the outer thread to the queued thread (or use other signalling mechanisms such as WaitHandles) so that the factory is closed in the outer thread only once the queued thread has run to completion, at least of the connection-oriented work.

Kaare Skovgaard

My guess would be that it is the fact that you're using the ThreadPool.QueueUserWorkItem - INSIDE your using block. It looks to me like the outer using statement is likely to dispose of the factory before the inner using block is ever ran. Now if that does not happen - seeing as this code is now running in 2 different threads - the factory.Dispose() could be called before actions.Dispose() (both by the end of the using blocks when expanded into try/finally) and that could also cast an exception.

That's just my idea anyway.

Rich

If database.Open throws an Exception then the reference to the UnmanagedDatabaseConnection instance is lost... database.Dispose() will never be called as the Dispose method of QueueActions cannot be called (an exception will be thrown from the QueueActions constructor so there's no instance of QueueActions to call Dispose on).

Not really worked much with unmanaged code directly but I'm guessing that the CLR has no way to clean this up, so the UnmanagedDatabaseConnection will live on in memory after the application ends.

Michal

The factory can get disposed during the QueueActions constructor. If this happens after calling

database = factory.Create();

but before

database.Open(()=> Name = database.ReadName());

.. then opening the connection will throw Exception. This will cause QueueActions.Dispose() to never be called and the database will leak.

zihotki

I think that in order to fix bug you should move using block for a factory inside of the ThreadPool.QueueUserWorkItem action. At the time of execution of an item in the ThreadPool queue the factory for this item may be already disposed or this factory will be disposed (with an exception) after the creation of a QueueActions instance but before disposing it, so there will be an opened connection.

Frank Quednau

depending on what you do with the lambda in the Open call, you'd stop to have full control about the lifetime of the database and the QueueActions since they both need to be captured for the lambda to work, but I don't know if that's an issue here...

OmariO

using(var factory = CreateFactory())

{

ThreadPool.QueueUserWorkItem(()=>

{

      using(var actions = new QueueActions(factory))

      {

           actions.Send("abc");     

      }

});

}

It is possible that before the ThreadPool work item will have a chance to start execution the "main" main thread will have disposed the factory. Let start with this scenarion

  1. As there are no open connection yet disposing the factory will succeed

  2. When constructor of QueueActions class tries to create connection an exception will be thrown because the factory will have been disposed to the time

  3. As the exception is thrown in the constuctor the reference on the QueueActions is not assigned to "actions" local variable and consecuently the object won't be disposed

  4. Uncaught exception in the thread pool's thread will lead to the AppDomain unload whcih will run finalizer for QueueActions which in turn will call QueueActions.Dispose method

  5. As the factory failed to create a connection and threw exception the QueueActions object will have null value in its "database" member variable = > NullReferenceException in the finalizer thread

I would suggest you not to throw exceptions from constructors and to remove the finalizer from QueueActions class as is doesnt deal DIRECTLY with an unmanaged resource (disposing disposables in the Dispose method is enough in). Having finaliser in order to to deal with exceptional constructors... smells a bit.

But we still havent found the leak of connections as in the aforementioned scenario a connection event were not created.

Th second scenario I can see is:

  1. The main thread places the work item in ThreadPool and preempted by a thread pool thread that starts excecution of the the work item

  2. The thread pool thread in the constructor of QueueActions executes factory.Create() and preemted by main thread before the created connection assigned to "database" member

  3. The factory doesn't close because there is a created connetion and throws an exception.

  4. In the rpocess of unloading the AppDomain a thread abort is injected in the thread pool thread exactly before the assignment to "database" memeber =>

created QueueActions object is not disposed and when it finalizer is executed database == null

  1. connection leaked!

"Last but not least" sorry for my English :)

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. Paying the rent online - 17 hours from now
  2. Reducing parsing costs in RavenDB - about one day from now

There are posts all the way to Aug 04, 2015

RECENT SERIES

  1. Production postmortem (5):
    29 Jul 2015 - The evil licensing code
  2. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
  3. API Design (7):
    20 Jul 2015 - We’ll let the users sort it out
  4. What is new in RavenDB 3.5 (3):
    15 Jul 2015 - Exploring data in the dark
  5. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats