Lambda methods and implicit context
The C# compiler is lazy, which is usually a very good thing, but it can also cause problems. We recently tracked down a memory usage issue to code that looked roughly like this:
```csharp
var stats = new PerformatStats
{
    Size = largeData.Length
};
stats.OnCompletion += () => this.RecordCompletion(stats);

Write(stats, o =>
{
    var sp = new Stopwatch();
    foreach (var item in largeData)
    {
        sp.Restart();
        // do something with this
        stats.RecordOperationDuration(sp);
    }
});
```
On the surface, this looks good. We are only using largeData for a short while, right?
But behind the scenes, something evil lurks. Here is what the compiler actually translates this into:
```csharp
__DisplayClass3 cDisplayClass3_1 = new __DisplayClass3
{
    __this = this,
    largeData = largeData
};
cDisplayClass3_1.stats = new PerformatStats
{
    Size = cDisplayClass3_1.largeData.Length
};
cDisplayClass3_1.stats.OnCompletion += new Action(cDisplayClass3_1, __LambdaMethod1__);

Write(cDisplayClass3_1.stats, new Action(cDisplayClass3_1, __LambdaMethod2__));
```
You need to pay special attention to what is going on here. The lambdas need access to the local variables, so the compiler lifts those locals into fields of a generated object (called __DisplayClass3).
Creating spurious objects is something that we want to avoid, so the C# compiler says: “Oh, I have two lambdas in this call that need access to the local variables. Instead of creating two objects, I can create just a single one and share it between both calls, thereby saving some space.”
Unfortunately for us, there is a slight issue here. The lifetime of the stats object is pretty long (we use it to report stats), and we also hold a reference to the completion delegate (we use that to report on stuff later on). Because the completion delegate holds the same lifted-parameters object, and because that object holds largeData, we ended up holding a lot of stuff in memory far beyond the time it was useful.
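This shared-closure behavior is easy to reproduce in isolation. The following is a minimal standalone sketch (the class, field, and variable names here are mine, not from the code above): the long-lived delegate only uses a cheap int, yet it keeps the large array alive, because both lambdas in the scope share a single compiler-generated display class.

```csharp
using System;

class SharedClosureDemo
{
    // Stands in for a long-lived delegate, e.g. a completion callback
    // that is stored somewhere and invoked much later.
    static Action longLived;

    static void Setup()
    {
        var largeData = new byte[64 * 1024 * 1024];
        int size = largeData.Length;

        // Lambda 1: long-lived, only uses 'size'.
        longLived = () => Console.WriteLine(size);

        // Lambda 2: short-lived, uses 'largeData'. Because both lambdas
        // capture locals from the same scope, the compiler puts 'size'
        // AND 'largeData' into one display class, which 'longLived' roots.
        Action shortLived = () => Console.WriteLine(largeData.Length);
        shortLived();
    }

    static void Main()
    {
        Setup();
        GC.Collect();
        // The 64 MB array is still reachable through longLived's closure,
        // even though longLived never touches it.
        longLived();
    }
}
```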
The annoying thing is that it was pretty hard to figure out, because we were looking at the source code, and the lambda that we knew was likely to be long running doesn't look like it holds a reference to the largeData object.
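Once you know the problem is there, one way to avoid it is to make sure the long-lived lambda is created in a scope that does not also contain the large object, for example by extracting its creation into a helper method. This is a hedged sketch under that assumption (SetupCompletion and the names below are hypothetical, not the fix we actually used):

```csharp
using System;

class SeparateScopeDemo
{
    static Action longLived;

    static void Main()
    {
        var largeData = new byte[64 * 1024 * 1024];

        // Pass only what the long-lived lambda actually needs; the closure
        // created inside SetupCompletion cannot capture largeData.
        SetupCompletion(largeData.Length);

        Action useData = () => Console.WriteLine(largeData.Length);
        useData();
        // From here on, largeData is collectible: longLived's display
        // class holds only the int it was given.
    }

    static void SetupCompletion(int size)
    {
        longLived = () => Console.WriteLine(size);
    }
}
```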
Ouch.
Comments
If you're running ReSharper, you want to download the Heap Allocation Viewer plugin, which shows you things like this directly in VS. See http://blog.jetbrains.com/dotnet/2014/06/06/heap-allocations-viewer-plugin/
A word of warning: it can get a bit addictive if you follow all its suggestions, so you want to make sure you only look at hot paths in your code.
Even default ReSharper warns that the lambdas will implicitly capture both variables. It's one of those really useful features when you need it.
I believe closures also capture more than is strictly needed, e.g. variables that are never used by any of the lambdas.
The Roslyn codebase should be very amenable to new optimizations, much more so than the old codebase. This would also probably be something they'd take as a contribution, since perf optimizations have no surface area and need no design work.
@mark
Interestingly enough, this is already happening, although for a different scenario. See https://github.com/dotnet/roslyn/pull/415
@mark, the problem is that this is a tradeoff that is hard to make for a compiler.
You can make a call to allocate separate contexts for separate lambdas. Which helps in this specific case, but increases the allocation count => GC pressure.
Or you can make a call to allocate one context, which helps with allocation count but sometimes results in more memory being held alive.
Or you can make the GC significantly more complex to track "partial" ownership (or, e.g., set context fields to null during closure finalization if just this closure owns them - which still does not help in some cases and penalizes closure collection).
Of course, if the compiler knew statically about the sizes of stats object vs largeData it could choose wisely. Generally it does not.
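To make the tradeoff concrete, a per-lambda lowering of the post's example might look roughly like this. This is hypothetical generated code in the same pseudocode style as the post's __DisplayClass3 listing, not actual Roslyn output:

```csharp
// Hypothetical: one display class per lambda instead of one shared one.
var ctxA = new __DisplayClass_A { __this = this };          // for OnCompletion
var ctxB = new __DisplayClass_B { largeData = largeData };  // for Write
ctxA.stats = ctxB.stats = new PerformatStats { Size = largeData.Length };

// Two allocations instead of one, but the long-lived completion delegate
// now roots only ctxA, so largeData becomes collectible once Write returns.
ctxA.stats.OnCompletion += new Action(ctxA, __LambdaMethod1__);
Write(ctxB.stats, new Action(ctxB, __LambdaMethod2__));
```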