Debugging memory issues with RavenDB using WinDBG

We got a database export that showed a pretty rough case for RavenDB: a small set of documents (around 7K) that fans out through multiple map/reduce indexes into roughly 70K to 180K entries (depending on which index is used).

As you can imagine, this puts quite a load on the system. I tried the usual methods (dotTrace, hand picking, etc.) and we did get some good results from that; we found some pretty problematic issues along the way, and it was good. But the RavenDB process was still taking way too much memory. That meant we had to pull out the big guns: WinDBG.

I took a dump of the process when it was using about 1 GB of memory, then I loaded that dump into WinDBG (6.2.9200.20512 AMD64).
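For anyone who wants to follow along, a dump like this can be captured in several ways; one sketch, using Sysinternals ProcDump (the process name and file name below are just examples), is:

procdump -ma Raven.Server.exe raven.dmp

The -ma switch writes a full dump, including all of the process memory, which is what you want for a memory investigation.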

I loaded SOS:

.loadby sos clr
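If the SOS commands complain about missing or mismatched symbols at this point, pointing WinDBG at the Microsoft public symbol server and reloading usually sorts it out:

.symfix
.reload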

Then, the first thing to do was to try to see what is going on with the threads:

[Screenshot: thread listing]
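For anyone repeating this against the dump, the standard SOS command for listing the managed threads is:

!threads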

As you can see, we have a small number of threads, but nothing to write home about. The next step is to see if we have anything very big in the heap:

!dumpheap -stat

[Screenshot: !dumpheap -stat output]

The first column is the method table address, the second is the instance count, and the third is the total size. The problem is that all of this combined doesn't come anywhere near the amount of memory the process is taking.
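If one of those rows had looked suspicious, the usual next step would have been to list the instances behind that method table and ask who is keeping one of them alive (the addresses below are placeholders):

!dumpheap -mt <method table address>
!gcroot <object address>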

I guess it is possible that we hold a lot of data on the stacks, especially since the problem is likely caused by indexing.

I decided to map all of the threads and see what they are doing.

[Screenshot: switching to thread #1]

That switches me to thread #1, and we can see that it is currently waiting. Dumping the stack reveals:

[Screenshot: thread #1 stack]

This seems to be the debugger thread. Let us look at the rest:

  • 2 – Finalizer
  • 3 – Seems to be an inactive thread pool thread.
    [Screenshot: thread #3 stack]
  • 6 – appears to be an esent thread:
    [Screenshot: thread #6 stack]
    I am not sure what this is doing, and I am going to ignore this for now.
  • 7 – also esent thread:
    [Screenshot: thread #7 stack]
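A faster way to survey everything than switching to each thread by hand is to dump all of the managed and native stacks in one go:

~*e !clrstack
~*k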

And… I got tired of this and decided to do something more productive: selectively disabling things in RavenDB until I find something that drops the memory usage enough to be worthwhile.

Whack-a-mole might not be a great debugging tool, but it is the essence of binary search. Or so I tell myself to make my conscience sleep more easily.

For reference, you can look at the dump file here.

Posted By: Ayende Rahien

Comments

Roman D. Boiko, 12/06/2012 12:26 PM

http://memprofiler.com is a great memory profiler. See also http://stackoverflow.com/questions/399847/net-memory-profiling-tools and http://www.apress.com/9781430244585 for alternatives.

Rémi BOURGAREL, 12/06/2012 01:37 PM

WinDBG is not really user friendly. Even with the help of Tess's blog (http://blogs.msdn.com/b/tess/archive/tags/memory+issues/default.aspx?PageIndex=1) I can't figure it out.

Ryan, 12/06/2012 03:03 PM

Keep in mind !dumpheap shows the managed heap. It won't show you the unmanaged memory used by the process, and if you're using esent that points more fingers toward the issue being in unmanaged memory.

Chris Eldredge, 12/06/2012 03:50 PM

Loading the dmp and using !address -summary shows 1.15 GB of committed memory, but as you note the problem does not seem to be in a managed heap since !dumpheap, !eeheap and !heapstat all show memory consumption of less than 100 MB.

This seems to indicate most memory is being allocated and used by unmanaged code.
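For anyone repeating that comparison against the dump, the commands are:

!address -summary
!eeheap -gc
!dumpheap -stat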

You mentioned that you are using ESENT. Did you see another user ran into a problem with Managed ESENT that caused over 1 GB of memory to be consumed? http://managedesent.codeplex.com/discussions/276175

Justin, 12/06/2012 05:11 PM

As your managed heap is very low, have you considered using DebugDiag's LeakTrack to try and catch native allocations that leak?

Matt, 12/06/2012 05:36 PM

My immediate thought when I saw this was Esent too. Setting the cache parameters solved a similar problem for me. I believe there are actually some config options for this in RavenDB, but you probably know that ;)

Ayende Rahien, 12/06/2012 06:59 PM

Roman, I am pretty sure that the problem is not in managed code, so memprofiler isn't likely to help me here.

Ayende Rahien, 12/06/2012 07:01 PM

Chris, I am pretty sure it is an Esent issue, but I am also pretty sure that it isn't the cache. We added some diagnostics, and the cache takes ~400MB, so that doesn't account for a lot of the data. Still investigating.

James Manning, 12/06/2012 07:28 PM

A couple of tools I've found helpful (fast/easy/simple to use) for memory investigation:

  • CLRProfiler4 (despite having a name that sounds more CPU-centric) from the CLR team (IIRC) - worth running if the process has any managed code, mainly because it's so simple and quick to use that it's useful even if only as verification that the managed heap(s) are fine. Its heap map is very useful, especially in comparison to windbg/sos :)

http://www.microsoft.com/en-us/download/details.aspx?id=16273

  • Process Explorer - not as detailed as other tools, but its process properties dialog includes a simple way to see all memory info ('Performance' tab) and managed memory ('.NET Performance' tab, especially with '.NET CLR Memory', the default, selected under '.NET Performance Objects')

Much like GitHub's situation with "the bridge loop problem that wasn't there" (https://github.com/blog/1346-network-problems-last-friday), I find it important to use the coarse-granularity tools available to help narrow down the problem (even if it's just to verify a 'hunch') before bringing out tools like windbg :)

That's exactly what Ayende did here, of course; I'm just reiterating that it's a great idea. :)

Roman D. Boiko, 12/06/2012 07:30 PM

Ayende, I should mention that memprofiler partially analyses unmanaged code. See http://memprofiler.com/OnlineDocs, "Native Memory Page" section. There is no guarantee that it will solve your particular issue, but it is free to try.

As for the book, it doesn't cover memprofiler in any detail, but it provides several good strategies for memory profiling in general. Not sure they will be new to you, though.

Roman D. Boiko, 12/06/2012 07:58 PM

Note that using memprofiler to profile your process with the unmanaged resources tracker enabled will give you more information than using it to analyze a dump file.

Alois Kraus, 12/06/2012 11:24 PM

You should create several dumps while the memory is growing, to see a pattern in one of the threads and find where it is actually allocating something. Looking at the threads only while they are not allocating will not lead the way. With VMMap from Sysinternals you can also have a look at the contents and the newly allocated stuff. That can give some clue as to what the stuff taking so much space actually is.

Sergey Shumov, 12/07/2012 03:46 AM

Ayende, have you tried to compare memory usage of ESENT vs Munin?
