Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969




RavenDB 2.0 StopShip bug: Memory is nice, let us eat it all.


In the past few days, it sometimes felt like RavenDB is a naughty boy who wants to eat all of the cake and leave none for others.

The issue is that under a certain set of circumstances, RavenDB memory usage would spike until it consumed all of the memory on the machine. We were pretty sure about the root cause of the problem: it is the prefetching data that is killing us, proven by the fact that when we disable it, we seem to be operating fine. And we did find quite a few such issues. And we got them fixed.

And still the problem persists… (picture torn hair and head banging now).

To make things worse, in our standard load tests, we couldn’t see this problem. It was our dogfooding tests that actually caught it. And it only happened after a relatively long time in production. That sucked, a lot.

The good news is that I eventually sat down and wrote a test harness that could pretty reliably reproduce this issue. That narrowed down things considerably. This issue is related to map/reduce and to prefetching, but we are still investigating.

Here are the details:

  • Run RavenDB on a machine that has at least 2 GB of free RAM.
  • Run the Raven.SimulatedWorkLoad; it will start writing documents and creating indexes.
  • After about 50,000 – 80,000 documents have been imported, you’ll begin seeing memory rise rapidly, until it uses as much free memory as you have.
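The actual source of Raven.SimulatedWorkLoad isn’t shown in the post, but the shape of the load it describes (bulk document writes against a server with map/reduce indexes defined) can be sketched with the standard RavenDB 2.0 client API. The server URL, the Order document type, and the loop bound are all assumptions for illustration; only DocumentStore, OpenSession, Store, and SaveChanges are real API calls.

```csharp
using Raven.Client.Document;

class SimulatedWorkLoad
{
    static void Main()
    {
        // Connect to a local RavenDB 2.0 server (URL is an assumption).
        using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
        {
            // The post reports the spike starting around 50,000-80,000 documents.
            for (var i = 0; i < 100000; i++)
            {
                using (var session = store.OpenSession())
                {
                    // Hypothetical document shape; any collection covered by a
                    // map/reduce index will exercise the server-side prefetcher.
                    session.Store(new Order { Id = "orders/" + i, Total = i % 97 });
                    session.SaveChanges();
                }
            }
        }
    }
}

class Order
{
    public string Id { get; set; }
    public int Total { get; set; }
}
```

Watching the Raven.Server process in Task Manager or Performance Monitor while this runs is enough to see whether private bytes climb past the free RAM on the box.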

On my machine, it got to 6 GB before I had to kill it. I took a dump of the process memory at around 4.3 GB, and we are analyzing it now. The frustrating thing is that the act of taking the memory dump dropped the memory usage to 1.2 GB.

I wonder if we aren’t just creating so much memory garbage that the GC just lets us consume all available memory. The problem with that is that it gets so bad that we start paging, and I don’t think the GC should allow that.
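One thing worth ruling out in this kind of investigation (my note, not something the post confirms was at play) is which GC flavor the process runs under: the .NET server GC is far more willing than the workstation GC to grow the heap and defer collections while the machine still has free RAM, which can look exactly like a leak. The mode is set in the application config file:

```xml
<!-- e.g. Raven.Server.exe.config: server GC trades memory for throughput
     and will happily let the heap grow on a machine with free RAM. -->
<configuration>
  <runtime>
    <gcServer enabled="true" />
    <gcConcurrent enabled="true" />
  </runtime>
</configuration>
```

Flipping gcServer to false and re-running the harness is a cheap way to separate "the GC is lazy on purpose" from "something is rooting all these objects".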

The dump file can be found here (160MB compressed), if you feel like taking a stab at it. Now, if you’ll excuse me, I need to open WinDBG and see what I can find.
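For anyone opening the dump alongside, a typical first pass in WinDBG with the SOS extension looks something like this. These are the standard SOS commands, not necessarily the exact ones used in this investigation:

```
.loadby sos clr       $$ load SOS for .NET 4 (for 2.0/3.5: .loadby sos mscorwks)
!eeheap -gc           $$ managed GC heap size, broken down per generation
!dumpheap -stat       $$ object counts and sizes by type, largest offenders last
!gcroot <address>     $$ find what is keeping a given object alive
!address -summary     $$ native view: managed heap vs. everything else
```

If `!dumpheap -stat` shows one type dominating the heap and `!gcroot` traces it back to a live root, it is a real leak; if most of the 4 GB turns out to be unrooted garbage, the GC-laziness theory above gets more credible.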


Comments

mm
mm

Do you try to explicitly call the GC when a block of (20k/40k) documents has been imported?

Ayende Rahien

mm, We have a way to call GC directly, yes, and no, it doesn't help.

mm
mm

So if you call GC.Collect() every N documents during the test, does the memory still continue to increase?

Ayende Rahien

mm, That doesn't matter. If we call GC.Collect() when it is 4 GB in size, it should clear 4 GB of waste on its own. If it doesn't, it means that there is something else that is wrong.

mm
mm

Yeah, sorry for my bad English. What I meant is whether the GC works or not, to tell if it's a memory leak problem.

Jason

I had a very similar issue using Lucene in the past. The problem was I upgraded the project to .net 4.0 and Lucene .net was built against an older version. Upgrading to Lucene.Net 2.9.4g and all projects to .NET 4.0 fixed my issue. I feel your pain.

To track down my issue I just started deleting functionality until the issue stopped. WinDBG wasn't that helpful in my case.

Payam

Have you tried using a memory profiler like ANTS Memory Profiler or .NET Memory Profiler (http://memprofiler.com/)?

Rennie

Can this have anything to do with it? http://nikosbaxevanis.com/2010/10/20/adventures-using-rhino-servicebus/

Ayende Rahien

Rennie, As a matter of fact, no, that was another issue. See the next few posts.


Comments have been closed on this topic.
