Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Fighting the profiler memory obesity

time to read 3 min | 597 words

When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:

  • Speed in serializing / deserializing.
  • Ability to intervene in the serialization process at a deep level.
  • Size (which also affects speed).

The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies that significantly reduce the size and improve the speed of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables & flyweight objects).
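To make the string table idea concrete: each distinct string is written to a side table once, and records refer to it by a small integer id. The profiler's actual code is C#; this is a minimal Python sketch under that assumption, with hypothetical names:

```python
import struct

class StringTableWriter:
    """Assigns each distinct string a small integer id; records then
    store fixed-size ids instead of repeating the raw text."""

    def __init__(self):
        self.index = {}   # string -> id
        self.table = []   # id -> string (persisted separately, once)

    def intern(self, s):
        sid = self.index.get(s)
        if sid is None:
            sid = len(self.table)
            self.index[s] = sid
            self.table.append(s)
        return sid

    def record_bytes(self, strings):
        # Each string in a record serializes to a fixed 4 bytes.
        return b"".join(struct.pack("<I", self.intern(s)) for s in strings)
```

Since profiler messages repeat the same SQL text, entity names and stack frames over and over, each repeat costs 4 bytes instead of the full string.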

I started looking into the standard .NET serialization pipeline, but that was quickly ruled out. There are several reasons for that. First, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String gets persisted), and second, it is far too slow for my usage.

My test data started as ~900 MB of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5 GB footprint when processing is done). Persisting the in-memory objects using BinaryFormatter resulted in a 454 MB file, whose deserialization I started before I started writing this post and which at this point in time has not completed yet. Currently the application (a simple command line test app that only does deserialization) takes 1.4 GB.

So that was utterly out, and I set out to write my own serialization format. Since I wanted it to be fast, I couldn't use reflection (the BinaryFormatter app currently takes 1.6 GB), but by the same token, writing serialization by hand is a labor intensive, error prone method. That leaves aside the question of handling changes in the objects down the road, which is not something I would like to deal with by hand either.

Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That gives me the benefits of no reflection, handles the addition of new members to the serialized objects, and allows me to incrementally improve the serialization (the BinaryFormatter app now takes 2.2 GB, and I am getting ready to kill it). My first attempt, applying absolutely no optimization techniques, resulted in a 381 MB file and an 8 second parsing time.
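The core of the generate-instead-of-reflect idea: pay the introspection cost once, when building the serializer, and never per object. The real code emits a C# assembly via CodeDOM; as a rough analogue only, here is a hypothetical Python sketch that generates and compiles a per-type serializer function up front:

```python
import struct

def make_serializer(fields):
    """Build a serializer for a type from its (name, struct_format)
    field list. Introspection happens here, once; the returned function
    does none per call."""
    fmt = "<" + "".join(f for _, f in fields)
    args = ", ".join("obj.%s" % name for name, _ in fields)
    # Generate the serializer source, then compile it once.
    src = "def serialize(obj):\n    return struct.pack(%r, %s)\n" % (fmt, args)
    namespace = {"struct": struct}
    exec(src, namespace)
    return namespace["serialize"]

class Duration:  # hypothetical stand-in for a profiler message type
    def __init__(self, start, length):
        self.start, self.length = start, length

pack_duration = make_serializer([("start", "q"), ("length", "i")])
```

Regenerating the serializer when a new member is added to the type is what makes schema evolution cheap compared to hand-written serialization.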

That is pretty good, but I wanted to do a bit more.

Now, note that this is an implementation specific to a single use. After applying a simple string table optimization, the serialization produces two files: the string table is 10 MB in length, the actual saved data is 215 MB, and deserialization takes ~10 seconds. Taking a look at what actually happened, it turned out that the cost of maintaining the string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped it in favor of in-memory-only MRU string interning.
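With MRU interning, nothing extra is persisted; instead, strings seen during deserialization are looked up in a bounded most-recently-used cache so that duplicates share a single in-memory instance. A minimal Python sketch of the idea (the actual behavior of the profiler's interner is an assumption here):

```python
from collections import OrderedDict

class MruInterner:
    """Bounded most-recently-used cache of canonical string instances.
    Duplicate strings produced by deserialization collapse to one object."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.cache = OrderedDict()  # string -> canonical instance

    def intern(self, s):
        hit = self.cache.get(s)
        if hit is not None:
            self.cache.move_to_end(s)       # mark as recently used
            return hit                      # reuse the stored instance
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        self.cache[s] = s
        return s
```

Unlike the persisted string table, this costs nothing in the file format and only helps the in-memory footprint, which matches the responsiveness-over-size trade-off above.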

Initial testing shows that this is quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped from 4 GB down to just 1.8 – 1.9 GB, and to 1.2 GB when processing is completed. And just using the application shows that the user level performance is pretty good, even if I say so myself.

There are additional options that I intend to take, but I’ll talk about them in a later post.


Comments

Rafal

How come serializing 900 MB of messages gives a 454 MB file?

Ayende Rahien

Some of the data is thrown away, or kept in a much more compact form.

Zamboch

When deserializing, how much of the time does memory allocation itself take?

Ayende Rahien

Zamboch,

When deserializing, the data is pretty small, so I never actually had to check that.

Ayende Rahien

Julien,

I am making use of Google Protocol Buffers already.

The problem is that it requires me to adhere to a very rigid structure, which would limit my ability to work with it.

Imran

Interesting stuff, would be interested in seeing some of the code you have used to do this. Any chance of posting part of it?

Frans Bouma

You didn't look at the fast serialization articles I mentioned to you?

Rob Eisenberg

Ok. This is really going to frustrate you. But there is no CodeDom in Silverlight ;( There is Reflection.Emit though...

Ayende Rahien

Frans,

I did.

It did a lot of things manually, and that wasn't what I was after.

Ayende Rahien

Rob,

There is always the pre-build step :-)

Rob Eisenberg

Ah! Great! Of course. I was feeling rather lowly being the bearer of bad news after such great progress...

Anon

Are you planning to handle the scenario of an initial file that is larger than the total available memory with these changes?

Ayende Rahien

Anon,

That isn't an issue. I am only loading chunks (a single session) from the file, so unless I have a single session that consumes more than the available memory, it is a non issue.

And if I do, well, there are other problems to be solved there first.

Ori Almog

Hi

What about using T4 to generate your serialization code? It may be easier to maintain.

Dinesh Gajjar

Just a question: since you are serializing a huge amount of data and then filtering, couldn't you use an embedded database like SQLite? I have used it in some situations like this and found it to be useful.

But to be honest, I have no idea of the challenges you are dealing with here, so I may be wrong.

Dinesh Gajjar

Sorry, I found the answer in your previous post :). The blog shows latest first, so I was reading this one before the persistence one :)

Comments have been closed on this topic.
