Fighting the profiler memory obesity

Dec 29 2009

When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:

Speed in serializing / deserializing.
Ability to intervene in the serialization process at a deep level.
Size (which also affects speed).

The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies that significantly reduce both the time and the size of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables & flyweight objects).
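To make the string table and flyweight idea concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the profiler's actual code; the names StringTableWriter, write_messages and read_messages, and the length-prefixed layout, are all assumptions. Each distinct string is written once to a table, the data stream stores only a 4-byte id per occurrence, and on load every repeated id resolves to the same string object.

```python
import io
import struct


class StringTableWriter:
    """Hypothetical helper: gives each distinct string an id on first sight."""

    def __init__(self):
        self.ids = {}      # string -> id
        self.strings = []  # id -> string, in first-seen order

    def intern(self, value):
        if value not in self.ids:
            self.ids[value] = len(self.strings)
            self.strings.append(value)
        return self.ids[value]


def write_messages(messages):
    """Return (table_bytes, data_bytes) for a list of strings."""
    table = StringTableWriter()
    data = io.BytesIO()
    for message in messages:
        # The data stream holds one little-endian uint32 id per message.
        data.write(struct.pack("<I", table.intern(message)))
    table_out = io.BytesIO()
    for s in table.strings:
        encoded = s.encode("utf-8")
        # Assumed layout: uint32 byte length followed by the UTF-8 bytes.
        table_out.write(struct.pack("<I", len(encoded)))
        table_out.write(encoded)
    return table_out.getvalue(), data.getvalue()


def read_messages(table_bytes, data_bytes):
    """Rebuild the message list; repeated ids share one string object."""
    strings = []
    offset = 0
    while offset < len(table_bytes):
        (length,) = struct.unpack_from("<I", table_bytes, offset)
        offset += 4
        strings.append(table_bytes[offset:offset + length].decode("utf-8"))
        offset += length
    return [strings[i] for (i,) in struct.iter_unpack("<I", data_bytes)]
```

The size win comes from storing each repeated string once; the memory win on load comes from all occurrences pointing at a single instance.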

I started by looking into the standard .NET serialization pipeline, but that was quickly ruled out. There are several reasons for that. First, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String gets persisted). Second, it is far too slow for my usage.

My test data started as ~900 MB of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5 GB footprint when processing is done). Persisting the in-memory objects using BinaryFormatter resulted in a 454 MB file. I started deserializing it before I started writing this post, and at this point it has not completed yet. Currently the application (a simple command line test app that only does deserialization) takes 1.4 GB.

So that was utterly out, and I set out to write my own serialization format. Since I wanted it to be fast, I couldn’t use reflection (the BF app currently takes 1.6 GB), but by the same token, writing serialization code by hand is a labor intensive, error prone method. That leaves aside the question of handling changes in the objects down the road, which is not something that I would like to do.

Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That would give me the benefits of no reflection, handle the addition of new members to the serialized objects, and allow me to incrementally improve how serialization is done (the BF app now takes 2.2 GB, and I am getting ready to kill it). My first attempt at doing so, applying absolutely no optimization techniques, resulted in a 381 MB file and an 8 second parsing time.
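The general shape of generating a serializer on the fly can be sketched as follows. This is a Python analogue and not the CodeDOM code from the profiler: build_serializer, the Statement class and its fields are hypothetical, and the fixed-width binary layout is an assumption. The point it tries to show is that the member list is inspected once, source code is emitted and compiled once, and the per-object path then runs straight-line code with no reflection. Adding a member means regenerating, not hand-editing.

```python
import struct


def build_serializer(type_name, fields):
    """Generate and compile a serializer for the given members.

    fields is a list of (attribute_name, struct_format_char) pairs.
    Returns the compiled function and the generated source text.
    """
    fmt = "<" + "".join(code for _, code in fields)
    args = ", ".join(f"obj.{name}" for name, _ in fields)
    source = (
        f"def serialize_{type_name}(obj):\n"
        f"    return _pack({fmt!r}, {args})\n"
    )
    namespace = {"_pack": struct.pack}
    exec(source, namespace)  # compile the generated code once, up front
    return namespace[f"serialize_{type_name}"], source


class Statement:
    """Hypothetical profiler object, used only for this example."""

    def __init__(self, session_id, duration_ms, row_count):
        self.session_id = session_id
        self.duration_ms = duration_ms
        self.row_count = row_count


serialize_statement, generated_source = build_serializer(
    "Statement",
    [("session_id", "I"), ("duration_ms", "d"), ("row_count", "I")],
)
payload = serialize_statement(Statement(7, 1.5, 42))
```

Keeping the generated source around (generated_source here) makes it easy to inspect what the generator produced when tuning it incrementally.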

That is pretty good, but I wanted to do a bit more.

Now, note that this is an implementation specific to a single use. After applying a simple string table optimization, the serialization produces two files: the string table is 10 MB and the actual saved data is 215 MB, and deserialization takes ~10 seconds. Taking a look at what actually happened, it looked like the cost of maintaining the string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped that in favor of in-memory only MRU string interning.
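A bounded, most-recently-used interner of the kind described above might look like the sketch below. It is an illustration in Python under stated assumptions, not the profiler's implementation: the class name MruStringInterner, the default capacity, and the eviction policy (drop the least recently used entry once the cache is full) are all guesses at one reasonable design.

```python
from collections import OrderedDict


class MruStringInterner:
    """Hypothetical bounded interner: keeps only recently used strings."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._cache = OrderedDict()  # string -> canonical instance

    def intern(self, value):
        cached = self._cache.get(value)
        if cached is not None:
            # Hit: mark as most recently used and reuse the shared instance.
            self._cache.move_to_end(value)
            return cached
        # Miss: this instance becomes the canonical one.
        self._cache[value] = value
        if len(self._cache) > self.capacity:
            # Evict the least recently used entry to bound memory.
            self._cache.popitem(last=False)
        return value
```

Every string read from disk would be passed through intern, so equal strings that arrive close together share one instance, while nothing has to be persisted alongside the data file.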

Initial testing shows that this should be quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped from 4 GB to just 1.8 – 1.9 GB, and to 1.2 GB when processing is completed. And just using the application shows that the user level performance is pretty good, even if I say so myself.

There are additional options that I intend to take, but I’ll talk about them in a later post.

Comments

29 Dec 2009
12:04 PM

Rafal

How come serializing 900 MB of messages gives a 454 MB file?

29 Dec 2009
12:05 PM

Ayende Rahien

Some of the data is thrown away, or kept in a much more compact form.

29 Dec 2009
12:13 PM

Zamboch

When deserializing, how much of the time is taken by memory allocation itself?

29 Dec 2009
12:15 PM

Zamboch,

When deserializing, the data is pretty small, so I never actually had to check that.

29 Dec 2009
12:19 PM

Julien

Have you looked into Google's Protocol Buffers? It's a lot faster and more space efficient than BinaryFormatter: code.google.com/p/protobuf-net/wiki/Performance

29 Dec 2009
12:22 PM

Julien,

I am already making use of Google's Protocol Buffers.

The problem is that it requires me to adhere to a very rigid structure, which would limit my ability to work with it.

29 Dec 2009
12:24 PM

Imran

Interesting stuff. I would be interested in seeing some of the code you have used to do this. Any chance of posting part of it?

29 Dec 2009
14:36 PM

Frans Bouma

You didn't look at the fast serialization articles I mentioned to you?

29 Dec 2009
16:01 PM

Rob Eisenberg

Ok. This is really going to frustrate you. But, there is no CodeDom in Silverlight ;( There is Reflection.Emit though...

29 Dec 2009
16:03 PM

Frans,

I did.

It did a lot of things manually, and that wasn't what I was after.

Rob,

There is always a pre-build step :-)

29 Dec 2009
16:05 PM

Ah! Great! Of course. I was feeling rather lowly being the bearer of bad news after such great progress...

29 Dec 2009
16:31 PM

Anon

Are you planning to handle the scenario of an initial file that is larger than the total available memory for these changes?

29 Dec 2009
17:56 PM

Anon,

That isn't an issue. I am only loading chunks (a single session) from the file, so unless I have a single session that consumes more than the available memory, it is a non-issue.

And if I do, well, there are other problems to be solved there first.

29 Dec 2009
20:54 PM

Ori Almog

What about using T4 to generate your serialization code? It may be easier to maintain.

29 Dec 2009
23:26 PM

Jon V

I remember something about a fast serialization approach developed by Greg Young. Would this be helpful to you?

codebetter.com/.../fast-serialization.aspx

30 Dec 2009
04:56 AM

Dinesh Gajjar

Just a question: since you are serializing a huge amount of data and then filtering, couldn't you use an embedded database like SQLite? I have used it in some situations like this and found it to be useful.

But to be honest, I have no idea of the challenges you are dealing with here, so I may be wrong.

30 Dec 2009
05:06 AM

Sorry, I found the answer in your previous post :). The blog shows the latest posts first, so I was reading this one before the persistence one :)

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB