Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Fighting the profiler memory obesity


When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:

  • Speed in serializing / deserializing.
  • Ability to intervene in the serialization process at a deep level.
  • Size (which also affects speed).

The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies to significantly reduce both the time and the size of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables and flyweight objects).

I started by looking into the standard .NET serialization pipeline, but that was quickly ruled out, for two reasons. First, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String gets persisted). Second, it is far too slow for my usage.

My test data started as ~900 MB of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5 GB footprint when processing is done). Persisting the in-memory objects using BinaryFormatter resulted in a 454 MB file. I started deserializing it before I started writing this post, and at this point it has not completed yet. Currently the application (a simple command-line test app that only does deserialization) takes 1.4 GB.

So that was utterly out, and I set out to write my own serialization format. Since I wanted it to be fast, I couldn’t use reflection (the BF app currently takes 1.6 GB), but by the same token, writing serialization code by hand is a labor-intensive, error-prone method. That leaves aside the question of handling changes in the objects down the road, which is not something that I would like to do.

Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That would give me the benefits of no reflection, handle the addition of new members to the serialized objects, and allow me to incrementally improve how things are serialized (the BF app now takes 2.2 GB, and I am getting ready to kill it). My first attempt at doing so, applying absolutely no optimization techniques, resulted in a 381 MB file and an 8-second parsing time.
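The general idea of generating serialization code on the fly can be sketched roughly as follows. This is not the CodeDOM implementation described above; it is a minimal Python illustration of the same technique, where the serializer source is generated once from a field list and compiled, so that no per-object reflection happens afterwards. The `build_serializer` helper, the `Statement` class and its fields are all hypothetical names made up for the example.

```python
import struct


def build_serializer(type_name, fields):
    """Generate and compile write/read functions for a fixed list of fields.

    `fields` is a list of (attribute_name, struct_format_char) pairs.
    The field list is inspected once, here, and never again per object.
    """
    fmt = "<" + "".join(code for _, code in fields)
    args = ", ".join(f"obj.{name}" for name, _ in fields)
    source = (
        f"def write_{type_name}(obj):\n"
        f"    return _pack({args})\n"
        f"\n"
        f"def read_{type_name}(data, factory):\n"
        f"    return factory(*_unpack(data))\n"
    )
    packer = struct.Struct(fmt)
    namespace = {"_pack": packer.pack, "_unpack": packer.unpack}
    exec(source, namespace)  # compile the generated code once
    return (
        namespace[f"write_{type_name}"],
        namespace[f"read_{type_name}"],
        source,
    )


class Statement:
    """Hypothetical profiler object, used only for this example."""

    def __init__(self, session_id, duration_ms, row_count):
        self.session_id = session_id
        self.duration_ms = duration_ms
        self.row_count = row_count


write_statement, read_statement, generated_source = build_serializer(
    "Statement",
    [("session_id", "I"), ("duration_ms", "d"), ("row_count", "i")],
)

data = write_statement(Statement(7, 12.5, 42))
copy = read_statement(data, Statement)
```

Adding a new member to the serialized object then only means adding an entry to the field list; the generated functions are rebuilt the next time the program starts.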

That is pretty good, but I wanted to do a bit more.

Now, note that this is an implementation specific to a single use. After applying a simple string table optimization, the results of the serialization are two files: the string table is 10 MB in length and the actual saved data is 215 MB, and deserialization takes ~10 seconds. Taking a look at what actually happened, it looked like the cost of maintaining a string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped that in favor of in-memory-only MRU string interning.
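For readers unfamiliar with the string table approach that was tried and then dropped, here is a minimal sketch of it, assuming a very simple layout: each distinct string is stored once in a table, and the data stream holds only 4-byte indexes into that table. The function names and the byte layout are assumptions made for illustration, not the profiler's actual format.

```python
import io
import struct


class StringTableWriter:
    """Assigns a stable integer id to each distinct string."""

    def __init__(self):
        self._ids = {}
        self.strings = []

    def id_for(self, text):
        idx = self._ids.get(text)
        if idx is None:
            idx = len(self.strings)
            self._ids[text] = idx
            self.strings.append(text)
        return idx


def write_records(messages):
    """Return (table_bytes, data_bytes), mirroring the two output files."""
    table = StringTableWriter()
    data = io.BytesIO()
    for text in messages:
        data.write(struct.pack("<I", table.id_for(text)))
    table_out = io.BytesIO()
    for s in table.strings:
        raw = s.encode("utf-8")
        table_out.write(struct.pack("<I", len(raw)))  # length prefix
        table_out.write(raw)
    return table_out.getvalue(), data.getvalue()


def read_records(table_bytes, data_bytes):
    """Rebuild the messages; repeated strings share one object in memory."""
    strings = []
    offset = 0
    while offset < len(table_bytes):
        (length,) = struct.unpack_from("<I", table_bytes, offset)
        offset += 4
        strings.append(table_bytes[offset:offset + length].decode("utf-8"))
        offset += length
    return [strings[i] for (i,) in struct.iter_unpack("<I", data_bytes)]


messages = [
    "SELECT * FROM Users",
    "SELECT * FROM Orders",
    "SELECT * FROM Users",
]
table_bytes, data_bytes = write_records(messages)
restored = read_records(table_bytes, data_bytes)
```

The repeated statement is written to the table only once, and after reading, both occurrences refer to the same string object, which is where the flyweight effect comes from. The bookkeeping on the write side is the maintenance cost mentioned above.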

Initial testing shows that this should be quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped from 4 GB to just 1.8 to 1.9 GB, and to 1.2 GB when processing is completed. And just using the application shows that the user-level performance is pretty good, even if I say so myself.
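A bounded, in-memory string interner along these lines might look like the sketch below. It is only an illustration of the technique: the class name, the default capacity, and the eviction policy (dropping the least recently used entry once the cache is full) are assumptions, not details taken from the profiler's code.

```python
from collections import OrderedDict


class MruStringInterner:
    """Keeps one shared instance of each recently seen string."""

    def __init__(self, capacity=4096):
        self._capacity = capacity
        self._cache = OrderedDict()

    def intern(self, text):
        cached = self._cache.get(text)
        if cached is not None:
            # Seen recently: mark as most recently used and reuse it.
            self._cache.move_to_end(text)
            return cached
        self._cache[text] = text
        if len(self._cache) > self._capacity:
            # Over capacity: evict the least recently used entry.
            self._cache.popitem(last=False)
        return text


interner = MruStringInterner(capacity=2)
first = "".join(["SELECT ", "1"])   # two equal strings built at runtime
second = "".join(["SELECT ", "1"])
shared = interner.intern(first)
also_shared = interner.intern(second)
```

Because the cache is bounded and never written to disk, there is no table to keep consistent with the data file; duplicate strings that arrive close together simply collapse into one instance.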

There are additional options that I intend to take, but I’ll talk about them in a later post.



How come serializing 900 MB of messages gives a 454 MB file?

Ayende Rahien

Some of the data is thrown away, or kept in a much more compact form.


When deserializing, how much of the time does memory allocation itself take?

Ayende Rahien


When deserializing, the data is pretty small, so I never actually had to check that.

Ayende Rahien


I am making use of Google Buffer already.

The problem is that it requires me to adhere to a very rigid structure, which would limit my ability to work with it.


Interesting stuff, would be interested in seeing some of the code you have used to do this. Any chance of posting part of it?

Frans Bouma

You didn't look at the fast serialization articles I mentioned to you?

Rob Eisenberg

Ok. This is really going to frustrate you. But, there is no CodeDom in Silverlight ;( There is Reflection.Emit though...

Ayende Rahien


I did.

It did a lot of things manually, and that wasn't what I was after.

Ayende Rahien


There is always a pre-build step :-)

Rob Eisenberg

Ah! Great! Of course. I was feeling rather lowly being the bearer of bad news after such great progress...


Are you planning to handle the scenario of an initial file that is larger than the total available memory for these changes?

Ayende Rahien


That isn't an issue. I am only loading chunks (a single session) from the file, so unless I have a single session that consumes more than the available memory, it is a non-issue.

And if I do, well, there are other problems to be solved there first.

Ori Almog


What about using T4 to generate your serialization code? It may be easier to maintain.

Dinesh Gajjar

Just a question: since you are serializing a huge amount of data and then filtering, couldn't you use an embedded database like SQLite? I have used it in situations like this and found it to be useful.

But to be honest, I have no idea of the challenges you are dealing with here, so I may be wrong.

Dinesh Gajjar

Sorry, I found the answer in your previous post :). The blog shows the latest posts first, so I was reading this one before the persistence one :)
