The operation was successful, but the patient is still dead… deferring the obvious doesn’t work
So, I have a problem with the profiler. At the root of things, the profiler is managing a bunch of strings (SQL statements, stack traces, alerts, etc). When you start pouring large amounts of information into the profiler, the number of strings that it keeps in memory grows until you get to say hello to OutOfMemoryException.
During my attempts to resolve this issue, I figured out that string interning was likely the most efficient way to solve my problem. After all, most of the strings that I have to display are repetitive. String interning has one problem, though: interned strings live forever. I spent a few minutes creating a garbage-collectible method of doing string interning. In my first test, which focused on just interning stack traces, I was able to reduce memory consumption by 50% (about 800Mb, post GC), and it is fully garbage collectible, so it won't hang around forever.
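The post doesn't show the actual implementation, but the idea can be sketched like this (a minimal Python sketch for brevity, since the mechanics translate directly from .NET; the `WeakInternPool` class below is a hypothetical illustration, not the profiler's real code):

```python
import gc
import weakref

class _WeakStr(str):
    """Plain str cannot be weakly referenced in CPython;
    a trivial subclass can."""

class WeakInternPool:
    """Garbage-collectible interning: duplicate strings collapse
    to one shared instance, but once every caller drops its
    reference, the pool entry is collected, unlike string.Intern,
    which pins the value for the lifetime of the process."""

    def __init__(self):
        # weak references to the values: the pool alone does not
        # keep an interned string alive
        self._pool = weakref.WeakValueDictionary()

    def intern(self, s):
        cached = self._pool.get(s)
        if cached is not None:
            return cached
        shared = _WeakStr(s)
        self._pool[s] = shared
        return shared

    def __len__(self):
        return len(self._pool)

pool = WeakInternPool()
a = pool.intern("at NHibernate.Impl.SessionImpl.Flush()")
b = pool.intern("at NHibernate.Impl.SessionImpl.Flush()")
print(a is b)    # True: one instance for both occurrences
del a, b
gc.collect()
print(len(pool)) # 0: the entry did not hang around
```

The key point is that the pool holds only weak references, so it deduplicates without pinning anything in memory.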
Sounds good, right?
Well, not really. While it is an interesting thought experiment, and interning is a great way of handling things, it only masks the problem, and only for a short amount of time. The problem is still an open-ended set of data that I need to deal with, and while there is a whole bunch of stuff that I can do to delay the inevitable, defeat is pretty much assured. The proper way of handling this is not to use hacks to reduce memory usage, but to deal with the root cause: keeping everything in memory.
Comments
Hi Ayende,
will you end up with a db for your profiler?
Maybe memory-mapped files?
You can provide an option for your tools like "use mmf", and users who encounter OOM exceptions can try working in mmf mode.
http://mmf.codeplex.com/ you can check this project (memory mapped files)
and also .NET 4.0 has the System.IO.MemoryMappedFiles namespace for this kind of job.
btw, luckily, I haven't needed to use these projects so far.
When the OS runs out of memory, it swaps.
Either you make some sort of clustering (main string + differences) or you swap memory to disk.
Now wondering whether the next series of posts will feature disk based index shards and external sorting...
Or perhaps it will be about creating summaries...
Just an idea that would take a few CPU cycles but can drastically reduce memory. In a code base, just a relatively small set of words is used in code element identifiers. (Do, Go, String, Session, Event, On, Profile, Handler, get_, (, ), User...). You can harness that fact to your advantage by building a set of these words on the fly; a string identifier then becomes an array of ids into that set.
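Patrick's word-set idea can be sketched like this (a hypothetical splitting rule for illustration, not NDepend's actual algorithm; Python used for brevity):

```python
import re

class WordTable:
    """Identifiers are split into recurring words, each word is
    stored once in a shared table, and an identifier becomes a
    tuple of small integer ids into that table."""
    # split on CamelCase boundaries, acronym runs, and punctuation
    _SPLIT = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])|[^A-Za-z0-9]")

    def __init__(self):
        self._word_to_id = {}
        self._words = []

    def encode(self, identifier):
        ids = []
        for word in self._SPLIT.findall(identifier):
            wid = self._word_to_id.get(word)
            if wid is None:
                wid = len(self._words)
                self._word_to_id[word] = wid
                self._words.append(word)
            ids.append(wid)
        return tuple(ids)

    def decode(self, ids):
        return "".join(self._words[i] for i in ids)

table = WordTable()
a = table.encode("OnProfileSessionHandler")
b = table.encode("OnSessionEventHandler")
print(table.decode(a))  # OnProfileSessionHandler
print(a[0] == b[0])     # True: both names share the 'On' entry
```

Each repeated word ("On", "Session", "Handler"...) is stored exactly once, so a long identifier costs a handful of small integers instead of a full string.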
Another idea: store strings as UTF8, since 99% of the chars in identifiers are in the first 128 ASCII chars (and so take only one byte of footprint instead of two).
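This second idea amounts to holding the bytes and decoding only on demand; a minimal sketch (the `Utf8Text` holder is hypothetical, and note that in .NET, where a System.String char is two bytes, the saving is real, whereas CPython already stores ASCII strings compactly):

```python
class Utf8Text:
    """Keep the text as UTF-8 bytes and decode only on demand.
    The trade-off: every decode allocates a fresh string, which
    is the objection raised in the replies below."""
    __slots__ = ("_raw",)

    def __init__(self, text):
        self._raw = text.encode("utf-8")

    def byte_size(self):
        return len(self._raw)

    def __str__(self):
        # allocates a brand-new string on every call
        return self._raw.decode("utf-8")

name = Utf8Text("NHibernate.Impl.SessionImpl.get_Transaction")
print(name.byte_size())  # 43: one byte per ASCII char
print(str(name))
```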
Recently I applied several other tricky algorithms like these in NDepend, with incredible performance/memory consumption improvements. What I discovered in 2009 is that using very tricky ideas I had never heard about in 15 years of programming pays off. The result is to be released in a few weeks, I cannot wait :o)
Patrick,
That is more or less what I meant by interning strings.
As for UTF8, it doesn't work if you need to translate to standard strings all the time; you would allocate a lot of memory just that way.
The problem is to keep many string values alive (i.e not collectable by GC) in memory, so you need a way to compress/uncompress strings values (with a set of words or UTF8 or anything else).
String interning is not really compression, in the sense that you still keep the whole string values in memory. String interning just avoids duplication of identical values. Moreover, string interning is not computation free: computing hash codes on strings plus the dictionary lookup is not free, and the string hash code implementations in the .NET Fx are far from optimal in terms of performance.
I think you need something more tricky than string interning to save some memory.
I don't get you, once you need a string value, just uncompress it, use it, and immediately release the reference?!
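Patrick's compress/use/release suggestion can be sketched like this (a hypothetical illustration in Python, not the profiler's code; `zlib` stands in for whatever compressor you would actually pick):

```python
import zlib

# Keep long, repetitive payloads such as stack traces compressed
# inside the in-memory model, and inflate one only for the moment
# it is actually needed.
stack_trace = "\n".join("at MyApp.Handlers.OnEvent()" for _ in range(200))
stored = zlib.compress(stack_trace.encode("utf-8"))
print(len(stored) < len(stack_trace))  # True: repetitive text compresses very well

def frame_count(blob):
    # Decompress, use, and let the reference die immediately; the
    # inflated copy becomes garbage as soon as this returns, which
    # is exactly the allocation churn objected to in the replies.
    text = zlib.decompress(blob).decode("utf-8")
    return text.count("\n") + 1

print(frame_count(stored))  # 200
```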
Patrick,
You are correct, in a sense.
The problem is that uncompressing a string will require me to allocate more memory, which leads to the GC having to do more work in the end.
My main worry isn't CPU time, it is memory, and having to keep allocating new strings (by the millions, btw) leads to a lot of garbage that the GC has to clean up.
Are you sure that you can overwhelm the GC this way? Internally the GC has its own heap for strings and is extremely optimized to allocate/deallocate many strings. When it comes to performance and the GC, common sense and suppositions are of poor help. Code it, measure it, and see where the bottleneck is, if any. Many times I had good surprises. And when I had bad surprises, they never came from where I expected.
Patrick,
I am speaking from experience here; I just finished doing a MAJOR perf session on the profiler.
I am surprised that your program overwhelms the GC by allocating short-lived strings. Certainly the GC team would be interested in your experience.
I am not sure I understand. What is the content of your strings, code element names? Profiler event descriptions?
Patrick,
They aren't short lived; they go, almost always as-is, to live as part of the in-memory model.
Sorry, what I meant is that they aren't short lived; they are put into a processing queue, and it may be quite a while until they are processed and some of them discarded.