<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>Performance</title>
        <link>http://ayende.com/Blog/category/493.aspx</link>
        <description>Performance</description>
        <language>en-US</language>
        <copyright>Ayende Rahien</copyright>
        <managingEditor>Ayende@ayende.com</managingEditor>
        <generator>Subtext Version 2.0.0.0</generator>
        <item>
            <title>Sometimes you really need a profiler handy</title>
            <link>http://ayende.com/Blog/archive/2010/03/11/sometimes-you-really-need-a-profiler-handy.aspx</link>
            <description>&lt;p&gt;As part of the performance work I am doing for &lt;a href="http://nhprof.com"&gt;Uber&lt;/a&gt; &lt;a href="http://efprof.com"&gt;Prof&lt;/a&gt;, I fixed a couple of issues related to profiling &lt;em&gt;very&lt;/em&gt; busy applications.&lt;/p&gt;  &lt;p&gt;Here is the result of processing a 4GB input file.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Sometimesyoureallyneedaprofilerhandy_11D08/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Sometimesyoureallyneedaprofilerhandy_11D08/image_thumb.png" width="964" height="335" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;I can’t say that the profiler is &lt;em&gt;happy &lt;/em&gt;about it, but it works. &lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11344.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/03/11/sometimes-you-really-need-a-profiler-handy.aspx</guid>
            <pubDate>Thu, 11 Mar 2010 10:00:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11344.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/03/11/sometimes-you-really-need-a-profiler-handy.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11344.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Performance optimizations, managed code and leaky abstractions</title>
            <link>http://ayende.com/Blog/archive/2010/02/09/performance-optimizations-managed-code-and-leaky-abstractions.aspx</link>
            <description>&lt;p&gt;I ran into &lt;a href="http://www.codinghorror.com/blog/archives/000299.html"&gt;this post&lt;/a&gt; from Jeff Atwood, talking about the performance difference between managed and unmanaged code:&lt;/p&gt;  &lt;p&gt;&lt;img alt="C# versus unmanaged C++, Chinese/English dictionary reader" src="http://www.codinghorror.com/blog/images/performance_quiz_graph.gif" /&gt;&lt;/p&gt;  &lt;p&gt;There were a &lt;em&gt;lot&lt;/em&gt; of optimizations for this along the way, but the C++ version has soundly beaten the C# version. As expected, right?&lt;/p&gt;  &lt;p&gt;Well, yes, but with &lt;a href="http://blogs.msdn.com/ricom/archive/2005/05/19/420158.aspx"&gt;extenuating circumstances&lt;/a&gt;. &lt;/p&gt;  &lt;blockquote&gt;&lt;i&gt;So am I ashamed by my crushing defeat? Hardly. &lt;b&gt;The managed code achieved a very good result for hardly any effort.&lt;/b&gt; To defeat the managed version, Raymond had to:       &lt;ul&gt;       &lt;li&gt;Write his own file/io stuff &lt;/li&gt;        &lt;li&gt;Write his own string class &lt;/li&gt;        &lt;li&gt;Write his own allocator &lt;/li&gt;        &lt;li&gt;Write his own international mapping&lt;/li&gt;     &lt;/ul&gt;      &lt;p&gt;Of course he used available lower level libraries to do this, but that's still a lot of work. Can you call what's left an STL program? I don't think so, I think he kept the std::vector class which ultimately was never a problem and he kept the find function. Pretty much everything else is gone.&lt;/p&gt;      &lt;p /&gt;   &lt;/i&gt;    &lt;p&gt;&lt;i&gt;So, yup, you can definitely beat the CLR. 
I think Raymond can make his program go even faster.&lt;/i&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I find this interesting, because it isn’t really specific to C++. In my recent performance sprint for the profiler, I had to:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Write my own paging system&lt;/li&gt;    &lt;li&gt;Write my own string parsing routines&lt;/li&gt;    &lt;li&gt;Write my own allocator&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;For the most part, performance problems fall into four categories:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Inefficient algorithms – big O complexity, etc.&lt;/li&gt;    &lt;li&gt;Inefficient execution – not applying caching, doing too much work upfront, doing unneeded work.&lt;/li&gt;    &lt;li&gt;I/O Bound – the execution waits for a file, database, socket, etc.&lt;/li&gt;    &lt;li&gt;CPU Bound – it just takes a lot of calculations to get the result.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I can think of very few problems that are really CPU bound; they tend to be very specific and small. And those are just about the only ones that’ll gain any real benefit from faster code. Of course, in pure math scenarios, which is pretty much where most of the CPU bound code resides, it doesn’t make much of a difference which language you choose (assuming it is not interpreted, at least, and that you can run directly on the CPU using native instructions). But as I said, those &lt;em&gt;are&lt;/em&gt; pretty rare.&lt;/p&gt;  &lt;p&gt;In nearly all cases, you’ll find that the #1 cause for perf issues is IO. Good IO strategies (buffering, pre-loading, lazy loading, etc) are usually applicable only to specific scenarios, but they are the ones that will make a world of difference between poorly performing code and highly performing code. 
Caching can also make a huge difference, as can deferring work to when it is actually needed.&lt;/p&gt;  &lt;p&gt;I intentionally kept “optimize the algorithm” for last, because while it can make a drastic performance difference, it is also the easiest to do, since there is so much information about it, assuming that you didn’t accidentally get yourself into an O(N^2) or worse.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11309.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/02/09/performance-optimizations-managed-code-and-leaky-abstractions.aspx</guid>
            <pubDate>Tue, 09 Feb 2010 10:00:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11309.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/02/09/performance-optimizations-managed-code-and-leaky-abstractions.aspx#feedback</comments>
            <slash:comments>10</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11309.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Why all the performance posts? The shocking truth!</title>
            <link>http://ayende.com/Blog/archive/2010/01/05/why-all-the-performance-posts-the-shocking-truth.aspx</link>
            <description>&lt;p&gt;I was quite amazed by the number of conspiracy theories that were brought up by &lt;a href="http://ayende.com/Blog/archive/2010/01/03/why-all-the-performance-posts.aspx#feedback"&gt;this post&lt;/a&gt;. Some of them in the comments, some of them in private communications.&lt;/p&gt;  &lt;p&gt;The reason, the real &amp;amp; only one, that I had so many posts lately about performance is quite simple. I did a lot of that work recently, and one aspect of perf testing that I didn’t talk about is that most perf test runs take a long time, which means that I had a lot of free time. Free time for me usually translates into posting time :-)&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11275.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/01/05/why-all-the-performance-posts-the-shocking-truth.aspx</guid>
            <pubDate>Tue, 05 Jan 2010 14:47:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2010/01/05/why-all-the-performance-posts-the-shocking-truth.aspx#feedback</comments>
            <slash:comments>6</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11275.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Patterns for reducing memory usage</title>
            <link>http://ayende.com/Blog/archive/2010/01/02/patterns-for-reducing-memory-usage.aspx</link>
            <description>&lt;p&gt;Memory problems happen when your application uses more memory than you would like. It isn’t necessarily paging or causing OutOfMemory, but it &lt;em&gt;is &lt;/em&gt;using enough memory to generate complaints. The most common causes of memory issues are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Memory leaks&lt;/li&gt;    &lt;li&gt;Garbage spewers&lt;/li&gt;    &lt;li&gt;In memory nuts&lt;/li&gt;    &lt;li&gt;Framework bugs&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Let me take each of them in turn.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Memory leaks &lt;/strong&gt;in a managed language are almost always related to dangling references, such as in a cache with no expiration or events where you never unsubscribe. Those are usually nasty to figure out, because tracking down what is holding the memory can be unpleasant. But, by the same token, it is also fairly straightforward to do so.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Garbage spewers &lt;/strong&gt;are pieces of code that allocate a lot of memory that will have to be freed soon afterward. A common case of that is:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt; Concat(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] items)
{
   &lt;span class="kwrd"&gt;string&lt;/span&gt; result = &lt;span class="str"&gt;""&lt;/span&gt;;
   &lt;span class="kwrd"&gt;foreach&lt;/span&gt;(var item &lt;span class="kwrd"&gt;in&lt;/span&gt; items)
      result += item;
 
   &lt;span class="kwrd"&gt;return&lt;/span&gt; result;
}&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;Each iteration allocates a brand new string holding everything concatenated so far, so this is going to allocate a lot of memory, which will have to be freed soon after. This &lt;em&gt;will&lt;/em&gt; get cleaned up eventually, but it will put a lot of pressure on the GC first, will cause the application to consume more memory and in general won’t play nice with others. While the code above is the simplest way to explain this, the same problem is fairly common in forms that are harder to detect. A common case would be to load a DTO from the database, convert that to an entity and convert that to a view model. Along the way, you are going to consume a lot of memory for doing pretty much the same thing.&lt;/p&gt;
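
&lt;p&gt;As a minimal sketch of one way to avoid the intermediate garbage in the Concat example (my own illustration, not code from the profiler; the class name ConcatSketch is hypothetical), the pieces can be appended to a single StringBuilder so that the only string allocated is the final result:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;using System.Text;

public static class ConcatSketch
{
    // Hypothetical helper: produces the same result as the loop above,
    // but appends into one growing buffer instead of allocating a new
    // string on every iteration.
    public static string Concat(string[] items)
    {
        var builder = new StringBuilder();
        foreach (var item in items)
            builder.Append(item);

        return builder.ToString();
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;For this particular case, string.Concat(items) would do the same job in one call. The sketch is only meant to show the general shape of the fix: build the result in place rather than through a chain of throwaway copies.&lt;/p&gt;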

&lt;p&gt;Now the caveat here is that most objects are actually small, so you don’t really notice that, but if you are working with large objects, or a lot of them, this is something that &lt;em&gt;is&lt;/em&gt; going to hit you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In memory nuts &lt;/strong&gt;refers to a common problem: you simply put your entire dataset in memory, and commonly navigate it by direct model traversal. When your dataset becomes too big, however… well, that is the point where the pain is &lt;em&gt;really&lt;/em&gt; going to hit you. Usually, fixing this is a costly process, because your code assumes that the entire thing is in memory. Even if you can easily save it to persistent storage, fixing all the places where the code assumes that everything is just a pointer reference away is a big problem. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Framework bugs &lt;/strong&gt;are my least favorite. This is when you run into cases where the framework just won’t release memory. Most often, this is because you are doing something wrong, but occasionally you &lt;em&gt;will&lt;/em&gt; hit a real framework bug, and tracking that down is a pure PITA.&lt;/p&gt;

&lt;p&gt;In all cases, you need to set up some goals: what is acceptable memory usage, in what scenarios, over what time frame, etc. Then build test scenarios that are repeatable and try each of your improvements out. Do &lt;em&gt;not&lt;/em&gt; try to implement too much upfront, that way lies the road to madness.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11263.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/01/02/patterns-for-reducing-memory-usage.aspx</guid>
            <pubDate>Sat, 02 Jan 2010 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2010/01/02/patterns-for-reducing-memory-usage.aspx#feedback</comments>
            <slash:comments>8</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11263.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Micro optimization decision process</title>
            <link>http://ayende.com/Blog/archive/2010/01/01/micro-optimization-decision-process.aspx</link>
            <description>&lt;p&gt;There are some parts of our codebase that are simply going to have to be called a large number of times. Those are the ones that we want to optimize, but at the same time, unless they are ridiculously inefficient, there isn’t that much &lt;em&gt;room&lt;/em&gt; for improvement.&lt;/p&gt;  &lt;p&gt;Let us look at this for a second:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Microoptimizationdecisionprocess_DCAC/image%5B2%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image_thumb" border="0" alt="image_thumb" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Microoptimizationdecisionprocess_DCAC/image_thumb_4d172f7e-8f4d-4365-9b89-a9869e6e4180.png" width="1065" height="82" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The numbers are pretty hard to read in this manner, so I generally translate it to the following table:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="461"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="200"&gt;Method name&lt;/td&gt;        &lt;td valign="top" width="259"&gt;Cost per&lt;strong&gt; 1,000&lt;/strong&gt; invocations&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;StringEqualsToBuffer&lt;/td&gt;        &lt;td valign="top" width="259"&gt;7 ms&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;get_Item&lt;/td&gt;        &lt;td valign="top" width="259"&gt;0.2 ms&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;get_Length&lt;/td&gt;        &lt;td valign="top" width="259"&gt;0.2 ms&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;GetHashCode&lt;/td&gt;        &lt;td valign="top" width="259"&gt;4 ms&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" 
width="200"&gt;Equals&lt;/td&gt;        &lt;td valign="top" width="259"&gt;1 ms&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;It is important to note that what I am trying to check here is relative cost of calling a method. I use the thousands invocation just to give us back a number that we can actually understand easily, instead of dealing with nanoseconds.&lt;/p&gt;  &lt;p&gt;As you can see, all of the methods in this piece of code are actually pretty fast, the slowest will complete in under ten nanoseconds. The problem is that they are called a &lt;em&gt;lot&lt;/em&gt;. StringEqualsToBuffer cost me 90 &lt;em&gt;seconds&lt;/em&gt; in this test run. This means that to improve its performance, we need to get it to drop to even fewer nanoseconds, or reduce the number of times it is called. Both of which are going to be hard.&lt;/p&gt;  &lt;p&gt;You can look at &lt;a href="http://ayende.com/Blog/archive/2009/12/30/when-mini-benchmarks-are-important.aspx"&gt;how I dealt with this particular case in this post&lt;/a&gt;, but right now I want to talk about the decision &lt;em&gt;process&lt;/em&gt;, not just the action that I took.&lt;/p&gt;  &lt;p&gt;Usually, in such situations, I find the most costly function (StringEqualsToBuffer in this case) and then find any functions that it called, in this case, we can see that get_Item and get_Length are both costly functions called from StringEqualsToBuffer. Stupid micro optimization tactics, like referencing a field directly instead of through a property have enormous consequences in this type of scenario.&lt;/p&gt;  &lt;p&gt;Next, we have things like GetHashCode, which looks to be very slow (it takes 4 nanoseconds to complete, I have hard time calling it slow :-)). This function is slow not because we are doing something that can be optimized, but simply because of what it does. 
Since we can’t optimize the code itself, we want to do the next best thing, and see if we can optimize the number of times that this code is &lt;em&gt;called&lt;/em&gt;. In other words, apply caching to the issue. Applying caching means that we need to handle invalidation, so we need to consider whether we will gain something from that, mind you. Often, the cost of managing the cache can be higher than the cost of calculating the value from scratch when we are talking about these kinds of latencies.&lt;/p&gt;  &lt;p&gt;Another issue to consider is the common memory vs. time argument. It is easy to err too far to one side when you are focused on micro benchmarks. You get a routine that completes in 1 nanosecond in the common case but uses up 10 MB of cache. Sometimes you want that, sometimes it is a &lt;em&gt;very&lt;/em&gt; bad tradeoff.&lt;/p&gt;  &lt;p&gt;I generally start with simple performance tuning, finding out hotspots and figuring out how to fix them. Usually, it is some sort of big O problem, either in the function itself or what it is called on. Those tend to be easy to fix and produce a &lt;em&gt;lot&lt;/em&gt; of benefit. Afterward, you get to true algorithmic fixes (find a better algo for this problem). Next, I run tests for memory usage, seeing if under the most extreme likely conditions, I am hitting my specified limits.&lt;/p&gt;  &lt;p&gt;I’ll talk about reducing memory usage in a separate post, but once you run through that, another run to verify that you haven’t traded off in the other direction (reduced memory at the expense of running time) would complete the process.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11262.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/01/01/micro-optimization-decision-process.aspx</guid>
            <pubDate>Fri, 01 Jan 2010 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2010/01/01/micro-optimization-decision-process.aspx#feedback</comments>
            <slash:comments>3</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11262.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Memory obesity and the curse of the string</title>
            <link>http://ayende.com/Blog/archive/2009/12/31/memory-obesity-and-the-curse-of-the-string.aspx</link>
            <description>&lt;p&gt;I believe that I have mentioned that my major problem with the memory usage in the profiler is with strings. The profiler is doing a &lt;em&gt;lot&lt;/em&gt; with strings: queries, stack traces, log messages, etc. all create quite a lot of strings that the profiler needs to inspect and analyze before it can produce the final output.&lt;/p&gt;  &lt;p&gt;Internally, the process looks like this:&lt;/p&gt; &lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_thumb.png" width="694" height="272" /&gt;&lt;/a&gt;   &lt;p&gt;In my previous post, I talked about the two major changes that I made so far to reduce memory usage, you can see them below. I introduced string interning in the parsing stage and serialized the model to disk so we wouldn’t have to keep it all in memory, which resulted in the following structure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_4.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_thumb_1.png" width="694" height="330" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;However, while those measures helped tremendously, there is still more that I can do. The &lt;em&gt;major&lt;/em&gt; problem with string interning is that you first have to have the string in order to check it in the interned table. 
That means that while you save on memory in the long run, in the short run, you are still allocating a lot of strings. My next move is to handle interning directly from buffered input, skipping the need to allocate memory for a string to use as the key for interning.&lt;/p&gt;  &lt;p&gt;Doing that has been a bit hard, mostly because I had to go deep into the serialization engine that I use (Protocol Buffers) and add that capability. It is also fairly complex to handle something like this without having to allocate a search key in the strings table. But, once I did that, I noticed three things.&lt;/p&gt;  &lt;p&gt;First, while memory increased during operation, there weren’t any jumps &amp;amp; drops, that is, we couldn’t see any periods in which the GC kicked in and released a lot of garbage. Second, memory consumption was relatively low through the operation. Before optimizing the memory usage, we were talking about 4 GB for processing and 1.5 GB for the final result; after the previous optimization it was 1.9 GB for processing and 1.3 GB for the final result. But after this optimization, we have a fairly simple upward climb to 1.3 GB. You can see the memory consumption during processing in the following chart, memory used is in GB on the Y axis.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_6.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Memoryobesityandthecurseofthestring_C82F/image_thumb_2.png" width="518" height="280" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can probably tell, I am &lt;em&gt;much&lt;/em&gt; happier with the green line than the other. 
Not just because it takes less memory in general, but because it is much more predictable, which means that the application’s behavior is going to be easier to reason about. &lt;/p&gt;  &lt;p&gt;But this optimization brings to mind the question, since I just introduced interning at the serialization level, do I really need to have interning at the streaming level? On the face of it, it looks like an unnecessary duplication. Indeed, removing the string interning that we did in the streaming level reduced overall memory usage from 1.3GB to 1.15GB.&lt;/p&gt;  &lt;p&gt;Overall, I think this is a nice piece of work.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11261.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/12/31/memory-obesity-and-the-curse-of-the-string.aspx</guid>
            <pubDate>Thu, 31 Dec 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/12/31/memory-obesity-and-the-curse-of-the-string.aspx#feedback</comments>
            <slash:comments>8</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11261.aspx</wfw:commentRss>
        </item>
        <item>
            <title>When mini benchmarks are important</title>
            <link>http://ayende.com/Blog/archive/2009/12/30/when-mini-benchmarks-are-important.aspx</link>
            <description>&lt;blockquote&gt;   &lt;p&gt;"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - &lt;a href="http://en.wikipedia.org/wiki/Donald_Knuth"&gt;Donald Knuth&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;dl /&gt;  &lt;p&gt;I have expressed my dislike for micro benchmarks in the past, and in general, I still have this attitude, but sometimes, you &lt;em&gt;really&lt;/em&gt; care. &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;A small note: while a lot of the namespaces you are going to see are Google.ProtocolBuffers, this represents my private fork of &lt;a href="http://github.com/jskeet/dotnet-protobufs"&gt;this library&lt;/a&gt; that was customized to fit UberProf’s needs. Some of those things aren’t generally applicable (like string interning at the serialization level), so please don’t try to project from the content of this post to the library itself. &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Let me show you what I mean:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_thumb.png" width="1065" height="82" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p /&gt;  &lt;p&gt;The above is the result of profiling this piece of code:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer)
{
    &lt;span class="kwrd"&gt;if&lt;/span&gt;(byteString.Length != byteBuffer.Length)
        &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;false&lt;/span&gt;;
    &lt;span class="kwrd"&gt;for&lt;/span&gt; (&lt;span class="kwrd"&gt;int&lt;/span&gt; i = 0; i &amp;lt; byteString.Length; i++)
    {
        &lt;span class="kwrd"&gt;if&lt;/span&gt;(byteString[i] != byteBuffer.Buffer[byteBuffer.Offset+i])
            &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;false&lt;/span&gt;;
    }
    &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;true&lt;/span&gt;;
}&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;This looks pretty simple, right?&lt;/p&gt;

&lt;p&gt;Now, it is important to understand that this isn’t some fake benchmark that I contrived, these are the profile results from testing a real world scenario. In general, methods such as Equals or GetHashCode, or anything that they call, are likely to be called a &lt;em&gt;lot&lt;/em&gt; of times, so paying attention to their performance is something that you should think about.&lt;/p&gt;

&lt;p&gt;There are a couple of very easy things that I can do to make this faster: replace the call to the ByteString indexer (which shows up as get_Item in the profiler results) with a direct array access, and consolidate the calls to the ByteString.Length property.&lt;/p&gt;

&lt;p&gt;After applying those two optimizations, we get the following code:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer)
{
    var strLen = byteString.Length;
    &lt;span class="kwrd"&gt;if&lt;/span&gt;(strLen != byteBuffer.Length)
        &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;false&lt;/span&gt;;
    &lt;span class="kwrd"&gt;for&lt;/span&gt; (&lt;span class="kwrd"&gt;int&lt;/span&gt; i = 0; i &amp;lt; strLen; i++)
    {
        &lt;span class="kwrd"&gt;if&lt;/span&gt;(byteString.bytes[i] != byteBuffer.Buffer[byteBuffer.Offset+i])
            &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;false&lt;/span&gt;;
    }
    &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;true&lt;/span&gt;;
}&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;And this profiler result:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_4.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_thumb_1.png" width="1151" height="81" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;You can see that this simple change resulted in a &lt;em&gt;drastic&lt;/em&gt; improvement to the StringEqualsToBuffer method. As it stands now, I don’t really see a good way to optimize this any further, so I am going to look at the other stuff that showed up. Let us take a look at ByteBuffer.GetHashCode() now:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;override&lt;/span&gt; &lt;span class="kwrd"&gt;int&lt;/span&gt; GetHashCode()
{
    var ret = 23;
    &lt;span class="kwrd"&gt;for&lt;/span&gt; (var i = Offset; i &amp;lt; Offset+Length; i++)
    {
        ret = (ret &amp;lt;&amp;lt; 8) | Buffer[i];
    }
    &lt;span class="kwrd"&gt;return&lt;/span&gt; ret;
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem is that I don’t really see a way to optimize that. Instead, I am going to cache the hash code in a field. There is some problem here with the fact that ByteBuffer is mutable, but I can handle that by forcing all call sites that change it to call a method that will force hash recalculation. Note how different this decision is from the usual encapsulation that I would generally want. Placing additional burdens on call sites is a Bad Thing, but by doing so, I think that I can save quite significantly on the hash code calculation overhead.&lt;/p&gt;
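
&lt;p&gt;A minimal sketch of that caching approach, under stated assumptions: the class name CachedHashBuffer, the hash field and the SetRange method are hypothetical stand-ins, and only GetHashCode, ResetHash, Buffer, Offset and Length are names taken from this post. The hash is computed by the same loop as above, but only when a mutating call site asks for it:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;// Hypothetical sketch, not the actual ByteBuffer from the profiler.
public class CachedHashBuffer
{
    public byte[] Buffer;
    public int Offset;
    public int Length;

    // Cached result of the last ResetHash() call.
    private int hash;

    // Assumed mutation entry point: every call site that changes
    // Buffer, Offset or Length goes through here, so the cache stays in sync.
    public void SetRange(byte[] buffer, int offset, int length)
    {
        Buffer = buffer;
        Offset = offset;
        Length = length;
        ResetHash();
    }

    // Same loop as the original GetHashCode, run once per mutation.
    public void ResetHash()
    {
        var ret = 23;
        for (var i = Offset; i &amp;lt; Offset + Length; i++)
        {
            ret = (ret &amp;lt;&amp;lt; 8) | Buffer[i];
        }
        hash = ret;
    }

    // Lookups that need the hash now only read a field.
    public override int GetHashCode()
    {
        return hash;
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;The trade-off is the one described above: a call site that writes to Buffer directly and forgets to call ResetHash leaves a stale hash behind.&lt;/p&gt;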

&lt;p&gt;Next, let us look at the DoCleanupIfNeeded method and see why it is taking so much time.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; DoCleanupIfNeeded()
{
    &lt;span class="kwrd"&gt;if&lt;/span&gt; (strings.Count &amp;lt;= limit)
        &lt;span class="kwrd"&gt;return&lt;/span&gt;;

    &lt;span class="rem"&gt;// to avoid frequent thrashing, we will remove the bottom 10% of the current pool in one go&lt;/span&gt;
    &lt;span class="rem"&gt;// that means that we will hit the limit fairly infrequently&lt;/span&gt;
    var list = &lt;span class="kwrd"&gt;new&lt;/span&gt; List&amp;lt;KeyValuePair&amp;lt;ByteStringOrByteBuffer, Data&amp;gt;&amp;gt;(strings);
    list.Sort((x, y) =&amp;gt; x.Value.Timestamp - y.Value.Timestamp);

    &lt;span class="kwrd"&gt;for&lt;/span&gt; (&lt;span class="kwrd"&gt;int&lt;/span&gt; i = 0; i &amp;lt; limit/10; i++)
    {
        strings.Remove(list[i].Key);                
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;From the profiler output, we can see that it is an anonymous method that is causing the holdup. That is pretty interesting, since this anonymous method is the sort lambda. I decided to see if the BCL can do better, and changed that to:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; DoCleanupIfNeeded()
{
    &lt;span class="kwrd"&gt;if&lt;/span&gt; (strings.Count &amp;lt;= limit)
        &lt;span class="kwrd"&gt;return&lt;/span&gt;;

    &lt;span class="rem"&gt;// to avoid frequent thrashing, we will remove the bottom 10% of the current pool in one go&lt;/span&gt;
    &lt;span class="rem"&gt;// that means that we will hit the limit fairly infrequently&lt;/span&gt;
    var toRemove = strings.OrderBy(x=&amp;gt;x.Value.Timestamp).Take(limit/10).ToArray();

    &lt;span class="kwrd"&gt;foreach&lt;/span&gt; (var valuePair &lt;span class="kwrd"&gt;in&lt;/span&gt; toRemove)
    {
        strings.Remove(valuePair.Key);                
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn’t really what I want, since I can’t take a dependency on .NET 3.5 in this code base, but it is a good perf test scenario. Let us see what the profiler output is after those two changes:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_6.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Whenminibenchmarksareimportant_D033/image_thumb_2.png" width="919" height="110" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;This is much more interesting, isn’t it?&lt;/p&gt;

&lt;p&gt;First, we can see that the call to ByteBuffer.GetHashCode went away, but we have a new one, ByteBuffer.ResetHash. Note, however, that ResetHash only took half as much time as the previous appearance of GetHashCode and that it is called only half as many times. I consider this a net win.&lt;/p&gt;

&lt;p&gt;Now, let us consider the second change that we made. Where previously we spent 11.1 seconds on sorting, we can see that we now spend 18 seconds, even though the number of calls is so much lower. That is a net loss, so we will revert that.&lt;/p&gt;

&lt;p&gt;And now, it is time for the only test that really matters: &lt;em&gt;is it fast enough&lt;/em&gt;? I am doing that by simply running the test scenario outside of the profiler and checking to see if its performance is satisfactory. So far, I think that it does meet my performance expectations. Therefore, I am going to finish with my micro optimizations and move on to more interesting things.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11260.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/12/30/when-mini-benchmarks-are-important.aspx</guid>
            <pubDate>Wed, 30 Dec 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/12/30/when-mini-benchmarks-are-important.aspx#feedback</comments>
            <slash:comments>29</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11260.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Fighting the profiler memory obesity</title>
            <link>http://ayende.com/Blog/archive/2009/12/29/fighting-the-profiler-memory-obesity.aspx</link>
            <description>&lt;p&gt;When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Speed in serializing / deserializing. &lt;/li&gt;    &lt;li&gt;Ability to intervene in the serialization process at a deep level. &lt;/li&gt;    &lt;li&gt;Size (which also affects speed). &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies to significantly reduce both the time &amp;amp; size of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables &amp;amp; flyweight objects).&lt;/p&gt;  &lt;p&gt;I started looking into the standard .NET serialization pipeline, but that was quickly ruled out. There are several reasons for that: first, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String gets persisted), and second, it is &lt;em&gt;far&lt;/em&gt; too slow for my usages.&lt;/p&gt;  &lt;p&gt;My test data started as ~900 MB of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5 GB footprint when processing is done). Persisting the in memory objects using BinaryFormatter resulted in a file whose size is 454 MB and whose deserialization I started before I started writing this post and which, at this point in time, has not completed yet. Currently the application (a simple cmd line test app that only does deserialization) takes 1.4 GB.&lt;/p&gt;  &lt;p&gt;So that was utterly out, and I set out to write my own serialization format. Since I wanted it to be fast, I couldn’t use reflection (BF app currently takes 1.6 GB), but by the same token, writing serialization by hand is a labor intensive, error prone method. That leaves aside the question of handling changes in the objects down the road, which is &lt;em&gt;not&lt;/em&gt; something that I would like to do.&lt;/p&gt;  &lt;p&gt;Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That would give me the benefits of no reflection, handle addition of new members to the serialized objects and would allow me to improve it incrementally (BF app now takes 2.2 GB, and I am getting ready to kill it). My first attempt at doing so, applying absolutely no optimization techniques, resulted in a 381 MB file and an 8 second parsing time.&lt;/p&gt;  &lt;p&gt;That is pretty good, but I wanted to do a bit more. &lt;/p&gt;  &lt;p&gt;Now, note that this is an implementation specific to a single use. After applying a simple string table optimization, the results of the serialization are two files: the string table is 10 MB in length, the actual saved data is 215 MB, and de-serialization takes ~10 seconds. Taking a look at what actually happened, it looked like the cost of maintaining the string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped that in favor of in memory only MRU string interning.&lt;/p&gt;  &lt;p&gt;Initial testing shows that this should be quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped from 4 GB to just 1.8 – 1.9 GB, and to 1.2 GB when processing is completed. And just using the application shows that the user level performance is &lt;em&gt;pretty&lt;/em&gt; good, even if I say so myself.&lt;/p&gt;  &lt;p&gt;There are additional options that I intend to take, but I’ll talk about them in a later post.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11259.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/12/29/fighting-the-profiler-memory-obesity.aspx</guid>
            <pubDate>Tue, 29 Dec 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/12/29/fighting-the-profiler-memory-obesity.aspx#feedback</comments>
            <slash:comments>18</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11259.aspx</wfw:commentRss>
        </item>
        <item>
            <title>The operation was successful, but the patient is still dead&amp;hellip; deferring the obvious doesn&amp;rsquo;t work</title>
            <link>http://ayende.com/Blog/archive/2009/12/26/the-operation-was-successful-but-the-patient-is-still-deadhellip.aspx</link>
            <description>&lt;p&gt;&lt;img style="display: inline; margin-left: 0px; margin-right: 0px" align="right" src="http://www.napoleonguide.com/images/maps_retrussia_borissov.gif" width="240" height="120" /&gt;So, I have a problem with the profiler. At the root of things, the profiler is managing a bunch of strings (SQL statements, stack traces, alerts, etc). When you start pouring large amounts of information into the profiler, the number of strings that it is going to keep in memory is going to increase, until you get to say hello to OutOfMemoryException.&lt;/p&gt;  &lt;p&gt;During my attempt to resolve this issue, I figured out that string interning was likely to be the most efficient way to resolve my problem. After all, most of the strings that I have to display are repetitive. String interning has one problem: it exists forever. I spent a few minutes creating a garbage collectible method of doing string interning. In my first test, which was focused on just interning stack traces, I was able to reduce memory consumption by 50% (about 800 MB, post GC) and it is fully garbage collectible, so it won’t hang around forever.&lt;/p&gt;  &lt;p&gt;Sounds good, right?&lt;/p&gt;  &lt;p&gt;Well, not really. While it is an interesting thought experiment, and interning is a great way of handling things, it only masks the problem, and that only for a short amount of time. The problem is still an open ended set of data that I need to deal with, and while there is a whole bunch of stuff that I can do to delay the inevitable, defeat is pretty much ensured. The proper way of doing that is not trying to use hacks to reduce memory usage, but to deal with the root cause: keeping everything in memory.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11256.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/12/26/the-operation-was-successful-but-the-patient-is-still-deadhellip.aspx</guid>
            <pubDate>Sat, 26 Dec 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/12/26/the-operation-was-successful-but-the-patient-is-still-deadhellip.aspx#feedback</comments>
            <slash:comments>14</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11256.aspx</wfw:commentRss>
        </item>
        <item>
            <title>UberProf performance improvements, nothing helps if you are stupid</title>
            <link>http://ayende.com/Blog/archive/2009/12/25/uberprof-performance-improvements-nothing-helps-if-you-are-stupid.aspx</link>
            <description>&lt;p&gt;The following change took a while to figure out, but it was a &lt;em&gt;huge&lt;/em&gt; performance benefit (think, 5 orders of magnitude). The code started as:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;readonly&lt;/span&gt; Regex startOfParametersSection = 
            &lt;span class="kwrd"&gt;new&lt;/span&gt; Regex(&lt;span class="str"&gt;@"(;\s*)[@:?]p0 ="&lt;/span&gt;, RegexOptions.Compiled);&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;And the &lt;em&gt;optimization&lt;/em&gt; is:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;readonly&lt;/span&gt; Regex startOfParametersSection = 
            &lt;span class="kwrd"&gt;new&lt;/span&gt; Regex(&lt;span class="str"&gt;@"(;\s*)[@:?]p0 ="&lt;/span&gt;, RegexOptions.Compiled);&lt;/pre&gt;
&lt;/blockquote&gt;
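
&lt;p&gt;To make the difference between the two declarations concrete, here is a small sketch. The class names InstanceRegexCommand and StaticRegexCommand are hypothetical, and the fields are public only so that they can be inspected from outside. Only the field name and the pattern come from the code above:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;using System.Text.RegularExpressions;

// Hypothetical command class with an instance field.
public class InstanceRegexCommand
{
    // Constructed (and compiled) again for every new command object.
    public readonly Regex startOfParametersSection =
        new Regex(@"(;\s*)[@:?]p0 =", RegexOptions.Compiled);
}

// Hypothetical command class with a static field.
public class StaticRegexCommand
{
    // Constructed once for the type, then shared by every command object.
    public static readonly Regex startOfParametersSection =
        new Regex(@"(;\s*)[@:?]p0 =", RegexOptions.Compiled);
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;Creating two InstanceRegexCommand objects yields two distinct Regex instances, each paying the construction cost of RegexOptions.Compiled, while every StaticRegexCommand shares the single instance held by the type.&lt;/p&gt;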

&lt;p&gt;The story behind this is interesting. This piece of code (and a few others like it) used to be in a class that has a singleton lifestyle. At some point, it was refactored into a command class that is created often, which obviously had… a drastic effect on the system performance.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11255.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/12/25/uberprof-performance-improvements-nothing-helps-if-you-are-stupid.aspx</guid>
            <pubDate>Fri, 25 Dec 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/12/25/uberprof-performance-improvements-nothing-helps-if-you-are-stupid.aspx#feedback</comments>
            <slash:comments>9</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11255.aspx</wfw:commentRss>
        </item>
    </channel>
</rss>