Ayende @ Rahien

Memory obesity and the curse of the string

I believe that I have mentioned that my major problem with the memory usage in the profiler is with strings. The profiler does a lot of work with strings; queries, stack traces, log messages, etc. all create quite a lot of strings that the profiler needs to inspect, analyze and finally turn into the final output.

Internally, the process looks like this:

image

In my previous post, I talked about the two major changes that I have made so far to reduce memory usage, and you can see them below. I introduced string interning in the parsing stage and serialized the model to disk so we wouldn't have to keep it all in memory, which resulted in the following structure:

image

However, while those measures helped tremendously, there is still more that I can do. The major problem with string interning is that you first have to have the string in order to check it in the interned table. That means that while you save on memory in the long run, in the short run, you are still allocating a lot of strings. My next move is to handle interning directly from buffered input, skipping the need to allocate memory for a string to use as the key for interning.
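Just to make the idea concrete, here is a rough sketch of what interning keyed by a raw buffer segment can look like. This is not the profiler's code; the ByteSlice type and the comparer are mine, but the important part is that a lookup hit never allocates a string:

// Sketch only, not the actual UberProf / Protocol Buffers code: intern strings
// keyed by a raw buffer segment so that a cache hit does not allocate a string.
using System;
using System.Collections.Generic;
using System.Text;

public struct ByteSlice
{
    public byte[] Buffer;
    public int Offset;
    public int Length;
}

public class BufferInterner
{
    private class SliceComparer : IEqualityComparer<ByteSlice>
    {
        public bool Equals(ByteSlice x, ByteSlice y)
        {
            if (x.Length != y.Length)
                return false;
            for (var i = 0; i < x.Length; i++)
            {
                if (x.Buffer[x.Offset + i] != y.Buffer[y.Offset + i])
                    return false;
            }
            return true;
        }

        public int GetHashCode(ByteSlice slice)
        {
            var hash = 23;
            for (var i = slice.Offset; i < slice.Offset + slice.Length; i++)
                hash = hash * 31 + slice.Buffer[i];
            return hash;
        }
    }

    private readonly Dictionary<ByteSlice, string> strings =
        new Dictionary<ByteSlice, string>(new SliceComparer());

    public string Intern(byte[] buffer, int offset, int length)
    {
        var key = new ByteSlice { Buffer = buffer, Offset = offset, Length = length };
        string value;
        if (strings.TryGetValue(key, out value))
            return value; // hit: no string was allocated for the lookup

        // miss: materialize the string once, and keep a private copy of the bytes
        value = Encoding.UTF8.GetString(buffer, offset, length);
        var copy = new byte[length];
        Array.Copy(buffer, offset, copy, 0, length);
        strings.Add(new ByteSlice { Buffer = copy, Offset = 0, Length = length }, value);
        return value;
    }
}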

Doing that has been a bit hard, mostly because I had to go deep into the serialization engine that I use (Protocol Buffers) and add that capability. It is also fairly complex to handle something like this without having to allocate a search key for the strings table. But once I did that, I noticed a few things.

First, while memory increased during the operation, there weren't any jumps & drops; that is, we couldn't see any periods in which the GC kicked in and released a lot of garbage. Second, memory consumption was relatively low throughout the operation. Before optimizing the memory usage, we were talking about 4 GB for processing and 1.5 GB for the final result; after the previous optimization it was 1.9 GB for processing and 1.3 GB for the final result. But after this optimization, we have a fairly simple upward climb to 1.3 GB. You can see the memory consumption during processing in the following chart; memory used is in GB on the Y axis.

image

As you can probably tell, I am much happier with the green line than the other. Not just because it takes less memory in general, but because it is much more predictable, which means that the application's behavior is going to be easier to reason about.

But this optimization raises a question: since I just introduced interning at the serialization level, do I really need interning at the streaming level as well? On the face of it, it looks like an unnecessary duplication. Indeed, removing the string interning that we did at the streaming level reduced overall memory usage from 1.3 GB to 1.15 GB.

Overall, I think this is a nice piece of work.

When mini benchmarks are important

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth

I have expressed my dislike for micro benchmarks in the past, and in general, I still have this attitude, but sometimes, you really care.

A small note: while a lot of the namespaces you are going to see are Google.ProtocolBuffers, this represents my private fork of that library, customized to fit UberProf's needs. Some of those things aren't generally applicable (like string interning at the serialization level), so please don't try to project from the content of this post to the library itself.

Let me show you what I mean:

image

Those are the profiler results for this piece of code:

private static bool StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer)
{
    if(byteString.Length != byteBuffer.Length)
        return false;
    for (int i = 0; i < byteString.Length; i++)
    {
        if(byteString[i] != byteBuffer.Buffer[byteBuffer.Offset+i])
            return false;
    }
    return true;
}

This looks pretty simple, right?

Now, it is important to understand that this isn't some fake benchmark that I contrived; these are the profiling results from testing a real world scenario. In general, methods such as Equals or GetHashCode, or anything that they call, are likely to be called a lot of times, so paying attention to their performance is something that you should think about.

There are a couple of very easy things that I can do to make this faster: change the call to the ByteString indexer (which shows up as get_Item in the profiler results) to a direct array access, and consolidate the calls to the ByteString.Length property.

After applying those two optimizations, we get the following code:

private static bool StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer)
{
    var strLen = byteString.Length;
    if(strLen != byteBuffer.Length)
        return false;
    for (int i = 0; i < strLen; i++)
    {
        if(byteString.bytes[i] != byteBuffer.Buffer[byteBuffer.Offset+i])
            return false;
    }
    return true;
}

And this profiler result:

image

You can see that this simple change resulted in a drastic improvement to the StringEqualsToBuffer method. As it stands now, I don't really see a good way to optimize this any further, so I am going to look at the other stuff that showed up. Let us take a look at ByteBuffer.GetHashCode() now:

public override int GetHashCode()
{
    var ret = 23;
    for (var i = Offset; i < Offset+Length; i++)
    {
        ret = (ret << 8) | Buffer[i];
    }
    return ret;
}

The problem is that I don't really see a way to optimize that, so instead, I am going to cache the hash code in a field. There is a wrinkle here, in that ByteBuffer is mutable, but I can handle that by forcing all call sites that change it to call a method that forces hash recalculation. Note how different this decision is from the usual encapsulation that I would generally want. Placing additional burdens on call sites is a Bad Thing, but by doing so, I think that I can save quite significantly on the hash code calculation overhead.
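Here is a sketch of what I mean; the field and method names are mine and not necessarily what ends up in the real ByteBuffer:

// Sketch of the caching idea: the hash is computed eagerly whenever the buffer
// changes, and GetHashCode just returns the cached value.
public class ByteBuffer
{
    public byte[] Buffer;
    public int Offset;
    public int Length;

    private int hash;

    // Every call site that mutates Buffer/Offset/Length must call this.
    public void ResetHash()
    {
        var ret = 23;
        for (var i = Offset; i < Offset + Length; i++)
        {
            ret = (ret << 8) | Buffer[i];
        }
        hash = ret;
    }

    public override int GetHashCode()
    {
        return hash;
    }
}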

Next, let us look at the DoCleanupIfNeeded method and see why it is taking so much time.

private void DoCleanupIfNeeded()
{
    if (strings.Count <= limit)
        return;

    // to avoid frequent thrashing, we will remove the bottom 10% of the current pool in one go
    // that means that we will hit the limit fairly infrequently
    var list = new List<KeyValuePair<ByteStringOrByteBuffer, Data>>(strings);
    list.Sort((x, y) => x.Value.Timestamp - y.Value.Timestamp);

    for (int i = 0; i < limit/10; i++)
    {
        strings.Remove(list[i].Key);                
    }
}

From the profiler output, we can see that it is an anonymous method that is causing the holdup. That is pretty interesting, since this anonymous method is the sort lambda. I decided to see if the BCL could do better, and changed the code to:

private void DoCleanupIfNeeded()
{
    if (strings.Count <= limit)
        return;

    // to avoid frequent thrashing, we will remove the bottom 10% of the current pool in one go
    // that means that we will hit the limit fairly infrequently
    var toRemove = strings.OrderBy(x=>x.Value.Timestamp).Take(limit/10).ToArray();

    foreach (var valuePair in toRemove)
    {
        strings.Remove(valuePair.Key);                
    }
}

This isn't really what I want, since I can't take a dependency on .NET 3.5 in this code base, but it is a good perf test scenario. Let us see what the profiler output is after those two changes:

image

This is much more interesting, isn’t it?

First, we can see that the call to ByteBuffer.GetHashCode went away, but we have a new one, ByteBuffer.ResetHash. Note, however, that ResetHash only took half as much time as the previous appearance of GetHashCode and that it is called only half as many times. I consider this a net win.

Now, let us consider the second change that we made. Where previously we spent 11.1 seconds on sorting, we can see that we now spend 18 seconds, even though the number of calls is so much lower. That is a net loss, so we will revert that change.

And now it is time for the only test that really matters: is it fast enough? I am checking that by simply running the test scenario outside of the profiler and seeing if its performance is satisfactory. So far, I think that it does meet my performance expectations, so I am going to finish with my micro optimizations and move on to more interesting things.

Fighting the profiler memory obesity

When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:

  • Speed in serializing / deserializing.
  • Ability to intervene in the serialization process at a deep level.
  • Size (which also affects speed).

The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies to significantly reduce both the time and size of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables & flyweight objects).
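To illustrate the string table part of that, here is a minimal sketch of the write side; the type is mine, not the profiler's:

// Sketch: instead of writing the same string again and again, write it once
// and then only write its index in the string table.
using System.Collections.Generic;
using System.IO;

public class StringTableWriter
{
    private readonly Dictionary<string, int> stringTable = new Dictionary<string, int>();
    private readonly BinaryWriter stringsOutput;

    public StringTableWriter(Stream stringsStream)
    {
        stringsOutput = new BinaryWriter(stringsStream);
    }

    // Returns the index to embed in the data file instead of the full string.
    public int Write(string value)
    {
        int index;
        if (stringTable.TryGetValue(value, out index))
            return index;

        index = stringTable.Count;
        stringTable.Add(value, index);
        stringsOutput.Write(value); // the string itself is persisted exactly once
        return index;
    }
}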

I started looking into the standard .NET serialization pipeline, but that was quickly ruled out. There are several reasons for that: first, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String gets persisted), and second, it is far too slow for my purposes.

My test data started as ~900 MB of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5 GB footprint when processing is done). Persisting the in memory objects using BinaryFormatter resulted in a 454 MB file, whose deserialization I started before I began writing this post and which has not completed yet. Currently, the application (a simple command line test app that only does deserialization) takes 1.4 GB.

So that was utterly out, and I set out to write my own serialization format. Since I wanted it to be fast, I couldn't use reflection (the BinaryFormatter app currently takes 1.6 GB), but by the same token, writing serialization code by hand is a labor intensive, error prone method. That leaves aside the question of handling changes in the objects down the road, which is not something that I would like to deal with by hand.

Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That would give me the benefits of no reflection, handle the addition of new members to the serialized objects, and allow me to incrementally improve how the serialization works (the BinaryFormatter app now takes 2.2 GB, and I am getting ready to kill it). My first attempt at doing so, applying absolutely no optimization techniques, resulted in a 381 MB file and an 8 second parsing time.

That is pretty good, but I wanted to do a bit more.

Now, note that this is an implementation specific to a single use. After applying a simple string table optimization, the results of the serialization are two files: the string table is 10 MB in length, the actual saved data is 215 MB, and de-serialization takes ~10 seconds. Taking a look at what actually happened, it looked like the cost of maintaining the string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped it in favor of in-memory-only MRU string interning.

Initial testing shows that this should be quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped from 4 GB to just 1.8 – 1.9 GB, and to 1.2 GB when processing is completed. And just using the application shows that the user level performance is pretty good, even if I say so myself.

There are additional options that I intend to take, but I’ll talk about them in a later post.

A persistence problem, irony, your name is…

The major goal that I had in mind for the profiler was online development usage. That is, do something, check the profiler, clear it, do something else, etc. One of the things that I am finding out is that people use it a lot more as a dumping ground. They push a lot of information into it, and then want to sift through that and look at how the application behaves overall, not just at a single scenario.

Surprisingly, it works quite well, especially with the performance profiling work that we just went through. One scenario, however, remains stubbornly outside what the profiler can currently do. When people talk to me about it, they call it load test profiling, or integration test profiling. This is when you pour literally gigabytes of information into the profiler. And it works, provided you have enough memory, that is.

If you don’t have enough memory, however, you get to say hello to OutOfMemoryException.

When I dove into this problem, I was sure that I would simply find that there was something stupid that I was doing wrong, and that as soon as I figured it out, everything would be all right. I actually did find a few places where I could optimize memory usage (reducing lambda usage in favor of cached delegates to named methods, for example), but that only shaved a few percentage points. Trying out string interning actually resulted in a huge saving in memory, but I feel that this is just a stop gap measure. I have to persist the data to disk, rather than keep it all in memory.
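The cached delegate change is simple enough to show with a generic illustration (this is not the profiler's code, just the pattern):

// Sketch: a capturing lambda allocates a closure and a delegate on every call,
// while a delegate to a named static method can be created once and cached.
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatementFilters
{
    // The lambda captures 'prefix', so each call allocates a closure + delegate.
    public static IEnumerable<string> WithLambda(IEnumerable<string> statements, string prefix)
    {
        return statements.Where(s => s.StartsWith(prefix));
    }

    // Created once, reused for every call.
    private static readonly Func<string, bool> IsSelect = IsSelectStatement;

    private static bool IsSelectStatement(string statement)
    {
        return statement.StartsWith("SELECT");
    }

    public static IEnumerable<string> WithCachedDelegate(IEnumerable<string> statements)
    {
        return statements.Where(IsSelect);
    }
}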

The need to persist to disk led me to a very interesting problem. What I need is basically a key value store. Interestingly enough, I have already written one. The problem is that while it would work great right now, I have future plans which make depending on Esent an… unwise choice. Basically, I would like to be able to run on Mono and/or Silverlight, and that rules out using a Windows only / full trust native dll. As they say, a bummer. That requirement also rules out the various embedded databases as well.

I considered ignoring this requirement and handling it when the time comes, but I decided that since this is going to majorly affect how I am going to use the store, I can't really afford to delay that decision. With that in mind, I set out to figure out what I needed:

  • A fast way to store / retrieve session information along with its associated data (stack traces, alerts, statistics, etc).
  • Ability to store, at a minimum, tens of thousands of items of variable size.
  • A single file (or at least, very few files) – cannot afford to have one item per file (it usually kills the FS).
  • Support updates without re-writing the entire file.
  • Usable from Mono & Silverlight, or easily portable to them.

With that in mind, I decided to take a look at what is already out there.

  • C#-Sqlite looked like it might be the ticket. It is a C# port of the Sqlite database. Unfortunately, I took a look at the code base; it is a straight port to C#, and the code gave me the willies. I don't feel that I can trust it, and at any rate, it would require me to write a lot of data access code, which is a thing that I am trying to avoid :-). (And no, you can't use NHibernate with that version, you would have to port the ADO.Net driver as well, and then you wouldn't be able to use it in Silverlight anyway.)
  • Caching Application Block – because it looked like it had a persistence solution already. That solution is based on several files per item, which is not acceptable. I already tried that route in the past; it is a good way to kill your file system.
  • SilverDB – this is an interesting code base, and a good solution for the problem it is meant to solve (saving relatively small amounts of information to disk). However, I need to save large amounts of information, and I need to handle a lot of updates. SilverDB re-writes the entire file whenever it saves. That has too high a perf cost for my needs.
  • TheCache – I took only a brief look here, but it looks like it is too heavily focused on being a cache to be useful for my purposes.

In fact, given my requirements, it might be interesting to see what I don’t need.

  • It doesn't need to be reliable.
  • It doesn't need to be thread safe.
  • Saving is just a way to free memory.

Given that, I decided to go with the following method:

  • Custom serialization format, allowing me to save space & time using file & memory based string interning.
  • No persistent file index; the index can be kept directly in memory.
  • Persisted string interning file.

As you can see, this is a very tailored solution, not something that would be generally useful, but I have great hopes for this.
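To give a sense of the shape I am aiming for, here is a bare-bones sketch under those assumptions: an append-only data file with the index kept purely in memory. It is not the actual implementation:

// Sketch: append-only single-file store with an in-memory index of key -> file position.
using System.Collections.Generic;
using System.IO;

public class SimpleSessionStore
{
    private readonly Dictionary<string, long> index = new Dictionary<string, long>();
    private readonly FileStream file;

    public SimpleSessionStore(string path)
    {
        file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
    }

    public void Put(string key, byte[] data)
    {
        file.Seek(0, SeekOrigin.End);
        index[key] = file.Position; // an update simply points the key at the newest copy
        var writer = new BinaryWriter(file);
        writer.Write(data.Length);
        writer.Write(data);
        writer.Flush();
    }

    public byte[] Get(string key)
    {
        long position;
        if (index.TryGetValue(key, out position) == false)
            return null;

        file.Seek(position, SeekOrigin.Begin);
        var reader = new BinaryReader(file);
        var length = reader.ReadInt32();
        return reader.ReadBytes(length);
    }
}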

String Interning: The Garbage Collectible way

Since I know people will want the actual implementation, here is a simple way of handling string interning that allows you to GC the results at some point. The issue is simple: I want to intern strings (so a string value is only held once throughout my entire app), but I don't want to be stuck with them if the profiler state has been cleared, for example.

public class GarbageCollectibleStringInterning
{
    private static IDictionary<string,string> strings = new Dictionary<string,string>();

    private static ReaderWriterLockSlim locker = new ReaderWriterLockSlim();
    
    public static void Clear()
    {
        locker.EnterWriteLock();
        try
        {
            strings.Clear();
        }
        finally
        {
            locker.ExitWriteLock();
        }
    }
    
    public static string Intern(string str)
    {
        string val;
        
        locker.EnterReadLock();
        try
        {
            if(strings.TryGetValue(str, out val))
                return val;
        }
        finally
        {
            locker.ExitReadLock();
        }
        
        locker.EnterWriteLock();
        try
        {
            if(strings.TryGetValue(str, out val))
                return val;
                
            strings.Add(str,str);
            return str;
        }
        finally
        {
            locker.ExitWriteLock();
        }
    }
}

This is a fairly simple implementation; a more complex one might try to dynamically respond to GC notifications, but I think that this would be useful enough on its own.

Using this approach, I was able to reduce used memory in the profiler by over 50%. I gave up on that approach, however, because while it may reduce the memory footprint, it doesn't actually solve the problem, only delay it.

The operation was successful, but the patient is still dead… deferring the obvious doesn’t work

So, I have a problem with the profiler. At the root of things, the profiler is managing a bunch of strings (SQL statements, stack traces, alerts, etc). When you start pouring large amount of information into the profiler, the number of strings that it is going to keep in memory is going to increase, until you get to say hello to OutOfMemoryException.

During my attempt to resolve this issue, I figured out that string interning was likely to be the most efficient way to solve my problem. After all, most of the strings that I have to display are repetitive. String interning has one problem: the interned strings exist forever. I spent a few minutes creating a garbage collectible method of doing string interning. In my first test, which was focused on just interning stack traces, I was able to reduce memory consumption by 50% (about 800 MB, post GC), and it is fully garbage collectible, so it won't hang around forever.

Sounds good, right?

Well, not really. While it is an interesting thought experiment, interning only masks the problem, and only for a short amount of time. The problem is still an open ended set of data that I need to deal with, and while there is a whole bunch of stuff that I can do to delay the inevitable, defeat is pretty much assured. The proper way of handling this is not to use hacks to reduce memory usage, but to deal with the root cause: keeping everything in memory.

Responding to the "how EF is better than NH" commentary…

My last post caused quite a bit of furor, and I decided that I wanted to respond to all the comments in a single place.

  • EF has a designer, NHibernate does not.
    This is actually false; NHibernate has multiple designers available for it: Active Writer (free, OSS, integrated into VS), Visual NHibernate (commercial, beta) and LLBLGen (commercial, forthcoming in early 2010). I would say that using a designer with NHibernate isn't something that is very common; most people tend to either code gen the entire model and tweak that (a minority) or hand craft the model & mapping. That isn't for lack of options, it is because it is simply more efficient to do so in most cases.
  • EF is from Microsoft.
    Yes, it is. That is a good point, because it reflects on support & documentation. Unfortunately, the fact that this was one of the most prominent reasons quoted in the replies is also interesting. It says a lot about the relative quality of the products themselves. Another issue with a data access framework from Microsoft is that history teaches us that few data access methods from Microsoft survive the two year mark, and none survive the five year mark. RDO, ADO, ADO.Net, Object Spaces, Linq To Sql, Entity Framework – just to name a few.
  • Downstream integration.
    That was mentioned a few times, as in integration with data services, WCF RIA, etc. I want to divide my comment into two parts. First, everything that exists right now can be used with NHibernate. Second, EF is supposed to come with reporting / BI tools in the future. Currently, everything that came up was easily usable with NHibernate, so I am not really worried about it.
  • In the Future it will be awesome.
    Some people pointed out that Microsoft is able to allocate more resources for EF than an OSS project can. That is true, to a point. One of the problems that Microsoft is facing is that it has to pay a huge amount of taxes on the way to creating a releasable product. That means that it is typically easier for an OSS project to release faster than a comparable Microsoft project.

So far, I find it really interesting that no one came up with any concrete feature that EF can do that NHibernate can’t. I am going to let you in on a secret, when EF was announced, there were exactly two things that it could do that NHibernate could not. Both were fixed before EF 1.0 shipped, just because that fact annoyed me.

Are you telling me that no such thing exists for the new version?

UberProf performance improvements, nothing helps if you are stupid

The following change took a while to figure out, but it was a huge performance benefit (think, 5 orders of magnitude). The code started as:

private readonly Regex startOfParametersSection = 
            new Regex(@"(;\s*)[@:?]p0 =", RegexOptions.Compiled);

And the optimization is:

private static readonly Regex startOfParametersSection = 
            new Regex(@"(;\s*)[@:?]p0 =", RegexOptions.Compiled);

The story behind this is interesting. This piece of code (and a few others like it) used to be in a class that had a singleton lifestyle. At some point, it was refactored into a command class that is created often, which obviously had a… drastic effect on the system performance.

UberProf performance improvements, beware of linq query evaluation

This is a diff from the performance improvement effort of UberProf. The simple addition of .ToList() has significantly improved the performance of this function:

image

Why?

Before adding the ToList(), each time we ran our aggregation functions on the statements enumerable, we would force re-evaluation of the filtering (which can be quite expensive). By adding ToList(), I am now making the filtering run only once.
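If the difference isn't obvious, here is a contrived illustration of deferred evaluation (nothing to do with the actual profiler code):

// Sketch: a deferred LINQ query re-runs the (expensive) filter every time it is
// enumerated; ToList() materializes the result so the filter runs exactly once.
using System;
using System.Linq;

class DeferredEvaluationDemo
{
    static bool ExpensiveFilter(int value)
    {
        Console.WriteLine("filtering " + value); // visible proof of re-evaluation
        return value % 2 == 0;
    }

    static void Main()
    {
        var numbers = Enumerable.Range(0, 5);

        var deferred = numbers.Where(ExpensiveFilter);
        var count = deferred.Count();   // the filter runs here...
        var sum = deferred.Sum();       // ...and runs again here

        var materialized = numbers.Where(ExpensiveFilter).ToList();
        count = materialized.Count;     // no re-evaluation
        sum = materialized.Sum();       // still no re-evaluation
        Console.WriteLine(count + " " + sum);
    }
}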

There is another pretty obvious performance optimization that can be done here, can you see it? And why did I choose not to implement it?

What can EF 4.0 do that NHibernate can’t?

I am trying to formulate my formal response to the "NH vs. EF" question, and while I have a pretty solid draft ready, I found out that my response is pretty biased. I am not happy with that, so I wanted to pose this as a real question.

So far, I came up with:

  • EF 4.0 has a better Linq provider than the current NHibernate implementation. This is something being actively worked on, and NH 3.0 will close this gap.

My sense of fairness says that this can’t be it, so please educate me.


UberProf performance improvements, or: when O(N^3 + N) is not fast enough

While working on improving the performance of the profiler, I got really annoyed. The UI times for updating sessions with a large number of statements were simply unacceptable. We had already virtualized the list UI, so it couldn't be that. I started tracking it down, and it finally came down to the following piece of code:

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    foreach (var statementSnapshot in snapshots)
    {
        var found = (from model in unfilteredStatements
                     where model.Id == statementSnapshot.StatementId
                     select model).FirstOrDefault();

        if (found == null)
        {
            unfilteredStatements.Add(
                new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false});
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (snapshots.Any())
    {
        hasPolledForStatements = true;
        StatementsChanged(this, 
            new StatementsChangedEventArgs {HasPolledForStatments = hasPolledForStatements});
    }
}

This looks pretty innocent, right? At a guess, what is the performance characteristics of this code?

It isn't O(N), that is for sure. Look at the linq query: it will perform a linear search in unfilteredStatements, which is an O(N) operation. *

At first glance, it looks like O(N*N), right? Wrong!

unfilteredStatements is an ObservableCollection, and guess what is going on in the CollectionChanged event? A statistics function that does yet another O(N) operation for every single added statement.

Finally, we have the StatementsChanged handler, which will also perform an O(N) operation, but only once. So this innocent looking method is actually the O(N^3 + N) from the title. Here is the optimized version:

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    var hasStatements = false;
    foreach (var statementSnapshot in snapshots)
    {
        IStatementModel found;
        statementsById.TryGetValue(statementSnapshot.StatementId, out found);
        hasStatements = true;

        if (found == null)
        {
            // create the model once and register it, so the next snapshot for this statement gets an O(1) hit
            var statementModel = new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false};
            unfilteredStatements.Add(statementModel);
            statementsById.Add(statementSnapshot.StatementId, statementModel);
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (hasStatements == false) 
        return;
    HandleStatementsOrFiltersChanging(true);
}

The changes here are subtle. First, we killed off the O(N) linq query in favor of an O(1) lookup in a hashtable. Second, unfilteredStatements is now a simple List<T>, and HandleStatementsOrFiltersChanging is the one responsible for notifications. Just these two changes were sufficient to take us from the O(N^3+N) that we had before to a simple O(N+N) (because we still run some statistical stuff at the end), which collapses to a simple O(N).

Once that is done, showing sessions with 50,000 statements in them became instantaneous.

* Yes, it is supposed to be O(M) because the unfilteredStatements.Count is different than snapshots.Count, but I am simplifying here to make things easier.

UberProf performance improvements

I mentioned before that I ran into some performance problems; I thought it would be interesting to see how I solved them.

The underlying theme was finding O(N) operations and eradicating them. In almost all cases, the resulting code is more complex, but significantly faster.

Old code:
public static class TimestampExtensions
{
    public static void AddInPlace<T>(this IList<T> self, T model)
        where T : IHaveTimestamp
    {
        int index = self.Count;
        for (var i = 0; i < self.Count; i++)
        {
            if (self[i].Timestamp > model.Timestamp)
            {
                index = i;
                break;
            }
        }
        self.Insert(index, model);
    }    
}

New code:
public static class TimestampExtensions
{
    public static void AddInPlace(this IList<SqlStatement> self, SqlStatement model)
    {
        var timestamp = model.Timestamp;
        var index = self.Count;
        if (index > 0 && timestamp >= self[index - 1].Timestamp)
        {
            self.Add(model);
            return;
        }
        for (var i = 0; i < self.Count; i++)
        {
            if (self[i].Timestamp < timestamp) 
                continue;
            index = i;
            break;
        }
        self.Insert(index, model);
    }    
}

Here we optimize for the common case, where we add the statement to the end of the list. We also changed from a generic method with an interface constraint to the actual type used. That allows the JIT to apply additional optimizations. That method alone was responsible for almost 10 minutes in my test runs.

Old code:
var statement = session.Statements
    .Reverse()
    .Where(x => x.SessionId.Timestamp <= queryDuration.SessionId.Timestamp)
    .FirstOrDefault();

New code:
var statement = session.Statements.LastOrDefault();

session.Statements is already ordered by the statements' timestamps. The code performs slightly differently here, and it is less robust to async message streaming (when events that happened sequentially arrive in a different order than they occurred), but that is rare enough that I feel the performance boost is acceptable.

Old code:

public override void AfterAttachingToSession(
    SessionInformation sessionInformation, SqlStatement statement)
{
    var isSelectStatement = statement.IsSelectStatement;
    if (isSelectStatement == false)
        return;

    if (statement.RawSql == mySqlSelectLastIdentity)
        return;

    int countOfStatementsWithIdenticalSql = sessionInformation.Statements
        .Count(x => x.RawSql == statement.RawSql);

    if (countOfStatementsWithIdenticalSql <
        configuration.MaxRepeatedStatementsForSelectNPlusOne)
        return;
    statement.AcceptAlert(AlertInformation.SelectNPlusOne());
}

New code:
public int GetCountOfStatementsByRawSql(string rawSql)
{
    int value;
    countsByRawSql.TryGetValue(rawSql, out value);
    return value;
}

public void AddStatement(SqlStatement statement)
{
    int count;
    if (countsByRawSql.TryGetValue(statement.RawSql, out count) == false)
        count = 0;
    countsByRawSql[statement.RawSql] = count + 1;
    
    statements.AddInPlace(statement);
    StatementsChanged();
}


public override void AfterAttachingToSession(
    SessionInformation sessionInformation, SqlStatement statement)
{
    var isSelectStatement = statement.IsSelectStatement;
    if (isSelectStatement == false)
        return;

    if (statement.RawSql == mySqlSelectLastIdentity)
        return;

    if (sessionInformation.GetCountOfStatementsByRawSql(statement.RawSql) < 
        configuration.MaxRepeatedStatementsForSelectNPlusOne)
        return;
    statement.AcceptAlert(AlertInformation.SelectNPlusOne());
}

The old code, at the top, did the counting manually every time. In the new code, the two helper methods at the top do the calculation up front, and then it is just a matter of reading it at the appropriate times.

Old code:
protected void ClearStackTraceFromInfrastructureFrameworks(StackTraceInfo stackTrace)
{
    if (configuration.InfrastructureNamespaces == null)
        return;
    
    stackTrace.Frames =stackTrace.Frames.Where(
        frame =>
        {
            var isInfrastructureNamespace = 
                configuration.InfrastructureNamespaces.Any(
                ns => frame.Namespace != null && frame.Namespace.StartsWith(ns)
                );

            return isInfrastructureNamespace == false ||
                   configuration.NonInfrastructureNamespaces.Any(
                    ns => frame.Namespace != null && frame.Namespace.StartsWith(ns)
                    );
        }).ToArray();
}

New code:
public StackTraceProcessor(ProfilerConfiguration configuration)
{
    this.configuration = configuration;
    var sb = new StringBuilder();
    foreach (var infrastructureNamespace in 
        configuration.InfrastructureNamespaces ?? new List<string>())
    {
        sb.Append("(^")
            .Append(Regex.Escape(infrastructureNamespace))
            .Append(")|");
    }
    sb.Length = sb.Length - 1;
    namespacesToClear = new Regex(sb.ToString(),
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
    sb.Length = 0;

    foreach (var infrastructureNamespace in 
        configuration.NonInfrastructureNamespaces ?? new List<string>())
    {
        sb.Append("(^")
            .Append(Regex.Escape(infrastructureNamespace))
            .Append(")|");
    }
    sb.Length = sb.Length - 1;
    namespacesToKeep = new Regex(sb.ToString(), 
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
}

protected void ClearStackTraceFromInfrastructureFrameworks(StackTraceInfo stackTrace)
{
    if (configuration.InfrastructureNamespaces == null)
        return;

    stackTrace.Frames =
        stackTrace.Frames.Where(IsValidStackFrameMethod).ToArray();
}

private bool IsValidStackFrameMethod(StackTraceFrame frame)
{
    if (frame.Namespace == null)
        return true;
    if(namespacesToClear.IsMatch(frame.Namespace) == false)
        return true;
    return namespacesToKeep.IsMatch(frame.Namespace);
}

Interestingly enough, this code produced much better runtime performance, simply because, while it is more complex, the number of iterations that happen is so much smaller.

Overall, the changes resulted in an order of magnitude or two of difference in processing messages through the profiler pipeline.

Effectus: Isolated features

Continuing in my exploration of the Effectus code base, which originated from my Building a Desktop To-Do Application with NHibernate article on MSDN Magazine, I wanted to talk today about separating features into individual pieces in the application.

Each feature in the Effectus application is built as an isolated unit, that has little to do with any other features. This is very clear when you look at how they are organized:

image 

Sidenote: in each feature, the component parts are named the same (Presenter, View, Model). This is intentional, since it makes it that much harder to mix stuff from other features.

Every feature has a pretty rigid structure, defined by the application, and trying to reuse stuff between features is frowned upon and avoided, even if doing so could reduce duplication. As a good example, the Models for the CreateNew and Edit features are identical, but they are two different classes with no attempt to merge them.

The reason for this is quite simple: treating each feature as an isolated unit brings us great benefits. We can have different developers working on different features without the chance of stepping on each other's toes, it is very hard for one feature to break another, deploying new features is easier to handle, etc. Trying to apply DRY and consolidate common pieces of code would introduce dependencies between features, and that is a bad idea.

But what about communication between features? What about when one feature needs to call another?

As it turned out, it is quite easy to turn a lot of the inter-feature communication into indirect calls using pub/sub. We gain independence from dependencies using pub/sub, while maintaining the feel of a well integrated product for the user. For the rare cases of one feature calling another, I created a simple Call("FeatureName") mechanism (in the code, this is the Presenters.Show(name) method) that allows us to explicitly invoke another feature, maybe with some initial arguments. A more complete implementation would handle even that using pub/sub, probably with an integrator class that would dispatch the appropriate event to start and invoke the feature(s) that want to handle it.
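To make the pub/sub part concrete, here is a bare-bones event aggregator in the spirit of what I am describing; it is a sketch of the idea, not the actual Effectus infrastructure:

// Sketch: a minimal pub/sub event aggregator, so features can communicate
// without referencing each other directly.
using System;
using System.Collections.Generic;

public class EventAggregator
{
    private readonly Dictionary<Type, List<Delegate>> subscribers =
        new Dictionary<Type, List<Delegate>>();

    public void Subscribe<TEvent>(Action<TEvent> handler)
    {
        List<Delegate> handlers;
        if (subscribers.TryGetValue(typeof(TEvent), out handlers) == false)
        {
            handlers = new List<Delegate>();
            subscribers.Add(typeof(TEvent), handlers);
        }
        handlers.Add(handler);
    }

    public void Publish<TEvent>(TEvent message)
    {
        List<Delegate> handlers;
        if (subscribers.TryGetValue(typeof(TEvent), out handlers) == false)
            return;
        foreach (Action<TEvent> handler in handlers)
            handler(message);
    }
}

// Example message: one feature publishes it, another subscribes to it, and
// neither holds a reference to the other.
public class ActionUpdated
{
    public int ActionId;
}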

You can read more about the architecture principles behind this system in: Application structure: Concepts & Features

Effectus: Fatten your infrastructure

Continuing in my exploration of the Effectus code base, which originated from my Building a Desktop To-Do Application with NHibernate article on MSDN Magazine, I wanted to talk today about how to build an application infrastructure.

First, to be clear, I am not talking about infrastructure pieces such as caching, data access, etc. Those are generic concerns, and you should be able to just get them from an off the shelf library. When I am talking about an application infrastructure, I am talking about the infrastructure code that you are going to need to write for a particular application. You are always going to have to do that, even if it is something as simple as wiring the container together, or registering session handling, etc.

From my point of view, the point of the application infrastructure is to make writing the actual application features as simple as possible. Ideally, you want to make the infrastructure pieces handle everything that isn’t purely related to the feature you are implementing. The reason that an application infrastructure can do this while a generic infrastructure cannot is that an application infrastructure can make a lot of assumptions about the way the application is built.

Let us take the Main screen feature in Effectus and look at how it is implemented:

public Observable<int> CurrentPage { get; set; }


public Fact CanMovePrev
{
    get { return new Fact(CurrentPage, () => CurrentPage > 0); }
}

public Fact CanMoveNext
{
    get { return new Fact(CurrentPage, () => CurrentPage + 1 < NumberOfPages); }
}

public void OnCreateNew()
{
    Presenters.Show("CreateNew");
}

public void OnActionsChoosen(ToDoAction action)
{
    Presenters.Show("Edit", action.Id);
}

public void OnLoaded()
{
    LoadPage(0);
}

public void OnMoveNext()
{
    LoadPage(CurrentPage + 1);
}

public void OnMovePrev()
{
    LoadPage(CurrentPage - 1);
}

Looking at the code, we can see that there is actually a lot of infrastructure playing a part here.

  • Automatic binding of On[Button Name]() methods to the button command.
    • Within this, enabling / disabling automatically using Can[Button Name] properties
  • Automatic binding of On[List name]Choosen(Item) to the selected item changed behavior on the list.
  • Automatic updates of property states using Fact.
  • Automatic updates of property values, using Observable.

There isn't much to say about the automatic binding between UI actions and presenter methods, except that I think it is clear how this behavior reduces the amount of infrastructure level code that you would have to write, test and maintain. But both Fact and Observable deserve a bit more explanation.

In the LoadPage() method, not shown here, we update the CurrentPage property. This triggers invalidation of all the facts bound to that value (CanMoveNext, CanMovePrev), which forces their re-evaluation. Since the infrastructure knows how to wire this up to the appropriate commands in the UI, this means that the given controls will be enabled/disabled automatically. What is more, look at the code. It is clear, concise and pretty easy to reason about and use.
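Here is a rough sketch of the idea behind Observable and Fact. This is my own simplification (Fact exposes a plain bool, and I am ignoring the implicit conversions that the real code uses), not the Effectus implementation:

// Sketch: an Observable raises an event when its value changes, and a Fact
// bound to it re-evaluates its predicate (and notifies the UI) on every change.
using System;

public class Observable<T>
{
    private T value;

    public event Action Changed = delegate { };

    public T Value
    {
        get { return value; }
        set
        {
            this.value = value;
            Changed(); // invalidate everything bound to this value
        }
    }
}

public class Fact
{
    private readonly Func<bool> predicate;

    public event Action ValueChanged = delegate { };

    public Fact(Observable<int> dependsOn, Func<bool> predicate)
    {
        this.predicate = predicate;
        dependsOn.Changed += () => ValueChanged(); // force re-evaluation downstream
    }

    // The infrastructure reads this to enable/disable the bound command.
    public bool Value
    {
        get { return predicate(); }
    }
}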

I'll leave the actual exploration of the implementation to you; it is fairly simple, but the idea is important. The Effectus application actually has more infrastructure code than it has code implementing its features. But that is a very small application. Just to give you an idea, the number of lines of code devoted to infrastructure in Effectus is only about 600! In most applications, you probably wouldn't even notice the amount of code that the infrastructure takes, but the effects are quite clear.

Effectus: Building UI based on conventions

I waited for a while after my article was posted, to see if anyone caught on to some of the things that I did in the sample code that weren’t related to NHibernate usage. It seems that no one caught on to that, so I’ll try pointing them out explicitly.

The basic format of a feature in Effectus is this:

image

Each feature has a Presenter (the business logic for the feature) and a View Model, which is directly bound to the View and is the main way that the Presenter communicates with the View. The main responsibility of the Model is to be bound to the UI and allow the presenter to update it, but that is not quite enough.

The UI also needs to raise events and handle user input. A common example is handling button clicks. You can't really map them into Model behavior, at least not easily and in a way that makes sense to the developers. Instead, I defined a set of conventions. As you can see in the picture, based on the name of the element, we match it with the appropriate execute method and the way to decide whether the command can execute. Under the covers, we generate the appropriate ICommand instance, bound to the Presenter behavior.

image

If you look at the code, you can see that the conventions extend even to handling the selected item changed event in a list box, including passing the newly selected instance back to the presenter. Those conventions are project specific, based on what the project needs, and they result in a code base that is very easy to read and follow. Moreover, they result in a codebase that is relatively free from infrastructure concerns.
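For illustration, this is roughly how such a convention can be wired up with reflection. It is my own simplification (a plain ICommand, with Can[Name] treated as a simple bool property), not the actual Effectus infrastructure:

// Sketch: find the On[Name]() method on a presenter and turn it into an
// ICommand whose CanExecute comes from a matching Can[Name] property.
using System;
using System.Reflection;
using System.Windows.Input;

public class ConventionCommand : ICommand
{
    private readonly object presenter;
    private readonly MethodInfo execute;
    private readonly PropertyInfo canExecute;

    public ConventionCommand(object presenter, MethodInfo execute, PropertyInfo canExecute)
    {
        this.presenter = presenter;
        this.execute = execute;
        this.canExecute = canExecute;
    }

    public bool CanExecute(object parameter)
    {
        // If there is no Can[Name] property, the command is always enabled.
        return canExecute == null || (bool)canExecute.GetValue(presenter, null);
    }

    public void Execute(object parameter)
    {
        execute.Invoke(presenter, null);
    }

    // Requerying is out of scope for this sketch.
    public event EventHandler CanExecuteChanged { add { } remove { } }
}

public static class ConventionBinder
{
    // Given a button named "MoveNext", bind it to OnMoveNext() / CanMoveNext.
    public static ICommand CommandFor(object presenter, string elementName)
    {
        var type = presenter.GetType();
        var execute = type.GetMethod("On" + elementName);
        if (execute == null)
            return null;
        var canExecute = type.GetProperty("Can" + elementName);
        return new ConventionCommand(presenter, execute, canExecute);
    }
}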

I am going to have a few more posts about Effectus, can you figure out what else I put in that code base?

How to write MEF in 2 hours or less

It took me a while to get around to this blog post, I promised Glenn that I would write it ages ago.

The title of this blog post, which is admittedly provocative, refers to a (I believe intentional) misinterpretation of a comment of mine in a MEF talk. It might also explain my general attitude towards software.

MEF is a pretty big library, and it does a whole host of things. But when talking about it, there are specific features that keep getting mentioned over and over, probably because they are:

  • Really nice features.
  • Demo very well.

A good example of that is MEF's ability to import new types on the fly. It is really nice, but it isn't something that is really impressive to me. Doing dynamic loading isn't really hard. That has led me to make the comment that turned into the title of this post.

The actual comment I made was: “I can implement this feature (dynamic loading) in less than two hours.” The problem is that that was interpreted as saying that I can implement MEF itself in 2 hours (just to be clear, it isn’t possible). What is possible, however, is to implement interesting parts of MEF for a particular scenario in a short amount of time.

In fact, most of the time, I can implement the features that I find attractive in some software (not limited to MEF) faster than I can learn how to use that software in the first place. Granted, it may be a single purpose, only-works-under-the-following-set-of-assumptions implementation, but it will work, it will be faster out of the gate and it will limit the number of dependencies that I have to manage.
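For reference, this is roughly what I mean by a single purpose implementation of dynamic loading; a sketch that assumes plugins implement a known interface and live in a known folder:

// Sketch: load every assembly in a plugins folder and instantiate every
// concrete type that implements IPlugin. Single purpose, no catalogs,
// no recomposition.
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public interface IPlugin
{
    void Initialize();
}

public static class PluginLoader
{
    public static IEnumerable<IPlugin> LoadFrom(string directory)
    {
        var plugins = new List<IPlugin>();
        foreach (var file in Directory.GetFiles(directory, "*.dll"))
        {
            var assembly = Assembly.LoadFrom(file);
            foreach (var type in assembly.GetTypes())
            {
                if (type.IsAbstract || type.IsInterface)
                    continue;
                if (typeof(IPlugin).IsAssignableFrom(type) == false)
                    continue;
                plugins.Add((IPlugin)Activator.CreateInstance(type));
            }
        }
        return plugins;
    }
}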

It is something that you have to consider. Sometimes you want to put those responsibilities in the hands of some other party, because you don't want to deal with them. Sometimes you explicitly want to do it yourself, because giving up control over this thing (or adapting to the way some library does it) is not acceptable.

ÜberProf new feature: Fully keyboard enabled

For a long time, most of the work in the profiler (NH Prof, HProf, L2S Prof & EF Prof) could be done only with the use of the mouse. That annoyed a bunch of people, but it didn’t really bother me.  Rob fixed this recently, and I cannot believe what kind of a difference it makes.

Here are the shortcuts:

  • S – Focus on Sessions tab header
  • T – Focus on Statements tab header
  • D – Focus on Details tab header
  • F – Focus on Statistics tab header
  • Left / Right – Move to the next / previous tab
  • Down / Up – Move to the next session / statement

Using those, you can use the common functionality of the profiler without touching the mouse.

Open Source Success Metrics

I got into a very interesting discussion with Rob Conery recently, talking about, among other things, the CodePlex Foundation. You can hear it in TekPub's Podcast #2, when it is out later this week. Tangential to our discussion, but very important, is how an Open Source project owner defines success for their project.

Usually, when people try to judge if an open source project is successful or not, they look at the number of users and whether the general response to the project is positive. But I am involved in a lot of open source projects, so you might say that I have an insider's view of how the project owner sees this.

Now, let us be clear, I am talking about OSS projects that are led by an individual or a group of individuals. There are different semantics for OSS projects that are led by a company.

There are several reasons for individuals to be involved in OSS projects, those are usually:

  • Scratch an itch – I want to do something challenging/fun.
  • Need it for myself – This is something that the owner started because they needed it themselves and open sourced it for their own reasons.
  • Reputation building – Creating this project would give me some street cred.
  • I use it at work – Usually applicable for people joining an existing project, they use it at work and contribute stuff that they require.

We must also make a distinction between OSS adoption and OSS contribution. Typically, maybe a tenth of a percent of the adopters will also be contributors. Now, let us talk about the owner's success metric again. Unless the owner started the project to gain reputation, the number of people adopting your project is pretty meaningless. I think that DHH, the creator of Ruby on Rails, did a great job describing the owner sentiment:

I'm not in this world to create Rails for you. I'm in this world to create Rails for me and if you happen to like that version of Rails that I'm creating for me, then you are going to have a great time.

Even if you are interested in OSS for reputation building, after a while it stops being a factor. Either you got your reputation, or you never will. A good example of a project started to gain reputation is Rhino Mocks. And you know what, it worked. But I think that it is safe to say that I am no longer just that Rhino Mocks guy, so the same rules apply as for the other motivations.

So, what does it mean, from an owner perspective, when you ask if an OSS project is successful or not?  The answer is simple: it does what I need it to do.

I asked a popular OSS project owner the following question: What would your response be if I told you that I am taking a full page ad in the Times magazine proclaiming your stuff to be the best thing since sliced bread? You'll get a million users out of that.

His response was: Bleh, no!

Tying it back to the discussion that I had with Rob, I feel that much of the confusion with regard to the CodePlex Foundation's role comes from trying to talk to projects led by individuals as if they were commercial enterprises. The goals are just completely different, and in many cases, adding more users to the project will actually be bad for the project, because it puts a higher support cost on the project team.

In the .NET ecosystem, most of the projects aren't led by a company. They are led by individuals. That is an important distinction, and understanding it would probably make it clear why the most common response to the CodePlex Foundation was: What is in it for me?

UberProf performance challenges

One of the things that the profiler is supposed to do is handle large amounts of data, so I was pretty bummed to hear that users are running into issues when they try to use the profiler in high load scenarios, such as load tests.

Luckily, one of the things that the profiler can do is output to a file, which lets me simulate what is going on at the customer's site very easily.

The first performance problem is that the profiler isn’t processing things fast enough:

image

The second is that there are some things that just totally break my expectations:

image

There are solutions for those problems, and I guess that I know what I’ll be doing in the near future :-)

ÜberProf new feature: Programmatic Integration

Well… I am still working down the list of stuff people have requested for the profiler (NH Prof, HProf, L2S Prof & EF Prof), and one of the things that popped to the head of the list was having programmatic access to the profiler output. We aren't talking about just the XML reports that are available, but being able to get the data in a way that is easy to work with.

Well, here it is:

image

There is a new DLL in the profiler distribution, HibernatingRhinos.Profiler.Integration, where you can find the CaptureProfilerOutput class. That class will let you start the profiler, capture whatever happens in the middle, and finally stop it and get back an instance of a report containing all the data about what actually happened.

This is perfect if you want to use the profiler as part of your integration test library. As an aside, you can also use the new DLL to programmatically parse (using XML Serialization) the XML reports that you generate as part of your build, so you get a very easy way to build rules around those reports.
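The usage I have in mind looks roughly like the following. Note that the member names on CaptureProfilerOutput below are placeholders to show the flow; check the actual class for the real API:

// Illustrative flow only: the CaptureProfilerOutput member names below are
// placeholders, not the documented API of HibernatingRhinos.Profiler.Integration.
using HibernatingRhinos.Profiler.Integration;

public class ProfilerIntegrationTest
{
    public void ScenarioShouldNotTriggerAlerts()
    {
        var capture = new CaptureProfilerOutput(@"/path/to/profiler.exe"); // assumed constructor
        capture.StartListening();                                          // assumed method

        ExecuteScenarioUnderTest();

        var report = capture.StopAndReturnReport();                        // assumed method

        // The report contains the captured sessions and statements, so the test
        // can assert on them (for example, that no select N+1 alerts were raised).
        AssertNoAlerts(report);
    }

    private void ExecuteScenarioUnderTest() { /* run the code being profiled */ }
    private void AssertNoAlerts(object report) { /* test assertions go here */ }
}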

Entity Framework Profiler is now in public beta

After some time in private beta, the Entity Framework Profiler has now moved to the public beta stage.

For the beta period, I am offering a 30% discount for EF Prof.

EF Prof brings an unparalleled level of insight into Entity Framework internals. You can see exactly what sort of actions Entity Framework takes against your database, view the results of queries, correlate queries to the matching lines of code that spawned them, and get on the spot guidance when you are violating Entity Framework best practices.

Here is a screen shot, and you can also take a look at the live demo.

image


Setting the record straight: I am not the main contributor to NHibernate

This is just something that bothers me. I'll take credit where credit is due, but in this case, it isn't due. I am neither the owner of NHibernate nor the main contributor, as some people seem to think.

According to ohloh, I am actually the #4 contributor, and if we look at results from the last year alone, I would say that I am more likely to be #5 or #6. NHibernate is the work of many people, and it bothers me that people seem to think that, because I have talked about NHibernate frequently and for so long, I am actually the main guy behind it.

I am not the main contributor, I am just the loudest one.

Handling complex security conditions with Rhino Security

I got an interesting question about handling complex security conditions that I thought would be perfect to illustrate the underlying design principles for Rhino Security.

The problem is securing surveys. A survey has a start date, an end date, can be marked as applicable to a specific population, may be public or private, etc. The specification says that a survey that the user does not own should be visible to that user iff:

  • The survey is public
  • The survey is active
  • The survey has started
  • The survey has not ended
  • The survey is for a population that the user is a member of

Can you imagine the amount of work that is required to set up this sort of rule properly? And to apply it consistently all over the application? It is exactly for these sorts of scenarios that I created Rhino Security.

Let us see how we are going to design the security infrastructure for the application. We are going to specify the following:

  • The owning user for a survey is allowed access at a high level:
    permissionsBuilderService
        .Allow("/Survey") // allow to do everything on a survey
        .For(survey.Owner)
        .On(survey)
        .Level(10)
        .Save();
  • If the survey is for a particular population, that population is created as a user group, and we create a rule that allows that user group access to the survey:
    permissionsBuilderService
        .Allow("/Survey/Vote")
        .For(survey.Population.Name) // the user group name
        .On(survey)
        .Level(1)
        .Save();
  • The other permissions are specified globally; we need to AND all of them together, and Rhino Security doesn't give us direct support for that. What we do get, however, is Deny and levels, so we specify those permissions using the following:
    permissionsBuilderService
        .Deny("/Survey/Vote")
        .For("Everyone") // all users
        .On("Inactive Surveys") // an entity group
        .Level(6)
        .Save();
    
    permissionsBuilderService
        .Deny("/Survey/Vote")
        .For("Everyone") // all users
        .On("Private Surveys") // an entity group
        .Level(6)
        .Save();
    
    permissionsBuilderService
        .Deny("/Survey/Vote")
        .For("Everyone") // all users
        .On("Completed Surveys") // an entity group
        .Level(6)
        .Save();
    
    permissionsBuilderService
        .Deny("/Survey/Vote")
        .For("Everyone") // all users
        .On("Unstarted Surveys") // an entity group
        .Level(6)
        .Save();

Now that we have set this up, what we need to do is add the survey to, and remove it from, the appropriate entity groups, based on whatever business conditions we have. Note that those are explicit state transitions. This isn't just us setting IsActive and IsPublic flags, or verifying date ranges. We have to take explicit action to add and remove the entities from the appropriate groups.

There are two reasons for doing things this way. First, it makes it significantly easier to handle security, since we don't have murky rules or implicit state transitions. Second, from a design standpoint, it leads to a situation where we don't work with just dumb objects and queries, but with meaningful transitions between states.
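To make the explicit transition idea concrete, the kind of code I have in mind looks like this. I am writing the repository method names from memory, so treat them as illustrative rather than as an exact copy of the Rhino Security API:

// Illustrative only: explicit state transitions that move a survey in and out
// of the "Inactive Surveys" entity group. The IAuthorizationRepository method
// names here may not match the actual Rhino Security API exactly.
public class SurveyStateService
{
    private readonly IAuthorizationRepository authorizationRepository; // Rhino Security's repository

    public SurveyStateService(IAuthorizationRepository authorizationRepository)
    {
        this.authorizationRepository = authorizationRepository;
    }

    public void Deactivate(Survey survey)
    {
        survey.IsActive = false;
        // The security side effect is an explicit, named transition, not an
        // implicit consequence of flipping a flag.
        authorizationRepository.AssociateEntityWith(survey, "Inactive Surveys");
    }

    public void Activate(Survey survey)
    {
        survey.IsActive = true;
        authorizationRepository.DetachEntityFromGroup(survey, "Inactive Surveys");
    }
}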

Yes, that does imply that you would have to have some sort of background process to move surveys between groups based on time. That is a good thing, but I talked about that in another post.