UberProf performance improvements

Dec 22 2009

UberProf performance improvements

time to read 15 min | 2870 words

I mentioned before that I run into some performance problems, I thought it would be interesting to see how I solved them.

The underlying theme was finding O(N) operations are eradicating them. In almost all the cases, the code is more complex, but significantly faster

Old code

New code

public static class TimestampExtensions
{
    public static void AddInPlace<T>(this IList<T> self, T model)
        where T : IHaveTimestamp
    {
        int index = self.Count;
        for (var i = 0; i < self.Count; i++)
        {
            if (self[i].Timestamp > model.Timestamp)
            {
                index = i;
                break;
            }
        }
        self.Insert(index, model);
    }    
}

public static class TimestampExtensions
{
    public static void AddInPlace(this IList<SqlStatement> self, SqlStatement model)
    {
        var timestamp = model.Timestamp;
        var index = self.Count;
        if (index > 0 && timestamp >= self[index - 1].Timestamp)
        {
            self.Add(model);
            return;
        }
        for (var i = 0; i < self.Count; i++)
        {
            if (self[i].Timestamp < timestamp) 
                continue;
            index = i;
            break;
        }
        self.Insert(index, model);
    }    
}

Here we optimize for the common case where we add the statement to the end of the list. We also changed from generic method with interface constraint to the actual type used. That allows the JIT to apply additional optimizations. That method alone was responsible for almost 10 minutes in my test runs.

Old code	New code
var statement = session.Statements .Reverse() .Where(x => x.SessionId.Timestamp <= queryDuration.SessionId.Timestamp) .FirstOrDefault();
var statement = session.Statements.LastOrDefault();

session.Statements is already ordered by the statements timestamps. The code perform a slightly differently here, and it is less robust to async message streaming (when events that happened sequentially arrive in different order than they occured), but that is rare enough that I feel that the performance boost from this is acceptable.

Old code

New code

public override void AfterAttachingToSession(
         SessionInformation sessionInformation, SqlStatement statement)
{
    var isSelectStatement = statement.IsSelectStatement;
    if (isSelectStatement == false)
        return;

    if (statement.RawSql == mySqlSelectLastIdentity)
        return;

    int countOfStatementsWithIdenticalSql = sessionInformation.Statements
        .Count(x => x.RawSql == statement.RawSql);
    if (countOfStatementsWithIdenticalSql < 
        configuration.MaxRepeatedStatementsForSelectNPlusOne)
        return;
    statement.AcceptAlert(AlertInformation.SelectNPlusOne());
}

public int GetCountOfStatementsByRawSql(string rawSql)
{
    int value;
    countsByRawSql.TryGetValue(rawSql, out value);
    return value;
}

public void AddStatement(SqlStatement statement)
{
    int count;
    if (countsByRawSql.TryGetValue(statement.RawSql, out count) == false)
        count = 0;
    countsByRawSql[statement.RawSql] = count + 1;
    
    statements.AddInPlace(statement);
    StatementsChanged();
}


public override void AfterAttachingToSession(
    SessionInformation sessionInformation, SqlStatement statement)
{
    var isSelectStatement = statement.IsSelectStatement;
    if (isSelectStatement == false)
        return;

    if (statement.RawSql == mySqlSelectLastIdentity)
        return;

    if (sessionInformation.GetCountOfStatementsByRawSql(statement.RawSql) < 
        configuration.MaxRepeatedStatementsForSelectNPlusOne)
        return;
    statement.AcceptAlert(AlertInformation.SelectNPlusOne());
}

On the left, we have the old code, that did the calculations manually. On the right, the two top methods will do the calculation up front, and then it is just a matter of reading it in the appropriate times.

Old code

New code

protected void ClearStackTraceFromInfrastructureFrameworks(StackTraceInfo stackTrace)
{
    if (configuration.InfrastructureNamespaces == null)
        return;
    
    stackTrace.Frames =stackTrace.Frames.Where(
        frame =>
        {
            var isInfrastructureNamespace = 
                configuration.InfrastructureNamespaces.Any(
                ns => frame.Namespace != null && frame.Namespace.StartsWith(ns)
                );

            return isInfrastructureNamespace == false ||
                   configuration.NonInfrastructureNamespaces.Any(
                    ns => frame.Namespace != null && frame.Namespace.StartsWith(ns)
                    );
        }).ToArray();
}

public StackTraceProcessor(ProfilerConfiguration configuration)
{
    this.configuration = configuration;
    var sb = new StringBuilder();
    foreach (var infrastructureNamespace in 
        configuration.InfrastructureNamespaces ?? new List<string>())
    {
        sb.Append("(^")
            .Append(Regex.Escape(infrastructureNamespace))
            .Append(")|");
    }
    sb.Length = sb.Length - 1;
    namespacesToClear = new Regex(sb.ToString(),
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
    sb.Length = 0;

    foreach (var infrastructureNamespace in 
        configuration.NonInfrastructureNamespaces ?? new List<string>())
    {
        sb.Append("(^")
            .Append(Regex.Escape(infrastructureNamespace))
            .Append(")|");
    }
    sb.Length = sb.Length - 1;
    namespacesToKeep = new Regex(sb.ToString(), 
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
}

protected void ClearStackTraceFromInfrastructureFrameworks(StackTraceInfo stackTrace)
{
    if (configuration.InfrastructureNamespaces == null)
        return;

    stackTrace.Frames =
        stackTrace.Frames.Where(IsValidStackFrameMethod).ToArray();
}

private bool IsValidStackFrameMethod(StackTraceFrame frame)
{
    if (frame.Namespace == null)
        return true;
    if(namespacesToClear.IsMatch(frame.Namespace) == false)
        return true;
    return namespacesToKeep.IsMatch(frame.Namespace);
}

Interestingly enough, this code produced much better runtime performance, simply because, while it is more complex, the number of iterations that are happening are so much smaller.

Overall, the change result in order of magnitude or two difference for processing messages in the profiler pipeline.

Tweet Share Share 11 comments

Tags:

Comments

22 Dec 2009
12:13 PM

wcoenen

The new code is appearing under the old code for me, rather than to the right. (I tried multiple browsers and stretching the browser out over two screens.)

22 Dec 2009
12:16 PM

Rik Hemsley

If the list of IHaveTimestamp is to be kept ordered, have you considered using a SortedList? I don't know if it would perform any better than what you have, but it might help you avoid reinventing the wheel.

22 Dec 2009
13:59 PM

Ayende Rahien

wcoenen,

Sorry, but that is a limitation of the screen resolution, I am afraid.

It should be clear enough as it is.

22 Dec 2009
14:00 PM

Ayende Rahien

Rik,

Making this into a sorted list will lead to a potential sort operation at each insert, which I would rather avoid.

22 Dec 2009
14:06 PM

Rik Hemsley

Assuming you already have benchmarks set up as you've been through a round of optimisation, it might be interesting to plug SortedList in and see how it compares. I don't doubt there could be performance issues, but I've found that it's often best to assume the built-in collections have reasonable performance characteristics before trying to build your own version of the same functionality.

22 Dec 2009
14:08 PM

Rik Hemsley

Of course SortedList requires the keys to be unique, so if you're dealing with the possibility of more than one identical timestamp, it wouldn't be suitable.

22 Dec 2009
15:00 PM

Patrick Smacchia

Often, more performance means more code and more complexity.

Often, more performance means algorithms refining

which end up appending code dedicated to prevent execution of unnecessary branches of code.

22 Dec 2009
22:09 PM

firefly

What did you use to profile your application? dotTrace?

23 Dec 2009
08:28 AM

Ayende Rahien

firefly,

Yes, I love dotTrace.

23 Dec 2009
09:21 AM

Dennis

The sortedlist has amortized insertion time of O(lg N), but only if you insert random values... If you insert values that are already in order, then it is pretty much the same as a normal List with regards to runtime.

That would also make the code more robust, since as you say there is a good likelyhood that it is not always 100% sorted, and you have no real control over when it is not.

(and your remember me, forgot me yet again)

28 Dec 2009
12:01 PM

[ICR]

"Sorry, but that is a limitation of the screen resolution, I am afraid."

This is what floating divs are for.

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB