Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,968 | Comments: 44,489

filter by tags archive

UberProf performance improvements, or: when O(N^3 + N) is not fast enough


While working on improving the performance of the profiler, I got really annoyed. The UI times for updating sessions with large amount of statements was simply unacceptable. We already virtualized the list UI, so it couldn’t be that. I started tracking it down, and it finally came down to the following piece of code:

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    foreach (var statementSnapshot in snapshots)
    {
        var found = (from model in unfilteredStatements
                     where model.Id == statementSnapshot.StatementId
                     select model).FirstOrDefault();

        if (found == null)
        {
            unfilteredStatements.Add(
                new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false});
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (snapshots.Any())
    {
        hasPolledForStatements = true;
        StatementsChanged(this, 
            new StatementsChangedEventArgs {HasPolledForStatments = hasPolledForStatements});
    }
}

This looks pretty innocent, right? At a guess, what is the performance characteristics of this code?

It isn’t O(N), that is for sure, look at the linq query, it will perform a linear search in the unfilteredStatements, which is an O(N) operation. *

At first glance, it looks like O(N*N), right? Wrong!

unfilteredStatements is an ObservableCollection, and guess what is going on in the CollectionChanged event? A statistics function that does yet another O(N) operation for every single added statement.

Finally, we have the StatementsChanged handler, which will also perform an O(N) operation, but only once.

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    var hasStatements = false;
    foreach (var statementSnapshot in snapshots)
    {
        IStatementModel found;
        statementsById.TryGetValue(statementSnapshot.StatementId, out found);
        hasStatements = true;

        if (found == null)
        {
            unfilteredStatements.Add(new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false});
            statementsById.Add(statementSnapshot.StatementId, found);
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (hasStatements == false) 
        return;
    HandleStatementsOrFiltersChanging(true);
}

The changes here are subtle, first, we killed off the O(N) linq query in favor of an O(1) lookup in a hashtable. Second, unfilteredStatements is now a simple List<T>, and HandleStatementsOrFilterChanging is the one responsible for notifications. Just these two changes were sufficent from the O(N^3+N) that we had before to a simple O(N+N) (because we still run some statistical stuff at the end) which collapses so a simple O(N).

Once that is done, showing sessions with 50,000 statements in them became instantaneous.

* Yes, it is supposed to be O(M) because the unfilteredStatements.Count is different than snapshots.Count, but I am simplifying here to make things easier.


Comments

Richard Dingwall

Can't wait to see these performance improvements in action myself. Keep up the good work!

Sebastian Ullrich

At first glance, it looks like O(N*N), right?

At second glance, it still does to me.

Your loop body is a sequence of two O(n) operations, summing to O(n). So we have a O(n²) loop followed by a O(n) operation, that's O(n²+n) = O(n²).

...whatever - another great example of real world performance improvements!

Ayende Rahien

Sebastian,

Note the next statement, unflitedStaetments is firing a CollectionChanged event that has a subscriber that does another O(N) operation

Mike G

I think Sebantian is right, this is O(N^2).

Unless the CollectionChangedEvent is performing an O(N) operation and for each value of N the subscriber is performing, everything in unfilteredStatements is O(N), not O(N^2).

It is O(N), the for loop, operating on N+N+N, the three O(N) statements, which comes out to O(N^2).

Ayende Rahien

Mike,

As I said in the post, the collection changed event is making an O(N) operation every time it is invoked.

Vadim Kantorov

Just my fifty cents.

Anyway, why haven't you simply written O(N^3) instead of O(N^3 + N) and O(N) instead of O(N + N)?

For me it's confusing to see odd O(N^3 + N) instead of O(N^3). And even more confusing to see this in the post title...

Manu

I also think it's O(N^2). The LINQ query takes O(N) and the unfilteredStatements.Add statements takes O(N). This sums up to O(N + N) and not O(N * N) within the foreach loop.

tobi

arent people reading your post ayende? three comments about n^2 already...

Rafal

Yep, it's not O(N^3) but O(N^2) like Manu said. If it were O(N^3) the program would probably take 50 years to process 50 000 statements.

Ayende Rahien

Vadim,

O(N^3 + N) express what is going on better than just N^3. Note the comment in the end of the post.

Paul Hatcher

We have..

  1. Outer loop - O(N)

1.a. Linq query - O(N)

1.b OrderedCollection.Add - O(N)

  1. StatementChanged - O(N)

Now the Linq query and the OrderCollection.Add are in the scope of the outer loop, so we have O(N*2N+N) which is O(N^2)

Vadim Kantorov

I get your point. Though I still don't agree.

Anyway, you use big o notation to discuss the complexity of the algorithm. N in the O(N^3 + N) barely adds anything to the overall time complexity.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
  2. Production postmortem (4):
    23 Jul 2015 - The case of the native memory leak
  3. API Design (7):
    20 Jul 2015 - We’ll let the users sort it out
  4. What is new in RavenDB 3.5 (3):
    15 Jul 2015 - Exploring data in the dark
  5. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats