Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

UberProf performance improvements, beware of linq query evaluation

This is a diff from the performance improvement effort of UberProf. The simple addition of .ToList() has significantly improved the performance of this function:

[image: the diff showing the added .ToList() call]

Why?

Before I added ToList(), every aggregation function we ran over the statements enumerable forced the filtering to be re-evaluated (which can be quite expensive). With ToList() in place, the filtering runs only once.
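To make the re-evaluation concrete, here is a minimal sketch. It is not the UberProf code: the Statement class, its properties, and the counting filter are hypothetical stand-ins, and the filter simply counts its own invocations in place of doing expensive work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEvaluationDemo
{
    // Hypothetical stand-in for the profiler's statement type.
    public sealed class Statement
    {
        public bool IsCached { get; set; }
        public bool IsTransaction { get; set; }
    }

    // Runs three aggregations over a filtered sequence and returns how many
    // times the filter predicate was invoked along the way.
    public static int CountFilterCalls(bool materialize)
    {
        var source = new List<Statement>
        {
            new Statement { IsCached = true },
            new Statement { IsTransaction = true },
            new Statement(),
            new Statement(),
        };

        int filterCalls = 0;
        IEnumerable<Statement> statements = source.Where(s =>
        {
            filterCalls++;   // stands in for an expensive filter
            return true;
        });

        if (materialize)
        {
            // The filter runs once per element here, and never again.
            statements = statements.ToList();
        }

        // Each aggregation enumerates `statements` from the start.
        int cached = statements.Count(s => s.IsCached);
        int transactions = statements.Count(s => s.IsTransaction);
        int total = statements.Count();

        return filterCalls;
    }
}
```

With four source items and three aggregations, CountFilterCalls(false) reports 12 filter invocations, while CountFilterCalls(true) reports 4.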

There is another pretty obvious performance optimization that can be done here. Can you see it? And why did I choose not to implement it?


Comments

Rik Hemsley

I'd say the obvious 'optimization' would be to take counts of statements which are transactions, which are cached and which are neither.

Something like this...

long cachedCount = 0, transactionCount = 0, neitherTransactionNorCachedCount = 0;

// Each() is assumed to be a ForEach-style extension method; it is not part of LINQ.
statements.Each(s =>
{
    cachedCount += s.IsCached ? 1 : 0;
    transactionCount += s.IsTransaction ? 1 : 0;
    neitherTransactionNorCachedCount += s.IsCached || s.IsTransaction ? 0 : 1;
});

Not sure why you'd avoid doing it, but I'm on holiday and my brain's not in gear. That's my excuse, anyway.

John St. Clair

"group by" rather than count, and a join/group by for the aggregate number of statements?

Anthony Dehirst

NumberOfStatements is

statements.Count() - NumberOfCachedStatements - NumberOfTransactionsStatements;

I guess that you didn't do it as it would make the code a little messier as you couldn't use the inline constructor. I do hope that there is a better reason.
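A quick sketch of this subtraction idea, under stated assumptions: the Statement class and its IsCached / IsTransaction properties are hypothetical, since the real types are not shown in the post. It compares counting the "plain" statements directly against deriving the count by subtraction.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DerivedCountDemo
{
    // Hypothetical statement shape; the real type is not shown in the post.
    public sealed class Statement
    {
        public bool IsCached { get; set; }
        public bool IsTransaction { get; set; }
    }

    // Returns the number of statements that are neither cached nor
    // transactions, computed directly and by subtraction.
    public static (int Direct, int Derived) CountPlainStatements(List<Statement> statements)
    {
        int cached = statements.Count(s => s.IsCached);
        int transactions = statements.Count(s => s.IsTransaction);

        int direct = statements.Count(s => !s.IsCached && !s.IsTransaction);
        int derived = statements.Count - cached - transactions;

        return (direct, derived);
    }
}
```

The two results agree only when no statement is both cached and a transaction. If the categories can overlap, the subtraction removes the overlapping statements twice and undercounts.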

JJoos

I agree with Rik, and I wouldn't do it because the performance issues are probably in the first part.

John Chapman

This is actually one of the reasons I find var to be mostly evil. var allows developers to ignore the real type. It changes the way you think. The two pieces of code do VERY different things, yet to an untrained eye that is not clear at all. Largely I blame var for this. If the type were explicitly defined in that statement, I believe fewer people would misunderstand how it is being used.

I personally never use var in my code. That also means I never use anonymous types. I don't mind the extra typing involved. Typing doesn't take a long time. Forming the right solution to the problem is usually by far the biggest cost, so typing cost is in the noise for me. Readability adds more.

I realize I'm in the minority on this subject.

firefly

@Mike Memoization is a very powerful technique for performance optimization... Unfortunately, it's something that can take a while for an OO programmer to wrap his head around, unless you are coming from a functional background :) Then it becomes second nature.

Mike Chaliy

@firefly, agreed, but LINQ is full of such techniques... It's all about laziness. So I believe we are already prepared for this kind of stuff.

Konstan

@John: would you be happier if you saw IEnumerable<statement> instead of var?

var is good because it lets the compiler pick the best-matching extension method. For example, IEnumerable and IQueryable both have a "Where" extension method, but the first performs the filtering on the client whereas the second does it on the server. Shorter code that performs better: a win-win situation.
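The overload-resolution point can be checked with a small sketch. It uses an in-memory array wrapped with AsQueryable() as a stand-in for a real server-side query provider, and the class and method names are made up for illustration. The only thing it demonstrates is which Where method the compiler binds to, based on the static type of the variable.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class VarBindingDemo
{
    // Reports which Where overload produced the given result.
    private static string Kind(object result) =>
        result is IQueryable ? "Queryable.Where" : "Enumerable.Where";

    public static string DescribeWhere(bool declareAsEnumerable)
    {
        // Stand-in for a query against a remote provider.
        IQueryable<int> query = new[] { 1, 2, 3, 4 }.AsQueryable();

        if (declareAsEnumerable)
        {
            // Explicit IEnumerable<int> hides the IQueryable<int> static type,
            // so the compiler binds to Enumerable.Where (in-memory filtering).
            IEnumerable<int> declared = query;
            return Kind(declared.Where(n => n > 2));
        }

        // var keeps the static type IQueryable<int>, so the compiler binds to
        // Queryable.Where and the provider receives the filter.
        var inferred = query;
        return Kind(inferred.Where(n => n > 2));
    }
}
```

DescribeWhere(true) returns "Enumerable.Where" and DescribeWhere(false) returns "Queryable.Where". Declaring the variable as IQueryable<int> explicitly would bind the same way as var, so the benefit here comes from not narrowing the type rather than from var itself.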

david

@Ayende: so, are you going to fill in the "just back from holiday, and brain slow to fire up" and the "too scared to comment in case I look foolish" amongst us? :-)

Ayende Rahien

Not sure that I am following you here

david

You asked 3 questions in your posting. I don't know the definitive answer to them. To aid those, like me, who come along in the future and read posts like this, it would be beneficial to see questions and answers -- rather than just questions.

Comprendes?

MF

Would .ToArray() be slightly faster? Or am I missing something?

Dathan Bennett

@MF, ToArray() might be slightly faster, but list item access by index is constant-time, so the difference (if there is an appreciable one) is probably trivial.
