UberProf performance improvements: beware of LINQ query evaluation
This is a diff from the performance improvement effort of UberProf. The simple addition of .ToList() has significantly improved the performance of this function:
Why?
Before adding the ToList(), each time we ran one of our aggregation functions on the statements enumerable, we forced re-evaluation of the filtering (which can be quite expensive). By adding ToList(), I am now making the filtering run only once.
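To make the effect concrete, here is a minimal sketch rather than the actual UberProf code. The names DeferredExecutionDemo, ExpensiveFilter and CountFilterCalls are made up for illustration, and the "expensive" filter is just a counter. It runs three aggregations over the same filtered sequence, with and without ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the (stand-in) expensive filter runs.
    public static int FilterCalls;

    static bool ExpensiveFilter(int value)
    {
        FilterCalls++;
        return value % 2 == 0;
    }

    // Runs three aggregations over the same filtered sequence and
    // returns how many times the filter was invoked in total.
    public static int CountFilterCalls(bool materialize)
    {
        FilterCalls = 0;
        IEnumerable<int> filtered = Enumerable.Range(1, 10).Where(v => ExpensiveFilter(v));
        if (materialize)
            filtered = filtered.ToList(); // the filter runs once, right here

        int count = filtered.Count();
        int sum = filtered.Sum();
        int max = filtered.Max();
        return FilterCalls;
    }

    public static void Main()
    {
        Console.WriteLine(CountFilterCalls(materialize: false)); // 30: filter re-runs per aggregation
        Console.WriteLine(CountFilterCalls(materialize: true));  // 10: filter runs once
    }
}
```

With ten source items and three aggregations, the lazy version invokes the filter thirty times and the materialized version ten times.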
There is another pretty obvious performance optimization that can be done here. Can you see it? And why did I choose not to implement it?
Comments
I'd say the obvious 'optimization' would be to take counts of statements which are transactions, which are cached and which are neither.
Something like this...
long cachedCount = 0, transactionCount = 0, neitherTransactionNorCachedCount = 0;
statements.Each(s =>
{
    // increment whichever counter matches s
});
Not sure why you'd avoid doing it, but I'm on holiday and my brain's not in gear. That's my excuse, anyway.
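A rough sketch of the single-pass idea above could look like the following. The ProfiledStatement type and its IsCached and IsTransaction properties are hypothetical stand-ins, since the real statement type is not shown in the post, and the sketch assumes a cached statement is counted as cached even if it is also a transaction statement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the profiler's statement type.
public sealed class ProfiledStatement
{
    public bool IsCached { get; set; }
    public bool IsTransaction { get; set; }
}

public static class SinglePassCounts
{
    // Walks the sequence once and tallies all three counters together.
    public static (int Cached, int Transactions, int Neither) Count(
        IEnumerable<ProfiledStatement> statements)
    {
        int cached = 0, transactions = 0, neither = 0;
        foreach (ProfiledStatement s in statements)
        {
            if (s.IsCached) cached++;                 // assumption: cached wins over transaction
            else if (s.IsTransaction) transactions++;
            else neither++;
        }
        return (cached, transactions, neither);
    }

    public static void Main()
    {
        var statements = new List<ProfiledStatement>
        {
            new ProfiledStatement { IsCached = true },
            new ProfiledStatement { IsTransaction = true },
            new ProfiledStatement(),
            new ProfiledStatement(),
        };
        var counts = Count(statements);
        Console.WriteLine($"{counts.Cached} {counts.Transactions} {counts.Neither}"); // 1 1 2
    }
}
```

The trade-off is the one raised in this thread: one loop instead of three Count() calls, at the price of mutable counters and code that no longer fits in an object initializer.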
"group by" rather than count, and a join/group by for the aggregate number of statements?
NumberOfStatements is
statements.Count() - NumberOfCachedStatements - NumberOfTransactionsStatements;
I guess that you didn't do it as it would make the code a little messier as you couldn't use the inline constructor. I do hope that there is a better reason.
I agree with rik, and I wouldn't do it because the performance issues are probably in the first part.
This is actually one of the reasons I find var to be mostly evil. var allows developers to ignore the real type. It changes the way you think about the code. The two pieces of code do VERY different things, yet to an untrained eye that's not clear at all. Largely I blame var for this. If the type were explicitly declared in that statement, I believe fewer people would misunderstand how it is being used.
I personally never use var in my code. That also means I never use anonymous types. I don't mind the extra typing involved. Typing doesn't take a long time. Forming the right solution to the problem is usually by far the biggest cost, so the typing cost is in the noise for me. Readability matters more.
I realize I'm in the minority on this subject.
Rx probably has a better solution than ToList(). System.Interactive.dll has a MemoizeAll method. It caches results, but works in a lazy manner. I have a blog post about this: chaliy.name/.../system_interactive_new_and_usef...
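For readers without System.Interactive at hand, the idea of a lazy cache can be sketched by hand. This is not the Rx implementation and not its API: LazyCache is a made-up name, and the sketch is single-threaded only. Items are pulled from the source on demand and remembered, so later enumerations replay the cache instead of re-running the query.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hand-rolled illustration of lazy memoization; not thread-safe.
public sealed class LazyCache<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _source;
    private readonly List<T> _cache = new List<T>();
    private bool _done;

    public LazyCache(IEnumerable<T> source)
    {
        _source = source.GetEnumerator(); // nothing is evaluated until MoveNext
    }

    public IEnumerator<T> GetEnumerator()
    {
        int i = 0;
        while (true)
        {
            if (i < _cache.Count)
            {
                yield return _cache[i++]; // replay what was already computed
                continue;
            }
            if (_done) yield break;
            if (!_source.MoveNext())
            {
                _done = true;             // source exhausted; never touch it again
                _source.Dispose();
                yield break;
            }
            _cache.Add(_source.Current);  // compute one more item and remember it
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class LazyCacheDemo
{
    public static int Evaluations;

    public static LazyCache<int> Build()
    {
        Evaluations = 0;
        return new LazyCache<int>(
            Enumerable.Range(1, 5).Select(x => { Evaluations++; return x; }));
    }

    public static void Main()
    {
        var cached = Build();
        cached.Take(2).ToList();
        Console.WriteLine(Evaluations);   // 2: only what was asked for
        Console.WriteLine(cached.Sum());  // 15
        cached.Count();
        Console.WriteLine(Evaluations);   // 5: each item computed once
    }
}
```

Compared with ToList(), nothing is evaluated up front, which matters when a consumer only needs the first few items.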
@Mike Memoization is a very powerful technique for performance optimization... Unfortunately, it's something that can take a while for an OO programmer to wrap his head around. Unless you are coming from a functional background :) Then it becomes second nature.
@firefly, agreed, but LINQ is full of such techniques... It is all about laziness. So I believe we are already prepared for this kind of stuff.
@John: would you be happier if you saw IEnumerable<Statement> instead of var?
var is good because it allows the compiler to pick the best matching extension method (for example, IEnumerable and IQueryable both have a "Where" extension method, but the first performs filtering on the client whereas the second does it on the server: shorter code which performs better, a win-win situation)
@Ayende: so, are you going to fill in the "just back from holiday, and brain slow to fire up" and the "too scared to comment in case I look foolish" amongst us? :-)
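The overload-resolution point can be checked with a small sketch. It uses an in-memory array wrapped with AsQueryable() in place of a real database provider, and VarOverloadDemo and IsQuery are illustrative names only. The compiler picks Where based on the static type of the variable, so declaring it as IEnumerable silently switches to the client-side overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarOverloadDemo
{
    // True when the sequence still carries a query provider.
    static bool IsQuery(IEnumerable<int> sequence) => sequence is IQueryable<int>;

    public static bool WithVar()
    {
        var source = new[] { 1, 2, 3, 4 }.AsQueryable(); // static type: IQueryable<int>
        var filtered = source.Where(x => x > 2);          // binds to Queryable.Where
        return IsQuery(filtered);
    }

    public static bool WithExplicitIEnumerable()
    {
        IEnumerable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        var filtered = source.Where(x => x > 2);          // binds to Enumerable.Where
        return IsQuery(filtered);
    }

    public static void Main()
    {
        Console.WriteLine(WithVar());                 // True
        Console.WriteLine(WithExplicitIEnumerable()); // False
    }
}
```

With a real LINQ provider, the second form would pull the rows to the client and filter them there.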
Not sure that I am following you here
You asked 3 questions in your posting. I don't know the definitive answer to them. To aid those, like me, who come along in the future and read posts like this, it would be beneficial to see questions and answers -- rather than just questions.
Comprendes?
Would .ToArray() be slightly faster? Or am I missing something?
@MF, ToArray() might be slightly faster, but list item access by index is constant-time, so the difference (if there is an appreciable one) is probably trivial.