Ayende @ Rahien

The RavenDB indexing process: Optimization–Parallelizing work

One of the things that we do during the indexing process in RavenDB is apply triggers and decide whether, and how, a document will be indexed. The actual process is a bit more involved, because we have to do additional things (like figuring out which indexes have already indexed those particular documents).

At any rate, the interesting thing is that the process itself is pretty basic:

for doc in docs:
    # Skip documents that no index needs
    matchingIndexes = FindIndexesFor(doc)
    if matchingIndexes.Count > 0:
        # Triggers may modify the document, or return null to drop it
        doc = ExecuteTriggers(doc)
        if doc != null:
            yield doc

What matters here is that every operation in the loop works on a single document at a time, and the result is the set of modified documents.

We were able to gain a significant perf boost by simply moving to a Parallel.ForEach call. This seems simple enough, right? Parallelize the work, get better performance.

Except that there are issues with this as well, which I’ll touch on in my next post.
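
To make the shape of that change concrete, here is a minimal sketch in Python, chosen to match the pseudocode above rather than the actual RavenDB code, which is C# and uses Parallel.ForEach. The helpers find_indexes_for and execute_triggers are hypothetical stand-ins with invented bodies, and ThreadPoolExecutor.map is only a rough analogue of Parallel.ForEach. Because CPython threads do not run pure Python code in parallel, this shows the structure of the change, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real RavenDB steps; the names mirror the
# pseudocode above, and the bodies are invented for illustration.
def find_indexes_for(doc):
    # Pretend that only documents carrying a "type" field match an index.
    return ["ByType"] if "type" in doc else []

def execute_triggers(doc):
    # Pretend that a trigger drops deleted documents by returning None
    # and tags every other document; the input is not mutated.
    if doc.get("deleted"):
        return None
    return {**doc, "triggered": True}

def process(doc):
    # The per-document body of the loop: it touches one document only.
    if not find_indexes_for(doc):
        return None
    return execute_triggers(doc)

def filter_docs_parallel(docs, max_workers=4):
    # Executor.map runs process() on worker threads and yields results
    # in input order; documents that were dropped come back as None.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [doc for doc in pool.map(process, docs) if doc is not None]

docs = [
    {"id": 1, "type": "user"},
    {"id": 2},                                   # no matching index
    {"id": 3, "type": "user", "deleted": True},  # dropped by a trigger
    {"id": 4, "type": "order"},
]
result = filter_docs_parallel(docs)
print([doc["id"] for doc in result])  # prints [1, 4]
```

Since process only reads its own document and returns a new one, the workers share no mutable state, which is the property that makes the loop a candidate for parallelization in the first place.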

Comments

Stu
04/19/2012 11:29 AM by
Stu

;-) Would you get any more performance if you switched to ".Any()" instead of ".Count > 0"?

Ayende Rahien
04/19/2012 11:29 AM by
Ayende Rahien

Stu, Huh? There is no way Any can be faster.

Stu
04/19/2012 11:34 AM by
Stu

Doh! I thought I was helping as well. A quick Google search came back with http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 and http://stackoverflow.com/questions/5741617/listt-any-or-count

So I'm guessing FindIndexesFor(doc) returns a List<T>, not an IEnumerable<T>.

Sorry ;-)

Iván Morales
04/19/2012 01:12 PM by
Iván Morales

I made some simple tests and these are the results:

0.1461996 ms. [x1.00] Count() > 0
0.6240561 ms. [x4.27] Count > 0
5.0751038 ms. [x34.71] Any()

The Count() method is the fastest. In other tests, the Count property is 4-6 times slower than Count(), and Any() is 30-50 times slower.

Iván Morales
04/19/2012 01:27 PM by
Iván Morales

Oops, the times are in seconds, not milliseconds.

Simon
04/19/2012 01:37 PM by
Simon

Any() definitely can be faster in certain circumstances - as always though, it depends...

Good answer here: http://stackoverflow.com/a/305156/54222

Phil Bolduc
04/19/2012 02:16 PM by
Phil Bolduc

@Ivan/@Stu: Ayende is using the Count property, not the Count() extension method.

Iván Morales
04/19/2012 02:29 PM by
Iván Morales

Phil, in my previous comment you can see the results for the Count property too.

My results (without filtering data), ordered from fastest to slowest: Any() [Method] -> Count [Property] -> Count() [Method]

Iván Morales
04/19/2012 02:35 PM by
Iván Morales

Oops again. Today is not my day :-(

Ordered from fastest to slowest: Count() [Method] -> Count [Property] -> Any() [Method]

The fastest way: Count() [Extension Method]

Count [Property] is slower than Count(). Any() [Extension Method] is much slower than the Count property.

Phil Bolduc
04/19/2012 02:58 PM by
Phil Bolduc

@Iván could you elaborate on the testing? What was the data type (e.g., List<T>) that you used? How many items were in your collection? I think the Stack Overflow link that @Simon provided has a good explanation.

Using JustDecompile, the Count property on List<T> has the following definition:

public int Count { get { return this._size; } }

It simply returns a private field. The Count() extension method at minimum needs to check whether the underlying collection is an ICollection<T> or ICollection. If so, it returns the Count property. That costs at minimum one null check, one cast to ICollection using 'as', and one Count property access.

Generally, if the collection is an ICollection, the argument over using the Count() extension method versus the Count property is moot.

Iván Morales
04/19/2012 03:50 PM by
Iván Morales

My test code: http://pastebin.com/awGgJpn9

stu
04/19/2012 05:08 PM by
stu

@Phil "Ayende is using the Count property, not the Count() extension method." yeah I noticed that ... afterwards

Phil Bolduc
04/19/2012 07:16 PM by
Phil Bolduc

@Ivan - I took your source code and ran my own analysis. I could not corroborate your results. My code is here: http://pastebin.com/QadafRKG

One thing I added was to allow the user to pick the order in which the tests run. I also ran a smaller batch before timing to remove any issues with CPU caches. I ran this on Windows 7 x64 SP1, 16GB RAM, Q9400 @ 2.66GHz, .NET Framework 4 Client Profile.

Here are my results:

D:>CountAnalysis.exe any-method-property
Overhead: 415
Any: 15250
CountMethod: 5877
CountProperty: 304

D:>CountAnalysis.exe method-property-any
Overhead: 413
CountMethod: 5877
CountProperty: 303
Any: 14274

D:>CountAnalysis.exe property-any-method
Overhead: 413
CountProperty: 303
Any: 15391
CountMethod: 5878

D:>CountAnalysis.exe any-property-method
Overhead: 414
CountProperty: 303
Any: 15082
CountMethod: 5893

Iván Morales
04/19/2012 09:23 PM by
Iván Morales

Definitely not my day today.

I made a mistake when typing the name of the test that ran. These are the corrected results, which are much more logical ;-)

[Count>0] 0.22 secs, [Count()>0] 0.86 secs (x3.91 slower than the fastest), [Any()] 4.97 secs (x22.58 slower than the fastest)

Thanks Phil
