Ayende @ Rahien

My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

The RavenDB indexing process: Optimization–Parallelizing work

One of the things that we do during the indexing process in RavenDB is apply triggers and decide whether, and how, a document will be indexed. The actual process is a bit more involved, because we have to do additional things (like figuring out which indexes have already indexed those particular documents).

At any rate, the interesting thing is that the core of this process is pretty basic:

for doc in docs:
    matchingIndexes = FindIndexesFor(doc)
    if matchingIndexes.Count > 0:
        doc = ExecuteTriggers(doc)
        if doc != null:
            yield doc

What makes this interesting is that each iteration works on a single document at a time, independently of the others, and the result is the set of modified documents.

We were able to gain a significant perf boost by simply moving to a Parallel.ForEach call. This seems simple enough, right? Parallelize the work, get better performance.

Except that there are issues with this as well, which I’ll touch on in my next post.
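
To make the shape of that change concrete, here is a minimal sketch of the same loop run sequentially and then in parallel. It is a Python analogue for illustration only, not RavenDB's code: find_indexes_for and execute_triggers are hypothetical stand-ins for FindIndexesFor and ExecuteTriggers, and a thread pool stands in for Parallel.ForEach.

```python
from concurrent.futures import ThreadPoolExecutor


def find_indexes_for(doc):
    # Hypothetical stand-in: pretend one index covers documents with a "name".
    return ["by_name"] if "name" in doc else []


def execute_triggers(doc):
    # Hypothetical stand-in: a trigger may veto a document (return None)
    # or hand back a modified copy.
    if doc.get("deleted"):
        return None
    return {**doc, "name": doc["name"].strip()}


def process(doc):
    # The per-document work from the loop above; it touches no shared state,
    # which is what makes it safe to run for many documents at once.
    if not find_indexes_for(doc):
        return None
    return execute_triggers(doc)


def filter_sequential(docs):
    for doc in docs:
        result = process(doc)
        if result is not None:
            yield result


def filter_parallel(docs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs process() on worker threads and yields the results
        # in input order.
        for result in pool.map(process, docs):
            if result is not None:
                yield result


docs = [{"name": " a "}, {"id": 1}, {"name": "b", "deleted": True}, {"name": "c"}]
print(list(filter_parallel(docs)))
```

Two differences from the .NET version are worth keeping in mind. pool.map returns results in input order, while Parallel.ForEach makes no ordering guarantee. And in standard CPython, threads typically do not speed up CPU-bound work, so this shows the structure of the change rather than the speedup.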

More posts in "The RavenDB indexing process" series:

  1. (24 Apr 2012) Optimization–Tuning? Why, we have auto tuning
  2. (23 Apr 2012) Optimization–Getting documents from disk
  3. (20 Apr 2012) Optimization–De-parallelizing work
  4. (19 Apr 2012) Optimization–Parallelizing work
  5. (18 Apr 2012) Optimization



;-) Would you get any more performance if you switch to ".Any()" instead of ".Count > 0" ?

Ayende Rahien

Stu, Huh? There is no way Any can be faster.


Doh! I thought I was helping as well. A quick Google came back with http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 and http://stackoverflow.com/questions/5741617/listt-any-or-count

So I'm guessing FindIndexesFor(doc) returns a List not an IEnumerable

Sorry ;-)

Iván Morales

I made some simple tests and these are the results:

0.1461996 ms. [x1.00] Count() > 0
0.6240561 ms. [x4.27] Count > 0
5.0751038 ms. [x34.71] Any()

The Count method is the fastest. In other tests, the Count property is 4-6 times slower than Count(), and Any() is 30-50 times slower.

Iván Morales

Oops, time is in seconds, not milliseconds.


Any() definitely can be faster in certain circumstances - as always though, it depends...

Good answer here: http://stackoverflow.com/a/305156/54222

Phil Bolduc

@Ivan/@Stu: Ayende is using the Count property, not the Count() extension method.

Iván Morales

Phil, in my previous comment you can see results for the Count property too.

My results (without filtering data) ordered from faster to slower: Any() [Method] -> Count [Property] -> Count() [Method]

Iván Morales

Oops again. Today is not my day :-(

Ordered from faster to slower: Count() [Method] -> Count [Property] -> Any() [Method]

The fastest way: Count() [Extension Method]

Count [Property] is slower than Count(). Any() [Extension Method] is much slower than the Count property.

Phil Bolduc

@Iván could you elaborate on the testing? What was the data type, i.e., List, that you used? How many items were in your collection? I think the Stack Overflow link that @Simon provided has a good explanation.

Using JustDecompile, the Count property on List<T> has the following definition:

    public int Count { get { return this._size; } }

which simply returns a private field. The Count() extension method at minimum needs to check whether the underlying collection is an ICollection<T> or ICollection. If so, it returns the Count property. So it does at minimum one null check, one cast to ICollection<T> using 'as', and one Count property access.

Generally, if the collection is an ICollection the argument of using the Count() extension method or Count property is moot.

Iván Morales

My test code: http://pastebin.com/awGgJpn9


@Phil "Ayende is using the Count property, not the Count() extension method." yeah I noticed that ... afterwards

Phil Bolduc

@Ivan - I took your source code and ran my own analysis. I could not corroborate your results. My code is here: http://pastebin.com/QadafRKG

One thing I added was to allow the user to pick which order the tests run. I also ran a smaller batch before timing to remove any issues with CPU caches. I ran this on Windows 7 x64 SP1, 16GB RAM, Q9400 @ 2.66GHz, .NET Framework 4 Client Profile

Here are my results:

D:>CountAnalysis.exe any-method-property
Overhead: 415
Any: 15250
CountMethod: 5877
CountProperty: 304

D:>CountAnalysis.exe method-property-any
Overhead: 413
CountMethod: 5877
CountProperty: 303
Any: 14274

D:>CountAnalysis.exe property-any-method
Overhead: 413
CountProperty: 303
Any: 15391
CountMethod: 5878

D:>CountAnalysis.exe any-property-method
Overhead: 414
CountProperty: 303
Any: 15082
CountMethod: 5893

Iván Morales

Definitely not my day today.

I made a mistake when typing the name of the test that ran. These are the corrected results, and they are much more logical ;-)

[Count>0] 0.22 secs, [Count()>0] 0.86 secs (x3,91 slower than the fastest), [Any()] 4.97 secs (x22,58 slower than the fastest)

Thanks Phil
