The RavenDB indexing process: Optimization–Parallelizing work
One of the things that we do during the indexing process in RavenDB is apply triggers and decide whether, and how, a document will be indexed. The actual process is a bit more involved, because we have to do additional things (like figure out which indexes have already indexed those particular documents).
At any rate, the core of the process is pretty basic:
for doc in docs:
    matchingIndexes = FindIndexesFor(doc)
    if matchingIndexes.Count > 0:
        doc = ExecuteTriggers(doc)
        if doc != null:
            yield doc
The interesting thing about this is that it is a set of operations that only works on a single document at a time, and the result is the set of modified documents.
We were able to gain a significant perf boost by simply moving to a Parallel.ForEach call. This seems simple enough, right? Parallelize the work, get better performance.
Except that there are issues with this as well, which I’ll touch on in my next post.
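To make the shape of that change concrete, here is a minimal sketch in Python, mirroring the pseudocode above. It is not RavenDB's code: find_indexes_for and execute_triggers are hypothetical stand-ins for FindIndexesFor and ExecuteTriggers, the sample documents are invented, and a thread pool stands in for Parallel.ForEach. The point is only that the per-document work touches nothing but its own input, so it can be handed to workers independently.

```python
from concurrent.futures import ThreadPoolExecutor


def find_indexes_for(doc):
    # Hypothetical stand-in: pretend only documents with a "type" field
    # are covered by an index.
    return ["by_type"] if "type" in doc else []


def execute_triggers(doc):
    # Hypothetical stand-in: a trigger may return a modified copy of the
    # document, or None to veto indexing it.
    if doc.get("deleted"):
        return None
    return {**doc, "triggered": True}


def process(doc):
    # The per-document body of the loop. It reads only its own argument,
    # which is what makes it safe to run for many documents at once.
    if not find_indexes_for(doc):
        return None
    return execute_triggers(doc)


def filter_documents_parallel(docs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order; documents that were
        # skipped or vetoed come back as None and are dropped here.
        return [d for d in pool.map(process, docs) if d is not None]


docs = [
    {"id": 1, "type": "user"},
    {"id": 2},
    {"id": 3, "type": "user", "deleted": True},
    {"id": 4, "type": "order"},
]
result = filter_documents_parallel(docs)
```

Note that this only illustrates the structure. Python threads will not speed up CPU-bound work the way Parallel.ForEach can in .NET, and unlike pool.map, Parallel.ForEach makes no promise about the order in which items are processed.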
More posts in "The RavenDB indexing process" series:
- (24 Apr 2012) Optimization–Tuning? Why, we have auto tuning
- (23 Apr 2012) Optimization–Getting documents from disk
- (20 Apr 2012) Optimization–De-parallelizing work
- (19 Apr 2012) Optimization–Parallelizing work
- (18 Apr 2012) Optimization
Comments
;-) Would you get any more performance if you switch to ".Any()" instead of ".Count > 0" ?
Stu, Huh? There is no way Any can be faster.
Doh! I thought I was helping as well. A quick google came back with http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 and http://stackoverflow.com/questions/5741617/listt-any-or-count
So I'm guessing FindIndexesFor(doc) returns a List not an IEnumerable
Sorry ;-)
I made some simple tests and these are the results:
0.1461996 ms. [x1.00] Count() > 0
0.6240561 ms. [x4.27] Count > 0
5.0751038 ms. [x34.71] Any()
The Count() method is the fastest. In other tests, the Count property is 4-6 times slower than Count(), and Any() is 30-50 times slower.
Oops, the times are in seconds, not milliseconds.
Any() definitely can be faster in certain circumstances - as always though, it depends...
Good answer here: http://stackoverflow.com/a/305156/54222
@Ivan/@Stu: Ayende is using the Count property, not the Count() extension method.
Phil, in my previous comment you can see results for the Count property too.
My results (without filtering data) ordered from faster to slower: Any() [Method] -> Count [Property] -> Count() [Method]
Oops again. Today is not my day :-(
Ordered from faster to slower: Count() [Method] -> Count [Property] -> Any() [Method]
The fastest way: Count() [Extension Method]
Count [Property] is slower than Count(). Any() [Extension Method] is much slower than the Count property.
@Iván could you elaborate on the testing? What was the data type, i.e., List<T>, that you used? How many items were in your collection? I think the stackoverflow link that @Simon provided has a good explanation.
Using JustDecompile, the Count property on List<T> has the following definition: public int Count { get { return this._size; } } which is returning a private field. The Count() extension method at minimum needs to check if the underlying collection is an ICollection<T> or ICollection. If so, it returns the Count property. It does at minimum one not null check, one cast to ICollection<T> using 'as', and one Count property access.
Generally, if the collection is an ICollection<T> the argument of using the Count() extension method or Count property is moot.
My test code: http://pastebin.com/awGgJpn9
@Phil "Ayende is using the Count property, not the Count() extension method." yeah I noticed that ... afterwards
@Ivan - I took your source code and ran my own analysis. I could not corroborate your results. My code is here: http://pastebin.com/QadafRKG
One thing I added was to allow the user to pick which order the tests run. I also ran a smaller batch before timing to remove any issues with CPU caches. I ran this on Windows 7 x64 SP1, 16GB RAM, Q9400 @ 2.66GHz, .NET Framework 4 Client Profile
Here are my results:
D:>CountAnalysis.exe any-method-property
Overhead: 415
Any: 15250
CountMethod: 5877
CountProperty: 304

D:>CountAnalysis.exe method-property-any
Overhead: 413
CountMethod: 5877
CountProperty: 303
Any: 14274

D:>CountAnalysis.exe property-any-method
Overhead: 413
CountProperty: 303
Any: 15391
CountMethod: 5878

D:>CountAnalysis.exe any-property-method
Overhead: 414
CountProperty: 303
Any: 15082
CountMethod: 5893
Definitely not my day today.
I made a mistake when typing the names of the tests that ran. These are the corrected results, and they are much more logical ;-)
[Count>0] 0.22 secs, [Count()>0] 0.86 secs (x3,91 slower than the fastest), [Any()] 4.97 secs (x22,58 slower than the fastest)
Thanks Phil