The RavenDB indexing process: Optimization–Parallelizing work

Apr 19 2012

time to read 2 min | 258 words

One of the things we do during the indexing process in RavenDB is apply triggers and decide whether, and how, a document will be indexed. The actual process is a bit more involved, because we have to do additional things (like figuring out which indexes have already indexed those particular documents).

At any rate, the interesting thing is that this is a process which is pretty basic:

for doc in docs:
    matchingIndexes = FindIndexesFor(doc)
    if matchingIndexes.Count > 0:
        doc = ExecuteTriggers(doc)
        if doc != null:
            yield doc

What matters here is that each step of this loop works on a single document at a time, and the result is the set of modified documents.

We were able to gain a significant perf boost by simply moving to a Parallel.ForEach call. This seems simple enough, right? Parallelize the work, get better performance.

Except that there are issues with this as well, which I’ll touch on in my next post.
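To make the shape of that change concrete, here is a minimal sketch in Python rather than the .NET code RavenDB actually uses. The names find_indexes_for, execute_triggers and prepare are hypothetical stand-ins for the pseudocode above, and a thread pool stands in for Parallel.ForEach; this is an illustration under those assumptions, not the real implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for FindIndexesFor and ExecuteTriggers from the post.
def find_indexes_for(doc):
    """Return the names of the indexes that should cover this document."""
    return ["by_name"] if "name" in doc else []

def execute_triggers(doc):
    """Return a (possibly modified) copy of the document, or None to veto it."""
    if doc.get("deleted"):
        return None
    return {**doc, "name": doc["name"].strip()}

def prepare(doc):
    """The per-document step of the loop; it touches no other document."""
    if not find_indexes_for(doc):
        return None
    return execute_triggers(doc)

def prepare_sequential(docs):
    return [d for d in map(prepare, docs) if d is not None]

def prepare_parallel(docs, max_workers=4):
    # Executor.map returns results in input order; Parallel.ForEach in .NET
    # makes no such ordering guarantee.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [d for d in pool.map(prepare, docs) if d is not None]

docs = [
    {"id": 1, "name": " alice "},
    {"id": 2},                                   # no matching index
    {"id": 3, "name": "bob", "deleted": True},   # vetoed by a trigger
    {"id": 4, "name": "carol"},
]

if __name__ == "__main__":
    print(prepare_parallel(docs))
```

Because each call to prepare depends only on its own document, the sequential and parallel versions produce the same set of documents. Note that in CPython threads will not speed up CPU-bound work because of the global interpreter lock, so the sketch shows the structure of the change, not the speedup.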

Comments

19 Apr 2012
11:29 AM

Stu

;-) Would you get any more performance if you switch to ".Any()" instead of ".Count > 0" ?

19 Apr 2012
11:29 AM

Ayende Rahien

Stu, Huh? There is no way Any can be faster.

19 Apr 2012
11:34 AM

Stu

Doh! I thought I was helping as well. A quick google came back with http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 and http://stackoverflow.com/questions/5741617/listt-any-or-count

So I'm guessing FindIndexesFor(doc) returns a List not an IEnumerable

Sorry ;-)

19 Apr 2012
13:12 PM

Iván Morales

I ran some simple tests and these are the results:

0.1461996 ms. [x1.00] Count() > 0
0.6240561 ms. [x4.27] Count > 0
5.0751038 ms. [x34.71] Any()

The Count() method is the fastest. In other tests, the Count property is 4-6 times slower than Count(), and Any() is 30-50 times slower.

19 Apr 2012
13:27 PM

Iván Morales

Upps, time is in seconds, not milliseconds.

19 Apr 2012
13:37 PM

Simon

Any() definitely can be faster in certain circumstances - as always though, it depends...

Good answer here: http://stackoverflow.com/a/305156/54222

19 Apr 2012
14:16 PM

Phil Bolduc

@Ivan/@Stu: Ayende is using the Count property, not the Count() extension method.

19 Apr 2012
14:29 PM

Iván Morales

Phil, in my previous comment you can see results for the Count property too.

My results (without filtering data) ordered from faster to slower: Any() [Method] -> Count [Property] -> Count() [Method]

19 Apr 2012
14:35 PM

Iván Morales

Oppps again. Today is not my day :-(

Ordered from faster to slower: Count() [Method] -> Count [Property] -> Any() [Method]

The fastest way: Count() [Extension Method]

Count [Property] is slower than Count(). Any() [Extension Method] is much slower than the Count property.

19 Apr 2012
14:58 PM

Phil Bolduc

@Iván could you elaborate on the testing? What was the data type, i.e., List<T>, that you used? How many items were in your collection? I think the Stack Overflow link that @Simon provided has a good explanation.

Using JustDecompile, the Count property on List<T> has the following definition: public int Count { get { return this._size; } } which is returning a private field. The Count() extension method at minimum needs to check if the underlying collection is an ICollection<T> or ICollection. If so, it returns the Count property. It does at minimum one not null check, one cast to ICollection<T> using 'as', and one Count property access.

Generally, if the collection is an ICollection<T> the argument of using the Count() extension method or Count property is moot.

19 Apr 2012
15:50 PM

Iván Morales

My test code: http://pastebin.com/awGgJpn9

19 Apr 2012
17:08 PM

stu

@Phil "Ayende is using the Count property, not the Count() extension method." yeah I noticed that ... afterwards

19 Apr 2012
19:16 PM

Phil Bolduc

@Ivan - I took your source code and ran my own analysis. I could not corroborate your results. My code is here: http://pastebin.com/QadafRKG

One thing I added was to allow the user to pick which order the tests run. I also ran a smaller batch before timing to remove any issues with CPU caches. I ran this on Windows 7 x64 SP1, 16GB RAM, Q9400 @ 2.66GHz, .NET Framework 4 Client Profile

Here are my results:

D:>CountAnalysis.exe any-method-property
Overhead: 415
Any: 15250
CountMethod: 5877
CountProperty: 304

D:>CountAnalysis.exe method-property-any
Overhead: 413
CountMethod: 5877
CountProperty: 303
Any: 14274

D:>CountAnalysis.exe property-any-method
Overhead: 413
CountProperty: 303
Any: 15391
CountMethod: 5878

D:>CountAnalysis.exe any-property-method
Overhead: 414
CountProperty: 303
Any: 15082
CountMethod: 5893

19 Apr 2012
21:23 PM

Iván Morales

Definitely not my day today.

I made a mistake when typing the name of the test that ran. These are the corrected results, which are much more logical ;-)

[Count>0] 0.22 secs, [Count()>0] 0.86 secs (x3,91 slower than the fastest), [Any()] 4.97 secs (x22,58 slower than the fastest)

Thanks Phil


Oren Eini

CEO of RavenDB
