One of the things that we do during the indexing process in RavenDB is apply triggers and decide whether, and how, a document will be indexed. The actual process is a bit more involved, because we have to do additional things (like figuring out which indexes have already indexed those particular documents).
At any rate, the core of the process is pretty basic:
    for doc in docs:
        matchingIndexes = FindIndexesFor(doc)
        if matchingIndexes.Count > 0:
            doc = ExecuteTriggers(doc)
            if doc != null:
                yield doc
What matters here is that each iteration works on a single document at a time, without depending on any other document, and the result is the set of modified documents.
We were able to gain a significant perf boost by simply moving to a Parallel.ForEach call. This seems simple enough, right? Parallelize the work, get better performance.
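To make the idea concrete, here is a minimal sketch of the same shape in Python, the closest match to the pseudocode above. It is not RavenDB's code: the actual change used .NET's Parallel.ForEach, and here a ThreadPoolExecutor stands in for it. The functions find_indexes_for and execute_triggers are hypothetical stand-ins for FindIndexesFor and ExecuteTriggers, with made-up behavior chosen only so the example runs.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for FindIndexesFor: pretend only documents
# that have a "name" field are covered by an index.
def find_indexes_for(doc):
    return ["ByName"] if "name" in doc else []


# Hypothetical stand-in for ExecuteTriggers: a trigger may veto a
# document (return None) or hand back a modified copy.
def execute_triggers(doc):
    if doc.get("deleted"):
        return None
    return {**doc, "name": doc["name"].strip()}


# The per-document work. It touches nothing but its own argument,
# which is what makes it safe to run for many documents at once.
def process_one(doc):
    if not find_indexes_for(doc):
        return None
    return execute_triggers(doc)


# Sequential version, mirroring the pseudocode loop.
def filter_docs_sequential(docs):
    for doc in docs:
        result = process_one(doc)
        if result is not None:
            yield result


# Parallel version: the same per-document function, fanned out over
# a thread pool. pool.map returns results in input order.
def filter_docs_parallel(docs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_one, docs))
    return [r for r in results if r is not None]
```

Both versions return the same documents for the same input. The only change is who calls process_one and when, which is why the switch looks so cheap.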
Except that there are issues with this as well, which I’ll touch on in my next post.