The RavenDB indexing process: Optimization
The actual process RavenDB uses to index documents is a fairly complex one. In order to understand exactly what happens, I decided to break it down into pseudo code.
It looks something like this:
    while database_is_running:
        stale = find_stale_indexes()
        lastIndexedEtag = find_last_indexed_etag(stale)
        docs_to_index = get_documents_since(lastIndexedEtag, batch_size)
        filtered_docs = execute_read_filters(docs_to_index)
        indexing_work = []
        for index in stale:
            index_docs = select_matching_docs(index, filtered_docs)
            if index_docs.empty:
                set_indexed(index, lastIndexedEtag)
            else:
                indexing_work.add(index, index_docs)
        for work in indexing_work:
            work.index(work.index_docs)
And now let me show you the areas in which we did some perf work:
    while database_is_running:
        stale = find_stale_indexes()
        lastIndexedEtag = find_last_indexed_etag(stale)
        docs_to_index = get_documents_since(lastIndexedEtag, batch_size)
        filtered_docs = execute_read_filters(docs_to_index)
        indexing_work = []
        for index in stale:
            index_docs = select_matching_docs(index, filtered_docs)
            if index_docs.empty:
                set_indexed(index, lastIndexedEtag)
            else:
                indexing_work.add(index, index_docs)
        for work in indexing_work:
            work.index(work.index_docs)
All of which gives us a major boost in system performance. I'll discuss each part of that work in detail, don't worry.
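If you want to step through the loop yourself, here is a rough Python sketch of a single pass. It is an in-memory toy, not RavenDB's actual code: the `Doc` and `Index` classes, the `run_indexing_pass` name, and the idea that an index matches documents by collection name are all assumptions made for illustration. It also assumes that an index is marked as indexed up to the last etag of the batch it just saw.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    etag: int          # monotonically increasing change marker
    collection: str
    body: dict


@dataclass
class Index:
    name: str
    collection: str    # hypothetical matching rule: one collection per index
    last_indexed_etag: int = 0
    entries: list = field(default_factory=list)

    def index(self, docs):
        # Stand-in for the real indexing work: just record the etags.
        self.entries.extend(d.etag for d in docs)


def run_indexing_pass(store, indexes, batch_size, read_filters):
    """Run one iteration of the loop. `store` must be sorted by etag.

    Returns False when no index is stale, True otherwise.
    """
    latest = store[-1].etag if store else 0
    stale = [i for i in indexes if i.last_indexed_etag < latest]
    if not stale:
        return False

    # Start from the index that is furthest behind.
    last_indexed_etag = min(i.last_indexed_etag for i in stale)
    docs_to_index = [d for d in store if d.etag > last_indexed_etag][:batch_size]
    batch_end = docs_to_index[-1].etag

    # Filter once for the whole batch, not once per index.
    filtered_docs = [d for d in docs_to_index if all(f(d) for f in read_filters)]

    indexing_work = []
    for index in stale:
        index_docs = [
            d for d in filtered_docs
            if d.collection == index.collection and d.etag > index.last_indexed_etag
        ]
        if not index_docs:
            # Nothing to do for this index, so just move its marker forward.
            index.last_indexed_etag = max(index.last_indexed_etag, batch_end)
        else:
            indexing_work.append((index, index_docs))

    # All the selection is done; now execute the actual indexing.
    for index, index_docs in indexing_work:
        index.index(index_docs)
        index.last_indexed_etag = max(index.last_indexed_etag, batch_end)
    return True


store = [
    Doc(1, "users", {}),
    Doc(2, "orders", {}),
    Doc(3, "users", {}),
    Doc(4, "orders", {"hidden": True}),
    Doc(5, "users", {}),
]
by_user = Index("users_index", "users")
by_order = Index("orders_index", "orders")
read_filters = [lambda d: not d.body.get("hidden")]

while run_indexing_pass(store, [by_user, by_order], 2, read_filters):
    pass
```

Collecting the work into a list first and running it in a separate loop keeps the selection phase apart from the execution phase, which mirrors the structure of the pseudo code above.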
More posts in "The RavenDB indexing process" series:
- (24 Apr 2012) Optimization–Tuning? Why, we have auto tuning
- (23 Apr 2012) Optimization–Getting documents from disk
- (20 Apr 2012) Optimization–De-parallelizing work
- (19 Apr 2012) Optimization–Parallelizing work
- (18 Apr 2012) Optimization
Comments
The last: you could have just run index(index_docs) instead of adding it to a list and only at the end enumerate that list.
Nadav, doing so would force me to wait for the filtering of each index. Instead, I can do the filtering first, then execute all of the indexes at once.
Ayende, given the scarcity of good blog posts on RavenDB - could you release more of your future ones?
I'm still learning but I'm loving what I'm seeing in RavenDB. Could you write more about 'best practices' and common pitfalls?
I am making my pitch to switch out our custom SQL & Azure tables combination in a CRM we are developing for RavenDB. So far I have shed nearly 6000+ lines of code and the conversion has taken less than a week.
@Andrew - there's plenty of good blog posts for RavenDB, some people have found my few really helpful.
You can also join http://jabbr.net and chat to some of the guys in #ravendb room. Lots of people come in seeking help for different things and we can help most of the time.
Google Group is awesome too if you want quick answers from Ayende or his team.
Andrew, There are actually a LOT of really good blog posts about RavenDB. And we have a lot of documentation, screen casts and both TekPub and PluralSight have courses about it. Most of the future posts here are actually about how to implement and optimize RavenDB, not how to work with it.
Ayende, your pseudo code looks remarkably like python, any plans to switch languages? ;-)
Tim, This is actually Boo. And I have been doing that for 7 years or so.