Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

The RavenDB indexing process: Optimization


The actual process RavenDB uses to index documents is fairly complex. To understand exactly what happens, I decided to break it down into pseudocode.

It looks something like this:

while database_is_running:
  # find the indexes that have fallen behind and where to resume from
  stale = find_stale_indexes()
  lastIndexedEtag = find_last_indexed_etag(stale)
  docs_to_index = get_documents_since(lastIndexedEtag, batch_size)

  filtered_docs = execute_read_filters(docs_to_index)

  indexing_work = []

  for index in stale:
    index_docs = select_matching_docs(index, filtered_docs)

    if index_docs.empty:
      # nothing in this batch for this index, just record its progress
      set_indexed(index, lastIndexedEtag)
    else:
      indexing_work.add(index, index_docs)

  # run the actual indexing only after every index has been matched
  for work in indexing_work:
    work.index(work.index_docs)

And now let me show you the areas in which we did some perf work:

while database_is_running:
  stale = find_stale_indexes()
  lastIndexedEtag = find_last_indexed_etag(stale)
  docs_to_index = get_documents_since(lastIndexedEtag, batch_size)

  filtered_docs = execute_read_filters(docs_to_index)

  indexing_work = []

  for index in stale:
    index_docs = select_matching_docs(index, filtered_docs)

    if index_docs.empty:
      set_indexed(index, lastIndexedEtag)
    else:
      indexing_work.add(index, index_docs)

  for work in indexing_work:
    work.index(work.index_docs)

All of which gives us a major boost in system performance. I'll discuss each part of that work in detail, don't worry ;-)
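
For anyone who wants to step through the loop, here is a minimal Python sketch of a single pass. It is not RavenDB's implementation. The `Doc` and `Index` classes, the rule that an index matches exactly one collection, the choice to resume from the lowest etag among the stale indexes, and the choice to advance each index to the last etag in the batch are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Doc:
    etag: int          # monotonically increasing change marker
    collection: str    # hypothetical: used to match docs to indexes


@dataclass
class Index:
    name: str
    collection: str                 # hypothetical matching rule
    last_indexed_etag: int = 0
    entries: List[int] = field(default_factory=list)

    def index(self, docs: List[Doc]) -> None:
        # Stand-in for real indexing: remember which etags were indexed.
        self.entries.extend(d.etag for d in docs)


def run_indexing_pass(
    docs: List[Doc],
    indexes: List[Index],
    batch_size: int,
    read_filter: Callable[[Doc], bool] = lambda d: True,
) -> bool:
    """Run one iteration of the loop; return False when no index is stale."""
    latest = max((d.etag for d in docs), default=0)
    stale = [i for i in indexes if i.last_indexed_etag < latest]
    if not stale:
        return False

    # Assumption: resume from the lowest etag among the stale indexes.
    start = min(i.last_indexed_etag for i in stale)
    newer = sorted((d for d in docs if d.etag > start), key=lambda d: d.etag)
    batch = newer[:batch_size]
    batch_end = batch[-1].etag

    # Read filters run once for the whole batch, not once per index.
    filtered = [d for d in batch if read_filter(d)]

    work: List[Tuple[Index, List[Doc]]] = []
    for idx in stale:
        matching = [
            d for d in filtered
            if d.collection == idx.collection and d.etag > idx.last_indexed_etag
        ]
        if matching:
            work.append((idx, matching))
        else:
            # Nothing to do for this index in this batch; record progress.
            idx.last_indexed_etag = max(idx.last_indexed_etag, batch_end)

    # Execute the collected work only after all matching is done.
    for idx, matching in work:
        idx.index(matching)
        idx.last_indexed_etag = max(idx.last_indexed_etag, batch_end)
    return True
```

Calling `run_indexing_pass` in a loop until it returns `False` plays the role of `while database_is_running`. Because the work items are collected before any of them runs, the final loop is the natural place to hand them all to a scheduler at once instead of running them one by one.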

More posts in "The RavenDB indexing process" series:

  1. (24 Apr 2012) Optimization–Tuning? Why, we have auto tuning
  2. (23 Apr 2012) Optimization–Getting documents from disk
  3. (20 Apr 2012) Optimization–De-parallelizing work
  4. (19 Apr 2012) Optimization–Parallelizing work
  5. (18 Apr 2012) Optimization

Comments

Nadav

The last: you could have just run index(index_docs) instead of adding it to a list and only at the end enumerate that list.

Ayende Rahien

Nadav, doing so would force me to wait for the filtering of each index. Instead, I can do the filtering first, then execute all of the indexes at once.

Andrew Harry

Ayende, given the scarcity of good blog posts on RavenDB - could you release more of your future ones?

I'm still learning but I'm loving what I'm seeing in RavenDB. Could you write more about 'best practices' and common pitfalls?

I am making my pitch to switch out our custom SQL & Azure tables combination in a CRM we are developing for RavenDB. So far I have shed nearly 6000+ lines of code and the conversion has taken less than a week.

Phillip Haydon

@Andrew - there's plenty of good blog posts for RavenDB, some people have found my few really helpful.

You can also join http://jabbr.net and chat to some of the guys in #ravendb room. Lots of people come in seeking help for different things and we can help most of the time.

Google Group is awesome too if you want quick answers from Ayende or his team.

Ayende Rahien

Andrew, there are actually a LOT of really good blog posts about RavenDB. We also have a lot of documentation and screencasts, and both TekPub and PluralSight have courses about it. Most of the future posts here are actually about how to implement and optimize RavenDB, not how to work with it.

Tim

Ayende, your pseudo code looks remarkably like python, any plans to switch languages? ;-)

Ayende Rahien

Tim, This is actually Boo. And I have been doing that for 7 years or so.
