Ayende @ Rahien

The RavenDB indexing process: Optimization

The actual process RavenDB uses to index documents is fairly complex. To understand exactly what happens, I decided to break it down into pseudocode.

It looks something like this:

while database_is_running:
  stale = find_stale_indexes()
  lastIndexedEtag = find_last_indexed_etag(stale)
  docs_to_index = get_documents_since(lastIndexedEtag, batch_size)

  filtered_docs = execute_read_filters(docs_to_index)

  indexing_work = []

  for index in stale:
    index_docs = select_matching_docs(index, filtered_docs)

    if index_docs.empty:
      set_indexed(index, lastIndexedEtag)
    else:
      indexing_work.add(index, index_docs)

  for work in indexing_work:
    work.index(work.index_docs)
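
To make the pseudocode concrete, here is a minimal Python sketch of a single pass through that loop. Everything in it is an assumption made for illustration: the `Index` class, the dict-shaped documents with integer etags, the `read_filter` callback, and the choice to advance each index to the last etag of the batch. It is not RavenDB's actual implementation, just the same control flow over in-memory stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Index:
    """Hypothetical in-memory index that remembers the last etag it processed."""
    name: str
    matches: Callable[[dict], bool]  # decides whether a doc belongs to this index
    last_indexed_etag: int = 0
    entries: list = field(default_factory=list)


def run_indexing_batch(documents, indexes, batch_size, read_filter=lambda doc: True):
    """Run one iteration of the loop; return False when no index is stale."""
    newest_etag = max((doc["etag"] for doc in documents), default=0)
    stale = [ix for ix in indexes if ix.last_indexed_etag < newest_etag]
    if not stale:
        return False

    # Start reading from the oldest position among the stale indexes.
    last_indexed_etag = min(ix.last_indexed_etag for ix in stale)
    docs_to_index = sorted(
        (doc for doc in documents if doc["etag"] > last_indexed_etag),
        key=lambda doc: doc["etag"],
    )[:batch_size]
    batch_etag = docs_to_index[-1]["etag"]

    filtered_docs = [doc for doc in docs_to_index if read_filter(doc)]

    # Phase 1: work out which docs each stale index cares about.
    indexing_work = []
    for ix in stale:
        index_docs = [
            doc for doc in filtered_docs
            if doc["etag"] > ix.last_indexed_etag and ix.matches(doc)
        ]
        if not index_docs:
            # Nothing to index here: just record that the batch was seen.
            ix.last_indexed_etag = max(ix.last_indexed_etag, batch_etag)
        else:
            indexing_work.append((ix, index_docs))

    # Phase 2: do the actual indexing for everything collected above.
    for ix, index_docs in indexing_work:
        ix.entries.extend(doc["id"] for doc in index_docs)
        ix.last_indexed_etag = batch_etag
    return True


documents = [
    {"etag": 1, "id": "users/1", "type": "User"},
    {"etag": 2, "id": "orders/1", "type": "Order"},
    {"etag": 3, "id": "users/2", "type": "User"},
]
users = Index("Users", lambda doc: doc["type"] == "User")
orders = Index("Orders", lambda doc: doc["type"] == "Order")

# Keep running batches until nothing is stale.
while run_indexing_batch(documents, [users, orders], batch_size=2):
    pass
```

With a batch size of 2, the second batch contains only a user document, so the `Orders` index takes the "nothing matched" branch and is simply moved forward without doing any indexing work.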

And now let me show you the areas in which we did some perf work:

while database_is_running:
  stale = find_stale_indexes()
  lastIndexedEtag = find_last_indexed_etag(stale)
  docs_to_index = get_documents_since(lastIndexedEtag, batch_size)

  filtered_docs = execute_read_filters(docs_to_index)

  indexing_work = []

  for index in stale:
    index_docs = select_matching_docs(index, filtered_docs)

    if index_docs.empty:
      set_indexed(index, lastIndexedEtag)
    else:
      indexing_work.add(index, index_docs)

  for work in indexing_work:
    work.index(work.index_docs)

All of this gives us a major boost in system performance. I’ll discuss each part of that work in detail, don’t worry ;-)

Comments

Nadav
04/18/2012 03:14 PM

About the last part: you could have just run index(index_docs) right away instead of adding it to a list and enumerating that list only at the end.

Ayende Rahien
04/18/2012 05:49 PM

Nadav, doing so would force me to wait for the filtering of each index. Instead, I can do the filtering first, then execute all of the indexes at once.
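
A rough Python sketch of the two-phase shape described in this reply: collect the work for every index first, then run all of it together. Reading "at once" as parallel execution is an assumption, as are the thread pool, the helper names `select_matching_docs` and `run_index`, and the sample data; none of this is taken from RavenDB's code.

```python
from concurrent.futures import ThreadPoolExecutor


def select_matching_docs(index_name, docs):
    # Hypothetical matcher: a doc belongs to the index named after its type.
    return [doc for doc in docs if doc["type"] == index_name]


def run_index(index_name, docs):
    # Stand-in for the real indexing work; returns the ids it processed.
    return index_name, [doc["id"] for doc in docs]


docs = [
    {"id": "users/1", "type": "User"},
    {"id": "orders/1", "type": "Order"},
    {"id": "users/2", "type": "User"},
]
stale = ["User", "Order", "Invoice"]

# Phase 1: do all of the filtering up front, one index after another.
indexing_work = []
for name in stale:
    matched = select_matching_docs(name, docs)
    if matched:
        indexing_work.append((name, matched))

# Phase 2: with the filtering out of the way, run every index together.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_index, name, matched) for name, matched in indexing_work]
    results = dict(future.result() for future in futures)
```

Calling `run_index` inside the first loop would instead interleave filtering and indexing, so each index would have to wait for the filtering and indexing of every index before it.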

Andrew Harry
04/18/2012 11:15 PM

Ayende, given the scarcity of good blog posts on RavenDB - could you release more of your future ones?

I'm still learning, but I'm loving what I'm seeing in RavenDB. Could you write more about 'best practises' and common pitfalls?

I am making my pitch to switch out our custom SQL & Azure tables combination in a CRM we are developing for RavenDB. So far I have shed nearly 6000+ lines of code and the conversion has taken less than a week.

Phillip Haydon
04/19/2012 03:26 AM

@Andrew - there are plenty of good blog posts for RavenDB; some people have found my few really helpful.

You can also join http://jabbr.net and chat to some of the guys in the #ravendb room. Lots of people come in seeking help for different things and we can help most of the time.

The Google Group is awesome too if you want quick answers from Ayende or his team.

Ayende Rahien
04/19/2012 08:11 AM

Andrew, there are actually a LOT of really good blog posts about RavenDB. We also have a lot of documentation and screencasts, and both TekPub and PluralSight have courses about it. Most of the future posts here are actually about how to implement and optimize RavenDB, not how to work with it.

Tim
04/20/2012 07:32 AM

Ayende, your pseudocode looks remarkably like Python. Any plans to switch languages? ;-)

Ayende Rahien
04/21/2012 09:16 AM

Tim, this is actually Boo. And I have been doing that for 7 years or so.
