Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.


RavenDB & FreeDB: An optimization story


So, as I noted in a previous post, we loaded RavenDB with all of the music CDs in existence (or nearly so). A total of 3.1 million disks and 43 million tracks. And we had some performance problems. But we got over them, and I am proud to give you the results:

Task | Old | New
--- | --- | ---
Importing data | Couple of hours | 42 minutes
Raven/DocumentsByEntityName | An hour and a half | 23.5 minutes
Simple index over disks | Two hours and twenty minutes | 24.1 minutes
Full text index over disks and tracks | More than seven hours | 37.5 minutes

Tests were run on the same machine, and the database HD was a single 300 GB 7200 RPM drive.
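To put the table in perspective, here is a small sketch that turns the old and new timings into speedup factors. The minute values are my reading of the table: "couple of hours" is too vague to convert, so the import row is left out, and "more than seven hours" is treated as a lower bound of 420 minutes.

```python
# Durations in minutes, read off the results table above.
# The import row is omitted because "couple of hours" is not a precise figure.
# 420 minutes is a lower bound for "more than seven hours".
results = {
    "Raven/DocumentsByEntityName": (90.0, 23.5),
    "Simple index over disks": (140.0, 24.1),
    "Full text index over disks and tracks": (420.0, 37.5),
}


def speedup(old_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_minutes / new_minutes


for name, (old, new) in results.items():
    print(f"{name}: {speedup(old, new):.1f}x faster")
```

That works out to roughly 3.8x, 5.8x and at least 11.2x for the three indexes.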

I then decided to take this one step further, and check what would happen when we already had the indexes. So we created three indexes: Raven/DocumentsByEntityName, one for doing simple querying over disks, and one for full text searches on top of all disks and tracks.

With 3.1 million documents streaming in, and three indexes (at least one of them decidedly non trivial), the import process took an hour and five minutes. Even more impressive, the indexing process was fast enough to keep up with the incoming data, so we only had about 1.5 seconds of latency between inserting a document and having it indexed. (Note that we usually see much lower indexing latencies, usually in the low tens of milliseconds, when we aren’t being bombarded with documents.)
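As a rough back-of-the-envelope check, the figures above can be turned into a write rate. The document count and timings come from the post. The backlog estimate is an assumption of mine: it takes the arrival rate to be steady and simply multiplies it by the 1.5 second latency.

```python
# Figures taken from the post; the steady-rate assumption is illustrative only.
DOCUMENTS = 3_100_000
IMPORT_MINUTES_WITH_INDEXES = 65   # "an hour and five minutes"
IMPORT_MINUTES_IMPORT_ONLY = 42    # import-only time from the results table
INDEXING_LATENCY_SECONDS = 1.5


def docs_per_second(documents: int, minutes: float) -> float:
    """Average number of documents written per second over the whole run."""
    return documents / (minutes * 60)


def estimated_backlog(rate_per_second: float, latency_seconds: float) -> float:
    """Rough count of documents written but not yet indexed (rate * latency)."""
    return rate_per_second * latency_seconds


with_indexes = docs_per_second(DOCUMENTS, IMPORT_MINUTES_WITH_INDEXES)
import_only = docs_per_second(DOCUMENTS, IMPORT_MINUTES_IMPORT_ONLY)
backlog = estimated_backlog(with_indexes, INDEXING_LATENCY_SECONDS)

print(f"With three indexes: {with_indexes:.0f} docs/sec")
print(f"Import only:        {import_only:.0f} docs/sec")
print(f"Approx. documents awaiting indexing: {backlog:.0f}")
```

Under those assumptions, that is about 795 documents per second with the three indexes in place, about 1,230 per second for the import-only run, and on the order of 1,200 documents waiting to be indexed at any given moment.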

Next up, and something that we did not optimize, was figuring out how costly it would be to query this. I decided to go for the big guns, and tested querying the full text search index.

Testing “Query:Adele” returned a result (from a cold booted database) in less than 0.8 seconds. But remember, this is after a cold boot. So let us see what happens when we issue a few other queries:

  • Query:Pearl – 0.65 seconds
  • Query:Abba – 0.67 seconds
  • Query:Queen – 0.56 seconds
  • Query:Smith – 0.55 seconds
  • Query:James – 0.77 seconds

Note that I am querying radically different values, so I force different parts of the index to load.

Querying for “Query:Adele” again? 32 milliseconds.

Let us see a few more:

  • Query:Adams – 0.55 seconds
  • Query:Abrahams – 0.6 seconds
  • Query:Queen – 85 milliseconds
  • Query:James – 0.1 seconds

Now here are a few things that you might want to consider:

  1. We have done no warm up to the database, just started it up from cold boot and started querying.
  2. I actually think that we can do better than this, and this is likely to be the next place we are going to focus our optimization efforts.
  3. We are doing a query here over 3.1 million documents, using full text search.
  4. There is no caching involved in the speed increases.
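If you want to repeat this kind of measurement yourself, a minimal timing harness might look like the sketch below. `run_query` is a hypothetical callable that you would wire up to your own database client; it is not part of the RavenDB API, and the stand-in used here does no real work. The query order mirrors the one used in this post.

```python
import time
from typing import Callable, Dict, List


def time_queries(run_query: Callable[[str], object],
                 terms: List[str]) -> Dict[str, List[float]]:
    """Run each term in order and record the elapsed seconds of every run.

    Terms may repeat. timings[term][0] is the first run of that term and
    any later entries are repeat runs, so the two can be compared directly.
    """
    timings: Dict[str, List[float]] = {}
    for term in terms:
        start = time.perf_counter()
        run_query(term)
        elapsed = time.perf_counter() - start
        timings.setdefault(term, []).append(elapsed)
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in for a real query call.
    def fake_query(term: str) -> list:
        return [term]

    order = ["Adele", "Pearl", "Abba", "Queen", "Smith", "James",
             "Adele", "Adams", "Abrahams", "Queen", "James"]
    for term, runs in time_queries(fake_query, order).items():
        print(term, [f"{r * 1000:.3f} ms" for r in runs])
```

Keeping the first run separate from the repeats is what makes the difference between the 0.8 second and the 32 millisecond results for the same term visible.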

More goodies are coming in.

More posts in "RavenDB & FreeDB" series:

  1. (17 Apr 2012) An optimization story
  2. (16 Apr 2012) An optimization opportunity

Comments

dotnetchris

Major, major spike for Raven. I've always loved using it, but importing big data was always a sore spot. It's great to see this vast an improvement.

Rafal

This is, afaik, the first post showing some real performance numbers. And the optimized results look quite good: 1000 inserts/updates per second is fast enough for most applications. I have not yet decided to use Raven for any real application, but it becomes harder to resist with every release.

Ayende Rahien

Rafal, Those numbers are actually not that important, because we are not testing / optimizing insert performance. What we are doing is testing indexing performance, something quite different.

In most systems, a RavenDB server will usually have at most one write per client request, so a few hundred or a few thousand writes per second is pretty awesome in terms of the number of users that you can support on a single box.

Rafal

Ayende, the indexing speed was my main concern - I was afraid that in some situations the indexing process may lag too much behind data modification and Raven would be serving strange results to users. Therefore I'm happy to see that this is not a bottleneck. A database integrated with Lucene index is a great combination and one of the biggest advantages of Ravendb imho. Updating Lucene index in realtime is tricky and difficult to do when data modifications are very frequent, so a tool that can do that automatically and almost online is very valuable.

Lev Gorodinski

Ayende, are there plans to optimize faceted search in RavenDB? Or is this something that you think should be handled by a framework dedicated to search, such as Solr or ElasticSearch? I've made a simple optimization to the FacetedQueryRunner: http://pastebin.com/SRNtSaKN however I couldn't optimize much further without digging deeper into the internals.

Micha Schopman

Call me stupid, but what exactly is optimized now? Was the previous post written on an older version of RavenDB and this post on a current stable release after optimizations had been made?

What did you change to speed up the import of the raw data for example? :)

Felipe Fujiy Pessoto

Ayende, since you're using Lucene to store indexes, do you actually change the Lucene source code to make improvements?

Matt

@Lev Gorodinski

Can you explain a bit more what that code sample is doing? What's the optimisation?

Matt

@Lev Gorodinski

Also see this thread https://groups.google.com/forum/#!msg/ravendb/GJImaCrrKkk/VVvXLXQ1RFcJ, for a discussion on other ways of improving the perf. But note that for most scenarios, the current impl is pretty quick!

Ayende Rahien

Lev, We have recently significantly optimized faceted searches, so you should probably try again. What optimization did you do in the referenced code?

Ayende Rahien

Micha, These posts were written two months ago; they refer to changes that went into the stable build 701. I'll discuss the exact changes in the next few posts.

Ayende Rahien

Felipe, No, all of those optimizations had to do with managing how we interact with Lucene, not Lucene itself.

Lev Gorodinski

The optimization is explained here: http://stackoverflow.com/questions/7640227/query-product-catalog-ravendb-store-for-spec-aggregate-over-arbitrary-collection It is minor overall.
