Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

What is up with RavenDB 2.0? Performance…


Well, one thing that we put a lot of focus on was performance. In order to test that, I had a dataset of 4.66 million documents (IMDB data set, if you care) as well as two indexes defined.

The results for RavenDB 2.0 (drum roll):

Loading 4.66 million records in 44 minutes. Average rate of a little over half a millisecond per document.

But wait, what about the indexes? Well, RavenDB indexes documents as they come in, so as we were inserting the documents, they were indexed along the way. That meant that 11 seconds after we were done putting 4.66 million documents into RavenDB, we were done indexing (across all indexes).

Pretty nice perf, even if I say so myself.
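
As a rough sanity check on the load figures above, here is a minimal sketch that derives the average rate from the two numbers quoted in the post (4.66 million documents, 44 minutes). The helper name `throughput` is made up for illustration and is not part of RavenDB; the calculation covers the load time only, not the 11 seconds of trailing indexing.

```python
# Figures quoted in the post; everything else is derived from them.
DOCUMENTS = 4_660_000
LOAD_MINUTES = 44


def throughput(documents: int, minutes: float) -> tuple[float, float]:
    """Return (documents per second, milliseconds per document)."""
    seconds = minutes * 60
    return documents / seconds, seconds * 1000 / documents


docs_per_second, ms_per_document = throughput(DOCUMENTS, LOAD_MINUTES)
print(f"{docs_per_second:.0f} documents/second")  # prints "1765 documents/second"
print(f"{ms_per_document:.2f} ms per document")   # prints "0.57 ms per document"
```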


Pop Catalin

"Loading 4.66 millions records in 44 minutes"

This means roughly 1,765 documents / second.

Is the I/O channel saturated from this? Disk write speed maxed out?

I don't know about the complexity of those documents, however an ETL process can reach over 50k "rows" per second on my modest machine using bulk load.

Therefore, I think it would be interesting to see some benchmarks for small documents (1 property), medium (10-100 properties), and large (1000+ properties), along with the I/O characteristics of RavenDB during such operations.

Ayende Rahien

Pop, this is meant to show indexing performance more than anything else. Bulk load is doing something quite different.


What options do you have for doing an actual bulk load? Say if we wanted to load 250m moderately complex documents - is there some kind of bulk load option which can do batch indexing after?

Jeús López

May I ask where you downloaded the IMDB dataset from?

Remco Ros

@Jeús http://www.imdb.com/interfaces


Would be great if you could direct us to your ETL process. I noticed the old ETL project in the raven source is no longer there?

Will Hughes

Is it possible to get a comparison with Raven 1.x's performance using the same dataset and hardware?

Ayende Rahien

Jamie, we will have the bulk load work done after the release. It is a bit involved, as you might imagine.

Ayende Rahien

The "ETL code" is just the smuggler.

Daniel Lang

I really don't understand why people care so much about 'bulk load' performance. I mean really, what's the difference between writing 1,000 or 5,000 documents per second WITHOUT indexing?

The whole point of Raven is that it has indexes for you to do calculations or queries. If you don't need that, you have a key/value store, for which you don't need Raven in the first place.

Perf metrics without indexing are useless.


Daniel: Of course it matters; Jamie clearly stated why. If you need to store large amounts of data quickly, and only need indexes later, bulk loading makes sense.

Ayende Rahien

AndersM, not really. Just loading the data and waiting for indexing, and loading the data with indexing, would result in about the same time frame.


OK, I did not know how Raven would handle this; I answered based on Daniel's numbers :)


"Not really, just loading the data and waiting for indexing, and loading the data with indexing would result in about the same time frame"

Maybe in the RavenDB world... As Will Hughes suggested, it would be more interesting to see the difference from the previous release. Right now it's just some random numbers.

Daniel Lang

AndersM: My point is - the only metric I care about is the time it takes to do both, writing and indexing. No, I don't mean bulk import of data-sets because this is something you don't do frequently and when you do it, it's generally not time sensitive (like migrate from another database).

Catalin Pop

@Daniel Bulk loading should include indexing. In my earlier example, indexing during bulk load is enabled.

Sean Kearon

Very nice! What do the indexes look like?

Alexei K

So, how did the older version do on this? What improvement (if any) does 2.0 bring?


Do you have the full source code for this perf test?
