Ayende @ Rahien

What is up with RavenDB 2.0? Performance…

Well, one thing that we put a lot of focus on was performance. In order to test that, I had a dataset of 4.66 million documents (IMDB data set, if you care) as well as two indexes defined.

The results for RavenDB 2.0 (drum roll):

Loading 4.66 million records in 44 minutes. An average of roughly half a millisecond per document.

But wait, what about the indexes? Well, RavenDB indexes documents as they come in, so the documents were indexed along the way as we inserted them. That meant that 11 seconds after we were done putting 4.66 million documents into RavenDB, we were done indexing (across all indexes).

Pretty nice perf, even if I say so myself.

Posted By: Ayende Rahien

Comments

Pop Catalin
11/22/2012 04:23 PM by Pop Catalin

"Loading 4.66 millions records in 44 minutes"

This means about 1,765 documents per second.
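
The rate can be rechecked from the two figures in the post. A minimal sketch, assuming exactly 4.66 million documents and exactly 44 minutes (both are rounded in the post, so the results are approximate; the variable names are illustrative):

```python
# Figures quoted in the post; both are rounded, so treat results as approximate.
DOCUMENTS = 4_660_000
MINUTES = 44

elapsed_seconds = MINUTES * 60

# Throughput and its inverse, the average time spent per document.
docs_per_second = DOCUMENTS / elapsed_seconds
ms_per_document = elapsed_seconds * 1000 / DOCUMENTS

print(f"{docs_per_second:.0f} documents/second")
print(f"{ms_per_document:.2f} ms per document")
```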

Is the I/O channel saturated from this? Disk write speed maxed out?

I don't know about the complexity of those documents, however an ETL process can reach over 50k "rows" per second on my modest machine using bulk load.

Therefore I think it would be interesting to see some benchmarks for small documents (1 property), medium documents (10-100 properties), and large documents (1000+ properties), along with the I/O characteristics of RavenDB during such operations.
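
One way to prepare such a benchmark is to generate synthetic documents in the three size classes first. The sketch below is only an illustration in Python: `make_document` and the chosen property counts are hypothetical, it does not call any RavenDB API, and it only reports the JSON payload size per class, not I/O behaviour.

```python
import json
import random
import string


def make_document(property_count, value_length=16, seed=0):
    """Build a flat synthetic document with `property_count` string properties."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same document
    return {
        f"prop_{i}": "".join(rng.choices(string.ascii_letters, k=value_length))
        for i in range(property_count)
    }


# Hypothetical size classes taken from the comment: 1, 10-100, and 1000+ properties.
SIZE_CLASSES = {"small": 1, "medium": 50, "large": 1000}

# Serialized size of one document per class, in bytes of UTF-8 encoded JSON.
payload_bytes = {
    name: len(json.dumps(make_document(count)).encode("utf-8"))
    for name, count in SIZE_CLASSES.items()
}

for name, size in payload_bytes.items():
    print(f"{name}: {SIZE_CLASSES[name]} properties, {size} bytes of JSON")
```

Feeding each class through the same load path and timing it separately would show whether throughput is bound by document count or by payload size.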

Ayende Rahien
11/22/2012 04:32 PM by Ayende Rahien

Pop, this is meant to show indexing performance more than anything else. Bulk load is doing something quite different.

Jamie
11/22/2012 04:42 PM by Jamie

What options do you have for doing an actual bulk load? Say if we wanted to load 250m moderately complex documents - is there some kind of bulk load option which can do batch indexing after?

Jesús López
11/22/2012 04:43 PM by Jesús López

May I ask where you downloaded the IMDB dataset from?

Remco Ros
11/22/2012 07:23 PM by Remco Ros

@Jesús http://www.imdb.com/interfaces

Nabil
11/22/2012 08:51 PM by Nabil

Would be great if you could direct us to your ETL process. I noticed the old ETL project in the raven source is no longer there?

Will Hughes
11/22/2012 10:13 PM by Will Hughes

Is it possible to get a comparison with Raven 1.x's performance using the same dataset and hardware?

Ayende Rahien
11/22/2012 11:41 PM by Ayende Rahien

Jamie, we will have bulk load work done after the release. It is a bit involved, as you might imagine.

Ayende Rahien
11/22/2012 11:41 PM by Ayende Rahien

The "ETL code" is just the smuggler.

Daniel Lang
11/23/2012 08:29 AM by Daniel Lang

I really don't understand why people care so much about 'bulk load' performance. I mean really, what's the difference between writing 1,000 or 5,000 documents per second WITHOUT indexing?

The whole point of Raven is that it has indexes for you to do calculations or queries. If you don't need that, you have a key/value store, for which you don't need Raven in the first place.

Perf metrics without indexing are useless.

AndersM
11/23/2012 09:10 AM by AndersM

Daniel: Of course it matters; Jamie clearly stated why. If you need to store large amounts of data quickly, and only need indexes later, bulk loading makes sense.

Ayende Rahien
11/23/2012 09:12 AM by Ayende Rahien

AndersM, not really. Loading the data and then waiting for indexing would take about the same time as loading the data with indexing.

AndersM
11/23/2012 09:45 AM by AndersM

OK, I did not know how Raven would handle this, but answered based on Daniel's numbers :)

Guillaume
11/23/2012 09:51 AM by Guillaume

"AndersM, Not really, just loading the data and waiting for indexing, and loading the data with indexing would result in about the same time frame"

Maybe in the RavenDB world... As Will Hughes suggested, it would be more interesting to see the difference from the previous release. Right now it's just some random numbers.

Daniel Lang
11/23/2012 10:43 AM by Daniel Lang

AndersM: My point is that the only metric I care about is the time it takes to do both writing and indexing. No, I don't mean bulk import of data sets, because that is something you don't do frequently, and when you do it, it's generally not time sensitive (like migrating from another database).

Catalin Pop
11/23/2012 11:36 AM by Catalin Pop

@Daniel Bulk loading should include indexing. In my earlier example, indexing during bulk load is enabled.

Sean Kearon
11/23/2012 04:24 PM by Sean Kearon

Very nice! What do the indexes look like?

Alexei K
11/26/2012 05:18 PM by Alexei K

So, how did the older version do on this? What improvement (if any) does 2.0 bring?

Alexey
12/06/2012 01:07 PM by Alexey

Do you have the full source code for this perf test?

Comments have been closed on this topic.