RavenDB 4.0 Indexing Benchmark

time to read 4 min | 645 words

I run into this post, which was pretty interesting to read. It compares a bunch of ways to index inside CouchDB, so I decided to see how RavenDB 4.0 compares.

I wrote the following code to generate the data inside RavenDB:

In the CouchDB post, this took… a while. With RavenDB, this took 7.7 seconds and the database size at the end was 48.06 MB. This is my laptop, 6th gen i7 with 16 GB RAM and SSD drive. The CouchDB tests were run over a similar machine, but with 8 GB RAM, but RavenDB didn’t get to use much memory at all throughout the benchmark.

It is currently sitting on around 205MB working set, and allocated a total of 70 MB managed memory and 19 MB of native memory.

I then created the following index:

image

This does pretty much the same as the indexes created in CouchDB. Note that in this case, this is running in process, so the nearest equivalent would be to Erlang native views.

This took 1.281 seconds to index 100,000 documents, giving us a total of 78,064 indexed documents per second.

The working during the indexing process grew to 312 MB.

But those were small documents, what about larger ones? In the linked post, there was another test done, this time with each index also having a 300KB text property. Note that this is a single property.

The math goes like this: 300KB * 100,000 documents = 30,000,000 KB = 29,296 MB = 28.6 GB.

Inserting those took 1 minute and 40 seconds. However, the working set for RavenDB peaked at around 500 MB, and the total size of the database at the end was  512.06 MB. It took me a while to figure out what was going on.

One of the features that we have with the blittable format is the fact that we can compress large string values. The large string property was basically lorem ipsum repeated 700 times, so it compressed real well. And the beauty of this is that unless we actually need to touch that property and do something to it, it can remain compressed.

Indexing that amount of data took 1.45 seconds, for 68,965 indexes documents per second.

My next test was to avoid creating a single large property and go with a lot of smaller properties.

This generates 100,000 documents in the 90KB – 900 KB range. Inserting this amount of data took a while longer. This is mostly because of the amount of data that we write in this scenario which is in the many GB range. In fact, this is what my machine looked like while the insert was going on:

image

Overall, the insert process took 8:01 minutes to complete. And the final DB size was around 12 GB.

Indexing that amount of information took a lot longer, and consume a few resources:

image

Time to index? Well, it was about twice as much as before, with RavenDB taking a total of 2.58 seconds to index 100,000 documents totaling 12 GB in size.

That comes to 38,759 indexes documents per second.

A large part of that is related to the way we are storing information in blittable format. We don’t need to do any processing to it, so we are drastically faster.

Comparing to the CouchDB results in the linked post, we have:

  RavenDB CouchDB
Small documents 78,064 19,821
Large documents 38,759   9,363

In other words, we are talking about being 4 times faster than the faster CouchDB option.