RavenDB 4.0 Indexing Benchmark
I ran into this post, which was pretty interesting to read. It compares a bunch of ways to index inside CouchDB, so I decided to see how RavenDB 4.0 compares.
I wrote the following code to generate the data inside RavenDB:
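A minimal sketch of such a generator, assuming a User class with Score, Name and CreatedAt fields and RavenDB's bulk insert API (the names and value ranges here are illustrative):

```csharp
using System;
using Raven.Client.Documents;

public class User
{
    public int Score { get; set; }
    public string Name { get; set; }
    public DateTime CreatedAt { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using (var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },
            Database = "Benchmark"
        }.Initialize())
        using (var bulkInsert = store.BulkInsert())
        {
            var random = new Random();
            for (int i = 0; i < 100_000; i++)
            {
                // Each document is small: a score, a name and a creation date.
                bulkInsert.Store(new User
                {
                    Score = random.Next(0, 100),
                    Name = "user/" + i,
                    CreatedAt = DateTime.UtcNow.AddDays(-random.Next(0, 3 * 365))
                });
            }
        }
    }
}
```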
In the CouchDB post, this took… a while. With RavenDB, it took 7.7 seconds, and the database size at the end was 48.06 MB. This is on my laptop, a 6th gen i7 with 16 GB RAM and an SSD drive. The CouchDB tests were run on a similar machine, albeit with 8 GB RAM, but RavenDB didn't get to use much memory at all throughout the benchmark.
It is currently sitting at around a 205 MB working set, having allocated a total of 70 MB of managed memory and 19 MB of native memory.
I then created the following index:
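A sketch of such a map/reduce index, summing the scores per year and month over the User class from the sketch above (the class and field names are illustrative, not the exact definition used in the post):

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Sums the users' scores per year/month, roughly what the CouchDB
// views in the linked post compute.
public class Users_ScoreByMonth : AbstractIndexCreationTask<User, Users_ScoreByMonth.Result>
{
    public class Result
    {
        public int Year { get; set; }
        public int Month { get; set; }
        public int TotalScore { get; set; }
    }

    public Users_ScoreByMonth()
    {
        Map = users => from user in users
                       select new
                       {
                           Year = user.CreatedAt.Year,
                           Month = user.CreatedAt.Month,
                           TotalScore = user.Score
                       };

        Reduce = results => from result in results
                            group result by new { result.Year, result.Month } into g
                            select new
                            {
                                g.Key.Year,
                                g.Key.Month,
                                TotalScore = g.Sum(x => x.TotalScore)
                            };
    }
}
```

Deploying it is a matter of calling new Users_ScoreByMonth().Execute(store), and the per-month totals can then be queried with session.Query&lt;Users_ScoreByMonth.Result, Users_ScoreByMonth&gt;().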
This does pretty much the same as the indexes created in CouchDB. Note that in RavenDB's case the index runs in process, so the nearest equivalent would be the Erlang native views.
This took 1.281 seconds to index 100,000 documents, giving us a total of 78,064 indexed documents per second.
The working set during the indexing process grew to 312 MB.
But those were small documents; what about larger ones? In the linked post, there was another test done, this time with each document also having a 300 KB text property. Note that this is a single property.
The math goes like this: 300KB * 100,000 documents = 30,000,000 KB = 29,296 MB = 28.6 GB.
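A sketch of that setup under the same assumptions as the generator above, with a hypothetical Text property holding the large string (the lorem ipsum repetition count matches the description further down):

```csharp
using System;
using System.Linq;
using Raven.Client.Documents;

// Hypothetical document shape for this test: the same fields as before,
// plus one large text property.
public class LargeUser
{
    public int Score { get; set; }
    public string Name { get; set; }
    public DateTime CreatedAt { get; set; }
    public string Text { get; set; }
}

public static class LargeDocGenerator
{
    // The standard lorem ipsum paragraph is roughly 445 characters;
    // repeated 700 times it comes to about 300 KB per document.
    private const string LoremIpsum =
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod " +
        "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, " +
        "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo " +
        "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse " +
        "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat " +
        "non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. ";

    public static void Run(IDocumentStore store)
    {
        string largeText = string.Concat(Enumerable.Repeat(LoremIpsum, 700));

        var random = new Random();
        using (var bulkInsert = store.BulkInsert())
        {
            for (int i = 0; i < 100_000; i++)
            {
                bulkInsert.Store(new LargeUser
                {
                    Score = random.Next(0, 100),
                    Name = "user/" + i,
                    CreatedAt = DateTime.UtcNow.AddDays(-random.Next(0, 3 * 365)),
                    Text = largeText // the single ~300 KB string property
                });
            }
        }
    }
}
```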
Inserting those took 1 minute and 40 seconds. However, the working set for RavenDB peaked at around 500 MB, and the total size of the database at the end was 512.06 MB. It took me a while to figure out what was going on.
One of the features of the blittable format is that we can compress large string values. The large string property was basically lorem ipsum repeated 700 times, so it compressed really well. And the beauty of this is that unless we actually need to touch that property and do something with it, it can remain compressed.
Indexing that amount of data took 1.45 seconds, for 68,965 indexed documents per second.
My next test was to avoid creating a single large property and go with a lot of smaller properties.
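A sketch under the same assumptions, giving each document a few hundred modest string properties instead of one big one (the property names, counts and sizes are illustrative guesses aimed at the range mentioned below):

```csharp
using System;
using System.Collections.Generic;
using Raven.Client.Documents;

// Hypothetical document shape: the same basic fields plus a bag of
// smaller string properties.
public class WideUser
{
    public int Score { get; set; }
    public string Name { get; set; }
    public DateTime CreatedAt { get; set; }
    public Dictionary<string, string> Properties { get; set; }
}

public static class WideDocGenerator
{
    public static void Run(IDocumentStore store)
    {
        var random = new Random();
        using (var bulkInsert = store.BulkInsert())
        {
            for (int i = 0; i < 100_000; i++)
            {
                // 100 to 1,000 properties of ~900 bytes each puts the document
                // somewhere in the 90 KB - 900 KB range.
                var properties = new Dictionary<string, string>();
                int count = random.Next(100, 1_000);
                for (int j = 0; j < count; j++)
                    properties["prop" + j] = RandomText(random, 900);

                bulkInsert.Store(new WideUser
                {
                    Score = random.Next(0, 100),
                    Name = "user/" + i,
                    CreatedAt = DateTime.UtcNow.AddDays(-random.Next(0, 3 * 365)),
                    Properties = properties
                });
            }
        }
    }

    private static string RandomText(Random random, int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = (char)('a' + random.Next(26));
        return new string(chars);
    }
}
```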
This generates 100,000 documents in the 90 KB – 900 KB range. Inserting this amount of data took a while longer, mostly because of the sheer amount of data we write in this scenario, which is in the many-GB range. In fact, this is what my machine looked like while the insert was going on:
Overall, the insert process took 8:01 minutes to complete. And the final DB size was around 12 GB.
Indexing that amount of information took a lot longer and consumed more resources:
Time to index? Well, it was about twice as much as before, with RavenDB taking a total of 2.58 seconds to index 100,000 documents totaling 12 GB in size.
That comes to 38,759 indexed documents per second.
A large part of that is related to the way we store information in the blittable format. We don't need to do any processing on it, so we are drastically faster.
Comparing to the CouchDB results in the linked post, we have:
| Indexed documents per second | RavenDB | CouchDB |
| Small documents | 78,064 | 19,821 |
| Large documents | 38,759 | 9,363 |
In other words, we are talking about being roughly 4 times faster than the fastest CouchDB option.
Comments
Interesting, although straight sql blows all these document DBs away ...
using the sqlite shell binary from http://sqlite.org/download.html
Run sqlite shell
Set timer on:
sqlite> .timer on
Create table:
sqlite> create table user(score,name,createdat);
Insert 100,000 rows into the table:
sqlite> BEGIN;
sqlite> WITH cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<100000) insert into user SELECT x as score,"user"||x as name, datetime() as createdat FROM cnt;
sqlite> COMMIT;
Execution time (~127ms):
Run Time: real 0.127 user 0.125000 sys 0.000000
Sum values by month:
sqlite> select strftime('%m',createdat) AS month, sum(score) from user group by 1;
Execution time (~81ms):
Run Time: real 0.081 user 0.093750 sys 0.000000
Mike, You are comparing apples and spaceships.
To start with, you are comparing an in-process db doing embedded work, likely with fsync disabled, to a remote database doing its work over the network. Next, your group by statement is a simple one over all values, whereas you need to actually show it per year/month.
And that said, since the month column isn't an aggregated value and isn't in the group by clause, I have no idea what sqlite puts there, but it can't be the right value.