After looking at this post detailing how to optimize data queries in EF Core, I obviously decided that I need to test how RavenDB handles the same load.
To make things fair, I tested this on my laptop, running on battery mode. The size of the data isn’t that much, only 100,000 books and half a million reviews, so I decided to increase that by an order of magnitude.
The actual queries we make from the application are pretty simple and static. We can sort by votes / publication date / price (ascending /descending) and we can filter by number of votes and the publication year.
This means that we don’t have an explosion of querying options, so that simplify the kind of work we are doing. To make things simple for myself, I kept the same model of books / authors and reviews as separate collections. This isn’t the best model for document database, but it allows us to compare apples to apples against the work the EF Core based solution and the RavenDB solution need to do.
A major cost in Jon’s solution is the need to aggregate the reviews for a book (so the average for the review can be computed). In the end, the only way to get the solution required was to just manually calculate the average reviews for each book and store the computation in the book. We’ll discuss this a bit more in a few minutes, for now, I want to turn our eyes toward the simplest possible query in this page, getting 100 books sorted by the book id.
Because we aren’t running on the same machine, it is hard to make direct parallels, but on Jon’s machine he got 80 ms for this kind of query on 100,000 books. When increasing the data to half a million books, the query time rose to 150ms. Running the same query gives us the results instantly (zero ms). Querying and sorting by the title, for example, give us the results in 19 ms for a page size of 100 books.
Now, let us look at the major complexity for this system, sorting and filtering by the number of votes in the system. This is hard because the reviews are stored separately from the books. With EF Core, there is the need to join between the tables, which is quite expensive and eventually led Jon to take upon himself the task of manually maintaining the values. With RavenDB, we can use a map/reduce index to handle this all for us. More specifically, we are going to use a multi map/reduce index.
Here is what the index definition looks like:
We map the results from both the Books and the BookReviews into the same shape, and then reduce them together into the final output, which contains the relevant aggregation.
Now, let us do some queries, shall we? Here is us querying over the entire dataset (an order of magnitude higher than the EF Core sample set), filtering by the published date and ordering by the computed votes average. In here, we get the first 100 items, and you can see that we got over 289,753 total results:
One very interesting feature of this query is that we are asking to include the book document for the results. This is handled after the query (so no need to do a join to the entire 289K+ results), and we are able to get everything we want in a very simple fashion.
Oh, and the total time? 17 ms. Compared to the 80ms result for EF with 1/10 of the data size. That is pretty nice (and yes, different machines, hard to compare, etc).
I’ll probably have another post on this topic, showing off some of the cool things that you can do with RavenDB and queries.
More posts in "re" series:
- (17 Nov 2022) RavenDB in a Distributed Cloud Environment
- (25 Jul 2022) Build your own database at Cloud Lunch & Learn
- (15 Jul 2022) Non relational data modeling & Database engine internals
- (11 Apr 2022) Clean Architecture with RavenDB
- (14 Mar 2022) Database Security in a Hostile World
- (02 Mar 2022) RavenDB–a really boring database