After looking at this post detailing how to optimize data queries in EF Core, I obviously decided that I need to test how RavenDB handles the same load.
To make things fair, I tested this on my laptop, running on battery mode. The size of the data isn’t that much, only 100,000 books and half a million reviews, so I decided to increase that by an order of magnitude.
The actual queries we make from the application are pretty simple and static. We can sort by votes / publication date / price (ascending /descending) and we can filter by number of votes and the publication year.
This means that we don’t have an explosion of querying options, so that simplify the kind of work we are doing. To make things simple for myself, I kept the same model of books / authors and reviews as separate collections. This isn’t the best model for document database, but it allows us to compare apples to apples against the work the EF Core based solution and the RavenDB solution need to do.
A major cost in Jon’s solution is the need to aggregate the reviews for a book (so the average for the review can be computed). In the end, the only way to get the solution required was to just manually calculate the average reviews for each book and store the computation in the book. We’ll discuss this a bit more in a few minutes, for now, I want to turn our eyes toward the simplest possible query in this page, getting 100 books sorted by the book id.
Because we aren’t running on the same machine, it is hard to make direct parallels, but on Jon’s machine he got 80 ms for this kind of query on 100,000 books. When increasing the data to half a million books, the query time rose to 150ms. Running the same query gives us the results instantly (zero ms). Querying and sorting by the title, for example, give us the results in 19 ms for a page size of 100 books.
Now, let us look at the major complexity for this system, sorting and filtering by the number of votes in the system. This is hard because the reviews are stored separately from the books. With EF Core, there is the need to join between the tables, which is quite expensive and eventually led Jon to take upon himself the task of manually maintaining the values. With RavenDB, we can use a map/reduce index to handle this all for us. More specifically, we are going to use a multi map/reduce index.
Here is what the index definition looks like:
We map the results from both the Books and the BookReviews into the same shape, and then reduce them together into the final output, which contains the relevant aggregation.
Now, let us do some queries, shall we? Here is us querying over the entire dataset (an order of magnitude higher than the EF Core sample set), filtering by the published date and ordering by the computed votes average. In here, we get the first 100 items, and you can see that we got over 289,753 total results:
One very interesting feature of this query is that we are asking to include the book document for the results. This is handled after the query (so no need to do a join to the entire 289K+ results), and we are able to get everything we want in a very simple fashion.
Oh, and the total time? 17 ms. Compared to the 80ms result for EF with 1/10 of the data size. That is pretty nice (and yes, different machines, hard to compare, etc).
I’ll probably have another post on this topic, showing off some of the cool things that you can do with RavenDB and queries.
More posts in "re" series:
- (09 Dec 2021) Why IndexedDB is slow and what to use instead
- (23 Jun 2021) The performance regression odyssey
- (27 Oct 2020) Investigating query performance issue in RavenDB
- (27 Dec 2019) Writing a very fast cache service with millions of entries
- (26 Dec 2019) Why databases use ordered indexes but programming uses hash tables
- (12 Nov 2019) Document-Level Optimistic Concurrency in MongoDB
- (25 Oct 2019) RavenDB. Two years of pain and joy
- (19 Aug 2019) The Order of the JSON, AKA–irresponsible assumptions and blind spots
- (10 Oct 2017) Entity Framework Core performance tuning–Part III
- (09 Oct 2017) Different I/O Access Methods for Linux
- (06 Oct 2017) Entity Framework Core performance tuning–Part II
- (04 Oct 2017) Entity Framework Core performance tuning–part I
- (26 Apr 2017) Writing a Time Series Database from Scratch
- (28 Jul 2016) Why Uber Engineering Switched from Postgres to MySQL
- (15 Jun 2016) Why you can't be a good .NET developer
- (12 Nov 2013) Why You Should Never Use MongoDB
- (21 Aug 2013) How memory mapped files, filesystems and cloud storage works
- (15 Apr 2012) Kiip’s MongoDB’s experience
- (18 Oct 2010) Diverse.NET
- (10 Apr 2010) NoSQL, meh
- (30 Sep 2009) Are you smart enough to do without TDD
- (17 Aug 2008) MVC Storefront Part 19
- (24 Mar 2008) How to create fully encapsulated Domain Models
- (21 Feb 2008) Versioning Issues With Abstract Base Classes and Interfaces
- (18 Aug 2007) Saving to Blob
- (27 Jul 2007) SSIS - 15 Faults Rebuttal
- (29 May 2007) The OR/M Smackdown
- (06 Mar 2007) IoC and Average Programmers
- (19 Sep 2005) DLinq Mapping