I run into a really interesting article about performance optimizations with EF Core and I thought that it deserve a second & third look. You might have noticed that I have been putting a lot of emphasis on performance and I had literally spent years on optimizing relational database access patterns, including building a profiler dedicated for inspecting what an OR/M is doing. I got the source and run the application.
I have a small bet with myself, saying that in any application using a relational database, I’ll be able to find a SELECT N+1 issue within one hour. So far, I think that my rate is 92% or so. In this case, I found the SELECT N+1 issue on the very first page load.
Matching this to the code, we have:
Which leads to:
And here we can already tell that there is a problem, we aren’t accessing the authors. This actually happens here:
So we have the view that is generating 10 out of 12 queries. And the more results per page you have, the more this costs.
But this is easily fixed once you know what you are looking at. Let us look at something else, the actual root query, it looks like this:
Yes, I too needed a minute to recover from this. We have:
- One JOIN
- Two correlated sub queries
Jon was able to optimize his code by 660ms to 80ms, which is pretty awesome. But that is all by making modifications to the access pattern in the database.
Given what I do for a living, I’m more interested in what it does inside the database, and here is what the query plan tells us:
There are only a few tens of thousands of records and the query is basically a bunch of index seeks and nested loop joins. But note that the way the query is structured forces the database to evaluate all possible results, then filter just the top few. That means that you have to wait until the entire result set has been processed, and as the size of your data grows, so will the cost of this query.
I don’t think that there is much that can be done here, given the relational nature of the data access ( no worries, I’m intending to write another post in this series, you guess what I’m going to write there, right? ).
More posts in "re" series:
- (23 Jun 2021) The performance regression odyssey
- (27 Oct 2020) Investigating query performance issue in RavenDB
- (27 Dec 2019) Writing a very fast cache service with millions of entries
- (26 Dec 2019) Why databases use ordered indexes but programming uses hash tables
- (12 Nov 2019) Document-Level Optimistic Concurrency in MongoDB
- (25 Oct 2019) RavenDB. Two years of pain and joy
- (19 Aug 2019) The Order of the JSON, AKA–irresponsible assumptions and blind spots
- (10 Oct 2017) Entity Framework Core performance tuning–Part III
- (09 Oct 2017) Different I/O Access Methods for Linux
- (06 Oct 2017) Entity Framework Core performance tuning–Part II
- (04 Oct 2017) Entity Framework Core performance tuning–part I
- (26 Apr 2017) Writing a Time Series Database from Scratch
- (28 Jul 2016) Why Uber Engineering Switched from Postgres to MySQL
- (15 Jun 2016) Why you can't be a good .NET developer
- (12 Nov 2013) Why You Should Never Use MongoDB
- (21 Aug 2013) How memory mapped files, filesystems and cloud storage works
- (15 Apr 2012) Kiip’s MongoDB’s experience
- (18 Oct 2010) Diverse.NET
- (10 Apr 2010) NoSQL, meh
- (30 Sep 2009) Are you smart enough to do without TDD
- (17 Aug 2008) MVC Storefront Part 19
- (24 Mar 2008) How to create fully encapsulated Domain Models
- (21 Feb 2008) Versioning Issues With Abstract Base Classes and Interfaces
- (18 Aug 2007) Saving to Blob
- (27 Jul 2007) SSIS - 15 Faults Rebuttal
- (29 May 2007) The OR/M Smackdown
- (06 Mar 2007) IoC and Average Programmers
- (19 Sep 2005) DLinq Mapping