reEntity Framework Core performance tuning–part I
I run into a really interesting article about performance optimizations with EF Core and I thought that it deserve a second & third look. You might have noticed that I have been putting a lot of emphasis on performance and I had literally spent years on optimizing relational database access patterns, including building a profiler dedicated for inspecting what an OR/M is doing. I got the source and run the application.
I have a small bet with myself, saying that in any application using a relational database, I’ll be able to find a SELECT N+1 issue within one hour. So far, I think that my rate is 92% or so. In this case, I found the SELECT N+1 issue on the very first page load.
Matching this to the code, we have:
Which leads to:
And here we can already tell that there is a problem, we aren’t accessing the authors. This actually happens here:
So we have the view that is generating 10 out of 12 queries. And the more results per page you have, the more this costs.
But this is easily fixed once you know what you are looking at. Let us look at something else, the actual root query, it looks like this:
Yes, I too needed a minute to recover from this. We have:
- One JOIN
- Two correlated sub queries
Jon was able to optimize his code by 660ms to 80ms, which is pretty awesome. But that is all by making modifications to the access pattern in the database.
Given what I do for a living, I’m more interested in what it does inside the database, and here is what the query plan tells us:
There are only a few tens of thousands of records and the query is basically a bunch of index seeks and nested loop joins. But note that the way the query is structured forces the database to evaluate all possible results, then filter just the top few. That means that you have to wait until the entire result set has been processed, and as the size of your data grows, so will the cost of this query.
I don’t think that there is much that can be done here, given the relational nature of the data access ( no worries, I’m intending to write another post in this series, you guess what I’m going to write there, right? ).
More posts in "re" series:
- (19 Jun 2024) Building a Database Engine in C# & .NET
- (05 Mar 2024) Technology & Friends - Oren Eini on the Corax Search Engine
- (15 Jan 2024) S06E09 - From Code Generation to Revolutionary RavenDB
- (02 Jan 2024) .NET Rocks Data Sharding with Oren Eini
- (01 Jan 2024) .NET Core podcast on RavenDB, performance and .NET
- (28 Aug 2023) RavenDB and High Performance with Oren Eini
- (17 Feb 2023) RavenDB Usage Patterns
- (12 Dec 2022) Software architecture with Oren Eini
- (17 Nov 2022) RavenDB in a Distributed Cloud Environment
- (25 Jul 2022) Build your own database at Cloud Lunch & Learn
- (15 Jul 2022) Non relational data modeling & Database engine internals
- (11 Apr 2022) Clean Architecture with RavenDB
- (14 Mar 2022) Database Security in a Hostile World
- (02 Mar 2022) RavenDB–a really boring database
Comments
I skimmed the original article. Did it mention the database engine being used? That SQL looks pretty horrible, performance wise. Especially if it's hand written.
Jedak, This is against SQL Server, I assume, and the SQL was probably generated by EF Core.
What application is generating those screenshots? I want it :)
Paul, That is the Entity Framework Profiler, and the screen shots were taken using the Windows Snipping Tool
why do you say that the query has to evaluate all possible results to just take top few? The order is on publish date , filter on soft deleted, so if you have an index on these there should be no need to check all the records.
Anyway, over the last several years i'm gradually transitioning from 'everything in the application/use ORM and forget about SQL' back to 'just do it in SQL and nothing will beat it'. Dont know about your experiences, but after repeatedly having to meticulously build and adjust linq queries to generate the sql i'd like to see (and never mention the fact that its too easy to find Linq constructs unsupported by the provider) i started wondering why go through this at all if i already know what the query should be. And another thing is transactions, its so easy to shoot yourself in the foot when you try to make some transactional operations in the application code..
"I don’t think that there is much that can be done here, given the relational nature of the data access "...
There's still a ton of low hanging fruit in that query. Any time you look at the query plan and see a table being accessed multiple times, and/or multiple index seeks for one statement there's a potential for improvement.
First off, without even changing the SQL, you can create a covering index on Review.BookId, Review.NumStars so you're not scanning two indexes.
Next, combine the subqueries to one so you're not hitting Review twice for no reason:
I would also try selecting the page of reviews first and then joining in Book, but the SoftDeleted check may preclude that depending on whether the requirements are to return exactly @pageSize records or @pageSize or less.
Chris, Well, to start with, you'll need to get EF Core to generate this query, and I'm not sure that you could. There is always the option of feeding it the SQL, of course. Then, there is
AuthorsStringUdf
, which I'm not sure about, but if this does a lookup per row, that is going to be expensive.Oren... in the source article you referenced, step 2 of 4 was using Dapper to run an optimized query. It's about optimizing SQL access in general, not EF core in particular -- 2/3 of the article is non EF ways to optimize, or ways to use EF to warehouse fields vs reading them.
That's a standard technique with OR/M's -- use them for CRUD where performance isn't an issue, but once you're optimizing it only makes sense to hand code SQL vs throwing up your hands and saying "I don’t think that there is much that can be done here, given the relational nature of the data access".
At that point, it's not about generating an EF query, but a SQL query, and I'd bet that an optimized query would be pretty close to the warehoused version -- at least close enough to prefer it.
Rafal, You're correct, I was thinking about the query where you sort of the average number of stars. And yes, in many cases, raw SQL is easier when you have more complex queries.
The last picture is too vague to see clearly.
Nelson, Yes, I'm sorry, the original was a few thousands pixels, and I had to shrink it to fit. The idea was to give you the rough general outline of the plan, not to inspect it.
Comment preview