Profiler new featureToo many joins detection
This is Josh’s feature, since we wrote most of the code for it together. Basically, it recognize a very common performance problem, queries that uses too many joins, such as this one:
Which would result in the following warning:
Queries with too many joins might be a performance problem. Each join requires the database to perform additional work, and the complexity and cost of the query grows rapidly with each additional join. While relational database are optimized for handling joins, it is often more efficient to perform several separate queries instead of a single query with several joins in it.
For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins. Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development.
More posts in "Profiler new feature" series:
- (24 Feb 2010) Too many joins detection
- (14 Feb 2010) Integrating with application frameworks
- (26 Jan 2010) Side by Side diff
Comments
what's the condition to assert that there're too many join?
(i mean for example 'join count > 3')
kk,
I am not sure that I understand your question
I think that your general point about joins is correct. However in general many joins at once can very well be more efficient than separate queries if the problem of a cartesian product does not occur. In your example i do not understand why you do inner joins from Blog to Posts and Comments. You would not get Blogs without Posts or without Comments.
tobi,
This is just an example.
As you use inner joins, the smallest set filters the rest. This means that if the DB, based on statistics, first joins the pairs with the smallest sets, the temporary set in each hash loop is small. This means that you can't conclude without statistics (and how the optimizer of the DB works) that a given number of joins is 'too much' and will perform poorly, in fact, it might very well be much more efficient than alternatives.
i mean that when the profiler takes a sql query A how it decides that it must generate a warning of the type 'This statement has too many joins...'?
my hypothesis is that it do a simple check on the count of the number of joint into the query so it will do
if (joinCount > MaxAllowedJoin)
{
}
I'm also not convinced that this is a useful feature.
"For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins."
OLTP systems typically are normalized to process many transactions as fast as possible. Querying an OLTP system will most likely always mean joining several tables. Simplifying the data model may actually be a BAD idea (in most cases it will be...).
"Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development."
If you don't notice that during development something is seriously wrong with your testing process. And sometimes you may actually need a crossjoin.
there is usually explicit support of cross join in SQL, that is, if you meant it, you better write it down. However I don't actually know if it is supported through an OR/M like NH.
Can we get a warning about Cartesian products?
kk,
Yes, that is basically it.
Isn't that pre-mature optimization as there ever is one? You can't decide based on the # of joins if it's more efficient to do it differently. So this warning/error gives a false notion to the user: the user might go and do things differently which are less efficient (as in-memory hash-loops to merge sets is always slower as data has to move to the client and the client can't decide which set to do first based on statistics, the rdbms can)
Hi, Ayende. The question of kk is how you can judge that a query contains to much joins ? Is there any thresholds for the number of joins ? For example a query containing Blogs,Posts, Comment, do this query will trigger the alert ? Is the number of maximum joins configurable ?
Petar,
Just by looking at the number of joins, yes.
And yes, that query would be caught, properly, I think.
And yes, it is configurable.
Depends on tables to join.
For Table which contains LookUp Types with few items, I don't think it's an issue.
Jerome
Perhaps is the time to add a new "warning" when the user use something like
"select id from children where date = (select max(date) from children where id = parentid)"
for a many-to-one instead a simple and efficient client-server de-normalization.
Comment preview