Ayende @ Rahien

It's a girl

Profiler new feature: Too many joins detection

This is Josh’s feature, since we wrote most of the code for it together. Basically, it recognize a very common performance problem, queries that uses too many joins, such as this one:

image

Which would result in the following warning:

image

Queries with too many joins might be a performance problem. Each join requires the database to perform additional work, and the complexity and cost of the query grows rapidly with each additional join. While relational database are optimized for handling joins, it is often more efficient to perform several separate queries instead of a single query with several joins in it.

For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins. Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development.

Comments

kk
02/24/2010 11:05 AM by
kk

what's the condition to assert that there're too many join?

(i mean for example 'join count > 3')

Ayende Rahien
02/24/2010 11:18 AM by
Ayende Rahien

kk,

I am not sure that I understand your question

tobi
02/24/2010 12:29 PM by
tobi

I think that your general point about joins is correct. However in general many joins at once can very well be more efficient than separate queries if the problem of a cartesian product does not occur. In your example i do not understand why you do inner joins from Blog to Posts and Comments. You would not get Blogs without Posts or without Comments.

Ayende Rahien
02/24/2010 12:52 PM by
Ayende Rahien

tobi,

This is just an example.

Frans Bouma
02/24/2010 02:58 PM by
Frans Bouma

As you use inner joins, the smallest set filters the rest. This means that if the DB, based on statistics, first joins the pairs with the smallest sets, the temporary set in each hash loop is small. This means that you can't conclude without statistics (and how the optimizer of the DB works) that a given number of joins is 'too much' and will perform poorly, in fact, it might very well be much more efficient than alternatives.

kk
02/24/2010 03:32 PM by
kk

i mean that when the profiler takes a sql query A how it decides that it must generate a warning of the type 'This statement has too many joins...'?

my hypothesis is that it do a simple check on the count of the number of joint into the query so it will do

if (joinCount > MaxAllowedJoin)

{

 //generate warning

}

den Ben
02/24/2010 03:33 PM by
den Ben

I'm also not convinced that this is a useful feature.

"For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins."

OLTP systems typically are normalized to process many transactions as fast as possible. Querying an OLTP system will most likely always mean joining several tables. Simplifying the data model may actually be a BAD idea (in most cases it will be...).

"Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development."

If you don't notice that during development something is seriously wrong with your testing process. And sometimes you may actually need a crossjoin.

Frank Quednau
02/24/2010 04:15 PM by
Frank Quednau

there is usually explicit support of cross join in SQL, that is, if you meant it, you better write it down. However I don't actually know if it is supported through an OR/M like NH.

Anonymous Coward
02/24/2010 04:24 PM by
Anonymous Coward

Can we get a warning about Cartesian products?

Ayende Rahien
02/24/2010 05:25 PM by
Ayende Rahien

kk,

Yes, that is basically it.

Frans Bouma
02/25/2010 10:35 AM by
Frans Bouma

Isn't that pre-mature optimization as there ever is one? You can't decide based on the # of joins if it's more efficient to do it differently. So this warning/error gives a false notion to the user: the user might go and do things differently which are less efficient (as in-memory hash-loops to merge sets is always slower as data has to move to the client and the client can't decide which set to do first based on statistics, the rdbms can)

Petar Petrov
02/25/2010 11:29 AM by
Petar Petrov

Hi, Ayende. The question of kk is how you can judge that a query contains to much joins ? Is there any thresholds for the number of joins ? For example a query containing Blogs,Posts, Comment, do this query will trigger the alert ? Is the number of maximum joins configurable ?

Ayende Rahien
02/25/2010 11:31 AM by
Ayende Rahien

Petar,

Just by looking at the number of joins, yes.

And yes, that query would be caught, properly, I think.

And yes, it is configurable.

Jerome
02/26/2010 08:17 AM by
Jerome

Depends on tables to join.

For Table which contains LookUp Types with few items, I don't think it's an issue.

Jerome

Fabio Maulo
02/28/2010 04:42 AM by
Fabio Maulo

Perhaps is the time to add a new "warning" when the user use something like

"select id from children where date = (select max(date) from children where id = parentid)"

for a many-to-one instead a simple and efficient client-server de-normalization.

Comments have been closed on this topic.