Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,949 | Comments: 44,548

filter by tags archive

Profiler new featureToo many joins detection


This is Josh’s feature, since we wrote most of the code for it together. Basically, it recognize a very common performance problem, queries that uses too many joins, such as this one:

image

Which would result in the following warning:

image

Queries with too many joins might be a performance problem. Each join requires the database to perform additional work, and the complexity and cost of the query grows rapidly with each additional join. While relational database are optimized for handling joins, it is often more efficient to perform several separate queries instead of a single query with several joins in it.

For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins. Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development.

More posts in "Profiler new feature" series:

  1. (24 Feb 2010) Too many joins detection
  2. (14 Feb 2010) Integrating with application frameworks
  3. (26 Jan 2010) Side by Side diff

Comments

kk
kk

what's the condition to assert that there're too many join?

(i mean for example 'join count > 3')

Ayende Rahien

kk,

I am not sure that I understand your question

tobi

I think that your general point about joins is correct. However in general many joins at once can very well be more efficient than separate queries if the problem of a cartesian product does not occur. In your example i do not understand why you do inner joins from Blog to Posts and Comments. You would not get Blogs without Posts or without Comments.

Ayende Rahien

tobi,

This is just an example.

Frans Bouma

As you use inner joins, the smallest set filters the rest. This means that if the DB, based on statistics, first joins the pairs with the smallest sets, the temporary set in each hash loop is small. This means that you can't conclude without statistics (and how the optimizer of the DB works) that a given number of joins is 'too much' and will perform poorly, in fact, it might very well be much more efficient than alternatives.

kk
kk

i mean that when the profiler takes a sql query A how it decides that it must generate a warning of the type 'This statement has too many joins...'?

my hypothesis is that it do a simple check on the count of the number of joint into the query so it will do

if (joinCount > MaxAllowedJoin)

{

 //generate warning

}

den Ben

I'm also not convinced that this is a useful feature.

"For OLTP systems, you should consider simplifying your queries or simplifying the data model. While I do not recommend avoiding joins completely, I strong discourage queries with large numbers of joins."

OLTP systems typically are normalized to process many transactions as fast as possible. Querying an OLTP system will most likely always mean joining several tables. Simplifying the data model may actually be a BAD idea (in most cases it will be...).

"Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development."

If you don't notice that during development something is seriously wrong with your testing process. And sometimes you may actually need a crossjoin.

Frank Quednau

there is usually explicit support of cross join in SQL, that is, if you meant it, you better write it down. However I don't actually know if it is supported through an OR/M like NH.

Anonymous Coward

Can we get a warning about Cartesian products?

Ayende Rahien

kk,

Yes, that is basically it.

Frans Bouma

Isn't that pre-mature optimization as there ever is one? You can't decide based on the # of joins if it's more efficient to do it differently. So this warning/error gives a false notion to the user: the user might go and do things differently which are less efficient (as in-memory hash-loops to merge sets is always slower as data has to move to the client and the client can't decide which set to do first based on statistics, the rdbms can)

Petar Petrov

Hi, Ayende. The question of kk is how you can judge that a query contains to much joins ? Is there any thresholds for the number of joins ? For example a query containing Blogs,Posts, Comment, do this query will trigger the alert ? Is the number of maximum joins configurable ?

Ayende Rahien

Petar,

Just by looking at the number of joins, yes.

And yes, that query would be caught, properly, I think.

And yes, it is configurable.

Jerome

Depends on tables to join.

For Table which contains LookUp Types with few items, I don't think it's an issue.

Jerome

Fabio Maulo

Perhaps is the time to add a new "warning" when the user use something like

"select id from children where date = (select max(date) from children where id = parentid)"

for a many-to-one instead a simple and efficient client-server de-normalization.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats