Subqueries and JOINs

Understand when to use subqueries versus joins. This guide explains performance implications and how to rewrite subqueries as joins for efficiency.

A subquery can quite often, but not in all cases, be rewritten as a JOIN.

Rewriting Subqueries as JOINS

A subquery using IN can be rewritten with the DISTINCT keyword. Consider this query:

SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table2);

It can be rewritten like this:

SELECT DISTINCT table1.* FROM table1, table2 WHERE table1.col1=table2.col1;

NOT IN or NOT EXISTS queries can also be rewritten. For example, these two queries return the same result:

SELECT * FROM table1 WHERE col1 NOT IN (SELECT col1 FROM table2);
SELECT * FROM table1 WHERE NOT EXISTS (SELECT col1 FROM table2 
         WHERE table1.col1=table2.col1);

They can both be rewritten like this:

SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id 
       WHERE table2.id IS NULL;

Subqueries that can be rewritten as a LEFT JOIN are sometimes more efficient.

Using Subqueries instead of JOINS

There are some scenarios, though, which call for subqueries rather than joins:

  • When you want duplicates, but not false duplicates. Suppose Table_1 has three rows — {1,1,2} — and Table_2 has two rows — {1,2,2}. If you need to list the rows in Table_1 which are also in Table_2, only this subquery-based SELECT statement will give the right answer (1,1,2):

  • This SQL statement won't work:

  • because the result will be {1,1,2,2} — and the duplication of 2 is an error. This SQL statement won't work either:

  • because the result will be {1,2} — and the removal of the duplicated 1 is an error too.

  • When the outermost statement is not a query. The SQL statement:

  • can't be expressed using a join unless some rare SQL3 features are used.

  • When the join is over an expression. The SQL statement:

  • is hard to express with a join. In fact, the only way we can think of is this SQL statement:

  • which still involves a parenthesized query, so nothing is gained from the transformation.

  • When you want to see the exception. For example, suppose the question is: Which books are longer than Das Kapital? These two queries are effectively almost the same:

  • The difference is between these two SQL statements is, if there are two editions of Das Kapital (with different page counts), then the self-join example will return the books which are longer than the shortest edition of Das Kapital. That might be the wrong answer, since the original question didn't ask for "... longer than ANY book named Das Kapital" (it seems to contain a false assumption that there's only one edition).

This page is licensed: CC BY-SA / Gnu FDL

Last updated

Was this helpful?