Subqueries and JOINs
Understand when to use subqueries versus joins. This guide explains performance implications and how to rewrite subqueries as joins for efficiency.
A subquery can quite often, but not in all cases, be rewritten as a JOIN.
Rewriting Subqueries as JOINS
A subquery using IN can be rewritten with the DISTINCT keyword. Consider this query:
SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table2);It can be rewritten like this:
SELECT DISTINCT table1.* FROM table1, table2 WHERE table1.col1=table2.col1;NOT IN or NOT EXISTS queries can also be rewritten. For example, these two queries return the same result:
SELECT * FROM table1 WHERE col1 NOT IN (SELECT col1 FROM table2);
SELECT * FROM table1 WHERE NOT EXISTS (SELECT col1 FROM table2
WHERE table1.col1=table2.col1);They can both be rewritten like this:
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id
WHERE table2.id IS NULL;Subqueries that can be rewritten as a LEFT JOIN are sometimes more efficient.
Using Subqueries instead of JOINS
There are some scenarios, though, which call for subqueries rather than joins:
When you want duplicates, but not false duplicates. Suppose
Table_1has three rows — {1,1,2} — andTable_2has two rows — {1,2,2}. If you need to list the rows inTable_1which are also inTable_2, only this subquery-basedSELECTstatement will give the right answer (1,1,2):
This SQL statement won't work:
because the result will be {
1,1,2,2} — and the duplication of 2 is an error. This SQL statement won't work either:
because the result will be {
1,2} — and the removal of the duplicated 1 is an error too.When the outermost statement is not a query. The SQL statement:
can't be expressed using a join unless some rare SQL3 features are used.
When the join is over an expression. The SQL statement:
is hard to express with a join. In fact, the only way we can think of is this SQL statement:
which still involves a parenthesized query, so nothing is gained from the transformation.
When you want to see the exception. For example, suppose the question is: Which books are longer than Das Kapital? These two queries are effectively almost the same:
The difference is between these two SQL statements is, if there are two editions of Das Kapital (with different page counts), then the self-join example will return the books which are longer than the shortest edition of Das Kapital. That might be the wrong answer, since the original question didn't ask for "... longer than
ANYbook named Das Kapital" (it seems to contain a false assumption that there's only one edition).
This page is licensed: CC BY-SA / Gnu FDL
Last updated
Was this helpful?

