Semi-join subquery optimizations
Contents
Semi-join subquery optimizations provide efficient handling of subqueries that that follow the form of:
SELECT ... WHERE expr IN (SELECT ... ) AND ...
that is, the subquery must be an IN-subquery and must be located in the WHERE clause in such a way that the WHERE clause can be rewritten into the form of subquery AND everything_else
. Our research shows it is the most common form of subquery.
Semi-join subquery optimizations are disabled by default. To enable all of them, you need to turn on their optimizer_switch flags like so:
SET optimizer_switch='semijoin=on,firstmatch=on,materialization=on,loosescan=on'
If a subquery is processed as a semi-join, EXPLAIN EXTENDED... ; SHOW WARNINGS
will show it as such:
MariaDB [world]> explain extended select * from Country where Country.Code IN ->(SELECT Country from City where Population > 5*1000*1000); Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`. `Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`. `Country` semi join (`world`.`City`) where ((`world`.`City`.`Population` > ((5 * 1000) * 1000)))
without semi-join optimizations, EXPLAIN EXTENDED
output will show the subquery predicate:
Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`. `Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`. `Country` where <in_optimizer>(`world`.`Country`.`Code`,`world`.`Country`.`Code` in (<materialize> (select `world`.`City`.`Country` from `world`.`City` where (`world`.`City`.`Population` > ((5 * 1000) * 1000)) ), <primary_index_lookup> (`world`.`Country`.`Code` in <temporary table> on distinct_key where ((`world`. `Country`.`Code` = `<subquery2>`.`Country`)))))
Subsequent sections describe the new subquery optimization strategies in greater detail.
Semi-join basics
Semi-join is a relational algebra operation. It is defined as an operation that returns all tuples from left relation, for which the right relation has any matches. To illustrate the difference from inner join, lets consider an example. First, let's consider a query with inner join:
select Country.Code from Country join City on Country.Code=City.Country;
(PIC)
Now, compare that with a semi-join operation:
In SQL, semi-joins are written as subqueries. The above operation can be expressed as: