Semi-join subquery optimizations
MariaDB has a set of optimizations specifically targeted at semi-join subqueries.
What is a semi-join subquery
A semi-join subquery has a form of
SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
that is, the subquery an IN-subquery and it is located in the WHERE clause. The most important part here is that with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery. Let's see why this is important. Consider a semi-join subquery:
select * from Country where Continent='Europe' and Country.Code in (select City.country from City where City.Population>1*1000*1000);
One can execute it naturally, by starting from Countries in Europe and checking if they have populous Cities:
The semi-join property also allows to execute "backwards": we can start from big Cities, and check which countries they are in:
Starting in MariaDB 5.3.3, Semi-join subquery optimizations are enabled by default. To disable them, you need to turn off their optimizer_switch flags like so:
SET optimizer_switch='semijoin=off,firstmatch=off,materialization=off,loosescan=off'
If a subquery is processed as a semi-join, EXPLAIN EXTENDED... ; SHOW WARNINGS
will show it as such:
MariaDB [world]> explain extended select * from Country where Country.Code IN ->(SELECT Country from City where Population > 5*1000*1000); Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`. `Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`. `Country` semi join (`world`.`City`) where ((`world`.`City`.`Population` > ((5 * 1000) * 1000)))
without semi-join optimizations, EXPLAIN EXTENDED
output will show the subquery predicate:
Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`. `Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`. `Country` where <in_optimizer>(`world`.`Country`.`Code`,`world`.`Country`.`Code` in (<materialize> (select `world`.`City`.`Country` from `world`.`City` where (`world`.`City`.`Population` > ((5 * 1000) * 1000)) ), <primary_index_lookup> (`world`.`Country`.`Code` in <temporary table> on distinct_key where ((`world`. `Country`.`Code` = `<subquery2>`.`Country`)))))
Subsequent sections describe the new subquery optimization strategies in greater detail.
Semi-join basics
Semi-join is a relational algebra operation. It is defined as an operation that returns all tuples from left relation, for which the right relation has any matches. To illustrate the difference from inner join, lets consider an example. First, let's consider a query with inner join:
select Country.Code from Country join City on Country.Code=City.Country;
(PIC)
Now, compare that with a semi-join operation:
In SQL, semi-joins are written as subqueries. The above operation can be expressed as: