Semi-join subquery optimizations

You are viewing an old version of this article. View the current version here.

MariaDB has a set of optimizations specifically targeted at semi-join subqueries.

What is a semi-join subquery

A semi-join subquery has a form of

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

that is, the subquery an IN-subquery and it is located in the WHERE clause. The most important part here is that with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery. Let's see why this is important. Consider a semi-join subquery:

select * from Country 
where 
  Continent='Europe' and 
  Country.Code in (select City.country 
                   from City 
                   where City.Population>1*1000*1000);

One can execute it "naturally", by starting from Countries in Europe and checking if they have populous Cities: semi-join-outer-to-inner

The semi-join property also allows to execute "backwards": we can start from big Cities, and check which countries they are in:

semi-join-inner-to-outer

Starting in MariaDB 5.3.3, Semi-join subquery optimizations are enabled by default. To disable them, you need to turn off their optimizer_switch flags like so:

SET optimizer_switch='semijoin=off,firstmatch=off,materialization=off,loosescan=off'

If a subquery is processed as a semi-join, EXPLAIN EXTENDED... ; SHOW WARNINGS will show it as such:

MariaDB [world]> explain extended select * from Country where Country.Code IN 
   ->(SELECT Country from City where Population > 5*1000*1000);

Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` 
AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.
`Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`.
`Country` semi join (`world`.`City`) where ((`world`.`City`.`Population` > ((5 * 
1000) * 1000)))

without semi-join optimizations, EXPLAIN EXTENDED output will show the subquery predicate:

Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` 
AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.
`Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`.
`Country` where <in_optimizer>(`world`.`Country`.`Code`,`world`.`Country`.`Code` in 
(<materialize> (select `world`.`City`.`Country` from `world`.`City` where
 (`world`.`City`.`Population` > ((5 * 1000) * 1000)) ), <primary_index_lookup>
(`world`.`Country`.`Code` in <temporary table> on distinct_key where ((`world`.
`Country`.`Code` = `<subquery2>`.`Country`)))))

Subsequent sections describe the new subquery optimization strategies in greater detail.

Semi-join basics

Semi-join is a relational algebra operation. It is defined as an operation that returns all tuples from left relation, for which the right relation has any matches. To illustrate the difference from inner join, lets consider an example. First, let's consider a query with inner join:

select Country.Code from Country join City on Country.Code=City.Country;

(PIC)

Now, compare that with a semi-join operation:

In SQL, semi-joins are written as subqueries. The above operation can be expressed as:

See Also

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.