Semi-join subquery optimizations

You are viewing an old version of this article. View the current version here.

MariaDB has a set of optimizations specifically targeted at semi-join subqueries.

What is a semi-join subquery

A semi-join subquery has a form of

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

that is, the subquery an IN-subquery and it is located in the WHERE clause. The important part here is that with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery.

Semi-join subquery optimizations are disabled by default. To enable all of them, you need to turn on their optimizer_switch flags like so:

SET optimizer_switch='semijoin=on,firstmatch=on,materialization=on,loosescan=on'

If a subquery is processed as a semi-join, EXPLAIN EXTENDED... ; SHOW WARNINGS will show it as such:

MariaDB [world]> explain extended select * from Country where Country.Code IN 
   ->(SELECT Country from City where Population > 5*1000*1000);

Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` 
AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.
`Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`.
`Country` semi join (`world`.`City`) where ((`world`.`City`.`Population` > ((5 * 
1000) * 1000)))

without semi-join optimizations, EXPLAIN EXTENDED output will show the subquery predicate:

Note (Code 1003): select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` 
AS `Name`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.
`Population` AS `Population`,`world`.`Country`.`Capital` AS `Capital` from `world`.
`Country` where <in_optimizer>(`world`.`Country`.`Code`,`world`.`Country`.`Code` in 
(<materialize> (select `world`.`City`.`Country` from `world`.`City` where
 (`world`.`City`.`Population` > ((5 * 1000) * 1000)) ), <primary_index_lookup>
(`world`.`Country`.`Code` in <temporary table> on distinct_key where ((`world`.
`Country`.`Code` = `<subquery2>`.`Country`)))))

Subsequent sections describe the new subquery optimization strategies in greater detail.

Semi-join basics

Semi-join is a relational algebra operation. It is defined as an operation that returns all tuples from left relation, for which the right relation has any matches. To illustrate the difference from inner join, lets consider an example. First, let's consider a query with inner join:

select Country.Code from Country join City on Country.Code=City.Country;

(PIC)

Now, compare that with a semi-join operation:

In SQL, semi-joins are written as subqueries. The above operation can be expressed as:

See Also

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.