Semi-join subquery optimizations
Contents
MariaDB has a set of optimizations specifically targeted at semi-join subqueries.
What is a semi-join subquery
A semi-join subquery has a form of
SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
that is, the subquery an IN-subquery and it is located in the WHERE clause. The most important part here is that with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery. Let's see why this is important. Consider a semi-join subquery:
select * from Country where Continent='Europe' and Country.Code in (select City.country from City where City.Population>1*1000*1000);
One can execute it "naturally", by starting from Countries in Europe and checking if they have populous Cities:
The semi-join property also allows to execute "backwards": we can start from big Cities, and check which countries they are in:
To put things in contrast, let's change the subquery to be non-semi-join:
select * from Country where Country.Continent='Europe' and (Country.Code in (select City.country from City where City.Population>1*1000*1000) or Country.SurfaceArea > 100*1000 -- Added this part );
It is still possible to start from countries, and then check
- If a country has any big cities
- if it has a large surface area:
The opposite, city-to-country way is not possible. This is not a semi-join.
How semi-joins are different from joins
Semi-join operation is similar to regular relational join. There is a difference, though: with semi-join, you don't care how many matches inner table has for an outer row. In the above countries-with-big-cities example, Germany will be returned once, even if it has three cities with population of more than one million.
Semi-join optimizations in MariaDB
MariaDB uses semi-join optimizations to run IN subqueries starting from MariaDB 5.3. Starting in MariaDB 5.3.3, Semi-join subquery optimizations are enabled by default. You can disable them by turning off their optimizer_switch flag like so:
SET optimizer_switch='semijoin=off'
MariaDB has four different semi-join execution strategies:
- Table pullout
- Materialization
- LooseScan
- DuplicateWeedout