All pages
Powered by GitBook
1 of 1

Loading...

Semi-join Subquery Optimizations

MariaDB has a set of optimizations specifically targeted at semi-join subqueries.

What is a Semi-Join Subquery

A semi-join subquery has a form of

that is, the subquery is an IN-subquery and it is located in the WHERE clause. The most important part here is

with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery

Let's see why this is important. Consider a semi-join subquery:

One can execute it "naturally", by starting from countries in Europe and checking if they have populous Cities:

The semi-join property also allows "backwards" execution: we can start from big cities, and check which countries they are in:

To contrast, let's change the subquery to be non-semi-join:

It is still possible to start from countries, and then check

  • if a country has any big cities

  • if it has a large surface area:

The opposite, city-to-country way is not possible. This is not a semi-join.

Difference from Inner Joins

Semi-join operations are similar to regular relational joins. There is a difference though: with semi-joins, you don't care how many matches an inner table has for an outer row. In the above countries-with-big-cities example, Germany will be returned once, even if it has three cities with populations of more than one million each.

Semi-Join Optimizations in MariaDB

MariaDB uses semi-join optimizations to run IN subqueries.The optimizations are enabled by default. You can disable them by turning off their like so:

MariaDB has five different semi-join execution strategies:

See Also

  • blog post

This page is licensed: CC BY-SA / Gnu FDL

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
SELECT * FROM Country 
WHERE 
  Continent='Europe' AND 
  Country.Code IN (SELECT City.country 
                   FROM City 
                   WHERE City.Population>1*1000*1000);
DuplicateWeedout execution strategy
optimizer_switch
Table pullout optimization
FirstMatch execution strategy
Semi-join Materialization execution strategy
LooseScan execution strategy
Subquery Optimizations Map
"Observations about subquery use cases"
http:en.wikipedia.org/wiki/Semijoin
semi-join-outer-to-inner
semi-join-inner-to-outer
non-semi-join-subquery
SELECT * FROM Country 
WHERE 
   Country.Continent='Europe' AND 
   (Country.Code IN (SELECT City.country 
                   FROM City WHERE City.Population>1*1000*1000) 
    OR Country.SurfaceArea > 100*1000  -- Added this part
   );
SET optimizer_switch='semijoin=off'