You are viewing an old version of this article. View the current version here.

DuplicateWeedout is an execution strategy for Semi-join subqueries.

The idea

The idea is to run the semi-join as if it was a regular inner join, and then eliminate the duplicate record combinations using a temporary table.

Suppose, you've got a query: you're looking for countries that have more than 33% percent of their population in one big city:

select * 
from Country 
where 
   Country.code IN (select City.Country
                    from City 
                    where 
                      City.Population > 0.33 * Country.Population and 
                      City.Population > 1*1000*1000);

First, we run a regular inner join between City and Country tables:

duplicate-weedout-inner-join

we get duplicates: in the above example we've got Germany three times, because it has three big cities. Now, lets put Duplicate Weedout into the picture:

duplicate-weedout-diagram

DuplicateWeedout in action

EXPLAIN output will have Start temporary and End temporary:

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

Factsheet

  • DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN
  • The strategy can handle correlated subqueries, but cannot be applied if the subquery has grouping or aggregate functions.
  • DuplicateWeedout allows the optimizer to freely mix subquery's tables and parent select's tables.
  • There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with SET @@optimizer_switch='optimizer_semijoin=off' command.

Comments

Comments loading...