Duplicate Weedout Strategy
DuplicateWeedout is an execution strategy for Semi-join subqueries.
The idea
The idea is to run the semi-join (a query with uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.
Suppose, you have a query where you're looking for countries which have more than 33% percent of their population in one big city:
SELECT *
FROM Country
WHERE
Country.code IN (SELECT City.Country
FROM City
WHERE
City.Population > 0.33 * Country.Population AND
City.Population > 1*1000*1000);First, we run a regular inner join between the City and Country tables:

The Inner join produces duplicates. We have Germany three times, because it has three big cities.
Now, lets put DuplicateWeedout into the picture:

Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.
DuplicateWeedout in action
The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:
This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).
If we run the same query with semi-join optimizations disabled, we'll get:
This plan will read (239 + 239*18) = 4541 rows, which is much slower.
Factsheet
DuplicateWeedoutis shown as "Start temporary/End temporary" inEXPLAIN.The strategy can handle correlated subqueries.
But it cannot be applied if the subquery has meaningful
GROUP BYand/or aggregate functions.DuplicateWeedoutallows the optimizer to freely mix a subquery's tables and the parent select's tables.There is no separate @@optimizer_switch flag for
DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations withSET @@optimizer_switch='optimizer_semijoin=off'command.
See Also
This page is licensed: CC BY-SA / Gnu FDL
Last updated
Was this helpful?

