La strategia DuplicateWeedout

Stai visualizzando una vecchia versione di questo article. Visualizza la versione più recente.

DuplicateWeedout è una strategia di esecuzione per le subquery di tipo semi-join.

L'idea

L'idea è eseguire la semi-join come se fosse una normale inner join, e poi eliminare le combinazioni di record duplicate usando una tabella temporanea.

Si supponga di avere una query con la quale si cercano i Paesi aventi più del 33% percento della loro popolazione in una grande città:

select * 
from Country 
where 
   Country.code IN (select City.Country
                    from City 
                    where 
                      City.Population > 0.33 * Country.Population and 
                      City.Population > 1*1000*1000);

Prima si esegue una normale inner join tra le tabelle City e Country:

duplicate-weedout-inner-join

La Inner join produce dei duplicati. Avremo la Germania tre volte, perché ha tre grandi città. Ma vediamo lo schema con DuplicateWeedout:

duplicate-weedout-diagram

Come si può vedere, è stata usata una tabella temporanea con una chiave primaria per evitare di produrre diversi record con 'Germany'.

DuplicateWeedout in azione

Start temporary e End temporary, presenti nell'ultimo diagramma, vengono mostrati nell'output di EXPLAIN:

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically *much* cheaper since the temporary table is in-memory).

If we run the same EXPLAIN in MySQL, we'll get:

mysql> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: City
         type: index_subquery
possible_keys: Population,Country
          key: Country
      key_len: 3
          ref: func
         rows: 18
        Extra: Using where
2 rows in set (0.00 sec)

This plan will read (239 + 239*18) = 4541 rows, which is much slower.

Factsheet

  • DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.
  • The strategy can handle correlated subqueries.
  • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.
  • DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.
  • There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with SET @@optimizer_switch='optimizer_semijoin=off' command.

See also

Commenti

Sto caricando i commenti......
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.