Semi-join materialization strategy
Semi-join Materialization is a special kind of subquery materialization that's used for Semi-join subqueries. It actually includes two strategies:
- Materialization/lookup
- Materialization/scan
The idea
Let us again look at the query that finds countries in Europe that have big cities:
select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000) and Country.continent='Europe'
The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of City.country field of big cities, and then do a join with countries in Europe:
The join can be done in two directions: 1. From Materialized table to countries in Europe 2. From Countries in Europe to materialized table
The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".
If you run a join from Countries to materialized table, the cheapest way to find a match in materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".
Semi-join materialization in action
Materialization-Scan
If we chose to look for cities with population greater than 7 million, the optimizer will use Materialization-Scan and the EXPLAIN will show this:
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000); +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+ | 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 15 | | | 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.City.Country | 1 | | | 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 15 | Using index condition | +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+ 3 rows in set (0.01 sec)
Here, you can see:
- There are still two SELECTs (look for columns with
id=1
andid=2
MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 7*1000*1000)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <subquery2> type: ALL possible_keys: distinct_key key: NULL key_len: NULL ref: NULL rows: 15 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: Country type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: world.City.Country rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: MATERIALIZED table: City type: range possible_keys: Population,Country key: Population key_len: 4 ref: NULL rows: 15 Extra: Using index condition 3 rows in set (0.00 sec)