Table pullout optimization
The idea of Table Pullout
Sometimes, a subquery can be re-written as a join. Let's take this query as example:
select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
If we know that there can be at most one country with with a given value of Country.Code (we can tell that if we see that table Country has a primary key or unique index over that column), we can re-write this query as:
select City.* from City, Country where City.Country=Country.Code AND Country.Population < 100*1000;
Table pullout in action
If one runs EXPLAIN for the above query in MySQL 5.1-5.6 or MariaDB 5.1-5.2, they'll get this plan:
MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | 1 | PRIMARY | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where | | 2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3 | func | 1 | Using where | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ 2 rows in set (0.00 sec)
It shows that the optimizer is going to do a full scan on table City, and for each city it will do a lookup in table Country.
Now, if one runs the same query in MariaDB 5.3, they will get this plan:
MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | 1 | PRIMARY | Country | range | PRIMARY,Population | Population | 4 | NULL | 37 | Using index condition | | 1 | PRIMARY | City | ref | Country | Country | 3 | world.Country.Code | 18 | | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ 2 rows in set (0.00 sec)
The interesting parts are:
- Both tables have
select_type=PRIMARY
, andid=1
as if they were in one join. - The `Country` table is first, followed by the `City`.
Indeed, if one runs EXPLAIN EXTENDED; SHOW WARNINGS, they will see that the subquery is gone and it was replaced with a join:
MariaDB [world]> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`,`world`.`City`.`Country` AS `Country`,`world`.`City`.`Population` AS `Population` from `world`.`City` join `world`.`Country` where ((`world`.`City`.`Country` = `world`.`Country`.`Code`) and (`world`.`Country`. `Population` < (100 * 1000))) 1 row in set (0.00 sec)
Changing subquery into join allows to feed the join to join optimizer, which can make a choice between two possible join orders
- City -> Country
- Country -> City