Table pullout optimization
Contents
The idea of Table Pullout
Sometimes, a subquery can be re-written as a join. Let's take this query as example:
select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
If we know that there can be at most one country with with a given value of Country.Code (we can tell that if we see that table Country has a primary key or unique index over that column), we can re-write this query as:
select City.* from City, Country where City.Country=Country.Code AND Country.Population < 100*1000;
Table pullout in action
If one runs EXPLAIN for the above query in MySQL 5.1-5.6 or MariaDB 5.1-5.2, they'll get this plan:
MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | 1 | PRIMARY | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where | | 2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3 | func | 1 | Using where | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ 2 rows in set (0.00 sec)
It shows that the optimizer is going to do a full scan on table City, and for each city it will do a lookup in table Country.
Now, if one runs the same query in MariaDB 5.3, they will get this plan:
MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | 1 | PRIMARY | Country | range | PRIMARY,Population | Population | 4 | NULL | 37 | Using index condition | | 1 | PRIMARY | City | ref | Country | Country | 3 | world.Country.Code | 18 | | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ 2 rows in set (0.00 sec)
The interesting parts are:
- Both tables have
select_type=PRIMARY
, andid=1
as if they were in one join. - The `Country` table is first, followed by the `City`.
Indeed, if one runs EXPLAIN EXTENDED; SHOW WARNINGS, they will see that the subquery is gone and it was replaced with a join:
MariaDB [world]> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`,`world`.`City`.`Country` AS `Country`,`world`.`City`.`Population` AS `Population` from `world`.`City` join `world`.`Country` where ((`world`.`City`.`Country` = `world`.`Country`.`Code`) and (`world`.`Country`. `Population` < (100 * 1000))) 1 row in set (0.00 sec)
Changing subquery into join allows to feed the join to join optimizer, which can make a choice between two possible join orders
- City -> Country
- Country -> City
as opposed to the only one choice of
- City->Country
we've had before the optimization.
In the above example, the choice produced a better query plan. Without pullout, the query plan with subquery was to read 4079 + 1*4079=8158 table records. With table pullout, the join plan was to read 37 + 37 * 18 = 703 rows. Not all row reads are equal, but generally, reading 10 times fewer table records is faster.
Table pullout factsheet
- Table pullout is possible only in semi-join subqueries.
- Table pullout is based on UNIQUE/PRIMARY key definitions.
- Doing table pullout does not cut off any possible query plans, so MariaDB will always try to pull out as much as possible.
- Table pullout is able to pull individual tables out of subqueries to their parent selects. If all tables in a subquery has been pulled out, the subquery (i.e. its semi-join) is removed completely.
- A common optimizer advice for MySQL has been "If possible, rewrite your subqueries as joins". Table pullout does exactly that, so manual rewrite is no longer necessary.
Controlling table pullout
There is no separate @@optimizer_switch flag for table pullout. Table pullout can be disabled by switching off all semi-join optimizations with
SET @@optimizer_switch='optimizer_semijoin=off'
command.