Optimizations for derived tables

You are viewing an old version of this article. View the current version here.

Derived tables are subqueries in the FROM clause. Prior to MariaDB 5.3/MySQL 5.6, they were too slow to be usable. In MariaDB 5.3/MySQL 5.6, there are two optimizations

  • Derived table merge
  • Automatic index creation for derived tables

which provide adequate performance.

Derived table merge

The idea

Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was that they need to select cities with population greater than 10,000 people, and then that from these cities one needs to select those that are located in Germany, one could write this SQL:

select * 
from 
  (select * from City where Population > 10*1000) as big_city
where 
  big_city.Country='DEU'

For MySQL, such syntax was a taboo. If you run EXPLAIN for this query, you can see why:

mysql> explain select * from (select * from City where Population > 1*1000) as big_city where big_city.Country='DEU' ;
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
|  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.60 sec)

it plans to do the following actions:

derived-inefficent

From left to right:

  1. Execute the subquery: (select * from City where Population > 1*1000), exactly as it was written in the query
  2. Put result of the subquery into temporary table.
  3. Read back, and apply WHERE condition from the upper select, big_city.Country='DEU'

Executing subquery like this is very inefficient, because highly-selective condition from the parent select, (Country='DEU' ) is not used when scanning the base table City. We read too many records from the City table, and then we have to write them into temporary table and read back again, before finally filtering them out.

If one runs this query in MariaDB/MySQL 5.6, they get this

MariaDB [world]> explain select * from (select * from City where Population > 1*1000) as big_city where big_city.Country='DEU';
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys      | key     | key_len | ref   | rows | Extra                              |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
|  1 | SIMPLE      | City  | ref  | Population,Country | Country | 3       | const |   90 | Using index condition; Using where |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)

One can see that

  • The output has only one line. This means that the subquery has been merged into the top-level SELECT.
  • Table City is accessed through index on Country column. Apparently, Country='DEU' condition was used to construct ref access on the table.
  • The query will read about 90 rows, which is a big improvement over 4079 row reads plus 4068 temporary table reads/writes we've had before.

Derived table merge fact sheet

  • Derived tables (=subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same that requirements for VIEWs to allow algorithm=merge
  • The optimization is enabled by default. It can be disabled with
set @@optimizer_switch='derived_merge=OFF'
  • MySQL/MariaDB that do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem (see e.g. MySQL Bug#44802) of EXPLAIN statements taking very long time. Starting from MariaDB 5.3+ and MySQL 5.6+ EXPLAIN commands execute instantly, regardless of the derived_merge setting.

Automatic

See also

Optimizing Subqueries in the FROM Clause in MySQL 5.6 manual

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.