1 of 6

Optimizations for Derived Tables

Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.

Condition Pushdown into Derived Table Optimization

If a query uses a derived table (or a view), the first action that the query optimizer will attempt is to apply the derived-table-merge-optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as the top-level operation. If it has a GROUP-BY, DISTINCT, or uses window functions, then derived-table-merge-optimization is not applicable.

In that case, the Condition Pushdown optimization is applicable.

Introduction to Condition Pushdown

Consider an example

The naive way to execute the above is to

Compute the OCT_TOTALS contents (for all customers).
The, select the line with customer_id=1

This is obviously inefficient, if there are 1000 customers, then one will be doing up to 1000 times more work than necessary.

However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.

Inside the OCT_\TOTALS, the added condition is put into its HAVING clause, so we end up with:

Then, parts of HAVING clause that refer to GROUP BY columns are moved into the WHERE clause:

Once a restriction like customer_id=1 is in the WHERE, the query optimizer can use it to construct efficient table access paths.

Controlling the Optimization

The optimization is enabled by default. One can disable it by setting the flag condition_pushdown_for_derived to OFF.

The pushdown from HAVING to WHERE part is controlled by condition_pushdown_from_having flag in .

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, .

No optimizer hint is available.

Derived Table Merge Optimization

Background

Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with population greater than 10,000 people, and then that from these cities to select those that are located in Germany, one could write this SQL:

For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

It plans to do the following actions:

From left to right:

Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.
Put result of the subquery into a temporary table.
Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Executing a subquery like this is very inefficient, because the highly-selective condition from the parent select, (Country='DEU') is not used when scanning the base table City. We read too many records from theCity table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.

Derived table merge in action

If one runs this query in MariaDB/MySQL 5.6, they get this:

From the above, one can see that:

The output has only one line. This means that the subquery has been merged into the top-level SELECT.
Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

Factsheet

Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.
The optimization is enabled by default. It can be disabled with:

Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem (see e.g. ) of EXPLAIN statements taking a very long time. Starting from + and MySQL 5.6+ EXPLAIN commands execute instantly, regardless of the derived_merge setting.

Derived Table with Key Optimization

The idea

If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and then parent select will treat it as a regular base table.

Before /MySQL 5.6, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.

Example

Consider a query: we want to find countries in Europe, that have more than one million people living in cities. This is accomplished with this query:

The EXPLAIN output for it will show:

One can see here that

table <derived2> is accessed through key0.
ref column shows world.Country.Code
if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country

Factsheet

The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.
The optimization is applied then the derived table could not be merged into its parent SELECT
- which happens when the derived table doesn't meet criteria for mergeable VIEW
The optimization is ON by default; it can be switched off like so:

Lateral Derived Optimization

MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

Description

The optimization's use case is

The query uses a derived table (or a VIEW, or a non-recursive CTE)
The derived table/View/CTE has a GROUP BY operation as its top-level operation
The query only needs data from a few GROUP BY groups

An example of this: consider a VIEW that computes totals for each customer in October:

And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:

Before Lateral Derived optimization, MariaDB would execute the query as follows:

Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.
Join it with table customer.

The EXPLAIN would look like so:

It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)

Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.

The query plan will look as follows:

Scan table customer and find customer_id for Customer#1 and Customer#2.
For each customer_id, compute the October totals, for this specific customer.

The EXPLAIN output will look like so:

Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

Note the "lateral": 1 member.

Controlling the Optimization

Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.

If you need to disable the optimization, it has an flag. It can be disabled like so:

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, .

For example, by default, this table and query makes use of the optimization:

CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;

INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');

INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;

ANALYZE TABLE t1;

EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

References

Jira task:
Commit:

_{This page is licensed: CC BY-SA / Gnu FDL}

Split Materialized Optimization

This is another name for Lateral Derived Optimization.

_{This page is licensed: CC BY-SA / Gnu FDL}

Derived Table Merge Optimization

Background

SELECT * 
FROM 
  (SELECT * FROM City WHERE Population > 10*1000) AS big_city
WHERE 
  big_city.Country='DEU'

For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
  AS big_city WHERE big_city.Country='DEU' ;
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
|  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.60 sec)

It plans to do the following actions:

From left to right:

Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.
Put result of the subquery into a temporary table.
Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Derived table merge in action

If one runs this query in MariaDB/MySQL 5.6, they get this:

From the above, one can see that:

The output has only one line. This means that the subquery has been merged into the top-level SELECT.
Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

Factsheet

Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.
The optimization is enabled by default. It can be disabled with:

Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem (see e.g. ) of EXPLAIN statements taking a very long time. Starting from + and MySQL 5.6+ EXPLAIN commands execute instantly, regardless of the derived_merge setting.

Lateral Derived Optimization

MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

Description

The optimization's use case is

The query uses a derived table (or a VIEW, or a non-recursive CTE)
The derived table/View/CTE has a GROUP BY operation as its top-level operation
The query only needs data from a few GROUP BY groups

An example of this: consider a VIEW that computes totals for each customer in October:

And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:

Before Lateral Derived optimization, MariaDB would execute the query as follows:

Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.
Join it with table customer.

The EXPLAIN would look like so:

The query plan will look as follows:

Scan table customer and find customer_id for Customer#1 and Customer#2.
For each customer_id, compute the October totals, for this specific customer.

The EXPLAIN output will look like so:

Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

Note the "lateral": 1 member.

Controlling the Optimization

Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.

If you need to disable the optimization, it has an flag. It can be disabled like so:

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, .

For example, by default, this table and query makes use of the optimization:

CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;

INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');

INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;

ANALYZE TABLE t1;

EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

References

Jira task:
Commit:

_{This page is licensed: CC BY-SA / Gnu FDL}

Optimizations for Derived Tables

Condition Pushdown into Derived Table Optimization

Introduction to Condition Pushdown

Controlling the Optimization

See Also

Derived Table Merge Optimization

Background

Derived table merge in action

Factsheet

See Also

Derived Table with Key Optimization

The idea

Example

Factsheet

See Also

Lateral Derived Optimization

Description

Controlling the Optimization

References

Split Materialized Optimization

Condition Pushdown into Derived Table Optimization

Introduction to Condition Pushdown

Controlling the Optimization

See Also

Derived Table Merge Optimization

Background

Derived table merge in action

Factsheet

See Also

Optimizations for Derived Tables

Split Materialized Optimization

Derived Table with Key Optimization

The idea

Example

Factsheet

See Also

Lateral Derived Optimization

Description

Controlling the Optimization

References