All pages
Powered by GitBook
1 of 6

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Condition Pushdown into Derived Table Optimization

If a query uses a derived table (or a view), the first action that the query optimizer will attempt is to apply the derived-table-merge-optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as the top-level operation. If it has a GROUP-BY, DISTINCT, or uses window functions, then derived-table-merge-optimization is not applicable.

In that case, the Condition Pushdown optimization is applicable.

Introduction to Condition Pushdown

Consider an example

The naive way to execute the above is to

  1. Compute the OCT_TOTALS contents (for all customers).

  2. The, select the line with customer_id=1

This is obviously inefficient, if there are 1000 customers, then one will be doing up to 1000 times more work than necessary.

However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.

Inside the OCT_\TOTALS, the added condition is put into its HAVING clause, so we end up with:

Then, parts of HAVING clause that refer to GROUP BY columns are moved into the WHERE clause:

Once a restriction like customer_id=1 is in the WHERE, the query optimizer can use it to construct efficient table access paths.

Controlling the Optimization

The optimization is enabled by default. One can disable it by setting the flag condition_pushdown_for_derived to OFF.

The pushdown from HAVING to WHERE part is controlled by condition_pushdown_from_having flag in .

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, .

No optimizer hint is available.

See Also

  • Condition Pushdown through Window Functions (since )

  • (since )

  • The Jira task for the feature is .

This page is licensed: CC BY-SA / Gnu FDL

CREATE VIEW OCT_TOTALS AS
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id;

SELECT * FROM OCT_TOTALS WHERE customer_id=1
optimizer_switch
optimizer_switch
DERIVED_CONDITION_PUSHDOWN and NO_DERIVED_CONDITION_PUSHDOWN
Condition Pushdown into IN Subqueries
MDEV-9197
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id
HAVING
  customer_id=1 
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
  order_date BETWEEN '2017-10-01' AND '2017-10-31' AND
  customer_id=1
GROUP BY customer_id

Derived Table Merge Optimization

Background

Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with population greater than 10,000 people, and then that from these cities to select those that are located in Germany, one could write this SQL:

For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

It plans to do the following actions:

derived-inefficent

From left to right:

  1. Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.

  2. Put result of the subquery into a temporary table.

  3. Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Executing a subquery like this is very inefficient, because the highly-selective condition from the parent select, (Country='DEU') is not used when scanning the base table City. We read too many records from theCity table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.

Derived table merge in action

If one runs this query in MariaDB/MySQL 5.6, they get this:

From the above, one can see that:

  1. The output has only one line. This means that the subquery has been merged into the top-level SELECT.

  2. Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

Factsheet

  • Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.

  • The optimization is enabled by default. It can be disabled with:

  • Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem (see e.g. ) of EXPLAIN statements taking a very long time. Starting from + and MySQL 5.6+ EXPLAIN commands execute instantly, regardless of the derived_merge setting.

See Also

  • FAQ entry:

This page is licensed: CC BY-SA / Gnu FDL

SELECT * 
FROM 
  (SELECT * FROM City WHERE Population > 10*1000) AS big_city
WHERE 
  big_city.Country='DEU'
mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
  AS big_city WHERE big_city.Country='DEU' ;
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
|  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.60 sec)

Optimizations for Derived Tables

Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.

The query will read about 90 rows, which is a big improvement over the 4079 row reads plus 4068 temporary table reads/writes we had before.
MySQL Bug #44802

Split Materialized Optimization

This is another name for Lateral Derived Optimization.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB [world]> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
  AS big_city WHERE big_city.Country='DEU';
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys      | key     | key_len | ref   | rows | Extra                              |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
|  1 | SIMPLE      | City  | ref  | Population,Country | Country | 3       | const |   90 | Using index condition; Using where |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)
SET @@optimizer_switch='derived_merge=OFF'

Derived Table with Key Optimization

The idea

If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and then parent select will treat it as a regular base table.

Before /MySQL 5.6, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.

Example

Consider a query: we want to find countries in Europe, that have more than one million people living in cities. This is accomplished with this query:

The EXPLAIN output for it will show:

One can see here that

  • table <derived2> is accessed through key0.

  • ref column shows world.Country.Code

  • if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country

Factsheet

  • The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.

  • The optimization is applied then the derived table could not be merged into its parent SELECT

    • which happens when the derived table doesn't meet criteria for mergeable VIEW

  • The optimization is ON by default; it can be switched off like so:

See Also

  • in MySQL 5.6 manual

This page is licensed: CC BY-SA / Gnu FDL

.
Optimizing Subqueries in the FROM Clause
Subquery Optimizations Map
SELECT * 
FROM
   Country, 
   (SELECT 
       SUM(City.Population) AS urban_population, 
       City.Country 
    FROM City 
    GROUP BY City.Country 
    HAVING 
    urban_population > 1*1000*1000
   ) AS cities_in_country
WHERE 
  Country.Code=cities_in_country.Country AND Country.Continent='Europe';
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
| id | select_type | table      | type | possible_keys     | key       | key_len | ref                | rows | Extra                           |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
|  1 | PRIMARY     | Country    | ref  | PRIMARY,continent | continent | 17      | const              |   60 | Using index condition           |
|  1 | PRIMARY     | <derived2> | ref  | key0              | key0      | 3       | world.Country.Code |   17 |                                 |
|  2 | DERIVED     | City       | ALL  | NULL              | NULL      | NULL    | NULL               | 4079 | Using temporary; Using filesort |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
SET optimizer_switch='derived_with_keys=off'

Lateral Derived Optimization

MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

Description

The optimization's use case is

  • The query uses a derived table (or a VIEW, or a non-recursive CTE)

  • The derived table/View/CTE has a GROUP BY operation as its top-level operation

  • The query only needs data from a few GROUP BY groups

An example of this: consider a VIEW that computes totals for each customer in October:

And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:

Before Lateral Derived optimization, MariaDB would execute the query as follows:

  1. Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.

  2. Join it with table customer.

The EXPLAIN would look like so:

It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)

Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.

The query plan will look as follows:

  1. Scan table customer and find customer_id for Customer#1 and Customer#2.

  2. For each customer_id, compute the October totals, for this specific customer.

The EXPLAIN output will look like so:

Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

Note the "lateral": 1 member.

Controlling the Optimization

Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.

If you need to disable the optimization, it has an flag. It can be disabled like so:

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, .

For example, by default, this table and query makes use of the optimization:

CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;

INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');

INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;

ANALYZE TABLE t1;

EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

References

  • Jira task:

  • Commit:

This page is licensed: CC BY-SA / Gnu FDL

The optimization can be disabled as follows:

EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

No optimizer hint is available.

optimizer_switch
SPLIT_MATERLIZED or NO_SPLIT_MATERIALIZED
MDEV-13369
b14e2b044b
CREATE VIEW OCT_TOTALS AS
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
  order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY
  customer_id;
SELECT *
FROM
  customer, OCT_TOTALS
WHERE
  customer.customer_id=OCT_TOTALS.customer_id AND
  customer.customer_name IN ('Customer#1', 'Customer#2')
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key       | key_len | ref                       | rows  | Extra                    |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
|    1 | PRIMARY     | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2     | Using where; Using index |
|    1 | PRIMARY     | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 36    |                          |
|    2 | DERIVED     | orders     | index | NULL          | o_cust_id | 4       | NULL                      | 36738 | Using where              |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| id   | select_type     | table      | type  | possible_keys | key       | key_len | ref                       | rows | Extra                    |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
|    1 | PRIMARY         | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2    | Using where; Using index |
|    1 | PRIMARY         | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 2    |                          |
|    2 | LATERAL DERIVED | orders     | ref   | o_cust_id     | o_cust_id | 4       | test.customer.customer_id | 1    | Using where              |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
...
        "table": {
          "table_name": "<derived2>",
          "access_type": "ref",
...
          "materialized": {
            "lateral": 1,
SET optimizer_switch='split_materialized=off'
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ref
possible_keys: c1,n1_c1_n2
          key: c1
      key_len: 1
          ref: const
         rows: 2
        Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: key0
          key: key0
      key_len: 8
          ref: test.t1.n1,test.t1.n2
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: LATERAL DERIVED
        table: t1
         type: ref
possible_keys: c1,n1_c1_n2
          key: n1_c1_n2
      key_len: 4
          ref: test.t1.n1
         rows: 1
        Extra: Using where; Using index
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ref
possible_keys: c1,n1_c1_n2
          key: c1
      key_len: 1
          ref: const
         rows: 2
        Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: key0
          key: key0
      key_len: 8
          ref: test.t1.n1,test.t1.n2
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: t1
         type: ref
possible_keys: c1
          key: c1
      key_len: 1
          ref: const
         rows: 2
        Extra: Using index condition; Using where; Using temporary; Using filesort
Why is ORDER BY in a FROM subquery ignored?
MariaDB 10.3
MariaDB 10.4
MariaDB 5.3
MariaDB 5.3
What is MariaDB 5.3