Subquery Optimizations

Optimize subqueries in MariaDB Server for improved performance. This section provides techniques and best practices to ensure your nested queries execute efficiently and enhance overall query speed.

Condition Pushdown Into IN subqueries

This article describes Condition Pushdown into IN subqueries as implemented in MDEV-12387.

optimizer_switch flag name: condition_pushdown_for_subquery.
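As a minimal sketch of how the flag might be used, the statements below turn the optimization off and back on for the current session. They assume a server version that has the condition_pushdown_for_subquery flag; a server without it rejects the SET statements.

```sql
-- Show the current optimizer_switch value; the flag is listed only on
-- servers that implement this optimization.
SELECT @@optimizer_switch;

-- Disable condition pushdown into IN subqueries for this session.
SET SESSION optimizer_switch='condition_pushdown_for_subquery=off';

-- Re-enable it.
SET SESSION optimizer_switch='condition_pushdown_for_subquery=on';
```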

This page is licensed: CC BY-SA / Gnu FDL

Conversion of Big IN Predicates Into Subqueries

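In MariaDB versions that include this optimization, the optimizer converts certain big IN predicates into IN subqueries.

That is, an IN predicate of the form

column [NOT] IN (const1, const2, ...)

is converted into an equivalent IN subquery:

column [NOT] IN (SELECT ... FROM temporary_table)

which opens new opportunities for the query optimizer.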

The conversion happens if the following conditions are met:

the IN list has more than 1000 elements (the threshold can be changed through the in_predicate_conversion_threshold variable);
the [NOT] IN condition is at the top level of the WHERE/ON clause.
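To make the shape of the rewrite concrete, here is a hand-written equivalent. The tables t1 and in_list are hypothetical and do not come from the original article. The optimizer performs the conversion internally; this sketch only shows that the two forms select the same rows.

```sql
-- Hypothetical table with an index on the filtered column.
CREATE TABLE t1 (a INT, b INT, KEY (a));
INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);

-- Original form: an IN predicate over a list of constants.
SELECT * FROM t1 WHERE a IN (1, 3, 5);

-- Hand-written equivalent: the constants live in a table and the
-- predicate becomes an IN subquery over that table.
CREATE TEMPORARY TABLE in_list (v INT);
INSERT INTO in_list VALUES (1), (3), (5);
SELECT * FROM t1 WHERE a IN (SELECT v FROM in_list);
```

Once the predicate is a subquery, it can be handled by the IN-subquery optimizations described elsewhere in this section instead of by range analysis.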

Controlling the Optimization

The optimization is on by default. It is controlled by the in_predicate_conversion_threshold variable, which was first available only in debug builds and later in release builds as well. Set it to 0 to disable the optimization.
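A minimal sketch of working with the threshold, assuming a server build that exposes in_predicate_conversion_threshold. The table t_demo and the value 3 are illustrative only; the exact EXPLAIN output depends on the server version, so none is shown.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE t_demo (a INT, KEY (a));
INSERT INTO t_demo VALUES (1), (2), (3), (4), (5), (6), (7), (8);

-- Check the current threshold.
SELECT @@in_predicate_conversion_threshold;

-- Lower the threshold for this session so that a five-element list is
-- long enough to qualify, then compare the plan with the one below.
SET SESSION in_predicate_conversion_threshold = 3;
EXPLAIN SELECT * FROM t_demo WHERE a IN (1, 2, 3, 4, 5);

-- 0 disables the conversion; the same query is then planned as a
-- regular IN-list condition.
SET SESSION in_predicate_conversion_threshold = 0;
EXPLAIN SELECT * FROM t_demo WHERE a IN (1, 2, 3, 4, 5);

-- Restore the server default.
SET SESSION in_predicate_conversion_threshold = DEFAULT;
```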

Benefits of the Optimization

If column is a key prefix, the MariaDB optimizer processes the condition

column [NOT] IN (const1, const2, ...)

by trying to construct a range access. If the list is large, the analysis may take a lot of memory and CPU time. The problem gets worse when column is part of a multi-column index and the query has conditions on other parts of the index.

Conversion of IN predicates into subqueries bypasses the range analysis, which means the query optimization phase uses less CPU and memory.

Possible disadvantages of the conversion are:

The optimization may convert 'IN LIST elements' key accesses to a table scan (if there is no other usable index for the table)
The estimates for the number of rows matching the IN (...) are less precise.

EXISTS-to-IN Optimization

MySQL (including MySQL 5.6) has only one execution strategy for EXISTS subqueries. The strategy is essentially the straightforward, "naive" execution, without any rewrites.

MariaDB 5.3 introduced a rich set of optimizations for IN subqueries. Since then, it makes sense to convert an EXISTS subquery into an IN so that the new optimizations can be used.

EXISTS will be converted into IN in two cases:

Trivially correlated EXISTS subqueries
Semi-join EXISTS

We will now describe these two cases in detail.

Trivially-correlated EXISTS subqueries

Often, an EXISTS subquery is correlated, but the correlation is trivial. The subquery has the form

EXISTS (SELECT ... FROM ... WHERE outer_col = inner_col AND inner_where)

and "outer_col" is the only place where the subquery refers to outside fields. In this case, the subquery can be rewritten into an uncorrelated IN:

outer_col IN (SELECT inner_col FROM ... WHERE inner_where)

(NULL values require some special handling, see below). For uncorrelated IN subqueries, MariaDB is able to make a cost-based choice between two execution strategies:

Materialization
IN-to-EXISTS (basically, convert back into EXISTS)

That is, converting a trivially-correlated EXISTS into an uncorrelated IN gives the query optimizer the option to use the Materialization strategy for the subquery.

Currently, EXISTS->IN conversion works only for subqueries that are at the top level of the WHERE clause, or are under a NOT operation which is directly at the top level of the WHERE clause.
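The rewrite can be illustrated by hand. The tables customers and orders below are hypothetical; all columns are NOT NULL so that the NULL handling discussed later does not come into play.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(50) NOT NULL);
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  total DECIMAL(10,2) NOT NULL
);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 250.00), (11, 1, 40.00), (12, 3, 500.00);

-- Trivially correlated EXISTS: c.id is the only outer reference,
-- and it appears in an inner = outer equality.
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o
              WHERE o.customer_id = c.id AND o.total > 100);

-- Hand-written uncorrelated IN that selects the same customers.
SELECT c.name
FROM customers c
WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.total > 100);
```

Both queries return Ann and Cy. The second form no longer refers to the outer query from inside the subquery, which is what makes Materialization applicable.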

Semi-join EXISTS subqueries

If an EXISTS subquery is an AND-part of the WHERE clause:

SELECT ... FROM outer_tables WHERE EXISTS (SELECT ...) AND ...

then it satisfies the main property of semi-join subqueries:

with a semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery.

The semi-join optimizer offers a rich set of execution strategies for both correlated and uncorrelated subqueries. The set includes the FirstMatch strategy, which is equivalent to how EXISTS subqueries are executed, so we do not lose any opportunities when converting an EXISTS subquery into a semi-join.

In theory, it makes sense to convert all kinds of EXISTS subqueries: both correlated and uncorrelated ones, irrespective of whether the subquery has an inner=outer equality.

In practice, the subquery will be converted only if it has inner=outer equality. Both correlated and uncorrelated subqueries are converted.

Handling of NULL values

IN has complicated NULL semantics; NOT EXISTS doesn't.
When required, EXISTS-to-IN adds an IS NOT NULL check before the subquery predicate.
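The difference in NULL semantics can be seen with a small hand-written example. The tables t1 and t2 are hypothetical, and the guarded query at the end is only an illustration of why IS NOT NULL checks are needed; it is not the text the optimizer generates.

```sql
-- Hypothetical tables with NULLs on both sides.
CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);
INSERT INTO t1 VALUES (1), (2), (NULL);
INSERT INTO t2 VALUES (1), (NULL);

-- NOT EXISTS is two-valued: a row of t1 is kept whenever no row of t2
-- satisfies t2.b = t1.a. Returns the rows a = 2 and a = NULL.
SELECT a FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

-- A naive NOT IN is three-valued: the NULL in t2 and the NULL in t1
-- make the predicate evaluate to NULL, so no rows are returned.
SELECT a FROM t1
WHERE a NOT IN (SELECT b FROM t2);

-- Guarding both sides with IS NOT NULL restores the NOT EXISTS result
-- (a = 2 and a = NULL).
SELECT a FROM t1
WHERE NOT (a IS NOT NULL
           AND a IN (SELECT b FROM t2 WHERE b IS NOT NULL));
```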

Control

The optimization is controlled by the exists_to_in flag in optimizer_switch. It was OFF by default in older versions and has been ON by default in later ones.
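A minimal sketch of toggling the flag for the current session, assuming a server version that has exists_to_in:

```sql
-- Check whether the flag is currently on (returns 1 if it is).
SELECT @@optimizer_switch LIKE '%exists_to_in=on%' AS exists_to_in_enabled;

-- Disable the EXISTS-to-IN conversion for this session.
SET SESSION optimizer_switch='exists_to_in=off';

-- Enable it again.
SET SESSION optimizer_switch='exists_to_in=on';
```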

Limitations

EXISTS-to-IN doesn't handle:

subqueries that have GROUP BY, aggregate functions, or a HAVING clause
subqueries that are UNIONs
a number of degenerate edge cases

Non-semi-join Subquery Optimizations

Certain kinds of IN subqueries cannot be flattened into semi-joins. These subqueries can be either correlated or non-correlated. In order to provide consistent performance in all cases, MariaDB provides several alternative strategies for these types of subqueries. Whenever several strategies are possible, the optimizer chooses the optimal one based on cost estimates.

The two primary non-semi-join strategies are materialization (also called outside-in materialization) and the in-to-exists transformation. Materialization is applicable only to non-correlated subqueries, while in-to-exists can be used for both correlated and non-correlated subqueries.

Applicability

An IN subquery cannot be flattened into a semi-join in the following cases:

Negated subquery predicate (NOT IN)
Subquery in the SELECT or HAVING clause
Subquery with a UNION

The examples below use the World database from the MariaDB regression test suite.
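As a sketch of what such queries look like, the block below shows a negated predicate, a subquery in the SELECT list, and a subquery containing a UNION. The table definitions are a cut-down, hypothetical stand-in for the World database, and the rows and the specific queries are illustrative rather than taken from the original article.

```sql
-- Simplified stand-ins for the World tables (run in a scratch database).
CREATE TABLE Country (
  Code CHAR(3) PRIMARY KEY,
  Name VARCHAR(52) NOT NULL,
  Continent VARCHAR(20) NOT NULL,
  Population INT NOT NULL
);
CREATE TABLE City (
  ID INT PRIMARY KEY,
  Name VARCHAR(35) NOT NULL,
  Country CHAR(3) NOT NULL,
  Population INT NOT NULL,
  KEY (Country)
);
INSERT INTO Country VALUES
  ('DEU', 'Germany', 'Europe', 82000000),
  ('ISL', 'Iceland', 'Europe', 280000);
INSERT INTO City VALUES
  (1, 'Berlin', 'DEU', 3400000),
  (2, 'Reykjavik', 'ISL', 110000);

-- 1. Negated subquery predicate.
SELECT Country.Name
FROM Country
WHERE Country.Code NOT IN (SELECT City.Country FROM City
                           WHERE City.Population > 1000000);

-- 2. Subquery in the SELECT list.
SELECT Country.Name,
       Country.Code IN (SELECT City.Country FROM City
                        WHERE City.Population > 1000000) AS has_big_city
FROM Country;

-- 3. Subquery with a UNION.
SELECT Country.Name
FROM Country
WHERE Country.Code IN (SELECT City.Country FROM City
                       WHERE City.Population > 1000000
                       UNION
                       SELECT City.Country FROM City
                       WHERE City.Name LIKE 'R%');
```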

Optimizing GROUP BY and DISTINCT Clauses in Subqueries

A DISTINCT clause and a GROUP BY without a corresponding HAVING clause have no meaning in IN/ALL/ANY/SOME/EXISTS subqueries. The reason is that IN/ALL/ANY/SOME/EXISTS only check whether an outer row satisfies some condition with respect to all or any rows in the subquery result. Therefore it doesn't matter whether the subquery has duplicate result rows or not: if some condition is true for some row of the subquery, it is also true for all duplicates of that row. Notice that GROUP BY without a corresponding HAVING clause is equivalent to DISTINCT.

MariaDB versions that include this optimization automatically remove DISTINCT and GROUP BY without HAVING if these clauses appear in an IN/ALL/ANY/SOME/EXISTS subquery. For instance:

SELECT * FROM t1
WHERE t1.a > ALL(SELECT DISTINCT b FROM t2 WHERE t2.c > 100)

is transformed to:

SELECT * FROM t1
WHERE t1.a > ALL(SELECT b FROM t2 WHERE t2.c > 100)

Removing these unnecessary clauses allows the optimizer to find more efficient query plans because it doesn't need to take care of post-processing the subquery result to satisfy DISTINCT / GROUP BY.

Semi-join Subquery Optimizations

MariaDB has a set of optimizations specifically targeted at semi-join subqueries.

What is a Semi-Join Subquery

A semi-join subquery has the form

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

that is, the subquery is an IN subquery and it is located in the WHERE clause. The most important part here is that

with a semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery.

Let's see why this is important. Consider a semi-join subquery that selects the countries in Europe that have at least one city with a large population.

One can execute it "naturally", by starting from countries in Europe and checking if they have populous cities.

The semi-join property also allows "backwards" execution: we can start from big cities, and check which countries they are in.

To contrast, let's change the query to be non-semi-join, so that a country also qualifies when it has a large surface area, whether or not it has a big city.

It is still possible to start from countries, and then check

if a country has any big cities
if it has a large surface area.

The opposite, city-to-country way is not possible, because a country that qualifies only through its surface area is never reached from a city. This is not a semi-join.
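The two queries discussed above might look like the following sketch. The table definitions are a simplified, hypothetical stand-in for the World database; the rows, the one-million population limit, and the surface-area limit of 100*1000 are assumptions chosen for illustration.

```sql
-- Simplified stand-ins for the World tables (run in a scratch database).
CREATE TABLE Country (
  Code CHAR(3) PRIMARY KEY,
  Name VARCHAR(52) NOT NULL,
  Continent VARCHAR(20) NOT NULL,
  SurfaceArea INT NOT NULL
);
CREATE TABLE City (
  ID INT PRIMARY KEY,
  Name VARCHAR(35) NOT NULL,
  Country CHAR(3) NOT NULL,
  Population INT NOT NULL,
  KEY (Country)
);
INSERT INTO Country VALUES
  ('DEU', 'Germany', 'Europe', 357022),
  ('ISL', 'Iceland', 'Europe', 103000),
  ('JPN', 'Japan', 'Asia', 377829);
INSERT INTO City VALUES
  (1, 'Berlin', 'DEU', 3400000),
  (2, 'Reykjavik', 'ISL', 110000),
  (3, 'Tokyo', 'JPN', 8000000);

-- Semi-join: the IN subquery is an AND-part of the WHERE clause.
-- With the rows above it returns Germany.
SELECT Country.Name
FROM Country
WHERE Country.Continent = 'Europe'
  AND Country.Code IN (SELECT City.Country FROM City
                       WHERE City.Population > 1*1000*1000);

-- Not a semi-join: the IN subquery sits under an OR.
-- With the rows above it returns Germany and Iceland.
SELECT Country.Name
FROM Country
WHERE Country.Continent = 'Europe'
  AND (Country.Code IN (SELECT City.Country FROM City
                        WHERE City.Population > 1*1000*1000)
       OR Country.SurfaceArea > 100*1000);
```

Iceland qualifies in the second query only through the surface-area condition, so an execution that starts from big cities would miss it.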

Difference from Inner Joins

Semi-join operations are similar to regular relational joins. There is a difference though: with semi-joins, you don't care how many matches an inner table has for an outer row. In the above countries-with-big-cities example, Germany will be returned once, even if it has three cities with populations of more than one million each.
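The difference can be checked with a small example. The tables are a simplified, hypothetical stand-in for the World database, and the city populations are illustrative values above one million.

```sql
-- Simplified stand-ins for the World tables (run in a scratch database).
CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Name VARCHAR(52) NOT NULL);
CREATE TABLE City (
  ID INT PRIMARY KEY,
  Name VARCHAR(35) NOT NULL,
  Country CHAR(3) NOT NULL,
  Population INT NOT NULL
);
INSERT INTO Country VALUES ('DEU', 'Germany');
INSERT INTO City VALUES
  (1, 'Berlin', 'DEU', 3400000),
  (2, 'Hamburg', 'DEU', 1700000),
  (3, 'Munich', 'DEU', 1200000);

-- Semi-join: Germany is returned once, however many cities match.
SELECT Country.Name
FROM Country
WHERE Country.Code IN (SELECT City.Country FROM City
                       WHERE City.Population > 1000000);

-- Inner join: Germany is returned three times, once per matching city.
SELECT Country.Name
FROM Country
JOIN City ON City.Country = Country.Code
WHERE City.Population > 1000000;
```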

Semi-Join Optimizations in MariaDB

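MariaDB uses semi-join optimizations to run IN subqueries. The optimizations are enabled by default. You can disable them by turning off their optimizer_switch flag like so:

SET optimizer_switch='semijoin=off';

MariaDB has five different semi-join execution strategies:

Table pullout
FirstMatch
Semi-join Materialization
LooseScan
DuplicateWeedout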

Subquery Cache

The goal of the subquery cache is to optimize the evaluation of correlated subqueries by storing results together with correlation parameters in a cache and avoiding re-execution of the subquery in cases where the result is already in the cache.

Administration

The cache is on by default. One can switch it off using the optimizer_switch subquery_cache setting, like so:

SET optimizer_switch='subquery_cache=off';

The efficiency of the subquery cache is visible in two status variables:

- Subquery_cache_hit: Global counter for all subquery cache hits.
- Subquery_cache_miss: Global counter for all subquery cache misses.

The session variables tmp_table_size and max_heap_table_size influence the size of the in-memory temporary table used for caching. It cannot grow larger than the minimum of these two values (see the Implementation section for details).
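One way to inspect these values is sketched below. It assumes the status counters are named Subquery_cache_hit and Subquery_cache_miss and that the server exposes information_schema.GLOBAL_STATUS; the hit-rate calculation is an illustration, not something the server reports itself.

```sql
-- Raw counters.
SHOW GLOBAL STATUS LIKE 'Subquery_cache%';

-- Overall hit rate since server start (NULL if the cache was never used).
SELECT h.VARIABLE_VALUE AS hits,
       m.VARIABLE_VALUE AS misses,
       h.VARIABLE_VALUE /
         NULLIF(h.VARIABLE_VALUE + m.VARIABLE_VALUE, 0) AS hit_rate
FROM information_schema.GLOBAL_STATUS h
CROSS JOIN information_schema.GLOBAL_STATUS m
WHERE h.VARIABLE_NAME = 'SUBQUERY_CACHE_HIT'
  AND m.VARIABLE_NAME = 'SUBQUERY_CACHE_MISS';

-- Upper bound on the in-memory size of one cache table in this session.
SELECT LEAST(@@tmp_table_size, @@max_heap_table_size) AS cache_size_limit;
```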

Visibility

Your usage of the cache is visible in EXPLAIN EXTENDED output (warnings) as "<expr_cache><list of parameters>(cached expression)". For example, when the cached expression depends on column a of table t1 in database test, the warning contains "<expr_cache><test.t1.a>(...)". The presence of that marker is how you know you are using the subquery cache.
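A way to look for the marker yourself is sketched below. The tables and the query are hypothetical; whether the marker appears, and what the rest of the warning text looks like, depends on the server version and on the plan the optimizer picks.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (c INT, d INT);
INSERT INTO t1 VALUES (1, 1), (1, 2), (2, 3), (2, 4);
INSERT INTO t2 VALUES (1, 10), (2, 20), (3, 30);

-- A correlated scalar subquery whose only parameter is t1.a.
EXPLAIN EXTENDED
SELECT a, (SELECT MAX(d) FROM t2 WHERE t2.c = t1.a) AS max_d
FROM t1;

-- The rewritten query is shown as a warning; look for <expr_cache> in it.
SHOW WARNINGS;
```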

Implementation

Every subquery cache creates a temporary table where the results and all parameters are stored. It has a unique index over all parameters. First the cache is created in a MEMORY table (if doing this is impossible, the cache becomes disabled for that expression). When the table grows up to the minimum of tmp_table_size and max_heap_table_size, the hit rate is checked:

if the hit rate is really small (<0.2), the cache is disabled;
if the hit rate is moderate (<0.7), the table is cleaned (all records deleted) to keep the table in memory;
if the hit rate is high, the table is converted to a disk table (for 5.3.0 it can only be converted to a disk table).
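The decision rule above can be written out as a CASE expression. This is only a restatement of the thresholds in the list, applied to three sample hit rates; it is not server code, and the action labels are made up for illustration.

```sql
-- Apply the documented thresholds to a few sample hit rates.
SELECT hit_rate,
       CASE
         WHEN hit_rate < 0.2 THEN 'disable the cache'
         WHEN hit_rate < 0.7 THEN 'delete all records, stay in memory'
         ELSE 'convert to a disk table'
       END AS action
FROM (SELECT 0.1 AS hit_rate
      UNION ALL SELECT 0.5
      UNION ALL SELECT 0.9) AS samples;
```

With these samples, 0.1 maps to disabling the cache, 0.5 to clearing the table, and 0.9 to converting it to a disk table.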

Performance Impact

Here are some examples that show the performance impact of the subquery cache (these tests were made on a 2.53 GHz Intel Core 2 Duo MacBook Pro with dbt-3 scale 1 data set).

The results for each example are reported in the following columns: example, cache on, cache off, gain, hit, miss, hit rate.

Example 1

Dataset from the DBT-3 benchmark, a query to find customers with a balance near the top in their nation.
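A query of this kind might look like the sketch below. The exact benchmark query is not given here, so the 0.8 factor and the sample rows are assumptions; the cut-down table keeps only three customer columns, named as in DBT-3 (c_custkey, c_nationkey, c_acctbal).

```sql
-- Cut-down, hypothetical version of the DBT-3 customer table.
CREATE TABLE customer (
  c_custkey INT PRIMARY KEY,
  c_nationkey INT NOT NULL,
  c_acctbal DECIMAL(15,2) NOT NULL,
  KEY (c_nationkey)
);
INSERT INTO customer VALUES
  (1, 1, 9000.00), (2, 1, 8000.00), (3, 1, 1000.00),
  (4, 2, 500.00), (5, 2, 450.00), (6, 2, 100.00);

-- The subquery is correlated only through c_nationkey. Many customers
-- share the same nation, so the same parameter value repeats often and
-- the cached result can be reused instead of re-running the subquery.
SELECT COUNT(*)
FROM customer
WHERE c_acctbal > 0.8 * (SELECT MAX(c2.c_acctbal)
                         FROM customer c2
                         WHERE c2.c_nationkey = customer.c_nationkey);
```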

Example 2

DBT-3 benchmark, Query #17

Example 3

DBT-3 benchmark, Query #2

Example 4

DBT-3 benchmark, Query #20

Subquery Optimizations Map

Below is a map showing all types of subqueries allowed in the SQL language, and the optimizer strategies available to handle them.

Uncolored areas represent different kinds of subqueries, for example:
- Subqueries that have the form x IN (SELECT ...)
- Subqueries that are in the FROM clause

Table Pullout Optimization

Table pullout is an optimization for semi-join subqueries.

The idea of Table Pullout

Sometimes, a subquery can be rewritten as a join. For example:

SELECT *
FROM City
WHERE City.Country IN (SELECT Country.Code
                       FROM Country
                       WHERE Country.Population < 100*1000);

If we know that there can be, at most, one country with a given value of Country.Code (we can tell that if we see that table Country has a primary key or unique index over that column), we can rewrite this query as:

SELECT City.*
FROM
  City, Country
WHERE
  City.Country=Country.Code AND Country.Population < 100*1000;

Table pullout in action

If one runs EXPLAIN for the above query in MySQL 5.1-5.6 or in MariaDB up to 5.2, the plan shows that the optimizer is going to do a full scan on table City, and for each city it will do a lookup in table Country.

If one runs the same query in a MariaDB version that has table pullout, the plan is different. The interesting parts are:

Both tables have select_type=PRIMARY, and id=1 as if they were in one join.
The Country table is first, followed by the City table.

Indeed, if one runs EXPLAIN EXTENDED; SHOW WARNINGS, one will see that the subquery is gone and has been replaced with a join.

Changing the subquery into a join allows feeding the join to the join optimizer, which can make a choice between two possible join orders:

City -> Country
Country -> City

as opposed to the single choice of

City -> Country

which we had before the optimization.

In the above example, the choice produces a better query plan. Without pullout, the query plan with a subquery would read (4079 + 1*4079) = 8158 table records. With table pullout, the join plan would read (37 + 37*18) = 703 rows. Not all row reads are equal, but generally, reading 10 times fewer table records is faster.

Table pullout fact sheet

Table pullout is possible only in semi-join subqueries.
Table pullout is based on UNIQUE/PRIMARY key definitions.
Doing table pullout does not cut off any possible query plans, so MariaDB will always try to pull out as much as possible.

Controlling table pullout

There is no separate @@optimizer_switch flag for table pullout. Table pullout can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.
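The table pullout rewrite can be tried on a small data set. The tables below are a simplified, hypothetical stand-in for the World database with made-up rows; the plans that EXPLAIN prints depend on the server version and table statistics, so none are shown.

```sql
-- Simplified stand-ins for the World tables (run in a scratch database).
-- The PRIMARY KEY on Country.Code is what makes pullout possible.
CREATE TABLE Country (
  Code CHAR(3) PRIMARY KEY,
  Name VARCHAR(52) NOT NULL,
  Population INT NOT NULL
);
CREATE TABLE City (
  ID INT PRIMARY KEY,
  Name VARCHAR(35) NOT NULL,
  Country CHAR(3) NOT NULL,
  Population INT NOT NULL,
  KEY (Country)
);
INSERT INTO Country VALUES
  ('AND', 'Andorra', 78000),
  ('DEU', 'Germany', 82000000);
INSERT INTO City VALUES
  (1, 'Andorra la Vella', 'AND', 21000),
  (2, 'Berlin', 'DEU', 3400000);

-- Subquery form and join form; with a unique Country.Code they return
-- the same City rows (here: Andorra la Vella).
SELECT City.*
FROM City
WHERE City.Country IN (SELECT Country.Code
                       FROM Country
                       WHERE Country.Population < 100*1000);

SELECT City.*
FROM City, Country
WHERE City.Country = Country.Code AND Country.Population < 100*1000;

-- Compare the plan with semi-join optimizations on and off.
EXPLAIN SELECT City.* FROM City
WHERE City.Country IN (SELECT Country.Code FROM Country
                       WHERE Country.Population < 100*1000);
SET SESSION optimizer_switch='semijoin=off';
EXPLAIN SELECT City.* FROM City
WHERE City.Country IN (SELECT Country.Code FROM Country
                       WHERE Country.Population < 100*1000);
SET SESSION optimizer_switch='semijoin=on';
```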
