EXISTS-to-IN optimization

You are viewing an old version of this article. View the current version here.

EXISTS-to-IN optimization

MySQL (including MySQL 5.6) has only one execution strategy for EXISTS subqueries. The strategy is essentially the straightforward, "naive" execution, without any rewrites. MariaDB 5.3+ will try to reduce the number of subquery re-executions with [[subquery predicate cache]].

MariaDB 5.3+ introduced a rich set of optimizations for IN subuqeries. Often, it makes sense to convert an EXISTS subquery into IN so that it can take advantage of all those optimizations.

EXISTS-to-IN will do the conversion in two cases: 1. Trivially correlated EXISTS subqueries 2. Semi-join EXISTS.

Trivially-correlated EXISTS

Often, EXISTS subuqery is correlated, but the correlation is trivial. The subquery has form

EXISTS (SELECT ... FROM ... WHERE outer_col= inner_col AND inner_where)

and "outer_col" is the only place where the subquery refers to outside fields. In this case, the subquery can be re-written into uncorrelated IN:

outer_col IN (SELECT ... FROM ... WHERE inner_where)

(we ignore special cases with NULLs, for now).

For uncorrelated IN subqueries, MariaDB will make a cost-based choice between two execution strategies: - IN-to-EXISTS (basically, convert back into EXISTS) - Materialization

Conclusion: trivially-correlated EXISTS is converted into IN so that we get an option to use Materialization.

Limitation

Currently, EXISTS->IN conversion works only for subqueries that are at top level of the WHERE clause, or are under NOT operation, which is directly at top level of the WHERE clause.

Semi-join EXISTS

SELECT ... FROM ... WHERE EXISTS (SELECT ...)

looks similar to semi-join subqueries. It satisfies the main semi-join property that rows from the outer select that have no matches inside the subquery do not get into the query output.

Semi-join optimizer offers a rich set of execution strategies for both correlated and uncorrelated subqueries. The set includes FirstMatch strategy which is an equivalent of how EXISTS suqueries are executed, so we do not lose any opportunities when converting an EXISTS subqery into a semi-join.

In theory: it makes sense to convert all kinds of EXISTS subqueries: convert both correlated and uncorrelated ones, convert irrespectively of whether the subquery has inner=outer equality.

In practice: the subquery will be converted only if it has inner=outer equality. Both correlated and uncorrelated subqueries are converted.

Handling of NULL values

TODO: rephrase this:

  • IN has complicated NULL-semantics. NOT EXISTS doesn't.
  • EXISTS-to-IN adds IS NOT NULL before the subquery predicate, when required

Control

The optimization is controlled by exists_to_in flag in @@optimizer_switch. Currently, the optimization is OFF by default.

Limitations

EXISTS-to-IN doesn't handle

  • subqueries that have GROUP BY, aggregate functions, or HAVING clause
  • subqueries are UNIONs
  • a number of degenerate edge cases

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.