optimizer_join_limit_pref_ratio optimization

Basics

Off (0) by default, when enabling this optimization, MariaDB will consider a join order that may shorten query execution time based on the ORDER BY ... LIMIT n clause. For small values of n, this may improve performance.

Set the value of optimizer_join_limit_pref_ratio to a non-zero value to enable this option (higher values are more conservative, recommended value is 100), or set to 0 (the default value) to disable it.

Detailed description

Problem setting

By default, the MariaDB optimizer picks a join order without considering the ORDER BY ... LIMIT clause, when present.

For example, consider a query looking at latest 10 orders together with customers who made them:

select *
from
  customer,order
where
  customer.name=order.customer_name
order by
  order.date desc
limit 10

The two possible plans are: customer->orders:

+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
| id   | select_type | table    | type | possible_keys | key           | key_len | ref           | rows | Extra                                        |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
|    1 | SIMPLE      | customer | ALL  | name          | NULL          | NULL    | NULL          | 9623 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | orders   | ref  | customer_name | customer_name | 103     | customer.name | 1    |                                              |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+

and orders->customer:

+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
| id   | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows | Extra       |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
|    1 | SIMPLE      | orders   | index | customer_name | order_date | 4       | NULL                 | 10   | Using where |
|    1 | SIMPLE      | customer | ref   | name          | name       | 103     | orders.customer_name | 1    |             |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+

The customer->orders plan computes a join between all customers and orders, saves that result into a temporary table, and then uses filesort to get the 10 most recent orders. This query plan doesn't benefit from the fact that just 10 orders are needed.

However, in contrast, the orders->customers plan uses an index to read rows in the ORDER BY order. The query can stop execution once it finds 10 order-and-customer combinations. This is much faster than computing the entire join. Under this plan, and when this new optimization, we can leverage ORDER BY ... LIMIT to stop early, when we have the 10 combinations.

Plans with LIMIT shortcuts are difficult to estimate

It is fundamentally difficult to produce a reliable estimate for ORDER BY ... LIMIT shortcuts. Let's take an example from the previous section to see why. This query searches for last 10 orders that were shipped by air:

select *
from
  customer,order
where
  customer.name=order.customer_name 
  and order.shipping_method='Airplane'
order by
  order.date desc
limit 10

Suppose we know beforehand that 50% of orders are shipped by air. Assuming there's no correlation between date and shipping method, orders->customer plan will need to scan 20 orders before we find 10 that are shipped by air. But if there is correlation, then we may need to scan up to (total_orders*0.5 + 10) before we find first 10 orders that are shipped by air. Scanning about 50% of all orders can be expensive.

This situation worsens when the query has constructs whose selectivity is not known. For example, suppose the WHERE condition was

  order.shipping_method='%Airplane%'

in this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or we will need to enumerate all rows before we find #LIMIT matches.

Providing guidelines to the optimizer

Due to these challenges, the optimization is not enabled by default.

When running a mostly OLTP workload such that query WHERE conditions have suitable indexes or are not very selective, then any ORDER BY ... LIMIT queries will typically find matching rows quickly. In this case, it makes sense to give the following guidance to the optimizer:

Do consider the query plan using LIMIT short-cutting 
and prefer it if it promises at least X times speedup.

The value of X is given to the optimizer via optimizer_join_limit_pref_ratio setting. Higher values carry less risk. The recommended value is 100: prefer the LIMIT join order if it promises at least 100x speedup.

References

  • MDEV-34720 introduces optimizer_join_limit_pref_ratio optimization
  • MDEV-18079 is about future development that would make the optimizer handle such cases without user guidance.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.