optimizer_join_limit_pref_ratio optimization

You are viewing an old version of this article. View the current version here.

Basics

This optimization makes the join optimizer to specifically consider a join order that can short-cut its execution based on ORDER BY ... LIMIT n clause. For small values of n, using such join order can cause speedups.

The optimization is not enabled by default. One needs to set optimizer_join_limit_pref_ratio to enable it. The default value of 0 means disable, suggested value when enabling is 100.

Detailed description

Problem setting

MariaDB optimizer picks the join order without taking into account the possibility to short-cut join execution due to ORDER BY ... LIMIT.

For example, consider a query looking at latest 10 orders together with customers who made them:

select *
from
  customer,order
where
  customer.name=order.customer_name
order by
  order.date desc
limit 10

The two possible plans are: customer->orders:

+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
| id   | select_type | table    | type | possible_keys | key           | key_len | ref           | rows | Extra                                        |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
|    1 | SIMPLE      | customer | ALL  | name          | NULL          | NULL    | NULL          | 9623 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | orders   | ref  | customer_name | customer_name | 103     | customer.name | 1    |                                              |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+

and orders->customer:

+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
| id   | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows | Extra       |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
|    1 | SIMPLE      | orders   | index | customer_name | order_date | 4       | NULL                 | 10   | Using where |
|    1 | SIMPLE      | customer | ref   | name          | name       | 103     | orders.customer_name | 1    |             |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+

The customer->orders plan computes a join between all customers and orders, saves its result into a temporary table and then uses filesort to get the 10 most recent orders. It doesn't benefit much from the fact that just 10 orders are needed.

The orders->customers plan uses an index to read rows in the ORDER BY order. It can stop execution as soon as it has found 10 order-and-customer combinations. This is much faster than computing the entire join. Let's call this *LIMIT short-cutting*.

Does MariaDB optimizer take *LIMIT short-cutting* into account when choosing which join order to use? No.

(It does take Using temporary into account which increases the cost of customer->orders plan but not the LIMIT short-cutting).

Plans with LIMIT short-cut are difficult to estimate

The reason is that it is although LIMIT short cutting is beneficial, it is fundamentally difficult to produce a reliable estimate for it.

Let's take example from the previous section but search only for orders that were shipped by air:

select *
from
  customer,order
where
  customer.name=order.customer_name 
  and order.shipping_method='Airplane'
order by
  order.date desc
limit 10

Suppose, we know that 50% of orders are shipped by air. Assuming there's no correlation between date and shipping method, orders->customer plan will need to scan 20 orders before we find 10 that are shipped by air. But if there is correlation, we may need to scan up to (total_orders*0.5 + 10) before we find first 10 orders that are shipped by air. Scanning about 50% of all orders can be much more expensive.

The situation gets worse if the query has constructs whose selectivity is not known. Suppose the WHERE condition was

<<sql>> order.shipping_method='%Airplane%' <<sql>>

in this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or we will need to enumerate all rows before we find #LIMIT matches.

2. Optimization is user-controlled

Due to these inherent risks described above, the optimization is not enabled by default.

However if one is running mostly OLTP workload so that their WHERE conditions have suitable indexes or not very selective, then ORDER BY LIMIT queries will typically find matching rows quickly. In this case it makes sense to give the following guidance to the optimizer:

<<quote>>
Consider the query plan using LIMIT short-cutting and prefer it if it promises X times speedup.
<</quote>>

The value of X is given to the optimizer via optimizer_join_limit_pref_ratio setting. Higher values carry less risk. The recommended value is 100 (prefer LIMIT join order if it promises at least 100x speedup).

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.