optimizer_join_limit_pref_ratio optimization
Contents
Basics
This optimization makes the join optimizer to specifically consider a join order that can short-cut its execution based on ORDER BY ... LIMIT n
clause. For small values of n
, using such join order can cause speedups.
The optimization is not enabled by default. One needs to set optimizer_join_limit_pref_ratio
to enable it. The default value of 0 means disable, suggested value when enabling is 100.
Detailed description
Problem setting
MariaDB optimizer picks the join order without taking into account the possibility to short-cut join execution due to ORDER BY ... LIMIT.
For example, consider a query looking at latest 10 orders together with customers who made them:
select * from customer,order where customer.name=order.customer_name order by order.date desc limit 10
The two possible plans are:
customer->orders
:
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | 1 | SIMPLE | customer | ALL | name | NULL | NULL | NULL | 9623 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | orders | ref | customer_name | customer_name | 103 | customer.name | 1 | | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
and orders->customer
:
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | 1 | SIMPLE | orders | index | customer_name | order_date | 4 | NULL | 10 | Using where | | 1 | SIMPLE | customer | ref | name | name | 103 | orders.customer_name | 1 | | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
The customer->orders
plan computes a join between all customers and orders, saves its result into a temporary table and then uses filesort to get the 10 most recent orders. It doesn't benefit much from the fact that just 10 orders are needed.
The orders->customers
plan uses an index to read rows in the ORDER BY order. It can stop execution as soon as it has found 10 order-and-customer
combinations. This is much faster than computing the entire join. Let's call this *LIMIT short-cutting*.
Does MariaDB optimizer take *LIMIT short-cutting* into account when choosing which join order to use? No.
(It does take Using temporary
into account which increases the cost of customer->orders
plan but not the LIMIT short-cutting).
Plans with LIMIT short-cut are difficult to estimate
The reason is that it is although LIMIT short cutting is beneficial, it is fundamentally difficult to produce a reliable estimate for it.
Let's take example from the previous section but search only for orders that were shipped by air:
select * from customer,order where customer.name=order.customer_name and order.shipping_method='Airplane' order by order.date desc limit 10
Suppose, we know that 50% of orders are shipped by air.
Assuming there's no correlation between date and shipping method, orders->customer
plan will need to scan 20 orders before we find 10 that are shipped by air.
But if there is correlation, we may need to scan up to (total_orders*0.5 + 10)
before we
find first 10 orders that are shipped by air. Scanning about 50% of all orders can be much more expensive.
The situation gets worse if the query has constructs whose selectivity is not known. Suppose the WHERE condition was
<<sql>>
order.shipping_method='%Airplane%'
<<sql>>
in this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or we will need to enumerate all rows before we find #LIMIT matches.