optimizer_join_limit_pref_ratio optimization
Contents
Basics
This optimization makes the join optimizer to specifically consider a join order that can short-cut its execution based on ORDER BY ... LIMIT n
clause. For small values of n
, using such join order can cause speedups.
The optimization is not enabled by default.
One needs to set optimizer_join_limit_pref_ratio
to enable it. The default value of 0 means disable, suggested value when enabling is 100.
Detailed description
Problem setting
MariaDB optimizer picks the join order without taking into account the possibility to short-cut join execution due to ORDER BY ... LIMIT.
For example, consider a query looking at latest 10 orders together with customers who made them:
select * from customer,order where customer.name=order.customer_name order by order.date desc limit 10
The two possible plans are:
customer->orders
:
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | 1 | SIMPLE | customer | ALL | name | NULL | NULL | NULL | 9623 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | orders | ref | customer_name | customer_name | 103 | customer.name | 1 | | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
and orders->customer
:
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | 1 | SIMPLE | orders | index | customer_name | order_date | 4 | NULL | 10 | Using where | | 1 | SIMPLE | customer | ref | name | name | 103 | orders.customer_name | 1 | | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
The customer->orders
plan computes a join between all customers and orders, saves its result into a temporary table and then uses filesort to get the 10 most recent orders. It doesn't benefit much from the fact that just 10 orders are needed.
The orders->customers
plan uses an index to read rows in the ORDER BY order. It can stop execution as soon as it has found 10 order-and-customer
combinations. This is much faster than computing the entire join. Let's call this *LIMIT short-cutting*.
Does MariaDB optimizer take *LIMIT short-cutting* into account when choosing which join order to use? No.
(It does take Using temporary
into account which increases the cost of customer->orders
plan but not the LIMIT short-cutting).
Plans with LIMIT short-cut are difficult to estimate
The reason is that it is although LIMIT short cutting is beneficial, it is fundamentally difficult to produce a reliable estimate for it.
Let's take example from the previous section but search only for orders that were shipped by air:
select * from customer,order where customer.name=order.customer_name and order.shipping_method='Airplane' order by order.date desc limit 10
Suppose, we know that 50% of orders are shipped by air.
Assuming there's no correlation between date and shipping method, orders->customer
plan will need to scan 20 orders before we find 10 that are shipped by air.
But if there is correlation, we may need to scan up to (total_orders*0.5 + 10)
before we
find first 10 orders that are shipped by air. Scanning about 50% of all orders can be much more expensive.
The situation gets worse if the query has constructs whose selectivity is not known. Suppose the WHERE condition was
<<sql>>
order.shipping_method='%Airplane%'
<<sql>>
in this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or we will need to enumerate all rows before we find #LIMIT matches.
2. Optimization is user-controlled
Due to these inherent risks described above, the optimization is not enabled by default.
However if one is running mostly OLTP workload so that their WHERE conditions have suitable indexes or not very selective, then ORDER BY LIMIT queries will typically find matching rows quickly. In this case it makes sense to give the following guidance to the optimizer:
<<quote>> Consider the query plan using LIMIT short-cutting and prefer it if it promises X times speedup. <</quote>>
The value of X
is given to the optimizer via optimizer_join_limit_pref_ratio
setting.
Higher values carry less risk. The recommended value is 100 (prefer LIMIT join order if it
promises at least 100x speedup).