optimizer_join_limit_pref_ratio optimization
Basics
This optimization makes the join optimizer to specifically consider a join order that can short-cut its execution based on ORDER BY ... LIMIT n clause. For small values of n
, using such join order can cause speedups.
The optimization is not enabled by default. One needs to set optimizer_join_limit_pref_ratio
to enable it. The default value of 0 means disable, suggested value when enabling is 100.
Detailed description
Problem setting
MariaDB optimizer picks the join order without taking into account the possibility to short-cut join execution due to ORDER BY ... LIMIT.
For example, consider a query looking at latest 10 orders together with customers who made them:
select * from customer,order where customer.name=order.customer_name order by order.date desc limit 10
The two possible plans are:
customer->orders
:
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+ | 1 | SIMPLE | customer | ALL | name | NULL | NULL | NULL | 9623 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | orders | ref | customer_name | customer_name | 103 | customer.name | 1 | | +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
and orders->customer
:
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+ | 1 | SIMPLE | orders | index | customer_name | order_date | 4 | NULL | 10 | Using where | | 1 | SIMPLE | customer | ref | name | name | 103 | orders.customer_name | 1 | | +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
orders->customers
is probably much more efficient as the optimizer can stop computing the join as soon as it has found 10 last orders.
Does MariaDB optimizer take this into account when choosing which join order to use? No. (It does take "Using temporary" into account which makes customer->orders plan more expensive but that increase may or may not make customer->orders more expensive)
Inherent risks in plans using short-cutting
Plans that rely on short-cutting suffer from the problem of unknown selectivity. Let's modify the query to search for orders shipped by airplane:
select * from customer,order where customer.name=order.customer_name and order.shipping_method='Airplane' order by order.date desc limit 10
Suppose, we know that 50% of orders are shipped by air. Assuming no correlation between date and shipping method, we will need to scan 20 orders before we find 10 that are shipped by air. But if there is a correlation, we may need to scan up to 50% of all orders before we find 10 orders that are shipped by air.
Moreover, queries may have conditions whose selectivity is not known:
<<sql>>
order.shipping_method='%Airplane%'
<<sql>>
if this condition filters out a lot of rows, it will increase the number of records one needs to enumerate before producing #LIMIT rows.