LATERAL Derived optimization
Starting from MariaDB 10.3, MariaDB has Lateral Derived optimization, also referred to as "Split Grouping Optimization" in some sources.
Description
The optimization's use case is
- The query uses a derived table (or a VIEW, or a non-recursive CTE)
- The derived table/View/CTE has a GROUP BY operation as its top-level operation
- The query only needs data from a few GROUP BY groups
An example of this: consider a VIEW that computes totals for each customer in October:
create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id;
Now, one can join the view with Customer table to get October totals for "Customer#1" and Customer#2:
select * from customer, OCT_TOTALS where customer.customer_id=OCT_TOTALS.customer_id and customer.customer_name IN ('Customer#1', 'Customer#2')
Before Lateral Derived optimization, MariaDB would execute the query as follows:
- Materialized the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.
- Join it with table customer.
It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers.
Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables.
Then, the equality customer.customer_id=OCT_TOTALS.customer_id
, where it can be used to only compute totals for the customers of interest.
The EXPLAIN output will look like so:
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | | | 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
Note the line with id=2
: select_type is LATERAL DERIVED
. And table customer uses ref access referring to customer.customer_id
, which is normally not allowed for derived tables.
Controlling the optimization
Lateral Derived is enabled by default, the optimizer will make a cost-based decision whether the optimization should be used.
If you need to disable the optimization, it has an optimizer_switch
flag. It can be disabled like so:
set optimizer_switch='split_materialized=off'