Optimizer Trace Guide
Contents
Optimizer trace uses the JSON format. It is basically a structured log file showing what actions were taken by the query optimizer.
A Basic Example
Let's take a simple query:
MariaDB> explain select * from t1 where a<10; +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | 1 | SIMPLE | t1 | range | a | a | 5 | NULL | 10 | Using index condition | +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
One can see the full trace here. Taking only the component names, one gets:
MariaDB> select * from information_schema.optimizer_trace limit 1\G *************************** 1. row *************************** QUERY: select * from t1 where a<10 TRACE: { "steps": [ { "join_preparation": { ... } }, { "join_optimization": { "select_id": 1, "steps": [ { "condition_processing": { ... } }, { "table_dependencies": [ ... ] }, { "ref_optimizer_key_uses": [ ... ] }, { "rows_estimation": [ { "range_analysis": { "analyzing_range_alternatives" : { ... }, "chosen_range_access_summary": { ... }, }, "selectivity_for_indexes" : { ... }, "selectivity_for_columns" : { ... } } ] }, { "considered_execution_plans": [ ... ] }, { "attaching_conditions_to_tables": { ... } } ] } }, { "join_execution": { ... } } ] }
Trace Structure
For each SELECT, there are two "Steps":
join_preparation
join_optimization
Join preparation shows early query rewrites. join_optmization
is where most of the query optimizations are done. They are:
condition_processing
- basic rewrites in WHERE/ON conditions.ref_optimizer_key_uses
- Construction of possible ways to do ref and eq_ref accesses.rows_estimation
- Consideration of range and index_merge accesses.considered_execution_plans
- Join optimization itself, that is, choice of the join order.attaching_conditions_to_tables
- Once the join order is fixed, parts of the WHERE clause are "attached" to tables to filter out rows as early as possible.
The above steps are for just one SELECT. If the query has subqueries, each SELECT will have these steps, and there will be extra steps/rewrites to handle the subquery construct itself.
Extracting Trace Components
If you are interested in some particular part of the trace, MariaDB has two functions that come in handy:
- JSON_EXTRACT extracts a part of JSON document
- JSON_DETAILED presents it in a user-readable way.
For example, the contents of the analyzing_range_alternatives
node can be extracted like so:
MariaDB> select JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives')) -> from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G *************************** 1. row *************************** JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives')): [ { "range_scan_alternatives": [ { "index": "a_b_c", "ranges": [ "(1) <= (a,b) < (4,50)" ], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 4, "cost": 6.2509, "chosen": true } ], "analyzing_roworder_intersect": { "cause": "too few roworder scans" }, "analyzing_index_merge_union": [] } ]
Examples of various information in the trace
Basic rewrites
A lot of applications construct database query text on the fly, which sometimes means that the query has constructs that are repetitive or redundant. In most cases, the optimizer will be able to remove them. One can check the trace to be sure:
explain select * from t1 where not (col1 >= 3);
Optimizer trace will show:
"steps": [ { "join_preparation": { "select_id": 1, "steps": [ { "expanded_query": "select t1.a AS a,t1.b AS b,t1.col1 AS col1 from t1 where t1.col1 < 3" }
Here, one can see that NOT
was removed.
Similarly, one can also see that IN(...)
with one element is the same as equality:
explain select * from t1 where col1 in (1);
will show
"join_preparation": { "select_id": 1, "steps": [ { "expanded_query": "select t1.a AS a,t1.b AS b,t1.col1 AS col1 from t1 where t1.col1 = 1"
On the other hand, converting an UTF-8 column to UTF-8 is not removed:
explain select * from t1 where convert(utf8_col using utf8) = 'hello';
will show
"join_preparation": { "select_id": 1, "steps": [ { "expanded_query": "select t1.a AS a,t1.b AS b,t1.col1 AS col1,t1.utf8_col AS utf8_col from t1 where convert(t1.utf8_col using utf8) = 'hello'" }
so redundant CONVERT
calls should be used with caution.
Range optimizer - what ranges will be scanned
MySQL/MariaDB optimizer has a complex part called the Range Optimizer. This is a module that examines WHERE (and ON) clause and constructs index ranges that need to be scanned to answer the query. The rules for constructing the ranges are quite complex.
An example: Consider a table
create table some_events ( start_date date, end_date date, ... key (start_date, end_date) );
and a query:
mysql> explain select * from some_events where start_date >= '2019-09-10' and end_date <= '2019-09-14'; +------+-------------+-------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | some_events | ALL | start_date | NULL | NULL | NULL | 1000 | Using where | +------+-------------+-------------+------+---------------+------+---------+------+------+-------------+
One might think that the optimizer would be able to use the restrictions on both start_date and end_date to construct a narrow range to be scanned. But this is not so, one of the restrictions creates a left-endpoint range and the other one creates a right-endpoint range, hence they cannot be combined.
select JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives')) as trace from information_schema.optimizer_trace\G *************************** 1. row *************************** trace: [ { "range_scan_alternatives": [ { "index": "start_date", "ranges": [ "(2019-09-10,NULL) < (start_date,end_date)" ], ...
the potential range only uses one of the bounds.