ANALYZE statement
MariaDB starting with 10.0.0
The ANALYZE statement
command is a new feature in MariaDB 10.1.0.
Contents
Command description
The ANALYZE statement
command is similar to EXPLAIN statement
. ANALYZE
statement
will invoke the optimizer, execute the statement, and then produce EXPLAIN
output, annotated with statistics from statement execution.
This lets one check how optimizer's estimates about the query plan are close to the reality.
The syntax is
ANALYZE explainable_statement;
where the statement is any statement for which one can run EXPLAIN.
Command output
Consider an example:
MariaDB> analyze select * from tbl1 where key1 between 10 and 200 and col1 like 'foo%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tbl1 type: range possible_keys: key1 key: key1 key_len: 5 ref: NULL rows: 181 r_rows: 181 filtered: 100.00 r_filtered: 10.50 Extra: Using index condition; Using where
Compared to EXPLAIN
, ANALYZE
produces two extra columns:
- r_rows is an observation-based counterpart of rows column. It shows how many rows were actually read from the table.
- r_filtered is an observation-based counterpart of filtered column. It shows which fraction of rows was left after applying the WHERE condition.
Interpreting the output
Joins
Let's consider a more complicated example.
analyze select * from orders, customer where customer.c_custkey=orders.o_custkey and customer.c_acctbal < 0 and orders.o_totalprice > 200*1000
+----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+ | 1 | SIMPLE | customer | ALL | PRIMARY,... | NULL | NULL | NULL | 149095 | 150000 | 18.08 | 9.13 | Using where | | 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | 10 | 100.00 | 30.03 | Using where | +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
Here, one can see that
- For table customer, customer.rows=149095, customer.r_rows=150000. The estimate for number of rows we will read was fairly precise
- customer.filtered=18.08, customer.r_filtered=9.13. The optimizer somewhat overestimated the number of records that will match selectivity of condition attached to `customer` table. (in general, when you have a full scan and r_filtered is less than 15%, it's time to consider adding an appropriate index)
- For table orders, orders.rows=7, orders.r_rows=10. This means that on average, there are 7 orders for a given c_custkey, but in our case there were 10, which is close to the expectation (when this number is consistently far from the expectation, maybe it's time to run ANALYZE TABLE, or even edit the table statistics manually to get better query plans)
- orders.filtered=100, orders.r_filtered=30.03. The optimizer didn't have any way to estimate which fraction of records will be left after it check the condition that is attached to table orders (it's orders.o_totalprice > 200*1000). So, it used 100%. In reality, it is 30%. 30% is typically not selective enough to warrant adding new indexes. For joins with many tables, it might be worth to collect and use column statistics for columns in question, this may help the optimizer to pick a better query plan.
Meaning of NULL in r_rows and r_filtered
Let's modify the previous example slightly
analyze select * from orders, customer where customer.c_custkey=orders.o_custkey and customer.c_acctbal < -0 and customer.c_comment like '%foo%' orders.o_totalprice > 200*1000
+----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+ | 1 | SIMPLE | customer | ALL | PRIMARY,... | NULL | NULL | NULL | 149095 | 150000 | 18.08 | 0.00 | Using where | | 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | NULL | 100.00 | NULL | Using where | +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
Here, one can see that orders.r_rows=NULL and orders.r_filtered=NULL. This means that table orders was not scanned even once. Indeed, we can also see customer.r_filtered=0.00. This shows that a part of WHERE attached to table `customer` was never satisfied (or, satisfied in less than 0.01% of cases).
Notes
ANALYZE UPDATE
orANALYZE DELETE
will actually make updates/deletes.- PostgreSQL has a similar command,
EXPLAIN ANALYZE
.
See also
- the Jira task is: https://mariadb.atlassian.net/browse/MDEV-406