Before , the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:
Storage engines provided poor statistics (this was fixed to some degree with the introduction of Persistent Statistics).
The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.
Engine-independent table statistics lift these limitations.
Statistics are stored in regular tables in the mysql database.
it is possible for a DBA to read and update the values.
More data is collected/used.
are a subset of engine-independent table statistics that can improve the query plan chosen by the optimizer in certain situations.
Statistics are stored in three tables, , and .
Use or update of data from these tables is controlled by variable. Possible values are listed below:
Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.
The statement can be used to collect table statistics. However, simply running ANALYZE TABLE table_name does not collect engine-independent (or histogram) statistics by default.
When the statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For the default storage engine, see for more information.
may also collect engine-independent statistics for the table. The specific behavior depends on the value of the system variable. Engine-independent statistics will only be collected if one of the following is true:
The system variable is set to complementary or preferably.
The statement includes the PERSISTENT FOR clause.
The system variable is set to preferably_for_queries by default. With this value, engine-independent statistics are used by default if available, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, then you will have to collect them by executing the statement and by specifying the PERSISTENT FOR clause. It is recommended to collect engine-independent statistics on as-needed basis, so typically one will not have engine-independent statistics for all indexes/all columns.
When to collect statistics is very dependent on the dataset. If data changes frequently it may be necessary to collect statistics more frequently, and the benefits may be very noticeable (see ). If the data distribution is relatively static, the costs of collecting may outweigh any benefits.
The syntax for the statement has been extended with the PERSISTENT FOR clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. This clause also allows one to collect engine-independent statistics, regardless of the value of the system variable. For example:
Statistics for columns using the and data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.
Statistics are stored in three tables, , and .
It is possible to update statistics tables manually. One should modify the table(s) with regular // statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue command.
A few scenarios where one might need to update statistics tables manually:
Deleting the statistics. Currently, the command will collect the statistics, but there is no special command to delete statistics.
Running ANALYZE on a different server. To collect engine-independent statistics ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.
In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than ANALYZE does. One can compute the statistics manually and put it into the database.
(mariadb.org blog)
This page is licensed: CC BY-SA / Gnu FDL
Utilize statistics to optimize queries in MariaDB Server. This section explains how the database uses statistical information to generate efficient query execution plans and improve performance.
'never'
The optimizer doesn't use data from statistics tables.
'complementary'
The optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine.
'preferably'
Prefer the data from statistics tables, if it's not available there, use the data from the storage engine.
'complementary_for_queries'
Same as complementary, but for queries only (to avoid needlessly collecting for ANALYZE TABLE).
'preferably_for_queries'
Same as preferably, but for queries only (to avoid needlessly collecting for ANALYZE TABLE). Default.
ANALYZE TABLE table_name PERSISTENT FOR ALL;-- update all engine-independent statistics for all columns and indexes
ANALYZE TABLE tbl PERSISTENT FOR ALL;
-- update specific columns and indexes:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);
-- empty lists are allowed:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);
-- the following will only update mysql.table_stats fields:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();
-- when use_stat_tables is set to 'COMPLEMENTARY' or 'PREFERABLY',
-- a simple ANALYZE TABLE collects engine-independent statistics for all columns and indexes.
SET SESSION use_stat_tables='COMPLEMENTARY';
ANALYZE TABLE tbl;InnoDB statistics are stored on disk and are therefore persistent. Prior to , InnoDB statistics were not stored on disk, meaning that on server restarts the statistics would need to be recalculated, which is both needless computation, as well as leading to inconsistent query plans.
There are a number of variables that control persistent statistics:
innodb_stats_persistent - when set (the default) enables InnoDB persistent statistics.
innodb_stats_auto_recalc - when set (the default), persistent statistics are automatically recalculated when the table changes significantly (more than 10% of the rows)
- Number of index pages sampled (default 20) when estimating cardinality and statistics for indexed columns. Increasing this value will increases index statistics accuracy, but use more I/O resources when running .
These settings can be overwritten on a per-table basis by use of the , and clauses in a or statement.
Details of the statistics are stored in two system tables in the :
The statement can be used to recalculate InnoDB statistics.
The statement triggers a reload of the statistics.
MariaDB starting with
Prior to , and , also caused InnoDB statistics to be reloaded. From , and , this is no longer the case.
This page is licensed: CC BY-SA / Gnu FDL
Added extra logging to slow log of 'Thread_id, Schema, Query Cache hit, Rows sent and Rows examined'
Added optional logging to slow log, through log_slow_verbosity, of query plan statistics
Added new session variables log_slow_rate_limit, log_slow_verbosity, log_slow_filter
Added log-slow-file as synonym for 'slow-log-file', as most slow-log variables start with 'log-slow'
Added log-slow-time as synonym for long-query-time.
You can set the verbosity of what's logged to the slow query log by setting the variable to a combination of the following values:
All (From )
Enable all verbosity options.
Query_plan
Warnings (From )
Print all errors, warnings and notes related to statement, up to log_slow_max_warnings lines.
full.
The default value for log_slow_verbosity is ' ', to be compatible with MySQL 5.1.
The possible values for log_slow_verbosity areinnodb,query_plan,explain,engine,warnings. Multiple options are separated by ','. log_slow_verbosity is not supported when log_output='TABLE'.
In the future we will add more engine statistics and also support for other engines.
You can define which queries to log to the slow query log by setting the variable to a combination of the following values:
All (From )
Enable all filter options. log_slow_filter will be shown as having all options set.
admin
Multiple options are separated by ','. If you don't specify any options everything will be logged (same as setting the value to All.
The variable limits logging to the slow query log by not logging every query (only one query / log_slow_rate_limit is logged). This is mostly useful when debugging and you get too much information to the slow query log.
Note that in any case, only queries that takes longer than log_slow_time orlong_query_time' are logged (as before).
MariaDB starting with
If one enables the warning option for log_slow_verbosity, all notes and warnings for a slow query will also be added to the slow query log. This is very usable when one has enabled warnings for . log_slow_max_warnings limits the number of warnings printed to the slow query log per query. The default value is 10.
Part of this addition is based on the patch from .
This page is licensed: CC BY-SA / Gnu FDL
explain
EXPLAIN output is logged in the slow query log. See explain-in-the-slow-query-log for details.
Innodb (From . Before that this option did nothing)
Kept for compatibility. Same as engine.
engine (From )
Writes statistics from the storage engine. This includes:
Engine_time
Milliseconds spent inside engine calls (read_row / read_next_row etc)
All
Old shortcut to enable all the verbosity options
Log administrative statements (create, optimize, drop etc...)
log_slow_admin_statements maps to this option.
filesort
Log statement if it uses filesort
filesort_on_disk
Log statement if it uses filesort that needs temporary tables on disk
filesort_priority_queue (from )
Log statement if it uses filesort with priority_queue (filesort can either use disk or priority queue).
full_join
Log statements that don't uses indexes to join tables
full_scan
Log statements that use full table scans
not_using_index (From )
Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows
Disregards long_query_time, unlike other options!
log_queries_not_using_indexes maps to this option
query_cache
Log statements that are resolved by the query cache
query_cache_miss
Log statements that are not resolved by the query cache
tmp_table
Log statements that use in memory temporary tables
tmp_table_on_disk
Log statements that use temporary tables on disk
Pages_accessed
Number of pages accessed from page buffer (innodb-buffer-pool / key cache)
InnoDB
Pages_updated
Number of pages updated in memory
InnoDB
Pages_read_time
Milliseconds spend reading pages from storage
InnoDB
Old_rows_read
Number of retrievals of old versions of rows in the engine (versioning)
InnoDB
Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.
Histogram statistics are stored in the table, which stores data for , and so are essentially a subset of engine-independent table statistics.
Histograms are used by default from if they are available. However, histogram statistics are not automatically collected, as collection is expensive, requiring a full table scan. See for details.
Consider this example, using the following query:
Let's assume that
table t1 contains 100 records
table t2 contains 1000 records
there is a primary index on t1(a)
there is a secondary index on t2(a)
there is no index defined on column t2.b
the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)
Before histograms were introduced, the optimizer would choose the plan that:
accesses t1 using a table scan
accesses t2 using index t2(a)
checks the condition t2.b BETWEEN 1 AND 3
This plan examines all rows of both tables and performs 100 index look-ups.
With histograms available, the optimizer can choose the following, more efficient plan:
accesses table t2 in a table scan
checks the condition t2.b BETWEEN 1 AND 3
accesses t1 using index t1(a)
This plan also examine all rows from t2, but it performs only 10 look-ups to access 10 rows of table t1.
There are a number of system variables that affect histograms.
The histogram_size variable determines the size, in bytes, from 0 to 255, used for a histogram. This is effectively the number of bins for histogram_type=SINGLE_PREC_HB or number of bins/2 for histogram_type=DOUBLE_PREC_HB. If it is set to 0 (the default for and below), no histograms are created when running an ANALYZE TABLE.
The histogram_type variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. From , double precision is the default. For and below, single precision is the default.
From , JSON_HB, JSON-format histograms, are accepted.
The optimizer_use_condition_selectivity controls which statistics can be used by the optimizer when looking for the best query execution plan.
1 Use selectivity of predicates as in .
2 Use selectivity of all range predicates supported by indexes.
3 Use selectivity of all range predicates estimated without histogram.
4 Use selectivity of all range predicates estimated with histogram.
5 Additionally use selectivity of certain non-range predicates calculated on record sample.
From , the default is 4. Until , the default is 1.
Here is an example of the dramatic impact histogram-based statistics can make. The query is based on with 60 million records in the lineitem table.
First,
Next, a really bad plan, yet one sometimes chosen:
Persistent statistics don't improve matters:
The default flags for optimizer_switch do not help much:
Using statistics doesn't help either:
Now, taking into account the cost of the dependent subquery:
Finally, using join_buffer as well:
(mariadb.org blog)
(video)
(video)
This page is licensed: CC BY-SA / Gnu FDL
The User Statistics (userstat) plugin creates the , , the , and the tables in the database. As an alternative to these tables, the plugin also adds the , the , the , and the statements.
These tables and commands can be used to understand the server activity better and to identify the sources of your database's load.
The plugin also adds the , , , and statements.
The MariaDB implementation of this plugin is based on the from Percona and Ourdelta. The original code comes from Google (Mark Callaghan's team) with additional work from Percona, Ourdelta, and Weldon Whipple. The MariaDB implementation provides the same functionality as the userstatv2 patch but a lot of changes have been made to make it faster and to better fit the MariaDB infrastructure.
SELECT * FROM t1,t2 WHERE t1.a=t2.a AND t2.b BETWEEN 1 AND 3;SELECT SQL_CALC_FOUND_ROWS s_name, s_address FROM
supplier, nation WHERE
s_suppkey IN
(SELECT ps_suppkey FROM partsupp WHERE
ps_partkey IN (SELECT p_partkey FROM part WHERE
p_name LIKE 'forest%') AND
ps_availqty >
(SELECT 0.5 * SUM(l_quantity) FROM lineitem WHERE
l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND
l_shipdate >= DATE('1994-01-01') AND
l_shipdate < DATE('1994-01-01') + INTERVAL '1' YEAR ))
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY s_name
LIMIT 10;SET optimizer_switch='materialization=off,semijoin=off';+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation | ALL |...| 25 |100.00 | Using where;...
| 1 | PRIMARY | supplier | ref |...| 1447 |100.00 | Using where; Subq
| 2 | DEP SUBQ| partsupp | idxsq |...| 38 |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref |...| 3 |100.00 | Using where
| 3 | DEP SUBQ| part | unqsb |...| 1 |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(51.78 sec)+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL |...|100381|100.00 | Using where; Subq
| 1 | PRIMARY | nation | ref |...| 1 |100.00 | Using where
| 2 | DEP SUBQ| partsupp | idxsq |...| 38 |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref |...| 3 |100.00 | Using where
| 3 | DEP SUBQ| part | unqsb |...| 1 |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(7 min 33.42 sec)SET use_stat_tables='preferably';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL |...|10000 |100.00 | Using where;
| 1 | PRIMARY | nation | ref |...| 1 |100.00 | Using where
| 2 | DEP SUBQ| partsupp | idxsq |...| 80 |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref |...| 7 |100.00 | Using where
| 3 | DEP SUBQ| part | unqsb |...| 1 |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(7 min 40.44 sec)SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL |...|10000 |100.00 | Using where;
| 1 | PRIMARY | nation | ref |...| 1 |100.00 | Using where
| 1 | PRIMARY | <subq2> | eq_ref|...| 1 |100.00 |
| 2 | MATER | part | ALL |.. |2000000|100.00 | Using where
| 2 | MATER | partsupp | ref |...| 4 |100.00 | Using where; Subq
| 4 | DEP SUBQ| lineitem | ref |...| 7 |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(5 min 21.44 sec)SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation | ALL |...| 25 |4.00 | Using where
| 1 | PRIMARY | supplier | ref |...| 4000 |100.00 | Using where;
| 1 | PRIMARY | <subq2> | eq_ref|...| 1 |100.00 |
| 2 | MATER | part | ALL |.. |2000000|1.56 | Using where
| 2 | MATER | partsupp | ref |...| 4 |100.00 | Using where; Subq
| 4 | DEP SUBQ| lineitem | ref |...| 7 | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(5 min 22.41 sec)SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;
SET optimizer_switch='expensive_pred_static_pushdown=ON';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation | ALL |...| 25 | 4.00 | Using where
| 1 | PRIMARY | supplier | ref |...| 4000 |100.00 | Using where;
| 2 | PRIMARY | partsupp | ref |...| 80 |100.00 |
| 2 | PRIMARY | part | eq_ref|...| 1 | 1.56 | where; Subq; FM
| 4 | DEP SUBQ| lineitem | ref |...| 7 | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(49.89 sec)SET optimizer_switch= 'materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;
SET optimizer_switch='expensive_pred_static_pushdown=ON';
SET join_cache_level=6;
SET optimizer_switch='mrr=ON';
SET optimizer_switch='mrr_sort_keys=ON';
SET join_buffer_size=1024*1024*16;
SET join_buffer_space_limit=1024*1024*32;
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table | type |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation | AL L |...| 25 | 4.00 | Using where
| 1 | PRIMARY | supplier | ref |...| 4000 |100.00 | where; BKA
| 2 | PRIMARY | partsupp | ref |...| 80 |100.00 | BKA
| 2 | PRIMARY | part | eq_ref|...| 1 | 1.56 | where Sq; FM; BKA
| 4 | DEP SUBQ| lineitem | ref |...| 7 | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------
10 ROWS IN SET
(35.71 sec)The userstat plugin works by keeping several hash tables in memory. All variables are incremented while the query is running. At the end of each statement the global values are updated.
By default statistics are not collected. This is to ensure that statistics collection does not cause any extra load on the server unless desired.
Set the userstat=ON system variable in a relevant server option group in an option file to enable the plugin. For example:
The value can also be changed dynamically. For example:
The userstat plugin creates the USER_STATISTICS, CLIENT_STATISTICS, the INDEX_STATISTICS, and the TABLE_STATISTICS tables in the INFORMATION_SCHEMA database.
As an alternative to the INFORMATION_SCHEMA tables, the userstat plugin also adds the SHOW USER_STATISTICS, the SHOW CLIENT_STATISTICS, the SHOW INDEX_STATISTICS, and the SHOW TABLE_STATISTICS statements.
These commands are another way to display the information stored in the information schema tables. WHERE clauses are accepted. LIKE clauses are accepted but ignored.
The userstat plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements, which discard the information stored in the specified information schema table.
2.0
Stable
2.0
Gamma
2.0
Stable
2.0
Gamma
2.0
Stable
2.0
Gamma
2.0
Stable
2.0
Gamma
Description: If set to 1, user statistics will be activated.
Command line: --userstat=1
Scope: Global
Dynamic: Yes
Data Type: boolean
Default Value: OFF
User Statistics introduced a number of new status variables:
This page is licensed: CC BY-SA / Gnu FDL
[mariadb]
...
userstat = 1SET GLOBAL userstat=1;SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS\G
*************************** 1. row ***************************
USER: root
TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 0
CONNECTED_TIME: 297
BUSY_TIME: 0.001725
CPU_TIME: 0.001982
BYTES_RECEIVED: 388
BYTES_SENT: 2327
BINLOG_BYTES_WRITTEN: 0
ROWS_READ: 0
ROWS_SENT: 12
ROWS_DELETED: 0
ROWS_INSERTED: 13
ROWS_UPDATED: 0
SELECT_COMMANDS: 4
UPDATE_COMMANDS: 0
OTHER_COMMANDS: 3
COMMIT_TRANSACTIONS: 0
ROLLBACK_TRANSACTIONS: 0
DENIED_CONNECTIONS: 0
LOST_CONNECTIONS: 0
ACCESS_DENIED: 0
EMPTY_QUERIES: 1SELECT * FROM INFORMATION_SCHEMA.CLIENT_STATISTICS\G
*************************** 1. row ***************************
CLIENT: localhost
TOTAL_CONNECTIONS: 3
CONCURRENT_CONNECTIONS: 0
CONNECTED_TIME: 4883
BUSY_TIME: 0.009722
CPU_TIME: 0.0102131
BYTES_RECEIVED: 841
BYTES_SENT: 13897
BINLOG_BYTES_WRITTEN: 0
ROWS_READ: 0
ROWS_SENT: 214
ROWS_DELETED: 0
ROWS_INSERTED: 207
ROWS_UPDATED: 0
SELECT_COMMANDS: 10
UPDATE_COMMANDS: 0
OTHER_COMMANDS: 13
COMMIT_TRANSACTIONS: 0
ROLLBACK_TRANSACTIONS: 0
DENIED_CONNECTIONS: 0
LOST_CONNECTIONS: 0
ACCESS_DENIED: 0
EMPTY_QUERIES: 1
1 row in set (0.00 sec)SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE TABLE_NAME = "author";
+--------------+------------+------------+-----------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | ROWS_READ |
+--------------+------------+------------+-----------+
| books | author | by_name | 15 |
+--------------+------------+------------+-----------+SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS WHERE TABLE_NAME='user';
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| mysql | user | 5 | 2 | 2 |
+--------------+------------+-----------+--------------+------------------------+SHOW USER_STATISTICS
SHOW CLIENT_STATISTICS
SHOW INDEX_STATISTICS
SHOW TABLE_STATISTICSFLUSH USER_STATISTICS
FLUSH CLIENT_STATISTICS
FLUSH INDEX_STATISTICS
FLUSH TABLE_STATISTICS