Engine-independent table statistics
MariaDB starting with 10.0.1
Engine-independent table statistics was implemented in MariaDB 10.0.1
Contents
Introduction
Before MariaDB 10.0, MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked, however it had some deficiencies:
- Storage engines provided poor statistics (this was fixed to some extent in MySQL 5.6)
- The statistics was supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
- There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.
This project is about lifting these limitations.
- Statistics is stored in regular tables in
mysql
database.- it is possible for DBA to read and update the values
- More data is collected/used.
Currently, statistics data these three tables:
+--------------------------+ | Tables_in_mysql (%stats) | +--------------------------+ | column_stats | | index_stats | | table_stats | +--------------------------+
use/update of data from these tables is controlled by use_stat_tables variable. Possible values are listed below:
Value | Meaning |
---|---|
'never' | the optimizer doesn't use data from statistics tables |
'complementary' | the optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine |
'preferably' | prefer the data from statistics tables, if it's not available there, use the data from the storage engine. |
ANALYZE TABLE statement
MySQL/MariaDB supports the ANALYZE TABLE statement:
ANALYZE TABLE table_name;
When it is run, the SQL layer will make a call to the storage engine to update the statistics. The effect of the call depends on the storage engine.
With engine-independent statistics:
- If
@@use_stat_tables='never'
, only storage engine statistics will be updated. - For other values of
@@use_stat_tables
, both storage engine statistics and engine-independent statistics will be updated.
note: currently, engine-independent statistics is collected by doing full table and full index scans. It can be quite expensive.
The syntax for ANALYZE TABLE has been extended to allow one to collect statistics only for particular columns or indexes:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...); -- empty lists are allowed: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (); ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...); -- the following will only update mysql.table_stats fields: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (); -- the following is a synonym for old 'analyze table': ANALYZE TABLE tbl PERSISTENT FOR ALL;
Statistics tables
mysql.table_stats
CREATE TABLE table_stats ( db_name varchar(64) COLLATE utf8_bin NOT NULL, table_name varchar(64) COLLATE utf8_bin NOT NULL, cardinality bigint(21) unsigned DEFAULT NULL, PRIMARY KEY (db_name,table_name) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Tables';
Column | Meaning |
---|---|
db_name | Database the table is in |
table_name | Table name |
cardinality | Number of records in the table |
mysql.column_stats
CREATE TABLE column_stats ( db_name varchar(64) COLLATE utf8_bin NOT NULL, table_name varchar(64) COLLATE utf8_bin NOT NULL, column_name varchar(64) COLLATE utf8_bin NOT NULL, min_value varchar(255) COLLATE utf8_bin DEFAULT NULL, max_value varchar(255) COLLATE utf8_bin DEFAULT NULL, nulls_ratio double DEFAULT NULL, avg_length double DEFAULT NULL, avg_frequency double DEFAULT NULL, PRIMARY KEY (db_name,table_name,column_name) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns';
Column | Meaning |
---|---|
db_name | Database the table is in |
table_name | Table name |
column_name | Name of the column |
min_value | Minimum value in the table (in text form) |
max_value | Maximum value in the table (in text form) |
nulls_ratio | Fraction of NULL values (0 - no NULLs, 0.5 - half values are NULLs, 1 - all values are NULLs) |
avg_length | Average length of column value, in bytes* |
avg_frequency | average number of records with the same value |
* - counted as if one run select avg(length(col)). This doesn't count NULL bytes, assumes endspace removal for CHAR(n), etc.
mysql.index_stats
CREATE TABLE index_stats ( db_name varchar(64) COLLATE utf8_bin NOT NULL, table_name varchar(64) COLLATE utf8_bin NOT NULL, index_name varchar(64) COLLATE utf8_bin NOT NULL, prefix_arity int(11) unsigned NOT NULL, avg_frequency double DEFAULT NULL, PRIMARY KEY (db_name,table_name,index_name,prefix_arity) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Indexes';
Column | Meaning |
---|---|
db_name | Database the table is in |
table_name | Table name |
index_name | Name of the index |
prefix_arity | index prefix length. 1 for the first keypart, 2 for the first two, and so forth. InnoDB's extended keys are supported |
avg_frequency | Average #of records one will find for given values of (keypart1, keypart2, ..), provided the values will be found in the table. |
Where to get it
- The feature is a part of MariaDB 10.0
- There is a branch based on MariaDB 5.5 at lp:maria-captains/maria/maria-5.5-mwl248