Engine-independent table statistics

You are viewing an old version of this article. View the current version here.
MariaDB starting with 10.0.1

Engine-independent table statistics was implemented in MariaDB 10.0.1

Introduction

Before MariaDB 10.0, MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked, however it had some deficiencies:

  • Storage engines provided poor statistics (this was fixed to some extent in MySQL 5.6)
  • The statistics was supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
  • There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

This project is about lifting these limitations.

  • Statistics is stored in regular tables in mysql database.
    • it is possible for DBA to read and update the values
  • More data is collected/used.

Currently, statistics data these three tables:

+--------------------------+
| Tables_in_mysql (%stats) |
+--------------------------+
| column_stats             |
| index_stats              |
| table_stats              |
+--------------------------+

use/update of data from these tables is controlled by use_stat_tables variable. Possible values are listed below:

ValueMeaning
'never'the optimizer doesn't use data from statistics tables
'complementary'the optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine
'preferably'prefer the data from statistics tables, if it's not available there, use the data from the storage engine.

ANALYZE TABLE statement

MySQL/MariaDB supports the ANALYZE TABLE statement:

ANALYZE TABLE table_name;

When it is run, the SQL layer will make a call to the storage engine to update the statistics. The effect of the call depends on the storage engine.

With engine-independent statistics:

  • If @@use_stat_tables='never', only storage engine statistics will be updated.
  • For other values of @@use_stat_tables, both storage engine statistics and engine-independent statistics will be updated.

note: currently, engine-independent statistics is collected by doing full table and full index scans. It can be quite expensive.

The syntax for ANALYZE TABLE has been extended to allow one to collect statistics only for particular columns or indexes:

ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);

-- empty lists are allowed:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);

-- the following will only update mysql.table_stats fields:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();

-- the following is a synonym for old 'analyze table':
ANALYZE TABLE tbl PERSISTENT FOR ALL;

Statistics tables

mysql.table_stats

CREATE TABLE table_stats (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  cardinality bigint(21) unsigned DEFAULT NULL,
  PRIMARY KEY (db_name,table_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Tables';

ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
cardinalityNumber of records in the table

mysql.column_stats

CREATE TABLE column_stats (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  column_name varchar(64) COLLATE utf8_bin NOT NULL,
  min_value varchar(255) COLLATE utf8_bin DEFAULT NULL,
  max_value varchar(255) COLLATE utf8_bin DEFAULT NULL,
  nulls_ratio double DEFAULT NULL,
  avg_length double DEFAULT NULL,
  avg_frequency double DEFAULT NULL,
  PRIMARY KEY (db_name,table_name,column_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns';
ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
column_nameName of the column
min_valueMinimum value in the table (in text form)
max_valueMaximum value in the table (in text form)
nulls_ratioFraction of NULL values (0 - no NULLs, 0.5 - half values are NULLs, 1 - all values are NULLs)
avg_lengthAverage length of column value, in bytes*
avg_frequencyaverage number of records with the same value

* - counted as if one run select avg(length(col)). This doesn't count NULL bytes, assumes endspace removal for CHAR(n), etc.

mysql.index_stats

CREATE TABLE index_stats (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  index_name varchar(64) COLLATE utf8_bin NOT NULL,
  prefix_arity int(11) unsigned NOT NULL,
  avg_frequency double DEFAULT NULL,
  PRIMARY KEY (db_name,table_name,index_name,prefix_arity)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Indexes';
ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
index_nameName of the index
prefix_arityindex prefix length. 1 for the first keypart, 2 for the first two, and so forth. InnoDB's extended keys are supported
avg_frequencyAverage #of records one will find for given values of (keypart1, keypart2, ..), provided the values will be found in the table.

Where to get it

  • The feature is a part of MariaDB 10.0
  • There is a branch based on MariaDB 5.5 at lp:maria-captains/maria/maria-5.5-mwl248

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.