Engine-independent table statistics

You are viewing an old version of this article. View the current version here.

This is a preliminary documentation about engine-independent table statistics.

Introduction

Before this project, MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked, however it had some deficiencies:

  • Storage engines provided poor statistics (this fixed to some extent in MySQL 5.6)
  • The statistics was supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
  • There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

This project is about lifting these limitations.

  • Statistics is stored in regular tables in mysql database.
    • it is possible for DBA to read and update the values
  • More data is collected/used.

Currently, statistics data these three tables:

+--------------------------+
| Tables_in_mysql (%stat%) |
+--------------------------+
| column_stat              |
| index_stat               |
| table_stat               |
+--------------------------+

use/update of data from these tables is controlled by @@use_stat_tables variable. Possible values are

  • 'never' - the optimizer doesn't use data from statistics tables
  • 'complementary' - the optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine
  • 'preferably' - prefer the data from statistics tables, if it's not available there, use the data from the storage engine.

ANALYZE TABLE extensions

ANALYZE TABLE command syntax has been extended. TODO: how exactly.

Statistics tables

mysql.table_stat

CREATE TABLE table_stat (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  cardinality bigint(21) unsigned DEFAULT NULL,
  PRIMARY KEY (db_name,table_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Tables';
ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
cardinalityNumber of records in the table

mysql.column_stat

CREATE TABLE column_stat (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  column_name varchar(64) COLLATE utf8_bin NOT NULL,
  min_value varchar(255) COLLATE utf8_bin DEFAULT NULL,
  max_value varchar(255) COLLATE utf8_bin DEFAULT NULL,
  nulls_ratio double DEFAULT NULL,
  avg_length double DEFAULT NULL,
  avg_frequency double DEFAULT NULL,
  PRIMARY KEY (db_name,table_name,column_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns';
ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
column_nameName of the column
min_valueMinimum value in the table (in text form)
max_valueMaximum value in the table (in text form)
nulls_ratioFraction of NULL values (0 - no NULLs, 0.5 - half values are NULLs, 1 - all values are NULLs)
avg_lengthAverage length of column value, in bytes*
avg_frequencyaverage number of records with the same value

* - counted as if one run select avg(length(col)). This doesn't count NULL bytes, assumes endspace removal for CHAR(n), etc.

mysql.index_stat

CREATE TABLE index_stat (
  db_name varchar(64) COLLATE utf8_bin NOT NULL,
  table_name varchar(64) COLLATE utf8_bin NOT NULL,
  index_name varchar(64) COLLATE utf8_bin NOT NULL,
  prefix_arity int(11) unsigned NOT NULL,
  avg_frequency double DEFAULT NULL,
  PRIMARY KEY (db_name,table_name,index_name,prefix_arity)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Indexes';
ColumnMeaning
db_nameDatabase the table is in
table_nameTable name
index_nameName of the index
prefix_arityindex prefix length. 1 for the first keypart, 2 for the first two, and so forth. InnoDB's extended keys are supported
avg_frequencyAverage #of records one will find for given values of (keypart1, keypart2, ..), provided the values will be found in the table.

Where to get it

  • The code is at lp:maria-captains/maria/maria-5.5-mwl248

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.