Engine-independent Table Statistics

You are viewing an old version of this article. View the current version here.
MariaDB starting with 10.0.1

The engine-independent table statistics feature was implemented in MariaDB 10.0.1

Introduction

Before MariaDB 10.0, the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:

  • Storage engines provided poor statistics (this was fixed to some extent in MySQL 5.6)
  • The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
  • There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

Engine-independent table statistics lift these limitations.

  • Statistics are stored in regular tables in the mysql database.
    • it is possible for a DBA to read and update the values.
  • More data is collected/used.

Currently, statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.

Use or update of data from these tables is controlled by use_stat_tables variable. Possible values are listed below:

ValueMeaning
'never'the optimizer doesn't use data from statistics tables
'complementary'the optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine
'preferably'prefer the data from statistics tables, if it's not available there, use the data from the storage engine.

ANALYZE TABLE Statement

MySQL/MariaDB supports the ANALYZE TABLE statement:

ANALYZE TABLE table_name;

When it is run, the SQL layer will make a call to the storage engine to update the statistics. The effect of the call depends on the storage engine.

With engine-independent statistics:

  • If @@use_stat_tables='never' and PERSISTENT FOR isn't used as part of the ALTER TABLE statement, then only storage engine statistics will be updated and not engine-independent statistics.
  • For other values of @@use_stat_tables, both storage engine statistics and engine-independent statistics will be updated.

Currently, engine-independent statistics is collected by doing full table and full index scans, and can be quite expensive. Also note that statistics for blob and text columns are not collected. If explicitly specified, a warning is returned.

The syntax for ANALYZE TABLE has been extended to allow one to collect statistics only for particular columns or indexes:

ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);

-- empty lists are allowed:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);

-- the following will only update mysql.table_stats fields:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();

-- the following will collect statistics for all columns and indexes and store them:
ANALYZE TABLE tbl PERSISTENT FOR ALL;

-- with use_stat_tables != NEVER a simple ANALYZE TABLE collects engine-independent statistics for all columns and indexes.
SET SESSION use_stat_tables='COMPLEMENTARY';
ANALYZE TABLE tbl;

Manual Updates to Statistics Tables

Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.

It is possible to update statistics tables manually. One should modify the table(s) with regular INSERT/UPDATE/DELETE statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue FLUSH TABLES command.

A few scenarios where one might need to update statistics tables manually:

  • Deleting the statistics. Currently, ANALYZE TABLE command will collect the statistics, but there is no special command to delete statistics.
  • Running ANALYZE on a different server. ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.
  • In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than ANALYZE does. One can compute the statistics manually and put it into the database.

See Also

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.