Statistics for Optimizing Queries

Utilize statistics to optimize queries in MariaDB Server. This section explains how the database uses statistical information to generate efficient query execution plans and improve performance.

Engine-Independent Table Statistics

Introduction

Before engine-independent table statistics were introduced, the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however, it had some deficiencies:

Storage engines provided poor statistics (this was fixed to some degree with the introduction of Persistent Statistics).
The statistics were supplied through the MySQL Storage Engine Interface, which restricts what kind of data can be supplied (for example, there is no way to get any data about value distribution in a non-indexed column).
There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

Engine-independent table statistics lift these limitations.

Statistics are stored in regular tables in the mysql database, so it is possible for a DBA to read and update the values.
More data is collected and used.

Histogram-based statistics are a subset of engine-independent table statistics that can improve the query plan chosen by the optimizer in certain situations.

Statistics are stored in three tables: mysql.table_stats, mysql.column_stats, and mysql.index_stats.

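Use or update of data from these tables is controlled by the use_stat_tables system variable. Possible values are listed below:

Value: Meaning
never: The optimizer does not use data from the statistics tables.
complementary: The optimizer uses data from the statistics tables if the same kind of data is not provided by the storage engine.
preferably: The optimizer prefers data from the statistics tables and falls back to data from the storage engine if it is not available there.
complementary_for_queries: Same as complementary, but for queries only, so ANALYZE TABLE does not collect the statistics unless asked to.
preferably_for_queries: Same as preferably, but for queries only.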
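The following is a minimal sketch of how the current setting and the collected data might be inspected. It assumes an account with read access to the mysql database; the column list for mysql.column_stats is chosen for illustration and skips the binary histogram column.

```sql
-- Show the value of use_stat_tables in effect for this session.
SELECT @@SESSION.use_stat_tables;

-- Look at the engine-independent statistics collected so far.
-- These tables stay empty until statistics have been gathered.
SELECT * FROM mysql.table_stats;
SELECT db_name, table_name, column_name, min_value, max_value, nulls_ratio
  FROM mysql.column_stats;
SELECT * FROM mysql.index_stats;
```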

Collecting Statistics with the ANALYZE TABLE Statement

Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.

The ANALYZE TABLE statement can be used to collect table statistics. However, simply running ANALYZE TABLE table_name does not collect engine-independent (or histogram) statistics by default.

When the ANALYZE TABLE statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For the default InnoDB storage engine, see InnoDB Persistent Statistics for more information.

ANALYZE TABLE may also collect engine-independent statistics for the table. The specific behavior depends on the value of the use_stat_tables system variable. Engine-independent statistics will only be collected if one of the following is true:

The use_stat_tables system variable is set to complementary or preferably.
The statement includes the PERSISTENT FOR clause.

The use_stat_tables system variable is set to preferably_for_queries by default. With this value, engine-independent statistics are used by default if available, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, you have to collect them by executing the ANALYZE TABLE statement with the PERSISTENT FOR clause. It is recommended to collect engine-independent statistics on an as-needed basis, so typically one will not have engine-independent statistics for all indexes and all columns.

When to collect statistics depends heavily on the dataset. If the data changes frequently, it may be necessary to collect statistics more often, and the benefits may be very noticeable. If the data distribution is relatively static, the cost of collecting may outweigh any benefit.

Collecting Statistics for Specific Columns or Indexes

The syntax for the ANALYZE TABLE statement has been extended with the PERSISTENT FOR clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. It also allows one to collect engine-independent statistics regardless of the value of the use_stat_tables system variable.

Statistics for columns using the BLOB and TEXT data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.
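As a sketch of the PERSISTENT FOR clause restricted to particular columns and indexes: the table name tbl, the columns col1 and col2, and the index idx1 below are hypothetical placeholders, not objects from this page.

```sql
-- Collect engine-independent statistics only for two columns and one index
-- of a hypothetical table.
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1, col2) INDEXES (idx1);

-- Either list may be empty: this collects column statistics for col1
-- and no index statistics.
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1) INDEXES ();
```

Limiting the lists keeps the scan work proportional to what the optimizer actually needs, which fits the as-needed collection recommended above.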

Examples of Statistics Collection
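The two ways of triggering collection described above can be sketched as follows. The table name tbl is a hypothetical placeholder.

```sql
-- Collect engine-independent statistics for all columns and indexes,
-- regardless of the use_stat_tables setting.
ANALYZE TABLE tbl PERSISTENT FOR ALL;

-- Alternatively, change the session setting so that a plain ANALYZE TABLE
-- also collects engine-independent statistics.
SET SESSION use_stat_tables = 'preferably';
ANALYZE TABLE tbl;
```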

Manual Updates to Statistics Tables

Statistics are stored in three tables: mysql.table_stats, mysql.column_stats, and mysql.index_stats.

It is possible to update statistics tables manually. One should modify the table(s) with regular INSERT/UPDATE/DELETE statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue the FLUSH TABLES command.

A few scenarios where one might need to update statistics tables manually:

Deleting the statistics. Currently, the ANALYZE TABLE command will collect the statistics, but there is no special command to delete statistics.
Running ANALYZE on a different server. To collect engine-independent statistics, ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from the statistics tables on the slave and apply it on the master.
Computing statistics by hand. In some cases, knowledge of the database allows one to compute statistics in a more efficient way than ANALYZE does. One can compute the statistics manually and put them into the database.
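A minimal sketch of the first and last scenarios, assuming a hypothetical table t1 in a database named test. The cardinality value is made up for illustration; a real value would come from your own knowledge of the data.

```sql
-- Delete all engine-independent statistics for test.t1.
DELETE FROM mysql.table_stats  WHERE db_name = 'test' AND table_name = 't1';
DELETE FROM mysql.column_stats WHERE db_name = 'test' AND table_name = 't1';
DELETE FROM mysql.index_stats  WHERE db_name = 'test' AND table_name = 't1';

-- Or supply a value of your own (illustrative number).
UPDATE mysql.table_stats
   SET cardinality = 1000000
 WHERE db_name = 'test' AND table_name = 't1';

-- Force tables to be re-opened so the changed statistics are re-read.
FLUSH TABLES;
```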

Histogram-Based Statistics

Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.

Histogram statistics are stored in the mysql.column_stats table, which stores data for engine-independent table statistics, and so are essentially a subset of engine-independent table statistics.

In recent MariaDB versions, histograms are used by default if they are available. However, histogram statistics are not automatically collected, as collection is expensive, requiring a full table scan. See Collecting Statistics with the ANALYZE TABLE Statement for details.

Consider this example, using the following query:

SELECT * FROM t1,t2 WHERE t1.a=t2.a AND t2.b BETWEEN 1 AND 3;

Let's assume that:

table t1 contains 100 records
table t2 contains 1000 records
there is a primary index on t1(a)
there is a secondary index on t2(a)
there is no index defined on column t2.b
the selectivity of the condition t2.b BETWEEN 1 AND 3 is high (only about 1% of the rows in t2 match)

Before histograms were introduced, the optimizer would choose the plan that:

accesses t1 using a table scan
accesses t2 using index t2(a)
checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and performs 100 index look-ups.

With histograms available, the optimizer can choose the following, more efficient plan:

accesses table t2 in a table scan
checks the condition t2.b BETWEEN 1 AND 3
accesses t1 using index t1(a)

This plan also examines all rows from t2, but because only about 1% of them (10 of 1000) satisfy the condition, it performs only 10 look-ups to access 10 rows of table t1.
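To check which of the two plans the optimizer picks on a real server, something like the following sketch could be used. It assumes tables t1 and t2 exist as described above and that histogram_size is greater than 0, so that a histogram is actually built for t2.b.

```sql
-- Let the optimizer use histogram-based selectivity estimates.
SET SESSION optimizer_use_condition_selectivity = 4;

-- Collect column statistics, including a histogram, for the non-indexed column t2.b.
ANALYZE TABLE t2 PERSISTENT FOR COLUMNS (b) INDEXES ();

-- Inspect the join order and the access methods that were chosen.
EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t2.b BETWEEN 1 AND 3;
```

If the histogram is used, one would expect t2 to appear first in the EXPLAIN output, with t1 accessed through its primary index.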

System Variables

There are a number of system variables that affect histograms.

histogram_size

The histogram_size variable determines the size, in bytes, from 0 to 255, used for a histogram. This is effectively the number of bins for histogram_type=SINGLE_PREC_HB, or half that number for histogram_type=DOUBLE_PREC_HB, which uses two bytes per bin. If it is set to 0 (the default in older versions), no histograms are created when running ANALYZE TABLE.

histogram_type

The histogram_type variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. Single precision was the default in early versions; the default was later changed to double precision.

Later versions also accept JSON_HB, which creates JSON-format histograms.
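As a sketch of how these two variables interact with collection, the statements below build a double precision histogram and then decode it. The table t2 and column b are reused from the example above, the size of 100 bytes is an arbitrary illustrative choice, and DECODE_HISTOGRAM is the built-in function for reading the binary histogram formats.

```sql
-- 100 bytes with DOUBLE_PREC_HB gives 50 two-byte bins.
SET SESSION histogram_size = 100;
SET SESSION histogram_type = 'DOUBLE_PREC_HB';

-- Histograms are built as part of column statistics collection.
ANALYZE TABLE t2 PERSISTENT FOR COLUMNS (b) INDEXES ();

-- Read the stored histogram back in a human-readable form.
SELECT column_name, hist_size, hist_type,
       DECODE_HISTOGRAM(hist_type, histogram) AS bucket_widths
  FROM mysql.column_stats
 WHERE db_name = DATABASE() AND table_name = 't2' AND column_name = 'b';
```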

optimizer_use_condition_selectivity

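The optimizer_use_condition_selectivity variable controls which statistics can be used by the optimizer when looking for the best query execution plan.

1 Use selectivity of predicates as in older MariaDB versions.
2 Use selectivity of all range predicates supported by indexes.
3 Use selectivity of all range predicates estimated without histogram.
4 Use selectivity of all range predicates estimated with histogram.
5 Additionally use selectivity of certain non-range predicates calculated on record sample.

The default is 4 in newer versions and 1 in older versions.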

Example

Here is an example of the dramatic impact histogram-based statistics can make. The query runs against a data set with 60 million records in the lineitem table.

First,

Next, a really bad plan, yet one sometimes chosen:

don't improve matters:

The default flags for do not help much:

Using statistics doesn't help either:

Now, taking into account the cost of the dependent subquery:

Finally, using as well:

InnoDB Persistent Statistics

InnoDB statistics are stored on disk and are therefore persistent. In earlier versions, InnoDB statistics were not stored on disk, so they had to be recalculated after every server restart, which both wasted computation and led to inconsistent query plans.

There are a number of variables that control persistent statistics:

innodb_stats_persistent - when set (the default), enables InnoDB persistent statistics.
innodb_stats_auto_recalc - when set (the default), persistent statistics are automatically recalculated when the table changes significantly (more than 10% of the rows).
innodb_stats_persistent_sample_pages - the number of index pages sampled (default 20) when estimating cardinality and statistics for indexed columns. Increasing this value increases index statistics accuracy, but uses more I/O resources when running ANALYZE TABLE.

These settings can be overridden on a per-table basis by use of the STATS_PERSISTENT, STATS_AUTO_RECALC and STATS_SAMPLE_PAGES clauses in a CREATE TABLE or ALTER TABLE statement.

Details of the statistics are stored in two system tables in the mysql database: mysql.innodb_table_stats and mysql.innodb_index_stats.

The ANALYZE TABLE statement can be used to recalculate InnoDB statistics.
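The per-table clauses and the two system tables can be tried out with a sketch like the one below. The table stats_demo, its columns, and the sample page counts are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table with per-table persistent statistics settings.
CREATE TABLE stats_demo (
  id  INT PRIMARY KEY,
  val INT,
  KEY idx_val (val)
) ENGINE=InnoDB
  STATS_PERSISTENT=1
  STATS_AUTO_RECALC=1
  STATS_SAMPLE_PAGES=40;

-- The settings can be changed later on the existing table.
ALTER TABLE stats_demo STATS_SAMPLE_PAGES=60;

-- Recalculate the InnoDB statistics explicitly.
ANALYZE TABLE stats_demo;

-- Inspect what InnoDB stored for the table and its indexes.
SELECT * FROM mysql.innodb_table_stats WHERE table_name = 'stats_demo';
SELECT index_name, stat_name, stat_value, stat_description
  FROM mysql.innodb_index_stats
 WHERE table_name = 'stats_demo';
```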

The statement triggers a reload of the statistics.

MariaDB starting with

Prior to , and , also caused InnoDB statistics to be reloaded. From , and , this is no longer the case.

Slow Query Log Extended Statistics

Overview

Added extra logging to slow log of 'Thread_id, Schema, Query Cache hit, Rows sent and Rows examined'
Added optional logging to slow log, through log_slow_verbosity, of query plan statistics
Added new session variables log_slow_rate_limit, log_slow_verbosity, log_slow_filter
Added log-slow-file as synonym for 'slow-log-file', as most slow-log variables start with 'log-slow'
Added log-slow-time as synonym for long-query-time.

Session Variables

log_slow_verbosity

You can set the verbosity of what's logged to the slow query log by setting the log_slow_verbosity variable to a combination of the following values:

Option: Description
All: Enable all verbosity options.
Query_plan: Log query plan statistics for the statement.
Engine: Log statistics from the storage engine.
Warnings: Print all errors, warnings and notes related to the statement, up to log_slow_max_warnings lines.
full: Shortcut that enables all verbosity options.

The default value for log_slow_verbosity is '' (empty), to be compatible with MySQL 5.1.

The possible values for log_slow_verbosity are innodb, query_plan, explain, engine and warnings. Multiple options are separated by ','. log_slow_verbosity is not supported when log_output='TABLE'.

In the future we will add more engine statistics and also support for other engines.
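As a small sketch, the following enables two of the values listed above for the current session and then reads the setting back. Which options are worth enabling depends on how much detail you want per logged query.

```sql
-- Log query plan statistics and EXPLAIN output with each slow query.
SET SESSION log_slow_verbosity = 'query_plan,explain';

-- Confirm the value now in effect.
SELECT @@SESSION.log_slow_verbosity;
```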

log_slow_filter

You can define which queries to log to the slow query log by setting the log_slow_filter variable to a combination of the following values:

All: Enable all filter options. log_slow_filter will be shown as having all options set.
admin

Multiple options are separated by ','. If you don't specify any options, everything will be logged (same as setting the value to All).

log_slow_rate_limit

The log_slow_rate_limit variable limits logging to the slow query log by logging only one out of every log_slow_rate_limit queries. This is mostly useful when debugging and too much information is being written to the slow query log.

Note that in any case, only queries that take longer than log_slow_time or long_query_time are logged (as before).
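The filter and the rate limit can be combined, as in this sketch. The threshold of 2 seconds and the rate of 10 are arbitrary illustrative values.

```sql
-- Only consider queries that run longer than 2 seconds.
SET SESSION long_query_time = 2;

-- Of those, only log statements matching the admin filter option.
SET SESSION log_slow_filter = 'admin';

-- Write roughly one in every 10 qualifying queries to the slow query log.
SET SESSION log_slow_rate_limit = 10;
```

With a rate limit above 1, the log becomes a sample rather than a complete record, so counts derived from it need to be scaled accordingly.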

log_slow_max_warnings

MariaDB starting with

If one enables the warnings option for log_slow_verbosity, all notes and warnings for a slow query will also be added to the slow query log. This is particularly useful when additional warnings or notes have been enabled. log_slow_max_warnings limits the number of warnings printed to the slow query log per query. The default value is 10.

Credits

Part of this addition is based on the patch from .

User Statistics

The User Statistics (userstat) plugin creates the USER_STATISTICS, CLIENT_STATISTICS, INDEX_STATISTICS, and TABLE_STATISTICS tables in the INFORMATION_SCHEMA database. As an alternative to these tables, the plugin also adds the SHOW USER_STATISTICS, SHOW CLIENT_STATISTICS, SHOW INDEX_STATISTICS, and SHOW TABLE_STATISTICS statements.

These tables and commands can be used to understand the server activity better and to identify the sources of your database's load.

The plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements.

The MariaDB implementation of this plugin is based on the userstatv2 patch from Percona and Ourdelta. The original code comes from Google (Mark Callaghan's team) with additional work from Percona, Ourdelta, and Weldon Whipple. The MariaDB implementation provides the same functionality as the userstatv2 patch but a lot of changes have been made to make it faster and to better fit the MariaDB infrastructure.

How it Works

The userstat plugin works by keeping several hash tables in memory. All variables are incremented while the query is running. At the end of each statement the global values are updated.

Enabling the Plugin

By default statistics are not collected. This is to ensure that statistics collection does not cause any extra load on the server unless desired.

Set the userstat system variable in a relevant server option group in an option file to enable the plugin, for example by adding userstat=1 under the [mariadb] group.

The value can also be changed dynamically.
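A dynamic change might look like the sketch below. It assumes an account with the privilege needed to set global variables; the setting does not survive a restart unless it is also placed in an option file.

```sql
-- Turn on statistics collection for the running server.
SET GLOBAL userstat = 1;

-- Confirm the new value.
SELECT @@GLOBAL.userstat;
```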

Using the Plugin

Using the Information Schema Table

The userstat plugin creates the USER_STATISTICS, CLIENT_STATISTICS, INDEX_STATISTICS, and TABLE_STATISTICS tables in the INFORMATION_SCHEMA database.
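Because these are ordinary information schema tables, they can be filtered and sorted like any other table. The queries below are a sketch; the choice of columns and the limit of 10 rows are illustrative.

```sql
-- The ten tables with the most rows read since statistics were last flushed.
SELECT TABLE_SCHEMA, TABLE_NAME, ROWS_READ, ROWS_CHANGED
  FROM INFORMATION_SCHEMA.TABLE_STATISTICS
 ORDER BY ROWS_READ DESC
 LIMIT 10;

-- Per-user activity.
SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS;
```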

Using the SHOW Statements

As an alternative to the tables, the userstat plugin also adds the SHOW USER_STATISTICS, SHOW CLIENT_STATISTICS, SHOW INDEX_STATISTICS, and SHOW TABLE_STATISTICS statements.

These commands are another way to display the information stored in the information schema tables. WHERE clauses are accepted. LIKE clauses are accepted but ignored.
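In their simplest form, the four statements take no arguments:

```sql
SHOW USER_STATISTICS;
SHOW CLIENT_STATISTICS;
SHOW INDEX_STATISTICS;
SHOW TABLE_STATISTICS;
```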

Flushing Plugin Data

The userstat plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements, which discard the information stored in the specified information schema table.
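A typical use is to reset the counters before a measurement window, so that later readings reflect only the workload of interest. A sketch:

```sql
-- Discard the accumulated counters for each of the four tables.
FLUSH USER_STATISTICS;
FLUSH CLIENT_STATISTICS;
FLUSH INDEX_STATISTICS;
FLUSH TABLE_STATISTICS;
```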

System Variables

`userstat`

Description: If set to 1, user statistics will be activated.
Command line: --userstat=1
Scope: Global
Dynamic: Yes

Status Variables

User Statistics introduced a number of new status variables. Some of them are recorded only when userstat is set.

This page is licensed: CC BY-SA / Gnu FDL
