# Engine-Independent Table Statistics

## Introduction

The MySQL/MariaDB optimizer prior to MariaDB 10 used to rely on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:

* Storage engines provided poor statistics (this was fixed to some degree with the introduction of [Persistent Statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/statistics-for-optimizing-queries/innodb-persistent-statistics)).
* The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)
* There was little control of the statistics. There was no way to "pin" current statistic values, or provide values on your own, etc.

Engine-independent table statistics lift these limitations:

* Statistics are stored in regular tables in the `mysql` database. You can read and update the values.
* More data is collected/used.

[Histogram-based statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/statistics-for-optimizing-queries/histogram-based-statistics) are a subset of engine-independent table statistics (EITS) that can improve the query plan chosen by the optimizer in certain situations.

Statistics are stored in three tables, [mysql.table\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-table_stats-table), [mysql.column\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-column_stats-table) and [mysql.index\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-index_stats-table).

Usage or updating of data from these tables is controlled by [use\_stat\_tables](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) variable. Possible values are:

| Value                         | Meaning                                                                                                                                                                                          |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 'never'                       | The optimizer doesn't use data from statistics tables.                                                                                                                                           |
| 'complementary'               | The optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine.                                                                                   |
| 'preferably'                  | Prefer the data from statistics tables, if it's not available there, use the data from the storage engine.                                                                                       |
| 'complementary\_for\_queries' | Same as complementary, but for queries only (to avoid needlessly collecting for [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table)).       |
| 'preferably\_for\_queries'    | Same as preferably, but for queries only (to avoid needlessly collecting for [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table)). Default. |

## Collecting Statistics with the ANALYZE TABLE Statement

Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.

The [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) statement can be used to collect table statistics. However, simply running `ANALYZE TABLE table_name` does not collect engine-independent (or histogram) statistics by default.

When the [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For the default [InnoDB](https://mariadb.com/docs/server/server-usage/storage-engines/innodb) storage engine, see [InnoDB Persistent Statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/statistics-for-optimizing-queries/innodb-persistent-statistics) for more information.

[ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) may also collect engine-independent statistics for the table. The specific behavior depends on the value of the [use\_stat\_tables](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) system variable. Engine-independent statistics will only be collected if one of the following is true:

* The [use\_stat\_tables](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) system variable is set to `complementary` or `preferably`.
* The [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) statement includes the `PERSISTENT FOR` clause.

The [use\_stat\_tables](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) system variable is set to `preferably_for_queries` by default. With this value, engine-independent statistics are used by default if available, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, then you will have to collect them by executing the [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) statement and by specifying the `PERSISTENT FOR` clause. It is recommended to collect engine-independent statistics on as-needed basis, so typically one will not have engine-independent statistics for all indexes/all columns.

When to collect statistics is very dependent on the dataset. If data changes frequently it may be necessary to collect statistics more frequently, and the benefits may be very noticeable (see [This one trick can make MariaDB 30x faster!](https://mariadb.org/mariadb-30x-faster/)). If the data distribution is relatively static, the costs of collecting may outweigh any benefits.

### Collecting Statistics for Specific Columns or Indexes

The syntax for the [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) statement has been extended with the `PERSISTENT FOR` clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. This clause also allows one to collect engine-independent statistics, regardless of the value of the [use\_stat\_tables](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) system variable. For example:

```sql
ANALYZE TABLE table_name PERSISTENT FOR ALL;
```

Statistics for columns using the [BLOB](https://mariadb.com/docs/server/reference/data-types/string-data-types/blob) and [TEXT](https://mariadb.com/docs/server/reference/data-types/string-data-types/text) data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.

### Examples of Statistics Collection

```sql
-- update all engine-independent statistics for all columns and indexes
ANALYZE TABLE tbl PERSISTENT FOR ALL;

-- update specific columns and indexes:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);

-- empty lists are allowed:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);

-- the following will only update mysql.table_stats fields:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();

-- when use_stat_tables is set to 'COMPLEMENTARY' or 'PREFERABLY', 
-- a simple ANALYZE TABLE  collects engine-independent statistics for all columns and indexes.
SET SESSION use_stat_tables='COMPLEMENTARY';
ANALYZE TABLE tbl;
```

## Manual Updates to Statistics Tables

Statistics are stored in three tables, [mysql.table\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-table_stats-table), [mysql.column\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-column_stats-table) and [mysql.index\_stats](https://mariadb.com/docs/server/reference/system-tables/the-mysql-database-tables/mysql-index_stats-table).

It is possible to update statistics tables manually. One should modify the table(s) with regular [INSERT](https://mariadb.com/docs/server/reference/sql-statements/data-manipulation/inserting-loading-data/insert)/[UPDATE](https://mariadb.com/docs/server/reference/sql-statements/data-manipulation/changing-deleting-data/update)/[DELETE](https://mariadb.com/docs/server/reference/sql-statements/data-manipulation/changing-deleting-data/delete) statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue [FLUSH TABLES](https://mariadb.com/docs/server/reference/sql-statements/administrative-sql-statements/flush-commands/flush) command.

A few scenarios where one might need to update statistics tables manually:

* Deleting the statistics. Currently, the [ANALYZE TABLE](https://mariadb.com/docs/server/reference/sql-statements/table-statements/analyze-table) command will collect the statistics, but there is no special command to delete statistics.
* Running `ANALYZE` on a different server. To collect engine-independent statistics ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.
* In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than `ANALYZE` does. One can compute the statistics manually and put it into the database.

## EITS[^1] vs. InnoDB Statistics

This section visually explains how MariaDB decides which statistics to use, and what happens during the `ANALYZE` stage. These flows are critical for understanding why toggling [`use_stat_tables`](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#use_stat_tables) affects both performance and optimizer behavior.

{% hint style="info" %}
**MariaDB 10.6 and later** include several optimizer enhancements that affect how these statistics are utilized. These improvements include:

* More accurate cost-based optimization decisions
* Improved index and condition selection
* Enhanced selectivity estimation during `ANALYZE` operations
  {% endhint %}

### Optimizer Statistics Selection Flow (Query Execution Time)

{% @mermaid/diagram content="graph TD
Start(\[Incoming SQL Query]) --> Eval{use\_stat\_tables value?}

```
Eval -- "use_stat_tables = NEVER" --> Ignore[Optimizer ignores EITS entirely]
Ignore --> InnoDB[Use InnoDB statistics from<br/>innodb_table_stats and innodb_index_stats]

Eval -- "use_stat_tables = PREFERABLY / PREFERABLY_FOR_QUERIES" --> CheckEITS{Are EITS present for this table?}

CheckEITS -- Yes --> UseEITS[Use EITS from<br/>mysql.table_stats, column_stats, and index_stats]
CheckEITS -- No --> Fallback[Fallback to InnoDB stats<br/>'EITS not collected yet']" %}
```

* NEVER: Optimizer always uses InnoDB stats, even if EITS exists.
* PREFERABLY: Optimizer prefers EITS and transparently falls back if missing.

### ANALYZE TABLE – Statistics Collection Flow

{% @mermaid/diagram content="graph TD
Start\[ANALYZE TABLE issued] --> Eval\[Check use\_stat\_tables value at runtime]

```
Eval -- "NEVER" --> InnoDB[InnoDB samples data<br/>— fast, approximate]
InnoDB --> UpdateInnoDB[Updates only:<br/>mysql.innodb_table_stats<br/>mysql.innodb_index_stats]

Eval -- "PREFERABLY / PERSISTENT" --> FullScan[Full scan or sampled scan<br/>— based on analyze_sample_percentage]
FullScan --> UpdateBoth[Updates BOTH:<br/>mysql.innodb_table_stats<br/>mysql.table_stats<br/>mysql.column_stats histograms<br/>mysql.index_stats]" %}
```

* NEVER: Fast, safe, but low precision.
* PERSISTENT: Explicit, predictable, and recommended.

### Column Statistics Collection Flow (analyze\_max\_length)

{% @mermaid/diagram content="graph TD
Start\[Column encountered during EITS collection] --> TypeCheck{Is column type CHAR or VARCHAR?}

```
TypeCheck -- No --> Always[Non-string column: Always collected]

TypeCheck -- Yes --> CalcLength[Calculate byte length: characters x charset bytes]

CalcLength --> LimitCheck{Is length <= analyze_max_length?}

LimitCheck -- Yes --> Stored[Column stats stored in mysql.column_stats]
LimitCheck -- No --> Skipped[Column skipped with warning to prevent long ANALYZE runtime]" %}
```

* `utf8mb4` multiplies size by 4 (compared to `latin1`)
* `VARCHAR (255)` ≈ 1020 bytes
* [`analyze_max_length`](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#analyze_max_length) = 2048 is a safe global default.

| Scenario / Environment                   | Recommended Mode                                              | Why This Works                                                      |
| ---------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------- |
| Small DB (<100 GB), simple queries       | NEVER                                                         | InnoDB stats are sufficient and fastest to maintain                 |
| Mixed workloads, general production      | PREFERABLY\_FOR\_QUERIES                                      | Safe default: optimizer uses EITS when present, ANALYZE is explicit |
| Analytics / reporting heavy              | PREFERABLY                                                    | Ensures rich column & index stats for complex joins                 |
| Large DB (1–10 TB), 24/7 traffic         | PREFERABLY\_FOR\_QUERIES + scheduled ANALYZE TABLE PERSISTENT | Avoids accidental heavy ANALYZE during peak hours                   |
| SaaS / multi-tenant systems              | PREFERABLY\_FOR\_QUERIES                                      | Predictable behavior across thousands of tables                     |
| Highly volatile tables (frequent writes) | NEVER or selective PERSISTENT                                 | EITS can go stale quickly on hot tables                             |
| Troubleshooting bad query plans          | PREFERABLY (temporarily)                                      | Forces optimizer to use detailed stats                              |
| Upgrade from MariaDB 10.6                | Start with PREFERABLY\_FOR\_QUERIES                           | Minimizes plan regressions                                          |

## See Also

* [Index Statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/optimization-and-indexes/index-statistics)
* [InnoDB Persistent Statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/statistics-for-optimizing-queries/innodb-persistent-statistics)
* [Histogram-based Statistics](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/statistics-for-optimizing-queries/histogram-based-statistics)
* [JSON histograms](https://mariadb.org/10-7-preview-feature-json-histograms/) (mariadb.org blog)
* [Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021](https://www.youtube.com/watch?v=bsl7Fis0onE) (video)

<sub>*This page is licensed: CC BY-SA / Gnu FDL*</sub>

{% @marketo/form formId="4316" %}

[^1]: Engine-Independent Table Statistics
