All pages
Powered by GitBook
1 of 6

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Statistics for Optimizing Queries

Utilize statistics to optimize queries in MariaDB Server. This section explains how the database uses statistical information to generate efficient query execution plans and improve performance.

Histogram-Based Statistics

Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.

Histogram statistics are stored in the mysql.column_stats table, which stores data for engine-independent table statistics, and so are essentially a subset of engine-independent table statistics.

Histograms are used by default from if they are available. However, histogram statistics are not automatically collected, as collection is expensive, requiring a full table scan. See Collecting Statistics with the ANALYZE TABLE Statement for details.

Consider this example, using the following query:

Let's assume that

  • table t1 contains 100 records

  • table t2 contains 1000 records

  • there is a primary index on t1(a)

  • there is a secondary index on t2(a)

  • there is no index defined on column t2.b

  • the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Before histograms were introduced, the optimizer would choose the plan that:

  • accesses t1 using a table scan

  • accesses t2 using index t2(a)

  • checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and performs 100 index look-ups.

With histograms available, the optimizer can choose the following, more efficient plan:

  • accesses table t2 in a table scan

  • checks the condition t2.b BETWEEN 1 AND 3

  • accesses t1 using index t1(a)

This plan also examine all rows from t2, but it performs only 10 look-ups to access 10 rows of table t1.

System Variables

There are a number of system variables that affect histograms.

histogram_size

The variable determines the size, in bytes, from 0 to 255, used for a histogram. This is effectively the number of bins for histogram_type=SINGLE_PREC_HB or number of bins/2 for histogram_type=DOUBLE_PREC_HB. If it is set to 0 (the default for and below), no histograms are created when running an .

histogram_type

The variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. From , double precision is the default. For and below, single precision is the default.

From , JSON_HB, JSON-format histograms, are accepted.

optimizer_use_condition_selectivity

The controls which statistics can be used by the optimizer when looking for the best query execution plan.

  • 1 Use selectivity of predicates as in .

  • 2 Use selectivity of all range predicates supported by indexes.

  • 3 Use selectivity of all range predicates estimated without histogram.

From , the default is 4. Until , the default is 1.

Example

Here is an example of the dramatic impact histogram-based statistics can make. The query is based on with 60 million records in the lineitem table.

First,

Next, a really bad plan, yet one sometimes chosen:

don't improve matters:

The default flags for do not help much:

Using statistics doesn't help either:

Now, taking into account the cost of the dependent subquery:

Finally, using as well:

See Also

This page is licensed: CC BY-SA / Gnu FDL

SELECT * FROM t1,t2 WHERE t1.a=t2.a AND t2.b BETWEEN 1 AND 3;
4 Use selectivity of all range predicates estimated with histogram.
  • 5 Additionally use selectivity of certain non-range predicates calculated on record sample.

  • JSON Histograms (mariadb.org blog)
  • Improved histograms in MariaDB 10.8 - Sergei Petrunia - FOSDEM 2022 (video)

  • Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021 (video)

  • histogram_size
    ANALYZE TABLE
    histogram_type
    optimizer_use_condition_selectivity
    Persistent statistics
    optimizer_switch
    join_buffer
    DECODE_HISTOGRAM()
    Index Statistics
    InnoDB Persistent Statistics
    Engine-independent Statistics
    SELECT SQL_CALC_FOUND_ROWS s_name, s_address FROM 
    supplier, nation WHERE 
      s_suppkey IN
        (SELECT ps_suppkey FROM partsupp WHERE
          ps_partkey IN (SELECT p_partkey FROM part WHERE 
             p_name LIKE 'forest%') AND 
        ps_availqty > 
          (SELECT 0.5 * SUM(l_quantity) FROM lineitem WHERE
            l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND
            l_shipdate >= DATE('1994-01-01') AND
            l_shipdate < DATE('1994-01-01') + INTERVAL '1' YEAR ))
      AND s_nationkey = n_nationkey
      AND n_name = 'CANADA'
      ORDER BY s_name
      LIMIT 10;
    SET optimizer_switch='materialization=off,semijoin=off';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25   |100.00 | Using where;...
    | 1 | PRIMARY | supplier | ref   |...| 1447 |100.00 | Using where; Subq
    | 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (51.78 sec)
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|100381|100.00 | Using where; Subq
    | 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
    | 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (7 min 33.42 sec)
    SET use_stat_tables='preferably';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|10000 |100.00 | Using where;
    | 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
    | 2 | DEP SUBQ| partsupp | idxsq |...| 80   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (7 min 40.44 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows  | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|10000  |100.00 | Using where;
    | 1 | PRIMARY | nation   | ref   |...| 1     |100.00 | Using where
    | 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
    | 2 | MATER   | part     | ALL   |.. |2000000|100.00 | Using where
    | 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
    | 4 | DEP SUBQ| lineitem | ref   |...| 7     |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (5 min 21.44 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows  | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25    |4.00   | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000  |100.00 | Using where;
    | 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
    | 2 | MATER   | part     | ALL   |.. |2000000|1.56   | Using where
    | 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
    | 4 | DEP SUBQ| lineitem | ref   |...| 7     | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (5 min 22.41 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    SET optimizer_switch='expensive_pred_static_pushdown=ON';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25   | 4.00  | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | Using where;
    | 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 |
    | 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where; Subq; FM
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (49.89 sec)
    SET optimizer_switch= 'materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    SET optimizer_switch='expensive_pred_static_pushdown=ON';
    SET join_cache_level=6;
    SET optimizer_switch='mrr=ON';
    SET optimizer_switch='mrr_sort_keys=ON';
    SET join_buffer_size=1024*1024*16;
    SET join_buffer_space_limit=1024*1024*32;
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt |  Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | AL  L |...| 25   | 4.00  | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | where; BKA
    | 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 | BKA
    | 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where Sq; FM; BKA
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (35.71 sec)

    InnoDB Persistent Statistics

    InnoDB statistics are stored on disk and are therefore persistent. Prior to , InnoDB statistics were not stored on disk, meaning that on server restarts the statistics would need to be recalculated, which is both needless computation, as well as leading to inconsistent query plans.

    There are a number of variables that control persistent statistics:

    • innodb_stats_persistent - when set (the default) enables InnoDB persistent statistics.

    • innodb_stats_auto_recalc - when set (the default), persistent statistics are automatically recalculated when the table changes significantly (more than 10% of the rows)

    • - Number of index pages sampled (default 20) when estimating cardinality and statistics for indexed columns. Increasing this value will increases index statistics accuracy, but use more I/O resources when running .

    These settings can be overwritten on a per-table basis by use of the , and clauses in a or statement.

    Details of the statistics are stored in two system tables in the :

    The statement can be used to recalculate InnoDB statistics.

    The statement triggers a reload of the statistics.

    MariaDB starting with

    Prior to , and , also caused InnoDB statistics to be reloaded. From , and , this is no longer the case.

    See Also

    This page is licensed: CC BY-SA / Gnu FDL

    innodb_stats_persistent_sample_pages
    ANALYZE TABLE
    STATS_PERSISTENT
    STATS_AUTO_RECALC
    STATS_SAMPLE_PAGES
    CREATE TABLE
    ALTER TABLE
    mysql database
    innodb_table_stats
    innodb_index_stats
    ANALYZE TABLE
    RENAME TABLE
    10.11.12
    MariaDB 10.11.12
    MariaDB 11.4.6
    MariaDB 11.8.2
    FLUSH TABLES
    MariaDB 10.11.12
    MariaDB 11.4.6
    MariaDB 11.8.2
    Index Statistics
    Engine-independent Statistics
    Histogram-based Statistics

    Slow Query Log Extended Statistics

    Overview

    • Added extra logging to slow log of 'Thread_id, Schema, Query Cache hit, Rows sent and Rows examined'

    • Added optional logging to slow log, through log_slow_verbosity, of query plan statistics

    • Added new session variables log_slow_rate_limit, log_slow_verbosity, log_slow_filter

    • Added log-slow-file as synonym for 'slow-log-file', as most slow-log variables start with 'log-slow'

    • Added log-slow-time as synonym for long-query-time.

    Session Variables

    log_slow_verbosity

    You can set the verbosity of what's logged to the slow query log by setting the variable to a combination of the following values:

    • All (From )

      • Enable all verbosity options.

    • Query_plan

    Option
    Description
    Engine
    • Warnings (From )

      • Print all errors, warnings and notes related to statement, up to log_slow_max_warnings lines.

    • full.

    The default value for log_slow_verbosity is ' ', to be compatible with MySQL 5.1.

    The possible values for log_slow_verbosity areinnodb,query_plan,explain,engine,warnings. Multiple options are separated by ','. log_slow_verbosity is not supported when log_output='TABLE'.

    In the future we will add more engine statistics and also support for other engines.

    log_slow_filter

    You can define which queries to log to the slow query log by setting the variable to a combination of the following values:

    • All (From )

      • Enable all filter options. log_slow_filter will be shown as having all options set.

    • admin

    Multiple options are separated by ','. If you don't specify any options everything will be logged (same as setting the value to All.

    log_slow_rate_limit

    The variable limits logging to the slow query log by not logging every query (only one query / log_slow_rate_limit is logged). This is mostly useful when debugging and you get too much information to the slow query log.

    Note that in any case, only queries that takes longer than log_slow_time orlong_query_time' are logged (as before).

    log_slow_max_warnings

    MariaDB starting with

    If one enables the warning option for log_slow_verbosity, all notes and warnings for a slow query will also be added to the slow query log. This is very usable when one has enabled warnings for . log_slow_max_warnings limits the number of warnings printed to the slow query log per query. The default value is 10.

    Credits

    Part of this addition is based on the patch from .

    See also

    This page is licensed: CC BY-SA / Gnu FDL

    For select queries, log information about the query plan. This includes "Full_scan", "Full_join", "Tmp_table", "Tmp_table_on_disk", "Filesort", "Filesort_on_disk" and number of "Merge_passes during sorting"
  • explain

    • EXPLAIN output is logged in the slow query log. See explain-in-the-slow-query-log for details.

  • Innodb (From . Before that this option did nothing)

    • Kept for compatibility. Same as engine.

  • engine (From )

    • Writes statistics from the storage engine. This includes:

  • Engine_time

    Milliseconds spent inside engine calls (read_row / read_next_row etc)

    All

    Old shortcut to enable all the verbosity options

    Log administrative statements (create, optimize, drop etc...)

  • log_slow_admin_statements maps to this option.

  • filesort

    • Log statement if it uses filesort

  • filesort_on_disk

    • Log statement if it uses filesort that needs temporary tables on disk

  • filesort_priority_queue (from )

    • Log statement if it uses filesort with priority_queue (filesort can either use disk or priority queue).

  • full_join

    • Log statements that don't uses indexes to join tables

  • full_scan

    • Log statements that use full table scans

  • not_using_index (From )

    • Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows

    • Disregards long_query_time, unlike other options!

    • log_queries_not_using_indexes maps to this option

  • query_cache

    • Log statements that are resolved by the query cache

  • query_cache_miss

    • Log statements that are not resolved by the query cache

  • tmp_table

    • Log statements that use in memory temporary tables

  • tmp_table_on_disk

    • Log statements that use temporary tables on disk

  • Pages_accessed

    Number of pages accessed from page buffer (innodb-buffer-pool / key cache)

    InnoDB

    Pages_updated

    Number of pages updated in memory

    InnoDB

    Pages_read_time

    Milliseconds spend reading pages from storage

    InnoDB

    Old_rows_read

    Number of retrievals of old versions of rows in the engine (versioning)

    InnoDB

    log_slow_verbosity
    log_slow_filter
    log_slow_rate_limit
    microslow
    Percona

    User Statistics

    The User Statistics (userstat) plugin creates the USER_STATISTICS, CLIENT_STATISTICS, the INDEX_STATISTICS, and the TABLE_STATISTICS tables in the INFORMATION_SCHEMA database. As an alternative to these tables, the plugin also adds the SHOW USER_STATISTICS, the SHOW CLIENT_STATISTICS, the SHOW INDEX_STATISTICS, and the SHOW TABLE_STATISTICS statements.

    These tables and commands can be used to understand the server activity better and to identify the sources of your database's load.

    The plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements.

    The MariaDB implementation of this plugin is based on the userstatv2 patch from Percona and Ourdelta. The original code comes from Google (Mark Callaghan's team) with additional work from Percona, Ourdelta, and Weldon Whipple. The MariaDB implementation provides the same functionality as the userstatv2 patch but a lot of changes have been made to make it faster and to better fit the MariaDB infrastructure.

    How it Works

    The userstat plugin works by keeping several hash tables in memory. All variables are incremented while the query is running. At the end of each statement the global values are updated.

    Enabling the Plugin

    By default statistics are not collected. This is to ensure that statistics collection does not cause any extra load on the server unless desired.

    Set the system variable in a relevant server in an to enable the plugin. For example:

    The value can also be changed dynamically. For example:

    Using the Plugin

    Using the Information Schema Table

    The userstat plugin creates the , , the , and the tables in the database.

    Using the SHOW Statements

    As an alternative to the tables, the userstat plugin also adds the , the , the , and the statements.

    These commands are another way to display the information stored in the information schema tables. WHERE clauses are accepted. LIKE clauses are accepted but ignored.

    Flushing Plugin Data

    The userstat plugin also adds the , , , and statements, which discard the information stored in the specified information schema table.

    Versions

    USER_STATISTICS

    Version
    Status
    Introduced

    CLIENT_STATISTICS

    Version
    Status
    Introduced

    INDEX_STATISTICS

    Version
    Status
    Introduced

    TABLE_STATISTICS

    Version
    Status
    Introduced

    System Variables

    userstat

    • Description: If set to 1, user statistics will be activated.

    • Command line: --userstat=1

    • Scope: Global

    • Dynamic: Yes

    Status Variables

    User Statistics introduced a number of new status variables:

    • (requires to be set to be recorded)

    • (requires to be set to be recorded)

    This page is licensed: CC BY-SA / Gnu FDL

    Data Type: boolean

  • Default Value: OFF

  • empty_queries

  • rows_read

  • rows_sent

  • 2.0

    Stable

    2.0

    Gamma

    2.0

    Stable

    2.0

    Gamma

    2.0

    Stable

    2.0

    Gamma

    2.0

    Stable

    2.0

    Gamma

    userstat=ON
    option group
    option file
    USER_STATISTICS
    CLIENT_STATISTICS
    INDEX_STATISTICS
    TABLE_STATISTICS
    INFORMATION_SCHEMA
    INFORMATION_SCHEMA
    SHOW USER_STATISTICS
    SHOW CLIENT_STATISTICS
    SHOW INDEX_STATISTICS
    SHOW TABLE_STATISTICS
    FLUSH USER_STATISTICS
    FLUSH CLIENT_STATISTICS
    FLUSH INDEX_STATISTICS
    FLUSH TABLE_STATISTICS
    access_denied_errors
    binlog_bytes_written
    busy_time
    userstat
    cpu_time
    userstat
    [mariadb]
    ...
    userstat = 1
    SET GLOBAL userstat=1;
    SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS\G
    *************************** 1. row ***************************
                      USER: root
         TOTAL_CONNECTIONS: 1
    CONCURRENT_CONNECTIONS: 0
            CONNECTED_TIME: 297
                 BUSY_TIME: 0.001725
                  CPU_TIME: 0.001982
            BYTES_RECEIVED: 388
                BYTES_SENT: 2327
      BINLOG_BYTES_WRITTEN: 0
                 ROWS_READ: 0
                 ROWS_SENT: 12
              ROWS_DELETED: 0
             ROWS_INSERTED: 13
              ROWS_UPDATED: 0
           SELECT_COMMANDS: 4
           UPDATE_COMMANDS: 0
            OTHER_COMMANDS: 3
       COMMIT_TRANSACTIONS: 0
     ROLLBACK_TRANSACTIONS: 0
        DENIED_CONNECTIONS: 0
          LOST_CONNECTIONS: 0
             ACCESS_DENIED: 0
             EMPTY_QUERIES: 1
    SELECT * FROM INFORMATION_SCHEMA.CLIENT_STATISTICS\G
    *************************** 1. row ***************************
                    CLIENT: localhost
         TOTAL_CONNECTIONS: 3
    CONCURRENT_CONNECTIONS: 0
            CONNECTED_TIME: 4883
                 BUSY_TIME: 0.009722
                  CPU_TIME: 0.0102131
            BYTES_RECEIVED: 841
                BYTES_SENT: 13897
      BINLOG_BYTES_WRITTEN: 0
                 ROWS_READ: 0
                 ROWS_SENT: 214
              ROWS_DELETED: 0
             ROWS_INSERTED: 207
              ROWS_UPDATED: 0
           SELECT_COMMANDS: 10
           UPDATE_COMMANDS: 0
            OTHER_COMMANDS: 13
       COMMIT_TRANSACTIONS: 0
     ROLLBACK_TRANSACTIONS: 0
        DENIED_CONNECTIONS: 0
          LOST_CONNECTIONS: 0
             ACCESS_DENIED: 0
             EMPTY_QUERIES: 1
    1 row in set (0.00 sec)
    SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE TABLE_NAME = "author";
    +--------------+------------+------------+-----------+
    | TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | ROWS_READ |
    +--------------+------------+------------+-----------+
    | books        | author     | by_name    |        15 |
    +--------------+------------+------------+-----------+
    SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS WHERE TABLE_NAME='user';
    +--------------+------------+-----------+--------------+------------------------+
    | TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
    +--------------+------------+-----------+--------------+------------------------+
    | mysql        | user       |         5 |            2 |                      2 |
    +--------------+------------+-----------+--------------+------------------------+
    SHOW USER_STATISTICS
    SHOW CLIENT_STATISTICS
    SHOW INDEX_STATISTICS
    SHOW TABLE_STATISTICS
    FLUSH USER_STATISTICS
    FLUSH CLIENT_STATISTICS
    FLUSH INDEX_STATISTICS
    FLUSH TABLE_STATISTICS

    Engine-Independent Table Statistics

    Introduction

    Before , the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:

    • Storage engines provided poor statistics (this was fixed to some degree with the introduction of Persistent Statistics).

    • The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)

    • There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

    Engine-independent table statistics lift these limitations.

    • Statistics are stored in regular tables in the mysql database.

      • it is possible for a DBA to read and update the values.

    • More data is collected/used.

    are a subset of engine-independent table statistics that can improve the query plan chosen by the optimizer in certain situations.

    Statistics are stored in three tables, , and .

    Use or update of data from these tables is controlled by variable. Possible values are listed below:

    Value
    Meaning

    Collecting Statistics with the ANALYZE TABLE Statement

    Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.

    The statement can be used to collect table statistics. However, simply running ANALYZE TABLE table_name does not collect engine-independent (or histogram) statistics by default.

    When the statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For the default storage engine, see for more information.

    may also collect engine-independent statistics for the table. The specific behavior depends on the value of the system variable. Engine-independent statistics will only be collected if one of the following is true:

    • The system variable is set to complementary or preferably.

    • The statement includes the PERSISTENT FOR clause.

    The system variable is set to preferably_for_queries by default. With this value, engine-independent statistics are used by default if available, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, then you will have to collect them by executing the statement and by specifying the PERSISTENT FOR clause. It is recommended to collect engine-independent statistics on as-needed basis, so typically one will not have engine-independent statistics for all indexes/all columns.

    When to collect statistics is very dependent on the dataset. If data changes frequently it may be necessary to collect statistics more frequently, and the benefits may be very noticeable (see ). If the data distribution is relatively static, the costs of collecting may outweigh any benefits.

    Collecting Statistics for Specific Columns or Indexes

    The syntax for the statement has been extended with the PERSISTENT FOR clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. This clause also allows one to collect engine-independent statistics, regardless of the value of the system variable. For example:

    Statistics for columns using the and data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.

    Examples of Statistics Collection

    Manual Updates to Statistics Tables

    Statistics are stored in three tables, , and .

    It is possible to update statistics tables manually. One should modify the table(s) with regular // statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue command.

    A few scenarios where one might need to update statistics tables manually:

    • Deleting the statistics. Currently, the command will collect the statistics, but there is no special command to delete statistics.

    • Running ANALYZE on a different server. To collect engine-independent statistics ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.

    • In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than ANALYZE does. One can compute the statistics manually and put it into the database.

    See Also

    • (mariadb.org blog)

    This page is licensed: CC BY-SA / Gnu FDL

    Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021 (video)

    'never'

    The optimizer doesn't use data from statistics tables.

    'complementary'

    The optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine.

    'preferably'

    Prefer the data from statistics tables, if it's not available there, use the data from the storage engine.

    'complementary_for_queries'

    Same as complementary, but for queries only (to avoid needlessly collecting for ANALYZE TABLE).

    'preferably_for_queries'

    Same as preferably, but for queries only (to avoid needlessly collecting for ANALYZE TABLE). Default.

    Histogram-based statistics
    mysql.table_stats
    mysql.column_stats
    mysql.index_stats
    use_stat_tables
    ANALYZE TABLE
    ANALYZE TABLE
    InnoDB
    InnoDB Persistent Statistics
    ANALYZE TABLE
    use_stat_tables
    use_stat_tables
    ANALYZE TABLE
    use_stat_tables
    ANALYZE TABLE
    This one trick can make MariaDB 30x faster!
    ANALYZE TABLE
    use_stat_tables
    BLOB
    TEXT
    mysql.table_stats
    mysql.column_stats
    mysql.index_stats
    INSERT
    UPDATE
    DELETE
    FLUSH TABLES
    ANALYZE TABLE
    Index Statistics
    InnoDB Persistent Statistics
    Histogram-based Statistics
    JSON histograms
    ANALYZE TABLE table_name PERSISTENT FOR ALL;
    -- update all engine-independent statistics for all columns and indexes
    ANALYZE TABLE tbl PERSISTENT FOR ALL;
    
    -- update specific columns and indexes:
    ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);
    
    -- empty lists are allowed:
    ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
    ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);
    
    -- the following will only update mysql.table_stats fields:
    ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();
    
    -- when use_stat_tables is set to 'COMPLEMENTARY' or 'PREFERABLY', 
    -- a simple ANALYZE TABLE  collects engine-independent statistics for all columns and indexes.
    SET SESSION use_stat_tables='COMPLEMENTARY';
    ANALYZE TABLE tbl;
    DBT3 Benchmark Q20
    Notes when an index cannot be used
    Notes when an index cannot be used because of type conversions
    MariaDB 10.4.3
    MariaDB 10.4.2
    MariaDB 10.4.3
    MariaDB 10.4.2
    MariaDB 10.8
    MariaDB 5.5
    MariaDB 10.4.1
    MariaDB 10.4.0
    MariaDB 10.0
    MariaDB 10.6.16
    MariaDB 10.6.16
    MariaDB 10.6.16
    10.6.16
    MariaDB 10.6.15
    MariaDB 10.6.15
    MariaDB 10.3.2
    MariaDB 10.3.1
    MariaDB 10.1.18
    MariaDB 10.1.1
    MariaDB 10.1.13
    MariaDB 10.1.1
    MariaDB 10.1.13
    MariaDB 10.1.1
    MariaDB 10.1.18
    MariaDB 10.1.1
    MariaDB 10.0