All pages
Powered by GitBook
1 of 1

Loading...

Histogram-Based Statistics

Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.

Histogram statistics are stored in the mysql.column_stats table, which stores data for engine-independent table statistics, and so are essentially a subset of engine-independent table statistics.

Histograms are used by default from if they are available. However, histogram statistics are not automatically collected, as collection is expensive, requiring a full table scan. See Collecting Statistics with the ANALYZE TABLE Statement for details.

Consider this example, using the following query:

Let's assume that

  • table t1 contains 100 records

  • table t2 contains 1000 records

  • there is a primary index on t1(a)

  • there is a secondary index on t2(a)

  • there is no index defined on column t2.b

  • the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Before histograms were introduced, the optimizer would choose the plan that:

  • accesses t1 using a table scan

  • accesses t2 using index t2(a)

  • checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and performs 100 index look-ups.

With histograms available, the optimizer can choose the following, more efficient plan:

  • accesses table t2 in a table scan

  • checks the condition t2.b BETWEEN 1 AND 3

  • accesses t1 using index t1(a)

This plan also examine all rows from t2, but it performs only 10 look-ups to access 10 rows of table t1.

System Variables

There are a number of system variables that affect histograms.

histogram_size

The variable determines the size, in bytes, from 0 to 255, used for a histogram. This is effectively the number of bins for histogram_type=SINGLE_PREC_HB or number of bins/2 for histogram_type=DOUBLE_PREC_HB. If it is set to 0 (the default for and below), no histograms are created when running an .

histogram_type

The variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. From , double precision is the default. For and below, single precision is the default.

From , JSON_HB, JSON-format histograms, are accepted.

optimizer_use_condition_selectivity

The controls which statistics can be used by the optimizer when looking for the best query execution plan.

  • 1 Use selectivity of predicates as in .

  • 2 Use selectivity of all range predicates supported by indexes.

  • 3 Use selectivity of all range predicates estimated without histogram.

From , the default is 4. Until , the default is 1.

Example

Here is an example of the dramatic impact histogram-based statistics can make. The query is based on with 60 million records in the lineitem table.

First,

Next, a really bad plan, yet one sometimes chosen:

don't improve matters:

The default flags for do not help much:

Using statistics doesn't help either:

Now, taking into account the cost of the dependent subquery:

Finally, using as well:

See Also

This page is licensed: CC BY-SA / Gnu FDL

SELECT * FROM t1,t2 WHERE t1.a=t2.a AND t2.b BETWEEN 1 AND 3;
4 Use selectivity of all range predicates estimated with histogram.
  • 5 Additionally use selectivity of certain non-range predicates calculated on record sample.

  • JSON Histograms (mariadb.org blog)
  • Improved histograms in MariaDB 10.8 - Sergei Petrunia - FOSDEM 2022 (video)

  • Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021 (video)

  • histogram_size
    ANALYZE TABLE
    histogram_type
    optimizer_use_condition_selectivity
    Persistent statistics
    optimizer_switch
    join_buffer
    DECODE_HISTOGRAM()
    Index Statistics
    InnoDB Persistent Statistics
    Engine-independent Statistics
    SELECT SQL_CALC_FOUND_ROWS s_name, s_address FROM 
    supplier, nation WHERE 
      s_suppkey IN
        (SELECT ps_suppkey FROM partsupp WHERE
          ps_partkey IN (SELECT p_partkey FROM part WHERE 
             p_name LIKE 'forest%') AND 
        ps_availqty > 
          (SELECT 0.5 * SUM(l_quantity) FROM lineitem WHERE
            l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND
            l_shipdate >= DATE('1994-01-01') AND
            l_shipdate < DATE('1994-01-01') + INTERVAL '1' YEAR ))
      AND s_nationkey = n_nationkey
      AND n_name = 'CANADA'
      ORDER BY s_name
      LIMIT 10;
    SET optimizer_switch='materialization=off,semijoin=off';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25   |100.00 | Using where;...
    | 1 | PRIMARY | supplier | ref   |...| 1447 |100.00 | Using where; Subq
    | 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (51.78 sec)
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|100381|100.00 | Using where; Subq
    | 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
    | 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (7 min 33.42 sec)
    SET use_stat_tables='preferably';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|10000 |100.00 | Using where;
    | 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
    | 2 | DEP SUBQ| partsupp | idxsq |...| 80   |100.00 | Using where
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    |100.00 | Using where
    | 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (7 min 40.44 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows  | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | supplier | ALL   |...|10000  |100.00 | Using where;
    | 1 | PRIMARY | nation   | ref   |...| 1     |100.00 | Using where
    | 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
    | 2 | MATER   | part     | ALL   |.. |2000000|100.00 | Using where
    | 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
    | 4 | DEP SUBQ| lineitem | ref   |...| 7     |100.00 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (5 min 21.44 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows  | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25    |4.00   | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000  |100.00 | Using where;
    | 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
    | 2 | MATER   | part     | ALL   |.. |2000000|1.56   | Using where
    | 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
    | 4 | DEP SUBQ| lineitem | ref   |...| 7     | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (5 min 22.41 sec)
    SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    SET optimizer_switch='expensive_pred_static_pushdown=ON';
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt  | Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | ALL   |...| 25   | 4.00  | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | Using where;
    | 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 |
    | 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where; Subq; FM
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (49.89 sec)
    SET optimizer_switch= 'materialization=DEFAULT,semijoin=DEFAULT';
    SET optimizer_use_condition_selectivity=4;
    SET optimizer_switch='expensive_pred_static_pushdown=ON';
    SET join_cache_level=6;
    SET optimizer_switch='mrr=ON';
    SET optimizer_switch='mrr_sort_keys=ON';
    SET join_buffer_size=1024*1024*16;
    SET join_buffer_space_limit=1024*1024*32;
    +---+-------- +----------+-------+...+------+----------+------------
    | id| sel_type| table    | type  |...| rows | filt |  Extra
    +---+-------- +----------+-------+...+------+----------+------------
    | 1 | PRIMARY | nation   | AL  L |...| 25   | 4.00  | Using where
    | 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | where; BKA
    | 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 | BKA
    | 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where Sq; FM; BKA
    | 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
    +---+-------- +----------+-------+...+------+----------+------------
    
    10 ROWS IN SET
    (35.71 sec)
    DBT3 Benchmark Q20
    MariaDB 10.4.3
    MariaDB 10.4.2
    MariaDB 10.4.3
    MariaDB 10.4.2
    MariaDB 10.8
    MariaDB 5.5
    MariaDB 10.4.1
    MariaDB 10.4.0