Optimizer statistics in MyRocks
This article describes how MyRocks storage engine provides statistics to the query optimizer.
There are three kinds of statistics:
- Table statistics (number of rows in the table, average row size)
- Index cardinality (how distinct values are in the index)
- records-in-range estimates (how many rows are in a certain range "const1 < tbl.key < const2".
How MyRocks computes statistics
MyRocks (actually RocksDB) uses LSM files which are written once and never updated. When an LSM file is written, MyRocks will compute index cardinalities and number-of-rows for the data in the file. (The file generally has rows, index records and/or tombstones for multiple tables/indexes).
For performance reasons, statistics are computed based on a fraction of rows in the LSM file. The percentage of rows used is controlled by @@rocksdb_table_stats_sampling_pct
, the default value is 10%.
Before the data is dumped into LSM file, it is stored in the MemTable. MemTable doesn't allow to compute index cardinalities, but it can provide an approximate number of rows in the table. Use of MemTable data for statistics is controlled by @@rocksdb_force_compute_memtable_stats
, the default value is ON
.
Is index statistics predictable
Those who create/run MTR tests, need to know whether EXPLAIN output is deterministic. For MyRocks tables, the answer is NO (just like for InnoDB).
Statistics are computed using sampling and GetApproximateMemTableStats() which means that #rows column in EXPLAIN output may vary slightly.
Records-in-range estimates
MyRocks uses RocksDB's GetApproximateSizes() call to produce an estimate for the number of rows in the certain range. The data in MemTable is also taken into account by issuing a GetApproximateMemTableStats call.
Debugging helper variables
There are a few variables that will cause MyRocks to report certain pre-defined estimate numbers to the optimizer:
- @@ocksdb_records_in_range - if not 0, report that any range has this many rows
- @@rocksdb_force_index_records_in_range - if not 0, and FORCE INDEX hint is used, report that any range has this many rows.
- @@rocksdb_debug_optimizer_n_rows - if not 0, report that any MyRocks table has this many rows.