The new cost model in MariaDB 11.0
Before MariaDB 11.0, the MariaDB query optimizer used a 'basic cost' of 1 for:
- One disk access
- Fetching a key
- Fetching a row based on the rowid (= unique row identifier) from the key
There were also some smaller costs:
- filter lookup: 0.01
- Examining a where clause: 0.20
- Comparing two keys: 0.05
- Fetching a row through an index from a temporary memory table: 0.05
The above costs are reasonable for finding the best index to use. However, they were not good for deciding whether to use a table scan, index scan or range lookup, and the costs for the different engines were not properly calibrated.
New cost model
In MariaDB 11.0 we have fixed the above shortcomings by changing the basic cost for 'storage engine operations' to be 1 millisecond. This means that for most queries the query cost (LAST_QUERY_COST) should be close (or at least proportional) to the time the server is spending in the storage engine + join_cache + sorting.
Note that the user-level costs are in microseconds (as milliseconds would have so many zeros that it would be hard to compare values)!
The engine costs have also been split into smaller parts to make things more accurate.
The "disk" read cost now assumes a mid-level SSD with a read speed of 400MB/second (this can be changed by the end user by modifying OPTIMIZER_DISK_READ_COST). All engine-specific costs are visible in the information_schema.OPTIMIZER_COSTS table. The "default" costs for an engine can be found with:
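As a sanity check, the default disk read cost can be derived from the assumed disk speed. A small sketch in Python (the 400 MB/s and 4K block figures are from the text above):

```python
# Derive the default OPTIMIZER_DISK_READ_COST (in microseconds) from
# the assumed hardware: a 400 MB/s SSD reading 4096-byte blocks.
BLOCK_SIZE_BYTES = 4096
DISK_SPEED_BYTES_PER_SEC = 400 * 1000 * 1000  # 400 MB/s

seconds_per_block = BLOCK_SIZE_BYTES / DISK_SPEED_BYTES_PER_SEC
microseconds_per_block = seconds_per_block * 1_000_000

print(round(microseconds_per_block, 2))  # 10.24, matching OPTIMIZER_DISK_READ_COST
```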
```sql
select * from information_schema.optimizer_costs where engine="default"\G
*************************** 1. row ***************************
                         ENGINE: default
       OPTIMIZER_DISK_READ_COST: 10.240000
OPTIMIZER_INDEX_BLOCK_COPY_COST: 0.035600
     OPTIMIZER_KEY_COMPARE_COST: 0.011361
        OPTIMIZER_KEY_COPY_COST: 0.015685
      OPTIMIZER_KEY_LOOKUP_COST: 0.435777
   OPTIMIZER_KEY_NEXT_FIND_COST: 0.082347
      OPTIMIZER_DISK_READ_RATIO: 0.020000
        OPTIMIZER_ROW_COPY_COST: 0.060866
      OPTIMIZER_ROW_LOOKUP_COST: 0.130839
   OPTIMIZER_ROW_NEXT_FIND_COST: 0.045916
   OPTIMIZER_ROWID_COMPARE_COST: 0.002653
      OPTIMIZER_ROWID_COPY_COST: 0.002653
```
The above costs are the defaults for an engine that does not have a clustered index (like MyISAM and Aria).
An engine can tune its own costs in the storage engine interface. Here are the costs for the InnoDB storage engine:
```sql
select * from information_schema.optimizer_costs where engine="innodb"\G
*************************** 1. row ***************************
                         ENGINE: InnoDB
       OPTIMIZER_DISK_READ_COST: 10.240000
OPTIMIZER_INDEX_BLOCK_COPY_COST: 0.035600
     OPTIMIZER_KEY_COMPARE_COST: 0.011361
        OPTIMIZER_KEY_COPY_COST: 0.015685
      OPTIMIZER_KEY_LOOKUP_COST: 0.791120
   OPTIMIZER_KEY_NEXT_FIND_COST: 0.099000
      OPTIMIZER_DISK_READ_RATIO: 0.020000
        OPTIMIZER_ROW_COPY_COST: 0.060870
      OPTIMIZER_ROW_LOOKUP_COST: 0.765970
   OPTIMIZER_ROW_NEXT_FIND_COST: 0.070130
   OPTIMIZER_ROWID_COMPARE_COST: 0.002653
      OPTIMIZER_ROWID_COPY_COST: 0.002653
```
As can be seen, OPTIMIZER_ROW_LOOKUP_COST is close to OPTIMIZER_KEY_LOOKUP_COST. This is because InnoDB has clustered primary key indexes and uses the primary key to find the row from a secondary index.
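The difference is visible in the numbers from the two listings above (an illustrative sketch; the values are the defaults shown earlier):

```python
# Cost values (in microseconds) copied from the listings above.
default_engine = {"key_lookup": 0.435777, "row_lookup": 0.130839}
innodb = {"key_lookup": 0.791120, "row_lookup": 0.765970}

# For a non-clustered engine, fetching a row by rowid is much cheaper
# than a key lookup. For InnoDB, a row lookup from a secondary index is
# itself a key lookup on the clustered primary key, so the two costs
# are almost equal.
innodb_ratio = innodb["row_lookup"] / innodb["key_lookup"]
default_ratio = default_engine["row_lookup"] / default_engine["key_lookup"]

print(round(innodb_ratio, 2))   # 0.97
print(round(default_ratio, 2))  # 0.3
```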
Some engines, like MEMORY, implement their own cost functions, as different indexes in the same engine can have different costs. This is why some of the cost numbers for these engines are 0.
There are also some SQL level costs that are independent of the storage engine:
```sql
MariaDB [test]> select * from information_schema.global_variables
                where variable_name like "%where%cost%" or
                      variable_name like "%scan%cost%";
+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| OPTIMIZER_SCAN_SETUP_COST | 10.000000      |
| OPTIMIZER_WHERE_COST      | 0.032000       |
+---------------------------+----------------+
```
Description of the different cost variables
Time and cost are quite interchangeable in the new cost model. Below we will use cost for most things, except for OPTIMIZER_DISK_READ_COST, as one should use published/tested timings for the SSD/hard disk if one wants to change that value.
| Variable | Scope | Description |
|----------|-------|-------------|
| OPTIMIZER_DISK_READ_COST | Engine | Time in microseconds to read a 4K block from a disk/SSD. The default is set for a 400MB/second SSD |
| OPTIMIZER_INDEX_BLOCK_COPY_COST | Engine | Cost to lock and copy a block from the global cache to a local cache. This cost is added for every block accessed, independent of whether it is cached or not |
| OPTIMIZER_KEY_COMPARE_COST | Engine | Cost to compare two keys |
| OPTIMIZER_KEY_COPY_COST | Engine | Cost to copy a key from the index to a local buffer as part of searching for a key |
| OPTIMIZER_KEY_LOOKUP_COST | Engine | Cost to find a key entry in the index (index read) |
| OPTIMIZER_KEY_NEXT_FIND_COST | Engine | Cost to find the next key in the index (index next) |
| OPTIMIZER_DISK_READ_RATIO | Engine | The ratio of BLOCK_NOT_IN_CACHE/CACHE_READS. The cost of disk usage is calculated as OPTIMIZER_DISK_READ_COST * OPTIMIZER_DISK_READ_RATIO |
| OPTIMIZER_ROW_COPY_COST | Engine | Cost of copying a row to a local buffer. Should be slightly more than OPTIMIZER_KEY_COPY_COST |
| OPTIMIZER_ROW_LOOKUP_COST | Engine | Cost to find a row based on the rowid (the rowid is stored in the index together with the key) |
| OPTIMIZER_ROW_NEXT_FIND_COST | Engine | Cost of finding the next row |
| OPTIMIZER_ROWID_COMPARE_COST | Engine | Cost of comparing two rowids |
| OPTIMIZER_ROWID_COPY_COST | Engine | Cost of copying a rowid from the index |
| OPTIMIZER_SCAN_SETUP_COST | Session | Cost of starting a table or index scan. This has a low value to encourage the optimizer to use index lookups also for tables with very few rows |
| OPTIMIZER_WHERE_COST | Session | Cost of executing the WHERE clause for every accepted row |
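As a rough illustration of how these pieces combine (a simplified sketch, not the optimizer's exact internal formula), the cost of one key lookup that touches one index block might be approximated as:

```python
# Default engine costs (microseconds), from the table above.
KEY_LOOKUP_COST = 0.435777
INDEX_BLOCK_COPY_COST = 0.035600
DISK_READ_COST = 10.24
DISK_READ_RATIO = 0.02

# Simplified sketch: one key lookup touching one index block pays the
# lookup itself, a block copy from the global cache, and the expected
# disk cost for that block (DISK_READ_COST * DISK_READ_RATIO).
cost_us = KEY_LOOKUP_COST + INDEX_BLOCK_COPY_COST + DISK_READ_COST * DISK_READ_RATIO
print(round(cost_us, 6))  # 0.676177
```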
More information about the costs and how they were calculated can be found in the Docs/optimizer_costs.txt file in the MariaDB source distribution.
Other optimizer cost changes
- When counting disk accesses, we assume that all rows and index data are cached for the duration of the query. This is to avoid the following problem:
  - Table t1 with 1 million rows is scanned.
  - For each row we do a lookup in table t2, which has only 10 rows.
If we counted every lookup in t2, there would be 1 million lookups. In that case the optimizer would choose to use a join cache on the rows in t1 and do a table scan over t2.
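To see why this matters, here is a back-of-the-envelope comparison using the default cost numbers shown earlier (illustrative arithmetic only, not how the optimizer computes costs internally):

```python
# Rough illustration: 1 million lookups into a tiny, fully cached table t2.
DISK_READ_COST = 10.24      # microseconds per 4K block (default)
ROW_LOOKUP_COST = 0.130839  # microseconds per row lookup (default engine)
lookups = 1_000_000

# If t2 is assumed cached, the lookups stay cheap:
cached_cost_us = lookups * ROW_LOOKUP_COST

# If every lookup were charged a full disk read, the same plan would look
# ~80x worse, pushing the optimizer toward a join cache + table scan of t2:
uncached_cost_us = lookups * (ROW_LOOKUP_COST + DISK_READ_COST)

print(round(cached_cost_us))    # 130839 (~0.13 s)
print(round(uncached_cost_us))  # 10370839 (~10.4 s)
```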
- The cost of sorting (filesort) is now more accurate, which allows the optimizer to better choose between index scan and filesort for ORDER BY/GROUP BY queries.
A lot of rule-based decisions have been changed to be cost based:
- The decision to use an index (and which index) for resolving ORDER BY/GROUP BY was only partly cost based before.
- The old optimizer would limit the number of ‘expected key lookups’ to 10% of the number of rows. This would cause the optimizer to use an index to scan a big part of a table when a full table scan would be much faster. This code is now removed.
- InnoDB would limit the number of rows in a range to 50% of the total rows, which would confuse the optimizer for big ranges. The cap is now removed.
- If there was a usable filter for an index, it was sometimes used without checking the complete cost of the filter.
- ‘Aggregate distinct optimization with indexes’ is now cost based. This will change many queries from "Using index for group-by (scanning)” to “Using index for group-by”.
Other notable plan changes
- Indexes can now be used for ORDER BY/GROUP BY in subqueries (instead of filesort).
- Derived tables and queries with UNION can now create a distinct key (instead of a key with duplicates) to speed up key accesses.
- Indexes with more used key parts are preferred if the number of resulting rows is the same. For example, WHERE key_part_1 = 1 and key_part_2 < 10 will now use a RANGE over both key parts instead of doing lookups on key_part_1 only.
- For very small tables, index lookup is preferred over table scan.
- EXPLAIN does not report "Using index" for scans using a clustered primary key, as technically this is a table scan.
When do the optimizer changes matter to you
The new, improved optimizer should be able to find a better plan:
- If you are using queries with more than two tables.
- If you have indexes with a lot of identical values.
- If you are using ranges that cover more than 10% of a table.
WHERE key between 1 and 1000 -- Table has values 1-2000
- If you have complex queries where not all used columns are indexed or can be indexed.
- In this case you may need to depend on selectivity to get the right plan.
- If you are using queries mixing different storage engines.
- Like using tables from both InnoDB and Memory in the same query.
- If you have had to use FORCE INDEX to get a good plan.
- If using ANALYZE TABLE made your plans worse (or not good enough).
- If your queries have lots of derived tables (subselects).
- If you are using ORDER BY/GROUP BY that could be resolved via indexes.
All engine and “SQL level” cost variables can be changed via MariaDB startup options, in configuration files or dynamically using SQL.
In configuration files (and command line)
```ini
[mariadbd]
# Archive is using a hard disk (typical seek is 8-10 ms)
archive.OPTIMIZER_DISK_READ_COST=8000
# All other engines are using an SSD.
OPTIMIZER_DISK_READ_COST=10.240000
```
Dynamically using SQL
```sql
# Tell optimizer to find a plan with as few accepted rows as possible
SET SESSION OPTIMIZER_WHERE_COST=1.0;
# Inform the optimizer that InnoDB buffer pool has a 80% hit rate
SET GLOBAL innodb.OPTIMIZER_DISK_READ_RATIO=0.20;
```
- Note that engine costs are GLOBAL while other costs can also be set per SESSION.
- To keep things fast, engine-specific costs are stored in the table definition (TABLE_SHARE). One effect of this is that if one changes the cost for an engine, it will only take effect for new, not previously cached, tables. You can use FLUSH TABLES to force a table to use the new costs at its next access.
Examples of changing the costs
- OPTIMIZER_WHERE_COST is added as a cost for every 'accepted row'. Increasing this variable will cause the optimizer to choose plans with fewer estimated rows.
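The WHERE-clause cost of a plan scales linearly with this variable (illustrative arithmetic using the default value shown earlier):

```python
# WHERE-clause cost contribution = accepted_rows * OPTIMIZER_WHERE_COST
accepted_rows = 1_000_000

default_where_cost = 0.032  # microseconds per row (default)
raised_where_cost = 1.0     # as in SET SESSION OPTIMIZER_WHERE_COST=1.0

default_cost_us = accepted_rows * default_where_cost
raised_cost_us = accepted_rows * raised_where_cost

print(round(default_cost_us))  # 32000
print(round(raised_cost_us))   # 1000000
```

With the raised value, any plan that accepts many rows becomes much more expensive relative to plans that filter earlier.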
- One can specify the kind of disk used by the system by changing OPTIMIZER_DISK_READ_COST. This should be the time to do a random read of a 4096-byte block.
- The cost of a potential disk read is calculated as OPTIMIZER_DISK_READ_COST * OPTIMIZER_DISK_READ_RATIO. Increasing OPTIMIZER_DISK_READ_RATIO informs the optimizer that not all data is cached.
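For example, with the default values, and with a ratio derived from an 80% buffer pool hit rate (a sketch; treating the ratio as the cache miss rate is an assumption for illustration):

```python
DISK_READ_COST = 10.24  # microseconds per 4K block (default)

# Expected disk cost charged per accessed block:
default_ratio = 0.02                          # default OPTIMIZER_DISK_READ_RATIO
per_block_default = DISK_READ_COST * default_ratio   # ~0.2048 us per block

# An 80% buffer pool hit rate suggests a miss ratio of 0.20:
ratio_80_pct_hits = 1.0 - 0.80
per_block_80 = DISK_READ_COST * ratio_80_pct_hits    # ~2.048 us per block

print(per_block_default, per_block_80)
```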
- OPTIMIZER_SCAN_SETUP_COST will increase the cost of a table scan. One can increase this to avoid using table scans.
For storage engine developers
The costs for an engine are set in the following way when the engine plugin is loaded/initialized:
- Copy the "default" storage engine costs to the plugin engine costs. handlerton->costs points to the cost data.
- Call handlerton->update_optimizer_costs() to let the storage engine update the costs.
- Apply all user-changed costs (from configuration files/startup) to the engine costs.
- When a TABLE_SHARE is created, the costs are copied from handlerton->costs to the TABLE_SHARE. This is done to avoid taking any "cost" mutex while running queries.
- User changes to engine costs are stored in the data pointed to by handlerton->costs. This is why a FLUSH TABLES is needed to activate new engine costs.
- To speed up cost access for the optimizer, handler::set_optimizer_costs() is called for each query to copy OPTIMIZER_WHERE_COST and OPTIMIZER_SCAN_SETUP_COST to the engine cost structure.