HA & Performance

Optimize MariaDB Server for high availability and performance. Learn about replication, clustering, load balancing, and configuration tuning for robust and efficient database solutions.

Optimization and Tuning

Optimize MariaDB Server for high availability and performance. Learn about replication, clustering, load balancing, and configuration tuning for robust and efficient database solutions.

Buffers, Caches and Threads

Covers essential configurations to maximize throughput and responsiveness for your database workloads.

MariaDB Internal Optimizations

Delves into how the database engine enhances query execution, data storage, and overall performance through its core architecture.

Operating System Optimizations

Covers configuring your OS for improved I/O, memory management, and network settings to maximize database efficiency.

Optimization and Indexes

Covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.

Optimizer Hints

Optimizer hints are options available that affect the execution plan.

Compression

Details how to apply data compression at various levels to reduce disk space and improve I/O efficiency.

Optimizing Data Structure

Covers schema design, data types, and normalization techniques to improve query efficiency and storage utilization.

Optimizing Tables

Covers various techniques, including proper indexing, data types, and storage engine choices, to improve query speed and efficiency.

Optimizing Queries

Provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up your queries.

System and Status Variables

Optimize MariaDB Server with system variables, configuring various parameters to fine-tune performance, manage resources, and adapt the database to your specific workload requirements.

Buffers, Caches and Threads

Optimize MariaDB Server performance by tuning buffers, caches, and threads. This section covers essential configurations to maximize throughput and responsiveness for your database workloads.

Thread Pool

Optimize MariaDB Server with the thread pool. This section explains how to manage connections and improve performance by efficiently handling concurrent client requests, reducing resource overhead.

Thread States

Understand MariaDB Server thread states. This section explains the different states a thread can be in, helping you monitor and troubleshoot query execution and server performance.

MariaDB Internal Optimizations

Explore MariaDB Server's internal optimizations. This section delves into how the database engine enhances query execution, data storage, and overall performance through its core architecture.

Optimization and Indexes

Optimize MariaDB Server queries with indexes. This section covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.

Full-Text Indexes

Implement full-text indexes in MariaDB Server for efficient text search. This section guides you through creating and utilizing these indexes to optimize queries on large text datasets.

Compression

Optimize MariaDB Server performance and storage with compression. This section details how to apply data compression at various levels to reduce disk space and improve I/O efficiency.

Optimizing Data Structure

Optimize MariaDB Server performance by refining your data structure. This section covers schema design, data types, and normalization techniques to improve query efficiency and storage utilization.

Optimizations for Derived Tables

Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.

Optimizing Tables

Optimize tables for enhanced performance. This section covers various techniques, including proper indexing, data types, and storage engine choices, to improve query speed and efficiency.

Numeric vs String Fields

A large numeric value is stored in far fewer bytes than the equivalent string value. It is therefore faster to move and compare numeric data, so it's best to choose numeric columns for unique id's and other similar fields.

This page is licensed: CC BY-SA / Gnu FDL

Optimizing String and Character Fields

Comparing String Columns

When values from different columns are compared, the comparison runs more quickly when the columns are of the same character set and collation. If they are different, the strings need to be converted while the query runs. So, where possible, declare string columns using the same character set and collation when you may need to compare them.
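For instance, two columns intended to be compared could both be declared with the same character set and collation (table and column names here are illustrative):

CREATE TABLE t1 (name VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci);
CREATE TABLE t2 (name VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci);
-- Comparing t1.name with t2.name now requires no character set conversion.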

VARCHAR vs BLOB

ORDER BY and GROUP BY clauses can generate temporary tables in memory (see ) if the original table doesn't contain any BLOB fields. If a column is less than 8KB, you can make use of a Binary VARCHAR rather than a BLOB.

This page is licensed: CC BY-SA / Gnu FDL

Event Scheduler Thread States

This article documents thread states that are related to scheduling and execution. These include the Event Scheduler thread, threads that terminate the Event Scheduler, and threads for executing events.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Query Cache Thread States

This article documents thread states that are related to the Query Cache. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Replica Connection Thread States

This article documents thread states that are related to connection threads that occur on a replication replica. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Optimizing MEMORY Tables

MEMORY tables are a good choice for data that needs to be accessed often, and is rarely updated. Being in memory, they are not suitable for critical data or for persistent storage, but if data can be moved to memory for reading without needing to be regenerated often, if at all, they can provide a significant performance boost.

The MEMORY storage engine has a key feature in that it permits its indexes to be either B-tree or Hash. Choosing the best index type can lead to better performance. See Storage Engine Index Types for more on the characteristics of each index type.

This page is licensed: CC BY-SA / Gnu FDL

Optimizing Queries

Optimize queries for peak performance. This section provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up your queries.

Optimization Strategies

Discover effective optimization strategies for MariaDB Server queries. This section provides a variety of techniques and approaches to enhance query performance and overall database efficiency.

Operating System Optimizations

Optimize MariaDB Server performance with operating system tuning. This section covers configuring your OS for improved I/O, memory management, and network settings to maximize database efficiency.

Waiting for next activation

The event queue contains items, but the next activation is at some time in the future.

Waiting for scheduler to stop

Waiting for the event scheduler to stop after issuing SET GLOBAL event_scheduler=OFF.

Waiting on empty queue

Sleeping, as the event scheduler's queue is empty.

This page is licensed: CC BY-SA / Gnu FDL

Clearing

Thread is terminating.

Initialized


Thread has been initialized.

sending cached result to client

A result found in the query cache is being sent to the client.

storing result in query cache

Saving the result of a query into the query cache.

Waiting for query cache lock

Waiting to take a query cache lock.

This page is licensed: CC BY-SA / Gnu FDL

checking privileges on cached query

Checking whether the user has permission to access a result in the query cache.

checking query cache for query

Checking whether the current query exists in the query cache.

invalidating query cache entries


Marking query cache entries as invalid as the underlying tables have changed.

Reading master dump table data

After the table created by a master dump has been opened (the Opening master dump table state), the table is now being read.

Rebuilding the index on master dump table

After the table created by a master dump has been opened and read (the Reading master dump table data state), the index is built.

This page is licensed: CC BY-SA / Gnu FDL

Changing master

Processing a CHANGE MASTER TO statement.

Killing slave

Processing a STOP SLAVE statement.

Opening master dump table


A table has been created from a master dump and is now being opened.


Delayed Insert Handler Thread States

This article documents thread states that are related to the handler thread that inserts the results of INSERT DELAYED statements.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

insert

About to insert rows into the table.

reschedule

Sleeping in order to let other threads function, after inserting a number of rows into the table.

This page is licensed: CC BY-SA / Gnu FDL

Master Thread States

This article documents thread states that are related to replication master threads. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Finished reading one binlog; switching to next binlog

After completing one binary log, the next is being opened for sending to the slave.

Master has sent all binlog to slave; waiting for binlog to be updated

All events have been read from the binary logs and sent to the slave. Now waiting for the binary log to be updated with new events.

Sending binlog event to slave

This page is licensed: CC BY-SA / Gnu FDL

Replica SQL Thread States

This article documents thread states that are related to replication slave SQL threads. These correspond to the Slave_SQL_State shown by SHOW SLAVE STATUS as well as the STATE values listed by the SHOW PROCESSLIST statement and the Information Schema PROCESSLIST as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Apply log event

Log event is being applied.

Making temp file

Creating a temporary file containing the row data as part of a LOAD DATA INFILE statement.

This page is licensed: CC BY-SA / Gnu FDL

Filesystem Optimizations

Suitability of Filesystems

The filesystem is not the most important aspect of MariaDB performance. More important are the available memory (RAM), the drive speed, and the system variable settings (see Hardware Optimization and System Variables).

Optimizing the filesystem can, however, make a noticeable difference in some cases. Among the best suited Linux filesystems are ext4, XFS and Btrfs. They are all included in the mainline Linux kernel and are widely supported and available on most Linux distributions.

The following theoretical file size and filesystem size limits apply to those filesystems:

Limit                 ext4        XFS     Btrfs
Max file size         16-256 TB   8 EB    16 EB
Max filesystem size   1 EB        8 EB    16 EB

Each has unique characteristics that are worth understanding to get the most from their usage.

Disabling Access Time

It's unlikely you'll need to record file access time on a database server, and mounting your filesystem with this disabled can give an easy improvement in performance. To do so, use the noatime option.

If you want to keep access time for log files or other system files, these can be stored on a separate drive.

Using NFS

Generally, we recommend not to use NFS (Network File System) with MariaDB, for these reasons:

  • MariaDB data and log files on NFS volumes can become locked and unavailable for use. Locking issues may occur in cases where multiple instances of MariaDB access the same data directory, or when MariaDB is shut down improperly, for instance, due to a power outage. In particular, sharing a data directory among MariaDB instances is not recommended.

  • Data inconsistencies due to messages received out of order or lost network traffic. To avoid this issue, use TCP with hard and intr mount options.

Using NFS within a professional SAN environment or other storage system tends to offer greater reliability than using NFS outside of such an environment. However, NFS within a SAN environment may be slower than directly attached or bus-attached non-rotational storage.

This page is licensed: CC BY-SA / Gnu FDL

Primary Keys with Nullable Columns

MariaDB deals with primary keys over nullable columns according to the SQL standards.

Take the following table structure:

CREATE TABLE t1(
  c1 INT NOT NULL AUTO_INCREMENT, 
  c2 INT NULL DEFAULT NULL, 
  PRIMARY KEY(c1,c2)
);

Column c2 is part of a primary key, and thus it cannot be NULL.

Before MariaDB 10.1.7, MariaDB (as well as versions of MySQL before MySQL 5.7) would silently convert it into a NOT NULL column with a default value of 0.

Since MariaDB 10.1.7, the column is converted to NOT NULL, but without a default value. If we then attempt to insert a record without explicitly setting c2, a warning (or, in strict mode, an error) will be thrown, for example:

MySQL, since 5.7, will abort such a CREATE TABLE with an error.

The behavior adheres to the SQL 2003 standard.

SQL-2003, Part II, “Foundation” says:

11.7 Syntax Rules

…

5) If the specifies PRIMARY KEY, then for each in the explicit or implicit for which NOT NULL is not specified, NOT NULL is implicit in the .

Essentially this means that all PRIMARY KEY columns are automatically converted to NOT NULL. Furthermore:

11.5 General Rules

…

3) When a site S is set to its default value,

…

b) If the data descriptor for the site includes a , then S is set to the value specified by that .

…

e) Otherwise, S is set to the null value.

There is no concept of “no default value” in the standard. Instead, a column always has an implicit default value of NULL. On insertion it might however fail the NOT NULL constraint. MariaDB and MySQL instead mark such a column as “not having a default value”. The end result is the same — a value must be specified explicitly or an INSERT will fail.

MariaDB since 10.1.7 behaves in a standard compatible manner — being part of a PRIMARY KEY, the nullable column gets an automatic NOT NULL constraint, on insertion one must specify a value for such a column. MariaDB before 10.1.7 was automatically assigning a default value of 0 — this behavior was non-standard. Issuing an error at CREATE TABLE time is also non-standard.

See Also

  • describes an edge-case that may result in replication problems when replicating from a master server before this change to a slave server after this change.

This page is licensed: CC BY-SA / Gnu FDL

SELECT Modifier Hints

HIGH_PRIORITY

HIGH_PRIORITY gives the statement a higher priority. If the table is locked, high priority SELECTs will be executed as soon as the lock is released, even if other statements are queued. HIGH_PRIORITY applies only to storage engines that use table-level locking (MyISAM, MEMORY, MERGE). See HIGH_PRIORITY and LOW_PRIORITY clauses for details.

SQL_CACHE / SQL_NO_CACHE

If the query_cache_type system variable is set to 2 or DEMAND, and the current statement is cacheable, SQL_CACHE causes the query to be cached and SQL_NO_CACHE causes the query not to be cached. For UNIONs, SQL_CACHE or SQL_NO_CACHE should be specified for the first query. See also The Query Cache for more detail and a list of the types of statements that aren't cacheable.

SQL_BUFFER_RESULT

SQL_BUFFER_RESULT forces the optimizer to use a temporary table to process the result. This is useful to free locks as soon as possible.

SQL_SMALL_RESULT / SQL_BIG_RESULT

SQL_SMALL_RESULT and SQL_BIG_RESULT tell the optimizer whether the result is very big or not. Usually, GROUP BY and DISTINCT operations are performed using a temporary table. Only if the result is very big, using a temporary table is not convenient. The optimizer automatically knows if the result is too big, but you can force the optimizer to use a temporary table with SQL_SMALL_RESULT, or avoid the temporary table using SQL_BIG_RESULT.

STRAIGHT_JOIN

STRAIGHT_JOIN applies to JOIN queries, and tells the optimizer that the tables must be read in the order they appear in the SELECT. For const and system tables this option is sometimes ignored.

SQL_CALC_FOUND_ROWS

SQL_CALC_FOUND_ROWS is only applied when using the LIMIT clause. If this option is used, MariaDB will count how many rows would match the query without the LIMIT clause. That number can be retrieved in the next query, using FOUND_ROWS().
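For instance (illustrative table and column names):

SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE col1 > 10 LIMIT 10;
-- Retrieve the number of rows the previous SELECT would have matched without the LIMIT:
SELECT FOUND_ROWS();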

USE/FORCE/IGNORE INDEX

USE INDEX, FORCE INDEX and IGNORE INDEX constrain the query planning to a specific index. For further information about some of these options, see Index Hints: How to Force Query Plans.

FORCE INDEX

Description

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

FORCE INDEX cannot force an ignored index to be used - it will be treated as if it doesn't exist.

Example

This produces:

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

See Also

  • Index Hints: How to Force Query Plans, for more details

This page is licensed: CC BY-SA / Gnu FDL

USE INDEX

You can limit which indexes are considered with the USE INDEX option.

Syntax

USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

Description

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

USE INDEX is used after the table name in the FROM clause.

USE INDEX cannot use an ignored index - it will be treated as if it doesn't exist.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

This produces:

If we had not used USE INDEX, the Name index would have been in possible keys.

See Also

  • Index Hints: How to Force Query Plans, for more details

This page is licensed: CC BY-SA / Gnu FDL

IGNORE INDEX

Syntax

IGNORE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

Description

You can tell the optimizer to not consider a particular index with the IGNORE INDEX option.

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

This is used after the table name in the FROM clause:

This produces:

See Also

  • See Index Hints: How to Force Query Plans for more details

This page is licensed: CC BY-SA / Gnu FDL

Thread Pool in MariaDB 5.1 - 5.3

This page describes the old thread pool implementation in MariaDB up to version 5.3.

It's left here because some older material refers to it.

For the current implementation, refer to the Thread Pool in MariaDB page.

Delayed Insert Connection Thread States

This article documents thread states that are related to the connection thread that processes INSERT DELAYED statements.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Fair Choice Between Range and Index_merge Optimizations

index_merge is a method used by the optimizer to retrieve rows from a single table using several index scans. The results of the scans are then merged.

When using , if index_merge is the plan chosen by the optimizer, it will show up in the "type" column. For example:

The "rows" column gives us a way to compare efficiency between index_merge and other plans.

It is sometimes necessary to discard index_merge in favor of a different plan to avoid a combinatorial explosion of possible range and/or index_merge strategies. But, the old logic in MySQL for when index_merge was rejected caused some good index_merge plans to not even be considered. Specifically, additional AND predicates in WHERE clauses could cause an index_merge plan to be rejected in favor of a less efficient plan. The slowdown could be anywhere from 10x to over 100x. Here are two examples (based on the previous query) using MySQL:

Improvements to ORDER BY Optimization

Available tuning for ORDER BY with small LIMIT

  • In 2024, the fix for MDEV-34720 was added to MariaDB starting from 10.6. It allows one to enable an extra optimization for ORDER BY with a small LIMIT; see the optimizer_join_limit_pref_ratio optimization.

LooseScan Strategy

LooseScan is an execution strategy for semi-join subqueries.

The idea

We will demonstrate the LooseScan strategy by example. Suppose, we're looking for countries that have satellites. We can get them using the following query (for the sake of simplicity we ignore satellites that are owned by consortiums of multiple countries):

Suppose, there is an index on Satellite.country_code. If we use that index, we will get satellites in the order of their owner country:

index_merge sort_intersection

Previously, the index_merge access method supported the union, sort-union, and intersection operations. Now the sort-intersection operation is also supported. This allows the use of index_merge in a broader number of cases.

This feature is disabled by default. To enable it, turn on the optimizer switch index_merge_sort_intersection like so:
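The following enables it for the current session (SET GLOBAL can be used to change the default for new connections):

SET optimizer_switch='index_merge_sort_intersection=on';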

Storage Engine Index Types

This refers to the index_type definition when creating an index, i.e. BTREE, HASH or RTREE.

For more information on general types of indexes, such as primary keys, unique indexes etc, go to Getting Started with Indexes.

Storage Engine    Permitted Indexes
Aria              BTREE, RTREE
MyISAM            BTREE, RTREE
InnoDB            BTREE
MEMORY/HEAP       HASH, BTREE

Compression Plugins

MariaDB starting with 10.7

Compression plugins were added in a MariaDB 10.7 preview release.

The various MariaDB storage engines, such as InnoDB, RocksDB, and Mroonga, can use different compression libraries.

Before MariaDB 10.7, each separate library would have to be compiled in order to be available for use, resulting in numerous runtime/rpm/deb dependencies, most of which would never be used by users.

From MariaDB 10.7, five additional MariaDB compression libraries (besides the default zlib) are available as plugins (note that these affect InnoDB and Mroonga only; RocksDB still uses the compression algorithms from its own library):

  • bzip2

DISTINCT removal in aggregate functions

Basics

One can use the DISTINCT keyword to de-duplicate the arguments of an aggregate function. For example:
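With illustrative table and column names:

SELECT COUNT(DISTINCT col1) FROM t1;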

In order to compute this, MariaDB has to collect the values of col1 and remove the duplicates. This may be computationally expensive.

After the fix for MDEV-30660 (available from MariaDB 10.6.18, MariaDB 10.11.8, MariaDB 11.4.2, and later releases), the optimizer can detect certain cases when the argument of an aggregate function will not have duplicates and so de-duplication can be skipped.

hash_join_cardinality optimizer_switch Flag

MariaDB starting with

The hash_join_cardinality optimizer_switch flag was added in MariaDB 10.6.13, MariaDB 10.11.3, and later releases.

In MySQL and MariaDB, the output cardinality of a part of query has historically been tied to the used access method(s). This is different from the approach used in database textbooks. There, the cardinality "x JOIN y" is the same regardless of which access methods are used to compute it.

Example

Consider a query joining customers with their orders:

Sargable DATE and YEAR

Conditions in the form {DATE|YEAR}(indexed_date_col) CMP const_value are sargable, provided that

  • CMP is any of =, <=>, <, <=, >, >= .

INSERT INTO t1() VALUES();
Query OK, 1 row affected, 1 warning (0.00 sec)
Warning (Code 1364): Field 'c2' doesn't have a default value

SELECT * FROM t1;
+----+----+
| c1 | c2 |
+----+----+
|  1 |  0 |
+----+----+

Split Materialized Optimization

This is another name for Lateral Derived Optimization.

This page is licensed: CC BY-SA / Gnu FDL

upgrading lock

Attempting to get lock on the table in order to insert rows.

Waiting for INSERT

Waiting for the delayed-insert connection thread to add rows to the queue.

An event has been read from the binary log, and is now being sent to the slave.

Waiting to finalize termination

State that only occurs very briefly while the thread is terminating.


Reading event from the relay log

Reading an event from the relay log in order to process the event.

Slave has read all relay log, waiting for the slave I/O thread to update it

All relay log events have been processed, now waiting for the I/O thread to write new events to the relay log.

Waiting for work from SQL thread

In parallel replication the worker thread is waiting for more things from the SQL thread.

Waiting for prior transaction to start commit before starting next transaction

In parallel replication the worker thread is waiting for conflicting things to end before starting executing.

Waiting for worker threads to be idle

Happens in parallel replication when moving to a new binary log after a master restart. All slave temporary files are deleted, and worker threads are restarted.

Waiting due to global read lock

In parallel replication when worker threads are waiting for a global read lock to be released.

Waiting for worker threads to pause for global read lock

FLUSH TABLES WITH READ LOCK is waiting for worker threads to finish what they are doing.

Waiting while replication worker thread pool is busy

Happens in parallel replication during a FLUSH TABLES WITH READ LOCK or when changing number of parallel workers.

Waiting for other master connection to process GTID received on multiple master connections

A worker thread noticed that there is already another thread executing the same GTID from another connection and it's waiting for the other to complete.

Waiting for slave mutex on exit

Thread is stopping. Only occurs very briefly.

Waiting for the next event in relay log

State before reading next event from the relay log.


got handler lock

Lock to access the delayed-insert handler thread has been received. Follows from the waiting for handler lock state and before the allocating local table state.

got old table

The initialization phase is over. Follows from the waiting for handler open state.

storing row into queue

Adding new row to the list of rows to be inserted by the delayed-insert handler thread.

waiting for delay_list

Initializing (trying to find the delayed-insert handler thread).

waiting for handler insert

Waiting for new inserts, as all inserts have been processed.

waiting for handler lock

Waiting for delayed insert-handler lock to access the delayed-insert handler thread.

waiting for handler open

Waiting for the delayed-insert handler thread to initialize. Follows from the Creating delayed handler state and before the got old table state.

This page is licensed: CC BY-SA / Gnu FDL

allocating local table

Preparing to allocate rows to the delayed-insert handler thread. Follows from the got handler lock state.

Creating delayed handler

Creating a handler for the delayed-inserts.


BTREE is generally the default index type. For MEMORY tables, HASH is the default. TokuDB uses a particular data structure called fractal trees, which is optimized for data that do not entirely fit memory.

Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.

B-tree Indexes

B-tree indexes are used for column comparisons using the >, >=, =, <=, < or BETWEEN operators, as well as for LIKE comparisons that begin with a constant.

For example, the query SELECT * FROM Employees WHERE First_Name LIKE 'Maria%'; can make use of a B-tree index, while SELECT * FROM Employees WHERE First_Name LIKE '%aria'; cannot.

B-tree indexes also permit leftmost prefixing for searching of rows.

If the number of rows doesn't change, hash indexes occupy a fixed amount of memory, which is lower than the memory occupied by BTREE indexes.

Hash Indexes

Hash indexes, in contrast, can only be used for equality comparisons, so those using the = or <=> operators. They cannot be used for ordering, and provide no information to the optimizer on how many rows exist between two values.

Hash indexes do not permit leftmost prefixing - only the whole index can be used.

R-tree Indexes

See SPATIAL for more information.

This page is licensed: CC BY-SA / Gnu FDL


CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	range	Name	Name	35	NULL	4079	Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where
About Pool of Threads

This is an extended version of the pool-of-threads code from MySQL 6.0. This allows you to use a limited set of threads to handle all queries, instead of the old 'one-thread-per-connection' style. In recent times, it's also been referred to as "thread pool" or "thread pooling", as this feature (in a different implementation) is available in Enterprise editions of MySQL (not in the Community edition).

This can be a very big win if most of your queries are short running queries and there are few table/row locks in your system.

Instructions

To enable pool-of-threads you must first run configure with the --with-libevent option. (This is automatically done if you use any 'max' scripts in the BUILD directory):

When starting mysqld with the pool of threads code you should use:

Default values are:

One issue with pool-of-threads is that if all worker threads are doing work (like running long queries) or are locked by a row/table lock no new connections can be established and you can't login and find out what's wrong or login and kill queries.

To help this, we have introduced two new options for mysqld; extra_port and extra_max_connections:

If extra-port is <> 0, then you can connect max_connections normal connections plus 1 extra SUPER user through the 'extra-port' TCP/IP port. These connections use the old one-thread-per-connection method.

To connect through the extra port, use:

This allows you to freely choose, on a per-connection basis, the optimal connection/thread model.

See also

  • Thread-handling and thread-pool-size variables

  • How MySQL Uses Threads for Client Connections

This page is licensed: CC BY-SA / Gnu FDL

In the above output, the "rows" column shows that the first is almost 10x less efficient and the second is over 15x less efficient than index_merge.

The optimizer now delays discarding potential index_merge plans until the point where it is really necessary.

By not discarding potential index_merge plans until absolutely necessary, the two queries stay just as efficient as the original:

This new behavior is always on and there is no need to enable it. There are no known issues or gotchas with this new optimization.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB [ontime]> SELECT COUNT(*) FROM ontime;
+--------+
|count(*)|
+--------+
| 1578171|
+--------+

MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
|id|select_type|table |type       |possible_keys|key        |key_len|ref |rows |Extra                                 |
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
| 1|SIMPLE     |ontime|index_merge|Origin,Dest  |Origin,Dest|6,6    |NULL|92800|Using union (Origin,Dest); Using where|
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
Older optimizations

MariaDB 10.1 brought several improvements to the ORDER BY optimizer.

The fixes were made as a response to complaints by MariaDB customers, so they fix real-world optimization problems. The fixes are a bit hard to describe (as the ORDER BY optimizer is complicated), but here's a short description:

The ORDER BY optimizer:

  • Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402).

    • This also fixes MySQL Bug#12113.

  • Always uses “range” access (and not a full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657).

  • Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (, ).

    • This change also fixes .

  • Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped ().

Extra optimizations

  • The ORDER BY optimizer takes multiple-equalities into account (MDEV-8989). This optimization is enabled by default.

Comparison with MySQL 5.7

In MySQL 5.7 changelog, one can find this passage:

Make switching of index due to small limit cost-based (WL#6986) : We have made the decision in make_join_select() of whether to switch to a new index in order to support "ORDER BY ... LIMIT N" cost-based. This work fixes Bug#73837.

MariaDB is not using Oracle's fix (we believe make_join_select is not the right place to do ORDER BY optimization), but the effect is the same.

See Also

  • Blog post MariaDB 10.1: Better query optimization for ORDER BY … LIMIT

This page is licensed: CC BY-SA / Gnu FDL

The LooseScan strategy doesn't really need ordering, what it needs is grouping. In the above figure, satellites are grouped by country. For instance, all satellites owned by Australia come together, without being mixed with satellites of other countries. This makes it easy to select just one satellite from each group, which you can join with its country and get a list of countries without duplicates:
[image: loosescan-diagram-no-where]

LooseScan in action

The EXPLAIN output for the above query looks as follows:

Factsheet

  • LooseScan avoids the production of duplicate record combinations by putting the subquery table first and using its index to select one record from multiple duplicates

  • Hence, in order for LooseScan to be applicable, the subquery should look like:

or

  • LooseScan can handle correlated subqueries

  • LooseScan can be switched off by setting the loosescan=off flag in the optimizer_switch variable.
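For example:

SET optimizer_switch='loosescan=off';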

This page is licensed: CC BY-SA / Gnu FDL

Limitations of index_merge/intersection

Previously, the index_merge access method had one intersection strategy, called intersection. That strategy can only be used when merged index scans produce rowid-ordered streams. In practice this means that an intersection could only be constructed from equality (=) conditions.

For example, the following query will use intersection:

but if you replace OriginState ='CA' with OriginState IN ('CA', 'GB') (which matches the same number of records), then intersection is not usable anymore:

The latter query would also run about 5 times slower (from 2.2 to 10.8 seconds) in our experiments.

How index_merge/sort_intersection improves the situation

When index_merge_sort_intersection is enabled, index_merge intersection plans can be constructed from non-equality conditions:

In our tests, this query ran in 3.2 seconds, which is not as good as the case with two equalities, but still much better than 10.8 seconds we were getting without sort_intersect.

The sort_intersect strategy has higher overhead than intersect but is able to handle a broader set of WHERE conditions.

[image: intersect-vs-sort-intersect]

When to Use

index_merge/sort_intersection works best on tables with lots of records and where intersections are sufficiently large (but still small enough to make a full table scan overkill).

The benefit is expected to be bigger for io-bound loads.

This page is licensed: CC BY-SA / Gnu FDL

  • lzma

  • lz4

  • lzo

  • snappy

  • Installing

    Depending on how MariaDB was installed, the libraries may already be available for installation, or may first need to be installed as .deb or .rpm packages, for example:

    Once available, install as a plugin, for example:

    The compression algorithm can then be used, for example, in InnoDB compression:
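A sketch of these two steps, assuming the lz4 provider (the plugin name provider_lz4 is an assumption; check the plugin names shipped with your release):

INSTALL SONAME 'provider_lz4';
-- Use the algorithm for InnoDB page compression:
SET GLOBAL innodb_compression_algorithm = 'lz4';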

    Upgrading

    When upgrading from a release without compression plugins, if a non-zlib compression algorithm was used, those tables will be unreadable until the appropriate compression library is installed. mariadb-upgrade should be run. The --force option (to run mariadb-check) or mariadb-check itself will indicate any problems with compression, for example:

    or

    In this case, the appropriate compression plugin should be installed, and the server restarted.

    See Also

    • 10.7 preview feature: Compression Provider Plugins (mariadb.org blog)

    • Add zstd as a compression plugin - MDEV-34290

    This page is licensed: CC BY-SA / Gnu FDL


    When one can skip de-duplication

    A basic example: if we're doing a select from one table, then the values of primary_key are already distinct:
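With illustrative names, where pk_col is the table's primary key:

SELECT COUNT(DISTINCT pk_col) FROM t1;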

    If the SELECT has other constant tables, that's also ok, as they will not create duplicates.

    The next step: a part of the primary key can be "bound" by the GROUP BY clause. Consider a query:
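A sketch with illustrative names:

SELECT COUNT(DISTINCT pk1) FROM t1 GROUP BY pk2;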

    Suppose the table has PRIMARY KEY(pk1, pk2). Grouping by pk2 fixes the value of pk2 within each group. Then, the values of pk1 must be unique within each group, and de-duplication is not necessary.

    Observability

    EXPLAIN or EXPLAIN FORMAT=JSON do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for aggregator_type:

    When de-duplication is necessary, it will show:

    When de-duplication is not necessary, it will show:

    This page is licensed: CC BY-SA / Gnu FDL

Suppose, table orders has an index IDX on orders.customer_id.

    If the query plan is using this index to fetch orders for each customer, the optimizer will use index statistics from IDX to estimate the number of rows in the customer-joined-with-orders.

    On the other hand, if the optimizer considers a query plan that joins customer with orders without use of indexes, it will ignore the customer.id = orders.customer_id equality completely and will compute the output cardinality as if customer was cross-joined with orders.

    Hash Join

MariaDB supports hash join. It is not enabled by default; one needs to set join_cache_level to 3 or a bigger value to enable it.
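For example, for the current session:

SET SESSION join_cache_level = 3;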

    Before MDEV-30812, Query optimization for Block Hash Join would work as described in the above example: It would assume that the join operation is a cross join.

    MDEV-30812 introduces a new optimizer_switch flag, hash_join_cardinality. In MariaDB versions before 11.0, it is off by default.

    If one sets it to ON, the optimizer will make use of column histograms when computing the cardinality of hash join operation output.

    One can see the computation in the Optimizer Trace, search for hash_join_cardinality.

    This page is licensed: CC BY-SA / Gnu FDL


  • indexed_date_col has a type of DATE, DATETIME or TIMESTAMP and is a part of some index.

    One can swap the left and right hand sides of the equality: const_value CMP {DATE|YEAR}(indexed_date_col) is also handled.

    Sargable here means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or use them to perform partition pruning.

    Implementation

    Internally, the optimizer rewrites the condition to an equivalent condition which doesn't use YEAR or DATE functions.

    For example, YEAR(date_col)=2023 is rewritten intodate_col between '2023-01-01' and '2023-12-31'.

    Similarly, DATE(datetime_col) <= '2023-06-01' is rewritten intodatetime_col <= '2023-06-01 23:59:59'.

    Controlling the Optimization

    The optimization is always ON, there is no Optimizer Switch flag to control it.

    Optimizer Trace

    The rewrite is logged as date_conds_into_sargable transformation. Example:

    References

    • MDEV-8320: Allow index usage for DATE(datetime_column) = const

    This page is licensed: CC BY-SA / Gnu FDL

    MDEV-12248

    Replica I/O Thread States

    This article documents thread states that are related to replica I/O threads. These correspond to the Slave_IO_State shown by SHOW REPLICA STATUS and the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

    Value
    Description

    Checking master version

    Checking the primary's version, which only occurs very briefly after establishing a connection with the primary.

    Connecting to master

    Attempting to connect to primary.

    This page is licensed: CC BY-SA / Gnu FDL

    Index Statistics

    Index statistics provide crucial insights to the MariaDB query optimizer, guiding it in executing queries efficiently. Up-to-date index statistics ensure optimized query performance.

    How Index Statistics Help the Query Optimizer

    Understanding index statistics is crucial for the MariaDB query optimizer to efficiently execute queries. Accurate and current statistics guide the optimizer in choosing the best way to access data, similar to using a personal address book for quicker searches rather than a larger phone book. Up-to-date index statistics ensure optimized query performance.

    Value Groups

    The statistics primarily focus on groups of index elements with identical values. In a primary key, each index is unique, resulting in a group size of one. In a non-unique index, multiple keys may share the same value. The worst-case scenario involves large groups with identical values, such as an index on a boolean field.

    MariaDB makes heavy use of the average group size statistic. For example, if there are 100 rows, and twenty groups with the same index values, the average group size would be five.

However, averages can be skewed by extremes, and the usual culprit is NULL values. The 100 rows may have 19 groups with an average size of one, while the other 81 values are all NULL. MariaDB may think five is a good average group size and choose to use that index, and then end up having to read through 81 rows with identical keys, taking longer than an alternative approach.

    Dealing with NULLs

There are three main approaches to the problem of NULLs. NULL index values can be treated as a single group (nulls_equal). This is usually fine, but if you have large numbers of NULLs the average group size is slanted higher, and the optimizer may miss using the index for ref accesses when it would be useful. This is the default used by InnoDB. The opposite approach is nulls_unequal, with each NULL forming its own group of one. Conversely, the average group size is slanted lower, and the optimizer may use the index for ref accesses when not suitable. This is the default used by the Aria and MyISAM storage engines. A third option, nulls_ignored, sees NULLs ignored altogether from index group calculations.

The default approaches can be changed by setting the aria_stats_method, innodb_stats_method, and myisam_stats_method server variables.

    Null-Safe and Regular Comparisons

The comparison operator used plays an important role. If two values are compared with <=> (the null-safe equality comparison operator), and both are null, 1 is returned. If the same values are compared with = (the regular equality comparison operator), null is returned. For example:
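Both operands here are literal NULLs, so no table is required:

SELECT NULL <=> NULL, NULL = NULL;
-- The null-safe comparison returns 1; the regular comparison returns NULL.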

    Engine-Independent Statistics

MariaDB introduced a way to gather statistics independently of the storage engine. See Engine-Independent Table Statistics.

    Histogram-Based Statistics

Histogram-based statistics were introduced later and are now collected by default.

    See Also

    • User Statistics. This plugin provides user, client, table and index usage statistics.

    This page is licensed: CC BY-SA / Gnu FDL

    Aborting Statements that Exceed a Certain Time to Execute

    Overview

MariaDB introduced the max_statement_time system variable. When set to a non-zero value, the server attempts to abort any queries taking longer than this time in seconds.

    The abortion is not immediate; the server checks the timer status at specific intervals during execution. Consequently, a query may run slightly longer than the specified time before being detected and stopped.

    The default is zero, and no limits are then applied. The aborted query has no effect on any larger transaction or connection contexts. The variable is of type double, thus you can use subsecond timeout. For example you can use value 0.01 for 10 milliseconds timeout.

The value can be set globally or per session, as well as per user or per query (see below). Replicas are not affected by this variable; however, there is slave_max_statement_time which serves the same purpose on replicas only.

An associated status variable, max_statement_time_exceeded, stores the number of queries that have exceeded the execution time specified by max_statement_time, and a MAX_STATEMENT_TIME_EXCEEDED column was added to the CLIENT_STATISTICS and USER_STATISTICS Information Schema tables.

    The feature was based upon a patch by Davi Arnaut.

    Important Note on Reliability

    MAX_STATEMENT_TIME relies on the execution thread checking the "killed" flag, which happens intermittently.

    • Long Running Operations: If a query enters a long processing phase where the flag is not checked (e.g., certain storage engine operations or complex calculations), it may continue running significantly past the limit.

    User

max_statement_time can be stored per user with the GRANT ... MAX_STATEMENT_TIME syntax.

    Per-query

By using max_statement_time in conjunction with SET STATEMENT, it is possible to limit the execution time of individual queries. For example:
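A sketch with illustrative table and column names, limiting one statement to 100 seconds:

SET STATEMENT max_statement_time=100 FOR
  SELECT field1 FROM table_name ORDER BY field1;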

Individual queries can also be limited by adding a MAX_STATEMENT_TIME clause to the query.

    Limitations

    • max_statement_time does not work in embedded servers.

    • max_statement_time does not work for statements in a Galera cluster.

    • Check Intervals: The timeout is checked only at specific points during query execution. Queries stuck in operations where the check code path is not hit will not abort until they reach a checkpoint. This can result in query times exceeding the MAX_STATEMENT_TIME value.

    Differences Between the MariaDB and MySQL Implementations

    MySQL 5.7.4 introduced similar functionality, but the MariaDB implementation differs in a number of ways.

    • The MySQL version of the variable (max_execution_time) is defined in milliseconds, not seconds.

    • MySQL's implementation can only kill SELECTs, while MariaDB's can kill any queries (excluding stored procedures).

    • MariaDB only introduced the max_statement_time_exceeded status variable, while MySQL also introduced a number of other variables which were not seen as necessary in MariaDB.

    See Also

    • The max_statement_time variable

    This page is licensed: CC BY-SA / Gnu FDL

    Condition Pushdown into Derived Table Optimization

    If a query uses a derived table (or a view), the first action that the query optimizer will attempt is to apply the derived-table-merge-optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as the top-level operation. If it has a GROUP-BY, DISTINCT, or uses window functions, then derived-table-merge-optimization is not applicable.

    In that case, the Condition Pushdown optimization is applicable.

    Introduction to Condition Pushdown

    Consider an example
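A sketch of the example, using a hypothetical view OCT_TOTALS that aggregates October order totals per customer (all names here are illustrative):

CREATE VIEW OCT_TOTALS AS
SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY customer_id;

SELECT * FROM OCT_TOTALS WHERE customer_id=1;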

    The naive way to execute the above is to

    1. Compute the OCT_TOTALS contents (for all customers).

    2. Then, select the line with customer_id=1.

    This is obviously inefficient, if there are 1000 customers, then one will be doing up to 1000 times more work than necessary.

    However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.

Inside OCT_TOTALS, the added condition is put into its HAVING clause, so we end up with:
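Continuing the sketch above:

SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY customer_id
HAVING customer_id=1;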

    Then, parts of HAVING clause that refer to GROUP BY columns are moved into the WHERE clause:
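Again continuing the sketch, since customer_id is a GROUP BY column:

SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
      AND customer_id=1
GROUP BY customer_id;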

    Once a restriction like customer_id=1 is in the WHERE, the query optimizer can use it to construct efficient table access paths.

    Controlling the Optimization

    The optimization is enabled by default. One can disable it by setting the flag condition_pushdown_for_derived to OFF.
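For example:

SET optimizer_switch='condition_pushdown_for_derived=off';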

The pushdown from HAVING to WHERE is controlled by the condition_pushdown_from_having flag in optimizer_switch.

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint. In earlier versions, no optimizer hint is available.

    See Also

    • Condition Pushdown through Window Functions

    • The Jira task for the feature.

    This page is licensed: CC BY-SA / Gnu FDL

    Derived Table Merge Optimization

    Background

    Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with population greater than 10,000 people, and then that from these cities to select those that are located in Germany, one could write this SQL:
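A reconstruction of that query, based on the subquery and the outer condition quoted later on this page:

SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) AS big_city
WHERE big_city.Country='DEU';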

    For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

    It plans to do the following actions:

[image: derived-inefficent]

    From left to right:

    1. Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.

    2. Put result of the subquery into a temporary table.

    3. Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Executing a subquery like this is very inefficient, because the highly-selective condition from the parent select, (Country='DEU'), is not used when scanning the base table City. We read too many records from the City table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.

    Derived table merge in action

    If one runs this query in MariaDB/MySQL 5.6, they get this:

    From the above, one can see that:

    1. The output has only one line. This means that the subquery has been merged into the top-level SELECT.

    2. Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

    Factsheet

    • Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.

    • The optimization is enabled by default. It can be disabled with:
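A session-level example:

SET optimizer_switch='derived_merge=off';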

    • Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem of EXPLAIN statements taking a very long time. In later versions of MariaDB and in MySQL 5.6+, EXPLAIN commands execute instantly, regardless of the derived_merge setting.

    See Also

    • FAQ entry:

    This page is licensed: CC BY-SA / Gnu FDL

    Derived Table with Key Optimization

    The idea

    If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and then parent select will treat it as a regular base table.

Before MySQL 5.6 and the corresponding MariaDB version, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.

    Example

    Consider a query: we want to find countries in Europe, that have more than one million people living in cities. This is accomplished with this query:
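A sketch of such a query, reconstructed from the EXPLAIN discussion below (the alias cities_in_country and the join on Country.Code are referenced there; the exact select list is an assumption):

SELECT Country.Name, cities_in_country.urban_population
FROM Country,
     (SELECT Country, SUM(Population) AS urban_population
      FROM City
      GROUP BY Country
      HAVING urban_population > 1*1000*1000) AS cities_in_country
WHERE Country.Code = cities_in_country.Country
  AND Country.Continent = 'Europe';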

    The EXPLAIN output for it will show:

    One can see here that

    • table <derived2> is accessed through key0.

    • ref column shows world.Country.Code

    • if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country

    Factsheet

    • The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.

    • The optimization is applied when the derived table could not be merged into its parent SELECT

      • which happens when the derived table doesn't meet criteria for mergeable VIEW

    • The optimization is ON by default; it can be switched off like so:
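Assuming the optimizer_switch flag is named derived_with_keys (an assumption, as the name is not given above):

SET optimizer_switch='derived_with_keys=off';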

    See Also

    • in MySQL 5.6 manual

    This page is licensed: CC BY-SA / Gnu FDL

    Filesort with Small LIMIT Optimization

    Optimization Description

When the limit n in an ORDER BY ... LIMIT n query is sufficiently small, the optimizer will use a priority queue for sorting. Previously, the alternative was, roughly speaking, to sort the entire output and then pick only the first n rows.

    NOTE: The problem of choosing which index to use for query with ORDER BY ... LIMIT is a different problem, see optimizer_join_limit_pref_ratio-optimization.

    Optimization Visibility in MariaDB

    There are two ways to check whether filesort has used a priority queue.

    Status Variable

The first way is to check the Sort_priority_queue_sorts status variable. It shows the number of times that sorting was done through a priority queue. (The total number of times sorting was done is the sum of Sort_range and Sort_scan.)

    Slow Query Log

The second way is to check the slow query log. When one uses extended statistics in the slow query log and specifies log_slow_verbosity=query_plan, entries look like this:

    Note the "Priority_queue: Yes" on the last comment line. (pt-query-digest is able to parse slow query logs with the Priority_queue field)

    As for EXPLAIN, it will give no indication whether filesort uses priority queue or the generic quicksort and merge algorithm. Using filesort will be shown in both cases, by both MariaDB and MySQL.

    See Also

• The LIMIT Optimization page in the MySQL 5.6 manual (search for "priority queue").

• MySQL WorkLog entry WL#1393

• MDEV-415, MDEV-6430

    This page is licensed: CC BY-SA / Gnu FDL

    Sargable UPPER

In MariaDB versions that include this optimization, expressions in the form

    are sargable if key_col uses either the utf8mb3_general_ci or utf8mb4_general_ci collation.

UCASE is a synonym for UPPER, so it is covered as well.

    Sargable means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or perform partition pruning.

    Example

    Note that ref access is used.

    An example with join:

    Here, the optimizer was able to construct ref access.

    Controlling the Optimization

The optimizer_switch variable has the flag sargable_casefold to turn the optimization on and off. The default is ON.
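For example, to switch it off for the current session:

SET optimizer_switch='sargable_casefold=off';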

    Optimizer Trace

    The optimization is implemented as a rewrite for a query's WHERE/ON conditions. It uses the sargable_casefold_removal object name in the trace:

    References

• MDEV-31496: Make optimizer handle UCASE(varchar_col)=...

• An analog for LCASE is not possible. See MDEV-31955: Make optimizer handle LCASE(varchar_col)=... for details.

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Command Values

A thread can have any of the following COMMAND values (displayed by the COMMAND field listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST table, as well as the PROCESSLIST_COMMAND value listed in the Performance Schema threads table). These indicate the nature of the thread's activity.

    Value
    Description

    Storage-Engine Independent Column Compression

Storage-engine independent column compression enables TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, VARCHAR, and VARBINARY columns to be compressed.

This is performed by means of a new COMPRESSED column attribute: COMPRESSED[=<compression_method>]

    Currently the only supported compression method is zlib.
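For example, a column can be declared as compressed like this (table and column names are illustrative):

CREATE TABLE users (
  id     INT PRIMARY KEY,
  bio    TEXT COMPRESSED,
  avatar MEDIUMBLOB COMPRESSED=zlib
);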

    Field Length Compatibility

    FirstMatch Strategy

FirstMatch is an execution strategy for semi-join subqueries.

    The idea

    It is very similar to how IN/EXISTS subqueries were executed in MySQL 5.x.

    Let's take the usual example of a search for countries with big cities:

    Suppose, our execution plan is to find countries in Europe, and then, for each found country, check if it has any big cities. Regular inner join execution will look as follows:

    Optimizing for "Latest News"-style Queries

    The problem space

    Let's say you have "news articles" (rows in a table) and want a web page showing the latest ten articles about a particular topic.

    Variants on "topic":

    • Category

    OPTIMIZE TABLE

    Syntax

    Description

    OPTIMIZE TABLE has two main functions. It can either be used to defragment tables, or to update the InnoDB fulltext index.

    Equality propagation optimization

    Basic idea

Consider a query with a WHERE clause of the form WHERE col1=col2 AND ... .

The WHERE clause will compute to true only if col1=col2. This means that in the rest of the WHERE clause, occurrences of col1 can be substituted with col2 (with some limitations, which are discussed in the next section). This allows the optimizer to infer additional restrictions.

For example, a clause like WHERE col1=col2 AND col1=123 allows the optimizer to infer a new equality:

    Rollup Unique User Counts

    The Problem

    The normal way to count "Unique Users" is to take large log files, sort by userid, dedup, and count. This requires a rather large amount of processing. Furthermore, the count derived cannot be rolled up. That is, daily counts cannot be added to get weekly counts -- some users will be counted multiple times.

So, the problem is to store the counts in such a way as to allow rolling up.

    ./configure --with-libevent
    mysqld --thread-handling=pool-of-threads --thread-pool-size=20
    thread-handling=  one-thread-per-connection
    thread-pool-size= 20
    --extra-port=#             (Default 0)
    --extra-max-connections=#  (Default 1)
    mysql --port='number-of-extra-port' --protocol=tcp
    MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    |id|select_type|table |type|possible_keys            |key          |key_len|ref  |rows  |Extra      |
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    | 1|SIMPLE     |ontime|ref |Origin,Dest,SecurityDelay|SecurityDelay|5      |const|791546|Using where|
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    
    MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
    |id|select_type|table |type|possible_keys       |key |key_len|ref |rows   |Extra      |
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
    | 1|SIMPLE     |ontime|ALL |Origin,DepDelay,Dest|NULL|NULL   |NULL|1583093|Using where|
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys|key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,Dest  |Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys            |key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,Dest,SecurityDelay|Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys       |key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,DepDelay,Dest|Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    SELECT * FROM Country  
    WHERE 
      Country.code IN (SELECT country_code FROM Satellite)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select country_code from Satellite);
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    | id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    |  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
    |  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    expr IN (SELECT tbl.keypart1 FROM tbl ...)
    expr IN (SELECT tbl.keypart2 FROM tbl WHERE tbl.keypart1=const AND ...)
    SET optimizer_switch='index_merge_sort_intersection=on'
    MySQL [ontime]> EXPLAIN SELECT AVG(arrdelay) FROM ontime WHERE depdel15=1 AND OriginState ='CA';
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    |id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                            |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|OriginState,DepDel15|3,5    |NULL|76952|Using intersect(OriginState,DepDel15);Using where|
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    |id|select_type|table |type|possible_keys       |key     |key_len|ref  |rows |Extra      |
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    | 1|SIMPLE     |ontime|ref |OriginState,DepDel15|DepDel15|5      |const|36926|Using where|
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    |id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                                   |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|DepDel15,OriginState|5,3    |NULL|60754|Using sort_intersect(DepDel15,OriginState); Using where |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    apt-get install mariadb-plugin-provider-lz4
    INSTALL SONAME 'provider_lz4';
    SET GLOBAL innodb_compression_algorithm = lz4;
    Warning  : MariaDB tried to use the LZMA compression, but its provider plugin is not loaded
    
    Error    : Table 'test.t' doesn't exist in engine
    
    status   : Operation failed
    Error    : Table test/t is compressed with lzma, which is not currently loaded. 
      Please load the lzma provider plugin to open the table
    
    error    : Corrupt
    SELECT COUNT(DISTINCT col1) FROM tbl1;
    SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;
    SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;
    {
                "prepare_sum_aggregators": {
                  "function": "count(distinct t1.col1)",
                  "aggregator_type": "distinct"
                }
              }
    {
                "prepare_sum_aggregators": {
                  "function": "count(distinct t1.pk1)",
                  "aggregator_type": "simple"
                }
              }
    SELECT * 
    FROM
      customer, orders, ...
    WHERE 
      customer.id = orders.customer_id AND ...
    YEAR(indexed_date_col) CMP const_value
    DATE(indexed_date_col) CMP const_value
    {
                "transformation": "date_conds_into_sargable",
                "before": "cast(t1.datetime_col as date) <= '2023-06-01'",
                "after": "t1.datetime_col <= '2023-06-01 23:59:59'"
              },
    CREATE VIEW OCT_TOTALS AS
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY customer_id;
    
    SELECT * FROM OCT_TOTALS WHERE customer_id=1
    SELECT * 
    FROM 
      (SELECT * FROM City WHERE Population > 10*1000) AS big_city
    WHERE 
      big_city.Country='DEU'
    mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
      AS big_city WHERE big_city.Country='DEU' ;
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
    |  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    2 rows in set (0.60 sec)
    UPPER(key_col) = expr
    UPPER(key_col) IN (constant-list)
    The solution

    Let's think about what we can do with a hash of the userid. The hash could map to a bit in a bit string. A BIT_COUNT of the bit string would give the 1-bits, representing the number of users. But that bit string would have to be huge. What if we could use shorter bit strings? Then different userids would be folded into the same bit. Let's assume we can solve that.

    Meanwhile, what about the rollup? The daily bit strings can be OR'd together to get a similar bit string for the week.

    We have now figured out how to do the rollup, but have created another problem -- the counts are too low.

    Inflating the BIT_COUNT

    A sufficiently random hash (eg MD5) will fold userids into the same bits with a predictable frequency. We need to figure this out, and work backwards. That is, given that X percent of the bits are set, we need a formula that says approximately how many userids were used to get those bits.

    I simulated the problem by generating random hashes and calculated the number of bits that would be set. Then, with the help of Eureqa software, I derived the formula:

Y = 0.5456*X + 0.6543*tan(1.39*X*X*X)

    How good is it?

    The formula is reasonably precise. It is usually within 1% of the correct value; rarely off by 2%.

Of course, if virtually all the bits are set, the formula can't be very precise. Hence, you need to plan to have the bit strings big enough to handle the expected number of Uniques. In practice, you can use less than 1 bit per Unique. This would be a huge space savings over trying to save all the userids.

Another suggestion... If you are rolling up over a big span of time (eg hourly -> monthly), the bit strings must all be the same length, and the monthly string must be big enough to handle the expected count. This is likely to lead to very sparse hourly bit strings. Hence, it may be prudent to compress the hourly strings.

    Postlog

    Invented Nov, 2013; published Apr, 2014

Future: Rick is working on actual code (Sep, 2016). It is complicated by bit-wise operations being limited to BIGINT. However, with MySQL 8.0 (freshly released), the desired bit-wise operations can be applied to BLOB, greatly simplifying my code. I hope to publish the pre-8.0 code soon; 8.0 code later.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: uniques

    This page is licensed: CC BY-SA / Gnu FDL

    When using the COMPRESSED attribute, note that FIELD LENGTH is reduced by 1; for example, a BLOB has a length of 65535, while BLOB COMPRESSED has 65535-1. See MDEV-15592.

    New System Variables

    column_compression_threshold

    • Description: Minimum column data length eligible for compression.

    • Command line: --column-compression-threshold=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: numeric

    • Default Value: 100

    • Range: 0 to 4294967295

    column_compression_zlib_level

    • Description: zlib compression level (1 gives best speed, 9 gives best compression).

    • Command line: --column-compression-zlib-level=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: numeric

    • Default Value: 6

    • Range: 1 to 9

    column_compression_zlib_strategy

    • Description: The strategy parameter is used to tune the compression algorithm. Use the value DEFAULT_STRATEGY for normal data, FILTERED for data produced by a filter (or predictor), HUFFMAN_ONLY to force Huffman encoding only (no string match), or RLE to limit match distances to one (run-length encoding). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between DEFAULT_STRATEGY and HUFFMAN_ONLY. RLE is designed to be almost as fast as HUFFMAN_ONLY, but give better compression for PNG image data. The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately. FIXED prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.

    • Command line: --column-compression-zlib-strategy=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: enum

    • Default Value: DEFAULT_STRATEGY

    • Valid Values: DEFAULT_STRATEGY, FILTERED, HUFFMAN_ONLY, RLE, FIXED

    column_compression_zlib_wrap

    • Description: If set to 1 (0 is default), generate zlib header and trailer and compute adler32 check value. It can be used with storage engines that don't provide data integrity verification to detect data corruption.

    • Command line: --column-compression-zlib-wrap{=0|1}

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: boolean

    • Default Value: OFF

    New Status Variables

    Column_compressions

    • Description: Incremented each time field data is compressed.

    • Scope: Global, Session

    • Data Type: numeric

    Column_decompressions

    • Description: Incremented each time field data is decompressed.

    • Scope: Global, Session

    • Data Type: numeric

    Limitations

    • The only supported method currently is zlib.

    • The CSV storage engine stores data uncompressed on-disk even if the COMPRESSED attribute is present.

    • It is not possible to create indexes over compressed columns.

    Comparison with InnoDB Page Compression

Storage-engine independent column compression differs from InnoDB page compression in a number of ways.

    • It is storage engine independent, while InnoDB page compression applies to InnoDB only.

    • By being specific to a column, one can access non-compressed fields without the decompression overhead.

    • Only zlib is available, while InnoDB page compression can offer alternative compression algorithms.

    • It is not recommended to use multiple forms of compression over the same data.

    • It is intended for compressing large blobs, while InnoDB page compression is suitable for a more general case.

    • Columns cannot be indexed, while with InnoDB page compression indexes are possible as usual.

    Examples

    See Also

    • InnoDB Page Compression

    • InnoDB Compressed Row Format

    This page is licensed: CC BY-SA / Gnu FDL

    col2=123

Similarly, WHERE col1=col2 AND col1 < 10 allows the optimizer to infer that col2 < 10.

    Identity and comparison substitution

    There are some limitations to where one can do the substitution, though.

    The first and obvious example is the string datatype and collations. Most commonly-used collations in SQL are "case-insensitive", that is 'A'='a'. Also, most collations have a "PAD SPACE" attribute, which means that comparison ignores the spaces at the end of the value, 'a'='a '.

    Now, consider a query:

Here, col1=col2, the values are "equal". At the same time LENGTH(col1)=2, while LENGTH(col2)=4, which means one can't perform the substitution for the argument of LENGTH(...).

It's not only collations. There are similar phenomena when equality compares columns of different datatypes. The exact criteria for when they happen are rather convoluted.

    The take-away is: sometimes, X=Y does not mean that one can replace any reference to X with Y. What one CAN do is still replace the occurrence in the comparisons <, >, >=, <=, etc.

    This is how we get two kinds of substitution:

    • Identity substitution: X=Y, and any occurrence of X can be replaced with Y.

    • Comparison substitution: X=Y, and an occurrence of X in a comparison (X<Z) can be replaced with Y (Y<Z).

    Place in query optimization

    (A draft description): Let's look at how Equality Propagation is integrated with the rest of the query optimization process.

    • First, multiple-equalities are built (TODO example from optimizer trace)

      • If multiple-equality includes a constant, fields are substituted with a constant if possible.

    • From this point, all optimizations like range optimization, ref access, etc make use of multiple equalities: when they see a reference to tableX.columnY somewhere, they also look at all the columns that tableX.columnY is equal to.

    • After the join order is picked, the optimizer walks through the WHERE clause and substitutes each field reference with the "best" one - the one that can be checked as soon as possible.

      • Then, the parts of the WHERE condition are attached to the tables where they can be checked.

    Interplay with ORDER BY optimization

    Consider a query:

Suppose there is an INDEX(col1). The MariaDB optimizer is able to figure out that it can use an index on col1 (or sort by the value of col1) in order to resolve ORDER BY col2.

    Optimizer trace

    Look at these elements:

    • condition_processing

    • attaching_conditions_to_tables

    More details

    Equality propagation doesn't just happen at the top of the WHERE clause. It is done "at all levels" where a level is:

    • A top level of the WHERE clause.

    • If the WHERE clause has an OR clause, each branch of the OR clause.

    • The top level of any ON expression

    • (the same as above about OR-levels)

    This page is licensed: CC BY-SA / Gnu FDL

    CREATE TABLE cmp (i TEXT COMPRESSED);
    
    CREATE TABLE cmp2 (i TEXT COMPRESSED=zlib);
    WHERE col1=col2 AND ...
    WHERE col1=col2 AND col1=123
    WHERE col1=col2 AND col1 < 10
    INSERT INTO t1 (col1, col2) VALUES ('ab', 'ab   ');
    SELECT * FROM t1 WHERE col1=col2 AND LENGTH(col1)=2
    SELECT ... FROM ... WHERE col1=col2 ORDER BY col2


    Resource Protection: Because the abort is not guaranteed to be instantaneous or strictly enforced in all code paths, MAX_STATEMENT_TIME should not be relied upon as the sole mechanism for preventing resource exhaustion (such as filling up temporary disk space).

The SELECT MAX_STATEMENT_TIME = N ... syntax is not valid in MariaDB. In MariaDB one should use SET STATEMENT MAX_STATEMENT_TIME=N FOR ... instead.
    The query will read about 90 rows, which is a big improvement over the 4079 row reads plus 4068 temporary table reads/writes we had before.

Binlog Dump

Master thread for sending binary log contents to a slave.

Change user

Executing a change user operation.

Close stmt

Closing a prepared statement.

    Connect

Replication slave is connected to its master.

    Connect Out

    Replication slave is in the process of connecting to its master.

    Create DB

    Executing an operation to create a database.

    Daemon

    Internal server thread rather than for servicing a client connection.

    Debug

    Generating debug information.

    Delayed insert

    A delayed-insert handler.

    Drop DB

    Executing an operation to drop a database.

    Error

    Error.

    Execute

Executing a prepared statement.

    Fetch

Fetching the results of an executed prepared statement.

    Field List

    Retrieving table column information.

    Init DB

    Selecting default database.

    Kill

    Killing another thread.

    Long Data

Retrieving long data from the result of executing a prepared statement.

    Ping

    Handling a server ping request.

    Prepare

Preparing a prepared statement.

    Processlist

    Preparing processlist information about server threads.

    Query

    Executing a statement.

    Quit

    In the process of terminating the thread.

    Refresh

Flushing a table, logs or caches, or refreshing replication server or status variable information.

    Register Slave

    Registering a slave server.

    Reset stmt

Resetting a prepared statement.

    Set option

    Setting or resetting a client statement execution option.

    Sleep

    Waiting for the client to send a new statement.

    Shutdown

    Shutting down the server.

    Statistics

    Preparing status information about the server.

    Table Dump

    Sending the contents of a table to a slave.

    Time

    Not used.

    This page is licensed: CC BY-SA / Gnu FDL


    firstmatch-inner-join

Since Germany has two big cities (in this diagram), it will be put into the query output twice. This is not correct: SELECT ... FROM Country should not produce the same country record twice. The FirstMatch strategy avoids producing duplicates by short-cutting execution as soon as the first genuine match is found:

    firstmatch-firstmatch

    Note that the short-cutting has to take place after "Using where" has been applied. It would have been wrong to short-cut after we found Trier.

    FirstMatch in action

    The EXPLAIN for the above query will look as follows:

    FirstMatch(Country) in the Extra column means that as soon as we have produced one matching record combination, short-cut the execution and jump back to the Country table.

    FirstMatch's query plan is very similar to one you would get in MySQL:

    and these two particular query plans will execute in the same time.

    Difference between FirstMatch and IN->EXISTS

    The general idea behind the FirstMatch strategy is the same as the one behind the IN->EXISTS transformation, however, FirstMatch has several advantages:

    • Equality propagation works across semi-join bounds, but not subquery bounds. Therefore, converting a subquery to semi-join and using FirstMatch can still give a better execution plan. (TODO example)

    • There is only one way to apply the IN->EXISTS strategy and MySQL will do it unconditionally. With FirstMatch, the optimizer can make a choice between whether it should run the FirstMatch strategy as soon as all tables used in the subquery are in the join prefix, or at some later point in time. (TODO: example)

    FirstMatch factsheet

    • The FirstMatch strategy works by executing the subquery and short-cutting its execution as soon as the first match is found.

• This means that subquery tables must come after all of the parent select's tables that are referred to from the subquery predicate.

    • EXPLAIN shows FirstMatch as "FirstMatch(tableN)".

    • The strategy can handle correlated subqueries.

    • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

• Use of the FirstMatch strategy is controlled with the firstmatch=on|off flag in the optimizer_switch variable.
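For example, to switch the strategy off for the current session:

SET optimizer_switch='firstmatch=off';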

    See Also

    • Semi-join subquery optimizations

    In-depth material:

    • WL#3750: initial specification for FirstMatch

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join subqueries

    Tag

  • Provider (of news article)

  • Manufacturer (of item for sale)

  • Ticker (financial stock)

  • Variants on "news article"

    • Item for sale

    • Blog comment

    • Blog thread

    Variants on "latest"

    • Publication date (unix_timestamp)

    • Most popular (keep the count)

    • Most emailed (keep the count)

    • Manual ranking (1..10 -- 'top ten')

    Variants on "10" - there is nothing sacred about "10" in this discussion.

    The performance issues

    Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.

    • If there are multiple topics for each article, you need a many-to-many table.

    • You have a flag "is_deleted" that needs filtering on.

    • You want to "paginate" the list (ten articles per page, for as many pages as necessary).

    The solution

    First, let me give you the solution, then I will elaborate on why it works well.

    • One new table called, say, Lists.

    • Lists has exactly 3 columns: topic, article_id, sequence

    • Lists has exactly 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)

    • Only viewable articles are in Lists. (This avoids the filtering on "is_deleted", etc)

• Lists is InnoDB. (This gets "clustering".)

    • "sequence" is typically the date of the article, but could be some other ordering.

    • "topic" should probably be normalized, but that is not critical to this discussion.

    • "article_id" is a link to the bulky row in another table(s) that provide all the details about the article.

    The queries

    Find the latest 10 articles for a topic:

    You must not have any WHERE condition touching columns in Articles.

When you mark an article for deletion, you must remove it from Lists:

    I emphasize "must" because flags and other filtering is often the root of performance issues.

    Why it works

    By now, you may have discovered why it works.

    The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:

    • 1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".

    • 10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.) The PRIMARY KEY, and using InnoDB, makes these quite efficient.

OK, you pay a small price for this elsewhere, when an article is removed from Lists:

    • 1 disk hit: INDEX(article_id) - finding a few ids

    • A few more disk hits to DELETE rows from Lists. This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: lists

    This page is licensed: CC BY-SA / Gnu FDL

    WAIT/NOWAIT

    Set the lock wait timeout. See WAIT and NOWAIT.

    Defragmenting

OPTIMIZE TABLE works for InnoDB (in older versions, only if the innodb_file_per_table server system variable is set), Aria, MyISAM and ARCHIVE tables, and should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.

    This statement requires SELECT and INSERT privileges for the table.

    By default, OPTIMIZE TABLE statements are written to the binary log and will be replicated. The NO_WRITE_TO_BINLOG keyword (LOCAL is an alias) will ensure the statement is not written to the binary log.

    OPTIMIZE TABLE statements are not logged to the binary log if read_only is set. See also Read-Only Replicas.

OPTIMIZE TABLE is also supported for partitioned tables. You can use ALTER TABLE ... OPTIMIZE PARTITION to optimize one or more partitions.

You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. With other storage engines, OPTIMIZE TABLE does nothing by default, and returns this message: "The storage engine for the table doesn't support optimize". However, if the server has been started with the --skip-new option, OPTIMIZE TABLE is linked to ALTER TABLE, and recreates the table. This operation frees the unused space and updates index statistics.

The Aria storage engine supports progress reporting for this statement.

    If a MyISAM table is fragmented, concurrent inserts will not be performed until an OPTIMIZE TABLE statement is executed on that table, unless the concurrent_insert server system variable is set to ALWAYS.

    Updating an InnoDB fulltext index

    When rows are added or deleted to an InnoDB fulltext index, the index is not immediately re-organized, as this can be an expensive operation. Change statistics are stored in a separate location. The fulltext index is only fully re-organized when an OPTIMIZE TABLE statement is run.

    By default, an OPTIMIZE TABLE will defragment a table. In order to use it to update fulltext index statistics, the innodb_optimize_fulltext_only system variable must be set to 1. This is intended to be a temporary setting and should be reset to 0 once the fulltext index has been re-organized.

    Since fulltext re-organization can take a long time, the innodb_ft_num_word_optimize variable limits the re-organization to a number of words (2000 by default). You can run multiple OPTIMIZE statements to fully re-organize the index.
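A minimal sketch of the procedure (the articles table name is illustrative):

SET GLOBAL innodb_optimize_fulltext_only = 1;
OPTIMIZE TABLE articles;  -- may need to be repeated to cover all words
SET GLOBAL innodb_optimize_fulltext_only = 0;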

    Defragmenting InnoDB tablespaces

MariaDB merged the Facebook/Kakao defragmentation patch, allowing one to use OPTIMIZE TABLE to defragment InnoDB tablespaces. For this functionality to be enabled, the innodb_defragment system variable must be enabled. No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by innodb-defragment-n-pages) and tries to move records so that pages are full of records, and then frees pages that are fully empty after the operation. Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.

    See Defragmenting InnoDB Tablespaces for more details.

    See Also

    • Optimize Table in InnoDB with ALGORITHM set to INPLACE

    • Optimize Table in InnoDB with ALGORITHM set to NOCOPY

    • Optimize Table in InnoDB with ALGORITHM set to INSTANT

    This page is licensed: GPLv2, originally from fill_help_tables.sql

    Queueing master event to the relay log

    Event is being copied to the relay log after being read, where it can be processed by the SQL thread.

    Reconnecting after a failed binlog dump request

    Attempting to reconnect to the primary after a previously failed binary log dump request.

    Reconnecting after a failed master event read

    Attempting to reconnect to the primary after a previously failed request. After successfully connecting, the state will change to Waiting for master to send event.

    Registering slave on master

    Registering the replica on the primary, which only occurs very briefly after establishing a connection with the primary.

    Requesting binlog dump

    Requesting the contents of the binary logs from the given log file name and position. Only occurs very briefly after establishing a connection with the primary.

    Waiting for master to send event

Waiting for binary log events to arrive after successfully connecting. If there are no new events on the primary, this state can persist for as many seconds as specified by the slave_net_timeout system variable, after which the thread will reconnect. In older versions, the time was counted from SLAVE START; in later versions, it is counted since reading the last event.

    Waiting for slave mutex on exit

    Waiting for replica mutex while the thread is stopping. Only occurs very briefly.

    Waiting for the slave SQL thread to free enough relay log space.

    Relay log has reached its maximum size, determined by relay_log_space_limit (no limit by default), so waiting for the SQL thread to free up space by processing enough relay log events.

    Waiting for master update

    State before connecting to primary.

    Waiting to reconnect after a failed binlog dump request

    Waiting to reconnect after a binary log dump request has failed due to disconnection. The length of time in this state is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.

    Waiting to reconnect after a failed master event read

    Sleeping while waiting to reconnect after a disconnection error. The time in seconds is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.


    Ignored Indexes

    Ignored indexes allow indexes to be visible and maintained without being used by the optimizer. This feature is comparable to MySQL 8’s "invisible indexes."

    This feature is available from MariaDB 10.6.

    Ignored indexes are indexes that are visible and maintained, but which are not used by the optimizer. MySQL 8 has a similar feature which they call "invisible indexes".

    Syntax

By default, an index is not ignored. One can mark an existing index as ignored (or not ignored) with an ALTER TABLE statement:
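A minimal sketch, assuming a table t1 with an index idx_b:

ALTER TABLE t1 ALTER INDEX idx_b IGNORED;
ALTER TABLE t1 ALTER INDEX idx_b NOT IGNORED;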

It is also possible to specify the IGNORED attribute when creating an index with a CREATE TABLE or CREATE INDEX statement:
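A minimal sketch, again with illustrative table and index names:

CREATE TABLE t2 (a INT, b INT, INDEX idx_b (b) IGNORED);

CREATE INDEX idx_a ON t2 (a) IGNORED;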

A table's primary key cannot be ignored. This applies both to an explicitly defined primary key and to an implicit primary key - if there is no explicit primary key defined but the table has a unique key containing only NOT NULL columns, the first of such keys becomes the implicitly defined primary key.

    Handling for ignored indexes

The optimizer treats ignored indexes as if they didn't exist. They are not used in query plans or as a source of statistical information. Also, an attempt to use an ignored index in a USE INDEX, FORCE INDEX, or IGNORE INDEX hint will result in an error - the same as would occur if one used the name of a non-existent index.

Information about whether or not indexes are ignored can be viewed in the IGNORED column in the Information Schema STATISTICS table or the SHOW INDEX statement.

    Intended Usage

    The primary use case is as follows: a DBA sees an index that seems to have little or no usage and considers whether to remove it. Dropping the index is a risk as it may still be needed in a few cases. For example, the optimizer may rely on the estimates provided by the index without using the index in query plans. If dropping an index causes an issue, it will take a while to re-create the index. On the other hand, marking the index as ignored (or not ignored) is instant, so the suggested workflow is:

    1. Mark the index as ignored

    2. Check if everything continues to work

    3. If not, mark the index as not ignored.

    4. If everything continues to work, one can safely drop the index.

    Examples

The optimizer does not make use of an index when it is ignored, while if the index is not ignored (the default), the optimizer will consider it in the query plan, as shown in the EXPLAIN output.

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_adjust_secondary_key_costs

    optimizer_adjust_secondary_key_costs

• Description: Gives the user the ability to affect how the costs for secondary keys using ref are calculated, in the few cases when MariaDB 10.6 up to MariaDB 10.11 makes a sub-optimal choice when optimizing ref access, either for key lookups or GROUP BY. ref, as used by EXPLAIN, means that the optimizer is using a key lookup on one value to find the matching rows from a table. Unused from MariaDB 11.0. In later releases the variable was changed from a number to a set of strings, and disable_forced_index_in_group_by (value 4) was added.

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: set

    • Default Value: fix_reuse_range_for_ref, fix_card_multiplier

• Range: 0 to 63, or any combination of adjust_secondary_key_cost, disable_max_seek, disable_forced_index_in_group_by, fix_innodb_cardinality, fix_reuse_range_for_ref, fix_card_multiplier

    • Introduced: ,

optimizer_adjust_secondary_key_costs will be obsolete starting from MariaDB 11.0, as the new optimizer in 11.0 does not have the max_seek optimization and already uses cost-based choices for index usage with GROUP BY.

The value for optimizer_adjust_secondary_key_costs is one or more of the following:

    Value
    Version added
    Old behavior
    Change when variable is used

    One can set all options with:

    Explanations of the old behavior in MariaDB 10.x

    The reason for the max_seek optimization was originally to ensure that MariaDB would use a key instead of a table scan. This works well for a lot of queries, but can cause problems when a table scan is a better choice, such as when one would have to scan more than 1/4 of the rows in the table (in which case a table scan is better).

    See Also

    • The system variable.

    This page is licensed: CC BY-SA / Gnu FDL

    Rowid Filtering Optimization

    The target use case for rowid filtering is as follows:

    • a table uses ref access on index IDX1

    • but it also has a fairly restrictive range predicate on another index IDX2.

    In this case, it is advantageous to:

    • Do an index-only scan on index IDX2 and collect rowids of index records into a data structure that allows filtering (let's call it $FILTER).

    • When doing ref access on IDX1, check $FILTER before reading the full record.

    Example

    Consider a query

Suppose the condition on l_shipdate is very restrictive, which means the lineitem table should go first in the join order. Then, the optimizer can use the o_orderkey=l_orderkey equality to do an index lookup to get the order the line item is from. On the other hand, o_totalprice BETWEEN ... can also be rather selective.

    With filtering, the query plan would be:

Note that the orders table has "Using rowid filter". The type column has "|filter", and the key column shows the index that is used to construct the filter. The rows column shows the expected filter selectivity; here it is 5%.

    ANALYZE FORMAT=JSON output for table orders will show

    Note the rowid_filter element. It has a range element inside it. selectivity_pct is the expected selectivity, accompanied by the r_selectivity_pct showing the actual observed selectivity.

    Details

    • The optimizer makes a cost-based decision about when the filter should be used.

    • The filter data structure is currently an ordered array of rowids. (a Bloom filter would be better here and will probably be introduced in the future versions).

• The optimization needs to be supported by the storage engine. At the moment, it is supported by InnoDB and MyISAM. It is not supported for partitioned tables.

    Limitations

    • Rowid Filtering can't be used with a backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query and decides to handle it with a backward-ordered index scan, it will disable Rowid Filtering.

    Control

Rowid filtering can be switched on/off using the rowid_filter flag in the optimizer_switch variable. By default, the optimization is enabled.
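For example, to switch it off for the current session:

SET optimizer_switch='rowid_filter=off';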

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_join_limit_pref_ratio Optimization

    Basics

This optimization is off (0) by default. When it is enabled, MariaDB will consider a join order that may shorten query execution time based on the ORDER BY ... LIMIT n clause. For small values of n, this may improve performance.

    Set the value of optimizer_join_limit_pref_ratio to a non-zero value to enable this option (higher values are more conservative, recommended value is 100), or set to 0 (the default value) to disable it.
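For example, to enable it with the recommended ratio for the current session (it can also be set globally or in the configuration file):

SET SESSION optimizer_join_limit_pref_ratio=100;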

    Detailed description

    Problem setting

    By default, the MariaDB optimizer picks a join order without considering the ORDER BY ... LIMIT clause, when present.

    For example, consider a query looking at latest 10 orders together with customers who made them:

The two possible join orders are customer->orders and orders->customer.

    The customer->orders plan computes a join between all customers and orders, saves that result into a temporary table, and then uses filesort to get the 10 most recent orders. This query plan doesn't benefit from the fact that just 10 orders are needed.

However, in contrast, the orders->customers plan uses an index to read rows in the ORDER BY order. The query can stop execution once it finds 10 order-and-customer combinations, which is much faster than computing the entire join. With this new optimization enabled, the optimizer can take the ORDER BY ... LIMIT into account and prefer the plan that can stop early, once it has the 10 combinations.

    Plans with LIMIT shortcuts are difficult to estimate

It is fundamentally difficult to produce a reliable estimate for ORDER BY ... LIMIT shortcuts. Let's take an example from the previous section to see why. This query searches for the last 10 orders that were shipped by air:
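A sketch of such a query, assuming illustrative orders and customer tables with customer_id, ship_mode and order_date columns:

SELECT *
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
WHERE o.ship_mode = 'air'
ORDER BY o.order_date DESC
LIMIT 10;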

    Suppose we know beforehand that 50% of orders are shipped by air. Assuming there's no correlation between date and shipping method, orders->customer plan will need to scan 20 orders before we find 10 that are shipped by air. But if there is correlation, then we may need to scan up to (total_orders*0.5 + 10) before we find first 10 orders that are shipped by air. Scanning about 50% of all orders can be expensive.

This situation worsens when the query has constructs whose selectivity is not known. For example, if the WHERE condition contains a predicate whose selectivity the optimizer cannot estimate, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or whether we will need to enumerate all rows before we find #LIMIT matches.

    Providing guidelines to the optimizer

    Due to these challenges, the optimization is not enabled by default.

When running a mostly OLTP workload where query WHERE conditions have suitable indexes or are not very selective, ORDER BY ... LIMIT queries will typically find matching rows quickly. In this case, it makes sense to give the optimizer the following guidance: prefer the join order that can short-cut on the LIMIT whenever it promises a speedup of at least X times.

The value of X is given to the optimizer via the optimizer_join_limit_pref_ratio setting. Higher values carry less risk. The recommended value is 100: prefer the LIMIT join order if it promises at least a 100x speedup.

    References

• MDEV-34720 introduces the optimizer_join_limit_pref_ratio optimization

• MDEV-18079 is about future development that would make the optimizer handle such cases without user guidance.

    This page is licensed: CC BY-SA / Gnu FDL

    Virtual Column Support in the Optimizer

    This feature is available from MariaDB 11.8.

    The optimizer can recognize use of indexed virtual column expressions in the WHERE clause and use them to construct range and ref(const) accesses.

    Index Condition Pushdown

Index Condition Pushdown is an optimization that is applied for access methods that access table data through indexes: range, ref, eq_ref, and ref_or_null (among others).

    The idea is to check part of the WHERE condition that refers to index fields (we call it Pushed Index Condition) as soon as we've accessed the index. If the Pushed Index Condition is not satisfied, we won't need to read the whole table record.

    Index Condition Pushdown is on by default. To disable it, set its optimizer_switch flag like so:
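For example:

SET optimizer_switch='index_condition_pushdown=off';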

    When Index Condition Pushdown is used, EXPLAIN will show "Using index condition":

    Duplicate Weedout Strategy

DuplicateWeedout is an execution strategy for semi-join subqueries.

    The idea

The idea is to run the semi-join (a query which uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.

    Suppose, you have a query where you're looking for countries which have more than 33% percent of their population in one big city:

First, we run a regular inner join between the City and Country tables:

    not_null_range_scan Optimization

    The NOT NULL range scan optimization enables the optimizer to construct range scans from NOT NULL conditions that it was able to infer from the WHERE clause.

The optimization is not enabled by default; one needs to set an optimizer_switch flag (not_null_range_scan) to enable it.
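For example, to enable it for the current session:

SET optimizer_switch='not_null_range_scan=on';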

    Description

    A basic (but slightly artificial) example:

The WHERE condition in this form cannot be used for range scans. However, one can infer that it will reject rows that have NULL for weight. That is, the optimizer can infer an additional weight IS NOT NULL condition and use it to construct a range scan on an index that covers the weight column.
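A minimal sketch of the idea, with illustrative table and column names:

CREATE TABLE shipments (
  weight DECIMAL(10,2),
  price  DECIMAL(10,2),
  KEY (weight)
);

-- weight*2 < price can only be true when weight is not NULL, so with the
-- not_null_range_scan flag enabled the optimizer can infer "weight IS NOT NULL"
-- and consider a range scan over KEY(weight):
EXPLAIN SELECT * FROM shipments WHERE shipments.weight*2 < shipments.price;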

    SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
    +---------+---------------+------------+
    | 1 <=> 1 | NULL <=> NULL | 1 <=> NULL |
    +---------+---------------+------------+
    |       1 |             1 |          0 |
    +---------+---------------+------------+
    
    SELECT 1 = 1, NULL = NULL, 1 = NULL;
    +-------+-------------+----------+
    | 1 = 1 | NULL = NULL | 1 = NULL |
    +-------+-------------+----------+
    |     1 |        NULL |     NULL |
    +-------+-------------+----------+
    SET STATEMENT max_statement_time=100 FOR 
      SELECT field1 FROM table_name ORDER BY field1;
    SELECT MAX_STATEMENT_TIME=2 * FROM t1;
    MariaDB [world]> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
      AS big_city WHERE big_city.Country='DEU';
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    | id | select_type | table | type | possible_keys      | key     | key_len | ref   | rows | Extra                              |
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    |  1 | SIMPLE      | City  | ref  | Population,Country | Country | 3       | const |   90 | Using index condition; Using where |
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    1 row in set (0.00 sec)
    SET @@optimizer_switch='derived_merge=OFF'
    SELECT * 
    FROM
       Country, 
       (SELECT 
           SUM(City.Population) AS urban_population, 
           City.Country 
        FROM City 
        GROUP BY City.Country 
        HAVING 
        urban_population > 1*1000*1000
       ) AS cities_in_country
    WHERE 
      Country.Code=cities_in_country.Country AND Country.Continent='Europe';
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    | id | select_type | table      | type | possible_keys     | key       | key_len | ref                | rows | Extra                           |
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    |  1 | PRIMARY     | Country    | ref  | PRIMARY,continent | continent | 17      | const              |   60 | Using index condition           |
    |  1 | PRIMARY     | <derived2> | ref  | key0              | key0      | 3       | world.Country.Code |   17 |                                 |
    |  2 | DERIVED     | City       | ALL  | NULL              | NULL      | NULL    | NULL               | 4079 | Using temporary; Using filesort |
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    SET optimizer_switch='derived_with_keys=off'
    SELECT * FROM Country 
    WHERE Country.code IN (SELECT City.Country 
                           FROM City 
                           WHERE City.Population > 1*1000*1000)
          AND Country.continent='Europe'
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 1*1000*1000)
        AND Country.continent='Europe';
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    | id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    |  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
    |  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    2 rows in set (0.00 sec)
    MySQL [world]> EXPLAIN SELECT * FROM Country  WHERE Country.code IN 
      (select City.Country from City where City.Population > 1*1000*1000) 
       AND Country.continent='Europe';
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    | id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    |  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
    |  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    2 rows in set (0.01 sec)
    SELECT  a.*
        FROM  Articles a
        JOIN  Lists s ON s.article_id = a.article_id
        WHERE  s.topic = ?
        ORDER BY  s.sequence DESC
        LIMIT  10;
    DELETE  FROM  Lists
        WHERE  article_id = ?;
    OPTIMIZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE
        tbl_name [, tbl_name] ...
        [WAIT n | NOWAIT]
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY customer_id
    HAVING
      customer_id=1 
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE
      order_date BETWEEN '2017-10-01' AND '2017-10-31' AND
      customer_id=1
    GROUP BY customer_id
    # Time: 140714 18:30:39
    # User@Host: root[root] @ localhost []
    # Thread_id: 3  Schema: test  QC_hit: No
    # Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
    # Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
    # Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
    SET TIMESTAMP=1405348239;
    SELECT * FROM t1 WHERE col1 BETWEEN 10 AND 20 ORDER BY col2 LIMIT 100;
    CREATE TABLE t1 (
      key1 VARCHAR(32) COLLATE utf8mb4_general_ci,
      ...
      KEY(key1)
    );
    EXPLAIN SELECT * FROM t1 WHERE UPPER(key1)='ABC'
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | const | 1    | Using where; Using index |
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    EXPLAIN SELECT * FROM t0,t1 WHERE upper(t1.key1)=t0.col;
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra       |
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL        | 10   | Using where |
    |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | test.t0.col | 1    | Using index |
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    "join_optimization": {
            "select_id": 1,
            "steps": [
              {
                "sargable_casefold_removal": {
                  "before": "ucase(t1.key1) = t0.col",
                  "after": "t1.key1 = t0.col"
                }
              },
    Example

    Consider this table with data in JSON format:

    In order to do efficient queries over data in JSON, you can add a virtual column, and an index on that column:

    Before MariaDB 11.8, you had to use vcol1 in the WHERE clause. Now, you can use the virtual column expression, too:

    General Considerations

    • In MariaDB, one has to create a virtual column and then create an index over it. Other databases allow creating an index directly over an expression: create index on t1((col1+col2)). This is not yet supported in MariaDB (MDEV-35853).

    • The WHERE clause must use the exact same expression as in the virtual column definition.

    • The optimization is implemented in a way similar to MySQL – the optimizer finds potentially useful occurrences of vcol_expr in the WHERE clause and replaces them with vcol_name.

    • In the optimizer trace, the rewrites are shown like this:

    The following improvements are available from MariaDB 12.1.

    1. Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns when the virtual column expressions are covered by indexes that can be used.

    2. Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns expressions, by substitution of the virtual column expressions with virtual columns when the virtual columns are usable indexes themselves.

    3. The same improvements apply for single-table UPDATE or DELETE statements.
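    As a rough illustration of improvements 1 and 2, here is a sketch using a hypothetical table t2 with an indexed virtual column. From MariaDB 12.1, the optimizer may use the index on vcol2 to resolve the ORDER BY whether the query names the virtual column or repeats its defining expression:

    -- Hypothetical table; the names are illustrative only
    CREATE TABLE t2 (
      json_data JSON,
      vcol2 INT AS (CAST(JSON_VALUE(json_data, '$.price') AS INTEGER)),
      INDEX(vcol2)
    );

    -- Both forms may now read the vcol2 index in order instead of filesorting
    EXPLAIN SELECT * FROM t2 ORDER BY vcol2 LIMIT 10;
    EXPLAIN SELECT * FROM t2
    ORDER BY CAST(JSON_VALUE(json_data, '$.price') AS INTEGER) LIMIT 10;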

    Accessing JSON fields

    Cast the Value to the Desired Type

    SQL is a strongly-typed language, while JSON is weakly-typed. This means one must specify the desired datatype when accessing JSON data from SQL. In the above example, we declared vcol1 as INT and then used CAST(... AS INTEGER), both in the ALTER TABLE and in the WHERE clause of the SELECT query:

    Specify the Collation for Strings

    When extracting string values, CAST is not necessary, as JSON_VALUE returns strings. However, you must take into account collations. Consider this column declared as JSON:

    The collation of json_data is utf8mb4_bin. The collation of JSON_VALUE(json_data, ...) is utf8mb4_bin, too.

    Most use cases require a more commonly-used collation. It is possible to achieve that using the COLLATE clause:

    References

    • MDEV-35616: Add basic optimizer support for virtual columns

    This page is licensed: CC BY-SA / Gnu FDL

    ALTER TABLE table_name ALTER {KEY|INDEX} [IF EXISTS] key_name [NOT] IGNORED;
    CREATE TABLE table_name (
      ...
      INDEX index_name (...) [NOT] IGNORED
      ...
    );
    CREATE INDEX index_name ON tbl_name (...) [NOT] IGNORED;
    CREATE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b));
    ALTER TABLE t1 ALTER INDEX k1 IGNORED;
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT);
    CREATE INDEX k1 ON t1(b) IGNORED;
    SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_NAME = 't1'\G
    *************************** 1. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 0
     INDEX_SCHEMA: test
       INDEX_NAME: PRIMARY
     SEQ_IN_INDEX: 1
      COLUMN_NAME: id
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: 
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: NO
    *************************** 2. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 1
     INDEX_SCHEMA: test
       INDEX_NAME: k1
     SEQ_IN_INDEX: 1
      COLUMN_NAME: b
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: YES
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: YES
    SHOW INDEXES FROM t1\G
    *************************** 1. row ***************************
            Table: t1
       Non_unique: 0
         Key_name: PRIMARY
     Seq_in_index: 1
      Column_name: id
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: NO
    *************************** 2. row ***************************
            Table: t1
       Non_unique: 1
         Key_name: k1
     Seq_in_index: 1
      Column_name: b
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: YES
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: YES
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1    | Using filesort |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    
    ALTER TABLE t1 ALTER INDEX k1 NOT IGNORED;
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    |    1 | SIMPLE      | t1    | index | NULL          | k1   | 5       | NULL | 1    | Using index |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    SELECT ...
    FROM orders JOIN lineitem ON o_orderkey=l_orderkey
    WHERE
      l_shipdate BETWEEN '1997-01-01' AND '1997-01-31' AND
      o_totalprice between 200000 and 230000;
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineitem
             type: range
    possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
              key: i_l_shipdate
          key_len: 4
              ref: NULL
             rows: 98
            Extra: Using index condition
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
             type: eq_ref|filter
    possible_keys: PRIMARY,i_o_totalprice
              key: PRIMARY|i_o_totalprice
          key_len: 4|9
              ref: dbt3_s001.lineitem.l_orderkey
             rows: 1 (5%)
            Extra: Using where; Using rowid filter
    "table": {
          "table_name": "orders",
          "access_type": "eq_ref",
          "possible_keys": ["PRIMARY", "i_o_totalprice"],
          "key": "PRIMARY",
          "key_length": "4",
          "used_key_parts": ["o_orderkey"],
          "ref": ["dbt3_s001.lineitem.l_orderkey"],
          "rowid_filter": {
            "range": {
              "key": "i_o_totalprice",
              "used_key_parts": ["o_totalprice"]
            },
            "rows": 69,
            "selectivity_pct": 4.6,
            "r_rows": 71,
            "r_selectivity_pct": 10.417,
            "r_buffer_size": 53,
            "r_filling_time_ms": 0.0716
          }
    SELECT *
    FROM
      customer, orders
    WHERE
      customer.name=orders.customer_name
    ORDER BY
      orders.order_date DESC
    LIMIT 10
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    | id   | select_type | table    | type | possible_keys | key           | key_len | ref           | rows | Extra                                        |
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    |    1 | SIMPLE      | customer | ALL  | name          | NULL          | NULL    | NULL          | 9623 | Using where; Using temporary; Using filesort |
    |    1 | SIMPLE      | orders   | ref  | customer_name | customer_name | 103     | customer.name | 1    |                                              |
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    | id   | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows | Extra       |
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    |    1 | SIMPLE      | orders   | index | customer_name | order_date | 4       | NULL                 | 10   | Using where |
    |    1 | SIMPLE      | customer | ref   | name          | name       | 103     | orders.customer_name | 1    |             |
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    SELECT *
    FROM
      customer, orders
    WHERE
      customer.name=orders.customer_name
      AND orders.shipping_method='Airplane'
    ORDER BY
      orders.order_date DESC
    LIMIT 10
    orders.shipping_method='%Airplane%'
    Do consider the query plan using LIMIT short-cutting 
    and prefer it if it promises at least X times speedup.
    CREATE TABLE t1 (json_data JSON);
    INSERT INTO t1 VALUES('{"column1": 1234}'); 
    INSERT INTO t1 ...
    ALTER TABLE t1
      ADD COLUMN vcol1 INT AS (cast(json_value(json_data, '$.column1') AS INTEGER)),
      ADD INDEX(vcol1);
    -- This uses the index before 11.8:
    EXPLAIN SELECT * FROM t1 WHERE vcol1=100;
    -- Starting from 11.8, this uses the index, too:
    EXPLAIN SELECT * FROM t1 
    WHERE cast(json_value(json_data, '$.column1') AS INTEGER)=100;
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id   | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |    1 | SIMPLE      | t1    | ref  | vcol1         | vcol1 | 5       | const | 1    |       |
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    "virtual_column_substitution": {
                  "condition": "WHERE",
                  "resulting_condition": "t1.vcol1 = 100"
                }
    ALTER TABLE t1
      ADD COLUMN vcol1 INT AS (CAST(json_value(json_data, '$.column1') AS INTEGER)) ...
    SELECT ...  WHERE ... CAST(json_value(json_data, '$.column1') AS INTEGER) ...;
    CREATE TABLE t1 ( 
      json_data JSON 
      ...
    ALTER TABLE t1
      ADD col1 VARCHAR(100) COLLATE utf8mb4_uca1400_ai_ci AS
      (json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci),
      ADD INDEX(col1);
    ...
    SELECT  ... 
    WHERE
      json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci='string-value';

    fix_innodb_cardinality

    By default, InnoDB doubles the cardinality of its indexes in an effort to force index usage over table scans. This can cause the optimizer to create sub-optimal plans for ranges or index entries that cover a big part of the table.

    Using this option removes the doubling of cardinality in InnoDB. fix_innodb_cardinality is recommended only as a server startup option, as it takes effect for a table when the table is first used. See for details.

    fix_reuse_range_for_ref

    Old behavior: the number of estimated rows for 'ref' did not always match the cost from the range optimizer.

    New behavior: use the cost from the range optimizer for 'ref' if all used key parts are constants. The old code did not always do this.

    fix_card_multiplier

    Old behavior: index selectivity can be bigger than 1.0 if index statistics are not up to date. Not on by default.

    New behavior: ensure that the calculated index selectivity is never bigger than 1.0. Having an index selectivity bigger than 1.0 causes MariaDB to believe that there are more rows in the table than in reality, which can cause wrong plans. This option is on by default.

    adjust_secondary_key_cost

    MariaDB 10.6.17

    Old behavior: ref costs are limited by max_seeks.

    New behavior: the secondary key costs for ref are updated to be at least five times the clustered primary key costs if a clustered primary key exists.

    disable_max_seek

    MariaDB 10.6.17

    Old behavior: the ref cost on secondary keys is limited to max_seek = min('number of expected rows' / 10, scan_time * 3).

    New behavior: disable the 'max_seek' optimization and do a slight adjustment of the filter cost.

    disable_forced_index_in_group_by

    MariaDB 10.6.18

    Old behavior: use a rule-based choice when deciding whether to use an index to resolve GROUP BY.


    New behavior: the choice is now cost based.

    The Idea Behind Index Condition Pushdown

    In disk-based storage engines, making an index lookup is done in two steps, like shown on the picture:

    index-access-2phases

    Index Condition Pushdown optimization tries to cut down the number of full record reads by checking whether index records satisfy part of the WHERE condition that can be checked for them:

    index-access-with-icp

    How much speed will be gained depends on

    • How many records will be filtered out

    • How expensive it was to read them

    The former depends on the query and the dataset. The latter is generally bigger when table records are on disk and/or are big, especially when they have blobs.

    Example Speedup

    I used DBT-3 benchmark data with scale factor 1. Since the benchmark defines very few indexes, we added a multi-column index (Index Condition Pushdown is usually useful with multi-column indexes: the first component(s) are what index access is done for, and the subsequent components hold columns that we read and check conditions on).

    The query was to find big (l_quantity > 40) orders that were made in January 1993 that took more than 25 days to ship:

    EXPLAIN without Index Condition Pushdown:

    with Index Condition Pushdown:

    The speedup was:

    • Cold buffer pool: from 5 min down to 1 min

    • Hot buffer pool: from 0.19 sec down to 0.07 sec

    Status Variables

    There are two server status variables:

    • Handler_icp_attempts: number of times the pushed index condition was checked.

    • Handler_icp_match: number of times the condition matched.

    That way, the value Handler_icp_attempts - Handler_icp_match shows the number of records that the server did not have to read because of Index Condition Pushdown.
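    For example, to check the counters after running a query:

    SHOW SESSION STATUS LIKE 'Handler_icp%';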

    Limitations

    • Currently, indexes on virtual columns can't be used for index condition pushdown. Instead, the generated column can be declared STORED; then index condition pushdown becomes possible (see the sketch after this list).

    • Index Condition Pushdown can't be used with backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query which can be handled by using a backward-ordered index scan, it will disable Index Condition Pushdown.
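    A minimal sketch of the STORED workaround from the first limitation above (hypothetical table and column names):

    CREATE TABLE t_icp (
      json_data JSON,
      -- Declared STORED so that an index on it can be used with Index Condition Pushdown
      price INT AS (CAST(JSON_VALUE(json_data, '$.price') AS INTEGER)) STORED,
      INDEX(price)
    );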

    Partitioned Tables

    Index condition pushdown support for partitioned tables was added in .

    This page is licensed: CC BY-SA / Gnu FDL

    and Country tables:
    duplicate-weedout-inner-join

    The Inner join produces duplicates. We have Germany three times, because it has three big cities. Now, lets put DuplicateWeedout into the picture:

    duplicate-weedout-diagram

    Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.

    DuplicateWeedout in action

    The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:

    This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).

    If we run the same query with semi-join optimizations disabled, we'll get:

    This plan will read (239 + 239*18) = 4541 rows, which is much slower.

    Factsheet

    • DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.

    • The strategy can handle correlated subqueries.

    • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

    • DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.

    • There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.

    See Also

    • Subquery Optimizations Map

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join subqueries

    and pass it to the range optimizer. The range optimizer can, in turn, evaluate whether it makes sense to construct range access from the condition:

    Here's another example that's more complex but is based on a real-world query. Consider a join query

    Here, the optimizer can infer the condition "return_id IS NOT NULL". If most of the orders are not returned (and so have NULL for return_id), one can use range access to scan only those orders that had a return.

    Controlling the Optimization

    The optimization is not enabled by default. One can enable it like so

    Optimizer Trace

    TODO.

    See Also

    • MDEV-15777 - JIRA bug report which resulted in the optimization

    • NULL Filtering Optimization is a related optimization in MySQL and MariaDB. It uses inferred NOT NULL conditions to perform filtering (but not index access)

    This page is licensed: CC BY-SA / Gnu FDL

    IP Range Table Performance

    The situation

    Your data includes a large set of non-overlapping 'ranges'. These could be IP addresses, datetimes (show times for a single station), zipcodes, etc.

    You have pairs of start and end values; one 'item' belongs to each such 'range'. So, instinctively, you create a table with start and end of the range, plus info about the item. Your queries involve a WHERE clause that compares for being between the start and end values.

    The problem

    Once you get a large set of items, performance degrades. You play with the indexes, but find nothing that works well. The indexes fail to lead to optimal functioning because the database does not understand that the ranges are non-overlapping.

    The solution

    I will present a solution that enforces the fact that items cannot have overlapping ranges. The solution builds a table to take advantage of that, then uses Stored Routines to get around the clumsiness imposed by it.

    Performance

    The instinctive solution often leads to scanning half the table to do just about anything, such as finding the item containing an 'address'. In complexity terms, this is Order(N).

    The solution here can usually get the desired information by fetching a single row, or a small number of rows. It is Order(1).

    In a large table, "counting the disk hits" is the important part of performance. Since InnoDB is used, and the PRIMARY KEY (clustered) is used, most operations hit only 1 block.

    Finding the 'block' where a given IP address lives:

    • For start of block: One single-row fetch using the PRIMARY KEY

    • For end of block: Ditto. The record containing this will be 'adjacent' to the other record.

    For allocating or freeing a block:

    • 2-7 SQL statements, hitting the clustered PRIMARY KEY for the rows containing and immediately adjacent to the block.

    • One SQL statement is a DELETE; it hits as many rows as are needed for the block.

    • The other statements hit one row each.

    Design decisions

    This is crucial to the design and its performance:

    • Having just one address in the row. These were alternative designs; they seemed to be no better, and possibly worse:

    • That one address could have been the 'end' address.

    • The routine parameters for a 'block' could have been the start of this block and the start of the next block.

    • The IPv4 parameters could have been dotted quads; I chose to keep the reference implementation simpler instead.

    The interesting work is in the Ips table, not the second table, so I focus on it. The inconvenience of JOINing to the second table is small compared to the performance gains.

    Details

    Two, not one, tables will be used. The first table (Ips in the reference implementations) is carefully designed to be optimal for all the basic operations needed. The second table contains other information about the 'owner' of each 'item'. In the reference implementations owner is an id used to JOIN the two tables. This discussion centers around Ips and how to efficiently map IP(s) to/from owner(s). The second table has "PRIMARY KEY(owner)".

    In addition to the two-table schema, there are a set of Stored Routines to encapsulate the necessary code.

    One row of Ips represents one 'item' by specifying the starting IP address and the 'owner'. The next row gives the starting IP address of the next "address block", thereby indirectly providing the ending address for the current block.

    This lack of explicitly stating the "end address" leads to some clumsiness. The stored routines hide it from the user.

    A special owner (indicated by '0') is reserved for "free" or "not-owned" blocks. Hence, sparse allocation of address blocks is no problem. Also, the 'free' owner is handled no differently than real owners, so there are no extra Stored Routines for such.
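    A minimal sketch of such a table and of the single-row lookups described above; the column names are illustrative and the reference implementation may differ:

    CREATE TABLE Ips (
      ip INT UNSIGNED NOT NULL,   -- start address of the block (IPv4 as a number)
      owner INT NOT NULL,         -- 0 = free / not owned
      PRIMARY KEY (ip)
    ) ENGINE=InnoDB;

    -- Find the block containing the address in @addr:
    -- one single-row fetch via the clustered PRIMARY KEY
    SELECT ip, owner
    FROM Ips
    WHERE ip <= @addr
    ORDER BY ip DESC
    LIMIT 1;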

    Links below give "reference" implementations for IPv4 and IPv6. You will need to make changes for non-IP situations, and may need to make changes even for IP situations.

    These are the main stored routines provided:

    • IpIncr, IpDecr -- for adding/subtracting 1

    • IpStore -- for allocating/freeing a range

    • IpOwner, IpRangeOwners, IpFindRanges, Owner2IpStarts, Owner2IpRanges -- for lookups

    • IpNext, IpEnd -- IP of start of next block, or end of current block

    None of the provided routines JOIN to the other table; you may wish to develop custom queries based on the given reference Stored Procedures.

    The Ips table's size is proportional to the number of blocks. A million 'owned' blocks may be 20-50MB. This varies due to

    • number of 'free' gaps (between zero and the number of owned blocks)

    • datatypes used for ip and owner

    • overhead

    Even 100M blocks is quite manageable on today's hardware. Once things are cached, most operations would take only a few milliseconds. A trillion blocks would work, but most operations would hit the disk a few times -- only a few times.

    Reference implementation of IPv4

    This is specific to IPv4 (32 bit, a la '192.168.1.255'). It can handle anywhere from 'nothing assigned' (1 row) to 'everything assigned' (4B rows) 'equally' well. That is, to ask the question "who owns '11.22.33.44'" is equally efficient regardless of how many blocks of IP addresses exist in the table. (OK, caching, disk hits, etc may make a slight difference.) The one function that can vary is the one that reassigns a range to a new owner. Its speed is a function of how many existing ranges need to be consumed, since those rows will be DELETEd. (It helps that they are, by schema design, 'clustered'.)

    Notes on the :

    • Externally, the user may use the dotted quad notation (11.22.33.44), but needs to convert to INT UNSIGNED for calling the Stored Procs.

    • The user is responsible for converting to/from the calling datatype (INT UNSIGNED) when accessing the stored routine; suggest /.

    • The internal datatype for addresses is the same as the calling datatype (INT UNSIGNED).

    • Adding and subtracting 1 (simple arithmetic).
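    For the dotted-quad conversion mentioned above, the built-in INET_ATON() and INET_NTOA() functions can be used, for example:

    SELECT INET_ATON('11.22.33.44');   -- 185999660
    SELECT INET_NTOA(185999660);       -- '11.22.33.44'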

    (The reference implementation does not handle CIDRs. Such should be easy to add on, by first turning the CIDR into an IP range.)

    Reference implementation of IPv6

    The code for handling IP address is more complex, but the overall structure is the same as for IPv4. Launch into it only if you need IPv6.

    Notes on the :

    • Externally, IPv6 has a complex string, VARCHAR(39) CHARACTER SET ASCII. The Stored Procedure IpStr2Hex() is provided.

    • The user is responsible for converting to/from the calling datatype (BINARY(16)) when accessing the stored routine; suggest /.

    • The internal datatype for addresses is the same as the calling datatype (BINARY(16)).

    • Communication with the Stored routines is via 32-char hex strings.
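    For illustration, INET6_ATON()/INET6_NTOA() convert between the string form and BINARY(16), and HEX()/UNHEX() convert between BINARY(16) and the 32-char hex strings used by the routines:

    SELECT HEX(INET6_ATON('2001:db8::1'));
    -- 20010DB8000000000000000000000001
    SELECT INET6_NTOA(UNHEX('20010DB8000000000000000000000001'));
    -- 2001:db8::1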

    The INET6* functions were first available in MySQL 5.6.3 and

    Adapting to a different non-IP 'address range' data

    • The external datatype for an 'address' should be whatever is convenient for the application.

    • The datatype for the 'address' in the table must be ordered, and should be as compact as possible.

    • You must write the Stored functions (IpIncr, IpDecr) for incrementing/decrementing an 'address'.

    • An 'owner' is an id of your choosing, but smaller is better.

    "Owner" needs a special value to represent "not owned". The reference implementations use "=" and "!=" to compare two 'owners'. Numeric values and strings work nicely with those operators; NULL does not. Hence, please do not use NULL for "not owned".

    Since the datatypes are pervasive in the stored routines, adapting a reference implementation to a different concept of 'address' would require multiple minor changes.

    The code enforces that consecutive blocks never have the same 'owner', so the table is of 'minimal' size. Your application can assume that such is always the case.

    Postlog

    Original writing -- Oct, 2012; Notes on INET6 functions -- May, 2015.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join Materialization Strategy

    Semi-join Materialization is a special kind of subquery materialization used for Semi-join subqueries. It actually includes two strategies:

    • Materialization/lookup

    • Materialization/scan

    The idea

    Consider a query that finds countries in Europe which have big cities:

    The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of the City.country field of big cities, and then do a join with countries in Europe:

    The join can be done in two directions:

    1. From the materialized table to countries in Europe

    2. From countries in Europe to the materialized table

    The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".

    If you run a join from Countries to the materialized table, the cheapest way to find a match in the materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".

    Semi-join materialization in action

    Materialization-Scan

    If we chose to look for cities with a population greater than 7 million, the optimizer will use Materialization-Scan and EXPLAIN will show this:

    Here, you can see:

    • There are still two SELECTs (look for columns with id=1 and id=2)

    • The second select (with id=2) has select_type=MATERIALIZED. This means it will be executed and its results will be stored in a temporary table with a unique key over all columns. The unique key is there to prevent the table from containing any duplicate records.

    The optimizer chose to do a full scan over the materialized table, so this is an example of a use of the Materialization-Scan strategy.

    As for execution costs, we're going to read 15 rows from table City, write 15 rows to materialized table, read them back (the optimizer assumes there won't be any duplicates), and then do 15 eq_ref accesses to table Country. In total, we'll do 45 reads and 15 writes.

    By comparison, if you run the EXPLAIN with semi-join optimizations disabled, you'll get this:

    ...which is a plan to do (239 + 239*15) = 3824 table reads.

    Materialization-Lookup

    Let's modify the query slightly and look for countries which have cities with a population over one million (instead of seven):

    The EXPLAIN output is similar to the one which used Materialization-scan, except that:

    • the <subquery2> table is accessed with the eq_ref access method

    • the access uses an index named distinct_key

    This means that the optimizer is planning to do index lookups into the materialized table. In other words, we're going to use the Materialization-lookup strategy.

    With optimizer_switch='semijoin=off,materialization=off', one will get this EXPLAIN:

    One can see that both plans will do a full scan on the Country table. For the second step, MariaDB will fill the materialized table (238 rows read from table City and written to the temporary table) and then do a unique key lookup for each record in table Country, which works out to 239 unique key lookups. In total, the second step will cost (239+238) = 477 reads and 238 temp.table writes.

    Execution of the latter (DEPENDENT SUBQUERY) plan reads 18 rows using an index on City.Country for each record it receives for table Country. This works out to a cost of (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with Materialization. By the way, MariaDB has an option to use such a query plan, too (see ), but it did not choose it.

    Subqueries with grouping

    MariaDB is able to use Semi-join materialization strategy when the subquery has grouping (other semi-join strategies are not applicable in this case).

    This allows for efficient execution of queries that search for the best/last element in a certain group.

    For example, let's find cities that have the biggest population on their continent:

    the cities are:

    Factsheet

    Semi-join materialization

    • Can be used for uncorrelated IN-subqueries. The subselect may use grouping and/or aggregate functions.

    • Is shown in EXPLAIN as type=MATERIALIZED for the subquery, and a line with table=<subqueryN> in the parent subquery.

    • Is enabled when one has both materialization=on and semijoin=on

    This page is licensed: CC BY-SA / Gnu FDL

    MariaDB 5.3 Optimizer Debugging

    MariaDB 5.3 has an optimizer debugging patch. The patch is pushed into:

    lp:maria-captains/maria/5.3-optimizer-debugging

    The patch is wrapped in #ifdef, but there is a #define straight in mysql_priv.h so simply compiling that tree should produce a binary with optimizer debugging enabled.

    The patch adds two system variables:

    • @@debug_optimizer_prefer_join_prefix

    • @@debug_optimizer_dupsweedout_penalized

    The variables are available as both session and global variables, and can also be set via the server command line.

    debug_optimizer_prefer_join_prefix

    If this variable is non-NULL, it is assumed to specify a join prefix as a comma-separated list of table aliases:

    The optimizer will try its best to build a join plan which matches the specified join prefix. It does this by comparing join prefixes it is considering with @@debug_optimizer_prefer_join_prefix, and multiplying cost by a million if the plan doesn't match the prefix.

    As a result, you can more-or-less control the join order. For example, let's take this query:

    and request a join order of C,A,B:

    We got it.

    Note that this is still a best-effort approach:

    • you won't be successful in forcing join orders which the optimizer considers invalid (e.g. for "t1 LEFT JOIN t2" you won't be able to get a join order of t2,t1).

    • The optimizer does various plan pruning and may discard the requested join order before it has a chance to find out that it is a million-times cheaper than any other.

    Semi-joins

    It is possible to force the join order of joins plus semi-joins. This may cause a different strategy to be used:

    Semi-join materialization is a somewhat special case, because "join prefix" is not exactly what you see in the EXPLAIN output. For semi-join materialization:

    • don't put "<subqueryN>" into @@debug_optimizer_prefer_join_prefix

    • instead, put all of the materialization tables into the place where you want the <subqueryN> line.

    • Attempts to control the join order inside the materialization nest will be unsuccessful. Example: we want A-C-B-AA:

    but we get A-B-C-AA.

    debug_optimizer_dupsweedout_penalized

    There are four semi-join execution strategies:

    1. FirstMatch

    2. Materialization

    3. LooseScan

    4. DuplicateWeedout

    The first three strategies have flags in @@optimizer_switch that can be used to disable them. The DuplicateWeedout strategy does not have a flag. This was done for a reason, as that strategy is the catch-all strategy and it can handle all kinds of subqueries, in all kinds of join orders. (We're slowly moving to the point where it will be possible to run with FirstMatch enabled and everything else disabled but we are not there yet.)

    Since DuplicateWeedout cannot be disabled, there are cases where it "gets in the way" by being chosen over the strategy you need. This is what debug_optimizer_dupsweedout_penalized is for. If you set:
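    (a sketch; it is assumed here that the variable takes a boolean value)

    SET debug_optimizer_dupsweedout_penalized=1;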

    ...the costs of query plans that use DuplicateWeedout will be multiplied by a million. This doesn't mean that you will get rid of DuplicateWeedout: due to Bug #898747 it is still possible for DuplicateWeedout to be used even if a cheaper plan exists. A partial remedy to this is to run with

    It is possible to use both debug_optimizer_dupsweedout_penalized and debug_optimizer_prefer_join_prefix at the same time. This should give you the desired strategy and join order.

    Further reading

    • See mysql-test/t/debug_optimizer.test (in the MariaDB source code) for examples

    This page is licensed: CC BY-SA / Gnu FDL

    Optimizer Hints

    This section details special comments you can add to SQL statements to influence the query optimizer, helping you manually select better execution plans for improved performance and query tuning.

    Optimizer hints are options available that affect the execution plan.

    SELECT Modifiers have been in MariaDB for a long time, while Expanded Optimizer Hints were introduced in MariaDB 12.0 and 12.1.

    See Also

    • Use optimizer_switch to enable/disable specific optimizations

    This page is licensed: CC BY-SA / Gnu FDL

    Data Sampling: Techniques for Efficiently Finding a Random Row

    Fetching random rows from a table (beyond ORDER BY RAND())

    The problem

    One would like to do "SELECT ... ORDER BY RAND() LIMIT 10" to get 10 rows at random. But this is slow. The optimizer does

    LIMIT ROWS EXAMINED

    Syntax

    Similar to the parameters of LIMIT, rows_limit can be either a prepared statement parameter or a stored program parameter.
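    The general shape of the clause (a sketch, consistent with the examples later in this section):

    SELECT ... FROM ... [WHERE ...] [GROUP BY ...] [ORDER BY ...]
      LIMIT [[offset,] row_count] ROWS EXAMINED rows_limit;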

    MariaDB [securedb]> select @@version;
    +-----------------------------------+
    | @@version                         |
    +-----------------------------------+
    | 10.6.20-16-MariaDB-enterprise-log |
    +-----------------------------------+
    1 row in set (0.001 sec)
    
    MariaDB [securedb]> select @@optimizer_adjust_secondary_key_costs;
    +---------------------------------------------+
    | @@optimizer_adjust_secondary_key_costs      |
    +---------------------------------------------+
    | fix_reuse_range_for_ref,fix_card_multiplier |
    +---------------------------------------------+
    SET @@optimizer_adjust_secondary_key_costs='all';
    SET optimizer_switch='index_condition_pushdown=off'
    MariaDB [test]> EXPLAIN SELECT * FROM tbl WHERE key_col1 BETWEEN 10 AND 11 AND key_col2 LIKE '%foo%';
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    | id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    |  1 | SIMPLE      | tbl   | range | key_col1      | key_col1 | 5       | NULL |    2 | Using index condition |
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    ALTER TABLE lineitem ADD INDEX s_r (l_shipdate, l_receiptdate);
    SELECT COUNT(*) FROM lineitem
    WHERE
      l_shipdate BETWEEN '1993-01-01' AND '1993-02-01' AND
      datediff(l_receiptdate,l_shipdate) > 25 AND
      l_quantity > 40;
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    | table    | type  | possible_keys | key | key_len | ref  | rows   | Extra       |
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    | lineitem | range | s_r           | s_r | 4       | NULL | 152064 | Using where |
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    | table    | type  | possible_keys | key | key_len | ref  | rows   | Extra                              |
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    | lineitem | range | s_r           | s_r | 4       | NULL | 152064 | Using index condition; Using where |
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    SELECT * 
    FROM Country 
    WHERE 
       Country.code IN (SELECT City.Country
                        FROM City 
                        WHERE 
                          City.Population > 0.33 * Country.Population AND 
                          City.Population > 1*1000*1000);
    EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 0.33 * Country.Population 
       AND City.Population > 1*1000*1000)\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: City
             type: range
    possible_keys: Population,Country
              key: Population
          key_len: 4
              ref: NULL
             rows: 238
            Extra: Using index condition; Start temporary
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: Country
             type: eq_ref
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 3
              ref: world.City.Country
             rows: 1
            Extra: Using where; End temporary
    2 rows in set (0.00 sec)
    EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 0.33 * Country.Population 
        AND City.Population > 1*1000*1000)\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: Country
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 239
            Extra: Using where
    *************************** 2. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: City
             type: index_subquery
    possible_keys: Population,Country
              key: Country
          key_len: 3
              ref: func
             rows: 18
            Extra: Using where
    2 rows in set (0.00 sec)
    CREATE TABLE items (
      price  DECIMAL(8,2),
      weight DECIMAL(8,2),
      ...
      INDEX(weight)
    );
    -- Find items that cost more than 1000 $currency_units per kg:
    SET optimizer_switch='not_null_range_scan=ON';
    EXPLAIN
    SELECT * FROM items WHERE items.price > items.weight / 1000;
    weight IS NOT NULL
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key    | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    |    1 | SIMPLE      | items | range | NULL          | weight | 5       | NULL | 1    | Using where |
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    -- Find orders that were returned
    SELECT * FROM current_orders AS O, order_returns AS RET
    WHERE 
      O.return_id= RET.id;
    SET optimizer_switch='not_null_range_scan=ON';
    The first select received the table name <subquery2>. This is the table that we got as a result of the materialization of the select with id=2.
    in the optimizer_switch variable.
  • The materialization=on|off flag is shared with Non-semijoin materialization.

  • FirstMatch Strategy
  • Fetch all the rows -- this is costly

  • Append RAND() to the rows

  • Sort the rows -- also costly

  • Pick the first 10.

  • All the algorithms given below are "fast", but most introduce flaws:

    • Bias -- some rows are more likely to be fetched than others.

    • Repetitions -- If two random sets contain the same row, they are likely to contain other dups.

    • Sometimes failing to fetch the desired number of rows.

    "Fast" means avoiding reading all the rows. There are many techniques that require a full table scan, or at least an index scan. They are not acceptable for this list. There is even a technique that averages half a scan; it is relegated to a footnote.

    Metrics

    Here's a way to measure performance without having a big table.
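    A sketch of one such measurement, using the session Handler counters:

    FLUSH STATUS;                          -- reset the session counters
    SELECT ...;                            -- the query being tested
    SHOW SESSION STATUS LIKE 'Handler%';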

    If some of the "Handler" numbers look like the number of rows in the table, then there was a table scan.

    None of the queries presented here need a full table (or index) scan. Each has a time proportional to the number of rows returned.

    Virtually all published algorithms involve a table scan. The previously published version of this blog had, embarrassingly, several algorithms that had table scans.

    Sometimes the scan can be avoided via a subquery. For example, the first of these will do a table scan; the second will not.
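    A sketch of that contrast, assuming an AUTO_INCREMENT id column (table and column names are illustrative). In the first form, RAND() is re-evaluated for every row, so the index on id cannot be used; in the second, the random id is computed once in a derived table and then fetched with a single primary-key lookup:

    -- Table scan: the comparison value is not a constant
    SELECT * FROM t
    WHERE id = FLOOR(1 + RAND() * (SELECT MAX(id) FROM t));

    -- No scan: the random id is evaluated once, then looked up by PRIMARY KEY
    SELECT t.*
    FROM t
    JOIN ( SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM t)) AS id ) AS r
      USING (id);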

    Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned

    • Requirement: AUTO_INCREMENT id

    • Requirement: No gaps in id
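    A sketch for this case, assuming @min and @max have been precalculated; the single-evaluation trick keeps it to one primary-key lookup:

    SELECT t.*
    FROM t
    JOIN ( SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id ) AS r
      USING (id);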

    (Of course, you might be able to simplify this. For example, min_id is likely to be 1. Or precalculate limits into @min and @max.)

    Case: Consecutive AUTO_INCREMENT without gaps, 10 rows

    • Requirement: AUTO_INCREMENT id

    • Requirement: No gaps in id

    • Flaw: Sometimes delivers fewer than 10 rows
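    A sketch under the stated assumptions, using the MariaDB SEQUENCE engine table seq_1_to_15 merely to generate 15 candidate ids (any 15-row source would do); the inner LIMIT is inflated because the FLOOR expression may produce duplicates:

    SELECT t.*
    FROM t
    JOIN (
      SELECT DISTINCT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
      FROM seq_1_to_15
      LIMIT 10
    ) AS r USING (id);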

    The FLOOR expression could lead to duplicates, hence the inflated inner LIMIT. There could (rarely) be so many duplicates that the inflated LIMIT leads to fewer than the desired 10 different rows. One approach to that Flaw is to rerun the query if it delivers too few rows.

    A variant:

    Again, ugly but fast, regardless of table size.

    Case: AUTO_INCREMENT with gaps, 1 or more rows returned

    • Requirement: AUTO_INCREMENT, possibly with gaps due to DELETEs, etc

    • Flaw: Only semi-random (rows do not have an equal chance of being picked), but it does partially compensate for the gaps

    • Flaw: The first and last few rows of the table are less likely to be delivered.
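    A sketch under the stated assumptions; the random starting point is computed once into a user variable, so the candidate step is a short range scan on the PRIMARY KEY rather than a full scan:

    SET @start := FLOOR(@min + (@max - @min + 1) * RAND());

    SELECT t.*
    FROM (
      SELECT id
      FROM t
      WHERE id >= @start      -- short PRIMARY KEY range scan
      ORDER BY id
      LIMIT 50                -- 50 "consecutive" existing ids (gaps allowed)
    ) AS candidates
    JOIN t USING (id)
    ORDER BY RAND()           -- only the 50 candidates are sorted
    LIMIT 10;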

    This gets 50 "consecutive" ids (possibly with gaps), then delivers a random 10 of them.

    Yes, it is complex, but yes, it is fast, regardless of the table size.

    Case: Extra FLOAT column for randomizing

    (Unfinished: need to check these.)

    Assuming rnd is a FLOAT (or DOUBLE) populated with RAND() and INDEXed:

    • Requirement: extra, indexed, FLOAT column

    • Flaw: Fetches 10 adjacent rows (according to rnd), hence not good randomness

    • Flaw: Near 'end' of table, can't find 10 rows.

    • These two variants attempt to resolve the end-of-table flaw:

    Case: UUID or MD5 column

    • Requirement: UUID/GUID/MD5/SHA1 column exists and is indexed.

    • Similar code/benefits/flaws to AUTO_INCREMENT with gaps.

    • Needs 7 random HEX digits:

    can be used as a start for adapting a gapped AUTO_INCREMENT case. If the field is BINARY instead of hex, then

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: random

    This page is licensed: CC BY-SA / Gnu FDL

    Description

    The purpose of this optimization is to provide the means to terminate the execution of SELECT statements which examine too many rows, and thus use too many resources. This is achieved through an extension of the LIMIT clause: LIMIT ROWS EXAMINED number_of_rows. Whenever possible, the semantics of LIMIT ROWS EXAMINED is the same as that of normal LIMIT (for instance for aggregate functions).

    The LIMIT ROWS EXAMINED clause is taken into account by the query engine only during query execution. Thus the clause is ignored in the following cases:

    • If a query is EXPLAIN-ed.

    • During query optimization.

    • During auxiliary operations such as writing to system tables (e.g. logs).

    The clause is not applicable to DELETE or UPDATE statements, and if used in those statements produces a syntax error.

    The effects of this clause are as follows:

    • The server counts the number of read, inserted, modified, and deleted rows during query execution. This takes into account the use of temporary tables, and sorting for intermediate query operations.

    • Once the counter exceeds the value specified in the LIMIT ROWS EXAMINED clause, query execution is terminated as soon as possible.

    • The effects of terminating the query because of LIMIT ROWS EXAMINED are as follows:

      • The result of the query is a subset of the complete query, depending on when the query engine detected that the limit was reached. The result may be empty if no result rows could be computed before reaching the limit.

      • A warning is generated of the form: "Query execution was interrupted. The query examined at least 100 rows, which exceeds LIMIT ROWS EXAMINED (20). The query result may be incomplete."

      • If query processing was interrupted during filesort, an error is returned in addition to the warning.

      • If a UNION was interrupted during execution of one of its queries, the last step of the UNION is still executed in order to produce a partial result.

      • Depending on the join and other execution strategies used for a query, the same query may produce no result at all, or a different subset of the complete result when terminated due to LIMIT ROWS EXAMINED.

      • If the query contains a GROUP BY clause, the last group where the limit was reached will be discarded.

    The LIMIT ROWS EXAMINED clause cannot be specified on a per-subquery basis. There can be only one LIMIT ROWS EXAMINED clause for the whole SELECT statement. If a SELECT statement contains several subqueries with LIMIT ROWS EXAMINED, the one that is parsed last is taken into account.

    Examples

    A simple example of the clause is:
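    For instance (a sketch of the kind of statement intended here; t1 and t2 are hypothetical tables):

    SELECT * FROM t1, t2 WHERE t1.a > t2.a
    LIMIT 10 ROWS EXAMINED 10000;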

    The LIMIT ROWS EXAMINED clause is global for the whole statement.

    If a composite query (such as UNION, or query with derived tables or with subqueries) contains more than one LIMIT ROWS EXAMINED, the last one parsed is taken into account. In this manner either the last or the outermost one is taken into account. For instance, in the query:
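    A hypothetical query of that shape, where the subquery specifies a limit of 0 and the outer select a limit of 11:

    SELECT * FROM t1
    WHERE c1 IN (SELECT c2 FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
    LIMIT ROWS EXAMINED 11;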

    The limit that is taken into account is 11, not 0.

    This page is licensed: CC BY-SA / Gnu FDL

    SELECT * FROM Country 
    WHERE Country.code IN (SELECT City.Country 
                           FROM City 
                           WHERE City.Population > 7*1000*1000)
          AND Country.continent='Europe'
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 7*1000*1000);
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    | id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    |  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
    |  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
    |  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    3 rows in set (0.01 sec)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 7*1000*1000);
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    | id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    |  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
    |  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 1*1000*1000) ;
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    | id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    |  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
    |  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
    |  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    3 rows in set (0.00 sec)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 1*1000*1000) ;
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    | id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    |  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
    |  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    EXPLAIN 
    SELECT * FROM City 
    WHERE City.Population IN (SELECT max(City.Population) FROM City, Country 
                              WHERE City.Country=Country.Code 
                              GROUP BY Continent)
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    | id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    |    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
    |    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
    |    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
    |    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    4 rows in set (0.00 sec)
    +------+-------------------+---------+------------+
    | ID   | Name              | Country | Population |
    +------+-------------------+---------+------------+
    | 1024 | Mumbai (Bombay)   | IND     |   10500000 |
    | 3580 | Moscow            | RUS     |    8389200 |
    | 2454 | Macao             | MAC     |     437500 |
    |  608 | Cairo             | EGY     |    6789479 |
    | 2515 | Ciudad de México | MEX     |    8591309 |
    |  206 | São Paulo        | BRA     |    9968485 |
    |  130 | Sydney            | AUS     |    3276207 |
    +------+-------------------+---------+------------+
    SET debug_optimizer_prefer_join_prefix='tbl1,tbl2,tbl3';
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    |  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
    |  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    |  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
    |  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix=NULL;
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                      |
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                            |
    |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                |
    |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; FirstMatch(A) |
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    3 rows in set (0.00 sec)
    
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Start temporary                                 |
    |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (flat, BNL join) |
    |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; End temporary                      |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='A,C,B,AA';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten AA WHERE A.a IN (SELECT B.a FROM ten B, ten C);
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    | id | select_type | table       | type   | possible_keys | key          | key_len | ref  | rows | Extra                              |
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    |  1 | PRIMARY     | A           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    |  1 | PRIMARY     | <subquery2> | eq_ref | distinct_key  | distinct_key | 5       | func |    1 |                                    |
    |  1 | PRIMARY     | AA          | ALL    | NULL          | NULL         | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  2 | SUBQUERY    | B           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    |  2 | SUBQUERY    | C           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    5 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_dupsweedout_penalized=TRUE;
    MariaDB [test]> SET optimizer_prune_level=0;
    FLUSH STATUS;
        SELECT ...;
        SHOW SESSION STATUS LIKE 'Handler%';
    SELECT *  FROM RandTest AS a
      WHERE id = FLOOR(@min + (@max - @min + 1) * RAND());  -- BAD: table scan
    SELECT *
     FROM RandTest AS a
     JOIN (
       SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id -- Good; single eval.
          ) b  USING (id);
    SELECT r.*
          FROM (
              SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
                  FROM (
                      SELECT MIN(id) AS min_id,
                             MAX(id) AS max_id
                          FROM RandTest
                       ) AS mm
               ) AS init
          JOIN  RandTest AS r  ON r.id = init.id;
    -- First select is one-time:
      SELECT @min := MIN(id),
             @max := MAX(id)
          FROM RandTest;
      SELECT DISTINCT *
          FROM RandTest AS a
          JOIN (
              SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
                  FROM RandTest
                  LIMIT 11    -- more than 10 (to compensate for dups)
               ) b  USING (id)
          LIMIT 10;           -- the desired number of rows
    SELECT r.*
          FROM (
              SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
                  FROM (
                      SELECT MIN(id) AS min_id,
                             MAX(id) AS max_id
                          FROM RandTest
                       ) AS mm
                  JOIN ( SELECT id dummy FROM RandTest LIMIT 11 ) z
               ) AS init
          JOIN  RandTest AS r  ON r.id = init.id
          LIMIT 10;
    -- First select is one-time:
    SELECT @min := MIN(id),
           @max := MAX(id)
        FROM RandTest;
    SELECT a.*
        FROM RandTest a
        JOIN ( SELECT id FROM
                ( SELECT id
                    FROM ( SELECT @min + (@max - @min + 1 - 50) * RAND() 
                      AS start FROM DUAL ) AS init
                    JOIN RandTest y
                    WHERE    y.id > init.start
                    ORDER BY y.id
                    LIMIT 50         -- Inflated to deal with gaps
                ) z ORDER BY RAND()
               LIMIT 10              -- number of rows desired (change to 1 if looking for a single row)
             ) r ON a.id = r.id;
    SELECT r.*
          FROM ( SELECT RAND() AS start FROM DUAL ) init
          JOIN RandTest r
          WHERE r.rnd >= init.start
          ORDER BY r.rnd
          LIMIT 10;
    SELECT r.*
          FROM ( SELECT RAND() * ( SELECT rnd
                            FROM RandTest
                            ORDER BY rnd DESC
                            LIMIT 10,1 ) AS start
               ) AS init
          JOIN RandTest r
          WHERE r.rnd > init.start
          ORDER BY r.rnd
          LIMIT 10;
    
    
      SELECT @start := RAND(),
             @cutoff := CAST(1.1 * 10 + 5 AS DECIMAL(20,8)) / TABLE_ROWS
          FROM information_schema.TABLES
          WHERE TABLE_SCHEMA = 'dbname'
            AND TABLE_NAME = 'RandTest'; -- 0.0030
      SELECT d.*
          FROM (
              SELECT a.id
                  FROM RandTest a
                  WHERE rnd BETWEEN @start AND @start + @cutoff
               ) sample
          JOIN RandTest d USING (id)
          ORDER BY rand()
          LIMIT 10;
    RIGHT( HEX( (1<<24) * (1+RAND()) ), 6)
    UNHEX(RIGHT( HEX( (1<<24) * (1+RAND()) ), 6))
    SELECT ... FROM ... WHERE ...
    [group_clause] [order_clause]
    LIMIT [[OFFSET,] row_count] ROWS EXAMINED rows_limit;
    SELECT * FROM t1, t2 LIMIT 10 ROWS EXAMINED 10000;
    SELECT * FROM t1
    WHERE c1 IN (SELECT * FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
    LIMIT ROWS EXAMINED 11;

The IPv6 parameters are 32-digit hex because that was simpler than BINARY(16) for a reference implementation.

    The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M) -- adjust if needed.

  • The address "Off the end" (255.255.255.255+1 - represented as NULL).

• The table is initialized to one row: (ip=0, owner=0), meaning "all addresses are free". See the comments in the code for more details.

  • Inside the Procedures, and in the Ips table, an address is stored as BINARY(16) for efficiency. HEX() and UNHEX() are used at the boundaries.

  • Adding/subtracting 1 is rather complex (see the code).

  • The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M); 'free' is represented by 0. You may need a bigger datatype.

  • The address "Off the end" (ffff.ffff.ffff.ffff.ffff.ffff.ffff.ffff+1 is represented by NULL).

• The table is initialized to one row: (UNHEX('00000000000000000000000000000000'), 0), meaning "all addresses are free".

  • You may need to decide on a canonical representation of IPv4 in IPv6. See the comments in the code for more details.

  • A special value (such as 0 or '') must be provided for 'free'.

  • The table must be initialized to one row: (SmallestAddress, Free)

  • InnoDB
    Reference implementation for IPv4
    INET_ATON
    INET_NTOA
    reference implementation for IPv6
    INET6_ATON
    INET6_NTOA
    Related blog
    Another approach
    Free IP tables
    Rick James' site
    ipranges

    Foreign Keys

    A foreign key is a database constraint that references columns in a parent table to enforce data integrity in a child table. When used, MariaDB checks to maintain these integrity rules.

    Overview

A foreign key is a constraint which can be used to enforce data integrity. It is composed of a column (or a set of columns) in a table called the child table, which references a column (or a set of columns) in a table called the parent table. If foreign keys are used, MariaDB performs checks to ensure that the integrity rules are always enforced. For a more exhaustive explanation, see .

    Foreign keys can only be used with storage engines that support them. The default InnoDB supports foreign keys.

    Partitioned tables cannot contain foreign keys, and cannot be referenced by a foreign key.

    Syntax

    Note: Until , MariaDB accepts the shortcut format with a REFERENCES clause only in ALTER TABLE and CREATE TABLE statements, but that syntax does nothing. For example:
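A hypothetical sketch of that shortcut (table and column names invented for illustration):

CREATE TABLE books (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  author_id INT UNSIGNED REFERENCES authors(id)
) ENGINE = InnoDB;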

MariaDB simply parses it without returning any error or warning, for compatibility with other DBMS's. However, only the syntax described below creates foreign keys. From , MariaDB will attempt to apply the constraint. See below.

    Foreign keys are created with or . The definition must follow this syntax:
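A sketch of the expected shape, based on the standard foreign key grammar (placeholders in lowercase):

[CONSTRAINT [symbol]] FOREIGN KEY [index_name] (index_col_name, ...)
  REFERENCES tbl_name (index_col_name, ...)
  [ON DELETE RESTRICT | CASCADE | SET NULL | NO ACTION]
  [ON UPDATE RESTRICT | CASCADE | SET NULL | NO ACTION]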

    The symbol clause, if specified, is used in error messages and must be unique in the database.

    The columns in the child table must be a BTREE (not HASH, RTREE, or FULLTEXT — see ) index, or the leftmost part of a BTREE index. Index prefixes are not supported (thus, and columns cannot be used as foreign keys). If MariaDB automatically creates an index for the foreign key (because it does not exist and is not explicitly created), its name will be index_name.

The referenced columns in the parent table must be an index, or a prefix of an index.

    The foreign key columns and the referenced columns must be of the same type, or similar types. For integer types, the size and sign must also be the same.

    Both the foreign key columns and the referenced columns can be columns. However, the ON UPDATE CASCADE, ON UPDATE SET NULL, ON DELETE SET NULL clauses are not allowed in this case.

    The parent and the child table must use the same storage engine, and must not be TEMPORARY or partitioned tables. They can be the same table.

    Constraints

If a foreign key exists, each row in the child table must match a row in the parent table. Multiple child rows can match the same parent row. A child row matches a parent row if all its foreign key values are identical to a parent row's values in the parent table. However, if at least one of the foreign key values is NULL, the row has no parent, but it is still allowed.

    MariaDB performs certain checks to guarantee that the data integrity is enforced:

    • Trying to insert non-matching rows (or update matching rows in a way that makes them non-matching rows) in the child table produces a 1452 error ( '23000').

    • When a row in the parent table is deleted and at least one child row exists, MariaDB performs an action which depends on the ON DELETE clause of the foreign key.

    • When a value in the column referenced by a foreign key changes and at least one child row exists, MariaDB performs an action which depends on the ON UPDATE clause of the foreign key.

    The allowed actions for ON DELETE and ON UPDATE are:

• RESTRICT: The change on the parent table is prevented. The statement terminates with a 1451 error ( '23000'). This is the default behavior for both ON DELETE and ON UPDATE.

    • NO ACTION: Synonym for RESTRICT.

The delete or update operations triggered by foreign keys do not activate triggers and are not counted in the and status variables.

    Foreign key constraints can be disabled by setting the server system variable to 0. This speeds up the insertion of large quantities of data.
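For example, assuming the variable in question is foreign_key_checks (it can be set per session):

SET SESSION foreign_key_checks = 0;
-- ... bulk load the data ...
SET SESSION foreign_key_checks = 1;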

    Metadata

    The table contains information about foreign keys. The individual columns are listed in the table.

    The InnoDB-specific Information Schema tables also contain information about the InnoDB foreign keys. The foreign key information is stored in the . Data about the individual columns are stored in .

Often the most human-readable way to get information about a table's foreign keys is the SHOW CREATE TABLE statement.

    Limitations

    Foreign keys have the following limitations in MariaDB:

    • Currently, foreign keys are only supported by InnoDB.

    • Cannot be used with views.

    • The SET DEFAULT action is not supported.

• Foreign key actions do not activate triggers.

    Examples

    Let's see an example. We will create an author table and a book table. Both tables have a primary key called id. book also has a foreign key composed by a field called author_id, which refers to the author primary key. The foreign key constraint name is optional, but we'll specify it because we want it to appear in error messages: fk_book_author.
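A minimal sketch of that schema (column names and types are assumptions; the ON DELETE / ON UPDATE actions match the ones discussed below):

CREATE TABLE author (
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE book (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  author_id SMALLINT UNSIGNED NOT NULL,
  CONSTRAINT fk_book_author
    FOREIGN KEY (author_id) REFERENCES author (id)
    ON DELETE CASCADE
    ON UPDATE RESTRICT
) ENGINE = InnoDB;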

    Now, if we try to insert a book with a non-existing author, we will get an error:
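For example (the values are invented; author 42 does not exist):

INSERT INTO book (title, author_id) VALUES ('Ghost book', 42);
-- fails with error 1452, naming the fk_book_author constraint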

    The error is very descriptive.

    Now, let's try to properly insert two authors and their books:
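For example (values invented):

INSERT INTO author (name) VALUES ('Maria Brown'), ('John Smith');

INSERT INTO book (title, author_id) VALUES
  ('The first book', 1),
  ('The second book', 2);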

    It worked!

    Now, let's delete the second author. When we created the foreign key, we specified ON DELETE CASCADE. This should propagate the deletion, and make the deleted author's books disappear:

    We also specified ON UPDATE RESTRICT. This should prevent us from modifying an author's id (the column referenced by the foreign key) if a child row exists:

    REFERENCES

    This page is licensed: CC BY-SA / Gnu FDL

    Defragmenting InnoDB Tablespaces

    Overview

    When rows are deleted from an InnoDB table, the rows are simply marked as deleted and not physically deleted. The free space is not returned to the operating system for re-use.

The purge thread will physically delete index keys and rows, but the free space introduced is still not returned to the operating system. This can lead to gaps in the pages. If you have variable length rows, new rows may be larger than old rows and cannot make use of the available space.

You can run OPTIMIZE TABLE or ALTER TABLE ... ENGINE=InnoDB to reconstruct the table. Unfortunately, running OPTIMIZE TABLE against an InnoDB table stored in the shared table-space file ibdata1 does two things:

    • Makes the table’s data and indexes contiguous inside ibdata1.

    • Increases the size of ibdata1 because the contiguous data and index pages are appended to ibdata1.

    InnoDB Defragmentation

    The feature described below has been deprecated in and was removed in . See and .

merged Facebook's defragmentation code prepared for MariaDB by Matt, Seong Uck Lee from Kakao. The only major difference from Facebook's code and Matt's patch is that MariaDB does not introduce new literals to SQL and makes no changes to the server code. Instead, is used and all code changes are inside the InnoDB/XtraDB storage engines.

    The behaviour of OPTIMIZE TABLE is unchanged by default, and to enable this new feature, you need to set the system variable to 1.
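For example (the table name is illustrative):

SET GLOBAL innodb_defragment = 1;
OPTIMIZE TABLE mytable;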

    No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by ) and tries to move records so that pages would be full of records and then frees pages that are fully empty after the operation.

    Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.

    A number of new system and status variables for controlling and monitoring the feature are introduced.

    System Variables

    • : Enable InnoDB defragmentation.

    • : Number of pages considered at once when merging multiple pages to defragment.

• : Number of defragment stats changes before the stats are written to persistent storage.

    • : Number of records of space that defragmentation should leave on the page.

    Status Variables

    • : Number of defragment re-compression failures

    • : Number of defragment failures.

    • : Number of defragment operations.

    Example

    After these CREATE and INSERT operations, the following information can be seen from the INFORMATION SCHEMA:

    Deleting three-quarters of the records, leaving gaps, and then optimizing:

    Now some pages have been freed, and some merged:

    See on the Mariadb.org blog for more details.

    This page is licensed: CC BY-SA / Gnu FDL

    Full-Text Index Overview

    MariaDB has support for full-text indexing and searching:

    • A full-text index in MariaDB is an index of type FULLTEXT, and it allows more options when searching for portions of text from a field.

    • Full-text indexes can be used only with MyISAM, Aria, InnoDB and Mroonga tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

    • Partitioned tables cannot contain fulltext indexes, even if the storage engine supports them.

    • A FULLTEXT index definition can be given in the statement when a table is created, or added later using or .

    • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

    Full-text searching is performed using syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name.

    Excluded Results

    • Partial words are excluded.

    • Words less than 4 (MyISAM) or 3 (InnoDB) characters in length will not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).

• Words longer than 84 characters in length will also not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).

    Relevance

    MariaDB calculates a relevance for each result, based on a number of factors, including the number of words in the index, the number of unique words in a row, the total number of words in both the index and the result, and the weight of the word. In English, 'cool' will be weighted less than 'dandy', at least at present! The relevance can be returned as part of a query simply by using the MATCH function in the field list.

    Types of Full-Text search

    IN NATURAL LANGUAGE MODE

    IN NATURAL LANGUAGE MODE is the default type of full-text search, and the keywords can be omitted. There are no special operators, and searches consist of one or more comma-separated keywords.

    Searches are returned in descending order of relevance.

    IN BOOLEAN MODE

    Boolean search permits the use of a number of special operators:

• * : The wildcard, indicating zero or more characters. It can only appear at the end of a word.

• " : Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).

• + : The word is mandatory in all rows returned.

• - : The word cannot appear in any row returned.

• < : The word that follows has a lower relevance than other words, although rows containing it will still match.

• > : The word that follows has a higher relevance than other words.

• () : Used to group words into subexpressions.

• ~ : The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row).

Searches are not returned in order of relevance, nor does the 50% limit apply. Stopwords and word minimum and maximum lengths still apply as usual.

    WITH QUERY EXPANSION

    A query expansion search is a modification of a natural language search. The search string is used to perform a regular natural language search. Then, words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. It can be useful when relying on implied knowledge within the data, for example that MariaDB is a database.

    Examples

    Creating a table, and performing a basic search:

    Multiple words:

Since 'Once' is a stopword, no result is returned:

    Inserting the word 'wicked' into more than half the rows excludes it from the results:

    Using IN BOOLEAN MODE to overcome the 50% limitation:

    Returning the relevance:

    WITH QUERY EXPANSION. In the following example, 'MariaDB' is always associated with the word 'database', so it is returned when query expansion is used, even though not explicitly requested.

    Partial word matching with IN BOOLEAN MODE:

    Using boolean operators

    See Also

    • For simpler searches of a substring in text columns, see the operator.

    This page is licensed: CC BY-SA / Gnu FDL

    Data Warehousing High Speed Ingestion

    The problem

    You are ingesting lots of data. Performance is bottlenecked in the INSERT area.

    This will be couched in terms of Data Warehousing, with a huge Fact table and Summary (aggregation) tables.

    Overview of solution

    • Have a separate staging table.

    • Inserts go into Staging.

    • Normalization and Summarization reads Staging, not Fact.

    • After normalizing, the data is copied from Staging to Fact.

    Staging is one (or more) tables in which the data lives only long enough to be handed off to Normalization, Summary, and the Fact tables.

Since we are probably talking about a billion-row table, shrinking the width of the Fact table by normalizing (as mentioned here) is worthwhile. Changing an INT to a MEDIUMINT will save a GB. Replacing a string by an id (normalizing) saves many GB. This helps disk space and cacheability, hence speed.

    Injection speed

    Some variations:

    • Big dump of data once an hour, versus continual stream of records.

    • The input stream could be single-threaded or multi-threaded.

    • You might have 3rd party software tying your hands.

    Generally the fastest injection rate can be achieved by "staging" the INSERTs in some way, then batch processing the staged records. This blog discusses various techniques for staging and batch processing.

    Normalization

    Let's say your Input has a host_name column, but you need to turn that into a smaller host_id in the Fact table. The "Normalization" table, as I call it, looks something like
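A sketch of such a mapping table (names and sizes are assumptions):

CREATE TABLE Hosts (
  host_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  host_name VARCHAR(99) NOT NULL,
  PRIMARY KEY (host_id),     -- lookup id -> name
  UNIQUE KEY (host_name)     -- lookup name -> id
) ENGINE = InnoDB;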

Here's how you can use Staging as an efficient way to achieve the swap from name to id.

    Staging has two fields (for this normalization example):

Meanwhile, the Fact table has:

    SQL #1 (of 2):
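A sketch, assuming Staging has a host_name column (plus a host_id column to be filled in) and Hosts is the mapping table above:

INSERT IGNORE INTO Hosts (host_name)
  SELECT DISTINCT s.host_name
    FROM Staging AS s
    LEFT JOIN Hosts AS h ON h.host_name = s.host_name
    WHERE h.host_id IS NULL;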

    By isolating this as its own transaction, we get it finished in a hurry, thereby minimizing blocking. By saying IGNORE, we don't care if other threads are 'simultaneously' inserting the same host_names.

    There is a subtle reason for the LEFT JOIN. If, instead, it were INSERT IGNORE..SELECT DISTINCT, then the INSERT would preallocate auto_increment ids for as many rows as the SELECT provides. This is very likely to "burn" a lot of ids, thereby leading to overflowing MEDIUMINT unnecessarily. The LEFT JOIN leads to finding just the new ids that are needed (except for the rare possibility of a 'simultaneous' insert by another thread). More rationale:

    SQL #2:
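A matching sketch that back-fills the ids into Staging:

UPDATE Staging AS s
  JOIN Hosts AS h USING (host_name)
  SET s.host_id = h.host_id;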

    This gets the IDs, whether already existing, set by another thread, or set by SQL #1.

    If the size of Staging changes depending on the busy versus idle times of the day, this pair of SQL statements has another comforting feature. The more rows in Staging, the more efficient the SQL runs, thereby helping compensate for the "busy" times.

    The companion folds SQL #2 into the INSERT INTO Fact. But you may need host_id for further normalization steps and/or Summarization steps, so this explicit UPDATE shown here is often better.

    Flip-flop staging

    The simple way to stage is to ingest for a while, then batch-process what is in Staging. But that leads to new records piling up waiting to be staged. To avoid that issue, have 2 processes:

    • one process (or set of processes) for INSERTing into Staging;

    • one process (or set of processes) to do the batch processing (normalization, summarization).

    To keep the processes from stepping on each other, we have a pair of staging tables:

    • Staging is being INSERTed into;

    • StageProcess is one being processed for normalization, summarization, and moving to the Fact table. A separate process does the processing, then swaps the tables:
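A sketch of the swap (using the two table names above):

DROP TABLE StageProcess;
CREATE TABLE StageProcess LIKE Staging;
RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;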

    This may not seem like the shortest way to do it, but has these features:

    • The DROP + CREATE might be faster than TRUNCATE, which is the desired effect.

    • The RENAME is atomic, so the INSERT process(es) never find that Staging is missing.

    A variant on the 2-table flip-flop is to have a separate Staging table for each Insertion process. The Processing process would run around to each Staging in turn.

    A variant on that would be to have a separate processing process for each Insertion process.

    The choice depends on which is faster (insertion or processing). There are tradeoffs; a single processing thread avoids some locks, but lacks some parallelism.

    Engine choice

Fact table -- InnoDB, if for no other reason than that a system crash would not need a REPAIR TABLE. (REPAIRing a billion-row table can take hours or days.)

    Normalization tables -- InnoDB, primarily because it can be done efficiently with 2 indexes, whereas, MyISAM would need 4 to achieve the same efficiency.

    Staging -- Lots of options here.

    • If you have multiple Inserters and a single Staging table, InnoDB is desirable due to row-level, not table-level, locking.

    • MEMORY may be the fastest and it avoids I/O. This is good for a single staging table.

    • For multiple Inserters, a separate Staging table for each Inserter is desired.

• For multiple Inserters into a single Staging table, InnoDB may be faster. (MEMORY does table-level locking.)

• With one non-InnoDB Staging table per Inserter, using an explicit LOCK TABLE avoids repeated implicit locks on each INSERT.

• But, if you are doing LOCK TABLE and the Processing thread is separate, an UNLOCK is necessary periodically to let the RENAME grab the table.

• "Batch INSERTs" (100-1000 rows per SQL) eliminates much of the issues of the above bullet items.

    Confused? Lost? There are enough variations in applications that make it impractical to predict what is best. Or, simply good enough. Your ingestion rate may be low enough that you don't hit the brick walls that I am helping you avoid.

    Should you do "CREATE TEMPORARY TABLE"? Probably not. Consider Staging as part of the data flow, not to be DROPped.

    Summarization

    This is mostly covered here: Summarize from the Staging table instead of the Fact table.

    Replication Issues

    Row Based Replication (RBR) is probably the best option.

    The following allows you to keep more of the Ingestion process in the Master, thereby not bogging down the Slave(s) with writes to the Staging table.

    • RBR

    • Staging is in a separate database

    • That database is not replicated (binlog-ignore-db on Master)

    • In the Processing steps, USE that database, reach into the main db via syntax like "MainDb.Hosts". (Otherwise, the binlog-ignore-db does the wrong thing.)

    That way

    • Writes to Staging are not replicated.

    • Normalization sends only the few updates to the normalization tables.

    • Summarization sends only the updates to the summary tables.

    • Flip-flop does not replicate the DROP, CREATE or RENAME.

    Sharding

You could possibly spread the data you are trying to ingest across multiple machines in a predictable way (sharding on hash, range, etc). Running "reports" on a sharded Fact table is a challenge unto itself. On the other hand, Summary Tables rarely get too big to manage on a single machine.

    For now, Sharding is beyond the scope of this blog.

    Push me vs pull me

    I have implicitly assumed the data is being pushed into the database. If, instead, you are "pulling" data from some source(s), then there are some different considerations.

    Case 1: An hourly upload; run via cron

    1. Grab the upload, parse it

    2. Put it into the Staging table

    3. Normalize -- each SQL in its own transaction (autocommit)

4. BEGIN

5. Summarize

6. Copy from Staging to Fact.

7. COMMIT

    If you need parallelism in Summarization, you will have to sacrifice the transactional integrity of steps 4-7.

    Caution: If these steps add up to more than an hour, you are in deep dodo.

    Case 2: You are polling for the data

It is probably reasonable to have multiple processes doing this, so the steps below are explicit about locking.

    1. Create a Staging table for this polling processor. Loop:

    2. With some locked mechanism, decide which 'thing' to poll.

    3. Poll for the data, pull it in, parse it. (Potentially polling and parsing are significantly costly)

4. Put it into the process-specific Staging table

5. Normalize -- each SQL in its own transaction (autocommit)

6. BEGIN

7. Summarize

8. Copy from Staging to Fact.

9. COMMIT

10. Declare that you are finished with this 'thing' (see step 1). EndLoop.

innodb_log_file_size should be larger than the change in the STATUS "Innodb_os_log_written" across the BEGIN...COMMIT transaction (for either Case).

    See also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Pagination Optimization

    The Desire

    You have a website with news articles, or a blog, or some other thing with a list of things that might be too long for a single page. So, you decide to break it into chunks of, say, 10 items and provide a [Next] button to go the next "page".

    You spot OFFSET and LIMIT in MariaDB and decide that is the obvious way to do it.

Note that the problem requirement needs a [Next] link on each page so that the user can 'page' through the data. He does not really need "GoTo Page #". Jumping to the [First] or [Last] page may be useful.

    The Problem

    All is well -- until you have 50,000 items in a list. And someone tries to walk through all 5000 pages. That 'someone' could be a search engine crawler.

    Where's the problem? Performance. Your web page is doing "SELECT ... OFFSET 49990 LIMIT 10" (or the equivalent "LIMIT 49990,10"). MariaDB has to find all 50,000 rows, step over the first 49,990, then deliver the 10 for that distant page.

    If it is a crawler ('spider') that read all the pages, then it actually touched about 125,000,000 items to read all 5,000 pages.

    Reading the entire table, just to get a distant page, can be so much I/O that it can cause timeouts on the web page. Or it can interfere with other activity, causing other things to be slow.

    Other Bugs

    In addition to a performance problem, ...

    • If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.

    • The pages are not easily bookmarked or sent to someone else because the contents shift over time.

• The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!

    What to Do?

    Hardware? No, that's just a bandaid. The data will continue to grow and even the new hardware won't handle it.

    Better INDEX? No. You must get away from reading the entire table to get the 5000th page.

    Build another table saying where the pages start? Get real! That would be a maintenance nightmare, and expensive.

    Bottom line: Don't use OFFSET; instead remember where you "left off".

    With INDEX(id), this suddenly becomes very efficient.

    Implementation -- Getting Rid of OFFSET

You are probably doing this now: ORDER BY datetime DESC LIMIT 49990,10. You probably have some unique id on the table. This can probably be used for "left off".

    Currently, the [Next] button probably has a url something like ?topic=xyz&page=4999&limit=10 The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed. The product of page*limit gives the OFFSET. (The "limit=10" might be in the url, or might be hard-coded; this choice is not relevant to this discussion.)

    The new variant would be ?topic=xyz&id=12345&limit=10. (Note: the 12345 is not computable from 4999.) By using INDEX(topic, id) you can efficiently say
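A sketch of the query behind that URL (the table name items is an assumption):

SELECT *
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 10;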

    That will hit only 10 rows. This is a huge improvement for later pages. Now for more details.

    Implementation -- "Left Off"

    What if there are exactly 10 rows left when you display the current page? It would make the UI nice if you grayed out the [Next] button, wouldn't it. (Or you could suppress the button all together.)

    How to do that? Instead of LIMIT 10, use LIMIT 11. That will give you the 10 items needed for the current page, plus an indication of whether there is another page. And the id for that page.

So, take the 11th id for the [Next] button: <a href=?topic=xyz&id=$id11&limit=10>Next</a>

    Implementation -- Links Beyond [Next]

    Let's extend the 11 trick to also find the next 5 pages and build links for them.

    Plan A is to say LIMIT 51. If you are on page 12, that would give you links for pages 13 (using 11th id) through pages 17 (51st).

    Plan B is to do two queries, one to get the 10 items for the current page, the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.

    Which plan to pick? It depends on many things, so benchmark.

    A Reasonable Set of Links

    Reaching forward and backward by 5 pages is not too much work. It would take two separate queries to find the ids in both directions. Also, having links that take you to the First and Last pages would be easy to do. No id is needed; they can be something like

    The UI would recognize those, then generate a SELECT with something like

    The last items would be delivered in reverse order. Either deal with that in the UI, or make the SELECT more complex:
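A sketch of both variants, assuming INDEX(topic, id):

-- Simple: returns the last 10 items, newest first
SELECT *
  FROM items
  WHERE topic = 'xyz'
  ORDER BY id DESC
  LIMIT 10;

-- More complex: re-sorts those 10 back into ascending order
SELECT i.*
  FROM ( SELECT id
           FROM items
           WHERE topic = 'xyz'
           ORDER BY id DESC
           LIMIT 10 ) AS last10
  JOIN items AS i USING (id)
  ORDER BY i.id;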

    Let's say you are on page 12 of lots of pages. It could show these links:

    where the ellipsis is really used. Some end cases:

    Why it Works

    The goal is to touch only the relevant rows, not all the rows leading up to the desired rows. This is nicely achieved, except for building links to the "next 5 pages". That may (or may not) be efficiently resolved by the simple SELECT id, discussed above. The reason that may not be efficient deals with the WHERE clause.

    Let's discuss the optimal and suboptimal indexes.

    For this discussion, I am assuming

    • The datetime field might have duplicates -- this can cause troubles

    • The id field is unique

    • The id field is close enough to datetime-ordered to be used instead of datetime.

    Very efficient -- it does all the work in the index:
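A sketch, assuming INDEX(topic, id). The first query is fully covered by the index; the second (SELECT *) must also fetch the data rows:

SELECT id
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 51;

SELECT *
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 51;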

    That will hit at least 51 consecutive index entries, plus at least 51 randomly located data rows.

    Efficient -- back to the previous degree of efficiency:

    Note how all the '=' parts of the WHERE come first; then comes both the '>=' and 'ORDER BY', both on id. This means that the INDEX can be used for all the WHERE, plus the ORDER BY.

    "Items 11-20 Out of 12345"

    You lose the "out of" except when the count is small. Instead, say something like

    Alternatively... Only a few searches will have too many items to count. Keep another table with the search criteria and a count. This count can be computed daily (or hourly) by some background script. When discovering that the topic is a busy one, look it up in the table to get

    The background script would round the count off.

    The quick way to get an estimated number of rows for an InnoDB table is
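A sketch (schema and table names are placeholders):

SELECT TABLE_ROWS
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = 'dbname'
    AND TABLE_NAME = 'items';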

    However, it does not allow for the WHERE clause that you probably have.

    Complex WHERE, or JOIN

If the search criteria cannot be confined to an INDEX in a single table, this technique is doomed. I have another paper that discusses "Lists", which solves that (with extra development work), and even improves on what is discussed here.

    How Much Faster?

    This depends on

    • How many rows (total)

    • Whether the WHERE clause prevented the efficient use of the ORDER BY

• Whether the data is bigger than the cache. This last one kicks in when building one page requires reading more data from disk than can be cached. At that point, the problem goes from being CPU-bound to being I/O-bound. This is likely to suddenly slow down the loading of a page by a factor of 10.

    What is Lost

    • Cannot "jump to Page N", for an arbitrary N. Why do you want to do that?

    • Walking backward from the end does not know the page numbers.

    • The code is more complex.

    Postlog

    Designed about 2007; posted 2012.

    See Also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    How to Quickly Insert Data Into MariaDB

    This article describes different techniques for inserting data quickly into MariaDB.

    Background

    When inserting new data into MariaDB, the things that take time are (in order of importance):

    • Syncing data to disk (as part of the end of transactions)

    • Adding new keys. The larger the index, the more time it takes to keep keys updated.

    • Checking against foreign keys (if they exist).

    • Adding rows to the storage engine.

    • Sending data to the server.

    The following describes the different techniques (again, in order of importance) you can use to quickly insert data into a table.

    Disabling Keys

    You can temporarily disable updating of non-unique indexes. This is mostly useful when there are zero (or very few) rows in the table into which you are inserting data.

In many storage engines (at least MyISAM and Aria), ENABLE KEYS works by scanning through the row data and collecting keys, sorting them and then creating the index blocks. This is an order of magnitude faster than creating the index one row at a time and it also uses less key buffer memory.

    Note: When you insert into an empty table with or , MariaDB automatically does a before and an afterwards.

    When inserting big amounts of data, integrity checks are sensibly time-consuming. It is possible to disable the UNIQUE indexes and the checks using the and the system variables:
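For example, assuming the variables referred to are unique_checks and foreign_key_checks:

SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;
-- ... insert the data ...
SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;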

    For InnoDB tables, the can be temporarily set to 2, which is the fastest setting:
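For example:

SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- ... load the data ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;   -- restore the durable default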

    Also, if the table has or columns, you may want to drop them, insert all data, and recreate them.

    Loading Text Files

    The fastest way to insert data into MariaDB is through the command.

    The simplest form of the command is:
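A sketch (file and table names are placeholders):

LOAD DATA INFILE '/tmp/data.txt' INTO TABLE tbl_name;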

    You can also read a file locally on the machine where the client is running by using:
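For example (again with placeholder names):

LOAD DATA LOCAL INFILE '/tmp/data.txt' INTO TABLE tbl_name;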

    This is not as fast as reading the file on the server side, but the difference is not that big.

    LOAD DATA INFILE is very fast because:

    1. there is no parsing of SQL.

    2. data is read in big blocks.

    3. if the table is empty at the beginning of the operation, all non-unique indexes are disabled during the operation.

    4. the engine is told to cache rows first and then insert them in big blocks (At least MyISAM and Aria support this).

    Because of the above speed advantages there are many cases, when you need to insert many rows at a time, where it may be faster to create a file locally, add the rows there, and then use LOAD DATA INFILE to load them; compared to using INSERT to insert the rows.

You will also get progress reporting for LOAD DATA INFILE.

    mariadb-import

    You can import many files in parallel with (mysqlimport before ). For example:
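A sketch (database name, file names and thread count are placeholders; each file is loaded into the table named after the file):

mariadb-import --use-threads=10 -u root -p db_name /tmp/tbl1.txt /tmp/tbl2.txt /tmp/tbl3.txt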

Internally, mariadb-import uses LOAD DATA INFILE to read in the data.

    Inserting Data with INSERT Statements

    Using Big Transactions

When doing many inserts in a row, you should wrap them with BEGIN / COMMIT to avoid doing a full transaction (which includes a disk sync) for every row. For example, committing every 1000 inserts will speed up your inserts by almost 1000 times.
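For example (placeholder table and values):

BEGIN;
INSERT INTO tbl_name VALUES (1, 'a');
INSERT INTO tbl_name VALUES (2, 'b');
-- ... up to roughly 1000 inserts ...
COMMIT;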

The reason you may want many smaller BEGIN / COMMIT blocks instead of just one huge transaction is that the former uses up less transaction log space.

    Multi-Value Inserts

    You can insert many rows at once with multi-value row inserts:
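For example (placeholder table and values):

INSERT INTO tbl_name (a, b) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');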

    The limit for how much data you can have in one statement is controlled by the server variable.

    Inserting Data Into Several Tables at Once

    If you need to insert data into several tables at once, the best way to do so is to enable multi-row statements and send many inserts to the server at once:
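A sketch, assuming a parent/child pair of tables where the child row stores the parent's auto_increment id; the two statements are sent to the server as one batch:

INSERT INTO parent (name) VALUES ('item 1');
INSERT INTO child (parent_id, note) VALUES (LAST_INSERT_ID(), 'first note');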

LAST_INSERT_ID() is a function that returns the last auto_increment value inserted.

    By default, the command line mariadb client will send the above as multiple statements.

    To test this in the mariadb client you have to do:

    Note: for multi-query statements to work, your client must specify theCLIENT_MULTI_STATEMENTS flag to mysql_real_connect().

    Server Variables That Can be Used to Tune Insert Speed

    Option
    Description

    See for the full list of server variables.

    This page is licensed: CC BY-SA / Gnu FDL

    Lateral Derived Optimization

    MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

    Description

    The optimization's use case is

    • The query uses a derived table (or a VIEW, or a non-recursive CTE)

    • The derived table/View/CTE has a GROUP BY operation as its top-level operation

    • The query only needs data from a few GROUP BY groups

    An example of this: consider a VIEW that computes totals for each customer in October:
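A sketch of such a view (the underlying orders table and its columns are assumptions):

CREATE VIEW OCT_TOTALS AS
SELECT customer_id,
       SUM(amount) AS total_amt
  FROM orders
  WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
  GROUP BY customer_id;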

    And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:
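And a sketch of the join (customer names are illustrative):

SELECT customer.name, OCT_TOTALS.total_amt
  FROM customer
  JOIN OCT_TOTALS USING (customer_id)
  WHERE customer.name IN ('Customer#1', 'Customer#2');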

    Before Lateral Derived optimization, MariaDB would execute the query as follows:

    1. Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.

    2. Join it with table customer.

    The EXPLAIN would look like so:

    It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)

    Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.

    The query plan will look as follows:

    1. Scan table customer and find customer_id for Customer#1 and Customer#2.

    2. For each customer_id, compute the October totals, for this specific customer.

    The EXPLAIN output will look like so:

    Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

    In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

    Note the "lateral": 1 member.

    Controlling the Optimization

    Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.

If you need to disable the optimization, it has an optimizer_switch flag. It can be disabled like so:
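For example (the flag is assumed here to be named split_materialized):

SET optimizer_switch = 'split_materialized=off';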

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, SPLIT_MATERIALIZED or NO_SPLIT_MATERIALIZED.

    For example, by default, this table and query makes use of the optimization:

    CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;

    INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');

    INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;

    ANALYZE TABLE t1;

EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

The optimization can be disabled with the hint as follows:

EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

(Before MariaDB 12.1, no optimizer hint is available.)

    References

    • Jira task:

    • Commit:

    This page is licensed: CC BY-SA / Gnu FDL

    Entity-Attribute-Value Implementation

    The desires

    • Open-ended set of "attributes" (key=value) for each "entity". That is, the list of attributes is not known at development time, and will grow in the future. (This makes one column per attribute impractical.)

    • "ad hoc" queries testing attributes.

    • Attribute values come in different types (numbers, strings, dates, etc.)

    • Scale to lots of entities, yet perform well.

    It goes by various names

    • EAV -- Entity - Attribute - Value

    • key-value

    • RDF -- This is a flavor of EAV

    • MariaDB has dynamic columns that look something like the solution below, with the added advantage of being able to index the columns otherwise hidden in the blob. (There are caveats.)

    Bad solution

    • Table with 3 columns: entity_id, key, value

    • The "value" is a string, or maybe multiple columns depending on datatype or other kludges.

    • a JOIN b ON a.entity=b.entity AND b.key='x' JOIN c ON ... WHERE a.value=... AND b.value=...

    The problems

    • The SELECTs get messy -- multiple JOINs

    • Datatype issues -- It's clumsy to be putting numbers into strings

• Numbers stored in VARCHARs do not compare 'correctly', especially for range tests.

    • Bulky.

    A solution

    Decide which columns need to be searched/sorted by SQL queries. No, you don't need all the columns to be searchable or sortable. Certain columns are frequently used for selection; identify these. You probably won't use all of them in all queries, but you will use some of them in every query.

The solution uses one table for all the EAV stuff. The columns include the searchable fields plus one BLOB. Searchable fields are declared appropriately (INT, TIMESTAMP, etc.). The BLOB contains a JSON encoding of all the extra fields.

The table should be InnoDB, hence it should have a PRIMARY KEY. The entity_id is the 'natural' PK. Add a small number of other indexes (often 'composite') on the searchable fields. PARTITIONing is unlikely to be of any use, unless the Entities should be purged after some time. (Example: News Articles)
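A sketch of such a table (column names and types are assumptions):

CREATE TABLE entities (
  entity_id INT UNSIGNED NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,   -- searchable
  category SMALLINT UNSIGNED NOT NULL,                    -- searchable
  attrs MEDIUMBLOB NOT NULL,     -- COMPRESSed JSON holding everything else
  PRIMARY KEY (entity_id),
  KEY (category, created)        -- composite index on searchable fields
) ENGINE = InnoDB;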

    But what about the ad hoc queries?

    You have included the most important fields to search on -- date, category, etc. These should filter the data down significantly. When you also need to filter on something more obscure, that will be handled differently. The application code will look at the BLOB for that; more on this later.

    Why it works

    • You are not really going to search on more than a few fields.

    • The disk footprint is smaller; Smaller --> More cacheable --> Faster

    • It needs no JOINs

    • The indexes are useful

    Details on the BLOB/JSON

• Build the extra (or all) key-value pairs in a hash (associative array) in your application. Encode it. COMPRESS it. Insert that string into the BLOB.

    • JSON is recommended, but not mandatory; it is simpler than XML. Other serializations (eg, YAML) could be used.

• COMPRESS the JSON and put it into a BLOB (or MEDIUMBLOB) instead of a TEXT field. Compression gives about 3x shrinkage.

    • When SELECTing, UNCOMPRESS the blob. Decode the string into a hash. You are now ready to interrogate/display any of the extra fields.

    Conclusions

    • Schema is reasonably compact (compression, real datatypes, less redundancy, etc, than EAV)

    • Queries are fast (since you have picked 'good' indexes)

    • Expandable (JSON is happy to have new fields)

    • Compatible (No 3rd party products, just supported products)

    Postlog

    Posted Jan, 2014; Refreshed Feb, 2016.

    • MariaDB's

    This looks very promising; I will need to do more research to see how much of this article is obviated by it: ,

If you insist on EAV, set optimizer_search_depth=1.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Multi Range Read Optimization

    Multi Range Read is an optimization aimed at improving performance for IO-bound queries that need to scan lots of rows.

    Multi Range Read can be used with

    • range access

    • ref and eq_ref access, when they are using

    as shown in this diagram:

    Compound (Composite) Indexes

    A mini-lesson in "compound indexes" ("composite indexes")

    This document starts out trivial and perhaps boring, but builds up to more interesting information, perhaps things you did not realize about how MariaDB and MySQL indexing works.

    This also explains (to some extent).

    (Most of this applies to other databases, too.)

    SELECT  *
            FROM  items
            WHERE  messy_filtering
            ORDER BY  date DESC
            OFFSET  $M  LIMIT $N

    MySQL 5.7 Has JSON datatype, plus functions to access parts

  • MongoDB, CouchDB -- and others -- Not SQL-based.

  • Dedupping the values is clumsy.

    The one table has one row per entity, and can grow as needed. (EAV needs many rows per entity.)
  • Performance is as good as the indexes you have on the 'searchable fields'.

  • Optionally, you can duplicate the indexed fields in the BLOB.

  • Values missing from 'searchable fields' would need to be NULL (or whatever), and the code would need to deal with such.

  • If you choose to use the JSON features of MariaDB or 5.7, you will have to forgo the compression feature described.

  • MySQL 5.7.8's JSON native JSON datatype uses a binary format for more efficient access.

  • Range tests work (unlike storing INTs in VARCHARs)

  • (Drawback) Cannot use the non-indexed attributes in WHERE or ORDER BY clauses, must deal with that in the app. (MySQL 5.7 partially alleviates this.)

  • VARCHAR
    BLOB
    INT
    TIMESTAMP
    InnoDB
    PARTITIONing
    BLOB
    BLOB
    MEDIUMBLOB
    TEXT
    Dynamic Columns
    MySQL 5.7's JSON
    Using MySQL as a Document Store in 5.7
    more DocStore discussion
    optimizer_search_depth=1
    Rick James' site
    eav
Stopwords are a list of common words such as "once" or "then" that are not reflected in the search results unless IN BOOLEAN MODE is used. The stopword list for MyISAM/Aria tables and InnoDB tables can differ. See for details and a full list, as well as for details on how to change the default list.
  • For MyISAM/Aria fulltext indexes only, if a word appears in more than half the rows, it is also excluded from the results of a fulltext search.

  • For InnoDB indexes, only committed rows appear - modifications from the current transaction do not apply.


    CREATE TABLE
    ALTER TABLE
    CREATE INDEX
    MATCH() ... AGAINST
    ft_min_word_length
    InnoDB
    innodb_ft_min_token_size
    ft_max_word_length
    InnoDB
    innodb_ft_max_token_size
    stopword
    LIKE
    stopwords


    The optimization can be disabled as follows:
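Presumably this is the optimizer_switch setting used later in this section:

    SET optimizer_switch = 'split_materialized=off';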

    EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

    No optimizer hint is available.

    MATCH (col1,col2,...) AGAINST (expr [search_modifier])
    CREATE TABLE ft_myisam(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a time'),
      ('There was a wicked witch'), ('Who ate everybody up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    +--------------------------+
    | copy                     |
    +--------------------------+
    | There was a wicked witch |
    +--------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked,witch');
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    +---------------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('Once');
    Empty set (0.00 sec)
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a wicked time'),
      ('There was a wicked wicked witch'), ('Who ate everybody wicked up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    Empty set (0.00 sec)
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    | Once upon a wicked time         |
    | There was a wicked wicked witch |
    | Who ate everybody wicked up     |
    +---------------------------------+
    SELECT copy,MATCH(copy) AGAINST('witch') AS relevance 
      FROM ft_myisam WHERE MATCH(copy) AGAINST('witch');
    +---------------------------------+--------------------+
    | copy                            | relevance          |
    +---------------------------------+--------------------+
    | There was a wicked witch        | 0.6775632500648499 |
    | There was a wicked wicked witch | 0.5031757950782776 |
    +---------------------------------+--------------------+
    CREATE TABLE ft2(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft2(copy) VALUES
     ('MySQL vs MariaDB database'),
     ('Oracle vs MariaDB database'), 
     ('PostgreSQL vs MariaDB database'),
     ('MariaDB overview'),
     ('Foreign keys'),
     ('Primary keys'),
     ('Indexes'),
     ('Transactions'),
     ('Triggers');
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database');
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    +--------------------------------+
    3 rows in set (0.00 sec)
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database' WITH QUERY EXPANSION);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    4 rows in set (0.00 sec)
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('Maria*' IN BOOLEAN MODE);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('+MariaDB -database' 
      IN BOOLEAN MODE);
    +------------------+
    | copy             |
    +------------------+
    | MariaDB overview |
    +------------------+
    CREATE TABLE Hosts (
        host_id  MEDIUMINT UNSIGNED  NOT NULL AUTO_INCREMENT,
        host_name VARCHAR(99) NOT NULL,
        PRIMARY KEY (host_id),      -- for mapping one direction
        INDEX(host_name, host_id)   -- for mapping the other direction
    ) ENGINE=InnoDB;                -- InnoDB works best for Many:Many mapping table
    host_name VARCHAR(99) NOT NULL,     -- Comes from the insertion process
        host_id  MEDIUMINT UNSIGNED  NULL,  -- NULL to start with; see code below
    host_id  MEDIUMINT UNSIGNED NOT NULL,
    # This should not be in the main transaction, and it should be done with autocommit = ON
        # In fact, it could lead to strange errors if this were part
        #    of the main transaction and it ROLLBACKed.
        INSERT IGNORE INTO Hosts (host_name)
            SELECT DISTINCT s.host_name
                FROM Staging AS s
                LEFT JOIN Hosts AS n  ON n.host_name = s.host_name
                WHERE n.host_id IS NULL;
    # Also not in the main transaction, and it should be with autocommit = ON
        # This multi-table UPDATE sets the ids in Staging:
        UPDATE   Hosts AS n
        JOIN Staging AS s  ON s.host_name = n.host_name
            SET s.host_id = n.host_id
    DROP   TABLE StageProcess;
        CREATE TABLE StageProcess LIKE Staging;
        RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;
    # First page (latest 10 items):
        SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
    # Next page (second 10):
        SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10
    WHERE topic = 'xyz'
          AND id >= 1234
        ORDER BY id
        LIMIT 10
    <a href=?topic=xyz&id=FIRST&limit=10>First</a>
        <a href=?topic=xyz&id=LAST&limit=10>Last</a>
    WHERE topic = 'xyz'
        ORDER BY id ASC -- ASC for First; DESC for Last
        LIMIT 10
    ( SELECT ...
            WHERE topic = 'xyz'
            ORDER BY id DESC
            LIMIT 10
        ) ORDER BY id ASC
    [First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]
    # Page one of three:
        First [2] [3]
    # Page one of many:
        First [2] [3] [4] [5] ... [Last]
    # Page two of many:
        [First] 2 [3] [4] [5] ... [Last]
    # If you jump to the Last page, you don't know what page number it is.
    # So, the best you can do is perhaps:
    #    [First] ... [Prev] Last
    INDEX(topic, id)
        WHERE topic = 'xyz'
          AND id >= 876
        ORDER BY id ASC
        LIMIT 10,41
    That will hit 51 consecutive index entries, 0 data rows.
    
    Inefficient -- it must reach into the data:
        INDEX(topic, id)
        WHERE topic = 'xyz'
          AND id >= 876
          AND is_deleted = 0
        ORDER BY id ASC
        LIMIT 10,41
    INDEX(topic, is_deleted, id)
        WHERE topic = 'xyz'
          AND id >= 876
          AND is_deleted = 0
        ORDER BY id ASC
        LIMIT 10,41
    Items 11-20 out of Many
    Items 11-20 out of about 49,000
    SELECT  table_rows
            FROM  information_schema.TABLES
            WHERE  TABLE_SCHEMA = 'database_name'
              AND  TABLE_NAME = 'table_name'
    CREATE VIEW OCT_TOTALS AS
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE
      order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY
      customer_id;
    SELECT *
    FROM
      customer, OCT_TOTALS
    WHERE
      customer.customer_id=OCT_TOTALS.customer_id AND
      customer.customer_name IN ('Customer#1', 'Customer#2')
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    | id   | select_type | table      | type  | possible_keys | key       | key_len | ref                       | rows  | Extra                    |
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    |    1 | PRIMARY     | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2     | Using where; Using index |
    |    1 | PRIMARY     | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 36    |                          |
    |    2 | DERIVED     | orders     | index | NULL          | o_cust_id | 4       | NULL                      | 36738 | Using where              |
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    | id   | select_type     | table      | type  | possible_keys | key       | key_len | ref                       | rows | Extra                    |
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    |    1 | PRIMARY         | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2    | Using where; Using index |
    |    1 | PRIMARY         | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 2    |                          |
    |    2 | LATERAL DERIVED | orders     | ref   | o_cust_id     | o_cust_id | 4       | test.customer.customer_id | 1    | Using where              |
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    ...
            "table": {
              "table_name": "<derived2>",
              "access_type": "ref",
    ...
              "materialized": {
                "lateral": 1,
    SET optimizer_switch='split_materialized=off'
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: <derived2>
             type: ref
    possible_keys: key0
              key: key0
          key_len: 8
              ref: test.t1.n1,test.t1.n2
             rows: 1
            Extra: 
    *************************** 3. row ***************************
               id: 2
      select_type: LATERAL DERIVED
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: n1_c1_n2
          key_len: 4
              ref: test.t1.n1
             rows: 1
            Extra: Using where; Using index
    
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: <derived2>
             type: ref
    possible_keys: key0
              key: key0
          key_len: 8
              ref: test.t1.n1,test.t1.n2
             rows: 1
            Extra: 
    *************************** 3. row ***************************
               id: 2
      select_type: DERIVED
            table: t1
             type: ref
    possible_keys: c1
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort

    Trying to drop a table that is referenced by a foreign key produces a 1217 error (SQLSTATE '23000').

  • A TRUNCATE TABLE against a table containing one or more foreign keys is executed as a DELETE without WHERE, so that the foreign keys are enforced for each row.

  • CASCADE
    : The change is allowed and propagates to the child table. For example, if a parent row is deleted, the child row is also deleted; if a parent row's ID changes, the child row's ID will also change.
  • SET NULL: The change is allowed, and the child row's foreign key columns are set to NULL.

  • SET DEFAULT: Only worked with PBXT. Similar to SET NULL, but the foreign key columns were set to their default values. If default values do not exist, an error is produced.

  • If ON UPDATE CASCADE recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT.

  • Indexed generated columns (both VIRTUAL and PERSISTENT) are not supported as InnoDB foreign key indexes.

  • Prior to MariaDB 12.1, foreign key names are required to be unique per database. From MariaDB 12.1, foreign key names are only required to be unique per table.


    innodb_defragment_fill_factor: Indicates how full defragmentation should fill a page.

  • innodb_defragment_frequency: Maximum times per second for defragmenting a single index.


    For empty tables, some transactional engines (like Aria) do not log the inserted data in the transaction log, because one can roll back the operation by just doing a TRUNCATE on the table.

  • innodb_buffer_pool_size: Increase this if you have many indexes in InnoDB/XtraDB tables.

  • key_buffer_size: Increase this if you have many indexes in MyISAM tables.

  • max_allowed_packet: Increase this to allow bigger multi-insert statements.

  • read_buffer_size: Read block size when reading a file with LOAD DATA.


    The Idea

    Case 1: Rowid Sorting for Range Access

    Consider a range query:
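For illustration, assume a hypothetical table tbl with an index on key1; a representative range query would be:

    SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;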

    When this query is executed, disk IO access pattern will follow the red line in this figure:

    no-mrr-access-pattern

    Execution will hit the table rows in random places, as marked with the blue line/numbers in the figure.

    When the table is sufficiently big, each table record read will need to actually go to disk (and not be served from the buffer pool or OS cache), and query execution will be too slow to be practical. For example, a 10,000 RPM disk drive is able to make 167 seeks per second, so in the worst case, query execution will be capped at reading about 167 records per second.

    SSD drives do not need to do disk seeks, so they will not be hurt as badly, however the performance will still be poor in many cases.

    Multi-Range-Read optimization aims to make disk access faster by sorting record read requests and then doing one ordered disk sweep. If one enables Multi Range Read, EXPLAIN will show that a "Rowid-ordered scan" is used:

    and the execution will proceed as follows:

    mrr-access-pattern

    Reading disk data sequentially is generally faster, because

    • Rotating drives do not have to move the head back and forth

    • One can take advantage of IO-prefetching done at various levels

    • Each disk page will be read exactly once, which means we won't rely on disk cache (or buffer pool) to save us from reading the same page multiple times.

    The above can make a huge difference on performance. There is also a catch, though:

    • If you're scanning small data ranges in a table that is sufficiently small so that it completely fits into the OS disk cache, then you may observe that the only effect of MRR is that extra buffering/sorting adds some CPU overhead.

    • LIMIT n and ORDER BY ... LIMIT n queries with small values of n may become slower. The reason is that MRR reads data in disk order, while ORDER BY ... LIMIT n wants first n records in index order.

    Case 2: Rowid Sorting for Batched Key Access

    Batched Key Access can benefit from rowid sorting in the same way as range access does. If one has a join that uses index lookups:
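Based on the next paragraph (lookups through t2.key1 = t1.col1), the join presumably has this shape, with t1 and t2 as illustrative table names:

    SELECT * FROM t1, t2 WHERE t2.key1 = t1.col1;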

    Execution of this query will cause table t2 to be hit in random locations by lookups made through t2.key1=t1.col. If you enable Multi Range and Batched Key Access, you will get table t2 to be accessed using a Rowid-ordered scan:
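As a hedged sketch, one way to enable both optimizations in MariaDB is via the optimizer_switch flags described later on this page, together with a join_cache_level that permits Batched Key Access:

    SET optimizer_switch = 'mrr=on,mrr_sort_keys=on';
    SET join_cache_level = 6;   -- levels 5 and above allow BKA join buffers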

    The benefits will be similar to those listed for range access.

    An additional source of speedup is this property: if there are multiple records in t1 that have the same value of t1.col1, then regular Nested-Loops join will make multiple index lookups for the same value of t2.key1=t1.col1. The lookups may or may not hit the cache, depending on how big the join is. With Batched Key Access and Multi-Range Read, no duplicate index lookups will be made.

    Case 3: Key Sorting for Batched Key Access

    Let us consider again the nested loop join example, with ref access on the second table:

    Execution of this query plan will cause random hits to be made into the index t2.key1, as shown in this picture:

    key-sorting-regular-nl-join

    In particular, on step #5 we'll read the same index page that we've read on step #2, and the page we've read on step #4 will be re-read on step#6. If all pages you're accessing are in the cache (in the buffer pool, if you're using InnoDB, and in the key cache, if you're using MyISAM), this is not a problem. However, if your hit ratio is poor and you're going to hit the disk, it makes sense to sort the lookup keys, like shown in this figure:

    key-sorting-join

    This is roughly what Key-ordered scan optimization does. In EXPLAIN, it looks as follows:

    ((TODO: a note about why sweep-read over InnoDB's clustered primary index scan (which is, actually the whole InnoDB table itself) will use Key-ordered scan algorithm, but not Rowid-ordered scan algorithm, even though conceptually they are the same thing in this case))

    Buffer Space Management

    As was shown above, Multi Range Read requires sort buffers to operate. The size of the buffers is limited by system variables. If MRR has to process more data than it can fit into its buffer, it will break the scan into multiple passes. The more passes are made, the less is the speedup though, so one needs to balance between having too big buffers (which consume lots of memory) and too small buffers (which limit the possible speedup).

    Range Access

    When MRR is used for range access, the size of its buffer is controlled by the mrr_buffer_size system variable. Its value specifies how much space can be used for each table. For example, if there is a query which is a 10-way join and MRR is used for each table, 10*@@mrr_buffer_size bytes may be used.

    Batched Key Access

    When Multi Range Read is used by Batched Key Access, then buffer space is managed by BKA code, which will automatically provide a part of its buffer space to MRR. You can control the amount of space used by BKA by setting

    • join_buffer_size to limit how much memory BKA uses for each table, and

    • join_buffer_space_limit to limit the total amount of memory used by BKA in the join.

    Status Variables

    There are three status variables related to Multi Range Read:

    • Handler_mrr_init: Counts how many Multi Range Read scans were performed

    • Handler_mrr_key_refills: Number of times key buffer was refilled (not counting the initial fill)

    • Handler_mrr_rowid_refills: Number of times rowid buffer was refilled (not counting the initial fill)

    Non-zero values of Handler_mrr_key_refills and/or Handler_mrr_rowid_refills mean that the Multi Range Read scan did not have enough memory and had to do multiple key/rowid sort-and-sweep passes. The greatest speedup is achieved when Multi Range Read runs everything in one pass. If you see lots of refills, it may be beneficial to increase the sizes of the relevant buffers: mrr_buffer_size, join_buffer_size, and join_buffer_space_limit.

    Effect on Other Status Variables

    When a Multi Range Read scan makes an index lookup (or some other "basic" operation), the counter of the "basic" operation, e.g. Handler_read_key, will also be incremented. This way, you can still see total number of index accesses, including those made by MRR. Per-user/table/index statistics counters also include the row reads made by Multi Range Read scans.

    Why Using Multi Range Read Can Cause Higher Values in Status Variables

    Multi Range Read is used for scans that do full record reads (i.e., they are not "Index only" scans). A regular non-index-only scan will read

    1. an index record, to get a rowid of the table record

    2. a table record

    Both actions will be done by making one call to the storage engine, so the effect of the call will be that the relevant Handler_read_XXX counter will be incremented BY ONE, and Innodb_rows_read will be incremented BY ONE.

    Multi Range Read will make separate calls for steps #1 and #2, causing TWO increments to Handler_read_XXX counters and TWO increments to Innodb_rows_read counter. To the uninformed, this looks as if Multi Range Read was making things worse. Actually, it doesn't - the query will still read the same index/table records, and actually Multi Range Read may give speedups because it reads data in disk order.

    Multi Range Read Factsheet

    • Multi Range Read is used by

      • range access method for range scans.

      • Batched Key Access for joins

    • Multi Range Read can cause slowdowns for small queries over small tables, so it is disabled by default.

    • There are two strategies:

      • Rowid-ordered scan

      • Key-ordered scan

    • You can tell if either of them is used by checking the Extra column in EXPLAIN output.

    • There are three flags you can switch ON:

      • mrr=on - enable MRR and rowid ordered scans

      • mrr_sort_keys=on - enable Key-ordered scans (you must also set mrr=on for this to have any effect)

      • mrr_cost_based=on - let the optimizer make a cost-based choice of whether to use MRR

    Differences from MySQL

    • MySQL supports only Rowid ordered scan strategy, which it shows in EXPLAIN as Using MRR.

    • EXPLAIN in MySQL shows Using MRR, while in MariaDB it may show

      • Rowid-ordered scan

      • Key-ordered scan

      • Key-ordered Rowid-ordered scan

    • MariaDB uses mrr_buffer_size as the limit of MRR buffer size for range access, while MySQL uses read_rnd_buffer_size.

    • MariaDB has three MRR counters: Handler_mrr_init, Handler_mrr_key_refills, and Handler_mrr_rowid_refills, while MySQL has only Handler_mrr_init, and it will only count MRR scans that were used by BKA. MRR scans used by range access are not counted.

    This page is licensed: CC BY-SA / Gnu FDL

    The query to discuss

    The question is "When was Andrew Johnson president of the US?".

    The available table Presidents looks like:

    ("Andrew Johnson" was picked for this lesson because of the duplicates.)

    What index(es) would be best for that question? More specifically, what would be best for the SELECT that answers it?
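Based on the columns discussed below (last_name, first_name, and the answer column term), the query presumably looks like:

    SELECT  term
        FROM  Presidents
        WHERE  last_name = 'Johnson'
          AND  first_name = 'Andrew';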

    Some INDEXes to try...

    • No indexes

    • INDEX(first_name), INDEX(last_name) (two separate indexes)

    • "Index Merge Intersect"

    • INDEX(last_name, first_name) (a "compound" index)

    • INDEX(last_name, first_name, term) (a "covering" index)

    • Variants

    No indexes

    Well, I am fudging a little here. I have a PRIMARY KEY on seq, but that has no advantage on the query we are studying.

    Implementation Details

    First, let's describe how InnoDB stores and uses indexes.

    • The data and the PRIMARY KEY are "clustered" together in one BTree.

    • A BTree lookup is quite fast and efficient. For a million-row table there might be 3 levels of BTree, and the top two levels are probably cached.

    • Each secondary index is in another BTree, with the PRIMARY KEY at the leaf.

    • Fetching 'consecutive' (according to the index) items from a BTree is very efficient because they are stored consecutively.

    • For the sake of simplicity, we can count each BTree lookup as 1 unit of work, and ignore scans for consecutive items. This approximates the number of disk hits for a large table in a busy system.

    For MyISAM, the PRIMARY KEY is not stored with the data, so think of it as being a secondary key (over-simplified).

    INDEX(first_name), INDEX(last_name)

    The novice, once he learns about indexing, decides to index lots of columns, one at a time. But...

    MariaDB rarely uses more than one index at a time in a query. So, it will analyze the possible indexes.

    • first_name -- there are 2 possible rows (one BTree lookup, then scan consecutively)

    • last_name -- there are 2 possible rows

    Let's say it picks last_name. Here are the steps for doing the SELECT:

    1. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'.

    2. Get the PRIMARY KEY (implicitly added to each secondary index in InnoDB); get (17, 36).

    3. Reach into the data using seq = (17, 36) to get the rows for Andrew Johnson and Lyndon B. Johnson.

    4. Use the rest of the WHERE clause to filter out all but the desired row.

    5. Deliver the answer (1865-1869).

    "Index Merge Intersect"

    OK, so you get really smart and decide that MariaDB should be smart enough to use both name indexes to get the answer. This is called "Intersect".

    1. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'; get (17, 36)

    2. Using INDEX(first_name), find 2 index entries with first_name = 'Andrew'; get (7, 17)

    3. "And" the two lists together (7,17) & (17,36) = (17)

    4. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    5. Deliver the answer (1865-1869).

    The EXPLAIN fails to give the gory details of how many rows were collected from each index, etc.

    INDEX(last_name, first_name)

    This is called a "compound" or "composite" index since it has more than one column.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    3. Deliver the answer (1865-1869).

    This is much better. In fact, this is usually the "best".

    "Covering": INDEX(last_name, first_name, term)

    Surprise! We can actually do a little better. A "Covering" index is one in which all of the fields of the SELECT are found in the index. It has the added bonus of not having to reach into the "data" to finish the task.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Deliver the answer (1865-1869). The "data" BTree is not touched; this is an improvement over "compound".

    Everything is similar to using "compound", except for the addition of "Using index".

    Variants

    • What would happen if you shuffled the fields in the WHERE clause? Answer: The order of ANDed things does not matter.

    • What would happen if you shuffled the fields in the INDEX? Answer: It may make a huge difference. More in a minute.

    • What if there are extra fields on the end? Answer: Minimal harm; possibly a lot of good (eg, 'covering').

    • Redundancy? That is, what if you have both of these: INDEX(a), INDEX(a,b)? Answer: Redundancy costs something on INSERTs; it is rarely useful for SELECTs.

    • Prefix? That is, INDEX(last_name(5), first_name(5)) Answer: Don't bother; it rarely helps, and often hurts. (The details are another topic.)

    More examples:

    Postlog

    Refreshed -- Oct, 2012; more links -- Nov 2016

    See also

    • Cookbook on designing the best index for a SELECT

    • Sheeri's discussion of Indexes

    • Slides on EXPLAIN

    • Mysql manual page on range accesses in composite indexes

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: index1

    This page is licensed: CC BY-SA / Gnu FDL


    Thread Groups in the Unix Implementation of the Thread Pool

    This article does not apply to the thread pool implementation on Windows. On Windows, MariaDB uses a native thread pool created with the CreateThreadpool API, which has its own methods to distribute threads between CPUs.

    On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. The thread_pool_size system variable defines the number of thread groups on a system. Generally speaking, the goal of the thread group implementation is to have one running thread on each CPU on the system at a time. Therefore, the default value of the thread_pool_size system variable is auto-sized to the number of CPUs on the system.

    When setting the thread_pool_size system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting its value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. It can be changed dynamically with SET GLOBAL. For example:
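A minimal illustration (the value 32 is arbitrary):

    SET GLOBAL thread_pool_size = 32;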

    It can also be set in a server option group in an option file prior to starting up the server. For example:
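A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_size=32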

    If you do not want MariaDB to use all CPUs on the system for some reason, then you can set it to a lower value than the number of CPUs. For example, this would make sense if the MariaDB Server process is limited to certain CPUs with a utility such as taskset on Linux.

    If you set the value to the number of CPUs and if you find that the CPUs are still underutilized, then try increasing the value.

    The thread_pool_size system variable tends to have the most visible performance effect. It is roughly equivalent to the number of threads that can run at the same time. In this case, run means use CPU, rather than sleep or wait. If a client connection needs to sleep or wait for some reason, then it wakes up another client connection in the thread group before it does so.

    One reason that CPU underutilization may occur in rare cases is that the thread pool is not always informed when a thread is going to wait. For example, some waits, such as a page fault or a miss in the OS buffer cache, cannot be detected by MariaDB.

    Distributing Client Connections Between Thread Groups

    When a new client connection is created, its thread group is determined using the following calculation:
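Based on the description of connection_id and the round-robin behavior below, the calculation is presumably:

    thread_group_id = connection_id % thread_pool_size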

    The connection_id value in the above calculation is the same monotonically increasing number that you can use to identify connections in SHOW PROCESSLIST output or the Information Schema PROCESSLIST table.

    This calculation should assign client connections to each thread group in a round-robin manner. In general, this should result in an even distribution of client connections among thread groups.

    Types of Threads

    Thread Group Threads

    Thread groups have two different kinds of threads: a listener thread and worker threads.

    • A thread group's worker threads actually perform work on behalf of client connections. A thread group can have many worker threads, but usually, only one will be actively running at a time. This is not always the case. For example, the thread group can become oversubscribed if the thread pool's timer thread detects that the thread group is stalled. This is explained more in the sections below.

  • A thread group's listener thread listens for I/O events and distributes work to the worker threads. If it detects that there is a request that needs to be worked on, then it can wake up a sleeping worker thread in the thread group, if any exist. If the listener thread is the only thread in the thread group, then it can also create a new worker thread. If there is only one request to handle, and if the thread_pool_dedicated_listener system variable is not enabled, then the listener thread can also become a worker thread and handle the request itself. This helps decrease the overhead that may be introduced by excessively waking up sleeping worker threads and excessively creating new worker threads.

    Global Threads

    The thread pool has one global thread: a timer thread. The timer thread performs tasks, such as:

    • Checks each thread group for stalls.

    • Ensures that each thread group has a listener thread.

    Thread Creation

    A new thread is created in a thread group in the scenarios listed below.

    In all of the scenarios below, the thread pool implementation prefers to wake up a sleeping worker thread that already exists in the thread group, rather than to create a new thread.

    Worker Thread Creation by Listener Thread

    A thread group's listener thread can create a new worker thread when it has more client connection requests to distribute, but no pre-existing worker threads are available to work on the requests. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time.

    A thread group's listener thread creates a new worker thread if all of the following conditions are met:

    • The listener thread receives a client connection request that needs to be worked on.

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads, so the listener thread should not become a worker thread.

    • There are no active worker threads in the thread group.

    • There are no sleeping worker threads in the thread group that the listener thread can wake up.

    Thread Creation by Worker Threads During Waits

    A thread group's worker thread can create a new worker thread when the thread has to wait on something, and the thread group has more client connection requests queued, but no pre-existing worker threads are available to work on them. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time. For most workloads, this tends to be the primary mechanism that creates new worker threads.

    A thread group's worker thread creates a new thread if all of the following conditions are met:

  • The worker thread has to wait on some request. For example, it might be waiting on disk I/O, or it might be waiting on a lock, or it might just be waiting for a query that called the SLEEP() function to finish.

    • There are no active worker threads in the thread group.

    • There are no sleeping worker threads in the thread group that the worker thread can wake up.

    • And one of the following conditions is also met:

    Listener Thread Creation by Timer Thread

    The thread pool's timer thread can create a new listener thread for a thread group when the thread group has more client connection requests that need to be distributed, but the thread group does not currently have a listener thread to distribute them. This can help to ensure that the thread group does not miss client connection requests because it has no listener thread.

    The thread pool's timer thread creates a new listener thread for a thread group if all of the following conditions are met:

    • The thread group has not handled any I/O events since the last check by the timer thread.

  • There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread, so that it can handle some client connection request. In this case, the new thread can become the thread group's listener thread.

    • There are no sleeping worker threads in the thread group that the timer thread can wake up.

    • And one of the following conditions is also met:

    Worker Thread Creation by Timer Thread during Stalls

    The thread pool's timer thread can create a new worker thread for a thread group when the thread group is stalled. This can help to ensure that a long query can't monopolize its thread group.

    The thread pool's timer thread creates a new worker thread for a thread group if all of the following conditions are met:

    • The timer thread thinks that the thread group is stalled. This means that the following conditions have been met:

      • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.

      • No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.

    • There are no sleeping worker threads in the thread group that the timer thread can wake up.

    Thread Creation Throttling

    In some of the scenarios listed above, a thread is only created within a thread group if no new threads have been created for the thread group within the throttling interval. The throttling interval depends on the number of threads that are already in the thread group.

    In later MariaDB versions, thread creation is not throttled until a thread group already contains more than a minimum number of threads. Beyond that point, a throttling interval (in milliseconds) applies, and it grows with the number of threads already in the thread group.

    Thread Group Stalls

    The thread pool has a feature that allows it to detect if a client connection is executing a long-running query that may be monopolizing its thread group. If a client connection were to monopolize its thread group, then that could prevent other client connections in the thread group from running their queries. In other words, the thread group would appear to be stalled.

    This stall detection feature is implemented by creating a timer thread that periodically checks if any of the thread groups are stalled. There is only a single timer thread for the entire thread pool. The thread_pool_stall_limit system variable defines the number of milliseconds between each stall check performed by the timer thread. The default value is 500. It can be changed dynamically with SET GLOBAL. For example:
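A minimal illustration (the value 300 is arbitrary):

    SET GLOBAL thread_pool_stall_limit = 300;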

    It can also be set in a server option group in an option file prior to starting up the server. For example:
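A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_stall_limit=300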

    The timer thread considers a thread group to be stalled if the following is true:

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.

    • No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.

    This indicates that the one or more client connections currently using the active worker threads may be monopolizing the thread group, and preventing the queued client connections from performing work. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel.

    The thread_pool_stall_limit system variable essentially defines the limit for what a "fast query" is. If a query takes longer than thread_pool_stall_limit milliseconds, then the thread pool is likely to think that it is too slow, and it will either wake up a sleeping worker thread or create a new worker thread to let another client connection in the thread group run a query in parallel.

    In general, changing the value of the thread_pool_stall_limit system variable has the following effect:

    • Setting it to higher values can help avoid starting too many parallel threads if you expect a lot of client connections to execute long-running queries.

    • Setting it to lower values can help prevent deadlocks.

    Thread Group Oversubscription

    If the timer thread were to detect a stall in a thread group, then it would either wake up a sleeping worker thread or create a new worker thread in that thread group. At that point, the thread group would have multiple active worker threads. In other words, the thread group would be oversubscribed.

    You might expect that the thread pool would shut down one of the worker threads when the stalled client connection finished what it was doing, so that the thread group would only have one active worker thread again. However, this does not always happen. Once a thread group is oversubscribed, the thread_pool_oversubscribe system variable defines the upper limit for when worker threads start shutting down after they finish work for client connections. The default value is 3. It can be changed dynamically with SET GLOBAL. For example:
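A minimal illustration (the value 10 is arbitrary):

    SET GLOBAL thread_pool_oversubscribe = 10;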

    It can also be set in a server option group in an option file prior to starting up the server. For example:
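A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_oversubscribe=10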

    To clarify, the thread_pool_oversubscribe system variable does not play any part in the creation of new worker threads. It is only used to determine how many worker threads should remain active in a thread group, once a thread group is already oversubscribed due to stalls.

    In general, the default value of 3 should be adequate for most users. Most users should not need to change the value of the thread_pool_oversubscribe system variable.

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Pool System and Status Variables

    This article describes the system and status variables used by the MariaDB thread pool. For a full description, see Thread Pool in MariaDB.

    System variables

    extra_max_connections

  • Description: The number of connections allowed on the extra_port.

      • See Thread Pool in MariaDB for more information.

    • Command line: --extra-max-connections=#

    • Scope: Global

    extra_port

    • Description: Extra port number to use for TCP connections in a one-thread-per-connection manner. If set to 0, then no extra port is used.

      • See Thread Pool in MariaDB for more information.

    • Command line: --extra-port=#

    thread_handling

  • Description: Determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads. On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.

      • When the default one-thread-per-connection mode is enabled, the server uses one thread to handle each client connection.

      • When the pool-of-threads mode is enabled, the server uses the thread pool to handle threads for client connections.

    thread_pool_dedicated_listener

  • Description: If set to 1, then each thread group will have its own dedicated listener, and the listener thread will not pick up work items. As a result, the queueing time and the actual queue size reported in the Information Schema thread pool tables will be more exact, since IO requests are immediately dequeued from poll, without delay.

      • This system variable is only meaningful on Unix.

    • Command line: thread-pool-dedicated-listener={0|1}

    thread_pool_exact_stats

  • Description: If set to 1, provides better queueing time statistics by using a high precision timestamp, at a small performance cost, for the time when the connection was added to the queue. This timestamp helps calculate the queueing time shown in the Information Schema thread pool tables.

      • This system variable is only meaningful on Unix.

    • Command line: thread-pool-exact-stats={0|1}

    thread_pool_idle_timeout

    • Description: The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?

      • This system variable is only meaningful on Unix.

      • The thread_pool_min_threads system variable is comparable for Windows.

    thread_pool_max_threads

  • Description: The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases.

      • On Unix, in rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

  • Scope: Global

  • Command line: --thread-pool-max-threads=#

    thread_pool_min_threads

  • Description: Minimum number of threads in the thread pool. In bursty environments, after a period of inactivity, threads would normally be retired. When the next burst arrives, it would take time to reach the optimal level. Setting this value higher than the default would prevent thread retirement even if inactive.

      • This system variable is only meaningful on Windows.

      • The thread_pool_idle_timeout system variable is comparable for Unix.

    thread_pool_oversubscribe

  • Description: Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread can have unrestricted access to the CPU while it is running, but it also means that there is additional overhead from putting threads to sleep or waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but it also means that there is less overhead from putting threads to sleep or waking them up.

      • See Thread Pool in MariaDB for more information.

    thread_pool_prio_kickup_timer

    • Description: Time in milliseconds before a dequeued low-priority statement is moved to the high-priority queue.

      • This system variable is only meaningful on Unix.

  • Command line: --thread-pool-prio-kickup-timer=#

    • Scope: Global

    thread_pool_priority

  • Description: Thread pool priority. High-priority connections usually start executing earlier than low-priority ones. If set to 'auto' (the default), the actual priority (low or high) is determined by whether or not the connection is inside a transaction.

    • Command line: --thread-pool-priority=#

    • Scope: Global,Connection

    • Data Type: enum

    thread_pool_size

  • Description: The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher.

      • See Thread Pool in MariaDB for more information.

      • This system variable is only meaningful on Unix.

    thread_pool_stall_limit

  • Description: The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread.

      • See Thread Pool in MariaDB for more information.

    Status variables

    Threadpool_idle_threads

  • Description: Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc.

      • This status variable is only meaningful on Unix.

    • Scope: Global, Session

  • Data Type: numeric

    Threadpool_threads

  • Description: Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

    • Scope: Global, Session

    • Data Type: numeric

    See Also

    This page is licensed: CC BY-SA / Gnu FDL

    Big DELETEs

    The problem

    How to DELETE lots of rows from a large table? Here is an example of purging items older than 30 days:
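For illustration, assuming a hypothetical table tbl with a timestamp column ts, the single-statement purge would be roughly:

    DELETE FROM tbl WHERE ts < CURRENT_DATE() - INTERVAL 30 DAY;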

    If there are millions of rows in the table, this statement may take minutes, maybe hours.

    Any suggestions on how to speed this up?

    Why it is a problem

  • MyISAM will lock the table during the entire operation, so nothing else can be done with the table.

  • InnoDB won't lock the table, but it will chew up a lot of resources, leading to sluggishness.

    • InnoDB has to write the undo information to its transaction logs; this significantly increases the I/O required.

  • Replication, being asynchronous, will effectively be delayed (on Slaves) while the DELETE is running.

    InnoDB and undo

    To be ready for a crash, a transactional engine such as InnoDB will record what it is doing to a log file. To make that somewhat less costly, the log file is sequentially written. If the log files you have (there are usually 2) fill up because the delete is really big, then the undo information spills into the actual data blocks, leading to even more I/O.

    Deleting in chunks avoids some of this excess overhead.

    Limited benchmarking of total delete elapsed time show two observations:

    • Total delete time approximately doubles above some 'chunk' size (as opposed to below that threshold). I do not have a formula relating the log file size with the threshold cutoff.

    • Chunk size below several hundred rows is slower. This is probably because the overhead of starting/ending each chunk dominates the timing.

    Solutions

    • PARTITION -- Requires some careful setup, but is excellent for purging a time-base series.

    • DELETE in chunks -- Carefully walk through the table N rows at a time.

    PARTITION

    The idea here is to have a sliding window of partitions. Let's say you need to purge news articles after 30 days. The "partition key" would be the datetime (or timestamp) column that is to be used for purging, and the PARTITIONs would be by RANGE. Every night, a cron job would come along and build a new partition for the next day, and drop the oldest partition.

    Dropping a partition is essentially instantaneous, much faster than deleting that many rows. However, you must design the table so that the entire partition can be dropped. That is, you cannot have some items living longer than others.

    PARTITION tables have a lot of restrictions, some are rather weird. You can either have no UNIQUE (or PRIMARY) key on the table, or every UNIQUE key must include the partition key. In this use case, the partition key is the datetime. It should not be the first part of the PRIMARY KEY (if you have a PRIMARY KEY).

    You can PARTITION InnoDB or MyISAM tables.

    Since two news articles could have the same timestamp, you cannot assume the partition key is sufficient for uniqueness of the PRIMARY KEY, so you need to find something else to help with that.

    Reference implementation for Partition maintenance

    Deleting in chunks

    Although the discussion in this section talks about DELETE, it can be used for any other "chunking", such as, say, UPDATE, or SELECT plus some complex processing.

    (This discussion applies to both MyISAM and InnoDB.)

    When deleting in chunks, be sure to avoid doing a table scan. The code below is good at that; it scans no more than 1001 rows in any one query. (The 1000 is tunable.)

    Assuming you have news articles that need to be purged, and you have a schema something like this:
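A minimal sketch consistent with the discussion (an AUTO_INCREMENT PRIMARY KEY id and a timestamp ts; the table name tbl is illustrative):

    CREATE TABLE tbl (
        id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
        ts  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        -- ... the rest of the news article columns ...
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;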

    Then, this pseudo-code is a good way to delete the rows older than 30 days:
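A sketch of the chunked walk over the PRIMARY KEY, 1000 ids at a time (loop control is shown as comments):

    SET @a = 0;
    -- loop: repeat the following until @a is past the end of the table
    DELETE FROM tbl
        WHERE id BETWEEN @a AND @a + 999
          AND ts < CURRENT_DATE() - INTERVAL 30 DAY;
    SET @a = @a + 1000;
    -- optionally SLEEP(1) between chunks to be nice to other connections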

    Notes (Most of these caveats will be covered later):

    • It uses the PK instead of the secondary key. This gives much better locality of disk hits, especially for InnoDB.

    • You could (should?) do something to avoid walking through recent days but doing nothing. Caution -- the code for this could be costly.

    • The 1000 should be tweaked so that the DELETE usually takes under, say, one second.

    • No INDEX on ts is needed. (This helps INSERTs a little.)

    If there are big gaps in id values (and there will be after the first purge), then:
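A sketch of the gap-tolerant walk, using @a as where you left off and @z as the far edge of the next chunk:

    -- find the id roughly 1000 rows past @a; NULL means the end of the table has been reached
    SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1;
    DELETE FROM tbl
        WHERE id >= @a
          AND id <  @z
          AND ts  < CURRENT_DATE() - INTERVAL 30 DAY;
    SET @a = @z;    -- left off here; repeat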

    That code works whether id is numeric or character, and it mostly works even if id is not UNIQUE. With a non-unique key, the risk is that you could be caught in a loop whenever @z==@a. That can be detected and fixed thus:

    The drawback is that there could be more than 1000 items with a single id. In most practical cases, that is unlikely.

    If you do not have a primary (or unique) key defined on the table, and you have an INDEX on ts, then consider
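Presumably something of this shape:

    DELETE FROM tbl
        WHERE ts < CURRENT_DATE() - INTERVAL 30 DAY
        LIMIT 1000;    -- repeat until fewer than 1000 rows are deleted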

    This technique is NOT recommended because the LIMIT leads to a warning on replication about it being non-deterministic (discussed below).

    InnoDB chunking recommendation

  • Have a 'reasonable' size for innodb_log_file_size.

    • Use AUTOCOMMIT=1 for the session doing the deletions.

    • Pick about 1000 rows for the chunk size.

    • Adjust the row count down if asynchronous replication (Statement Based) causes too much delay on the Slaves or hogs the table too much.

    Iterating through a compound key

    To perform the chunked deletes recommended above, you need a way to walk through the PRIMARY KEY. This can be difficult if the PK has more than one column in it.

    To efficiently do a compound 'greater than':

    Assume that you left off at ($g, $s) (and have handled that row):
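Using the Genus/species names mentioned below, the AND/OR construct is presumably a WHERE clause fragment like:

    WHERE ( Genus = '$g' AND species > '$s' )
       OR ( Genus > '$g' )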

    Addenda: The above AND/OR works well in older versions of MySQL; this works better in MariaDB and newer versions of MySQL:
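The row-constructor ('tuple') comparison would be:

    WHERE ( Genus, species ) > ( '$g', '$s' )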

    A caution about using @variables for strings. If, instead of '$g', you use @g, you need to be careful to make sure that @g has the same CHARACTER SET and COLLATION as Genus, else there could be a charset/collation conversion on the fly that prevents the use of the INDEX. Using the INDEX is vital for performance. It may require a COLLATE clause on SET NAMES and/or the @g in the SELECT.

    Reclaiming the disk space

    This is costly. (Switch to the PARTITION solution if practical.)

    MyISAM leaves gaps in the table (.MYD file); OPTIMIZE TABLE will reclaim the freed space after a big delete. But it may take a long time and lock the table.

    InnoDB is block-structured, organized in a BTree on the PRIMARY KEY. An isolated deleted row leaves a block less full. A lot of deleted rows can lead to coalescing of adjacent blocks. (Blocks are normally 16KB - see innodb_page_size.)

    In InnoDB, there is no practical way to reclaim the freed space from ibdata1, other than to reuse the freed blocks eventually.

    The only option with innodb_file_per_table = 0 is to dump ALL tables, remove ibdata*, restart, and reload. That is rarely worth the effort and time.

    InnoDB, even with innodb_file_per_table = 1, won't give space back to the OS, but at least it is only one table to rebuild with. In this case, something like this should work:
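The copy-and-swap rebuild alluded to here is presumably along these lines (table names are illustrative):

    CREATE TABLE New LIKE Main;
    INSERT INTO New SELECT * FROM Main;       -- this could take a long time
    RENAME TABLE Main TO Old, New TO Main;    -- atomic swap
    DROP TABLE Old;                           -- space freed up here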

    You do need enough disk space for both copies. You must not write to the table during the process.

    Deleting more than half a table

    The following technique can be used for any combination of

    • Deleting a large portion of the table more efficiently

    • Add PARTITIONing

  • Converting to InnoDB

    • Defragmenting

    This can be done by chunking, or (if practical) all at once:
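A sketch of the 'all at once' variant, keeping only the rows you want (the 30-day condition and table names are illustrative):

    -- Optional: SET GLOBAL innodb_file_per_table = ON;   (before creating New)
    CREATE TABLE New LIKE Main;
    -- Optional: ALTER TABLE New here to add PARTITIONing, change the ENGINE, etc.
    INSERT INTO New
        SELECT * FROM Main
            WHERE ts >= CURRENT_DATE() - INTERVAL 30 DAY;   -- just the rows to keep
    RENAME TABLE Main TO Old, New TO Main;
    DROP TABLE Old;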

    Notes:

    • You do need enough disk space for both copies.

    • You must not write to the table during the process. (Changes to Main may not be reflected in New.)

    Non-deterministic replication

    Any UPDATE, DELETE, etc., with LIMIT that is replicated to slaves (via Statement Based Replication) may cause inconsistencies between the Master and Slaves. This is because the actual order of the records found for updating/deleting may be different on the slave, thereby leading to a different subset being modified. To be safe, add ORDER BY to such statements. Moreover, be sure the ORDER BY is deterministic -- that is, the fields/expressions in the ORDER BY are unique.

    An example of an ORDER BY that does not quite work: Assume there are multiple rows for each 'date':

    Given that id is the PRIMARY KEY (or UNIQUE), this will be safe:

    Unfortunately, even with the ORDER BY, MySQL has a deficiency that leads to a bogus warning in mysqld.err. See Spurious "Statement is not safe to log in statement format." warnings

    Some of the above code avoids this spurious warning by doing

    That pair of statements guarantees no more than 1000 rows are touched, not the whole table.

    Replication and KILL

    If you KILL a DELETE (or any? query) on the master in the middle of its execution, what will be replicated?

    If it is InnoDB, the query should be rolled back. (Exceptions??)

    In MyISAM, rows are DELETEd as the statement is executed, and there is no provision for ROLLBACK. Some of the rows will be deleted, some won't. You probably have no clue of how much was deleted. In a single server, simply run the delete again. The delete is put into the binlog, but with error 1317. Since replication is supposed to keep the master and slave in sync, and since it has no clue of how to do that, replication stops and waits for manual intervention. In an HA (High Availability) system using replication, this is a minor disaster. Meanwhile, you need to go to each slave and verify that it is stuck for this reason, then do

    Then (presumably) re-executing the DELETE will finish the aborted task.

    (That is yet another reason to move all your tables from MyISAM to InnoDB.)

    SBR vs RBR; Galera

    TBD -- "Row Based Replication" may impact this discussion.

    Postlog

    The tips in this document apply to MySQL, MariaDB, and Percona.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: deletebig

    This page is licensed: CC BY-SA / Gnu FDL

    Charset Narrowing Optimization

    The Charset Narrowing optimization handles equality comparisons like:

    It enables the optimizer to construct ref access to utf8mb3_key_column based on this equality. The optimization supports comparisons of columns that use utf8mb3_general_ci to expressions that use utf8mb4_general_ci.

    The optimization was introduced in MariaDB 10.6.16, MariaDB 10.11.6, and certain other releases, where it is OFF by default. In later releases, it is ON by default.

    Description

    MariaDB supports both the UTF8MB3 and UTF8MB4 character sets. It is possible to construct join queries that compare values in UTF8MB3 to UTF8MB4.

    Suppose we have the table users that uses UTF8MB4:

    and table orders that uses UTF8MB3:

    One can join users to orders on user_name:
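
    As a minimal sketch (the column definitions are illustrative, not the originals, but follow the column names used below: users.user_name_mb4 in UTF8MB4 and orders.user_name_mb3 in UTF8MB3):

    CREATE TABLE users (
      user_name_mb4 VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
    );
    CREATE TABLE orders (
      user_name_mb3 VARCHAR(100) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci,
      KEY (user_name_mb3)
    );

    SELECT * FROM users, orders WHERE orders.user_name_mb3 = users.user_name_mb4;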

    Internally the optimizer will handle the equality by converting the UTF8MB3 value into UTF8MB4 and then doing the comparison. One can see the call to CONVERT in EXPLAIN FORMAT=JSON output or Optimizer Trace:

    This produces the expected result but the query optimizer is not able to use the index over orders.user_name_mb3 to find matches for values of users.user_name_mb4.

    The EXPLAIN of the above query looks like this:

    The Charset Narrowing optimization enables the optimizer to perform the comparison between UTF8MB3 and UTF8MB4 values by "narrowing" the value in UTF8MB4 to UTF8MB3. The CONVERT call is no longer needed, and the optimizer is able to use the equality to construct ref access:

    Controlling the Optimization

    The optimization is controlled by an optimizer_switch flag. Specify:

    to enable the optimization.
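
    Assuming the flag is named cset_narrowing (the name used by MDEV-32113), that would look like:

    SET optimizer_switch='cset_narrowing=on';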

    References

    • MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref access

    • Blog post: Making "tbl.utf8mb3_key_column=utf8mb4_expr" sargable

    This page is licensed: CC BY-SA / Gnu FDL

    Pivoting in MariaDB

    The problem

    You want to "pivot" the data so that a linear list of values with two keys becomes a spreadsheet-like array. See examples, below.

    A solution

    The best solution is probably to do it in some form of client code (PHP, etc). MySQL and MariaDB do not have a syntax for SELECT that will do the work for you. The code provided here uses a stored procedure to generate code to pivot the data, and then runs the code.

    CREATE TABLE b(for_key INT REFERENCES a(not_key));
    [CONSTRAINT [symbol]] FOREIGN KEY
        [index_name] (index_col_name, ...)
        REFERENCES tbl_name (index_col_name,...)
        [ON DELETE reference_option]
        [ON UPDATE reference_option]
    
    reference_option:
        RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
    CREATE TABLE author (
      id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(100) NOT NULL
    ) ENGINE = InnoDB;
    
    CREATE TABLE book (
      id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      title VARCHAR(200) NOT NULL,
      author_id SMALLINT UNSIGNED NOT NULL,
      CONSTRAINT `fk_book_author`
        FOREIGN KEY (author_id) REFERENCES author (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT
    ) ENGINE = InnoDB;
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', 1);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    INSERT INTO author (name) VALUES ('Abdul Alhazred');
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', LAST_INSERT_ID());
    
    INSERT INTO author (name) VALUES ('H.P. Lovecraft');
    INSERT INTO book (title, author_id) VALUES
      ('The call of Cthulhu', LAST_INSERT_ID()),
      ('The colour out of space', LAST_INSERT_ID());
    DELETE FROM author WHERE name = 'H.P. Lovecraft';
    
    SELECT * FROM book;
    +----+--------------+-----------+
    | id | title        | author_id |
    +----+--------------+-----------+
    |  3 | Necronomicon |         1 |
    +----+--------------+-----------+
    UPDATE author SET id = 10 WHERE id = 1;
    ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails 
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    CREATE TABLE a(a_key INT PRIMARY KEY, not_key INT);
    
    CREATE TABLE b(for_key INT REFERENCES a(not_key));
    ERROR 1005 (HY000): Can't create table `test`.`b` 
      (errno: 150 "Foreign key constraint is incorrectly formed")
    
    CREATE TABLE c(for_key INT REFERENCES a(a_key));
    
    SHOW CREATE TABLE c;
    +-------+----------------------------------------------------------------------------------+
    | Table | Create Table                                                                     |
    +-------+----------------------------------------------------------------------------------+
    | c     | CREATE TABLE `c` (
      `for_key` INT(11) DEFAULT NULL,
      KEY `for_key` (`for_key`),
      CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
    +-------+----------------------------------------------------------------------------------+
    
    INSERT INTO a VALUES (1,10);
    Query OK, 1 row affected (0.004 sec)
    
    INSERT INTO c VALUES (10);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails 
      (`test`.`c`, CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`))
    
    INSERT INTO c VALUES (1);
    Query OK, 1 row affected (0.004 sec)
    
    SELECT * FROM c;
    +---------+
    | for_key |
    +---------+
    |       1 |
    +---------+
    [mysqld]
    ...
    innodb-defragment=1
    SET @@global.innodb_file_per_table = 1;
    SET @@global.innodb_defragment_n_pages = 32;
    SET @@global.innodb_defragment_fill_factor = 0.95;
    CREATE TABLE tb_defragment (
    pk1 BIGINT(20) NOT NULL,
    pk2 BIGINT(20) NOT NULL,
    fd4 TEXT,
    fd5 VARCHAR(50) DEFAULT NULL,
    PRIMARY KEY (pk1),
    KEY ix1 (pk2)
    ) ENGINE=InnoDB;
     
    DELIMITER //
    CREATE PROCEDURE innodb_insert_proc (repeat_count INT)
    BEGIN
      DECLARE current_num INT;
      SET current_num = 0;
      WHILE current_num < repeat_count DO
        INSERT INTO tb_defragment VALUES (current_num, 1, REPEAT('Abcdefg', 20), REPEAT('12345',5));
        INSERT INTO tb_defragment VALUES (current_num+1, 2, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        INSERT INTO tb_defragment VALUES (current_num+2, 3, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        INSERT INTO tb_defragment VALUES (current_num+3, 4, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        SET current_num = current_num + 4;
      END WHILE;
    END//
    DELIMITER ;
    COMMIT;
     
    SET autocommit=0;
    CALL innodb_insert_proc(50000);
    COMMIT;
    SET autocommit=1;
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
    Value
    313
     
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
    Value
    72
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
    COUNT(stat_value)
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
    COUNT(stat_value)
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
    COUNT(stat_value)
    0
     
    SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables 
      WHERE engine LIKE 'InnoDB' AND table_name LIKE '%tb_defragment%';
    TABLE_NAME data_free_MB table_rows
    tb_defragment 4.00000000 50051
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` PRIMARY 25873 4739939
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` ix1 50071 1051775
    DELETE FROM tb_defragment WHERE pk2 BETWEEN 2 AND 4;
     
    OPTIMIZE TABLE tb_defragment;
    TABLE	Op	Msg_type	Msg_text
    test.tb_defragment	OPTIMIZE	status	OK
    SHOW status LIKE '%innodb_def%';
    Variable_name	Value
    Innodb_defragment_compression_failures	0
    Innodb_defragment_failures	1
    Innodb_defragment_count	4
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
    Value
    0
     
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
    Value
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
    COUNT(stat_value)
    2
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
    COUNT(stat_value)
    2
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
    COUNT(stat_value)
    2
     
    SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables 
      WHERE engine LIKE 'InnoDB';
    TABLE_NAME data_free_MB table_rows
    innodb_index_stats 0.00000000 8
    innodb_table_stats 0.00000000 0
    tb_defragment 4.00000000 12431
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` PRIMARY 690 102145
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` ix1 5295 111263
    ALTER TABLE table_name DISABLE KEYS;
    BEGIN;
    ... inserting data WITH INSERT OR LOAD DATA ....
    COMMIT;
    ALTER TABLE table_name ENABLE KEYS;
    SET @@session.unique_checks = 0;
    SET @@session.foreign_key_checks = 0;
    SET @@global.innodb_autoinc_lock_mode = 2;
    LOAD DATA INFILE 'file_name' INTO TABLE table_name;
    LOAD DATA LOCAL INFILE 'file_name' INTO TABLE table_name;
    mariadb-import --use-threads=10 database text-file-name [text-file-name...]
    BEGIN;
    INSERT ...
    INSERT ...
    END;
    BEGIN;
    INSERT ...
    INSERT ...
    END;
    ...
    INSERT INTO table_name VALUES(1,"row 1"),(2, "row 2"),...;
    INSERT INTO table_name_1 (auto_increment_key, data) VALUES (NULL,"row 1");
    INSERT INTO table_name_2 (auto_increment, reference, data) VALUES (NULL, LAST_INSERT_ID(), "row 2");
    delimiter ;;
    SELECT 1; SELECT 2;;
    delimiter ;
    EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    |  1 | SIMPLE      | tbl   | range | key1          | key1 | 5       | NULL |  960 | Using index condition |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    SET optimizer_switch='mrr=ON';
    Query OK, 0 rows affected (0.06 sec)
    
    EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                     |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    |  1 | SIMPLE      | tbl   | range | key1          | key1 | 5       | NULL |  960 | Using index condition; Rowid-ordered scan |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    1 row in set (0.03 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra       |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 |             |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    2 rows in set (0.00 sec)
    SET optimizer_switch='mrr=ON';
    Query OK, 0 rows affected (0.06 sec)
    
    SET join_cache_level=6;
    Query OK, 0 rows affected (0.00 sec)
    
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra                                                  |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where                                            |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 | Using join buffer (flat, BKA join); Rowid-ordered scan |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    2 rows in set (0.00 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra       |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 |             |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    SET optimizer_switch='mrr=ON,mrr_sort_keys=ON';
    Query OK, 0 rows affected (0.00 sec)
    
    SET join_cache_level=6;
    Query OK, 0 rows affected (0.02 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            TABLE: t1
             type: ALL
    possible_keys: a
              KEY: NULL
          key_len: NULL
              ref: NULL
             ROWS: 1000
            Extra: USING WHERE
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            TABLE: t2
             type: ref
    possible_keys: key1
              KEY: key1
          key_len: 5
              ref: test.t1.col1
             ROWS: 1
            Extra: USING JOIN buffer (flat, BKA JOIN); KEY-ordered Rowid-ordered scan
    2 rows in set (0.00 sec)
    +-----+------------+----------------+-----------+
    | seq | last_name  | first_name     | term      |
    +-----+------------+----------------+-----------+
    |   1 | Washington | George         | 1789-1797 |
    |   2 | Adams      | John           | 1797-1801 |
    ...
    |   7 | Jackson    | Andrew         | 1829-1837 |
    ...
    |  17 | Johnson    | Andrew         | 1865-1869 |
    ...
    |  36 | Johnson    | Lyndon B.      | 1963-1969 |
    ...
    SELECT  term
            FROM  Presidents
            WHERE  last_name = 'Johnson'
              AND  first_name = 'Andrew';
    SHOW CREATE TABLE Presidents \G
    CREATE TABLE `presidents` (
      `seq` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
      `last_name` VARCHAR(30) NOT NULL,
      `first_name` VARCHAR(30) NOT NULL,
      `term` VARCHAR(9) NOT NULL,
      PRIMARY KEY (`seq`)
    ) ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8
    
    EXPLAIN  SELECT  term
       FROM  Presidents
       WHERE  last_name = 'Johnson'
       AND  first_name = 'Andrew';
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | Presidents | ALL  | NULL          | NULL | NULL    | NULL |   44 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    
    # Or, using the other form of display:  EXPLAIN ... \G
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ALL        <-- Implies table scan
    possible_keys: NULL
              key: NULL       <-- Implies that no index is useful, hence table scan
          key_len: NULL
              ref: NULL
             rows: 44         <-- That's about how many rows in the table, so table scan
            Extra: Using where
    EXPLAIN  SELECT  term
      FROM  Presidents
      WHERE  last_name = 'Johnson'
      AND  first_name = 'Andrew'  \G
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: last_name, first_name
              key: last_name
          key_len: 92                 <-- VARCHAR(30) utf8 may need 2+3*30 bytes
              ref: const
             rows: 2                  <-- Two 'Johnson's
            Extra: Using where
    id: 1
      select_type: SIMPLE
            table: Presidents
             type: index_merge
    possible_keys: first_name,last_name
              key: first_name,last_name
          key_len: 92,92
              ref: NULL
             rows: 1
            Extra: Using intersect(first_name,last_name); Using where
    ALTER TABLE Presidents
            (DROP old indexes AND...)
            ADD INDEX compound(last_name, first_name);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: compound
              key: compound
          key_len: 184             <-- The length of both fields
              ref: const,const     <-- The WHERE clause gave constants for both
             rows: 1               <-- Goodie!  It homed in on the one row.
            Extra: Using where
    ... ADD INDEX covering(last_name, first_name, term);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: covering
              key: covering
          key_len: 184
              ref: const,const
             rows: 1
            Extra: Using where; Using index   <-- Note
    INDEX(last, first)
        ... WHERE last = '...' -- good (even though `first` is unused)
        ... WHERE first = '...' -- index is useless
    
        INDEX(first, last), INDEX(last, first)
        ... WHERE first = '...' -- 1st index is used
        ... WHERE last = '...' -- 2nd index is used
        ... WHERE first = '...' AND last = '...' -- either could be used equally well
    
        INDEX(last, first)
        Both of these are handled by that one INDEX:
        ... WHERE last = '...'
        ... WHERE last = '...' AND first = '...'
    
        INDEX(last), INDEX(last, first)
        In light of the above example, don't bother including INDEX(last).
    SET GLOBAL thread_pool_size=32;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    DELETE FROM tbl WHERE 
      ts < CURRENT_DATE() - INTERVAL 30 DAY
    utf8mb3_key_column=utf8mb4_expression

    If your PRIMARY KEY is compound, the code gets messier.

  • This code will not work without a numeric PRIMARY or UNIQUE key.

  • Read on, we'll develop messier code to deal with most of these caveats.


    You can edit the SQL generated by the stored procedure to tweak the output in a variety of ways. Or you can tweak the stored procedure to generate what you would prefer.

    Reference code for solution

    'Source' this into the mysql commandline tool:

    Then do a CALL, like in the examples, below.

    Variants

    I thought about having several extra options for variations, but decided that would be too messy. Instead, here are instructions for implementing the variations, either by capturing the SELECT that was output by the Stored Procedure, or by modifying the SP, itself.

    • The data is strings (not numeric) -- Remove "SUM" (but keep the expression); remove the SUM...AS TOTAL line.

    • If you want blank output instead of 0 -- Currently the code says "SUM(IF(... 0))"; change the 0 to NULL, then wrap the SUM: IFNULL(SUM(...), ''). Note that this will distinguish between a zero total (showing '0') and no data (blank). (See the sketch after this list.)

    • Fancier output -- Use PHP/VB/Java/etc.

    • No Totals at the bottom -- Remove the WITH ROLLUP line from the SELECT.

    • No Total for each row -- Remove the SUM...AS Total line from the SELECT.

    • Change the order of the columns -- Modify the ORDER BY 1 ('1' meaning first column) in the SELECT DISTINCT in the SP.

    • Example: ORDER BY FIND_IN_SET(DAYOFWEEK(...), 'Sun,Mon,Tue,Wed,Thu,Fri,Sat')
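
    For instance, the blank-output tweak for a single column, using the details table from Example 2 below, might look like this (the remaining columns would be changed the same way):

    SELECT MONTH(ts) AS Month,
           IFNULL(SUM(IF(HOUR(ts) = 11, enwh/1000, NULL)), '') AS "11"
      FROM details
      WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 YEAR
      GROUP BY MONTH(ts);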

    Notes about "base_cols":

    • Multiple columns on the left, such as an ID and its meaning -- This is already handled by allowing base_cols to be a commalist like 'id, meaning'

    • You cannot call the SP with "foo AS 'blah'" in hopes of changing the labels, but you could edit the SELECT to achieve that goal.

    Notes about the "Totals":

    • If "base_cols" is more than one column, WITH ROLLUP will be subtotals as well as a grand total.

    • NULL shows up in the Totals row in the "base_cols" column; this can be changed via something like IFNULL(..., 'Totals').

    Example 1 - Population vs Latitude in US

    Notice how Alaska (AK) has populations in high latitudes and Hawaii (HI) in low latitudes.

    Example 2 - Home Solar Power Generation

    This gives the power (kWh) generated by hour and month for 2012.

    Other variations made the math go wrong. (Note that there is no CAST to FLOAT.)

    While I was at it, I gave an alias to change "MONTH(ts)" to just "Month".

    So, I edited the SQL to this and ran it:

    -- Which gave cleaner output:

    Midday in the summer is the best time for solar panels, as you would expect. 1-2pm in July was the best.

    Postlog

    Posted, Feb. 2015

    See Also

    • Brawley's notes. Rick James graciously allowed us to use this article in the documentation. Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: pivot

    This page is licensed: CC BY-SA / Gnu FDL

    CREATE TABLE tbl (
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ts TIMESTAMP,
      ...
      PRIMARY KEY(id)
    );
    @a = 0
       LOOP
          DELETE FROM tbl
             WHERE id BETWEEN @a AND @a+999
               AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
          SET @a = @a + 1000
          sleep 1  -- be a nice guy
       UNTIL end of table
    @a = SELECT MIN(id) FROM tbl
       LOOP
          SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
          IF @z IS NULL
             EXIT LOOP  -- last chunk
          DELETE FROM tbl
             WHERE id >= @a
               AND id <  @z
               AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
          SET @a = @z
          sleep 1  -- be a nice guy, especially in replication
       ENDLOOP
       # Last chunk:
       DELETE FROM tbl
          WHERE id >= @a
            AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    ...
          SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
          IF @z == @a
             SELECT @z := id FROM tbl WHERE id > @a ORDER BY id LIMIT 1
       ...
    LOOP
          DELETE FROM tbl
             WHERE ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
             ORDER BY ts   -- to use the index, and to make it deterministic
             LIMIT 1000
       UNTIL no rows deleted
    INDEX(Genus, species)
       SELECT/DELETE ...
          WHERE Genus >= '$g' AND ( species  > '$s' OR Genus > '$g' )
          ORDER BY Genus, species
          LIMIT ...
    WHERE ( Genus = '$g' AND species > '$s' ) OR ( Genus > '$g' )
    CREATE TABLE new LIKE main;
       INSERT INTO new SELECT * FROM main;  -- This could take a long time
       RENAME TABLE main TO old, new TO main;   -- Atomic swap
       DROP TABLE old;   -- Space freed up here
    -- Optional:  SET GLOBAL innodb_file_per_table = ON;
       CREATE TABLE New LIKE Main;
       -- Optional:  ALTER TABLE New ADD PARTITION BY RANGE ...;
       -- Do this INSERT..SELECT all at once, or with chunking:
       INSERT INTO New
          SELECT * FROM Main
             WHERE ...;  -- just the rows you want to keep
       RENAME TABLE main TO Old, New TO Main;
       DROP TABLE Old;   -- Space freed up here
    DELETE FROM tbl ORDER BY date LIMIT 111
    DELETE FROM tbl ORDER BY date, id LIMIT 111
    SELECT @z := ... LIMIT 1000,1;  -- not replicated
       DELETE ... BETWEEN @a AND @z;   -- deterministic
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
       START SLAVE;
    DELIMITER //
    DROP   PROCEDURE IF EXISTS Pivot //
    CREATE PROCEDURE Pivot(
        IN tbl_name VARCHAR(99),       -- table name (or db.tbl)
        IN base_cols VARCHAR(99),      -- column(s) on the left, separated by commas
        IN pivot_col VARCHAR(64),      -- name of column to put across the top
        IN tally_col VARCHAR(64),      -- name of column to SUM up
        IN where_clause VARCHAR(99),   -- empty string or "WHERE ..."
        IN order_by VARCHAR(99)        -- empty string or "ORDER BY ..."; usually the base_cols
        )
        DETERMINISTIC
        SQL SECURITY INVOKER
    BEGIN
        -- Find the distinct values
        -- Build the SUM()s
        SET @subq = CONCAT('SELECT DISTINCT ', pivot_col, ' AS val ',
                        ' FROM ', tbl_name, ' ', where_clause, ' ORDER BY 1');
        -- select @subq;
    
        SET @cc1 = "CONCAT('SUM(IF(&p = ', &v, ', &t, 0)) AS ', &v)";
        SET @cc2 = REPLACE(@cc1, '&p', pivot_col);
        SET @cc3 = REPLACE(@cc2, '&t', tally_col);
        -- select @cc2, @cc3;
        SET @qval = CONCAT("'\"', val, '\"'");
        -- select @qval;
        SET @cc4 = REPLACE(@cc3, '&v', @qval);
        -- select @cc4;
    
        SET SESSION group_concat_max_len = 10000;   -- just in case
        SET @stmt = CONCAT(
                'SELECT  GROUP_CONCAT(', @cc4, ' SEPARATOR ",\n")  INTO @sums',
                ' FROM ( ', @subq, ' ) AS top');
         select @stmt;
        PREPARE _sql FROM @stmt;
        EXECUTE _sql;                      -- Intermediate step: build SQL for columns
        DEALLOCATE PREPARE _sql;
        -- Construct the query and perform it
        SET @stmt2 = CONCAT(
                'SELECT ',
                    base_cols, ',\n',
                    @sums,
                    ',\n SUM(', tally_col, ') AS Total'
                '\n FROM ', tbl_name, ' ',
                where_clause,
                ' GROUP BY ', base_cols,
                '\n WITH ROLLUP',
                '\n', order_by
            );
        select @stmt2;                    -- The statement that generates the result
        PREPARE _sql FROM @stmt2;
    EXECUTE _sql;                     -- The resulting pivot table output
        DEALLOCATE PREPARE _sql;
        -- For debugging / tweaking, SELECT the various @variables after CALLing.
    END;
    //
    DELIMITER ;
    -- Sample input:
    +-------+----------------------+---------+------------+
    | state | city                 | lat     | population |
    +-------+----------------------+---------+------------+
    | AK    | Anchorage            | 61.2181 |     276263 |
    | AK    | Juneau               | 58.3019 |      31796 |
    | WA    | Monroe               | 47.8556 |      15554 |
    | WA    | Spanaway             | 47.1042 |      25045 |
    | PR    | Arecibo              | 18.4744 |      49189 |
    | MT    | Kalispell            | 48.1958 |      18018 |
    | AL    | Anniston             | 33.6597 |      23423 |
    | AL    | Scottsboro           | 34.6722 |      14737 |
    | HI    | Kaneohe              | 21.4181 |      35424 |
    | PR    | Candelaria           | 18.4061 |      17632 |
    ...
    
    -- Call the Stored Procedure:
    CALL Pivot('World.US', 'state', '5*FLOOR(lat/5)', 'population', '', '');
    
    -- SQL generated by the SP:
    SELECT state,
    SUM(IF(5*FLOOR(lat/5) = "15", population, 0)) AS "15",
    SUM(IF(5*FLOOR(lat/5) = "20", population, 0)) AS "20",
    SUM(IF(5*FLOOR(lat/5) = "25", population, 0)) AS "25",
    SUM(IF(5*FLOOR(lat/5) = "30", population, 0)) AS "30",
    SUM(IF(5*FLOOR(lat/5) = "35", population, 0)) AS "35",
    SUM(IF(5*FLOOR(lat/5) = "40", population, 0)) AS "40",
    SUM(IF(5*FLOOR(lat/5) = "45", population, 0)) AS "45",
    SUM(IF(5*FLOOR(lat/5) = "55", population, 0)) AS "55",
    SUM(IF(5*FLOOR(lat/5) = "60", population, 0)) AS "60",
    SUM(IF(5*FLOOR(lat/5) = "70", population, 0)) AS "70",
     SUM(population) AS Total
     FROM World.US  GROUP BY state
     WITH ROLLUP
    
    -- Output from that SQL (also comes out of the SP):
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    | state | 15      | 20     | 25       | 30       | 35       | 40       | 45      | 55    | 60     | 70   | Total     |
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    | AK    |       0 |      0 |        0 |        0 |        0 |        0 |       0 | 60607 | 360765 | 4336 |    425708 |
    | AL    |       0 |      0 |        0 |  1995225 |        0 |        0 |       0 |     0 |      0 |    0 |   1995225 |
    | AR    |       0 |      0 |        0 |   595537 |   617361 |        0 |       0 |     0 |      0 |    0 |   1212898 |
    | AZ    |       0 |      0 |        0 |  4708346 |   129989 |        0 |       0 |     0 |      0 |    0 |   4838335 |
    ...
    | FL    |       0 |  34706 |  9096223 |  1440916 |        0 |        0 |       0 |     0 |      0 |    0 |  10571845 |
    | GA    |       0 |      0 |        0 |  2823939 |        0 |        0 |       0 |     0 |      0 |    0 |   2823939 |
    | HI    |   43050 | 752983 |        0 |        0 |        0 |        0 |       0 |     0 |      0 |    0 |    796033 |
    ...
    | WY    |       0 |      0 |        0 |        0 |        0 |   277480 |       0 |     0 |      0 |    0 |    277480 |
    | NULL  | 1792991 | 787689 | 16227033 | 44213344 | 47460670 | 61110822 | 7105143 | 60607 | 360765 | 4336 | 179123400 |
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    -- Sample input:
    +---------------------+------+
    | ts                  | enwh |
    +---------------------+------+
    | 2012-06-06 11:00:00 |  523 |
    | 2012-06-06 11:05:00 |  526 |
    | 2012-06-06 11:10:00 |  529 |
    | 2012-06-06 11:15:00 |  533 |
    | 2012-06-06 11:20:00 |  537 |
    | 2012-06-06 11:25:00 |  540 |
    | 2012-06-06 11:30:00 |  542 |
    | 2012-06-06 11:35:00 |  543 |
    Note that it is a reading in watts for each 5 minutes.
    So, summing is needed to get the breakdown by month and hour.
    
    -- Invoke the SP:
    CALL Pivot('details',    -- Table
               'MONTH(ts)',  -- `base_cols`, to put on left; SUM up over the month
               'HOUR(ts)',   -- `pivot_col` to put across the top; SUM up entries across the hour
               'enwh/1000',  -- The data -- watts converted to KWh
               "WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year",  -- Limit to one year
               '');          -- assumes that the months stay in order
    
    -- The SQL generated:
    SELECT MONTH(ts),
    SUM(IF(HOUR(ts) = "5", enwh/1000, 0)) AS "5",
    SUM(IF(HOUR(ts) = "6", enwh/1000, 0)) AS "6",
    SUM(IF(HOUR(ts) = "7", enwh/1000, 0)) AS "7",
    SUM(IF(HOUR(ts) = "8", enwh/1000, 0)) AS "8",
    SUM(IF(HOUR(ts) = "9", enwh/1000, 0)) AS "9",
    SUM(IF(HOUR(ts) = "10", enwh/1000, 0)) AS "10",
    SUM(IF(HOUR(ts) = "11", enwh/1000, 0)) AS "11",
    SUM(IF(HOUR(ts) = "12", enwh/1000, 0)) AS "12",
    SUM(IF(HOUR(ts) = "13", enwh/1000, 0)) AS "13",
    SUM(IF(HOUR(ts) = "14", enwh/1000, 0)) AS "14",
    SUM(IF(HOUR(ts) = "15", enwh/1000, 0)) AS "15",
    SUM(IF(HOUR(ts) = "16", enwh/1000, 0)) AS "16",
    SUM(IF(HOUR(ts) = "17", enwh/1000, 0)) AS "17",
    SUM(IF(HOUR(ts) = "18", enwh/1000, 0)) AS "18",
    SUM(IF(HOUR(ts) = "19", enwh/1000, 0)) AS "19",
    SUM(IF(HOUR(ts) = "20", enwh/1000, 0)) AS "20",
     SUM(enwh/1000) AS Total
     FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year GROUP BY MONTH(ts)
     WITH ROLLUP
    
    -- That generated decimal places that I did not like:
    | MONTH(ts) | 5      | 6       | 7        | 8        | 9         | 10        | 11        | 12        | 13        | 14        | 15        | 16       | 17       | 18       | 19      | 20     | Total      |
    +-----------+--------+---------+----------+----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+----------+----------+---------+--------+------------+
    |         1 | 0.0000 |  0.0000 |   1.8510 |  21.1620 |   52.3190 |   73.0420 |   89.3220 |   97.0190 |   88.9720 |   75.4970 |   50.9270 |  12.5130 |   0.5990 |   0.0000 |  0.0000 | 0.0000 |   563.2230 |
    |         2 | 0.0000 |  0.0460 |   5.9560 |  35.6330 |   72.4710 |   96.5130 |  112.7770 |  126.0850 |  117.1540 |   96.7160 |   72.5900 |  33.6230 |   4.7650 |   0.0040 |  0.0000 | 0.0000 |   774.3330 |
    SELECT MONTH(ts) AS 'Month',
    ROUND(SUM(IF(HOUR(ts) = "5", enwh, 0))/1000) AS "5",
    ...
    ROUND(SUM(IF(HOUR(ts) = "20", enwh, 0))/1000) AS "20",
     ROUND(SUM(enwh)/1000) AS Total
     FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 YEAR
     GROUP BY MONTH(ts)
     WITH ROLLUP;
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
    | Month | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | 13   | 14   | 15   | 16   | 17   | 18   | 19   | 20   | Total |
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
    |     1 |    0 |    0 |    2 |   21 |   52 |   73 |   89 |   97 |   89 |   75 |   51 |   13 |    1 |    0 |    0 |    0 |   563 |
    |     2 |    0 |    0 |    6 |   36 |   72 |   97 |  113 |  126 |  117 |   97 |   73 |   34 |    5 |    0 |    0 |    0 |   774 |
    |     3 |    0 |    0 |    9 |   46 |   75 |  105 |  121 |  122 |  128 |  126 |  105 |   71 |   33 |   10 |    0 |    0 |   952 |
    |     4 |    0 |    1 |   14 |   63 |  111 |  146 |  171 |  179 |  177 |  158 |  141 |  105 |   65 |   26 |    3 |    0 |  1360 |
    |     5 |    0 |    4 |   21 |   78 |  128 |  162 |  185 |  199 |  196 |  187 |  166 |  130 |   81 |   36 |    8 |    0 |  1581 |
    |     6 |    0 |    4 |   17 |   71 |  132 |  163 |  182 |  191 |  193 |  182 |  161 |  132 |   89 |   43 |   10 |    1 |  1572 |
    |     7 |    0 |    3 |   17 |   57 |  121 |  160 |  185 |  197 |  199 |  189 |  168 |  137 |   92 |   44 |   11 |    1 |  1581 |
    |     8 |    0 |    1 |   11 |   48 |  104 |  149 |  171 |  183 |  187 |  179 |  156 |  121 |   76 |   32 |    5 |    0 |  1421 |
    |     9 |    0 |    0 |    6 |   32 |   77 |  127 |  151 |  160 |  159 |  148 |  124 |   93 |   47 |   12 |    1 |    0 |  1137 |
    |    10 |    0 |    0 |    1 |   16 |   54 |   85 |  107 |  115 |  119 |  106 |   85 |   56 |   17 |    2 |    0 |    0 |   763 |
    |    11 |    0 |    0 |    5 |   30 |   57 |   70 |   84 |   83 |   76 |   64 |   35 |    8 |    1 |    0 |    0 |    0 |   512 |
    |    12 |    0 |    0 |    2 |   17 |   39 |   54 |   67 |   75 |   64 |   58 |   31 |    4 |    0 |    0 |    0 |    0 |   411 |
    |  NULL |    0 |   13 |  112 |  516 | 1023 | 1392 | 1628 | 1728 | 1703 | 1570 | 1294 |  902 |  506 |  203 |   38 |    2 | 12629 |
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+

    mrr_cost_based=on - enable cost-based choice whether to use MRR. Currently not recommended, because the cost model is not sufficiently tuned yet.


    And one of the following conditions is also met:

    • The entire thread pool has fewer than thread_pool_max_threads.

    • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • The entire thread pool has fewer than thread_pool_max_threads.

  • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • And one of the following conditions is also met:

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads. In this case, the new thread is intended to be a worker thread.

    • There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread, so that it can handle client connection requests. In this case, the new thread can become the thread group's listener thread.

  • The entire thread pool has fewer than thread_pool_max_threads.

  • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • If the thread group already has active worker threads, then the following condition also needs to be met:

    • A worker thread has not been created for the thread group within the throttling interval.

  • And one of the following conditions is also met:

    • The entire thread pool has fewer than thread_pool_max_threads.

    • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • A worker thread has not been created for the thread group within the throttling interval.

    Number of threads in thread group | Throttling interval
    0-(1 + thread_pool_oversubscribe)  | 0
    4-7                                | 50 * THROTTLING_FACTOR
    8-15                               | 100 * THROTTLING_FACTOR
    16-65536                           | 20 * THROTTLING_FACTOR


    Dynamic: Yes

  • Data Type: numeric

  • Default Value: 1

  • Range: 1 to 100000

  • Scope: Global

  • Dynamic: No

  • Data Type: numeric

  • Default Value: 0

  • When the pool-of-threads mode is enabled, the server uses the thread pool for client connections.
  • When the no-threads mode is enabled, the server uses a single thread for all client connections, which is really only usable for debugging.

  • Command line: --thread-handling=name

  • Scope: Global

  • Dynamic: No

  • Data Type: enumeration

  • Default Value: one-thread-per-connection (non-Windows), pool-of-threads (Windows)

  • Valid Values: no-threads, one-thread-per-connection, pool-of-threads.

  • Documentation: Using the thread pool.

  • Notes: In MySQL, the thread pool is only available in MySQL Enterprise. In MariaDB it's available in all versions.

  • Scope:

  • Dynamic:

  • Data Type: boolean

  • Default Value: 0

  • Introduced: MariaDB 10.5.0

  • Scope:
  • Dynamic:

  • Data Type: boolean

  • Default Value: 0

  • Introduced: MariaDB 10.5.0

  • Command line: thread-pool-idle-timeout=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 60

  • Documentation: Using the thread pool.

  • Command line: thread-pool-max-threads=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 65536

  • Range: 1 to 65536

  • Documentation: Using the thread pool.

  • Command line: thread-pool-min-threads=#
  • Data Type: numeric

  • Default Value: 1

  • Documentation: Using the thread pool.

  • This is primarily for internal use, and most users should not need to change it.
  • This system variable is only meaningful on Unix.

  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 3

  • Range: 1 to 65536

  • Documentation: Using the thread pool.

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 1000

  • Range: 0 to 4294967295

  • Introduced: MariaDB 10.2.2

  • Documentation: Using the thread pool.

  • Default Value: auto

  • Valid Values: high, low, auto.

  • Introduced: MariaDB 10.2.2

  • Documentation: Using the thread pool.

  • Command line: --thread-pool-size=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: Based on the number of processors (but see MDEV-7806).

  • Range: 1 to 128

  • Documentation: Using the thread pool.

  • This system variable is only meaningful on Unix.
  • Note that if you are migrating from the MySQL Enterprise thread pool plugin, then the unit used in their implementation is 10ms, not 1ms.

  • Command line: --thread-pool-stall-limit=#

  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 500

  • Range: 1 to 4294967295

  • Documentation: Using the thread pool.


    Data Warehousing Summary Tables

    Preface

    This document discusses the creation and maintenance of "Summary Tables". It is a companion to the document on Data Warehousing Techniques.

    The basic terminology ("Fact Table", etc.) is covered in that document.

    Summary tables for data warehouse "reports"

    Summary tables are a performance necessity for large tables. MariaDB and MySQL do not provide any automated way to create such, so I am providing techniques here.

    (Other vendors provide something similar with "materialized views".)

    When you have millions or billions of rows, it takes a long time to summarize the data to present counts, totals, averages, etc, in a size that is readily digestible by humans. By computing and saving subtotals as the data comes in, one can make "reports" run much faster. (I have seen 10x to 1000x speedups.) The subtotals go into a "summary table". This document guides you on efficiency in both creating and using such tables.

    General structure of a summary table

    A summary table includes two sets of columns:

    • Main KEY: date + some dimension(s)

    • Subtotals: COUNT(*), SUM(...), ...; but not AVG()

    The "date" might be a DATE (a 3-byte native datatype), or an hour, or some other time interval. A 3-byte MEDIUMINT UNSIGNED 'hour' can be derived from a DATETIME or TIMESTAMP via

    The "dimensions" (a DW term) are some of the columns of the "Fact" table. Examples: Country, Make, Product, Category, Host Non-dimension examples: Sales, Quantity, TimeSpent

    There would be one or more indexes, usually starting with some dimensions and ending with the date field. By ending with the date, one can efficiently get a range of days/weeks/etc. even when each row summarizes only one day.

    There will typically be a "few" summary tables. Often one summary table can serve multiple purposes sufficiently efficiently.

    As a rule of thumb, a summary table will have one-tenth the number of rows as the Fact table. (This number is very loose.)

    Example

    Let's talk about a large chain of car dealerships. The Fact table has all the sales with columns such as datetime, salesman_id, city, price, customer_id, make, model, model_year. One Summary table might focus on sales:
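
    A hypothetical sketch of such a table, built from the Fact columns mentioned above (make, price, and the datetime); the names are illustrative:

    CREATE TABLE sales_summary (
      dy DATE NOT NULL,                   -- the date part of the key
      make VARCHAR(30) NOT NULL,          -- a dimension from the Fact table
      ct INT UNSIGNED NOT NULL,           -- COUNT(*)
      sum_price DECIMAL(12,2) NOT NULL,   -- SUM(price)
      PRIMARY KEY (make, dy),             -- dimension(s) first, date last
      KEY (dy)
    ) ENGINE=InnoDB;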

    When to augment the summary table(s)?

    "Augment" in this section means to add new rows into the summary table or increment the counts in existing rows.

    Plan A: "While inserting" rows into the Fact table, augment the summary table(s). This is simple, and workable for a smaller DW database (under 10 Fact table rows per second). For larger DW databases, Plan A likely to be too costly to be practical.

    Plan B: "Periodically", via cron or an EVENT.

    Plan C: "As needed". That is, when someone asks for a report, the code first updates the summary tables that will be needed.

    Plan D: "Hybrid" of B and C. C, by itself, can led to long delays for the report. By also doing B, those delays can be kept low.

    Plan E: (This is not advised.) "Rebuild" the entire summary table from the entire Fact table. The cost of this is prohibitive for large tables. However, Plan E may be needed when you decide to change the columns of a Summary Table, or discover a flaw in the computations. To lessen the impact of an entire rebuild, adapt the chunking techniques described for big DELETEs earlier in this document.

    Plan F: "Staging table". This is primarily for very high speed ingestion. It is mentioned briefly in this blog, and discussed more thoroughly in the companion blog: High Speed Ingestion

    Summarizing while Inserting (one row at a time)

    IODKU (Insert On Duplicate Key Update) will update an existing row or create a new row. It knows which to do based on the Summary table's PRIMARY KEY.
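
    As a hypothetical illustration against the sales_summary sketch above:

    INSERT INTO sales_summary (dy, make, ct, sum_price)
      VALUES (CURRENT_DATE, 'Toyota', 1, 25000.00)
      ON DUPLICATE KEY UPDATE
        ct        = ct + 1,
        sum_price = sum_price + VALUES(sum_price);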

    Caution: This approach is costly, and will not scale to an ingestion rate of over, say, 10 rows per second (Or maybe 50/second on SSDs). More discussion later.

    Summarizing periodically vs as-needed

    If your reports need to be up-to-the-second, you need "as needed" or "hybrid". If your reports have less urgency (eg, weekly reports that don't include 'today'), then "periodically" might be best.

    For daily summaries, augmenting the summary tables could be done right after midnight. But beware of data coming in "late".

    For both "periodic" and "as needed", you need a definitive way of keeping track of where you "left off".

    Case 1: You insert into the Fact table first and it has an AUTO_INCREMENT id: Grab MAX(id) as the upper bound for summarizing and put it either into some other secure place (an extra table), or put it into the row(s) in the Summary table as you insert them. (Caveat: AUTO_INCREMENT ids do not work well in multi-master, including Galera, setups.)

    Case 2: If you are using a 'staging' table, there is no issue. (More on staging tables later.)

    Summarizing while batch inserting

    This applies to multi-row (batch) INSERT and LOAD DATA.

    The Fact table needs an AUTO_INCREMENT id, and you need to be able to find the exact range of ids inserted. (This may be impractical in any multi-master setup.)

    Then perform bulk summarization using an INSERT .. SELECT over that id range (see the sketch below).
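
    A plausible form, reusing the hypothetical sales_summary sketch above (the Fact table is called fact here for illustration; @first_id and @last_id bound the batch of ids just inserted):

    INSERT INTO sales_summary (dy, make, ct, sum_price)
      SELECT DATE(`datetime`), make, COUNT(*), SUM(price)
        FROM fact
        WHERE id BETWEEN @first_id AND @last_id
        GROUP BY DATE(`datetime`), make
      ON DUPLICATE KEY UPDATE
        ct        = ct + VALUES(ct),
        sum_price = sum_price + VALUES(sum_price);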

    Summarizing when using a staging table

    Load the data (via INSERTs or LOAD DATA) en masse into a "staging table". Then perform batch summarization from the Staging table, and batch copy from the Staging table to the Fact table. Note that the Staging table is handy for batching "normalization" during ingestion.

    Summary table: PK or not?

    Let's say your summary table has a DATE, dy, and a dimension, foo. The question is: Should (foo, dy) be the PRIMARY KEY? Or a non-UNIQUE index?

    Case 1: PRIMARY KEY (foo, dy) and summarization is in lock step with, say, changes in dy.

    This case is clean and simple -- until you get to endcases. How will you handle the case of data arriving 'late'? Maybe you will need to recalculate some chunks of data? If so, how?

    Case 2: (foo, dy) is a non-UNIQUE INDEX.

    This case is clean and simple, but it can clutter the summary table because multiple rows can occur for a given (foo, dy) pair. The report will always have to SUM up values because it cannot assume there is only one row, even when it is reporting on a single foo for a single dy. This forced-SUM is not really bad -- you should do it anyway; that way all your reports are written with one pattern.

    Case 3: PRIMARY KEY (foo, dy) and summarization can happen anytime.

    Since you should be using InnoDB, there needs to be an explicit PRIMARY KEY. One approach when you do not have a 'natural' PK is this:

    This case pushes the complexity onto the summarization by doing an IODKU.

    Advice? Avoid Case 1; too messy. Case 2 is ok if the extra rows are not too common. Case 3 may be the closest to "one size fits all".

    Averages, etc.

    When summarizing, include COUNT(*) AS ct and SUM(foo) AS sum_foo. When reporting, the "average" is computed as SUM(sum_foo) / SUM(ct). That is mathematically correct.

    Exception... Let's say you are looking at weather temperatures. And your monitoring station gets the temperature periodically, but unreliably. That is, the number of readings for a day varies. Further, you decide that the easiest way to compensate for the inconsistency is to do something like: compute the average temperature for each day, then average those across the month (or other timeframe).

    Formula for Standard Deviation:
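
    Computed from the retained sums, the usual (population) form is the following; the author's exact expression may have differed slightly (e.g. an n-1 correction):

    SELECT SQRT( SUM(sum_foo2)/SUM(ct) - POW(SUM(sum_foo)/SUM(ct), 2) ) AS std_dev
      FROM summary_table;   -- summary_table is a placeholder name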

    Where sum_foo2 is SUM(foo * foo) from the summary table. sum_foo and sum_foo2 should be FLOAT. FLOAT gives you about 7 significant digits, which is more than enough for things like average and standard deviation. FLOAT occupies 4 bytes. DOUBLE would give you more precision, but occupies 8 bytes. INT and BIGINT are not practical because they may lead to complaints about overflow.

    Staging table

    The idea here is to first load a set of Fact records into a "staging table", with the following characteristics (at least):

    • The table is repeatedly populated and truncated

    • Inserts could be individual or batched, and from one or many clients

    • SELECTs will be table scans, so no indexes needed

    • Inserting will be fast (InnoDB may be the fastest)

    If you have bulk inserts (Batch INSERT or LOAD DATA) then consider doing the normalization and summarization immediately after each bulk insert.

    More details: High Speed Ingestion

    Extreme design

    Here is a more complex way to design the system, with the goal of even more scaling.

    • Use master-slave setup: ingest into master; report from slave(s).

    • Feed ingestion through a staging table (as described above)

    • Single-source of data: ENGINE=MEMORY; multiple sources: InnoDB

    Explanation and comments:

    • ROW + ignore_db avoids replicating Staging, yet replicates the INSERTs based on it. Hence, it lightens the write load on the Slaves

    • If using MEMORY, remember that it is volatile -- recover from a crash by starting the ingestion over.

    • To aid with debugging, TRUNCATE or re-CREATE Staging at the start of the next cycle.

    • Staging needs no indexes -- all operations read all rows from it.

    Stats on the system that this 'extreme design' came from: Fact Table: 450GB, 100M rows/day (batch of 4M/hour), 60 day retention (60+24 partitions), 75B/row, 7 summary tables, under 10 minutes to ingest and summarize the hourly batch. The INSERT..SELECT handled over 20K rows/sec going into the Fact table. Spinning drives (not SSD) with RAID-10.

    "Left Off"

    One technique involves summarizing some of the data, then recording where you "left off", so that next time, you can start there. There are some subtle issues with "left off" that you should be cautious of.

    If you use a DATETIME or TIMESTAMP as "left off", beware of multiple rows with the same value.

    • Plan A: Use a compound "left off" (eg, TIMESTAMP + ID). This is messy, error prone, etc.

    • Plan B: WHERE ts >= $left_off AND ts < $max_ts -- avoids dups, but has other problems (below)

    • Separate threads could COMMIT TIMESTAMPs out of order.

    If you use an AUTO_INCREMENT as "left off" beware of:

    • In InnoDB, separate threads could COMMIT ids in the 'wrong' order.

    • Multi-master setups (including Galera and InnoDB Cluster) could lead to ordering issues.

    So, nothing works, at least not in a multi-threaded environment?

    If you can live with an occasional hiccup (skipped record), then maybe this is 'not a problem' for you.

    The "Flip-Flop Staging" is a safe alternative, optionally combined with the "Extreme Design".

    Flip-flop staging

    If you have many threads simultaneously INSERTing into one staging table, then here is an efficient way to handle a large load: Have a process that flips that staging table with another, identical, staging table, and performs bulk normalization, Fact insertion, and bulk summarization.

The flipping step uses a fast, atomic RENAME TABLE.

    Here is a sketch of the code:
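# Prep for flip:
    CREATE TABLE new LIKE Staging;

    # Swap (flip) Staging tables:
    RENAME TABLE Staging TO old, new TO Staging;

    # Normalize new `foo`s (autocommit = 1):
    INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...

    # Loop, to allow for possible deadlocks, etc:
    WHILE ...
    START TRANSACTION;

        # Add to Fact:
        INSERT INTO Fact ... FROM old JOIN Foos ...

        # Summarize:
        INSERT INTO Summary ... FROM old ... GROUP BY ...

    COMMIT;
    end-WHILE

    # Cleanup:
    DROP TABLE old;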

Meanwhile, ingestion can continue writing to Staging. The ingestion INSERTs will conflict with the RENAME, but the conflict is resolved gracefully, silently, and quickly.

    How fast should you flip-flop? Probably the best scheme is to

    • Have a job that flip-flops in a tight loop (no delay, or a small delay, between iterations), and

    • Have a CRON that serves only as a "keep-alive" to restart the job if it dies.

    If Staging is 'big', an iteration will take longer, but run more efficiently. Hence, it is self-regulating.

In a Galera (or InnoDB Cluster?) environment, each node could be receiving input. If you can afford to lose a few rows, have Staging be a non-replicated MEMORY table. Otherwise, have one Staging table per node, using InnoDB; it will be more secure, but slower and not without problems. In particular, if a node dies completely, you somehow need to process its Staging table.

    Multiple summary tables

    • Look at the reports you will need.

    • Design a summary table for each.

    • Then look at the summary tables -- you are likely to find some similarities.

    • Merge similar ones.

To look at what a report needs, look at the WHERE clause that would provide the data. Some examples, assuming data about service records for automobiles (the GROUP BY gives a clue of what each report might be about):

    1. WHERE make = ? AND model_year = ? GROUP BY service_date, service_type

    2. WHERE make = ? AND model = ? GROUP BY service_date, service_type

    3. WHERE service_type = ? GROUP BY make, model, service_date

    4. WHERE service_date between ? and ? GROUP BY make, model, model_year

You need to allow for 'ad hoc' queries? Well, look at all the ad hoc queries -- they all have a date range, plus they nail down one or two other things. (I rarely see something as ugly as '%CL%' for nailing down another dimension.) So, start by thinking of date plus one or two other dimensions as the 'key' into a new summary table. Then comes the question of what data might be desired -- counts, sums, etc. Eventually you have a small set of summary tables. Then build a front end that allows users to pick only from those possibilities. It should encourage use of the existing summary tables, not be truly 'open ended'.

    Later, another 'requirement' may surface. So, build another summary table. Of course, it may take a day to initially populate it.

    Games on summary tables

Does one ever need to summarize a summary table? Yes, but only in extreme situations. Usually a 'weekly' report can be derived from a 'daily' summary table; building a separate weekly summary table is rarely worth the effort.

    Would one ever PARTITION a Summary Table? Yes, in extreme situations, such as the table being large, and

    • Need to purge old data (unlikely), or

    • 'Recent' data is usually requested, and the index(es) fail to prevent table scans (rare). ("Partition pruning" to the rescue.)

    See also

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    Examples

    This page is licensed: CC BY-SA / Gnu FDL

    Data Warehousing Techniques

    Preface

    This document discusses techniques for improving performance for data-warehouse-like tables in MariaDB and MySQL.

    • How to load large tables.

    • Normalization.

    • Developing 'summary tables' to make 'reports' efficient.

    • Purging old data.

Details on summary tables are covered in the companion document: Summary Tables.

    Terminology

    This list mirrors "Data Warehouse" terminology.

    • Fact table -- The one huge table with the 'raw' data.

    • Summary table -- a redundant table of summarized data that could be derived from the Fact table; used for efficiency

    • Dimension -- columns that identify aspects of the dataset (region, country, user, SKU, zipcode, ...)

    • Normalization table (dimension table) -- mapping between strings and ids; used for space and speed.

    Fact table

    Techniques that should be applied to the huge Fact table.

    • id INT/BIGINT UNSIGNED NOT NULL AUTO_INCREMENT

    • PRIMARY KEY (id)

    • Probably no other INDEXes

    • Accessed only via id

    There are exceptions where the Fact table must be accessed to retrieve multiple rows. However, you should minimize the number of INDEXes on the table because they are likely to be costly on INSERT.

    Why keep the Fact table?

    Once you have built the Summary table(s), there is not much need for the Fact table. One option that you should seriously consider is to not have a Fact table. Or, at least, you could purge old data from it sooner than you purge the Summary tables. Maybe even keep the Summary tables forever.

    Case 1: You need to find the raw data involved in some event. But how will you find those row(s)? This is where a secondary index may be required.

    If a secondary index is bigger than can be cached in RAM, and if the column(s) being indexed is random, then each row inserted may cause a disk hit to update the index. This limits insert speed to something like 100 rows per second (on ordinary disks). Multiple random indexes slow down insertion further. RAID striping and/or SSDs speed up insertion. Write caching helps, but only for bursts.

Case 2: You need some event, but you did not plan ahead with the optimal INDEX. Well, if the data is PARTITIONed on date, then even if you only have a rough idea of when the event occurred, "partition pruning" will keep the query from being too terribly slow.

    Case 3: Over time, the application is likely to need new 'reports', which may lead to a new Summary table. At this point, it would be handy to scan through the old data to fill up the new table.

    Case 4: You find a flaw in the summarization, and need to rebuild an existing Summary table.

Cases 3 and 4 both need the "raw" data. But they don't necessarily need the data sitting in a database table. It could be in the pre-database format (such as log files). So, consider not building the Fact table, but simply keeping the raw data, compressed, on some file system.

    Batching the load of the Fact table

    When talking about billions of rows in the Fact table, it is essentially mandatory that you "batch" the inserts. There are two main ways:

    • INSERT INTO Fact (.,.,.) VALUES (.,.,.), (.,.,.), ...; -- "Batch insert"

    • LOAD DATA ...;

    A third way is to INSERT or LOAD into a Staging table, then

    • INSERT INTO Fact SELECT * FROM Staging; This INSERT..SELECT allows you to do other things, such as normalization. More later.

    Batched INSERT Statement

    Chunk size should usually be 100-1000 rows.

    • 100-1000 rows per INSERT will run 10 times as fast as single-row INSERTs.

    • Beyond 100, you may be interfering with replication and SELECTs.

    • Beyond 1000, you are into diminishing returns -- virtually no further performance gains.

    • Don't go past, say, 1MB for the constructed INSERT statement. This deals with packet sizes, etc. (1MB is unlikely to be hit for a Fact table.) Decide whether your application should lean toward the 100 or the 1000.

    If your data is coming in continually, and you are adding a batching layer, let's do some math. Compute your ingestion rate -- R rows per second.

    • If R < 10 (= 1M/day = 300M/year) -- single-row INSERTs would probably work fine (that is, batching is optional)

    • If R < 100 (3B records per year) -- secondary indexes on Fact table may be ok

    • If R < 1000 (100M records/day) -- avoid secondary indexes on Fact table.

    • If R > 1000 -- Batching may not work. Decide how long (S seconds) you can stall loading the data in order to collect a batch of rows.

    If batching seems viable, then design the batching layer to gather for S seconds or 100-1000 rows, whichever comes first.

    (Note: Similar math applies to rapid UPDATEs of a table.)

    Normalization (Dimension) table

    Normalization is important in Data Warehouse applications because it significantly cuts down on the disk footprint and improves performance. There are other reasons for normalizing, but space is the important one for DW.

    Here is a typical pattern for a Dimension table:
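CREATE TABLE Emails (
        email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- don't make bigger than needed
        email VARCHAR(...) NOT NULL,
        PRIMARY KEY (email),  -- for looking up one way
        INDEX(email_id)  -- for looking up the other way (UNIQUE is not needed)
    ) ENGINE = InnoDB;  -- to get clustering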

    Notes:

    • MEDIUMINT is 3 bytes with UNSIGNED range of 0..16M; pick SMALLINT, INT, etc, based on a conservative estimate of how many 'foo's you will eventually have.

    • datatype sizes

    • There may be more than one VARCHAR in the table. Example: For cities, you might have City and Country.

    • InnoDB is better than MyISAM here because of the way the two keys are structured.

    Batched normalization

    I bring this up as a separate topic because of some of the subtle issues that can happen.

    You may be tempted to do
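INSERT IGNORE INTO Foos
        SELECT DISTINCT foo FROM Staging;  -- not wise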

    It has the problem of "burning" AUTO_INCREMENT ids. This is because MariaDB pre-allocates ids before getting to "IGNORE". That could rapidly increase the AUTO_INCREMENT values beyond what you expected.

    Better is this...
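INSERT IGNORE INTO Foos
        SELECT DISTINCT foo
            FROM Staging
            LEFT JOIN Foos ON Foos.foo = Staging.foo
            WHERE Foos.foo_id IS NULL;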

    Notes:

    • The LEFT JOIN .. IS NULL finds the foos that are not yet in Foos.

    • This INSERT..SELECT must not be done inside the transaction with the rest of the processing. Otherwise, you add to deadlock risks, leading to burned ids.

    • IGNORE is used in case you are doing the INSERT from multiple processes simultaneously.

    Once that INSERT is done, this will find all the foo_ids it needs:
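INSERT INTO Fact (..., foo_id, ...)
        SELECT ..., Foos.foo_id, ...
            FROM Staging
            JOIN Foos ON Foos.foo = Staging.foo;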

    An advantage of "Batched Normalization" is that you can summarize directly from the Staging table. Two approaches:

    Case 1: PRIMARY KEY (dy, foo) and summarization is in lock step with, say, changes in dy.

    • This approach can have troubles if new data arrives after you have summarized the day's data.

    Case 2: (dy, foo) is a non-UNIQUE INDEX.

    • Same code as Case 1.

    • By having the index be non-UNIQUE, delayed data simply shows up as extra rows.

    • You need to take care to avoid summarizing the data twice. (The id on the Fact table may be a good tool for that.)

    Case 3: PRIMARY KEY (dy, foo) and summarization can happen anytime.
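For Case 3, the summarization can be done at any time with IODKU (INSERT .. ON DUPLICATE KEY UPDATE); a sketch, using the same illustrative columns as above:

INSERT INTO Summary (dy, foo, ct, blah_total)
        SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
            FROM Staging
            GROUP BY 1, 2
        ON DUPLICATE KEY UPDATE
            ct = ct + VALUE(ct),
            blah_total = blah_total + VALUE(blah_total);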

    Too many choices?

    This document lists a number of ways to do things. Your situation may lead to one approach being more/less acceptable. But, if you are thinking "Just tell me what to do!", then here:

    • Batch load the raw data into a temporary table (Staging).

    • Normalize from Staging -- use code in Case 3.

    • INSERT .. SELECT to move the data from Staging into the Fact table

    Those techniques should perform well and scale well in most cases. As you develop your situation, you may discover why I described alternative solutions.

    Purging old data

Typically the Fact table uses PARTITION BY RANGE (10-60 ranges of days/weeks/etc) and needs purging (DROP PARTITION) periodically. This discusses a safe/clean way to design the partitioning and do the DROPs: Purging PARTITIONs

    Master / slave

    For "read scaling", backup, and failover, use master-slave replication or something fancier. Do ingestion only on a single active master; it replicate to the slave(s). Generate reports on the slave(s).

    Sharding

    "Sharding" is the splitting of data across multiple servers. (In contrast, and have the same data on all servers, requiring all data to be written to all servers.)

    With the non-sharding techniques described here, terabyte(s) of data can be handled by a single machine. Tens of terabytes probably requires sharding.

    Sharding is beyond the scope of this document.

    How fast? How big?

    With the techniques described here, you may be able to achieve the following performance numbers. I say "may" because every data warehouse situation is different, and you may require performance-hurting deviations from what I describe here. I give multiple options for some aspects; these may cover some of your deviations.

One big performance killer is UUID/GUID keys. Since they are very 'random', updates of them (at scale) are limited to 1 row = 1 disk hit. Plain disks can handle only 100 hits/second. RAID and/or SSD can increase that to something like 1000 hits/sec. Huge amounts of RAM (for caching the random index) are a costly solution. It is possible to turn type-1 UUIDs into roughly-chronological keys, thereby mitigating the performance problems if the UUIDs are written/read with some chronological clustering. UUID discussion

    Hardware, etc:

    • Single SATA drive: 100 IOPs (Input/Output operations per second)

    • RAID with N physical drives -- 100*N IOPs (roughly)

    • SSD -- 5 times as fast as rotating media (in this context)

    • Batch INSERT -- 100-1000 rows is 10 times as fast as INSERTing 1 row at a time (see above)

    "Count the disk hits" -- back-of-envelope performance analysis

    • Random accesses to a table/index -- count each as a disk hit.

    • At-the-end accesses (INSERT chronologically or with AUTO_INCREMENT; range SELECT) -- count as zero hits.

    • In between (hot/popular ids, etc) -- count as something in between

    • For INSERTs, do the analysis on each index; add them up.

    More on Count the Disk Hits

    How fast?

    Look at your data; compute raw rows per second (or hour or day or year). There are about 30M seconds in a year; 86,400 seconds per day. Inserting 30 rows per second becomes a billion rows per year.

    10 rows per second is about all you can expect from an ordinary machine (after allowing for various overheads). If you have less than that, you don't have many worries, but still you should probably create Summary tables. If more than 10/sec, then batching, etc, becomes vital. Even on spiffy hardware, 100/sec is about all you can expect without utilizing the techniques here.

    Not so fast?

Let's say your insert rate is only one-tenth of your disk IOPs (eg, 10 rows/sec vs 100 IOPs). Also, let's say your data is not "bursty"; that is, the data comes in somewhat smoothly throughout the day.

    Note that 10 rows/sec (300M/year) implies maybe 30GB for data + indexes + normalization tables + summary tables for 1 year. I would call this "not so big".

Still, normalization and summarization are important. Normalization keeps the data from being, say, twice as big. Summarization speeds up the reports by orders of magnitude.

    Let's design and analyse a "simple ingestion scheme" for 10 rows/second, without 'batching'.
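A sketch of such a scheme (pseudocode mixed with SQL), processing one incoming row at a time:

# Normalize:
    $foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
    IF no $foo_id, THEN
        INSERT IGNORE INTO Foos ...

    # Inserts:
    BEGIN;
        INSERT INTO Fact ...;
        INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
    COMMIT;
    # (plus code to deal with errors on INSERTs or COMMIT)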

    Depending on the number and randomness of your indexes, etc, 10 Fact rows may (or may not) take less than 100 IOPs.

Also, note that as the data grows over time, random indexes will become less and less likely to be cached. That is, even if it runs fine with 1 year's worth of data, it may be in trouble with 2 years' worth.

    For those reasons, I started this discussion with a wide margin (10 rows versus 100 IOPs).

    References

    • Summary Tables

    See also

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Groupwise Max in MariaDB

    The problem

    You want to find the largest row in each group of rows. An example is looking for the largest city in each state. While it is easy to find the MAX(population) ... GROUP BY state, it is hard to find the name of the city associated with that population. Alas, MySQL and MariaDB do not have any syntax to provide the solution directly.

    This article is under construction, mostly for cleanup. The content is reasonably accurate during construction.

    The article presents two "good" solutions. They differ in ways that make neither of them 'perfect'; you should try both and weigh the pros and cons.

    Also, a few "bad" solutions will be presented, together with why they were rejected.

The MySQL manual gives 3 solutions; only the "Uncorrelated" one is "good"; the other two are "bad".

    Sample data

    To show how the various coding attempts work, I have devised this simple task: Find the largest city in each Canadian province. Here's a sample of the source data (5493 rows):

    Here's the desired output (13 rows):

    Duplicate max

One thing to consider is whether you want -- or do not want -- to see multiple rows for tied winners. For the dataset being used here, that would imply that the two largest cities in a province had identical populations. For this case, a duplicate would be unlikely. But there are many groupwise-max use cases where duplicates are likely.

    The two best algorithms differ in whether they show duplicates.

    Using an uncorrelated subquery

    Characteristics:

    • Superior performance, or medium performance, depending on the index and version

    • It will show duplicates

    • Needs an extra index

    • Probably requires 5.6

    An 'uncorrelated subquery':
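SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    JOIN
      ( SELECT  province, MAX(population) AS population
            FROM  Canada
            GROUP BY  province
      ) AS c2 USING (province, population)
    ORDER BY c1.province;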

    But this also 'requires' an extra index: INDEX(province, population). In addition, MySQL has not always been able to use that index effectively, hence the "requires 5.6". (I am not sure of the actual version.)

    Without that extra index, you would need 5.6, which has the ability to create indexes for subqueries. This is indicated by <auto_key0> in the EXPLAIN. Even so, the performance is worse with the auto-generated index than with the manually generated one.

    With neither the extra index, nor 5.6, this 'solution' would belong in 'The Duds' because it would run in O(N*N) time.

    Using @variables

    Characteristics:

    • Good performance

    • Does not show duplicates (picks one to show)

    • Consistent O(N) run time (N = number of input rows)

    • Only one scan of the data

    For your application, change the lines with comments.
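SELECT
        province, city, population   -- The desired columns
    FROM
      ( SELECT  @prev := '' ) init
    JOIN
      ( SELECT  province != @prev AS first,  -- `province` is the 'GROUP BY'
                @prev := province,           -- The 'GROUP BY'
                province, city, population   -- Also the desired columns
            FROM  Canada           -- The table
            ORDER BY
                province,          -- The 'GROUP BY'
                population DESC    -- ASC for MIN(population), DESC for MAX
      ) x
    WHERE  first
    ORDER BY  province;     -- Whatever you like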

    The duds

'Correlated subquery' (from the MySQL doc):
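SELECT  province, city, population
    FROM  Canada AS c1
    WHERE  population =
      ( SELECT  MAX(c2.population)
            FROM  Canada AS c2
            WHERE  c2.province = c1.province
      )
    ORDER BY  province;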

    O(N*N) (that is, terrible) performance

LEFT JOIN (from the MySQL doc):
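SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    LEFT JOIN  Canada AS c2 ON c2.province = c1.province
      AND  c2.population > c1.population
    WHERE  c2.province IS NULL
    ORDER BY province;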

    Medium performance (2N-3N, depending on join_buffer_size).

With O(N*N) time, it will take one second to do a groupwise-max on a few thousand rows; a million rows could take hours.

    Top-N in each group

    This is a variant on "groupwise-max" wherein you desire the largest (or smallest) N items in each group. Do these substitutions for your use case:

    • province --> your 'GROUP BY'

    • Canada --> your table

    • 3 --> how many of each group to show

    • population --> your numeric field for determining "Top-N"

    Output:

    The performance of this is O(N), actually about 3N, where N is the number of source rows.

    EXPLAIN EXTENDED gives

Explanation, shown in the same order as the EXPLAIN, but numbered chronologically:

    3. Get the subquery id=2 (init)

    4. Scan the output from subquery id=3 (x)

    2. Subquery id=3 -- the table scan of Canada

    1. Subquery id=2 -- init, for simply initializing the two @variables

    Yes, it took two sorts, though probably in RAM.

    Main Handler values:

    Top-n in each group, take II

    This variant is faster than the previous, but depends on city being unique across the dataset. (from openark.org)

    Output. Note how there can be more than 3 cities per province:

    Main Handler values:

    Top-n using MyISAM

(This does not need your table to be MyISAM, but it does need a MyISAM tmp table for its 2-column PRIMARY KEY feature.) See the previous section for what changes to make for your use case.

    The main handler values (total of all operations):

    Both "Top-n" formulations probably take about the same amount of time.

    Windowing functions

Hot off the press from Percona Live... MariaDB 10.2 has "windowing functions", which make "groupwise max" much more straightforward.

    The code: TBD
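A minimal sketch, assuming MariaDB 10.2 or later and the same Canada table used above:

SELECT  province, city, population
    FROM
      ( SELECT  province, city, population,
                ROW_NUMBER() OVER (PARTITION BY province
                                   ORDER BY population DESC) AS rn
            FROM  Canada
      ) AS x
    WHERE  rn = 1            -- use rn <= 3 for a Top-N variant
    ORDER BY  province;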

    Postlog

Developed and first posted: Feb, 2015; MyISAM approach added: July, 2015; Openark's method added: Apr, 2016; Windowing: Apr, 2016

    I did not include the technique(s) using GROUP_CONCAT. They are useful in some situations with small datasets. They can be found in the references below.

    See also

    • This has some of these algorithms, plus some others:

    • Other references:

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Query Limits and Timeouts

    This article describes the different methods MariaDB provides to limit/timeout a query:

    LIMIT

    The LIMIT clause restricts the number of returned rows.

The LIMIT ROWS EXAMINED clause stops the query after 'rows_limit' number of rows have been examined.
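SELECT ... LIMIT ROWS EXAMINED rows_limit;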

    sql_safe_updates

If the sql_safe_updates variable is set, one can't execute an UPDATE or DELETE statement unless one specifies a key constraint in the WHERE clause or provides a LIMIT clause (or both).
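For example:

SET @@SQL_SAFE_UPDATES=1;
UPDATE tbl_name SET not_key_column=val;
-> ERROR 1175 (HY000): You are using safe update mode
   and you tried to update a table without a WHERE that uses a KEY column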

    sql_select_limit

The sql_select_limit variable acts as an automatic LIMIT row_count on any SELECT query. For example:
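SET @@SQL_SELECT_LIMIT=1000;
SELECT * FROM big_table;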

    The above is the same as:
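SELECT * FROM big_table LIMIT 1000;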

    max_join_size

If the max_join_size variable (also called sql_max_join_size) is set, then it will limit any SELECT statements that probably need to examine more than MAX_JOIN_SIZE rows. For example:
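SET @@MAX_JOIN_SIZE=1000;
SELECT COUNT(null_column) FROM big_table;
-> ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE rows;
   check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay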

    max_statement_time

    If the variable is set, any query (excluding stored procedures) taking longer than the value of max_statement_time (specified in seconds) to execute will be aborted. This can be set globally, by session, as well as per user and per query. See .
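For example, a minimal session-level setting (the value is illustrative):

SET SESSION max_statement_time=2;   -- abort statements in this session that run longer than 2 seconds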

    See Also

    • The lock_wait_timeout variable

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Pool in MariaDB

    Problems That Thread Pools Solve

    The task of scalable server software (and a DBMS like MariaDB is an example of such software) is to maintain top performance with an increasing number of clients. MySQL traditionally assigned a thread for every client connection, and as the number of concurrent users grows this model shows performance drops. Many active threads are a performance killer, because increasing the number of threads leads to extensive context switching, bad locality for CPU caches, and increased contention for hot locks. An ideal solution that would help to reduce context switching is to maintain a lower number of threads than the number of clients. But this number should not be too low either, since we also want to utilize CPUs to their fullest, so ideally, there should be a single active thread for each CPU on the machine.

    thread_group_id = connection_id % thread_pool_size
    THROTTLING_FACTOR = thread_pool_stall_limit / MAX (500,thread_pool_stall_limit)
    SET GLOBAL thread_pool_stall_limit=300;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    thread_pool_stall_limit=300
    SET GLOBAL thread_pool_oversubscribe=10;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    thread_pool_stall_limit=300
    thread_pool_oversubscribe=10
    CREATE TABLE users (
      user_name_mb4 VARCHAR(100) COLLATE utf8mb4_general_ci,
      ...
    );
    CREATE TABLE orders (
      user_name_mb3 VARCHAR(100) COLLATE utf8mb3_general_ci,
      ...,
      INDEX idx1(user_name_mb3)
    );
    SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    CONVERT(orders.user_name_mb3 USING utf8mb4) = users.user_name_mb4
    EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    | id   | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra                                           |
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    |    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL | 1000  |                                                 |
    |    1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 10330 | Using where; Using join buffer (flat, BNL join) |
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    SET optimizer_switch='cset_narrowing=ON';
    
    EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    | id   | select_type | table  | type | possible_keys | key  | key_len | ref                 | rows | Extra                 |
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    |    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL                | 1000 | Using where           |
    |    1 | SIMPLE      | orders | ref  | idx1          | idx1 | 303     | users.user_name_mb4 | 1    | Using index condition |
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    SET optimizer_switch='cset_narrowing=ON';
    SELECT ... LIMIT row_count
    OR
    SELECT ... LIMIT OFFSET, row_count
    OR
    SELECT ... LIMIT row_count OFFSET OFFSET
    LIMIT ROWS EXAMINED
    sql_safe_updates
    UPDATE
    DELETE
    sql_select_limit
    SELECT
    max_join_size
    max_statement_time
    Aborting statements that take longer than a certain time to execute
    WAIT and NOWAIT
    Aborting statements that take longer than a certain time to execute
    lock_wait_timeout
    SELECT ... LIMIT ROWS EXAMINED rows_limit;
    SET @@SQL_SAFE_UPDATES=1
    UPDATE tbl_name SET not_key_column=val;
    -> ERROR 1175 (HY000): You are using safe update mode 
      and you tried to update a table without a WHERE that uses a KEY column
    SET @@SQL_SELECT_LIMIT=1000
    SELECT * FROM big_table;
    SELECT * FROM big_table LIMIT 1000;
    SET @@MAX_JOIN_SIZE=1000;
    ->ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE ROWS; 
    SELECT COUNT(null_column) FROM big_table;
      CHECK your WHERE AND USE SET SQL_BIG_SELECTS=1 OR SET MAX_JOIN_SIZE=# IF the SELECT IS okay

    Normalization can be done in bulk, hence efficiently

  • Copying to the Fact table will be fast

  • Summarization can be done in bulk, hence efficiently

  • "Bursty" ingestion is smoothed by this process

  • Flip-flop a pair of Staging tables

  • Use binlog_ignore_db to avoid replicating staging -- necessitating putting it in a separate database.

  • Do the summarization from Staging

  • Load Fact via INSERT INTO Fact ... SELECT FROM Staging ...

  • Deleting in chunks
    data-warehousing-high-speed-ingestion|High Speed Ingestion
    SUM()
    binlog_format = ROW
    Rick James' site
    summarytables
    Galera

    Normalization -- The process of building the mapping ('New York City' <-> 123)

    All VARCHARs are "normalized"; ids are stored instead
  • ENGINE = InnoDB

  • All "reports" use summary tables, not the Fact table

  • Summary tables may be populated from ranges of id (other techniques described below)

  • If S < 0.1s -- May not be able to keep up

    The secondary key is effectively (email_id, email), hence 'covering' for certain queries.

  • It is OK to not specify an AUTO_INCREMENT to be UNIQUE.

  • Summarize from Staging to Summary table(s) via IODKU (Insert ... On Duplicate Key Update).
  • Drop the Staging

  • Purge "old" data -- Do not use DELETE or TRUNCATE, design so you can use DROP PARTITION (see above)

  • Think of each INDEX (except the PRIMARY KEY on InnoDB) as a separate table

  • Consider access patterns of each table/index: random vs at-the-end vs something in between

  • For SELECTs, do the analysis on the one index used, plus the table. (Use of 2 indexes is rare.) Insert cost, based on datatype of first column in an index:

  • AUTO_INCREMENT -- essentially 0 IOPs

  • DATETIME, TIMESTAMP -- essentially 0 for 'current' times

  • UUID/GUID -- 1 per insert (terrible)

  • Others -- depends on their patterns SELECT cost gets a little tricky:

  • Range on PRIMARY KEY -- think of it as getting 100 rows per disk hit.

  • IN on PRIMARY KEY -- 1 disk hit per item in IN

  • "=" -- 1 hit (for 1 row)

  • Secondary key -- First compute the hits for the index, then...

  • Think of each row as needing 1 disk hit.

  • However, if the rows are likely to be 'near' each other (based on the PRIMARY KEY), then it could be < 1 disk hit/row.

  • Summary Tables
    replication
    sec. 3.3.2: Dimensional Model and "Star schema"
    Rick James' site
    datawarehouse
    Galera
    If all goes well, it will run in O(M) where M is the number of output rows.
    city --> more field(s) you want to show
  • Change the SELECT and ORDER BY if you desire

  • DESC to get the 'largest'; ASC for the 'smallest'

  • Adding a large LIMIT to a subquery may make things work.

  • StackOverflow thread

  • row_number(), rank(), dense_rank()

  • Perentile blog

  • Peter Brawley's blog
    Jan Kneschke's blog from 2007
    StackOverflow discussion of 'Uncorrelated'
    Inner ORDER BY thrown away
    Rick James' site
    groupwise_max
    MariaDB Thread Pool Features

    MariaDB has a dynamic and adaptive thread pool, aimed at optimizing resource utilization and preventing deadlocks.

    For example, a thread may depend on another thread's completion, and they may block each other via locks and/or I/O. It is hard, and sometimes impossible, to predict how many threads are ideal or even sufficient to prevent deadlocks in every situation. MariaDB implements a dynamic and adaptive pool that takes care of creating new threads in times of high demand, and retiring threads if they have nothing to do. This is a complete reimplementation of the legacy pool-of-threads scheduler, with the following goals:

    • Make the pool dynamic, so that it will grow and shrink whenever required.

    • Minimize the amount of overhead that is required to maintain the thread pool itself.

    • Make the best use of underlying OS capabilities. For example, if a native thread pool implementation is available, it should be used. If not, the best I/O multiplexing method should be used.

    • Limit the resources used by threads.

    There are currently two different low-level implementations – depending on OS. One implementation is designed specifically for Windows which utilizes a native CreateThreadpool API. The second implementation is primarily intended to be used in Unix-like systems. Because the implementations are different, some system variables differ between Windows and Unix.

    When to Use the Thread Pool

    Thread pools are most efficient in situations where queries are relatively short and the load is CPU-bound, such as in OLTP workloads. If the workload is not CPU-bound, then you might still benefit from limiting the number of threads to save memory for the database memory buffers.

    When the Thread Pool is Less Efficient

    There are special, rare cases where the thread pool is likely to be less efficient.

    • If you have a very bursty workload, then the thread pool may not work well for you. These tend to be workloads in which there are long periods of inactivity followed by short periods of very high activity by many users. These also tend to be workloads in which delays cannot be tolerated, so the throttling of thread creation that the thread pool uses is not ideal. Even in this situation, performance can be improved by tweaking how often threads are retired. For example, with thread_pool_idle_timeout on Unix, or with thread_pool_min_threads on Windows.

    • If you have many concurrent, long, non-yielding queries, then the thread pool may not work well for you. In this context, a "non-yielding" query is one that never waits or which does not indicate waits to the thread pool. These kinds of workloads are mostly used in data warehouse scenarios. Long-running, non-yielding queries will delay execution of other queries. However, the thread pool has stall detection to prevent them from totally monopolizing the thread pool. See Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls for more information. Even when the whole thread pool is blocked by non-yielding queries, you can still connect to the server through the extra-port TCP/IP port.

    • If you rely on the fact that simple queries always finish quickly, no matter how loaded your database server is, then the thread pool may not work well for you. When the thread pool is enabled on a busy server, even simple queries might be queued to be executed later. This means that a statement that takes very little time to execute on its own, even a simple SELECT 1, might take a bit longer when the thread pool is enabled than with one-thread-per-connection if it gets queued.

    Configuring the Thread Pool

    The thread_handling system variable is the primary system variable that is used to configure the thread pool.

    There are several other system variables as well, which are described in the sections below. Many of the system variables documented below are dynamic, meaning that they can be changed with SET GLOBAL on a running server.

    Generally, there is no need to tweak many of these system variables. The goal of the thread pool was to provide good performance out-of-the box. However, the system variable values can be changed, and we intended to expose as many knobs from the underlying implementation as we could. Feel free to tweak them as you see fit.

If you find any issues with any of the default behavior, then we encourage you to report a bug.

    See Thread Pool System and Status Variables for the full list of the thread pool's system variables.

    Configuring the Thread Pool on Unix

    On Unix, if you would like to use the thread pool, then you can use the thread pool by setting the thread_handling system variable to pool-of-threads in a server option group in an option file prior to starting up the server. For example:
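[mariadb]
..
thread_handling=pool-of-threads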

    The following system variables can also be configured on Unix:

    • thread_pool_size – The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    • thread_pool_max_threads – The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases. In rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks. The default value is 65536.

    • thread_pool_stall_limit – The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread. See Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls for more information.

    • thread_pool_oversubscribe – Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread can have unrestricted access to the CPU while it's running, but it also means that there is additional overhead from putting threads to sleep or waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but it also means that there is less overhead from putting threads to sleep or waking them up. This is primarily for internal use, and it is not meant to be changed for most users. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    • thread_pool_idle_timeout – The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?

    Configuring the Thread Pool on Windows

    The Windows implementation of the thread pool uses a native thread pool created with the CreateThreadpool API.

    On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.

    However, if you would like to use the old one thread per-connection behavior on Windows, then you can use that by setting the thread_handling system variable to one-thread-per-connection in a server option group in an option file prior to starting up the server. For example:
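[mariadb]
..
thread_handling=one-thread-per-connection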

On older versions of Windows, such as XP and 2003, pool-of-threads is not implemented, and the server will silently switch to using the legacy one-thread-per-connection method.

    The native CreateThreadpool API allows applications to set the minimum and maximum number of threads in the pool. The following system variables can be used to configure those values on Windows:

    • thread_pool_min_threads – The minimum number of threads in the pool. The default is 1. This is applicable in a special case of very "bursty" workloads. Imagine having longer periods of inactivity after periods of high activity. While the thread pool is idle, Windows may decide to retire pool threads (based on experimentation, this seems to happen after a thread has been idle for 1 minute). The next time high load comes, it could take some milliseconds or seconds until the thread pool size stabilizes again at the optimal value. To avoid thread retirement, set this parameter to a higher value.

    • thread_pool_max_threads – The maximum number of threads in the pool. Threads are not created when this value is reached. The default is 1000. This parameter can be used to prevent the creation of new threads when the pool can have short periods where many or all clients are blocked (for example, with FLUSH TABLES WITH READ LOCK, high contention on row locks, or similar). New threads are created if a blocking situation occurs (such as after a throttling interval), but sometimes you want to cap the number of threads, if you're familiar with the application and need to, for example, save memory. If your application constantly pegs at 500 threads, it might be a strong indicator of high contention in the application, and the thread pool does not help much.

    Configuring Priority Scheduling

    It is possible to configure connection prioritization. The priority behavior is configured by the thread_pool_priority system variable.

    By default, if thread_pool_priority is set to auto, then queries would be given a higher priority, in case the current connection is inside a transaction. This allows the running transaction to finish faster, and has the effect of lowering the number of transactions running in parallel. The default setting will generally improve throughput for transactional workloads. But it is also possible to explicitly set the priority for the current connection to either 'high' or 'low'.
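For example, a connection that should always be treated as high priority could set the following (assuming a MariaDB version that supports this variable):

SET SESSION thread_pool_priority='high';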

    There is also a mechanism in place to ensure that higher priority connections are not monopolizing the worker threads in the pool (which would cause indefinite delays for low priority connections). On Unix, low priority connections are put into the high priority queue after the timeout specified by the thread_pool_prio_kickup_timer system variable.

    Configuring the Extra Port

    MariaDB allows you to configure an extra port for administrative connections. This is primarily intended to be used in situations where all threads in the thread pool are blocked, and you still need a way to access the server. However, it can also be used to ensure that monitoring systems (including MaxScale's monitors) always have access to the system, even when all connections on the main port are used. This extra port uses the old one-thread-per-connection thread handling.

    You can enable this and configure a specific port by setting the extra_port system variable.

    You can configure a specific number of connections for this port by setting the extra_max_connections system variable.

    These system variables can be set in a server option group in an option file prior to starting up the server. For example:
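A minimal sketch; the port number and connection limit are illustrative:

[mariadb]
..
extra_port=3307
extra_max_connections=10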

    Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.

    Monitoring Thread Pool Activity

    Currently there are two status variables exposed to monitor pool activity.

Variable
    Description

    Threadpool_threads
    Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

    Threadpool_idle_threads
    Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc. This status variable is only meaningful on Unix.

    Thread Groups in the Unix Implementation of the Thread Pool

    On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    Fixing a Blocked Thread Pool

    When using global locks, even with a high value on the thread_pool_max_threads system variable, it is still possible to block the entire pool.

Imagine the case where a client performs FLUSH TABLES WITH READ LOCK and then pauses. If the number of other clients connecting to the server to start write operations then exceeds the maximum number of threads allowed in the pool, it can block the server. This makes it impossible to issue the UNLOCK TABLES statement. It can also block MaxScale from monitoring the server.

    To mitigate the issue, MariaDB allows you to configure an extra port for administrative connections. See Configuring the Extra Port for information on how to configure this.

    Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.

    This ensures that your administrators can access the server in cases where the number of threads is already equal to the configured value of the thread_pool_max_threads system variable, and all threads are blocked. It also ensures that MaxScale can still access the server in such situations for monitoring information.

    Once you are connected to the extra port, you can solve the issue by increasing the value on the thread_pool_max_threads system variable, or by killing the offending connection, (that is, the connection that holds the global lock, which would be in the sleep state).

    Information Schema

    The following Information Schema tables relate to the thread pool:

    • Information Schema THREAD_POOL_GROUPS Table

    • Information Schema THREAD_POOL_QUEUES Table

    • Information Schema THREAD_POOL_STATS Table

    • Information Schema THREAD_POOL_WAITS Table

    MariaDB Thread Pool vs Oracle MySQL Enterprise Thread Pool

    Commercial editions of MySQL since 5.5 include an Oracle MySQL Enterprise thread pool implemented as a plugin, which delivers similar functionality. A detailed discussion about the design of the feature is at Mikael Ronstrom's blog. Here is the summary of similarities and differences, based on the above materials.

    Similarities

    • On Unix, both MariaDB and Oracle MySQL Enterprise Thread Pool will partition client connections into groups. The thread_pool_size parameter thus has the same meaning for both MySQL and MariaDB.

    • Both implementations use a similar scheme for checking thread stalls, and both have the same parameter name, thread_pool_stall_limit (though in MariaDB it is measured in millisecond units, not 10ms units like in Oracle MySQL).

    Differences

    • The Windows implementation is completely different – MariaDB's uses native Windows thread pooling, while Oracle's implementation relies on WSAPoll() (a function provided by Windows for convenience when porting Unix applications). As a consequence of relying on WSAPoll(), Oracle's implementation does not work with named pipes and shared memory connections.

    • MariaDB uses the most efficient I/O multiplexing facilities for each operating system: Windows (the I/O completion port is used internally by the native thread pool), Linux (epoll), Solaris (event ports), FreeBSD and OSX (kevent). Oracle uses optimized I/O multiplexing only on Linux, with epoll, and uses poll() otherwise.

    • Unlike the Oracle MySQL Enterprise Thread Pool, MariaDB's thread pool is built in, not a plugin.

    MariaDB Thread Pool vs Percona Thread Pool

Percona's implementation is a port of MariaDB's thread pool with some added features. In particular, Percona added priority scheduling to its 5.5-5.7 releases. MariaDB and Percona priority scheduling work in a similar fashion, but there are some differences in details.

    • MariaDB's thread_pool_priority=auto, high, low correspond to Percona's thread_pool_high_prio_mode=transactions, statements, none.

    • Percona has a thread_pool_high_prio_tickets connection variable to allow every nth low priority query to be put into the high priority queue. MariaDB does not have a corresponding setting.

    • MariaDB has a thread_pool_prio_kickup_timer setting, which Percona does not have.

    Running Benchmarks

When running sysbench, or other benchmarks that create many threads on the same machine as the server, it is advisable to run the benchmark driver and the server on different CPUs to get realistic results. Running lots of driver threads and only a few server threads on the same CPUs has the effect that the OS scheduler will schedule the benchmark driver threads to run with much higher probability than the server threads; that is, the driver will pre-empt the server. Use "taskset –c" on Linux, and "set /affinity" on Windows, to separate benchmark driver and server CPUs, as the preferred method to fix this situation.

    A possible alternative on Unix (if taskset or a separate machine running the benchmark is not desired for some reason) would be to increase thread_pool_size to make the server threads more "competitive" against the client threads.

    When running sysbench, a good rule of thumb could be to give 1/4 of all CPUs to the sysbench, and 3/4 of CPUs to mariadbd. It is also good idea to run sysbench and mariadbd on different NUMA nodes, if possible.

    Notes

    The thread_cache_size system variable is not used when the thread pool is used and the Threads_cached status variable will have a value of 0.

    See Also

    • Thread Pool System and Status Variables

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_switch

    optimizer_switch is a server variable that one can use to enable/disable specific optimizations.

    Syntax

    To set or unset the various optimizations, use the following syntax:
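SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';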

    The cmd takes the following format:

Syntax
    Description

    default
    Reset all optimizations to their default values.

    optimization_name=default
    Set the named optimization to its default value.

    optimization_name=on
    Enable the named optimization.

    optimization_name=off
    Disable the named optimization.

    There is no need to list all flags - only those that are specified in the command will be affected.

    Available Flags

    Below is a list of all optimizer_switch flags available in MariaDB:

    Flag and MariaDB default
    Supported in MariaDB since

    Defaults

    From version
    Default optimizer_switch setting

    See Also

    This page is licensed: CC BY-SA / Gnu FDL

    FLOOR(UNIX_TIMESTAMP(dt) / 3600)
       FROM_UNIXTIME(hour * 3600)
    PRIMARY KEY(city, datetime),
       Aggregations: ct, sum_price
       
       # Core of INSERT..SELECT:
       DATE(datetime) AS DATE, city, COUNT(*) AS ct, SUM(price) AS sum_price
       
       # Reporting average price FOR last month, broken down BY city:
       SELECT city,
              SUM(sum_price) / SUM(ct) AS 'AveragePrice'
          FROM SalesSummary
          WHERE datetime BETWEEN ...
          GROUP BY city;
       
       # Monthly sales, nationwide, FROM same summary TABLE:
SELECT MONTH(datetime) AS 'Month',
           SUM(ct)         AS 'TotalSalesCount',
           SUM(sum_price)  AS 'TotalDollars'
          FROM SalesSummary
          WHERE datetime BETWEEN ...
          GROUP BY MONTH(datetime);
       # This might benefit FROM a secondary INDEX(datetime)
    INSERT INTO Fact ...;
        INSERT INTO Summary (..., ct, foo, ...) VALUES (..., 1, foo, ...)
            ON DUPLICATE KEY UPDATE ct = ct+1, sum_foo = sum_foo + VALUES(foo), ...;
    FROM Fact
       WHERE id BETWEEN min_id AND max_id
    id INT UNSIGNED AUTO_INCREMENT NOT NULL,
       ...
       PRIMARY KEY(foo, dy, id),  -- `id` added to make unique
       INDEX(id)                  -- sufficient to keep AUTO_INCREMENT happy
    SQRT( SUM(sum_foo2)/SUM(ct) - POWER(SUM(sum_foo)/SUM(ct), 2) )
    # Prep FOR flip:
        CREATE TABLE new LIKE Staging;
    
        # Swap (flip) Staging tables:
        RENAME TABLE Staging TO old, new TO Staging;
    
        # Normalize new `foo`s:
        # (autocommit = 1)
    INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...
    
        # Prep FOR possible deadlocks, etc
        WHILE...
        START TRANSACTION;
    
        # ADD TO Fact:
        INSERT INTO Fact ... FROM old JOIN Foos ...
    
        # Summarize:
        INSERT INTO Summary ... FROM old ... GROUP BY ...
    
        COMMIT;
        end-WHILE
    
        # Cleanup:
        DROP TABLE old;
    CREATE TABLE Emails (
            email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- don't make bigger than needed
            email VARCHAR(...) NOT NULL,
            PRIMARY KEY (email),  -- for looking up one way
            INDEX(email_id)  -- for looking up the other way (UNIQUE is not needed)
        ) ENGINE = InnoDB;  -- to get clustering
    INSERT IGNORE INTO Foos
            SELECT DISTINCT foo FROM Staging;  -- not wise
    INSERT IGNORE INTO Foos
            SELECT DISTINCT foo
                FROM Staging
                LEFT JOIN Foos ON Foos.foo = Staging.foo
                WHERE Foos.foo_id IS NULL;
    INSERT INTO Fact (..., foo_id, ...)
            SELECT ..., Foos.foo_id, ...
                FROM Staging
                JOIN Foos ON Foos.foo = Staging.foo;
    INSERT INTO Summary (dy, foo, ct, blah_total)
            SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
                FROM Staging
                GROUP BY 1, 2;
INSERT INTO Summary (dy, foo, ct, blah_total)
            SELECT  DATE(dt) AS dy, foo,
                    COUNT(*) AS ct, SUM(blah) AS blah_total
                FROM Staging
                GROUP BY 1, 2
            ON DUPLICATE KEY UPDATE
                ct = ct + VALUE(ct),
                blah_total = blah_total + VALUE(blah_total);
    # Normalize:
        $foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
        IF NO $foo_id, THEN
            INSERT IGNORE INTO Foos ...
    
        # Inserts:
        BEGIN;
            INSERT INTO Fact ...;
            INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
        COMMIT;
        # (plus code TO deal WITH errors ON INSERTs OR COMMIT)
    +------------------+----------------+------------+
    | province         | city           | population |
    +------------------+----------------+------------+
    | Saskatchewan     | Rosetown       |       2309 |
    | British Columbia | Chilliwack     |      51942 |
    | Nova Scotia      | Yarmouth       |       7500 |
    | Alberta          | Grande Prairie |      41463 |
    | Quebec           | Sorel          |      33591 |
    | Ontario          | Moose Factory  |       2060 |
    | Ontario          | Bracebridge    |       8238 |
    | British Columbia | Nanaimo        |      84906 |
    | Manitoba         | Neepawa        |       3151 |
    | Alberta          | Grimshaw       |       2560 |
    | Saskatchewan     | Carnduff       |        950 |
    ...
    +---------------------------+---------------+------------+
    | province                  | city          | population |
    +---------------------------+---------------+------------+
    | Alberta                   | Calgary       |     968475 |
    | British Columbia          | Vancouver     |    1837970 |
    | Manitoba                  | Winnipeg      |     632069 |
    | New Brunswick             | Saint John    |      87857 |
    | Newfoundland and Labrador | Corner Brook  |      18693 |
    | Northwest Territories     | Yellowknife   |      15866 |
    | Nova Scotia               | Halifax       |     266012 |
    | Nunavut                   | Iqaluit       |       6124 |
    | Ontario                   | Toronto       |    4612187 |
    | Prince Edward Island      | Charlottetown |      42403 |
    | Quebec                    | Montreal      |    3268513 |
    | Saskatchewan              | Saskatoon     |     198957 |
    | Yukon                     | Whitehorse    |      19616 |
    +---------------------------+---------------+------------+
    SELECT  c1.province, c1.city, c1.population
        FROM  Canada AS c1
        JOIN
          ( SELECT  province, MAX(population) AS population
                FROM  Canada
                GROUP BY  province
          ) AS c2 USING (province, population)
        ORDER BY c1.province;
    SELECT
            province, city, population   -- The desired columns
        FROM
          ( SELECT  @prev := '' ) init
        JOIN
          ( SELECT  province != @prev AS first,  -- `province` is the 'GROUP BY'
                    @prev := province,           -- The 'GROUP BY'
                    province, city, population   -- Also the desired columns
                FROM  Canada           -- The table
                ORDER BY
                    province,          -- The 'GROUP BY'
                    population DESC    -- ASC for MIN(population), DESC for MAX
          ) x
        WHERE  first
        ORDER BY  province;     -- Whatever you like
    SELECT  province, city, population
        FROM  Canada AS c1
        WHERE  population =
          ( SELECT  MAX(c2.population)
                FROM  Canada AS c2
                WHERE  c2.province= c1.province
          )
        ORDER BY  province;
    SELECT  c1.province, c1.city, c1.population
        FROM  Canada AS c1
        LEFT JOIN  Canada AS c2 ON c2.province = c1.province
          AND  c2.population > c1.population
        WHERE  c2.province IS NULL
        ORDER BY province;
    SELECT
            province, n, city, population
        FROM
          ( SELECT  @prev := '', @n := 0 ) init
        JOIN
          ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                    @prev := province,
                    province, city, population
                FROM  Canada
                ORDER BY
                    province   ASC,
                    population DESC
          ) x
        WHERE  n <= 3
        ORDER BY  province, n;
    +---------------------------+------+------------------+------------+
    | province                  | n    | city             | population |
    +---------------------------+------+------------------+------------+
    | Alberta                   |    1 | Calgary          |     968475 |
    | Alberta                   |    2 | Edmonton         |     822319 |
    | Alberta                   |    3 | Red Deer         |      73595 |
    | British Columbia          |    1 | Vancouver        |    1837970 |
    | British Columbia          |    2 | Victoria         |     289625 |
    | British Columbia          |    3 | Abbotsford       |     151685 |
    | Manitoba                  |    1 | ...
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    | id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    |  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using filesort |
    |  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using where    |
    |  3 | DERIVED     | Canada     | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using filesort |
    |  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    | Handler_read_rnd           | 39    |
    | Handler_read_rnd_next      | 10971 |
    | Handler_write              | 5485  |  -- #rows in Canada (+1)
    SELECT  province, city, population
            FROM  Canada
            JOIN
              ( SELECT  GROUP_CONCAT(top_in_province) AS top_cities
                    FROM
                      ( SELECT  SUBSTRING_INDEX(
                                       GROUP_CONCAT(city ORDER BY  population DESC),
                                ',', 3) AS top_in_province
                            FROM  Canada
                            GROUP BY  province
                      ) AS x
              ) AS y
            WHERE  FIND_IN_SET(city, top_cities)
            ORDER BY  province, population DESC;
    | Alberta                   | Calgary          |     968475 |
    | Alberta                   | Edmonton         |     822319 |
    | Alberta                   | Red Deer         |      73595 |
    | British Columbia          | Vancouver        |    1837970 |
    | British Columbia          | Victoria         |     289625 |
    | British Columbia          | Abbotsford       |     151685 |
    | British Columbia          | Sydney           |          0 | -- Wrong: matched only because Nova Scotia's second-largest city is also named Sydney
    | Manitoba                  | Winnipeg         |     632069 |
    | Handler_read_next          | 5484  | -- table size
    | Handler_read_rnd_next      | 5500  | -- table size + number of provinces
    | Handler_write              | 14    | -- number of provinces (+1)
    -- build tmp table to get numbering
        -- (Assumes auto_increment_increment = 1)
        CREATE TEMPORARY TABLE t (
            nth MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
            PRIMARY KEY(province, nth)
        ) ENGINE=MyISAM
            SELECT province, NULL AS nth, city, population
                FROM Canada
                ORDER BY population DESC;
        -- Output the biggest 3 cities in each province:
        SELECT province, nth, city, population
            FROM t
            WHERE nth <= 3
            ORDER BY province, nth;
    
    +---------------------------+-----+------------------+------------+
    | province                  | nth | city             | population |
    +---------------------------+-----+------------------+------------+
    | Alberta                   |   1 | Calgary          |     968475 |
    | Alberta                   |   2 | Edmonton         |     822319 |
    | Alberta                   |   3 | Red Deer         |      73595 |
    | British Columbia          |   1 | Vancouver        |    1837970 |
    | British Columbia          |   2 | Victoria         |     289625 |
    | British Columbia          |   3 | Abbotsford       |     151685 |
    | Manitoba                  |  ...
    
    SELECT FOR CREATE:
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra          |
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    |  1 | SIMPLE      | Canada | ALL  | NULL          | NULL | NULL    | NULL | 5484 | Using filesort |
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    Other SELECT:
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    |  1 | SIMPLE      | t     | index | NULL          | PRIMARY | 104     | NULL |   22 | Using where |
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | Handler_read_rnd_next      | 10970 |
    | Handler_write              | 5484  |  -- number of rows in Canada (write tmp table)
    [mariadb]
    ...
    thread_handling=pool-of-threads
    [mariadb]
    ...
    thread_handling=one-thread-per-connection
    [mariadb]
    ...
    extra_port = 8385
    extra_max_connections = 10
    $ mariadb -u root -P 8385 -p
    $ mariadb -u root -P 8385 -p
    SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';

    default

    duplicateweedout=on

    engine_condition_pushdown=off

    (deprecated in , removed in )


    index_merge=on

    index_merge_intersection=on

    index_merge_sort_union=on

    index_merge_union=on

    materialization=on (semi-join, non-semi-join)

    default

    Reset all optimizations to their default values.

    optimization_name=default

    Set the specified optimization to its default value.

    optimization_name=on

    Enable the specified optimization.

    optimization_name=off

    Disable the specified optimization.
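
    For example, a sketch of adjusting a couple of flags for the current session (the flag names are taken from the default lists on this page; any valid optimizer_switch flag works the same way):

    SET SESSION optimizer_switch='index_merge_intersection=off,derived_merge=default';
    SELECT @@optimizer_switch;   -- verify the resulting settings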

    condition_pushdown_for_derived=on

    condition_pushdown_for_subquery=on

    condition_pushdown_from_having=on

    cset_narrowing=on/off

    MariaDB 10.6.16, MariaDB 10.11.6, , and

    derived_merge=on

    derived_with_keys=on

    MariaDB 12.0.1

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, duplicateweedout=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off, sargable_casefold=on

    MariaDB 10.6.16, MariaDB 10.11.6, , and

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on

    MariaDB 10.6.13, MariaDB 10.11.3

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=off

    Quickly finding optimizer_switch values that are on or off
    The optimizer converts certain big IN predicates into IN subqueries
    optimizer_adjust_secondary_key_cost
    Optimizer hints in SELECT

    thread_pool_stall_limit
    thread_pool_max_threads
    Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls
    thread_pool_oversubscribe
    Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Oversubscription
    thread_pool_idle_timeout
    Threadpool_threads
    thread_pool_max_threads
    Threadpool_idle_threads

    General Thread States

    This article documents the major general thread states. More specific lists related to delayed inserts, replication, the query cache and the event scheduler are listed in:

    • Event Scheduler Thread States

    • Query Cache Thread States

    • Master Thread States

    These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads table.

    Value
    Description

    This page is licensed: CC BY-SA / Gnu FDL

    Expanded New-Style Optimizer Hints

    New-style optimizer hints were introduced in MariaDB 12.0 and 12.1.

    Description

    Each individual hint consists of a hint name and its arguments. In case there are no arguments, the () parentheses are still present:

    converting HEAP to Aria

    Converting an internal MEMORY temporary table into an on-disk Aria temporary table.

    converting HEAP to MyISAM

    Converting an internal MEMORY temporary table into an on-disk MyISAM temporary table.

    copy to tmp table

    A new table has been created as part of an ALTER TABLE statement, and rows are about to be copied into it.

    Copying to group table

    Sorting the rows by group and copying to a temporary table, which occurs when a statement has different GROUP BY and ORDER BY criteria.

    Copying to tmp table

    Copying to a temporary table in memory.

    Copying to tmp table on disk

    Copying to a temporary table on disk, as the resultset is too large to fit into memory.

    Creating index

    Processing an ALTER TABLE ... ENABLE KEYS for an Aria or MyISAM table.

    Creating sort index

    Processing a SELECT statement that is resolved using an internal temporary table.

    creating table

    Creating a table (temporary or non-temporary).

    Creating tmp table

    Creating a temporary table (in memory or on-disk).

    deleting from main table

    Deleting from the first table in a multi-table DELETE, saving columns and offsets for use in deleting from the other tables.

    deleting from reference tables

    Deleting matched rows from secondary reference tables as part of a multi-table DELETE.

    discard_or_import_tablespace

    Processing an ALTER TABLE ... IMPORT TABLESPACE or ALTER TABLE ... DISCARD TABLESPACE statement.

    end

    State before the final cleanup of an ALTER TABLE, CREATE VIEW, DELETE, INSERT, SELECT, or UPDATE statement.

    executing

    Executing a statement.

    Execution of init_command

    Executing statements specified by the --init_command option of the mariadb client.

    filling schema table

    A table in the information_schema database is being built.

    freeing items

    Freeing items from the query cache after executing a command. Usually followed by the cleaning up state.

    Flushing tables

    Executing a FLUSH TABLES statement and waiting for other threads to close their tables.

    FULLTEXT initialization

    Preparing to run a full-text search. This includes running the full-text search (MATCH ... AGAINST) and creating a list of the results in memory.

    init

    About to initialize an ALTER TABLE, DELETE, INSERT, SELECT, or UPDATE statement. Could be performing query cache cleanup, or flushing the binary log or InnoDB log.

    Killed

    Thread will abort next time it checks the kill flag. Requires waiting for any locks to be released.

    Locked

    Query has been locked by another query.

    logging slow query

    Writing the statement to the slow query log.

    NULL

    State used for SHOW PROCESSLIST.

    login

    Connection thread has not yet been authenticated.

    manage keys

    Enabling or disabling a table index.

    Opening table[s]

    Trying to open a table. Usually very quick unless the limit set by table_open_cache has been reached, or an ALTER TABLE or LOCK TABLE is in progress.

    optimizing

    Server is performing initial optimizations for a query.

    preparing

    State occurring during query optimization.

    Purging old relay logs

    Relay logs that are no longer needed are being removed.

    query end

    Query has finished being processed, but items have not yet been freed (the freeing items state).

    Reading file

    Server is reading the file (for example, during LOAD DATA INFILE).

    Reading from net

    Server is reading a network packet.

    Removing duplicates

    Duplicated rows being removed before sending to the client. This happens when SELECT DISTINCT is used in a way that the distinct operation could not be optimized at an earlier point.

    removing tmp table

    Removing an internal temporary table after processing a SELECT statement.

    rename

    Renaming a table.

    rename result table

    Renaming a table that results from an ALTER TABLE statement having created a new table.

    Reopen tables

    Table is being re-opened after thread obtained a lock but the underlying table structure had changed, so the lock was released.

    Repair by sorting

    Indexes are being created with the use of a sort. Much faster than the related Repair with keycache.

    Repair done

    Multi-threaded repair has been completed.

    Repair with keycache

    Indexes are being created through the key cache, one-by-one. Much slower than the related Repair by sorting.

    Rolling back

    A transaction is being rolled back.

    Saving state

    New table state is being saved. For example, after analyzing a MyISAM table, the key distributions, rowcount etc. are saved to the .MYI file.

    Searching rows for update

    Finding matching rows before performing an UPDATE, which is needed when the UPDATE would change the index used for the UPDATE.

    Sending data

    Sending data to the client as part of processing a SELECT statement or other statements that return data, such as INSERT ... RETURNING. Often the longest-occurring state, as it also includes all reading from tables and disk read activities. Where an aggregation or un-indexed filtering occurs, significantly more rows are read than are sent to the client.

    setup

    Setting up an ALTER TABLE operation.

    Sorting for group

    Sorting as part of a GROUP BY.

    Sorting for order

    Sorting as part of an ORDER BY.

    Sorting index

    Sorting index pages as part of a table optimization operation.

    Sorting result

    Processing a SELECT statement using a non-temporary table.

    statistics

    Calculating statistics as part of deciding on a query execution plan. Usually a brief state unless the server is disk-bound.

    System lock

    Requesting or waiting for an external lock for a specific table. The storage engine determines what kind of external lock to use. For example, the MyISAM storage engine uses file-based locks. However, MyISAM's external locks are disabled by default, due to the default value of the skip_external_locking system variable. Transactional storage engines such as InnoDB also register the transaction or statement with MariaDB's transaction coordinator while in this thread state. See MDEV-19391 for more information about that.

    Table lock

    About to request a table's internal lock after acquiring the table's external lock. This thread state occurs after the System lock thread state.

    update

    About to start updating table.

    Updating

    Searching for and updating rows in a table.

    updating main table

    Updating the first table in a multi-table update, and saving columns and offsets for use in the other tables.

    updating reference tables

    Updating the secondary (reference) tables in a multi-table update

    updating status

    This state occurs after a query's execution is complete. If the query's execution time exceeds long_query_time, then Slow_queries is incremented, and if the slow query log is enabled, then the query is logged. If the SERVER_AUDIT plugin is enabled, then the query is also logged into the audit log at this stage. If the userstats plugin is enabled, then CPU statistics are also updated at this stage.

    User lock

    About to request or waiting for an advisory lock from a GET_LOCK() call. For SHOW PROFILE, means requesting a lock only.

    User sleep

    A SLEEP() call has been invoked.

    Waiting for commit lock

    FLUSH TABLES WITH READ LOCK is waiting for a commit lock, or a statement resulting in an explicit or implicit commit is waiting for a read lock to be released. This state was called Waiting for all running commits to finish in earlier versions.

    Waiting for global read lock

    Waiting for a global read lock.

    Waiting for table level lock

    External lock acquired, and internal lock about to be requested. Occurs after the System lock state. In earlier versions, this was called Table lock.

    Waiting for xx lock

    Waiting to obtain a lock of type xx.

    Waiting on cond

    Waiting for an unspecified condition to occur.

    Writing to net

    Writing a packet to the network.

    After create

    The function that created (or tried to create) a table (temporary or non-temporary) has just ended.

    Analyzing

    Calculating table key distributions, such as when running an ANALYZE TABLE statement.

    checking permissions

    Checking to see whether the permissions are adequate to perform the statement.

    Checking table

    Checking the table.

    cleaning up

    Preparing to reset state variables and free memory after executing a command.

    closing tables

    Flushing the changes to disk and closing the table. This state will only persist if the disk is full or under extremely high load.

    Slave Connection Thread States
    Slave I/O Thread States
    Slave SQL Thread States
    SHOW PROCESSLIST
    Information Schema PROCESSLIST Table
    Performance Schema threads Table
    Incorrect hints produce warnings (a setting to make them errors is not implemented yet).

    Hints that are not ignored are kept in the query text (you can see them in SHOW PROCESSLIST, Slow Query Log, EXPLAIN EXTENDED). Hints that were incorrect and were ignored are removed from there.

    Hint Hierarchy

    Hints can be:

    • global - they apply to whole query;

    • table-level - they apply to a table;

    • index-level - they apply to an index in a table.

    Table-Level Hints

    Index-Level Hints

    Index-level hints apply to indexes. Possible syntax variants are:

    Effect of Optimizer Hints

    The optimizer can be controlled by

    1. server variables - optimizer_switch, join_cache_level, and so forth;

    2. old-style hints;

    3. new-style hints.

    Old-style hints do not overlap with server variable settings.

    New-style hints are more specific than server variable settings, so they override the server variable settings.

    Hints are "narrowly interpreted" and "best effort" - if a hint dictates to do something, for example:

    It means: when considering a query plan that involves using t1_index1 in a way that allows MRR, use MRR. If the query planning is such that the use of t1_index1 doesn't allow MRR to be used, it won't be used.

    The optimizer may also consider using t1_index2 and pick that over using t1_index1. In such cases, the hint is effectively ignored and no warning is given.

    Query Block Naming

    The QB_NAME hint is used to assign a name to the query block the hint is in. The Query block is either a SELECT statement or a top-level construct of an UPDATE or DELETE statement.

    The name can then be used

    • to refer to the query block;

    • to refer to a table in the query block as table_name@query_block_name.

    Query block scope is the whole statement. It is invalid to use the same name for multiple query blocks. You can refer to the query block "down into subquery", "down into derived table", "up to the parent" and "to a right sibling in the UNION". You cannot refer "to a left sibling in a UNION".
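
    A minimal sketch (t1 and t2 are hypothetical tables): the subquery is named subq with QB_NAME, and a hint in the outer query refers to the subquery's table through that name:

    SELECT /*+ BKA(t2@subq) */ *
        FROM t1
        WHERE t1.a IN (SELECT /*+ QB_NAME(subq) */ t2.a FROM t2);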

    Hints inside views are not supported yet. You can neither use hints in VIEW definitions, nor control query plans inside non-merged views. (This is because QB_NAME bindings are done "early", before we know that some tables are views.)

    SELECT#N NAMES

    Besides the given name, any query block is given a name select#n (where #n stands for a number). You can see it when running EXPLAIN EXTENDED:

    It is not possible to use it in the hint text:

    QB_NAME in CTEs

    Hints that control @name will control the first use of the CTE (common table expression).

    Available Expanded Optimizer Hints

    NO_ROWID_FILTER

    This hint is available from MariaDB 12.1.

    Does not consider ROWID filter for the scope of the hint (all tables in the query block, specific table, and specific indexes). See ROWID_FILTER for details.

    NO_SPLIT_MATERIALIZED

    This hint is available from MariaDB 12.1.

    When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.

    NO_SPLIT_MATERIALIZED(X) disables the use of split-materialized optimization in the context of X :

    ROWID_FILTER

    This hint is available from MariaDB 12.1.

    Like NO_RANGE_OPTIMIZATION or MRR, this hint can be applied to:

    • Query blocks — NO_ROWID_FILTER()

    • Table — NO_ROWID_FILTER(table_name)

    • Specific indexes — NO_ROWID_FILTER(table_name index1 index2 ...)

    Forces the use of ROWID_FILTER for the table index it targets:

    • For query blocks and tables, it enables the use of the ROWID filter, assuming it is disabled globally.

    • For indexes, it forces its use, regardless of the costs. The following query forces the use of the ROWID filter made from t1.idx1 if the chosen plan allows so (that is, if the access method to t1 allows it):

    Assuming the optimizer would pick idx2 for table t1 if the hint was not used, this could result in the usage of both idx2 and idx1 if the hint is used. That might become more expensive than a full table scan, or result in a change of the join order.

    Therefore, do not "blindly" use this filter, but rather make sure its use doesn't have a negative impact as described.

    SPLIT_MATERIALIZED

    This hint is available from MariaDB 12.1.

    When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.

    SPLIT_MATERIALIZED(X) enables and forces the use of split-materialized optimization in the context of X, unless it is impossible to do (for instance, because a table is not a materialized derived table).

    The following hints are available from MariaDB 12.0, unless indicated otherwise.

    Hints are placed after the main statement verb.

    They can also appear after the SELECT keyword in any subquery:

    There can be one or more hints separated with space:

    JOIN_INDEX and NO_JOIN_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for an access method (range, ref, etc.). Equivalent to FORCE INDEX FOR JOIN and IGNORE INDEX FOR JOIN.

    GROUP_INDEX and NO_GROUP_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for index scans for GROUP BY operations. Equivalent to FORCE INDEX FOR GROUP BY and IGNORE INDEX FOR GROUP BY.

    ORDER_INDEX and NO_ORDER_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for sorting rows. Equivalent to FORCE INDEX FOR ORDER BY and IGNORE INDEX FOR ORDER BY.

    INDEX and NO_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes, for all scopes (join access method, GROUP BY, or sorting). Equivalent to FORCE INDEX and IGNORE INDEX.

    Syntax

    Behavior

    The hints operate by modifying the set of keys the optimizer considers for SELECT statements. The specific behavior depends on whether specific index keys are provided within the hint.

    INDEX_MERGE and NO_INDEX_MERGE

    This hint is available from MariaDB 12.2.

    The INDEX_MERGE and NO_INDEX_MERGE optimizer hints provide granular control over the optimizer's use of index merge strategies. They allow users to override the optimizer's cost-based calculations and global switch settings, to force or prevent the merging of indexes for specific tables.

    Syntax

    Behavior

    The hints operate by modifying the set of keys the optimizer considers for merge operations. The specific behavior depends on whether specific index keys are provided within the hint.

    INDEX_MERGE Hint

    This hint instructs the optimizer to employ an index merge strategy.

    • Without arguments: When specified as INDEX_MERGE(tbl), the optimizer considers all available keys for that table and selects the cheapest index merge combination.

    • With specific keys: When specified with keys, for instance, INDEX_MERGE(tbl key1, key2), the optimizer considers only the listed keys for the merge operation. All other keys are excluded from consideration for index merging.

    The INDEX_MERGE hint overrides the global optimizer_switch. Even if a specific strategy (such as index_merge_intersection) is disabled globally, the hint forces the optimizer to attempt the strategy using the specified keys.

    NO_INDEX_MERGE Hint

    This hint instructs the optimizer to avoid index merge strategies.

    • Without arguments: When specified as NO_INDEX_MERGE(tbl), index merge optimizations are completely disabled for the specified table.

    • With specific keys: When specified with keys, for instance, NO_INDEX_MERGE(tbl key1), the listed keys are excluded from consideration. The optimizer may still perform a merge using other available keys. However, if excluding the listed keys leaves insufficient row-ordered retrieval (ROR) scans available, no merge is performed.

    Algorithm Selection and Limitations

    While these hints control which keys are candidates for merging, they do not directly dictate the specific merge algorithm (Intersection, Union, or Sort-Union).

    • Indirect Control: You can influence the strategy indirectly by combining these hints with optimizer_switch settings, but specific algorithm selection is not guaranteed.

    • Invalid Hints: If a hint directs the optimizer to use specific indexes, but those indexes do not provide sufficient ROR scans to form a valid plan, the server is unable to honor the hint. In this scenario, the server emits a warning.

    Examples

    In the following examples, the index_merge_intersection switch is globally disabled. However, the INDEX_MERGE hint forces the optimizer to consider specific keys (f2 and f4), resulting in an intersection strategy.

    In the first query below, we disable intersection with NO_INDEX_MERGE, and the behavior is reflected in the EXPLAIN output. The query after that shows the hint enabling a merge: an intersection of f3 and f4 is used. In the last example, a different intersection is used: f3 and PRIMARY.

    No intersection (no merged indexes):

    Intersection of keys f3, f4:

    Intersection of keys PRIMARY, f3:

    NO_RANGE_OPTIMIZATION

    This hint is available from MariaDB 12.1.

    An index-level hint that disables range optimization for certain index(es):

    NO_ICP

    This hint is available from MariaDB 12.0.

    An index-level hint that disables Index Condition Pushdown for the indexes. ICP+BKA is disabled as well.

    MRR and NO_MRR

    This hint is available from MariaDB 12.0.

    Index-level hints to force or disable use of MRR.

    This controls:

    • MRR optimization for range access;

    • BKA.

    BKA() and NO_BKA()

    This hint is available from MariaDB 12.0.

    Query block or table-level hints.

    BKA() also enables MRR to make BKA possible. (This is different from session variables, where you need to enable MRR separately). This also enables BKAH.

    BNL() and NO_BNL()

    This hint is available from MariaDB 12.0.

    Controls BNL-H.

    The implementation is such that the BNL() hint effectively increases join_cache_level up to 4 for the table(s) it applies to.

    MAX_EXECUTION_TIME()

    This hint is available from MariaDB 12.0.

    Global-level hint to limit query execution time.

    A query that doesn't finish in the time specified will be aborted with an error.

    If @@max_statement_time is set, the hint will be ignored and a warning produced. Note that this contradicts the stated principle that "new-style hints are more specific than server variable settings, so they override the server variable settings".
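
    For example, assuming a hypothetical orders table, the following aborts with an error if the query runs for more than one second (the argument is in milliseconds):

    SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*) FROM orders;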

    SPLIT_MATERIALIZED(X) and NO_SPLIT_MATERIALIZED(X)

    This hint is available from MariaDB 12.1.

    Enables or disables the use of the Split Materialized Optimization (also called the Lateral Derived Optimization).

    DERIVED_CONDITION_PUSHDOWN and NO_DERIVED_CONDITION_PUSHDOWN

    This hint is available from MariaDB 12.1.

    Enables or disables the use of condition pushdown for derived tables.

    MERGE and NO_MERGE

    This hint is available from MariaDB 12.1.

    Table-level hint that enables the use of merging, or disables and uses materialization, for the specified tables, views or common table expressions.

    SUBQUERY

    This hint is available from MariaDB 12.0.

    Query block-level hint.

    This controls non-semi-join subqueries. The parameter specifies which subquery to use. Use of this hint disables conversion of subquery into semi-join.

    For details, see the Subquery Hints section.

    SEMIJOIN and NO_SEMIJOIN

    This hint is available from MariaDB 12.0.

    Query block-level hints.

    This controls the conversion of subqueries to semi-joins and which semi-join strategies are allowed.

    where the strategy is one of DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.

    Hints are placed after the main statement verb.

    They can also appear after the SELECT keyword in any subquery:

    There can be one or more hints separated with space:

    Join Order Hints

    Join order hints are available from MariaDB 12.0.

    Syntax of the JOIN_FIXED_ORDER hint:

    Syntax of other join-order hints:

    Available Join Order Hints

    For the following join order hint syntax,

    • tbl is the name of a table used in the statement. A hint that names tables applies to all tables that it names. The JOIN_FIXED_ORDER hint names no tables and applies to all tables in the FROM clause of the query block in which it occurs;

    • query_block_name is the query block to which the hint applies. If the hint includes no leading @query_block_name, it applies to the query block in which it occurs. When using the tbl@query_block_name syntax, the hint applies to the named table in the named query block. To assign a name to a query block, see Query Block Naming above.

    General notes:

    • If a table has an alias, hints must refer to the alias, not the table name.

    • Table names in hints cannot be qualified with schema names.

    JOIN_FIXED_ORDER([@query_block_name])

    Forces the optimizer to join tables using the order in which they appear in the FROM clause. This is the same as specifying SELECT STRAIGHT_JOIN.

    JOIN_ORDER([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order. The hint applies to the named tables. The optimizer may place tables that are not named anywhere in the join order, including between specified tables.

    • Alternative syntax: JOIN_ORDER(tbl[@query_block_name] [, tbl[@query_block_name]] ...)

    JOIN_PREFIX([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order for the first tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables after the named tables.

    • Alternative syntax: JOIN_PREFIX(tbl[@query_block_name] [, tbl[@query_block_name]] ...)

    JOIN_SUFFIX([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order for the last tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables before the named tables.
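
    A hedged example of the hints above, using hypothetical tables t1, t2 and t3: the JOIN_PREFIX hint asks the optimizer to begin the join with t3 and then t1, leaving t2 to be placed after them:

    SELECT /*+ JOIN_PREFIX(t3, t1) */ *
        FROM t1
        JOIN t2 ON t2.a = t1.a
        JOIN t3 ON t3.b = t2.b;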

    Subquery Hints

    Subquery hints are available from MariaDB 12.0.

    Overview

    Subquery hints determine:

    • If semijoin transformations are to be used;

    • Which semijoin strategies are permitted;

    • When semijoins are not used, whether to use subquery materialization or IN-to-EXISTS transformations.

    Syntax

    • hint_name: The following hint names are permitted to enable or disable the named semijoin strategies: SEMIJOIN, NO_SEMIJOIN.

    • strategy: Enable or disable a semi-join strategy. The following strategy names are permitted: DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.

    Strategies

    For SEMIJOIN hints, if no strategies are named, semi-join is used based on the strategies enabled according to the optimizer_switch system variable, if possible. If strategies are named, but inapplicable for the statement, DUPSWEEDOUT is used.

    For NO_SEMIJOIN hints, semi-join is not used if no strategies are named. If named strategies rule out all applicable strategies for the statement, DUPSWEEDOUT is used.

    If a subquery is nested within another, and both are merged into a semi-join of an outer query, any specification of semi-join strategies for the innermost query are ignored. SEMIJOIN and NO_SEMIJOIN hints can still be used to enable or disable semi-join transformations for such nested subqueries.

    If DUPSWEEDOUT is disabled, the optimizer may generate a query plan that is far from optimal.

    Examples

    Syntax of hints that affect whether to use subquery materialization or IN-to-EXISTS transformations:

    The hint name is always SUBQUERY.

    For SUBQUERY hints, these strategy values are permitted: INTOEXISTS, MATERIALIZATION.

    For semi-join and SUBQUERY hints, a leading @query_block_name specifies the query block to which the hint applies. If the hint includes no leading @query_block_name, the hint applies to the query block in which it occurs. To assign a name to a query block, see Naming Query Blocks.

    If a hint comment contains multiple subquery hints, the first is used. If there are other following hints of that type, they produce a warning. Following hints of other types are silently ignored.

    MariaDB 12.0
    exists_to_in=on
    firstmatch=on
    index_condition_pushdown=on
    hash_join_cardinality=off
    MariaDB 10.6.13
    MDEV-30812
    index_merge_sort_intersection=off
    in_to_exists=on
    loosescan=on
    semi-join
    non-semi-join
    mrr=off
    mrr_cost_based=off
    mrr_sort_keys=off
    not_null_range_scan=off
    orderby_uses_equalities=on
    partial_match_rowid_merge=on
    partial_match_table_scan=on
    rowid_filter=on
    sargable_casefold=on
    semijoin=on
    split_materialized=on
    subquery_cache=on
    table_elimination=on

    Query Cache

    The query cache stores results of SELECT queries so that if the identical query is received in future, the results can be quickly returned.

    This is extremely useful in high-read, low-write environments (such as most websites). It does not scale well in environments with high throughput on multi-core machines, so it is disabled by default.

    Note that the query cache cannot be enabled in certain environments. See the Limitations section below.

    Setting Up the Query Cache

    Unless MariaDB has been specifically built without the query cache, the query cache will always be available, although inactive. The have_query_cache server variable will show whether the query cache is available.

    hint:  hint_name([arguments])
    hint_name([table_name [, table_name] ...])
    hint_name(table_name [index_name [, index_name] ...])
    
    hint_name(table_name@query_block [index_name [, index_name] ...])
    
    hint_name(@query_block  table_name [index_name [, index_name] ...])
    SELECT  /*+ MRR(t1 t1_index1) */  ... FROM t1 ...
    SELECT /*+ QB_NAME(foo) */ select_list FROM ...
    Note 1003 SELECT /*+ NO_RANGE_OPTIMIZATION(t3@select#1 PRIMARY) */ ...
    SELECT /*+ BKA(tbl1@`select#1`) */ 1 FROM tbl1 ...;
    /*+ NO_ROWID_FILTER([table_name [index_name [ ... ] ]] ) */
    SELECT
      /*+ NO_SPLIT_MATERIALIZED(CUST_TOTALS) */
      ...
    FROM
      customer,
      (SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
    WHERE
       customer.c_custkey= o_custkey AND
       customer.country='FI';
    /*+ ROWID_FILTER( [table_name [index_name [ ...] ]]) */
    SELECT /*+ ROWID_FILTER(t1 idx1) */
    ...
    SELECT
      /*+ SPLIT_MATERIALIZED(CUST_TOTALS) */
      ...
    FROM
      customer,
      (SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
    WHERE
       customer.c_custkey= o_custkey AND
       customer.country='FI';
    UPDATE /*+ hints */ table ...;
    DELETE /*+ hints */ FROM table... ;
    SELECT /*+ hints */  ...
    SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)
    hints:  hint hint ...
    /*+ INDEX(table_name [index_name, ...]) */
    /*+ NO_INDEX(table_name [index_name, ...]) */
    /*+ INDEX_MERGE(table_name [index_name, ...]) */
    /*+ NO_INDEX_MERGE(table_name [index_name, ...]) */
    MariaDB [test]> EXPLAIN SELECT /*+ NO_INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: ref
    possible_keys: PRIMARY,f3,f4
              key: f3
          key_len: 9
              ref: const,const
             rows: 1
            Extra: Using index condition; Using where
    1 row in set (0.009 sec)
    MariaDB [test]> EXPLAIN SELECT /*+ INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: index_merge
    possible_keys: PRIMARY,f3,f4
              key: f3,f4
          key_len: 9,9
              ref: NULL
             rows: 1
            Extra: Using intersect(f3,f4); Using where; Using index
    1 row in set (0.010 sec)
    MariaDB [test]> EXPLAIN SELECT COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: index_merge
    possible_keys: PRIMARY,f3,f4
              key: f3,PRIMARY
          key_len: 9,4
              ref: NULL
             rows: 1
            Extra: Using intersect(f3,PRIMARY); Using where
    1 row in set (0.006 sec)
    SELECT /*+ NO_RANGE_OPTIMIZATION(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ NO_ICP(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ MRR(tbl index1 index2) */  * FROM tbl ... 
    
    SELECT /*+ NO_MRR(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ...  ;
    SUBQUERY([@query_block_name] MATERIALIZATION)
    
    SUBQUERY([@query_block_name] INTOEXISTS)
    [NO_]SEMIJOIN([@query_block_name] [strategy [, strategy] ...])
    UPDATE /*+ hints */ table ...;
    DELETE /*+ hints */ FROM table... ;
    SELECT /*+ hints */  ...
    SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)
    hints:  hint hint ...
    hint_name([@query_block_name])
    hint_name([@query_block_name] tbl_name [, tbl_name] ...)
    hint_name(tbl_name[@query_block_name] [, tbl_name[@query_block_name]] ...)
    hint_name([@query_block_name] [strategy [, strategy] ...])
    SELECT /*+ NO_SEMIJOIN(@subquery1 FIRSTMATCH, LOOSESCAN) */ * FROM t2
      WHERE t2.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3);
    SELECT /*+ SEMIJOIN(@subquery1 MATERIALIZATION, DUPSWEEDOUT) */ * FROM t2
      WHERE t2.a IN (SELECT /*+ QB_NAME(subquery1) */ a FROM t3);
    SUBQUERY([@query_block_name] strategy)
    SELECT id, a IN (SELECT /*+ SUBQUERY(MATERIALIZATION) */ a FROM t1) FROM t2;
    SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ SUBQUERY(INTOEXISTS) */ a FROM t1);
    Optimizer Hints for Naming Query Blocks
    If have_query_cache is set to NO, you cannot enable the query cache unless you rebuild or reinstall a version of MariaDB with the cache available.

    To see if the cache is enabled, view the query_cache_type server variable. It is disabled by default; enable it by setting query_cache_type to 1 (ON).

    The query_cache_size is set to 1MB by default. Set the cache to a larger size if needed.
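
    For example (the values below are only illustrative; pick a size suited to your workload and memory budget):

    SET GLOBAL query_cache_type = 1;                  -- enable the cache (ON)
    SET GLOBAL query_cache_size = 16 * 1024 * 1024;   -- e.g. 16MB, allocated in 1024-byte blocks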

    The query_cache_type is automatically set to ON if the server is started with the query_cache_size set to a non-zero (and non-default) value.

    See Limiting the size of the Query Cache below for details.

    How the Query Cache Works

    When the query cache is enabled and a new SELECT query is processed, the query cache is examined to see if the query appears in the cache.

    Queries are considered identical if they use the same database, same protocol version and same default character set. Prepared statements are always considered as different to non-prepared statements, see Query cache internal structure for more info.

    If the identical query is not found in the cache, the query will be processed normally and then stored, along with its result set, in the query cache. If the query is found in the cache, the results will be pulled from the cache, which is much quicker than processing it normally.

    Queries are examined in a case-sensitive manner, so a query written with different letter case is treated as a different query. Comments are also considered and can make otherwise identical queries differ, as the sketch below shows.
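
    For example (illustrative statements against a hypothetical table t):

    SELECT * FROM t;
    -- is cached separately from:
    select * from t;

    /* retry */ SELECT * FROM t;
    -- is cached separately from:
    SELECT * FROM t;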

    See the query_cache_strip_comments server variable for an option to strip comments before searching.

    Each time changes are made to the data in a table, all affected results in the query cache are cleared. It is not possible to retrieve stale data from the query cache.

    When the space allocated to query cache is exhausted, the oldest results will be dropped from the cache.

    When using query_cache_type=ON, and the query specifies SQL_NO_CACHE (case-insensitive), the server will not cache the query and will not fetch results from the query cache.

    When using query_cache_type=DEMAND and the query specifies SQL_CACHE, the server will cache the query.

    Queries Stored in the Query Cache

    If the query_cache_type system variable is set to 1, or ON, all queries fitting the size constraints will be stored in the cache unless they contain a SQL_NO_CACHE clause, or are of a nature that caching makes no sense, for example making use of a function that returns the current time. Queries with SQL_NO_CACHE will not attempt to acquire query cache lock.

    If certain functions are present in a query, it will not be cached; examples include NOW(), CURDATE(), RAND(), UUID() and LAST_INSERT_ID(). Queries with these functions are sometimes called 'non-deterministic'; this should not be confused with the use of the term in other contexts.

    A query will also not be added to the cache if:

    • It is of the form:

      • SELECT SQL_NO_CACHE ...

      • SELECT ... INTO OUTFILE ...

      • SELECT ... INTO DUMPFILE ...

      • SELECT ... FOR UPDATE

      • SELECT * FROM ... WHERE autoincrement_column IS NULL

      • SELECT ... LOCK IN SHARE MODE

    • It uses TEMPORARY table

    • It uses no tables at all

    • It generates a warning

    • The user has a column-level privilege on any table in the query

    • It accesses a table from INFORMATION_SCHEMA, mysql or the performance_schema database

    • It makes use of user or local variables

    • It makes use of stored functions

    • It makes use of user-defined functions

    • It is inside a transaction with the SERIALIZABLE isolation level

    • It queries a table inside a transaction after the same table triggered a query cache invalidation via INSERT, UPDATE or DELETE

    The query itself can also specify that it is not to be stored in the cache by using the SQL_NO_CACHE attribute. Query-level control is an effective way to use the cache more optimally.

    It is also possible to specify that no queries must be stored in the cache unless the query requires it. To do this, the query_cache_type server variable must be set to 2, or DEMAND. Then, only queries with the SQL_CACHE attribute are cached.
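
    For example (t1 is a hypothetical table):

    SELECT SQL_NO_CACHE * FROM t1;   -- never stored in, or served from, the cache
    -- With query_cache_type=2 (DEMAND), only queries that ask for it are cached:
    SELECT SQL_CACHE * FROM t1;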

    Limiting the Size of the Query Cache

    There are two main ways to limit the size of the query cache. First, the overall size in bytes is determined by the query_cache_size server variable. About 40KB is needed for various query cache structures.

    The query cache size is allocated in 1024 byte-blocks, thus it should be set to a multiple of 1024.

    The query result is stored using a minimum block size of query_cache_min_res_unit. Two factors should be weighed when choosing a value: inserting each new result block locks the query cache, so a small value increases locking and fragmentation while wasting less memory for small results, whereas a large value reduces locking but wastes more memory for small results. Test with your workload to fine-tune this variable.

    If the strict mode is enabled, setting the query cache size to an invalid value will cause an error. Otherwise, it will be set to the nearest permitted value, and a warning will be triggered.

    The ideal size of the query cache is very dependent on the specific needs of each system. Setting a value too small will result in query results being dropped from the cache when they could potentially be re-used later. Setting a value too high could result in reduced performance due to lock contention, as the query cache is locked during updates.

    The second way to limit the cache is to have a maximum size for each set of query results. This prevents a single query with a huge result set taking up most of the available memory and knocking a large number of smaller queries out of the cache. This is determined by the query_cache_limit server variable.

    If you attempt to set a query cache size that is too small (the threshold depends on the architecture), the resizing will fail and the query cache will be set to zero, for example:
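
    (The value below is illustrative; the exact threshold depends on the architecture.)

    SET GLOBAL query_cache_size = 40000;
    SHOW WARNINGS;   -- reports that the query cache was resized to 0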

    Examining the Query Cache

    A number of status variables provide information about the query cache.
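
    For example, to list them all:

    SHOW STATUS LIKE 'Qcache%';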

    Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory.

    A poorly performing cache is indicated when more queries have been added, and more queries have been dropped, than have actually been used.

    Results returned by the query cache count towards Com_select (see MDEV-4981).

    The QUERY_CACHE_INFO plugin creates the QUERY_CACHE_INFO table in the INFORMATION_SCHEMA, allowing you to examine the contents of the query cache.

    Query Cache Fragmentation

    The Query Cache uses blocks of variable length, and over time may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. FLUSH QUERY CACHE will defragment the query cache without dropping any queries; after this, there will only be one free block.
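
    A sketch of checking the effect:

    SHOW STATUS LIKE 'Qcache_free_blocks';   -- may be high relative to Qcache_total_blocks
    FLUSH QUERY CACHE;
    SHOW STATUS LIKE 'Qcache_free_blocks';   -- should now report 1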

    Emptying and disabling the Query Cache

    To empty or clear all results from the query cache, use RESET QUERY CACHE. FLUSH TABLES will have the same effect.

    Setting either query_cache_type or query_cache_size to 0 will disable the query cache, but to free up the most resources, set both to 0 when you wish to disable caching.

    Limitations

    • The query cache needs to be disabled in order to use OQGRAPH.

    • The query cache is not used by the Spider storage engine (amongst others).

    LOCK TABLES and the Query Cache

    The query cache can be used when tables have a write lock (which may seem confusing since write locks should avoid table reads). This behaviour can be changed by setting the query_cache_wlock_invalidate system variable to ON, in which case each write lock will invalidate the table query cache. Setting to OFF, the default, means that cached queries can be returned even when a table lock is being held. For example:
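
    A sketch of the default (OFF) behavior, with a hypothetical table t1 and two connections:

    -- connection 1
    LOCK TABLES t1 WRITE;

    -- connection 2: with query_cache_wlock_invalidate=OFF, a previously cached
    -- result for this query can still be returned despite the write lock
    SELECT * FROM t1;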

    Transactions and the Query Cache

    The query cache handles transactions. Internally, a flag (FLAGS_IN_TRANS) is set to 0 when a query is executed outside a transaction, and to 1 when the query is inside a transaction (BEGIN / COMMIT / ROLLBACK). This flag is part of the "query cache hash"; in other words, a query inside a transaction is treated as different from the same query outside a transaction.

    Queries that change rows (INSERT / UPDATE / DELETE / TRUNCATE) inside a transaction invalidate all cached queries for the table, and turn off the query cache for the changed table. Even before the transaction ends with COMMIT / ROLLBACK, the query cache remains turned off for that table, to allow row-level locking and the expected consistency level.

    Examples:
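
    A sketch with a hypothetical table t1: the UPDATE inside the transaction invalidates cached results for t1 and keeps the cache turned off for that table until the transaction ends:

    BEGIN;
    SELECT * FROM t1 WHERE id = 1;          -- inside a transaction: distinct cache entry (FLAGS_IN_TRANS=1)
    UPDATE t1 SET b = b + 1 WHERE id = 1;   -- invalidates cached queries on t1
    SELECT * FROM t1 WHERE id = 1;          -- not served from the query cache
    COMMIT;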

    Query Cache Internal Structure

    Internally, each flag that can change a result using the same query is a different query. For example, using the latin1 charset and using the utf8 charset with the same query are treated as different queries by the query cache.

    Some fields that differentiate queries are (from "Query_cache_query_flags" internal structure) :

    • query (string)

    • current database schema name (string)

    • client long flag (0/1)

    • client protocol 4.1 (0/1)

    • protocol type (internal value)

    • more results exists (protocol flag)

    • in trans (inside transaction or not)

    • autocommit ( session variable)

    • pkt_nr (protocol flag)

    • character set client ( session variable)

    • character set results ( session variable)

    • collation connection ( session variable)

    • limit ( session variable)

    • time zone ( session variable)

    • sql_mode ( session variable)

    • max_sort_length ( session variable)

    • group_concat_max_len ( session variable)

    • default_week_format ( session variable)

    • div_precision_increment ( session variable)

    • lc_time_names ( session variable)

    Timeout and Mutex Contention

When searching for a query inside the query cache, a try_lock function waits with a timeout of 50ms. If the lock fails, the query isn't executed via the query cache. This timeout is hard-coded (MDEV-6766 proposes two variables to tune it).

From sql_cache.cc, the "try_lock" function using TIMEOUT:

    When inserting a query inside the query cache or aborting a query cache insert (using the KILL command for example), a try_lock function waits until the query cache returns; no timeout is used in this case.

    When two processes execute the same query, only the last process stores the query result. All other processes increase the Qcache_not_cached status variable.

    SQL_NO_CACHE and SQL_CACHE

    There are two aspects to the query cache: placing a query in the cache, and retrieving it from the cache.

1. Adding a query to the query cache. This is done automatically for cacheable queries (see Queries Stored in the Query Cache) when the query_cache_type system variable is set to 1 (ON) and the query contains no SQL_NO_CACHE clause, or when the query_cache_type system variable is set to 2 (DEMAND) and the query contains the SQL_CACHE clause.

    2. Retrieving a query from the cache. This is done after the server receives the query and before the query parser. In this case one point should be considered:

When using SQL_NO_CACHE, it should appear directly after the first SELECT:

    Don't use it like this:

The second query will still be checked against the cache; the query cache only looks for SQL_NO_CACHE/SQL_CACHE directly after the first SELECT. (More info at MDEV-6631.)
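A small illustration of the two caching modes (the table name t is only a placeholder):

SET GLOBAL query_cache_type = ON;      -- mode 1: cache every cacheable query
SELECT SQL_NO_CACHE * FROM t;          -- opt out of the cache for this query

SET GLOBAL query_cache_type = DEMAND;  -- mode 2: cache only on request
SELECT SQL_CACHE * FROM t;             -- cached only because SQL_CACHE is present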

    This page is licensed: CC BY-SA / Gnu FDL


    Building the best INDEX for a given SELECT

    The problem

    You have a SELECT and you want to build the best INDEX for it. This blog is a "cookbook" on how to do that task.

    • A short algorithm that works for many simpler SELECTs and helps in complex queries.

    • Examples of the algorithm, plus digressions into exceptions and variants

• Finally, a long list of "other cases".

    The hope is that a newbie can quickly get up to speed, and his/her INDEXes will no longer smack of "newbie".

    Many edge cases are explained, so even an expert may find something useful here.

    Algorithm

    Here's the way to approach creating an INDEX, given a SELECT. Follow the steps below, gathering columns to put in the INDEX in order. When the steps give out, you usually have the 'perfect' index.

    1. Given a WHERE with a bunch of expressions connected by AND: Include the columns (if any), in any order, that are compared to a constant and not hidden in a function.

    2. You get one more chance to add to the INDEX; do the first of these that applies:

    • 2a. One column used in a 'range' -- BETWEEN, '>', LIKE w/o leading wildcard, etc.

    • 2b. All columns, in order, of the GROUP BY.

    • 2c. All columns, in order, of the ORDER BY if there is no mixing of ASC and DESC.
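For instance, applying these steps to a hypothetical query (table and column names are assumptions):

-- SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped' ORDER BY created_at;
-- Step 1: customer_id and status (both compared to constants with '=')
-- Step 2c: created_at (no range and no GROUP BY, so the ORDER BY column is added)
ALTER TABLE orders ADD INDEX (customer_id, status, created_at);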

    Digression

    This blog assumes you know the basic idea behind having an INDEX. Here is a refresher on some of the key points.

Virtually all INDEXes in MySQL are structured as BTrees. BTrees are very efficient for:

    • Given a key, find the corresponding row(s);

    • "Range scans" -- That is start at one value for the key and repeatedly find the "next" (or "previous") row.

    A PRIMARY KEY is a UNIQUE KEY; a UNIQUE KEY is an INDEX. ("KEY" == "INDEX".)

    InnoDB "clusters" the PRIMARY KEY with the data. Hence, given the value of the PK ("PRIMARY KEY"), after drilling down the BTree to find the index entry, you have all the columns of the row when you get there. A "secondary key" (any UNIQUE or INDEX other than the PK) in InnoDB first drills down the BTree for the secondary index, where it finds a copy of the PK. Then it drills down the PK to find the row.

    Every InnoDB table has a PRIMARY KEY. While there is a default if you do not specify one, it is best to explicitly provide a PK.

For completeness: MyISAM works differently. All indexes (including the PK) are in separate BTrees. The leaf nodes of such BTrees have a pointer (usually a byte offset) into the data file.

All discussion here assumes InnoDB tables; however, most statements apply to other engines.

    First, some examples

Think of a list of names, sorted by last_name, then first_name. You have undoubtedly seen such lists, and they often have other information such as address and phone number. Suppose you wanted to look me up. If you remember my full name ('James' and 'Rick'), it is easy to find my entry. If you remembered only my last name ('James') and first initial ('R'), you would quickly zoom in on the Jameses and find the Rs among them. There, you might remember 'Rick' and ignore 'Ronald'. But suppose you remembered my first name ('Rick') and only my last initial ('J'). Now you are in trouble. You would be scanning all the Js -- Jones, Rick; Johnson, Rick; Jamison, Rick; etc, etc. That's much less efficient.

    Those equate to

    Think about this example as I talk about "=" versus "range" in the Algorithm, below.

    Algorithm, step 1 (WHERE "column = const")

    • WHERE aaa = 123 AND ... : an INDEX starting with aaa is good.

    • WHERE aaa = 123 AND bbb = 456 AND ... : an INDEX starting with aaa and bbb is good. In this case, it does not matter whether aaa or bbb comes first in the INDEX.

    • xxx IS NULL : this acts like "= const" for this discussion.

Note that the expression must be of the form column_name = (constant). These do not apply to this step in the Algorithm: DATE(dt) = '...', LOWER(s) = '...', CAST(s ...) = '...', x = '...' COLLATE ...

    (If there are no "=" parts AND'd in the WHERE clause, move on to step 2 without any columns in your putative INDEX.)

    Algorithm, step 2

    Find the first of 2a / 2b / 2c that applies; use it; then quit. If none apply, then you are through gathering columns for the index.

    In some cases it is optimal to do step 1 (all equals) plus step 2c (ORDER BY).

    Algorithm, step 2a (one range)

    A "range" shows up as

    • aaa >= 123 -- any of <, <=, >=, >; but not <>, !=

    • aaa BETWEEN 22 AND 44

    • sss LIKE 'blah%' -- but not sss LIKE '%blah'

    If there are more parts to the WHERE clause, you must stop now.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa >= 123 AND bbb = 1 ⇒ INDEX(bbb, aaa) (WHERE order does not matter; INDEX order does)

    • WHERE aaa >= 123 ⇒ INDEX(aaa)

    • WHERE aaa >= 123 AND ccc > 'xyz' ⇒ INDEX(aaa) or INDEX(ccc) (only one range)

    Algorithm, step 2b (GROUP BY)

    If there is a GROUP BY, all the columns of the GROUP BY should now be added, in the specified order, to the INDEX you are building. (I do not know what happens if one of the columns is already in the INDEX.)

    If you are GROUPing BY an expression (including function calls), you cannot use the GROUP BY; stop.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa = 123 AND bbb = 1 GROUP BY ccc ⇒ INDEX(bbb, aaa, ccc) or INDEX(aaa, bbb, ccc) (='s first, in any order; then the GROUP BY)

    • WHERE aaa >= 123 GROUP BY xxx ⇒ INDEX(aaa) (You should have stopped with Step 2a)

    • GROUP BY x,y ⇒ INDEX(x,y) (no WHERE)

    Algorithm, step 2c (ORDER BY)

    If there is a ORDER BY, all the columns of the ORDER BY should now be added, in the specified order, to the INDEX you are building.

    If there are multiple columns in the ORDER BY, and there is a mixture of ASC and DESC, do not add the ORDER BY columns; they won't help; stop.

    If you are ORDERing BY an expression (including function calls), you cannot use the ORDER BY; stop.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ddd ⇒ INDEX(aaa, ccc) -- should have stopped with Step 2b

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc ⇒ INDEX(aaa, ccc) -- the ccc will be used for both GROUP BY and ORDER BY

    • WHERE aaa = 123 ORDER BY xxx ASC, yyy DESC ⇒ INDEX(aaa) -- mixture of ASC and DESC.

The following are especially good. Normally a LIMIT cannot be applied until after lots of rows are gathered and then sorted according to the ORDER BY. But, if the INDEX gets all the way through the ORDER BY, only (OFFSET + LIMIT) rows need to be gathered. So, in these cases, you win the lottery with your new index:

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

    • WHERE aaa = 123 ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

    • ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc)

    (It does not make much sense to have a LIMIT without an ORDER BY, so I do not discuss that case.)

    Algorithm end

You have collected a few columns; put them in an INDEX and ADD it to the table. That will often produce a "good" index for the SELECT you have. Below are some other suggestions that may be relevant.

    An example of the Algorithm being 'wrong':

This would (according to the Algorithm) call for INDEX(flag). However, indexing a column that has only two (or a small number of) values is almost always useless. This is called 'low cardinality'. The Optimizer would prefer to do a table scan rather than bounce between the index BTree and the data.

    On the other hand, the Algorithm is 'right' with

    That would call for a compound index starting with a flag: INDEX(flag, date). Such an index is likely to be very beneficial. And it is likely to be more beneficial than INDEX(date).

If your resulting INDEX includes column(s) that are likely to be UPDATEd, note that the UPDATE will have extra work to remove a 'row' from one place in the INDEX's BTree and insert a 'row' back into the BTree. For example:

    There are too many variables to say whether it is better to keep the index or to toss it.

    In this case, shortening the index may be beneficial:

    Changing to INDEX(z) would make for less work for the UPDATE, but might hurt some SELECT. It depends on the frequency of each, plus many more factors.

    Limitations

    (There are exceptions to some of these.)

    • You may not create an index bigger than 3KB.

• You may not include a column wider than some limit (767 bytes -- e.g., VARCHAR(255) CHARACTER SET utf8).

    • You can deal with big fields using "prefix" indexing; but see below.

    • You should not have more than 5 columns in an index. (This is just a Rule of Thumb; nothing prevents having more.)

    Flags and low cardinality

    INDEX(flag) is almost never useful if flag has very few values. More specifically, when you say WHERE flag = 1 and "1" occurs more than 20% of the time, such an index will be shunned. The Optimizer would prefer to scan the table instead of bouncing back and forth between the index and the data for more than 20% of the rows.

    ("20%" is really somewhere between 10% and 30%, depending on the phase of the moon.)

    "Covering" indexes

    A "Covering" index is an index that contains all the columns in the SELECT. It is special in that the SELECT can be completed by looking only at the INDEX BTree. (Since InnoDB's PRIMARY KEY is clustered with the data, "covering" is of no benefit when considering at the PRIMARY KEY.)

    Mini-cookbook:

    1. Gather the list of column(s) according to the "Algorithm", above.

    2. Add to the end of the list the rest of the columns seen in the SELECT, in any order.

    Examples:

    • SELECT x FROM t WHERE y = 5; ⇒ INDEX(y,x) -- The algorithm said just INDEX(y)

    • SELECT x,z FROM t WHERE y = 5 AND q = 7; ⇒ INDEX(y,q,x,z) -- y and q in either order (Algorithm), then x and z in either order (covering).

    • SELECT x FROM t WHERE y > 5 AND q > 7; ⇒ INDEX(y,q,x) -- y or q first (that's as far as the Algorithm goes), then the other two fields afterwards.
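Spelling the second example out as DDL (a sketch; the index name is an assumption):

-- SELECT x,z FROM t WHERE y = 5 AND q = 7;
ALTER TABLE t ADD INDEX covering_y_q_x_z (y, q, x, z);  -- WHERE columns first, then the SELECTed columns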

    The speedup you get might be minor, or it might be spectacular; it is hard to predict.

    But...

    • It is not wise to build an index with lots of columns. Let's cut it off at 5 (Rule of Thumb).

    • Prefix indexes cannot 'cover', so don't use them anywhere in a 'covering' index.

    • There are limits (3KB?) on how 'wide' an index can be, so "covering" may not be possible.

    Redundant/excessive indexes

    INDEX(a,b) can find anything that INDEX(a) could find. So you don't need both. Get rid of the shorter one.

    If you have lots of SELECTs and they generate lots of INDEXes, this may cause a different problem. Each index must be updated (sooner or later) for each INSERT. More indexes ⇒ slower INSERTs. Limit the number of indexes on a table to about 6 (Rule of Thumb).

    Notice in the cookbook how it says "in any order" in a few places. If, for example, you have both of these (in different SELECTs):

    • WHERE a=1 AND b=2 begs for either INDEX(a,b) or INDEX(b,a)

    • WHERE a>1 AND b=2 begs only for INDEX(b,a) Include only INDEX(b,a) since it handles both cases with only one INDEX.

Suppose you have a lot of indexes, including (a,b,c,dd) and (a,b,c,ee). Those are getting rather long. Consider either picking one of them, or having simply (a,b,c). Sometimes the selectivity of (a,b,c) is so good that tacking on 'dd' or 'ee' does not make enough difference to matter.

    Optimizer picks ORDER BY

    The main cookbook skips over an important optimization that is sometimes used. The optimizer will sometimes ignore the WHERE and, instead, use an INDEX that matches the ORDER BY. This, of course, needs to be a perfect match -- all columns, in the same order. And all ASC or all DESC.

    This becomes especially beneficial if there is a LIMIT.

    But there is a problem. There could be two situations, and the Optimizer is sometimes not smart enough to see which case applies:

    • If the WHERE does very little filtering, fetching the rows in ORDER BY order avoids a sort and has little wasted effort (because of 'little filtering'). Using the INDEX matching the ORDER BY is better in this case.

    • If the WHERE does a lot of filtering, the ORDER BY is wasting a lot of time fetching rows only to filter them out. Using an INDEX matching the WHERE clause is better.

    What should you do? If you think the "little filtering" is likely, then create an index with the ORDER BY columns in order and hope that the Optimizer uses it when it should.

    OR

    Cases...

    • WHERE a=1 OR a=2 -- This is turned into WHERE a IN (1,2) and optimized that way.

    • WHERE a=1 OR b=2 usually cannot be optimized.

    • WHERE x.a=1 OR y.b=2 This is even worse because of using two different tables.

    A workaround is to use UNION. Each part of the UNION is optimized separately. For the second case:

    Now the query can take good advantage of two different indexes. Note: "Index merge" might kick in on the original query, but it is not necessarily any faster than the UNION. Sister blog on compound indexes, including 'Index Merge'

    The third case (OR across 2 tables) is similar to the second.

    If you originally had a LIMIT, UNION gets complicated. If you started with ORDER BY z LIMIT 190, 10, then the UNION needs to be

    TEXT / BLOB

    You cannot directly index a TEXT or BLOB or large VARCHAR or large BINARY column. However, you can use a "prefix" index: INDEX(foo(20)). This says to index the first 20 characters of foo. But... It is rarely useful.

    Example of a prefix index:

    The index for me would contain 'Ja', 'Rick'. That's not useful for distinguishing between 'Jamison', 'Jackson', 'James', etc., so the index is so close to useless that the optimizer often ignores it.

    Probably never do UNIQUE(foo(20)) because this applies a uniqueness constraint on the first 20 characters of the column, not the whole column!

    Dates

    DATE, DATETIME, etc. are tricky to compare against.

    Some tempting, but inefficient, techniques:

date_col LIKE '2016-01%' -- must convert date_col to a string, so acts like a function
LEFT(date_col, 7) = '2016-01' -- hiding the column in a function
DATE(date_col) = 2016 -- hiding the column in a function

All must do a full scan. (On the other hand, it can be handy to use GROUP BY LEFT(date_col, 7) for monthly grouping, but that is not an INDEX issue.)

    This is efficient, and can use an index:

    This case works because both right-hand values are converted to constants, then it is a "range". I like the design pattern with INTERVAL because it avoids computing the last day of the month. And it avoids tacking on '23:59:59', which is wrong if you have microsecond times. (And other cases.)

    EXPLAIN Key_len

    Perform EXPLAIN SELECT... (and EXPLAIN FORMAT=JSON SELECT... if you have 5.6.5). Look at the Key that it chose, and the Key_len. From those you can deduce how many columns of the index are being used for filtering. (JSON makes it easier to get the answer.) From that you can decide whether it is using as much of the INDEX as you thought. Caveat: Key_len only covers the WHERE part of the action; the non-JSON output won't easily say whether GROUP BY or ORDER BY was handled by the index.
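A hypothetical check (the table, index, and column types are assumptions):

EXPLAIN SELECT x FROM t WHERE y = 5 AND q = 7;
-- If the chosen key is INDEX(y, q, x) and y and q are 4-byte INT NOT NULL columns,
-- then key_len = 8 tells you both y and q were used for filtering.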

    IN

    IN (1,99,3) is sometimes optimized as efficiently as "=", but not always. Older versions of MySQL did not optimize it as well as newer versions. (5.6 is possibly the main turning point.)

    IN ( SELECT ... )

From version 4.1 through 5.5, IN ( SELECT ... ) was very poorly optimized. The SELECT was effectively re-evaluated every time. Often it can be transformed into a JOIN, which works much faster. Here is a pattern to follow:

    The SELECT expressions will need "a." prefixing the column names.

    Alas, there are cases where the pattern is hard to follow.

    5.6 does some optimizing, but probably not as good as the JOIN.

    If there is a JOIN or GROUP BY or ORDER BY LIMIT in the subquery, that complicates the JOIN in new format. So, it might be better to use this pattern:

    Caveat: If you end up with two subqueries JOINed together, note that neither has any indexes, hence performance can be very bad. (5.6 improves on it by dynamically creating indexes for subqueries.)

There is work going on in MariaDB and Oracle 5.7 in relation to "NOT IN", "NOT EXISTS", and "LEFT JOIN .. IS NULL"; here is an old discussion on the topic. So, what I say here may not be the final word.

    Explode/Implode

    When you have a JOIN and a GROUP BY, you may have the situation where the JOIN exploded more rows than the original query (due to many:many), but you wanted only one row from the original table, so you added the GROUP BY to implode back to the desired set of rows.

    This explode + implode, itself, is costly. It would be better to avoid them if possible.

    Sometimes the following will work.

    Using DISTINCT or GROUP BY to counteract the explosion

When using the second table just to check for existence:

    Many-to-many mapping table

    Do it this way.

    Notes:

    • Lack of an AUTO_INCREMENT id for this table -- The PK given is the 'natural' PK; there is no good reason for a surrogate.

    • "MEDIUMINT" -- This is a reminder that all INTs should be made as small as is safe (smaller ⇒ faster). Of course the declaration here must match the definition in the table being linked to.

    • "UNSIGNED" -- Nearly all INTs may as well be declared non-negative

    • "NOT NULL" -- Well, that's true, isn't it?

To conditionally INSERT new links, use IODKU (INSERT ... ON DUPLICATE KEY UPDATE), as sketched below.

    Note that if you had an AUTO_INCREMENT in this table, IODKU would "burn" ids quite rapidly.
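A sketch of such a conditional insert, using the XtoY mapping table defined above (the specific ids are placeholders):

INSERT INTO XtoY (x_id, y_id)
    VALUES (12, 34)
    ON DUPLICATE KEY UPDATE x_id = x_id;   -- no-op when the link already exists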

    Subqueries and UNIONs

    Each subquery SELECT and each SELECT in a UNION can be considered separately for finding the optimal INDEX.

    Exception: In a "correlated" ("dependent") subquery, the part of the WHERE that depends on the outside table is not easily factored into the INDEX generation. (Cop out!)

    JOINs

The first step is to decide what order the optimizer will go through the tables. If you cannot figure it out, then you may need to be pessimistic and create two indexes for each table -- one assuming the table will be used first, one assuming that it will come later in the table order.

    The optimizer usually starts with one table and extracts the data needed from it. As it finds a useful (that is, matches the WHERE clause, if any) row, it reaches into the 'next' table. This is called NLJ ("Nested Loop Join"). The process of filtering and reaching to the next table continues through the rest of the tables.

    The optimizer usually picks the "first" table based on these hints:

    • STRAIGHT_JOIN forces the table order.

    • The WHERE clause limits which rows needed (whether indexed or not).

    • The table to the "left" in a LEFT JOIN usually comes before the "right" table. (By looking at the table definitions, the optimizer may decide that "LEFT" is irrelevant.)

    • The current INDEXes will encourage an order.

    Running EXPLAIN tells you the table order that the Optimizer is very likely to use today. After adding a new INDEX, the optimizer may pick a different table order. You should anticipate the order changing, guess at what order makes the most sense, and build the INDEXes accordingly. Then rerun EXPLAIN to see if the Optimizer's brain was on the same wavelength you were on.

    You should build the INDEX for the "first" table based on any parts of the WHERE, GROUP BY, and ORDER BY clauses that are relevant to it. If a GROUP/ORDER BY mentions a different table, you should ignore that clause.

    The second (and subsequent) table will be reached into based on the ON clause. (Instead of using commajoin, please write JOINs with the JOIN keyword and ON clause!) In addition, there could be parts of the WHERE clause that are relevant. GROUP/ORDER BY are not to be considered in writing the optimal INDEX for subsequent tables.
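A sketch of these guidelines for a two-table join (table and column names are assumptions, and the optimizer is assumed to start with a):

-- SELECT ... FROM a JOIN b ON b.x = a.x WHERE a.status = 'open' ORDER BY a.created_at;
ALTER TABLE a ADD INDEX (status, created_at);  -- "first" table: its WHERE part, then its ORDER BY
ALTER TABLE b ADD INDEX (x);                   -- subsequent table: the ON column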

    PARTITIONing

    PARTITIONing is rarely a substitute for a good INDEX.

PARTITION BY RANGE is a technique that is sometimes useful when indexing fails to be good enough. In a two-dimensional situation such as nearness in a geographical sense, one dimension can partially be handled by partition pruning; then the other dimension can be handled by a regular index (preferably the PRIMARY KEY). This goes into more detail:

    FULLTEXT

    FULLTEXT is now implemented in InnoDB as well as MyISAM. It provides a way to search for "words" in TEXT columns. This is much faster (when it is applicable) than col LIKE '%word%'.

A query such as WHERE x = 1 AND MATCH (...) AGAINST (...) always(?) uses the FULLTEXT index first. That is, the whole Algorithm is invalidated when one of the ANDs is a MATCH.
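A minimal sketch (table and column names are assumptions):

ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);
SELECT id FROM articles
 WHERE category = 3
   AND MATCH(body) AGAINST('pizza');   -- the FULLTEXT index is consulted first; category filters afterwards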

    Signs of a Newbie

    • No "compound" (aka "composite") indexes

    • No PRIMARY KEY

    • Redundant indexes (especially blatant is PRIMARY KEY(id), KEY(id))

• Most or all columns individually indexed ("But I indexed everything")

    Speeding up wp_postmeta

    The published table (see Wikipedia) is

    The problems:

    • The AUTO_INCREMENT provides no benefit; in fact it slows down most queries and clutters disk.

    • Much better is PRIMARY KEY(post_id, meta_key) -- clustered, handles both parts of usual JOIN.

    • BIGINT is overkill, but that can't be fixed without changing other tables.

    • VARCHAR(255) can be a problem in 5.6 with utf8mb4; see workarounds below.

    The solutions:

    Postlog

    Initial posting: March, 2015; Refreshed Feb, 2016; Add DATE June, 2016; Add WP example May, 2017.

    The tips in this document apply to MySQL, MariaDB, and Percona.

    See also

    • Some info in the MySQL manual:

    • A short, but complicated,

This blog is the consolidation of a Percona tutorial I gave in 2013, plus many years of experience in fixing thousands of slow queries on hundreds of systems. I apologize that this does not tell you how to create INDEXes for all SELECTs. Some are just too complex.

    Rick James graciously allowed us to use this article in the documentation.

His site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Latitude/Longitude Indexing

    The problem

    You want to find the nearest 10 pizza parlors, but you cannot figure out how to do it efficiently in your huge database. Database indexes are good at one-dimensional indexing, but poor at two-dimensions.

    You might have tried

    • INDEX(lat), INDEX(lon) -- but the optimizer used only one

    • INDEX(lat,lon) -- but it still had to work too hard

    • Sometimes you ended up with a full table scan -- Yuck.

WHERE SQRT(...) < ... -- No chance of using any index.

    WHERE lat BETWEEN ... AND lng BETWEEN... -- This has some chance of using such indexes.

    The goal is to look only at records "close", in both directions, to the target lat/lng.

    A solution -- first, the principles

PARTITIONs in MariaDB and MySQL sort of give you a way to have two clustered indexes. So, if we could slice up (partition) the globe in one dimension and use ordinary indexing in the other dimension, maybe we can get something approximating a 2D index. This 2D approach keeps the number of disk hits significantly lower than 1D approaches, thereby speeding up "find nearest" queries.

    It works. Not perfectly, but better than the alternatives.

What to PARTITION on? It seems like latitude or longitude would be a good idea. Note that a degree of longitude varies in width, from 69 miles (111 km) at the equator to 0 at the poles. So, latitude seems like a better choice.

    How many PARTITIONs? It does not matter a lot. Some thoughts:

    • 90 partitions - 2 degrees each. (I don't like tables with too many partitions; 90 seems like plenty.)

    • 50-100 - evenly populated. (This requires code. For 2.7M placenames, 85 partitions varied from 0.5 degrees to very wide partitions at the poles.)

• Don't have more than 100 partitions; there are inefficiencies in the partition implementation.

How to PARTITION? Well, MariaDB and MySQL are very picky. So FLOAT / DOUBLE are out. DECIMAL is out. So, we are stuck with some kludge. Essentially, we need to convert Lat/Lng to some size of INT and use PARTITION BY RANGE.

    Representation choices

    To get to a datatype that can be used in PARTITION, you need to "scale" the latitude and longitude. (Consider only the *INTs; the other datatypes are included for comparison)

    (Sorted by resolution)

    What these mean...

Deg*100 (SMALLINT) -- you take the lat/lng, multiply by 100, round, and store into a SMALLINT. That will take 2 bytes for each dimension, for a total of 4 bytes. Two items might be 1570 meters apart, but register as having the same latitude and longitude.

DECIMAL(4,2) for latitude and DECIMAL(5,2) for longitude will take 2+3 bytes and have no better resolution than Deg*100.

    SMALLINT scaled -- Convert latitude into a SMALLINT SIGNED by doing (degrees / 90 * 32767) and rounding; longitude by (degrees / 180 * 32767).

FLOAT has 24 significant bits; DOUBLE has 53. (They don't work with PARTITIONing but are included for completeness. Often people use DOUBLE without realizing how much of an overkill it is, and how much space it takes.)

Sure, you could do Deg*1000 and other "in between" cases, but there is no advantage. Deg*1000 takes as much space as Deg*10000, but has less resolution.

So, go down the list to see how much resolution you need, then pick an encoding you are comfortable with. However, since we are about to use latitude as a "partition key", it must be limited to one of the INTs. For the sample code, I will use Deg*10000 (MEDIUMINT).
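For instance, converting raw coordinates to the Deg*10000 representation (the sample coordinates are placeholders):

SET @lat := ROUND(  37.7749 * 10000);   --   377749, fits in a MEDIUMINT
SET @lon := ROUND(-122.4194 * 10000);   -- -1224194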

    GCDist -- compute "great circle distance"

    GCDist is a helper FUNCTION that correctly computes the distance between two points on the globe.

    The code has been benchmarked at about 20 microseconds per call on a 2011-vintage PC. If you had to check a million points, that would take 20 seconds -- far too much for a web application. So, one goal of the Procedure that uses it will be to minimize the usage of this function. With the code presented here, the function need be called only a few dozen or few hundred times, except in pathological cases.

    Sure, you could use the Pythagorean formula. And it would work for most applications. But it does not take extra effort to do the GC. Furthermore, GC works across a pole and across the dateline. And, a Pythagorean function is not that much faster.

    For efficiency, GCDist understands the scaling you picked and has that stuff hardcoded. I am picking "Deg*10000", so the function expects 350000 for representing 35 degrees. If you choose a different scaling, you will need to change the code.

    GCDist() takes 4 scaled DOUBLEs -- lat1, lon1, lat2, lon2 -- and returns a scaled number of "degrees" representing the distance.

The table of representation choices says 52 feet of resolution for Deg*10000 and DECIMAL(x,4). Here is how it was calculated: to measure the diagonal between lat/lng (0,0) and (0.0001, 0.0001) (one 'unit in the last place'): GCDist(0,0,1,1) * 69.172 / 10000 * 5280 = 51.65, where

    • 69.172 miles/degree of latitude

    • 10000 units per degree for the scaling chosen

    • 5280 feet / mile.

    (No, this function does not compensate for the Earth being an oblate spheroid, etc.)

    Required table structure

    There will be one table (plus normalization tables as needed). The one table must be partitioned and indexed as indicated below.

    Fields and indexes

    • PARTITION BY RANGE(lat)

    • lat -- scaled latitude (see above)

    • lon -- scaled longitude

    • PRIMARY KEY(lon, lat, ...) -- lon must be first; something must be added to make it UNIQUE

    For most of this discussion, lat is assumed to be MEDIUMINT -- scaled from -90 to +90 by multiplying by 10000. Similarly for lon and -180 to +180.

    The PRIMARY KEY must

    • start with lon since the algorithm needs the "clustering" that InnoDB will provide, and

    • include lat somewhere, since it is the PARTITION key, and

    • contain something to make the key UNIQUE (lon+lat is unlikely to be sufficient).
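A minimal sketch of such a table, assuming the Deg*10000 (MEDIUMINT) scaling; it uses only a few wide partitions and a placeholder name column, whereas a real table would use many narrower latitude stripes and your own columns:

CREATE TABLE Locations (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    lat  MEDIUMINT NOT NULL,     -- scaled latitude:  degrees * 10000
    lon  MEDIUMINT NOT NULL,     -- scaled longitude: degrees * 10000
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (lon, lat, id),  -- lon first for the clustered range scan; id makes it UNIQUE
    INDEX (id)                   -- needed because id is AUTO_INCREMENT but not the PK
) ENGINE=InnoDB
PARTITION BY RANGE (lat) (
    PARTITION p0 VALUES LESS THAN (-600000),
    PARTITION p1 VALUES LESS THAN (-300000),
    PARTITION p2 VALUES LESS THAN (      0),
    PARTITION p3 VALUES LESS THAN ( 300000),
    PARTITION p4 VALUES LESS THAN ( 600000),
    PARTITION p5 VALUES LESS THAN MAXVALUE
);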

    The FindNearest PROCEDURE will do multiple SELECTs something like this:

    The query planner will

    • Do PARTITION "pruning" based on the latitude; then

• Within a PARTITION (which is effectively a table), use lon to do a 'clustered' range scan; then

    • Use the "condition" to filter down to the rows you desire, plus recheck lat. This design leads to very few disk blocks needing to be read, which is the main goal of the design.

    Note that this does not even call GCDist. That comes in the last pass when the ORDER BY and LIMIT are used.

The stored procedure has a loop. At least two SELECTs will be executed, but, with proper tuning, usually no more than about 6 SELECTs will be performed. Because of searching by the PRIMARY KEY, each SELECT hits only one block, sometimes more, of the table. Counting the number of blocks hit is a crude, but effective, way of comparing the performance of multiple designs. By comparison, a full table scan will probably touch thousands of blocks. A simple INDEX(lat) probably leads to hitting hundreds of blocks.

    Filtering... An argument to the FindNearest procedure includes a boolean expression ("condition") for a WHERE clause. If you don't need any filtering, pass in "1". To avoid "SQL injection", do not let web users put arbitrary expressions; instead, construct the "condition" from inputs they provide, thereby making sure it is safe.

    The algorithm

The algorithm is embodied in a stored procedure because of its complexity.

    • You feed it a starting width for a "square" and a number of items to find.

    • It builds a "square" around where you are.

    • A SELECT is performed to see how many items are in the square.

    • Loop, doubling the width of the square, until enough items are found.

    The next section ("Performance") should make this a bit clearer as it walks through some examples.

    Performance

    Because of all the variations, it is hard to get a meaningful benchmark. So, here is some hand-waving instead.

    Each SELECT is constrained by a "square" defined by a latitude range and a longitude range. (See the WHERE clause mentioned above, or in the sample code below.) Because of the way longitude lines warp, the longitude range of the "square" will be more degrees than the latitude range. Let's say the latitude partitioning is 3 degrees wide in the area where you are searching. That is over 200 miles (over 300km), so you are very likely to have a latitude range smaller than the partition width. Still, if you are reaching from the edge of a latitude stripe, the square could span two partitions. After partition pruning down to one (sometimes more) partition, the query is then constrained by a longitude range. (Remember, the PRIMARY KEY starts with lon.) If an InnoDB data block contains 100 rows (a handy Rule of Thumb), the select will touch one (or a few) block. If the square spans two (or more) partitions, then the same logic applies to each partition.

    So, scanning the square will involve as little as one block; rarely more than a few blocks. The number of blocks is mostly independent of the dataset size.

    The primary use case for this algorithm is when the data is significantly larger than will fit into cache (the buffer_pool). Hence, the main goal is to minimize the number of disk hits.

    Now let's look at some edge cases, and argue that the number of blocks is still better (usually) than with traditional indexing techniques.

    What if you are looking for Starbucks in a dense city? There would be dozens, maybe hundreds per square mile. If you start the guess at 100 miles, the SELECTs would be hitting lots of blocks -- not efficient. In this case, the "starting distance" should be small, say, 2 miles. Let's say your app wants the closest 10 stores. In this example, you would probably find more than 10 Starbucks within 2 miles in 1 InnoDB block in one partition. Even though there is a second SELECT to finish off the query, it would be hitting the same block. Total: One block hit == cheap.

    Let's say you start with a 5 mile square. Since there are upwards of 200 Starbucks within a 5-miles radius in some dense cities of the world, that might imply 300 in our "square". That maps to about 4 disk blocks, and a modest amount of CPU to chew through the 300 records. Still not bad.

    Now, suppose you are on an ocean liner somewhere in the Pacific. And there is one Starbucks onboard, but you are looking for the nearest 10. If you again start with 2 miles, it will take several iterations to find 10 sites. But, let's walk through it anyway. The first probe will hit one partition (maybe 2), and find just one hit. The second probe doubles the width of the square; 4 miles will still give you one hit -- the same hit in the same block, which is now cached, so we won't count it as a second disk I/O. Eventually the square will be wide enough to span multiple partitions. Each extra partition will be one new disk hit to discover no sites in the square. Finally, the square will hit Chile or Hawaii or Fiji and find some more sites, perhaps enough to stop the iteration. Since the main criteria in determining the number of disk hits is the number of partitions hit, we do not want to split the world into too many partitions. If there are, say, 40 partitions, then I have just described a case where there might be 20 disk hits.

    2-degree partitions might be good for a global table of stores or restaurants. A 5-mile starting distance might be good when filtering for Starbucks. 20 miles might be better for a department store.

Now, let's discuss the 'last' SELECT, wherein the square is expanded by SQRT(2) and the Great Circle formula is used to precisely order the N results. The SQRT(2) is in case the N items were all at the corners of the 'square'. Growing the square by this much allows us to catch any other sites that were just outside the old square.

    First, note that this 'last' SELECT is hitting the same block(s) that the iteration hit, plus possibly hitting some more blocks. It is hard to predict how many extra blocks might be hit. Here's a pathological case. You are in the middle of a desert; the square grows and grows. Eventually it finds N sites. There is a big city just outside the final square from the iterating. Now the 'last' SELECT kicks in, and it includes lots of sites in this big city. "Lots of sites" --> lots of blocks --> lots of disk hits.

    Discussion of reference code

    Here's the gist of the FindNearest().

    • Make a guess at how close to "me" to look.

    • See how many items are in a 'square' around me, after filtering.

    • If not enough, repeat, doubling the width of the square.

    • After finding enough, or giving up because we are looking "too far", make one last pass to get all the data, ORDERed and LIMITed

Note that the loop merely uses 'squares' of lat/lng ranges. This is crude, but works well with the partitioning and indexing, and avoids calling GCDist (until the last step). In the sample code, I picked 15 miles as the starting value. Adjusting this will have some impact on the Procedure's performance, but the impact will vary with the use cases. A rough way to set the radius is to guess what will find the desired LIMIT about half the time. (This value is hardcoded in the PROCEDURE.)

    Parameters passed into FindNearest():

    • your Latitude -- -90..90 (not scaled -- see hardcoded conversion in PROCEDURE)

    • your Longitude -- -180..180 (not scaled)

    • Start distance -- (miles or km) -- see discussion below

    • Max distance -- in miles or km -- see hardcoded conversion in PROCEDURE

The procedure will find the nearest items, up to Limit, that meet the Condition. But it will give up at Max distance. (If you are at the South Pole, why bother searching very far for the tenth pizza parlor?)

    Because of the "scaling", "hardcoding", "Condition", the table name, etc, this PROCEDURE is not truly generic; the code must be modified for each application. Yes, I could have designed it to pass all that stuff in. But what a mess.

    The "_start_dist" gives some control over the performance. Making this too small leads to extra iterations; too big leads to more rows being checked. If you choose to tune the Stored Procedure, do the following. "SELECT @iterations" after calling the SP for a number of typical values. If the value is usually 1, then decrease _start_dist. If it is usually 2 or more, then increase it.

    Timing: Under 10ms for "typical" usage; any dataset size. Slower for pathological cases (low min distance, high max distance, crossing dateline, bad filtering, cold cache, etc)

    End-cases:

    • By using GC distance, not Pythagoras, distances are 'correct' even near poles.

    • Poles -- Even if the "nearest" is almost 360 degrees away (longitude), it can find it.

    • Dateline -- There is a small, 'contained', piece of code for crossing the Dateline. Example: you are at +179 deg longitude, and the nearest item is at -179.

    The procedure returns one resultset, SELECT *, distance.

    • Only rows that meet your Condition, within Max distance are returned

    • At most Limit rows are returned

    • The rows will be ordered, "closest" first.

    • "dist" will be in miles or km (based on a hardcoded constant in the SP)

    Reference code, assuming deg*10000 and 'miles'

    This version is based on scaling "Deg*10000 (MEDIUMINT)".

    Postlog

    There is a "Haversine" algorithm that is twice as fast as the GCDist function here. But it has a fatal flaw of sometimes returning NULL for the distance between a point and itself. (This is because of computing a number slightly bigger than 1.0, then trying to take the ACOS of it.)

    See also

    Rick James graciously allowed us to use this article in the documentation.

His site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    GUID/UUID Performance

    The problem

    GUIDs/UUIDs (Globally/Universally Unique Identifiers) are very random. Therefore, INSERTing into an index means jumping around a lot. Once the index is too big to be cached, most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second.


The problem described in this blog is mostly eliminated in MySQL 8.0 with the advent of a new function.

    SHOW VARIABLES LIKE 'have_query_cache';
    +------------------+-------+
    | Variable_name    | Value |
    +------------------+-------+
    | have_query_cache | YES   |
    +------------------+-------+
    SET GLOBAL query_cache_type = 1;
    SET GLOBAL query_cache_size = 2000000;
    SELECT * FROM t
    SELECT * from t
    /* retry */SELECT * FROM t
    /* retry2 */SELECT * FROM t
    SHOW VARIABLES LIKE 'query_cache_size';
    +------------------+----------+
    | Variable_name    | Value    |
    +------------------+----------+
    | query_cache_size | 67108864 |
    +------------------+----------+
    
    SET GLOBAL query_cache_size = 8000000;
    Query OK, 0 rows affected, 1 warning (0.03 sec)
    
    SHOW VARIABLES LIKE 'query_cache_size';
    +------------------+---------+
    | Variable_name    | Value   |
    +------------------+---------+
    | query_cache_size | 7999488 |
    +------------------+---------+
    SET GLOBAL query_cache_size=40000;
    Query OK, 0 rows affected, 2 warnings (0.03 sec)
    
    SHOW WARNINGS;
    +---------+------+-----------------------------------------------------------------+
    | Level   | Code | Message                                                         |
    +---------+------+-----------------------------------------------------------------+
    | Warning | 1292 | Truncated incorrect query_cache_size value: '40000'             |
    | Warning | 1282 | Query cache failed to set size 39936; new query cache size is 0 |
    +---------+------+-----------------------------------------------------------------+
    SHOW STATUS LIKE 'Qcache%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Qcache_free_blocks      | 1158     |
    | Qcache_free_memory      | 3760784  |
    | Qcache_hits             | 31943398 |
    | Qcache_inserts          | 42998029 |
    | Qcache_lowmem_prunes    | 34695322 |
    | Qcache_not_cached       | 652482   |
    | Qcache_queries_in_cache | 4628     |
    | Qcache_total_blocks     | 11123    |
    +-------------------------+----------+
    FLUSH QUERY CACHE;
    SHOW STATUS LIKE 'Qcache%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Qcache_free_blocks      | 1        |
    | Qcache_free_memory      | 6101576  |
    | Qcache_hits             | 31981126 |
    | Qcache_inserts          | 43002404 |
    | Qcache_lowmem_prunes    | 34696486 |
    | Qcache_not_cached       | 655607   |
    | Qcache_queries_in_cache | 4197     |
    | Qcache_total_blocks     | 8833     |
    +-------------------------+----------+
    1> SELECT * FROM T1
    +---+
    | a |
    +---+
    | 1 |
    +---+
    -- Here the query is cached
    
    -- From another connection execute:
    2> LOCK TABLES T1 WRITE;
    
    -- Expected result with: query_cache_wlock_invalidate = OFF
    1> SELECT * FROM T1
    +---+
    | a |
    +---+
    | 1 |
    +---+
    -- read from query cache
    
    
    -- Expected result with: query_cache_wlock_invalidate = ON
    1> SELECT * FROM T1
    -- Waiting Table Write Lock
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    BEGIN;
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=1>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=1>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    INSERT INTO T1 VALUES(2);  <invalidate queries FROM TABLE T1 AND disable query cache TO TABLE T1>
    SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    +---+
    SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    +---+
    COMMIT;  <query cache IS now turned ON TO T1 TABLE>
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    struct timespec waittime;
            set_timespec_nsec(waittime,(ulong)(50000000L));  /* Wait for 50 msec */
            int res= mysql_cond_timedwait(&COND_cache_status_changed,
                                          &structure_guard_mutex, &waittime);
            if (res == ETIMEDOUT)
              break;
    SELECT SQL_NO_CACHE .... FROM (SELECT SQL_CACHE ...) AS temp_table
    SELECT SQL_CACHE .... FROM (SELECT SQL_NO_CACHE ...) AS temp_table

    MASTER_POS_WAIT()

    NOW()

    RAND()

    RELEASE_LOCK()

    SLEEP()

    SYSDATE()

    UNIX_TIMESTAMP() (no parameters)

    USER()

    UUID()

    UUID_SHORT()

    autocommit
    character_set_client
    character_set_results
    collation_connection
    sql_select_limit
    time_zone
    sql_mode
    max_sort_length
    group_concat_max_len
    default_week_format
    div_precision_increment
    lc_time_names
    BENCHMARK()
    CONNECTION_ID()
    CONVERT_TZ()
    CURDATE()
    CURRENT_DATE()
    CURRENT_TIME()
    CURRENT_TIMESTAMP()
    CURTIME()
    DATABASE()
    ENCRYPT()
    FOUND_ROWS()
    GET_LOCK()
    LAST_INSERT_ID()
    LOAD_FILE()

    WHERE t1.aa = 123 AND t2.bb = 456 -- You must only consider columns in the current table.

xxx IS NOT NULL

Add the column in the range to your putative INDEX.

    WHERE aaa >= 123 ORDER BY aaa ⇒ INDEX(aaa) -- Bonus: The ORDER BY will use the INDEX.
• WHERE aaa >= 123 ORDER BY aaa DESC ⇒ INDEX(aaa) -- Same Bonus.

  • WHERE aaa = 123 GROUP BY xxx, (a+b) ⇒ INDEX(aaa) -- expression in GROUP BY, so no use including even xxx.

    WHERE ccc > 432 ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc) -- This "range" is compatible with ORDER BY

    You should not have redundant indexes. (See below.)

    "InnoDB" -- More effecient than MyISAM because of the way the PRIMARY KEY is clustered with the data in InnoDB.

  • "INDEX(y_id, x_id)" -- The PRIMARY KEY makes it efficient to go one direction; this index makes the other direction efficient. No need to say UNIQUE; that would be extra effort on INSERTs.

• In the secondary index, saying just INDEX(y_id) would work because it would implicitly include x_id. But I would rather make it more obvious that I am hoping for a 'covering' index.

  • etc.

    "Commajoin" -- That is FROM a, b WHERE a.x=b.x instead of FROM a JOIN b ON a.x=b.x

    When would meta_key or meta_value ever be NULL?

    Some discussion of JOIN

  • Indexing 101: Optimizing MySQL queries on a single table (Stephane Combaudon - Percona)

  • A complex query, well explained.

  • More on prefix indexing
    Another variant
    IODKU
    Find nearest 10 pizza parlors
    Percona 2015 Tutorial Slides
    ORDER BY Optimization
    example
    MySQL manual page on range accesses in composite indexes
    Rick James' site
    random
    id -- (optional) you may need to identify the rows for your purposes; AUTO_INCREMENT if you like
  • INDEX(id) -- if id is AUTO_INCREMENT, then this plain INDEX (not UNIQUE, not PRIMARY KEY) is necessary

  • ENGINE=InnoDB -- so the PRIMARY KEY will be "clustered"

  • Other indexes -- keep to a minimum (this is a general performance rule for large tables)

  • Now, a 'last' SELECT is performed to get the exact distances, sort them (ORDER BY) and LIMIT to the desired number.

  • If spanning a pole or the dateline, a more complex SELECT is used.

  • Limit -- maximum number of items to return

  • Condition -- something to put after 'AND' (more discussion above)

  • Z-ordering
    SQRT(...)
    PARTITIONs
    FLOAT
    DOUBLE
    DECIMAL
    INT
    SMALLINT
    DECIMAL(4,2)
    FLOAT
    DOUBLE
    MEDIUMINT
    stored procedure
    stored procedure
    stored procedure
    Cities used for testing
    A forum thread
    StackOverflow discussion
    Sample
    Rick James' site
    latlng
    INDEX(last_name, first_name) -- the order of the list.
        WHERE last_name = 'James' AND first_name = 'Rick'  -- best case
        WHERE last_name = 'James' AND first_name LIKE 'R%' -- pretty good
        WHERE last_name LIKE 'J%' AND first_name = 'Rick'  -- pretty bad
    SELECT ... FROM t WHERE flag = true;
    SELECT ... FROM t WHERE flag = true AND date >= '2015-01-01';
    INDEX(x)
    UPDATE t SET x = ... WHERE ...;
    INDEX(z, x)
    UPDATE t SET x = ... WHERE ...;
    ( SELECT ... WHERE a=1 )   -- and have INDEX(a)
       UNION DISTINCT -- "DISTINCT" is assuming you need to get rid of dups
       ( SELECT ... WHERE b=2 )   -- and have INDEX(b)
       GROUP BY ... ORDER BY ...  -- whatever you had at the end of the original query
    ( SELECT ... LIMIT 200 )   -- Note: OFFSET 0, LIMIT 190+10
       UNION DISTINCT -- (or ALL)
       ( SELECT ... LIMIT 200 )
       LIMIT 190, 10              -- Same as originally
    INDEX(last_name(2), first_name)
    date_col >= '2016-01-01'
        AND date_col  < '2016-01-01' + INTERVAL 3 MONTH
    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN (
            SELECT  x
                FROM  b
                WHERE  test_b
                    );
    ⇒
    SELECT  ...
        FROM  a
        JOIN  b USING(x)
        WHERE  test_a
          AND  test_b;
    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN ( SELECT  x  FROM ... );
    ⇒
    SELECT  ...
        FROM  a
        JOIN        ( SELECT  x  FROM ... ) b
            USING(x)
        WHERE  test_a;
    SELECT  DISTINCT
            a.*,
            b.y
        FROM a
        JOIN b
    ⇒
    SELECT  a.*,
            ( SELECT GROUP_CONCAT(b.y) FROM b WHERE b.x = a.x ) AS ys
        FROM a
    SELECT  a.*
        FROM a
        JOIN b  ON b.x = a.x
        GROUP BY a.id
    ⇒
SELECT  a.*
        FROM a
        WHERE EXISTS ( SELECT *  FROM b  WHERE b.x = a.x )
    CREATE TABLE XtoY (
            # No surrogate id for this table
            x_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to one table
            y_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to the other table
            # Include other fields specific to the 'relation'
            PRIMARY KEY(x_id, y_id),            -- When starting with X
            INDEX      (y_id, x_id)             -- When starting with Y
        ) ENGINE=InnoDB;
    WHERE x = 1
          AND MATCH (...) AGAINST (...)
    CREATE TABLE wp_postmeta (
          meta_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
          post_id BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
          meta_key VARCHAR(255) DEFAULT NULL,
          meta_value LONGTEXT,
          PRIMARY KEY (meta_id),
          KEY post_id (post_id),
          KEY meta_key (meta_key)
        ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
    CREATE TABLE wp_postmeta (
            post_id BIGINT UNSIGNED NOT NULL,
            meta_key VARCHAR(255) NOT NULL,
            meta_value LONGTEXT NOT NULL,
            PRIMARY KEY(post_id, meta_key),
            INDEX(meta_key)
            ) ENGINE=InnoDB;
    Datatype           Bytes       resolution
       ------------------ -----  --------------------------------
       Deg*100 (SMALLINT)     4  1570 m    1.0 mi  Cities
       DECIMAL(4,2)/(5,2)     5  1570 m    1.0 mi  Cities
       SMALLINT scaled        4   682 m    0.4 mi  Cities
       Deg*10000 (MEDIUMINT)  6    16 m     52 ft  Houses/Businesses
       DECIMAL(6,4)/(7,4)     7    16 m     52 ft  Houses/Businesses
       MEDIUMINT scaled       6   2.7 m    8.8 ft
       FLOAT                  8   1.7 m    5.6 ft
       DECIMAL(8,6)/(9,6)     9    16cm    1/2 ft  Friends in a mall
       Deg*10000000 (INT)     8    16mm    5/8 in  Marbles
       DOUBLE                16   3.5nm     ...    Fleas on a dog
    WHERE lat    BETWEEN @my_lat - @dlat
                           AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
            AND lon    BETWEEN @my_lon - @dlon
                           AND @my_lon + @dlon   -- first part of PK
            AND condition                        -- filter out non-pizza parlors
    DELIMITER //
    
    DROP function IF EXISTS GCDist //
    CREATE FUNCTION GCDist (
            _lat1 DOUBLE,  -- Scaled Degrees north for one point
            _lon1 DOUBLE,  -- Scaled Degrees west for one point
            _lat2 DOUBLE,  -- other point
            _lon2 DOUBLE
        ) RETURNS DOUBLE
        DETERMINISTIC
        CONTAINS SQL  -- SQL but does not read or write
        SQL SECURITY INVOKER  -- No special privileges granted
    -- Input is a pair of latitudes/longitudes multiplied by 10000.
    --    For example, the south pole has latitude -900000.
    -- Multiply output by .0069172 to get miles between the two points
    --    or by .0111325 to get kilometers
    BEGIN
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
        DECLARE _rlat1 DOUBLE DEFAULT _deg2rad * _lat1;
        DECLARE _rlat2 DOUBLE DEFAULT _deg2rad * _lat2;
        -- compute as if earth's radius = 1.0
        DECLARE _rlond DOUBLE DEFAULT _deg2rad * (_lon1 - _lon2);
        DECLARE _m     DOUBLE DEFAULT COS(_rlat2);
        DECLARE _x     DOUBLE DEFAULT COS(_rlat1) - _m * COS(_rlond);
        DECLARE _y     DOUBLE DEFAULT               _m * SIN(_rlond);
        DECLARE _z     DOUBLE DEFAULT SIN(_rlat1) - SIN(_rlat2);
        DECLARE _n     DOUBLE DEFAULT SQRT(
                            _x * _x +
                            _y * _y +
                            _z * _z    );
        RETURN  2 * ASIN(_n / 2) / _deg2rad;   -- again--scaled degrees
    END;
    //
    DELIMITER ;
    
    DELIMITER //
    -- FindNearest (about my 6th approach)
    DROP PROCEDURE IF EXISTS FindNearest6 //
    CREATE
    PROCEDURE FindNearest (
            IN _my_lat DOUBLE,  -- Latitude of me [-90..90] (not scaled)
            IN _my_lon DOUBLE,  -- Longitude [-180..180]
            IN _START_dist DOUBLE,  -- Starting estimate of how far to search: miles or km
            IN _max_dist DOUBLE,  -- Limit how far to search: miles or km
            IN _limit INT,     -- How many items to try to get
            IN _condition VARCHAR(1111)   -- will be ANDed in a WHERE clause
        )
        DETERMINISTIC
    BEGIN
        -- lat and lng are in degrees -90..+90 and -180..+180
        -- All computations done in Latitude degrees.
        -- Thing to tailor
        --   *Locations* -- the table
        --   Scaling of lat, lon; here using *10000 in MEDIUMINT
        --   Table name
        --   miles versus km.
    
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
    
        -- Cannot use params in PREPARE, so switch to @variables:
        -- Hardcoded constant:
        SET @my_lat := _my_lat * 10000,
            @my_lon := _my_lon * 10000,
            @deg2dist := 0.0069172,  -- 69.172 for miles; 111.325 for km  *** (mi vs km)
            @start_deg := _start_dist / @deg2dist,  -- Start with this radius first (eg, 15 miles)
            @max_deg := _max_dist / @deg2dist,
            @cutoff := @max_deg / SQRT(2),  -- (slightly pessimistic)
            @dlat := @start_deg,  -- note: must stay positive
            @lon2lat := COS(_deg2rad * @my_lat),
            @iterations := 0;        -- just debugging
    
        -- Loop through, expanding search
        --   Search a 'square', repeat with bigger square until find enough rows
        --   If the inital probe found _limit rows, then probably the first
        --   iteration here will find the desired data.
        -- Hardcoded table name:
        -- This is the "first SELECT":
        SET @sql = CONCAT(
            "SELECT COUNT(*) INTO @near_ct
                FROM Locations
                WHERE lat    BETWEEN @my_lat - @dlat
                                 AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                  AND lon    BETWEEN @my_lon - @dlon
                                 AND @my_lon + @dlon   -- first part of PK
                  AND ", _condition);
        PREPARE _sql FROM @sql;
        MainLoop: LOOP
            SET @iterations := @iterations + 1;
            -- The main probe: Search a 'square'
            SET @dlon := ABS(@dlat / @lon2lat);  -- good enough for now  -- note: must stay positive
            -- Hardcoded constants:
            SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  -- near a Pole
            EXECUTE _sql;
            IF ( @near_ct >= _limit OR         -- Found enough
                 @dlat >= @cutoff ) THEN       -- Give up (too far)
                LEAVE MainLoop;
            END IF;
            -- Expand 'square':
            SET @dlat := LEAST(2 * @dlat, @cutoff);   -- Double the radius to search
        END LOOP MainLoop;
        DEALLOCATE PREPARE _sql;
    
        -- Out of loop because found _limit items, or going too far.
        -- Expand range by about 1.4 (but not past _max_dist),
        -- then fetch details on nearest 10.
    
        -- Hardcoded constant:
        SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                    @max_deg,
                    GCDist(ABS(@my_lat), @my_lon,
                           ABS(@my_lat) - @dlat, @my_lon - @dlon) );
                -- ABS: go toward equator to find farthest corner (also avoids poles)
                -- Dateline: not a problem (see GCDist code)
    
        -- Reach for longitude line at right angle:
        -- sin(dlon)*cos(lat) = sin(dlat)
        -- Hardcoded constant:
        SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                                 COS(_deg2rad * @my_lat))
                                / _deg2rad -- precise
                            , 3600001);    -- must be too near a pole
    
        -- This is the "last SELECT":
        -- Hardcoded constants:
        IF (ABS(@my_lon) + @dlon < 1800000 OR    -- Usual case - not crossing dateline
            ABS(@my_lat) + @dlat <  900000) THEN -- crossing pole, so dateline not an issue
            -- Hardcoded table name:
            SET @sql = CONCAT(
                "SELECT *,
                        @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @my_lon - @dlon
                                  AND @my_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, "
                    ORDER BY dist
                    LIMIT ", _limit
                            );
        ELSE
            -- Hardcoded constants and table name:
            -- Circle crosses dateline, do two SELECTs, one for each side
            SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
            SET @east_lon := @west_lon + 3600000;
            -- One of those will be beyond +/- 180; this gets points beyond the dateline
            SET @sql = CONCAT(
                "( SELECT *,
                        @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @west_lon - @dlon
                                  AND @west_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                UNION ALL
                ( SELECT *,
                        @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @east_lon - @dlon
                                  AND @east_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                ORDER BY dist
                LIMIT ", _limit
                            );
        END IF;
    
        PREPARE _sql FROM @sql;
        EXECUTE _sql;
        DEALLOCATE PREPARE _sql;
    END;
    //
    DELIMITER ;
    
Sample
    
Find the 5 cities with non-zero population (out of 3 million) nearest to (+35.15, -90.05). Start with a 10-mile bounding box and give up at 100 miles.
    
    CALL FindNearest(35.15, -90.05, 10, 100, 5, 'population > 0');
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | id      | lat    | lon     | country | ascii_city   | city         | state | population | @gcd_ct := 0 | dist                | @gcd_ct := @gcd_ct + 1 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | 3023545 | 351494 | -900489 | us      | memphis      | Memphis      | TN    |     641608 |            0 | 0.07478733189367963 |                      3 |
    | 2917711 | 351464 | -901844 | us      | west memphis | West Memphis | AR    |      28065 |            0 |   7.605683607627499 |                      2 |
    | 2916457 | 352144 | -901964 | us      | marion       | Marion       | AR    |       9227 |            0 |     9.3994963998986 |                      1 |
    | 3020923 | 352044 | -898739 | us      | bartlett     | Bartlett     | TN    |      43264 |            0 |  10.643941157860604 |                      7 |
    | 2974644 | 349889 | -900125 | us      | southaven    | Southaven    | MS    |      38578 |            0 |  11.344042217329935 |                      5 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    5 rows in set (0.00 sec)
    Query OK, 0 rows affected (0.04 sec)
    
    SELECT COUNT(*) FROM ll_table;
    +----------+
    | COUNT(*) |
    +----------+
    |  3173958 |
    +----------+
    1 row in set (5.04 sec)
    
    FLUSH STATUS;
    CALL...
    SHOW SESSION STATUS LIKE 'Handler%';
    
    SHOW session status LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_read_first         | 1     |
    | Handler_read_key           | 3     |
    | Handler_read_next          | 1307  |  -- some index, some tmp, but far less than 3 million.
    | Handler_read_rnd           | 5     |
    | Handler_read_rnd_next      | 13    |
    | Handler_write              | 12    |  -- it needed a tmp
    +----------------------------+-------+
    Why it is a problem

    A 'standard' GUID/UUID is composed of the time, machine identification and some other stuff. The combination should be unique, even without coordination between different computers that could be generating UUIDs simultaneously.

The top part of the GUID/UUID is the bottom part of the current time. The top part is the primary part of what would be used for placing the value in an ordered list (INDEX). This cycles in about 7.16 minutes: the low 32 bits of the 100-nanosecond timestamp wrap after 2^32 * 100 ns, which is roughly 429 seconds.

    Some math... If the index is small enough to be cached in RAM, each insert into the index is CPU only, with the writes being delayed and batched. If the index is 20 times as big as can be cached, then 19 out of 20 inserts will be a cache miss. (This math applies to any "random" index.)

    Second problem

    36 characters is bulky. If you are using that as a PRIMARY KEY in InnoDB and you have secondary keys, remember that each secondary key has an implicit copy of the PK, thereby making it bulky.

It is tempting to declare the UUID as VARCHAR(36). And, since you are probably thinking globally, you use CHARACTER SET utf8 (or utf8mb4). For utf8:

    • 2 - Overhead for VAR

    • 36 - chars

• 3 (or 4) bytes per character for utf8 (or utf8mb4)

So, the maximum length is 2 + 3*36 = 110 bytes (or 2 + 4*36 = 146 for utf8mb4). For temp tables, 108 (or 144) bytes are actually used if a MEMORY table is used.
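A sketch of the two declarations being compared (the table and column names are illustrative):

CREATE TABLE t_bulky   (uuid VARCHAR(36) CHARACTER SET utf8mb4, PRIMARY KEY (uuid));  -- up to 146 bytes per value
CREATE TABLE t_compact (uuid BINARY(16), PRIMARY KEY (uuid));                         -- always 16 bytes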

    To compress

    • utf8 is unnecessary (ascii would do); but this is obviated by the next two steps

    • Toss dashes

• UNHEX it. Now it will fit in 16 bytes: BINARY(16)
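A minimal sketch of those two steps, using the manual's sample value quoted in the next section:

SET @u := '6ccd780c-baba-1026-9564-0040f4311e29';
SELECT LENGTH(UNHEX(REPLACE(@u, '-', '')));  -- 16 bytes, so the result fits in BINARY(16)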

    Combining the problems and crafting a solution

But first, a caveat. This solution only works for "Time based" / "Version 1" UUIDs. They are recognizable by the "1" at the beginning of the third clump.

The manual's sample: 6ccd780c-baba-1026-9564-0040f4311e29. A more current value (after a few years): 49ea2de3-17a2-11e2-8346-001eecac3efa. Notice how the 3rd part has slowly changed over time? Let's rearrange the data, thus:

1026-baba-6ccd780c-9564-0040f4311e29
11e2-17a2-49ea2de3-8346-001eecac3efa
11e2-17ac-106762a5-8346-001eecac3efa -- after a few more minutes

    Now we have a number that increases nicely over time. Multiple sources won't be quite in time order, but they will be close. The "hot" spot for inserting into an INDEX(uuid) will be rather narrow, thereby making it quite cacheable and efficient.

    If your SELECTs tend to be for "recent" uuids, then they, too, will be easily cached. If, on the other hand, your SELECTs often reach for old uuids, they will be random and not well cached. Still, improving the INSERTs will help the system overall.

    Code to do it

    Let's make Stored Functions to do the messy work of the two actions:

    • Rearrange fields

• Convert to/from BINARY(16)

DELIMITER //

CREATE FUNCTION UuidToBin(_uuid BINARY(36))
    RETURNS BINARY(16)
    LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
RETURN
    UNHEX(CONCAT(
        SUBSTR(_uuid, 15, 4),
        SUBSTR(_uuid, 10, 4),
        SUBSTR(_uuid,  1, 8),
        SUBSTR(_uuid, 20, 4),
        SUBSTR(_uuid, 25) ));
//
CREATE FUNCTION UuidFromBin(_bin BINARY(16))
    RETURNS BINARY(36)
    LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
RETURN
    LCASE(CONCAT_WS('-',
        HEX(SUBSTR(_bin,  5, 4)),
        HEX(SUBSTR(_bin,  3, 2)),
        HEX(SUBSTR(_bin,  1, 2)),
        HEX(SUBSTR(_bin,  9, 2)),
        HEX(SUBSTR(_bin, 11))
             ));
//
DELIMITER ;

Then you would do things like:

-- Letting MySQL create the UUID:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...);

-- Creating the UUID elsewhere:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...);

-- Retrieving (point query using uuid):
SELECT ... FROM t WHERE uuid = UuidToBin(?);

-- Retrieving (other):
SELECT UuidFromBin(uuid), ... FROM t ...;

Do not flip the WHERE; this would be inefficient because it won't use INDEX(uuid):

WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29' -- NO

    TokuDB

TokuDB has been deprecated by its upstream maintainer. It is disabled from MariaDB 10.5 and has been removed in MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-term migration path.

    TokuDB is a viable engine if you must have UUIDs (even non-type-1) in a huge table. TokuDB is available in MariaDB as a 'standard' engine, making the barrier to entry very low. There are a small number of differences between InnoDB and TokuDB; I will not go into them here.

TokuDB, with its "fractal" indexing strategy, builds the indexes in stages. In contrast, InnoDB inserts index entries "immediately" — actually, that indexing is buffered by most of the size of the buffer_pool. To elaborate…

    When adding a record to an InnoDB table, here are (roughly) the steps performed to write the data (and PK) and secondary indexes to disk. (I leave out logging, provision for rollback, etc.) First the PRIMARY KEY and data:

    • Check for UNIQUEness constraints

    • Fetch the BTree block (normally 16KB) that should contain the row (based on the PRIMARY KEY).

    • Insert the row (overflow typically occurs 1% of the time; this leads to a block split).

• Leave the page "dirty" in the buffer_pool, hoping that more rows are added before it is bumped out of cache (buffer_pool). Note that for AUTO_INCREMENT and TIMESTAMP-based PKs, the "last" block in the data will be updated repeatedly before splitting; hence, this delayed write adds greatly to the efficiency. OTOH, a UUID will be very random; when the table is big enough, the block will almost always be flushed before a second insert occurs in that block. This is the inefficiency in UUIDs. Now for any secondary keys:

    • All the steps are the same, since an index is essentially a "table" except that the "data" is a copy of the PRIMARY KEY.

    • UNIQUEness must be checked immediately — cannot delay the read.

    • There are (I think) some other "delays" that avoid some I/O.

    Tokudb, on the other hand, does something like

• Write partially sorted data/index records to disk before finding out exactly where they belong.

    • In the background, combine these partially digested blocks. Repeat as needed.

    • Eventually move the info into the real table/indexes.

    If you are familiar with how sort-merge works, consider the parallels to Tokudb. Each "sort" does some work of ordering things; each "merge" is quite efficient.

    To summarize:

    • In the extreme (data/index much larger than buffer_pool), InnoDB must read-modify-write one 16KB disk block for each UUID entry.

    • Tokudb makes each I/O "count" by merging several UUIDs for each disk block. (Yeah, Toku rereads blocks, but it comes out ahead in the long run.)

    • Tokudb excels when the table is really big, which implies high ingestion rate.

    Wrapup

This shows three things for speeding up usage of GUIDs/UUIDs:

    • Shrink footprint (Smaller -> more cacheable -> faster).

• Rearrange the uuid to make a "hot spot" to improve cacheability.

    • Use TokuDB (MyRocks shares some architectural traits which may also be beneficial in handling UUIDs, but this is hypothetical and hasn't been tested)

    Note that the benefit of the "hot spot" is only partial:

    • Chronologically ordered (or approximately ordered) INSERTs benefit; random ones don't.

    • SELECTs/UPDATEs by "recent" uuids benefit; old ones don't benefit.

    Postlog

    Thanks to Trey for some of the ideas here.

    The tips in this document apply to MySQL, MariaDB, and Percona.

    Written Oct, 2012. Added TokuDB, Jan, 2015.

    See Also

    • UUID data type

    • Detailed discussion of UUID indexing

    • Graphical display of the random nature of UUID on PRIMARY KEY

    • Benchmarks, etc, by Karthik Appigatla

• NHibernate can generate sequential GUIDs, but it seems to be backwards.

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: uuid

    This page is licensed: CC BY-SA / Gnu FDL


    Index Hints: How to Force Query Plans

The optimizer is largely cost-based and will try to choose the optimal plan for any query. However, in some cases it does not have enough information to choose a perfect plan, and in these cases you may have to provide hints to force the optimizer to use another plan.

    You can examine the query plan for a SELECT by writing EXPLAIN before the statement. SHOW EXPLAIN shows the output of a running query. In some cases, its output can be closer to reality than EXPLAIN.
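For example, a minimal sketch (the query and the connection id 123 are illustrative; find the id of a running statement with SHOW PROCESSLIST):

EXPLAIN SELECT SUM(Population) FROM City WHERE CountryCode="SWE";
SHOW EXPLAIN FOR 123;  -- plan of the statement currently running in connection 123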

    For the following queries, we will use the world database for the examples.

    Setting up the World Example Database

Download world.sql.gz.

Install it with:

mariadb-admin create world
zcat world.sql.gz | ../client/mysql world

or

mariadb-admin create world
gunzip world.sql.gz
../client/mysql world < world.sql

    Forcing Join Order

You can force the join order by using STRAIGHT_JOIN in either the SELECT or the JOIN part.

The simplest way to force the join order is to put the tables in the correct order in the FROM clause and use SELECT STRAIGHT_JOIN like so:

SELECT STRAIGHT_JOIN SUM(City.Population) FROM Country,City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

If you only want to force the join order for a few tables, use STRAIGHT_JOIN in the FROM clause. When this is done, only tables connected with STRAIGHT_JOIN will have their order forced. For example:

SELECT SUM(City.Population) FROM Country STRAIGHT_JOIN City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

    In both of the above cases Country will be scanned first and for each matching country (one in this case) all rows in City will be checked for a match. As there is only one matching country this will be faster than the original query.

The output of EXPLAIN for the above cases is:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | Using where
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where; Using join buffer (flat, BNL join)

This is one of the few cases where ALL is ok, as the scan of the Country table will only find one matching row.

    Forcing Usage of a Specific Index for the WHERE Clause

    In some cases the optimizer may choose a non-optimal index or it may choose to not use an index at all, even if some index could theoretically be used.

    In these cases you have the option to either tell the optimizer to only use a limited set of indexes, ignore one or more indexes, or force the usage of some particular index.

    USE INDEX: Use a Limited Set of Indexes

You can limit which indexes are considered with the USE INDEX option:

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

    USE INDEX is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ref | CountryCode | CountryCode | 3 | const | 14 | Using where

If we had not used USE INDEX, the Name index would have been in possible_keys.

    IGNORE INDEX: Don't Use a Particular Index

You can tell the optimizer to not consider some particular index with the IGNORE INDEX option:

IGNORE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

This is used after the table name in the FROM clause:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ref | CountryCode | CountryCode | 3 | const | 14 | Using where

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

    FORCE INDEX: Forcing an Index

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | range | Name | Name | 35 | NULL | 4079 | Using where

FORCE INDEX works by only considering the given indexes (like with USE INDEX) but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

    Index Prefixes

When using index hints (USE INDEX, FORCE INDEX or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
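For example, a sketch based on the rule above, assuming the Name and CountryCode indexes created in the earlier examples; the prefix Countr can only match CountryCode:

EXPLAIN SELECT Name FROM City FORCE INDEX (Countr)
WHERE countrycode="SWE";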

    Forcing an Index to be Used for ORDER BY or GROUP BY

The optimizer will try to use indexes to resolve ORDER BY and GROUP BY.

You can use USE INDEX, IGNORE INDEX and FORCE INDEX as in the WHERE clause above to ensure that some specific index is used:

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

    This is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,COUNT(*) FROM City
FORCE INDEX FOR GROUP BY (Name)
WHERE population >= 10000000 GROUP BY Name;

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | index | NULL | Name | 35 | NULL | 4079 | Using where

Without the FORCE INDEX option we would have 'Using where; Using temporary; Using filesort' in the 'Extra' column, which means that the optimizer would have created a temporary table and sorted it.

    Help the Optimizer Optimize GROUP BY and ORDER BY

The optimizer uses several strategies to optimize GROUP BY and ORDER BY:

• Resolve with an index:

  • Scan the table in index order and output data as we go. (This only works if the ORDER BY / GROUP BY can be resolved by an index after constant propagation is done).

• Filesort:

  • Scan the table to be sorted and collect the sort keys in a temporary file.

  • Sort the keys + reference to row (with filesort).

  • Scan the table in sorted order.

• Use a temporary table for ORDER BY:

  • Create a temporary (in memory) table for the 'to-be-sorted' data. (If this gets bigger than max_heap_table_size or contains blobs then an Aria or MyISAM disk based table will be used).

  • Sort the keys + reference to row (with filesort).

  • Scan the table in sorted order.

A temporary table will always be used if the fields which will be sorted are not from the first table in the JOIN order.

• Use a temporary table for GROUP BY:

  • Create a temporary table to hold the GROUP BY result with an index that matches the GROUP BY fields.

  • Produce a result row.

  • If a row with the GROUP BY key exists in the temporary table, add the new result row to it. If not, create a new row.

  • Before sending the results to the user, sort the rows with filesort to get the results in GROUP BY order.

Forcing/Disallowing Temporary Tables to be Used for GROUP BY:

Using an in-memory table (as described above) is usually the fastest option for GROUP BY if the result set is small. It is not optimal if the result set is very big. You can tell the optimizer this by using SELECT SQL_SMALL_RESULT or SELECT SQL_BIG_RESULT.

For example:

EXPLAIN SELECT SQL_SMALL_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;

    produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using temporary; Using filesort

while:

EXPLAIN SELECT SQL_BIG_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;

    produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using filesort

    The difference is that with SQL_SMALL_RESULT a temporary table is used.

    Forcing Usage of Temporary Tables

    In some cases you may want to force the use of a temporary table for the result to free up the table/row locks for the used tables as quickly as possible.

You can do this with the SQL_BUFFER_RESULT option:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT SQL_BUFFER_RESULT Name,COUNT(*) AS Cities FROM City
GROUP BY Name HAVING Cities > 2;

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | index | NULL | Name | 35 | NULL | 4079 | Using index; Using temporary

    Without SQL_BUFFER_RESULT, the above query would not use a temporary table for the result set.

    Optimizer Switch

MariaDB added an optimizer switch which allows you to specify which algorithms will be considered when optimizing a query.

See the optimizer section for more information about the different algorithms which are used.
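For example, a minimal sketch of toggling two of the join-cache algorithms for the current session (the two flags shown are just a sample of the available switches):

SET SESSION optimizer_switch = 'join_cache_hashed=off,outer_join_with_cache=on';
SELECT @@optimizer_switch;  -- verify the resulting settings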

    See Also

    This page is licensed: CC BY-SA / Gnu FDL


    Full-Text Index Stopwords

    Stopwords are used to provide a list of commonly-used words that can be ignored for the purposes of Full-text-indexes.

    Full-text indexes built in MyISAM and InnoDB have different stopword lists by default.

    MyISAM Stopwords

    For full-text indexes on MyISAM tables, by default, the list is built from the file storage/myisam/ft_static.c, and searched using the server's character set and collation. The ft_stopword_file system variable allows the default list to be overridden with words from another file, or for stopwords to be ignored altogether.

If the stopword list is changed, any existing full-text indexes need to be rebuilt.
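For example, a sketch of overriding the default list (the file path and table name are illustrative):

-- ft_stopword_file cannot be changed at runtime; set it in the server configuration:
--   [mysqld]
--   ft_stopword_file = /etc/mysql/my_stopwords.txt    -- or '' to ignore stopwords entirely
-- After a restart, rebuild any existing MyISAM full-text index, e.g.:
REPAIR TABLE articles QUICK;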

The following table shows the default list of stopwords, although you should always treat storage/myisam/ft_static.c as the definitive list. See the Fulltext Index Overview for more details, and Full-Text Indexes for related articles.

    InnoDB Stopwords

Stopwords on full-text indexes are only enabled if the innodb_ft_enable_stopword system variable is set (by default it is) at the time the index is created.

    The stopword list is determined as follows:

• If the innodb_ft_user_stopword_table system variable is set, that table is used as a stopword list.

• If innodb_ft_user_stopword_table is not set, the table set by innodb_ft_server_stopword_table is used.

• If neither variable is set, the built-in list is used, which can be viewed by querying the INNODB_FT_DEFAULT_STOPWORD table in the Information Schema.

In the first two cases, the specified table must exist at the time the system variable is set and the full-text index created. It must be an InnoDB table with a single VARCHAR column named VALUE.
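For example, a sketch of the first case (the database name, table name, and word values are illustrative):

CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE=InnoDB;
INSERT INTO mydb.my_stopwords (value) VALUES ('und'), ('la');
SET GLOBAL innodb_ft_user_stopword_table = 'mydb/my_stopwords';
-- Full-text indexes created (or rebuilt) after this point will use the custom list.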

The default InnoDB stopword list differs from the default MyISAM list, being much shorter, and contains the following words:

a, about, an, are, as, at, be, by, com, de, en, for, from, how, i, in, is, it, la, of, on, or, that, the, this, to, was, what, when, where, who, will, with, und, the, www

    This page is licensed: CC BY-SA / Gnu FDL


a's

able

about

above

according

accordingly

across

actually

after

afterwards

again

against

ain't

all

allow

allows

almost

    alone

    along

    already

    also

    although

    always

    am

    among

    amongst

    an

    and

    another

    any

    anybody

    anyhow

    anyone

    anything

    anyway

    anyways

    anywhere

    apart

    appear

    appreciate

    appropriate

    are

    aren't

    around

    as

    aside

    ask

    asking

    associated

    at

    available

    away

    awfully

    be

    became

    because

    become

    becomes

    becoming

    been

    before

    beforehand

    behind

    being

    believe

    below

    beside

    besides

    best

    better

    between

    beyond

    both

    brief

    but

    by

    c'mon

    c's

    came

    can

    can't

    cannot

    cant

    cause

    causes

    certain

    certainly

    changes

    clearly

    co

    com

    come

    comes

    concerning

    consequently

    consider

    considering

    contain

    containing

    contains

    corresponding

    could

    couldn't

    course

    currently

    definitely

    described

    despite

    did

    didn't

    different

    do

    does

    doesn't

    doing

    don't

    done

    down

    downwards

    during

    each

    edu

    eg

    eight

    either

    else

    elsewhere

    enough

    entirely

    especially

    et

    etc

    even

    ever

    every

    everybody

    everyone

    everything

    everywhere

    ex

    exactly

    example

    except

    far

    few

    fifth

    first

    five

    followed

    following

    follows

    for

    former

    formerly

    forth

    four

    from

    further

    furthermore

    get

    gets

    getting

    given

    gives

    go

    goes

    going

    gone

    got

    gotten

    greetings

    had

    hadn't

    happens

    hardly

    has

    hasn't

    have

    haven't

    having

    he

    he's

    hello

    help

    hence

    her

    here

    here's

    hereafter

    hereby

    herein

    hereupon

    hers

    herself

    hi

    him

    himself

    his

    hither

    hopefully

    how

    howbeit

    however

    i'd

    i'll

    i'm

    i've

    ie

    if

    ignored

    immediate

    in

    inasmuch

    inc

    indeed

    indicate

    indicated

    indicates

    inner

    insofar

    instead

    into

    inward

    is

    isn't

    it

    it'd

    it'll

    it's

    its

    itself

    just

    keep

    keeps

    kept

    know

    knows

    known

    last

    lately

    later

    latter

    latterly

    least

    less

    lest

    let

    let's

    like

    liked

    likely

    little

    look

    looking

    looks

    ltd

    mainly

    many

    may

    maybe

    me

    mean

    meanwhile

    merely

    might

    more

    moreover

    most

    mostly

    much

    must

    my

    myself

    name

    namely

    nd

    near

    nearly

    necessary

    need

    needs

    neither

    never

    nevertheless

    new

    next

    nine

    no

    nobody

    non

    none

    noone

    nor

    normally

    not

    nothing

    novel

    now

    nowhere

    obviously

    of

    off

    often

    oh

    ok

    okay

    old

    on

    once

    one

    ones

    only

    onto

    or

    other

    others

    otherwise

    ought

    our

    ours

    ourselves

    out

    outside

    over

    overall

    own

    particular

    particularly

    per

    perhaps

    placed

    please

    plus

    possible

    presumably

    probably

    provides

    que

    quite

    qv

    rather

    rd

    re

    really

    reasonably

    regarding

    regardless

    regards

    relatively

    respectively

    right

    said

    same

    saw

    say

    saying

    says

    second

    secondly

    see

    seeing

    seem

    seemed

    seeming

    seems

    seen

    self

    selves

    sensible

    sent

    serious

    seriously

    seven

    several

    shall

    she

    should

    shouldn't

    since

    six

    so

    some

    somebody

    somehow

    someone

    something

    sometime

    sometimes

    somewhat

    somewhere

    soon

    sorry

    specified

    specify

    specifying

    still

    sub

    such

    sup

    sure

    t's

    take

    taken

    tell

    tends

    th

    than

    thank

    thanks

    thanx

    that

    that's

    thats

    the

    their

    theirs

    them

    themselves

    then

    thence

    there

    there's

    thereafter

    thereby

    therefore

    therein

    theres

    thereupon

    these

    they

    they'd

    they'll

    they're

    they've

    think

    third

    this

    thorough

    thoroughly

    those

    though

    three

    through

    throughout

    thru

    thus

    to

    together

    too

    took

    toward

    towards

    tried

    tries

    truly

    try

    trying

    twice

    two

    un

    under

    unfortunately

    unless

    unlikely

    until

    unto

    up

    upon

    us

    use

    used

    useful

    uses

    using

    usually

    value

    various

    very

    via

    viz

    vs

    want

    wants

    was

    wasn't

    way

    we

    we'd

    we'll

    we're

    we've

    welcome

    well

    went

    were

    weren't

    what

    what's

    whatever

    when

    whence

    whenever

    where

    where's

    whereafter

    whereas

    whereby

    wherein

    whereupon

    wherever

    whether

    which

    while

    whither

    who

    who's

    whoever

    whole

    whom

    whose

    why

    will

    willing

    wish

    with

    within

    without

    won't

    wonder

    would

    wouldn't

    yes

    yet

    you

    you'd

    you'll

    you're

    you've

    your

    yours

    yourself

    yourselves

    zero
