HA & Performance

Optimize MariaDB Server for high availability and performance. Learn about replication, clustering, load balancing, and configuration tuning for robust and efficient database solutions.

Optimization and Tuning

Optimize MariaDB Server for high availability and performance. Learn about replication, clustering, load balancing, and configuration tuning for robust and efficient database solutions.

Buffers, Caches and Threads

Covers essential configurations to maximize throughput and responsiveness for your database workloads.

MariaDB Internal Optimizations

Delves into how the database engine enhances query execution, data storage, and overall performance through its core architecture.

Operating System Optimizations

Covers configuring your OS for improved I/O, memory management, and network settings to maximize database efficiency.

Optimization and Indexes

Covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.

Optimizer Hints

Optimizer hints are options available that affect the execution plan.

Compression

Details how to apply data compression at various levels to reduce disk space and improve I/O efficiency.

Optimizing Data Structure

Covers schema design, data types, and normalization techniques to improve query efficiency and storage utilization.

Optimizing Tables

Covers various techniques, including proper indexing, data types, and storage engine choices, to improve query speed and efficiency.

Optimizing Queries

Provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up your queries.

System and Status Variables

Optimize MariaDB Server with system variables, configuring various parameters to fine-tune performance, manage resources, and adapt the database to your specific workload requirements.

Buffers, Caches and Threads

Optimize MariaDB Server performance by tuning buffers, caches, and threads. This section covers essential configurations to maximize throughput and responsiveness for your database workloads.

Thread Pool

Optimize MariaDB Server with the thread pool. This section explains how to manage connections and improve performance by efficiently handling concurrent client requests, reducing resource overhead.

Thread States

Understand MariaDB Server thread states. This section explains the different states a thread can be in, helping you monitor and troubleshoot query execution and server performance.

MariaDB Internal Optimizations

Explore MariaDB Server's internal optimizations. This section delves into how the database engine enhances query execution, data storage, and overall performance through its core architecture.

Optimization and Indexes

Optimize MariaDB Server queries with indexes. This section covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.

Full-Text Indexes

Implement full-text indexes in MariaDB Server for efficient text search. This section guides you through creating and utilizing these indexes to optimize queries on large text datasets.

Compression

Optimize MariaDB Server performance and storage with compression. This section details how to apply data compression at various levels to reduce disk space and improve I/O efficiency.

Optimizing Data Structure

Optimize MariaDB Server performance by refining your data structure. This section covers schema design, data types, and normalization techniques to improve query efficiency and storage utilization.

Optimizations for Derived Tables

Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.

Optimizing Tables

Optimize tables for enhanced performance. This section covers various techniques, including proper indexing, data types, and storage engine choices, to improve query speed and efficiency.

Numeric vs String Fields

A large numeric value is stored in far fewer bytes than the equivalent string value. It is therefore faster to move and compare numeric data, so it's best to choose numeric columns for unique id's and other similar fields.

This page is licensed: CC BY-SA / Gnu FDL

Optimizing String and Character Fields

Comparing String Columns

When values from different columns are compared, the comparison runs more quickly when the columns are of the same character set and collation. If they are different, the strings need to be converted while the query runs. So, where possible, declare string columns using the same character set and collation when you may need to compare them.
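For instance, two columns intended to be compared could both be declared with the same character set and collation (table and column names here are illustrative):

CREATE TABLE t1 (name VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci);
CREATE TABLE t2 (name VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci);
-- Comparing t1.name with t2.name now requires no character set conversion.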

VARCHAR vs BLOB

ORDER BY and GROUP BY clauses can generate temporary tables in memory (see ) if the original table doesn't contain any BLOB fields. If a column is less than 8KB, you can make use of a Binary VARCHAR rather than a BLOB.

This page is licensed: CC BY-SA / Gnu FDL

Event Scheduler Thread States

This article documents thread states that are related to scheduling and execution. These include the Event Scheduler thread, threads that terminate the Event Scheduler, and threads for executing events.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Query Cache Thread States

This article documents thread states that are related to the Query Cache. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Replica Connection Thread States

This article documents thread states that are related to connection threads that occur on a replication replica. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Optimizing MEMORY Tables

MEMORY tables are a good choice for data that needs to be accessed often, and is rarely updated. Being in memory, they are not suitable for critical data or for persistent storage, but if data can be moved to memory for reading without needing to be regenerated often, if at all, they can provide a significant performance boost.

The MEMORY storage engine has a key feature in that it permits its indexes to be either B-tree or Hash. Choosing the best index type can lead to better performance. See Storage Engine Index Types for more on the characteristics of each index type.

This page is licensed: CC BY-SA / Gnu FDL

Optimizing Queries

Optimize queries for peak performance. This section provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up your queries.

Optimization Strategies

Discover effective optimization strategies for MariaDB Server queries. This section provides a variety of techniques and approaches to enhance query performance and overall database efficiency.

Operating System Optimizations

Optimize MariaDB Server performance with operating system tuning. This section covers configuring your OS for improved I/O, memory management, and network settings to maximize database efficiency.

Waiting for next activation

The event queue contains items, but the next activation is at some time in the future.

Waiting for scheduler to stop

Waiting for the event scheduler to stop after issuing SET GLOBAL event_scheduler=OFF.

Waiting on empty queue

Sleeping, as the event scheduler's queue is empty.

This page is licensed: CC BY-SA / Gnu FDL

Clearing

Thread is terminating.

Initialized


Thread has been initialized.

sending cached result to client

A result found in the query cache is being sent to the client.

storing result in query cache

Saving the result of a query into the query cache.

Waiting for query cache lock

Waiting to take a query cache lock.

This page is licensed: CC BY-SA / Gnu FDL

checking privileges on cached query

Checking whether the user has permission to access a result in the query cache.

checking query cache for query

Checking whether the current query exists in the query cache.

invalidating query cache entries


Marking query cache entries as invalid as the underlying tables have changed.

Reading master dump table data

After the table created by a master dump has been opened (the Opening master dump table state), the table is now being read.

Rebuilding the index on master dump table

After the table created by a master dump has been opened and read (the Reading master dump table data state), the index is built.

This page is licensed: CC BY-SA / Gnu FDL

Changing master

Processing a CHANGE MASTER TO statement.

Killing slave

Processing a STOP SLAVE statement.

Opening master dump table


A table has been created from a master dump and is now being opened.


Delayed Insert Handler Thread States

This article documents thread states that are related to the handler thread that inserts the results of INSERT DELAYED statements.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

insert

About to insert rows into the table.

reschedule

Sleeping in order to let other threads function, after inserting a number of rows into the table.

This page is licensed: CC BY-SA / Gnu FDL

Master Thread States

This article documents thread states that are related to replication master threads. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Finished reading one binlog; switching to next binlog

After completing one binary log, the next is being opened for sending to the slave.

Master has sent all binlog to slave; waiting for binlog to be updated

All events have been read from the binary logs and sent to the slave. Now waiting for the binary log to be updated with new events.

Sending binlog event to slave

This page is licensed: CC BY-SA / Gnu FDL

Replica SQL Thread States

This article documents thread states that are related to replication slave SQL threads. These correspond to the Slave_SQL_State shown by SHOW SLAVE STATUS as well as the STATE values listed by the SHOW PROCESSLIST statement and the Information Schema PROCESSLIST as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Apply log event

Log event is being applied.

Making temp file

Creating a temporary file containing the row data as part of a LOAD DATA INFILE statement.

This page is licensed: CC BY-SA / Gnu FDL

Filesystem Optimizations

Suitability of Filesystems

The filesystem is not the most important aspect of MariaDB performance. More important are the available memory (RAM), the drive speed, and the system variable settings (see Hardware Optimization and System Variables).

Optimizing the filesystem can, however, make a noticeable difference in some cases. Among the best suited Linux filesystems are ext4, XFS and Btrfs. They are all included in the mainline Linux kernel and are widely supported and available on most Linux distributions.

The following theoretical file size and filesystem size limits apply to those filesystems:

Limit                 ext4        XFS     Btrfs
Max file size         16-256 TB   8 EB    16 EB
Max filesystem size   1 EB        8 EB    16 EB

Each has unique characteristics that are worth understanding to get the most from their usage.

Disabling Access Time

It's unlikely you'll need to record file access time on a database server, and mounting your filesystem with this disabled can give an easy improvement in performance. To do so, use the noatime option.

If you want to keep access time for log files or other system files, these can be stored on a separate drive.

Using NFS

Generally, we recommend not to use NFS (Network File System) with MariaDB, for these reasons:

  • MariaDB data and log files on NFS volumes can become locked and unavailable for use. Locking issues may occur in cases where multiple instances of MariaDB access the same data directory, or when MariaDB is shut down improperly, for instance, due to a power outage. In particular, sharing a data directory among MariaDB instances is not recommended.

  • Data inconsistencies due to messages received out of order or lost network traffic. To avoid this issue, use TCP with hard and intr mount options.

Using NFS within a professional SAN environment or other storage system tends to offer greater reliability than using NFS outside of such an environment. However, NFS within a SAN environment may be slower than directly attached or bus-attached non-rotational storage.

This page is licensed: CC BY-SA / Gnu FDL

Primary Keys with Nullable Columns

MariaDB deals with primary keys over nullable columns according to the SQL standards.

Take the following table structure:

CREATE TABLE t1(
  c1 INT NOT NULL AUTO_INCREMENT, 
  c2 INT NULL DEFAULT NULL, 
  PRIMARY KEY(c1,c2)
);

Column c2 is part of a primary key, and thus it cannot be NULL.

Before MariaDB 10.1.7, MariaDB (as well as versions of MySQL before MySQL 5.7) would silently convert it into a NOT NULL column with a default value of 0.

Since MariaDB 10.1.7, the column is converted to NOT NULL, but without a default value. If we then attempt to insert a record without explicitly setting c2, a warning (or, in strict mode, an error) will be thrown, for example:

MySQL, since 5.7, will abort such a CREATE TABLE with an error.

The behavior adheres to the SQL 2003 standard.

SQL-2003, Part II, “Foundation” says:

11.7 Syntax Rules

…

5) If the specifies PRIMARY KEY, then for each in the explicit or implicit for which NOT NULL is not specified, NOT NULL is implicit in the .

Essentially this means that all PRIMARY KEY columns are automatically converted to NOT NULL. Furthermore:

11.5 General Rules

…

3) When a site S is set to its default value,

…

b) If the data descriptor for the site includes a , then S is set to the value specified by that .

…

e) Otherwise, S is set to the null value.

There is no concept of “no default value” in the standard. Instead, a column always has an implicit default value of NULL. On insertion it might however fail the NOT NULL constraint. MariaDB and MySQL instead mark such a column as “not having a default value”. The end result is the same — a value must be specified explicitly or an INSERT will fail.

MariaDB since 10.1.7 behaves in a standard compatible manner — being part of a PRIMARY KEY, the nullable column gets an automatic NOT NULL constraint, on insertion one must specify a value for such a column. MariaDB before 10.1.7 was automatically assigning a default value of 0 — this behavior was non-standard. Issuing an error at CREATE TABLE time is also non-standard.

See Also

  • describes an edge-case that may result in replication problems when replicating from a master server before this change to a slave server after this change.

This page is licensed: CC BY-SA / Gnu FDL

SELECT Modifier Hints

HIGH_PRIORITY

HIGH_PRIORITY gives the statement a higher priority. If the table is locked, high priority SELECTs will be executed as soon as the lock is released, even if other statements are queued. HIGH_PRIORITY applies only to storage engines that use table-level locking (MyISAM, MEMORY, MERGE). See HIGH_PRIORITY and LOW_PRIORITY clauses for details.

SQL_CACHE / SQL_NO_CACHE

If the query_cache_type system variable is set to 2 or DEMAND, and the current statement is cacheable, SQL_CACHE causes the query to be cached and SQL_NO_CACHE causes the query not to be cached. For UNIONs, SQL_CACHE or SQL_NO_CACHE should be specified for the first query. See also The Query Cache for more detail and a list of the types of statements that aren't cacheable.

SQL_BUFFER_RESULT

SQL_BUFFER_RESULT forces the optimizer to use a temporary table to process the result. This is useful to free locks as soon as possible.

SQL_SMALL_RESULT / SQL_BIG_RESULT

SQL_SMALL_RESULT and SQL_BIG_RESULT tell the optimizer whether the result is very big or not. Usually, GROUP BY and DISTINCT operations are performed using a temporary table. Only if the result is very big, using a temporary table is not convenient. The optimizer automatically knows if the result is too big, but you can force the optimizer to use a temporary table with SQL_SMALL_RESULT, or avoid the temporary table using SQL_BIG_RESULT.

STRAIGHT_JOIN

STRAIGHT_JOIN applies to JOIN queries, and tells the optimizer that the tables must be read in the order they appear in the SELECT. For const and system tables this option is sometimes ignored.

SQL_CALC_FOUND_ROWS

SQL_CALC_FOUND_ROWS is only applied when using the LIMIT clause. If this option is used, MariaDB will count how many rows would match the query without the LIMIT clause. That number can be retrieved in the next query, using FOUND_ROWS().
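For instance (illustrative table and column names):

SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE col1 > 10 LIMIT 10;
-- Retrieve the number of rows the previous SELECT would have matched without the LIMIT:
SELECT FOUND_ROWS();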

USE/FORCE/IGNORE INDEX

USE INDEX, FORCE INDEX and IGNORE INDEX constrain the query planning to a specific index. For further information about some of these options, see Index Hints: How to Force Query Plans.

FORCE INDEX

Description

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

FORCE INDEX cannot force an ignored index to be used - it will be treated as if it doesn't exist.

Example

This produces:

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

See Also

  • Index Hints: How to Force Query Plans, for more details

This page is licensed: CC BY-SA / Gnu FDL

USE INDEX

You can limit which indexes are considered with the USE INDEX option.

Syntax

USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

Description

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

USE INDEX is used after the table name in the FROM clause.

USE INDEX cannot use an ignored index - it will be treated as if it doesn't exist.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

This produces:

If we had not used USE INDEX, the Name index would have been in possible keys.

See Also

  • Index Hints: How to Force Query Plans, for more details

This page is licensed: CC BY-SA / Gnu FDL

IGNORE INDEX

Syntax

IGNORE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

Description

You can tell the optimizer to not consider a particular index with the IGNORE INDEX option.

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

This is used after the table name in the FROM clause:

This produces:

See Also

  • See Index Hints: How to Force Query Plans for more details

This page is licensed: CC BY-SA / Gnu FDL

Thread Pool in MariaDB 5.1 - 5.3

This page describes the old thread pool implementation in MariaDB up to version 5.3.

It's left here because some older material refers to it.

For the current implementation, refer to the Thread Pool in MariaDB page.

Delayed Insert Connection Thread States

This article documents thread states that are related to the connection thread that processes INSERT DELAYED statements.

These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

Value
Description

Fair Choice Between Range and Index_merge Optimizations

index_merge is a method used by the optimizer to retrieve rows from a single table using several index scans. The results of the scans are then merged.

When using , if index_merge is the plan chosen by the optimizer, it will show up in the "type" column. For example:

The "rows" column gives us a way to compare efficiency between index_merge and other plans.

It is sometimes necessary to discard index_merge in favor of a different plan to avoid a combinatorial explosion of possible range and/or index_merge strategies. But, the old logic in MySQL for when index_merge was rejected caused some good index_merge plans to not even be considered. Specifically, additional AND predicates in WHERE clauses could cause an index_merge plan to be rejected in favor of a less efficient plan. The slowdown could be anywhere from 10x to over 100x. Here are two examples (based on the previous query) using MySQL:

Improvements to ORDER BY Optimization

Available tuning for ORDER BY with small LIMIT

  • In 2024, the fix for MDEV-34720 was added to MariaDB starting from 10.6. It allows one to enable an extra optimization for ORDER BY with a small LIMIT; see the optimizer_join_limit_pref_ratio optimization.

LooseScan Strategy

LooseScan is an execution strategy for semi-join subqueries.

The idea

We will demonstrate the LooseScan strategy by example. Suppose, we're looking for countries that have satellites. We can get them using the following query (for the sake of simplicity we ignore satellites that are owned by consortiums of multiple countries):

Suppose, there is an index on Satellite.country_code. If we use that index, we will get satellites in the order of their owner country:

index_merge sort_intersection

Previously, the index_merge access method supported the union, sort-union, and intersection operations. Now the sort-intersection operation is also supported. This allows the use of index_merge in a broader number of cases.

This feature is disabled by default. To enable it, turn on the optimizer switch index_merge_sort_intersection like so:
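The following enables it for the current session (SET GLOBAL can be used to change the default for new connections):

SET optimizer_switch='index_merge_sort_intersection=on';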

Storage Engine Index Types

This refers to the index_type definition when creating an index, i.e. BTREE, HASH or RTREE.

For more information on general types of indexes, such as primary keys, unique indexes etc, go to Getting Started with Indexes.

Storage Engine    Permitted Indexes
Aria              BTREE, RTREE
MyISAM            BTREE, RTREE
InnoDB            BTREE
MEMORY/HEAP       HASH, BTREE

Compression Plugins

MariaDB starting with 10.7

Compression plugins were added in a MariaDB 10.7 preview release.

The various MariaDB storage engines, such as InnoDB, RocksDB, and Mroonga, can use different compression libraries.

Before MariaDB 10.7, each separate library would have to be compiled in order to be available for use, resulting in numerous runtime/rpm/deb dependencies, most of which would never be used by users.

From MariaDB 10.7, five additional MariaDB compression libraries (besides the default zlib) are available as plugins (note that these affect InnoDB and Mroonga only; RocksDB still uses the compression algorithms from its own library):

  • bzip2

DISTINCT removal in aggregate functions

Basics

One can use the DISTINCT keyword to de-duplicate the arguments of an aggregate function. For example:
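With illustrative table and column names:

SELECT COUNT(DISTINCT col1) FROM t1;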

In order to compute this, MariaDB has to collect the values of col1 and remove the duplicates. This may be computationally expensive.

After the fix for MDEV-30660 (available from MariaDB 10.6.18, MariaDB 10.11.8, MariaDB 11.4.2, and later releases), the optimizer can detect certain cases when the argument of an aggregate function will not have duplicates and so de-duplication can be skipped.

hash_join_cardinality optimizer_switch Flag

MariaDB starting with

The hash_join_cardinality optimizer_switch flag was added in MariaDB 10.6.13, MariaDB 10.11.3, and later releases.

In MySQL and MariaDB, the output cardinality of a part of query has historically been tied to the used access method(s). This is different from the approach used in database textbooks. There, the cardinality "x JOIN y" is the same regardless of which access methods are used to compute it.

Example

Consider a query joining customers with their orders:

Sargable DATE and YEAR

Conditions in the form {DATE|YEAR}(indexed_date_col) CMP const_value are sargable, provided that

  • CMP is any of =, <=>, <, <=, >, >= .

INSERT INTO t1() VALUES();
Query OK, 1 row affected, 1 warning (0.00 sec)
Warning (Code 1364): Field 'c2' doesn't have a default value

SELECT * FROM t1;
+----+----+
| c1 | c2 |
+----+----+
|  1 |  0 |
+----+----+

Split Materialized Optimization

This is another name for Lateral Derived Optimization.

This page is licensed: CC BY-SA / Gnu FDL

upgrading lock

Attempting to get lock on the table in order to insert rows.

Waiting for INSERT

Waiting for the delayed-insert connection thread to add rows to the queue.

An event has been read from the binary log, and is now being sent to the slave.

Waiting to finalize termination

State that only occurs very briefly while the thread is terminating.


Reading event from the relay log

Reading an event from the relay log in order to process the event.

Slave has read all relay log, waiting for the slave I/O thread to update it

All relay log events have been processed, now waiting for the I/O thread to write new events to the relay log.

Waiting for work from SQL thread

In parallel replication the worker thread is waiting for more things from the SQL thread.

Waiting for prior transaction to start commit before starting next transaction

In parallel replication the worker thread is waiting for conflicting things to end before starting executing.

Waiting for worker threads to be idle

Happens in parallel replication when moving to a new binary log after a master restart. All slave temporary files are deleted, and worker threads are restarted.

Waiting due to global read lock

In parallel replication when worker threads are waiting for a global read lock to be released.

Waiting for worker threads to pause for global read lock

FLUSH TABLES WITH READ LOCK is waiting for worker threads to finish what they are doing.

Waiting while replication worker thread pool is busy

Happens in parallel replication during a FLUSH TABLES WITH READ LOCK or when changing number of parallel workers.

Waiting for other master connection to process GTID received on multiple master connections

A worker thread noticed that there is already another thread executing the same GTID from another connection and it's waiting for the other to complete.

Waiting for slave mutex on exit

Thread is stopping. Only occurs very briefly.

Waiting for the next event in relay log

State before reading next event from the relay log.


got handler lock

Lock to access the delayed-insert handler thread has been received. Follows from the waiting for handler lock state and before the allocating local table state.

got old table

The initialization phase is over. Follows from the waiting for handler open state.

storing row into queue

Adding new row to the list of rows to be inserted by the delayed-insert handler thread.

waiting for delay_list

Initializing (trying to find the delayed-insert handler thread).

waiting for handler insert

Waiting for new inserts, as all inserts have been processed.

waiting for handler lock

Waiting for delayed insert-handler lock to access the delayed-insert handler thread.

waiting for handler open

Waiting for the delayed-insert handler thread to initialize. Follows from the Creating delayed handler state and before the got old table state.

This page is licensed: CC BY-SA / Gnu FDL

allocating local table

Preparing to allocate rows to the delayed-insert handler thread. Follows from the got handler lock state.

Creating delayed handler

Creating a handler for the delayed-inserts.


BTREE is generally the default index type. For MEMORY tables, HASH is the default. TokuDB uses a particular data structure called fractal trees, which is optimized for data that do not entirely fit memory.

Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.

B-tree Indexes

B-tree indexes are used for column comparisons using the >, >=, =, <=, < or BETWEEN operators, as well as for LIKE comparisons that begin with a constant.

For example, the query SELECT * FROM Employees WHERE First_Name LIKE 'Maria%'; can make use of a B-tree index, while SELECT * FROM Employees WHERE First_Name LIKE '%aria'; cannot.

B-tree indexes also permit leftmost prefixing for searching of rows.

If the number of rows doesn't change, hash indexes occupy a fixed amount of memory, which is lower than the memory occupied by BTREE indexes.

Hash Indexes

Hash indexes, in contrast, can only be used for equality comparisons, so those using the = or <=> operators. They cannot be used for ordering, and provide no information to the optimizer on how many rows exist between two values.

Hash indexes do not permit leftmost prefixing - only the whole index can be used.

R-tree Indexes

See SPATIAL for more information.

This page is licensed: CC BY-SA / Gnu FDL


CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	range	Name	Name	35	NULL	4079	Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where
About Pool of Threads

This is an extended version of the pool-of-threads code from MySQL 6.0. This allows you to use a limited set of threads to handle all queries, instead of the old 'one-thread-per-connection' style. In recent times, it's also been referred to as "thread pool" or "thread pooling", as this feature (in a different implementation) is available in Enterprise editions of MySQL (not in the Community edition).

This can be a very big win if most of your queries are short running queries and there are few table/row locks in your system.

Instructions

To enable pool-of-threads you must first run configure with the --with-libevent option. (This is automatically done if you use any 'max' scripts in the BUILD directory):

When starting mysqld with the pool of threads code you should use:

Default values are:

One issue with pool-of-threads is that if all worker threads are doing work (like running long queries) or are locked by a row/table lock no new connections can be established and you can't login and find out what's wrong or login and kill queries.

To help this, we have introduced two new options for mysqld; extra_port and extra_max_connections:

If extra-port is <> 0, then you can connect max_connections normal connections plus 1 extra SUPER user through the 'extra-port' TCP/IP port. These connections use the old one-thread-per-connection method.

To connect through the extra port, use:

This allows you to freely choose, on a per-connection basis, the optimal connection/thread model.

See also

  • Thread-handling and thread-pool-size variables

  • How MySQL Uses Threads for Client Connections

This page is licensed: CC BY-SA / Gnu FDL

In the above output, the "rows" column shows that the first is almost 10x less efficient and the second is over 15x less efficient than index_merge.

The optimizer now delays discarding potential index_merge plans until the point where it is really necessary.

By not discarding potential index_merge plans until absolutely necessary, the two queries stay just as efficient as the original:

This new behavior is always on and there is no need to enable it. There are no known issues or gotchas with this new optimization.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB [ontime]> SELECT COUNT(*) FROM ontime;
+--------+
|count(*)|
+--------+
| 1578171|
+--------+

MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
|id|select_type|table |type       |possible_keys|key        |key_len|ref |rows |Extra                                 |
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
| 1|SIMPLE     |ontime|index_merge|Origin,Dest  |Origin,Dest|6,6    |NULL|92800|Using union (Origin,Dest); Using where|
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
Older optimizations

MariaDB 10.1 brought several improvements to the ORDER BY optimizer.

The fixes were made as a response to complaints by MariaDB customers, so they fix real-world optimization problems. The fixes are a bit hard to describe (as the ORDER BY optimizer is complicated), but here's a short description:

The ORDER BY optimizer:

  • Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402).

    • This also fixes MySQL Bug#12113.

  • Always uses “range” access (and not a full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657).

  • Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (, ).

    • This change also fixes .

  • Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped ().

Extra optimizations

  • The ORDER BY optimizer takes multiple-equalities into account (MDEV-8989). This optimization is enabled by default.

Comparison with MySQL 5.7

In MySQL 5.7 changelog, one can find this passage:

Make switching of index due to small limit cost-based (WL#6986) : We have made the decision in make_join_select() of whether to switch to a new index in order to support "ORDER BY ... LIMIT N" cost-based. This work fixes Bug#73837.

MariaDB is not using Oracle's fix (we believe make_join_select is not the right place to do ORDER BY optimization), but the effect is the same.

See Also

  • Blog post MariaDB 10.1: Better query optimization for ORDER BY … LIMIT

This page is licensed: CC BY-SA / Gnu FDL

The LooseScan strategy doesn't really need ordering, what it needs is grouping. In the above figure, satellites are grouped by country. For instance, all satellites owned by Australia come together, without being mixed with satellites of other countries. This makes it easy to select just one satellite from each group, which you can join with its country and get a list of countries without duplicates:
[image: loosescan-diagram-no-where]

LooseScan in action

The EXPLAIN output for the above query looks as follows:

Factsheet

  • LooseScan avoids the production of duplicate record combinations by putting the subquery table first and using its index to select one record from multiple duplicates

  • Hence, in order for LooseScan to be applicable, the subquery should look like:

or

  • LooseScan can handle correlated subqueries

  • LooseScan can be switched off by setting the loosescan=off flag in the optimizer_switch variable.
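For example:

SET optimizer_switch='loosescan=off';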

This page is licensed: CC BY-SA / Gnu FDL

Limitations of index_merge/intersection

Previously, the index_merge access method had one intersection strategy, called intersection. That strategy can only be used when merged index scans produce rowid-ordered streams. In practice this means that an intersection could only be constructed from equality (=) conditions.

For example, the following query will use intersection:

but if you replace OriginState ='CA' with OriginState IN ('CA', 'GB') (which matches the same number of records), then intersection is not usable anymore:

The latter query would also run about 5 times slower (from 2.2 to 10.8 seconds) in our experiments.

How index_merge/sort_intersection improves the situation

When index_merge_sort_intersection is enabled, index_merge intersection plans can be constructed from non-equality conditions:

In our tests, this query ran in 3.2 seconds, which is not as good as the case with two equalities, but still much better than 10.8 seconds we were getting without sort_intersect.

The sort_intersect strategy has higher overhead than intersect but is able to handle a broader set of WHERE conditions.

[image: intersect-vs-sort-intersect]

When to Use

index_merge/sort_intersection works best on tables with lots of records and where intersections are sufficiently large (but still small enough to make a full table scan overkill).

The benefit is expected to be bigger for io-bound loads.

This page is licensed: CC BY-SA / Gnu FDL

  • lzma

  • lz4

  • lzo

  • snappy

  • Installing

    Depending on how MariaDB was installed, the libraries may already be available for installation, or may first need to be installed as .deb or .rpm packages, for example:

    Once available, install as a plugin, for example:

    The compression algorithm can then be used, for example, in InnoDB compression:
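A sketch of these two steps, assuming the lz4 provider (the plugin name provider_lz4 is an assumption; check the plugin names shipped with your release):

INSTALL SONAME 'provider_lz4';
-- Use the algorithm for InnoDB page compression:
SET GLOBAL innodb_compression_algorithm = 'lz4';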

    Upgrading

    When upgrading from a release without compression plugins, if a non-zlib compression algorithm was used, those tables will be unreadable until the appropriate compression library is installed. mariadb-upgrade should be run. The --force option (to run mariadb-check) or mariadb-check itself will indicate any problems with compression, for example:

    or

    In this case, the appropriate compression plugin should be installed, and the server restarted.

    See Also

    • 10.7 preview feature: Compression Provider Plugins (mariadb.org blog)

    • Add zstd as a compression plugin - MDEV-34290

    This page is licensed: CC BY-SA / Gnu FDL


    When one can skip de-duplication

    A basic example: if we're doing a select from one table, then the values of primary_key are already distinct:
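With illustrative names, where pk_col is the table's primary key:

SELECT COUNT(DISTINCT pk_col) FROM t1;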

    If the SELECT has other constant tables, that's also ok, as they will not create duplicates.

    The next step: a part of the primary key can be "bound" by the GROUP BY clause. Consider a query:
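A sketch with illustrative names:

SELECT COUNT(DISTINCT pk1) FROM t1 GROUP BY pk2;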

    Suppose the table has PRIMARY KEY(pk1, pk2). Grouping by pk2 fixes the value of pk2 within each group. Then, the values of pk1 must be unique within each group, and de-duplication is not necessary.

    Observability

    EXPLAIN or EXPLAIN FORMAT=JSON do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for aggregator_type:

    When de-duplication is necessary, it will show:

    When de-duplication is not necessary, it will show:

    This page is licensed: CC BY-SA / Gnu FDL

Suppose, table orders has an index IDX on orders.customer_id.

    If the query plan is using this index to fetch orders for each customer, the optimizer will use index statistics from IDX to estimate the number of rows in the customer-joined-with-orders.

    On the other hand, if the optimizer considers a query plan that joins customer with orders without use of indexes, it will ignore the customer.id = orders.customer_id equality completely and will compute the output cardinality as if customer was cross-joined with orders.

    Hash Join

MariaDB supports hash join. It is not enabled by default; one needs to set join_cache_level to 3 or a bigger value to enable it.
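For example, for the current session:

SET SESSION join_cache_level = 3;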

    Before MDEV-30812, Query optimization for Block Hash Join would work as described in the above example: It would assume that the join operation is a cross join.

    MDEV-30812 introduces a new optimizer_switch flag, hash_join_cardinality. In MariaDB versions before 11.0, it is off by default.

    If one sets it to ON, the optimizer will make use of column histograms when computing the cardinality of hash join operation output.

    One can see the computation in the Optimizer Trace, search for hash_join_cardinality.

    This page is licensed: CC BY-SA / Gnu FDL


  • indexed_date_col has a type of DATE, DATETIME or TIMESTAMP and is a part of some index.

    One can swap the left and right hand sides of the equality: const_value CMP {DATE|YEAR}(indexed_date_col) is also handled.

    Sargable here means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or use them to perform partition pruning.

    Implementation

    Internally, the optimizer rewrites the condition to an equivalent condition which doesn't use YEAR or DATE functions.

    For example, YEAR(date_col)=2023 is rewritten intodate_col between '2023-01-01' and '2023-12-31'.

    Similarly, DATE(datetime_col) <= '2023-06-01' is rewritten intodatetime_col <= '2023-06-01 23:59:59'.

    Controlling the Optimization

    The optimization is always ON, there is no Optimizer Switch flag to control it.

    Optimizer Trace

    The rewrite is logged as date_conds_into_sargable transformation. Example:

    References

    • MDEV-8320: Allow index usage for DATE(datetime_column) = const

    This page is licensed: CC BY-SA / Gnu FDL

    MDEV-12248

    Replica I/O Thread States

    This article documents thread states that are related to replica I/O threads. These correspond to the Slave_IO_State shown by SHOW REPLICA STATUS and the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.

    Value
    Description

    Checking master version

    Checking the primary's version, which only occurs very briefly after establishing a connection with the primary.

    Connecting to master

    Attempting to connect to primary.

    This page is licensed: CC BY-SA / Gnu FDL

    Index Statistics

    Index statistics provide crucial insights to the MariaDB query optimizer, guiding it in executing queries efficiently. Up-to-date index statistics ensure optimized query performance.

    How Index Statistics Help the Query Optimizer

    Understanding index statistics is crucial for the MariaDB query optimizer to efficiently execute queries. Accurate and current statistics guide the optimizer in choosing the best way to access data, similar to using a personal address book for quicker searches rather than a larger phone book. Up-to-date index statistics ensure optimized query performance.

    Value Groups

    The statistics primarily focus on groups of index elements with identical values. In a primary key, each index is unique, resulting in a group size of one. In a non-unique index, multiple keys may share the same value. The worst-case scenario involves large groups with identical values, such as an index on a boolean field.

    MariaDB makes heavy use of the average group size statistic. For example, if there are 100 rows, and twenty groups with the same index values, the average group size would be five.

However, averages can be skewed by extremes, and the usual culprit is NULL values. The 100 rows may have 19 groups with an average size of one, while the other 81 values are all NULL. MariaDB may think five is a good average group size and choose to use that index, and then end up having to read through 81 rows with identical keys, taking longer than an alternative approach.

    Dealing with NULLs

There are three main approaches to the problem of NULLs. NULL index values can be treated as a single group (nulls_equal). This is usually fine, but if you have large numbers of NULLs the average group size is slanted higher, and the optimizer may miss using the index for ref accesses when it would be useful. This is the default used by InnoDB. The opposite approach is nulls_unequal, with each NULL forming its own group of one. Conversely, the average group size is slanted lower, and the optimizer may use the index for ref accesses when not suitable. This is the default used by the Aria and MyISAM storage engines. A third option, nulls_ignored, sees NULLs ignored altogether from index group calculations.

The default approaches can be changed by setting the aria_stats_method, innodb_stats_method, and myisam_stats_method server variables.

    Null-Safe and Regular Comparisons

The comparison operator used plays an important role. If two values are compared with <=> (the null-safe equality comparison operator), and both are null, 1 is returned. If the same values are compared with = (the regular equality comparison operator), null is returned. For example:
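Both operands here are literal NULLs, so no table is required:

SELECT NULL <=> NULL, NULL = NULL;
-- The null-safe comparison returns 1; the regular comparison returns NULL.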

    Engine-Independent Statistics

MariaDB introduced a way to gather statistics independently of the storage engine. See Engine-Independent Table Statistics.

    Histogram-Based Statistics

Histogram-based statistics were introduced later and are now collected by default.

    See Also

    • User Statistics. This plugin provides user, client, table and index usage statistics.

    This page is licensed: CC BY-SA / Gnu FDL

    Aborting Statements that Exceed a Certain Time to Execute

    Overview

MariaDB introduced the max_statement_time system variable. When set to a non-zero value, the server attempts to abort any queries taking longer than this time in seconds.

    The abortion is not immediate; the server checks the timer status at specific intervals during execution. Consequently, a query may run slightly longer than the specified time before being detected and stopped.

    The default is zero, and no limits are then applied. The aborted query has no effect on any larger transaction or connection contexts. The variable is of type double, thus you can use subsecond timeout. For example you can use value 0.01 for 10 milliseconds timeout.

The value can be set globally or per session, as well as per user or per query (see below). Replicas are not affected by this variable; however, there is slave_max_statement_time which serves the same purpose on replicas only.

An associated status variable, max_statement_time_exceeded, stores the number of queries that have exceeded the execution time specified by max_statement_time, and a MAX_STATEMENT_TIME_EXCEEDED column was added to the CLIENT_STATISTICS and USER_STATISTICS Information Schema tables.

    The feature was based upon a patch by Davi Arnaut.

    Important Note on Reliability

    MAX_STATEMENT_TIME relies on the execution thread checking the "killed" flag, which happens intermittently.

    • Long Running Operations: If a query enters a long processing phase where the flag is not checked (e.g., certain storage engine operations or complex calculations), it may continue running significantly past the limit.

    User

max_statement_time can be stored per user with the GRANT ... MAX_STATEMENT_TIME syntax.

    Per-query

By using max_statement_time in conjunction with SET STATEMENT, it is possible to limit the execution time of individual queries. For example:
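A sketch with illustrative table and column names, limiting one statement to 100 seconds:

SET STATEMENT max_statement_time=100 FOR
  SELECT field1 FROM table_name ORDER BY field1;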

Individual queries can also be limited by adding a MAX_STATEMENT_TIME clause to the query.

    Limitations

    • max_statement_time does not work in embedded servers.

    • max_statement_time does not work for statements in a Galera cluster.

    • Check Intervals: The timeout is checked only at specific points during query execution. Queries stuck in operations where the check code path is not hit will not abort until they reach a checkpoint. This can result in query times exceeding the MAX_STATEMENT_TIME value.

    Differences Between the MariaDB and MySQL Implementations

    MySQL 5.7.4 introduced similar functionality, but the MariaDB implementation differs in a number of ways.

    • The MySQL version of the variable (max_execution_time) is defined in milliseconds, not seconds.

    • MySQL's implementation can only kill SELECTs, while MariaDB's can kill any queries (excluding stored procedures).

    • MariaDB only introduced the max_statement_time_exceeded status variable, while MySQL also introduced a number of other variables which were not seen as necessary in MariaDB.

    See Also

    • The max_statement_time variable

    This page is licensed: CC BY-SA / Gnu FDL

    Condition Pushdown into Derived Table Optimization

    If a query uses a derived table (or a view), the first action that the query optimizer will attempt is to apply the derived-table-merge-optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as the top-level operation. If it has a GROUP-BY, DISTINCT, or uses window functions, then derived-table-merge-optimization is not applicable.

    In that case, the Condition Pushdown optimization is applicable.

    Introduction to Condition Pushdown

    Consider an example
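A sketch of the example, using a hypothetical view OCT_TOTALS that aggregates October order totals per customer (all names here are illustrative):

CREATE VIEW OCT_TOTALS AS
SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY customer_id;

SELECT * FROM OCT_TOTALS WHERE customer_id=1;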

    The naive way to execute the above is to

    1. Compute the OCT_TOTALS contents (for all customers).

    2. Then, select the line with customer_id=1.

    This is obviously inefficient, if there are 1000 customers, then one will be doing up to 1000 times more work than necessary.

    However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.

Inside OCT_TOTALS, the added condition is put into its HAVING clause, so we end up with:
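Continuing the sketch above:

SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY customer_id
HAVING customer_id=1;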

    Then, parts of HAVING clause that refer to GROUP BY columns are moved into the WHERE clause:
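Again continuing the sketch, since customer_id is a GROUP BY column:

SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
      AND customer_id=1
GROUP BY customer_id;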

    Once a restriction like customer_id=1 is in the WHERE, the query optimizer can use it to construct efficient table access paths.

    Controlling the Optimization

    The optimization is enabled by default. One can disable it by setting the flag condition_pushdown_for_derived to OFF.
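For example:

SET optimizer_switch='condition_pushdown_for_derived=off';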

The pushdown from HAVING to WHERE is controlled by the condition_pushdown_from_having flag in optimizer_switch.

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint. In earlier versions, no optimizer hint is available.

    See Also

    • Condition Pushdown through Window Functions

    • The Jira task for the feature.

    This page is licensed: CC BY-SA / Gnu FDL

    Derived Table Merge Optimization

    Background

    Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with population greater than 10,000 people, and then that from these cities to select those that are located in Germany, one could write this SQL:
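A reconstruction of that query, based on the subquery and the outer condition quoted later on this page:

SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) AS big_city
WHERE big_city.Country='DEU';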

    For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

    It plans to do the following actions:

[image: derived-inefficent]

    From left to right:

    1. Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.

    2. Put result of the subquery into a temporary table.

    3. Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Executing a subquery like this is very inefficient, because the highly-selective condition from the parent select, (Country='DEU'), is not used when scanning the base table City. We read too many records from the City table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.

    Derived table merge in action

    If one runs this query in MariaDB/MySQL 5.6, they get this:

    From the above, one can see that:

    1. The output has only one line. This means that the subquery has been merged into the top-level SELECT.

    2. Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

    Factsheet

    • Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.

    • The optimization is enabled by default. It can be disabled with:
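A session-level example:

SET optimizer_switch='derived_merge=off';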

    • Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem of EXPLAIN statements taking a very long time. In later versions of MariaDB and in MySQL 5.6+, EXPLAIN commands execute instantly, regardless of the derived_merge setting.

    See Also

    • FAQ entry:

    This page is licensed: CC BY-SA / Gnu FDL

    Derived Table with Key Optimization

    The idea

    If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and then parent select will treat it as a regular base table.

Before MySQL 5.6 and the corresponding MariaDB version, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.

    Example

    Consider a query: we want to find countries in Europe, that have more than one million people living in cities. This is accomplished with this query:
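A sketch of such a query, reconstructed from the EXPLAIN discussion below (the alias cities_in_country and the join on Country.Code are referenced there; the exact select list is an assumption):

SELECT Country.Name, cities_in_country.urban_population
FROM Country,
     (SELECT Country, SUM(Population) AS urban_population
      FROM City
      GROUP BY Country
      HAVING urban_population > 1*1000*1000) AS cities_in_country
WHERE Country.Code = cities_in_country.Country
  AND Country.Continent = 'Europe';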

    The EXPLAIN output for it will show:

    One can see here that

    • table <derived2> is accessed through key0.

    • ref column shows world.Country.Code

    • if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country

    Factsheet

    • The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.

    • The optimization is applied when the derived table could not be merged into its parent SELECT

      • which happens when the derived table doesn't meet criteria for mergeable VIEW

    • The optimization is ON by default; it can be switched off like so:
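Assuming the optimizer_switch flag is named derived_with_keys (an assumption, as the name is not given above):

SET optimizer_switch='derived_with_keys=off';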

    See Also

    • in MySQL 5.6 manual

    This page is licensed: CC BY-SA / Gnu FDL

    Filesort with Small LIMIT Optimization

    Optimization Description

When the limit n in an ORDER BY ... LIMIT n query is sufficiently small, the optimizer will use a priority queue for sorting. Previously, the alternative was, roughly speaking, to sort the entire output and then pick only the first n rows.

    NOTE: The problem of choosing which index to use for query with ORDER BY ... LIMIT is a different problem, see optimizer_join_limit_pref_ratio-optimization.

    Optimization Visibility in MariaDB

    There are two ways to check whether filesort has used a priority queue.

    Status Variable

The first way is to check the Sort_priority_queue_sorts status variable. It shows the number of times that sorting was done through a priority queue. (The total number of times sorting was done is the sum of Sort_range and Sort_scan.)

    Slow Query Log

The second way is to check the slow query log. When one uses extended statistics in the slow query log and specifies log_slow_verbosity=query_plan, entries look like this:

    Note the "Priority_queue: Yes" on the last comment line. (pt-query-digest is able to parse slow query logs with the Priority_queue field)

    As for EXPLAIN, it will give no indication whether filesort uses priority queue or the generic quicksort and merge algorithm. Using filesort will be shown in both cases, by both MariaDB and MySQL.

    See Also

• The LIMIT Optimization page in the MySQL 5.6 manual (search for "priority queue").

• MySQL WorkLog entry WL#1393

• MDEV-415, MDEV-6430

    This page is licensed: CC BY-SA / Gnu FDL

    Sargable UPPER

In MariaDB versions that include this optimization, expressions in the form

    are sargable if key_col uses either the utf8mb3_general_ci or utf8mb4_general_ci collation.

UCASE is a synonym for UPPER, so it is covered as well.

    Sargable means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or perform partition pruning.

    Example

    Note that ref access is used.

    An example with join:

    Here, the optimizer was able to construct ref access.

    Controlling the Optimization

The optimizer_switch variable has the flag sargable_casefold to turn the optimization on and off. The default is ON.
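For example, to switch it off for the current session:

SET optimizer_switch='sargable_casefold=off';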

    Optimizer Trace

    The optimization is implemented as a rewrite for a query's WHERE/ON conditions. It uses the sargable_casefold_removal object name in the trace:

    References

• MDEV-31496: Make optimizer handle UCASE(varchar_col)=...

• An analog for LCASE is not possible. See MDEV-31955: Make optimizer handle LCASE(varchar_col)=... for details.

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Command Values

A thread can have any of the following COMMAND values (displayed by the COMMAND field listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST table, as well as the PROCESSLIST_COMMAND value listed in the Performance Schema threads table). These indicate the nature of the thread's activity.

    Value
    Description

    Storage-Engine Independent Column Compression

Storage-engine independent column compression enables TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, VARCHAR, and VARBINARY columns to be compressed.

This is performed by means of a new COMPRESSED column attribute: COMPRESSED[=<compression_method>]

    Currently the only supported compression method is zlib.
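For example, a column can be declared as compressed like this (table and column names are illustrative):

CREATE TABLE users (
  id     INT PRIMARY KEY,
  bio    TEXT COMPRESSED,
  avatar MEDIUMBLOB COMPRESSED=zlib
);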

    Field Length Compatibility

    FirstMatch Strategy

FirstMatch is an execution strategy for semi-join subqueries.

    The idea

    It is very similar to how IN/EXISTS subqueries were executed in MySQL 5.x.

    Let's take the usual example of a search for countries with big cities:

    Suppose, our execution plan is to find countries in Europe, and then, for each found country, check if it has any big cities. Regular inner join execution will look as follows:

    Optimizing for "Latest News"-style Queries

    The problem space

    Let's say you have "news articles" (rows in a table) and want a web page showing the latest ten articles about a particular topic.

    Variants on "topic":

    • Category

    OPTIMIZE TABLE

    Syntax

    Description

    OPTIMIZE TABLE has two main functions. It can either be used to defragment tables, or to update the InnoDB fulltext index.

    Equality propagation optimization

    Basic idea

Consider a query with a WHERE clause of the form WHERE col1=col2 AND ... .

The WHERE clause will compute to true only if col1=col2. This means that in the rest of the WHERE clause, occurrences of col1 can be substituted with col2 (with some limitations, which are discussed in the next section). This allows the optimizer to infer additional restrictions.

For example, a clause like WHERE col1=col2 AND col1=123 allows the optimizer to infer a new equality:

    Rollup Unique User Counts

    The Problem

    The normal way to count "Unique Users" is to take large log files, sort by userid, dedup, and count. This requires a rather large amount of processing. Furthermore, the count derived cannot be rolled up. That is, daily counts cannot be added to get weekly counts -- some users will be counted multiple times.

So, the problem is to store the counts in such a way as to allow rolling up.

    ./configure --with-libevent
    mysqld --thread-handling=pool-of-threads --thread-pool-size=20
    thread-handling=  one-thread-per-connection
    thread-pool-size= 20
    --extra-port=#             (Default 0)
    --extra-max-connections=#  (Default 1)
    mysql --port='number-of-extra-port' --protocol=tcp
    MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    |id|select_type|table |type|possible_keys            |key          |key_len|ref  |rows  |Extra      |
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    | 1|SIMPLE     |ontime|ref |Origin,Dest,SecurityDelay|SecurityDelay|5      |const|791546|Using where|
    +--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
    
    MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
    |id|select_type|table |type|possible_keys       |key |key_len|ref |rows   |Extra      |
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
    | 1|SIMPLE     |ontime|ALL |Origin,DepDelay,Dest|NULL|NULL   |NULL|1583093|Using where|
    +--+-----------+------+----+--------------------+----+-------+----+-------+-----------
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys|key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,Dest  |Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
    
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys            |key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,Dest,SecurityDelay|Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
    
    MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    |id|select_type|table |type       |possible_keys       |key        |key_len|ref |rows |Extra                                |
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|Origin,DepDelay,Dest|Origin,Dest|6,6    |NULL|92800|Using union(Origin,Dest); Using where|
    +--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
    SELECT * FROM Country  
    WHERE 
      Country.code IN (SELECT country_code FROM Satellite)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select country_code from Satellite);
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    | id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    |  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
    |  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
    +----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
    expr IN (SELECT tbl.keypart1 FROM tbl ...)
    expr IN (SELECT tbl.keypart2 FROM tbl WHERE tbl.keypart1=const AND ...)
    SET optimizer_switch='index_merge_sort_intersection=on'
    MySQL [ontime]> EXPLAIN SELECT AVG(arrdelay) FROM ontime WHERE depdel15=1 AND OriginState ='CA';
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    |id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                            |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|OriginState,DepDel15|3,5    |NULL|76952|Using intersect(OriginState,DepDel15);Using where|
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
    MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    |id|select_type|table |type|possible_keys       |key     |key_len|ref  |rows |Extra      |
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    | 1|SIMPLE     |ontime|ref |OriginState,DepDel15|DepDel15|5      |const|36926|Using where|
    +--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
    MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    |id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                                   |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    | 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|DepDel15,OriginState|5,3    |NULL|60754|Using sort_intersect(DepDel15,OriginState); Using where |
    +--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
    apt-get install mariadb-plugin-provider-lz4
    INSTALL SONAME 'provider_lz4';
    SET GLOBAL innodb_compression_algorithm = lz4;
    Warning  : MariaDB tried to use the LZMA compression, but its provider plugin is not loaded
    
    Error    : Table 'test.t' doesn't exist in engine
    
    status   : Operation failed
    Error    : Table test/t is compressed with lzma, which is not currently loaded. 
      Please load the lzma provider plugin to open the table
    
    error    : Corrupt
    SELECT COUNT(DISTINCT col1) FROM tbl1;
    SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;
    SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;
    {
                "prepare_sum_aggregators": {
                  "function": "count(distinct t1.col1)",
                  "aggregator_type": "distinct"
                }
              }
    {
                "prepare_sum_aggregators": {
                  "function": "count(distinct t1.pk1)",
                  "aggregator_type": "simple"
                }
              }
    SELECT * 
    FROM
      customer, orders, ...
    WHERE 
      customer.id = orders.customer_id AND ...
    YEAR(indexed_date_col) CMP const_value
    DATE(indexed_date_col) CMP const_value
    {
                "transformation": "date_conds_into_sargable",
                "before": "cast(t1.datetime_col as date) <= '2023-06-01'",
                "after": "t1.datetime_col <= '2023-06-01 23:59:59'"
              },
    CREATE VIEW OCT_TOTALS AS
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY customer_id;
    
    SELECT * FROM OCT_TOTALS WHERE customer_id=1
    SELECT * 
    FROM 
      (SELECT * FROM City WHERE Population > 10*1000) AS big_city
    WHERE 
      big_city.Country='DEU'
    mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
      AS big_city WHERE big_city.Country='DEU' ;
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
    |  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    2 rows in set (0.60 sec)
    UPPER(key_col) = expr
    UPPER(key_col) IN (constant-list)
    The solution

    Let's think about what we can do with a hash of the userid. The hash could map to a bit in a bit string. A BIT_COUNT of the bit string would give the 1-bits, representing the number of users. But that bit string would have to be huge. What if we could use shorter bit strings? Then different userids would be folded into the same bit. Let's assume we can solve that.

    Meanwhile, what about the rollup? The daily bit strings can be OR'd together to get a similar bit string for the week.

    We have now figured out how to do the rollup, but have created another problem -- the counts are too low.

    Inflating the BIT_COUNT

    A sufficiently random hash (eg MD5) will fold userids into the same bits with a predictable frequency. We need to figure this out, and work backwards. That is, given that X percent of the bits are set, we need a formula that says approximately how many userids were used to get those bits.

    I simulated the problem by generating random hashes and calculated the number of bits that would be set. Then, with the help of Eureqa software, I derived the formula:

Y = 0.5456*X + 0.6543*tan(1.39*X*X*X)

    How good is it?

    The formula is reasonably precise. It is usually within 1% of the correct value; rarely off by 2%.

Of course, if virtually all the bits are set, the formula can't be very precise. Hence, you need to plan to have the bit strings big enough to handle the expected number of Uniques. In practice, you can use less than 1 bit per Unique. This would be a huge space savings over trying to save all the userids.

Another suggestion... If you are rolling up over a big span of time (eg hourly -> monthly), the bit strings must all be the same length, and the monthly string must be big enough to handle the expected count. This is likely to lead to very sparse hourly bit strings. Hence, it may be prudent to compress the hourly strings.

    Postlog

    Invented Nov, 2013; published Apr, 2014

Future: Rick is working on actual code (Sep, 2016). It is complicated by bit-wise operations being limited to BIGINT. However, with MySQL 8.0 (freshly released), the desired bit-wise operations can be applied to BLOB, greatly simplifying my code. I hope to publish the pre-8.0 code soon; 8.0 code later.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: uniques

    This page is licensed: CC BY-SA / Gnu FDL

    When using the COMPRESSED attribute, note that FIELD LENGTH is reduced by 1; for example, a BLOB has a length of 65535, while BLOB COMPRESSED has 65535-1. See MDEV-15592.

    New System Variables

    column_compression_threshold

    • Description: Minimum column data length eligible for compression.

    • Command line: --column-compression-threshold=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: numeric

    • Default Value: 100

    • Range: 0 to 4294967295

    column_compression_zlib_level

    • Description: zlib compression level (1 gives best speed, 9 gives best compression).

    • Command line: --column-compression-zlib-level=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: numeric

    • Default Value: 6

    • Range: 1 to 9

    column_compression_zlib_strategy

    • Description: The strategy parameter is used to tune the compression algorithm. Use the value DEFAULT_STRATEGY for normal data, FILTERED for data produced by a filter (or predictor), HUFFMAN_ONLY to force Huffman encoding only (no string match), or RLE to limit match distances to one (run-length encoding). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between DEFAULT_STRATEGY and HUFFMAN_ONLY. RLE is designed to be almost as fast as HUFFMAN_ONLY, but give better compression for PNG image data. The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately. FIXED prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.

    • Command line: --column-compression-zlib-strategy=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: enum

    • Default Value: DEFAULT_STRATEGY

    • Valid Values: DEFAULT_STRATEGY, FILTERED, HUFFMAN_ONLY, RLE, FIXED

    column_compression_zlib_wrap

    • Description: If set to 1 (0 is default), generate zlib header and trailer and compute adler32 check value. It can be used with storage engines that don't provide data integrity verification to detect data corruption.

    • Command line: --column-compression-zlib-wrap{=0|1}

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: boolean

    • Default Value: OFF

    New Status Variables

    Column_compressions

    • Description: Incremented each time field data is compressed.

    • Scope: Global, Session

    • Data Type: numeric

    Column_decompressions

    • Description: Incremented each time field data is decompressed.

    • Scope: Global, Session

    • Data Type: numeric

    Limitations

    • The only supported method currently is zlib.

    • The CSV storage engine stores data uncompressed on-disk even if the COMPRESSED attribute is present.

    • It is not possible to create indexes over compressed columns.

    Comparison with InnoDB Page Compression

Storage-engine independent column compression differs from InnoDB page compression in a number of ways.

    • It is storage engine independent, while InnoDB page compression applies to InnoDB only.

    • By being specific to a column, one can access non-compressed fields without the decompression overhead.

    • Only zlib is available, while InnoDB page compression can offer alternative compression algorithms.

    • It is not recommended to use multiple forms of compression over the same data.

    • It is intended for compressing large blobs, while InnoDB page compression is suitable for a more general case.

    • Columns cannot be indexed, while with InnoDB page compression indexes are possible as usual.

    Examples

    See Also

    • InnoDB Page Compression

    • InnoDB Compressed Row Format

    This page is licensed: CC BY-SA / Gnu FDL

    col2=123

Similarly, WHERE col1=col2 AND col1 < 10 allows the optimizer to infer that col2 < 10.

    Identity and comparison substitution

    There are some limitations to where one can do the substitution, though.

    The first and obvious example is the string datatype and collations. Most commonly-used collations in SQL are "case-insensitive", that is 'A'='a'. Also, most collations have a "PAD SPACE" attribute, which means that comparison ignores the spaces at the end of the value, 'a'='a '.

    Now, consider a query:

Here, col1=col2, the values are "equal". At the same time LENGTH(col1)=2, while LENGTH(col2)=4, which means one can't perform the substitution for the argument of LENGTH(...).

It's not only collations. There are similar phenomena when equality compares columns of different datatypes. The exact criteria for when they happen are rather convoluted.

    The take-away is: sometimes, X=Y does not mean that one can replace any reference to X with Y. What one CAN do is still replace the occurrence in the comparisons <, >, >=, <=, etc.

    This is how we get two kinds of substitution:

    • Identity substitution: X=Y, and any occurrence of X can be replaced with Y.

    • Comparison substitution: X=Y, and an occurrence of X in a comparison (X<Z) can be replaced with Y (Y<Z).

    Place in query optimization

    (A draft description): Let's look at how Equality Propagation is integrated with the rest of the query optimization process.

    • First, multiple-equalities are built (TODO example from optimizer trace)

      • If multiple-equality includes a constant, fields are substituted with a constant if possible.

    • From this point, all optimizations like range optimization, ref access, etc make use of multiple equalities: when they see a reference to tableX.columnY somewhere, they also look at all the columns that tableX.columnY is equal to.

    • After the join order is picked, the optimizer walks through the WHERE clause and substitutes each field reference with the "best" one - the one that can be checked as soon as possible.

      • Then, the parts of the WHERE condition are attached to the tables where they can be checked.

    Interplay with ORDER BY optimization

    Consider a query:

Suppose there is an INDEX(col1). The MariaDB optimizer is able to figure out that it can use an index on col1 (or sort by the value of col1) in order to resolve ORDER BY col2.

    Optimizer trace

    Look at these elements:

    • condition_processing

    • attaching_conditions_to_tables

    More details

    Equality propagation doesn't just happen at the top of the WHERE clause. It is done "at all levels" where a level is:

    • A top level of the WHERE clause.

    • If the WHERE clause has an OR clause, each branch of the OR clause.

    • The top level of any ON expression

    • (the same as above about OR-levels)

    This page is licensed: CC BY-SA / Gnu FDL

    CREATE TABLE cmp (i TEXT COMPRESSED);
    
    CREATE TABLE cmp2 (i TEXT COMPRESSED=zlib);
    WHERE col1=col2 AND ...
    WHERE col1=col2 AND col1=123
    WHERE col1=col2 AND col1 < 10
    INSERT INTO t1 (col1, col2) VALUES ('ab', 'ab   ');
    SELECT * FROM t1 WHERE col1=col2 AND LENGTH(col1)=2
    SELECT ... FROM ... WHERE col1=col2 ORDER BY col2


    Resource Protection: Because the abort is not guaranteed to be instantaneous or strictly enforced in all code paths, MAX_STATEMENT_TIME should not be relied upon as the sole mechanism for preventing resource exhaustion (such as filling up temporary disk space).

The SELECT MAX_STATEMENT_TIME = N ... syntax is not valid in MariaDB. In MariaDB one should use SET STATEMENT MAX_STATEMENT_TIME=N FOR ... instead.
    The query will read about 90 rows, which is a big improvement over the 4079 row reads plus 4068 temporary table reads/writes we had before.

Binlog Dump

Master thread for sending binary log contents to a slave.

Change user

Executing a change user operation.

Close stmt

Closing a prepared statement.

    Connect

Replication slave is connected to its master.

    Connect Out

    Replication slave is in the process of connecting to its master.

    Create DB

    Executing an operation to create a database.

    Daemon

    Internal server thread rather than for servicing a client connection.

    Debug

    Generating debug information.

    Delayed insert

    A delayed-insert handler.

    Drop DB

    Executing an operation to drop a database.

    Error

    Error.

    Execute

Executing a prepared statement.

    Fetch

Fetching the results of an executed prepared statement.

    Field List

    Retrieving table column information.

    Init DB

    Selecting default database.

    Kill

    Killing another thread.

    Long Data

Retrieving long data from the result of executing a prepared statement.

    Ping

    Handling a server ping request.

    Prepare

Preparing a prepared statement.

    Processlist

    Preparing processlist information about server threads.

    Query

    Executing a statement.

    Quit

    In the process of terminating the thread.

    Refresh

Flushing a table, logs or caches, or refreshing replication server or status variable information.

    Register Slave

    Registering a slave server.

    Reset stmt

Resetting a prepared statement.

    Set option

    Setting or resetting a client statement execution option.

    Sleep

    Waiting for the client to send a new statement.

    Shutdown

    Shutting down the server.

    Statistics

    Preparing status information about the server.

    Table Dump

    Sending the contents of a table to a slave.

    Time

    Not used.

    This page is licensed: CC BY-SA / Gnu FDL


    firstmatch-inner-join

Since Germany has two big cities (in this diagram), it will be put into the query output twice. This is not correct: SELECT ... FROM Country should not produce the same country record twice. The FirstMatch strategy avoids producing duplicates by short-cutting execution as soon as the first genuine match is found:

    firstmatch-firstmatch

    Note that the short-cutting has to take place after "Using where" has been applied. It would have been wrong to short-cut after we found Trier.

    FirstMatch in action

    The EXPLAIN for the above query will look as follows:

    FirstMatch(Country) in the Extra column means that as soon as we have produced one matching record combination, short-cut the execution and jump back to the Country table.

    FirstMatch's query plan is very similar to one you would get in MySQL:

    and these two particular query plans will execute in the same time.

    Difference between FirstMatch and IN->EXISTS

    The general idea behind the FirstMatch strategy is the same as the one behind the IN->EXISTS transformation, however, FirstMatch has several advantages:

    • Equality propagation works across semi-join bounds, but not subquery bounds. Therefore, converting a subquery to semi-join and using FirstMatch can still give a better execution plan. (TODO example)

    • There is only one way to apply the IN->EXISTS strategy and MySQL will do it unconditionally. With FirstMatch, the optimizer can make a choice between whether it should run the FirstMatch strategy as soon as all tables used in the subquery are in the join prefix, or at some later point in time. (TODO: example)

    FirstMatch factsheet

    • The FirstMatch strategy works by executing the subquery and short-cutting its execution as soon as the first match is found.

• This means that subquery tables must come after all of the parent select's tables that are referred to from the subquery predicate.

    • EXPLAIN shows FirstMatch as "FirstMatch(tableN)".

    • The strategy can handle correlated subqueries.

    • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

• Use of the FirstMatch strategy is controlled with the firstmatch=on|off flag in the optimizer_switch variable.
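For example, to switch the strategy off for the current session:

SET optimizer_switch='firstmatch=off';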

    See Also

    • Semi-join subquery optimizations

    In-depth material:

    • WL#3750: initial specification for FirstMatch

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join subqueries

    Tag

  • Provider (of news article)

  • Manufacturer (of item for sale)

  • Ticker (financial stock)

  • Variants on "news article"

    • Item for sale

    • Blog comment

    • Blog thread

    Variants on "latest"

    • Publication date (unix_timestamp)

    • Most popular (keep the count)

    • Most emailed (keep the count)

    • Manual ranking (1..10 -- 'top ten')

    Variants on "10" - there is nothing sacred about "10" in this discussion.

    The performance issues

    Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.

    • If there are multiple topics for each article, you need a many-to-many table.

    • You have a flag "is_deleted" that needs filtering on.

    • You want to "paginate" the list (ten articles per page, for as many pages as necessary).

    The solution

    First, let me give you the solution, then I will elaborate on why it works well.

    • One new table called, say, Lists.

    • Lists has exactly 3 columns: topic, article_id, sequence

    • Lists has exactly 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)

    • Only viewable articles are in Lists. (This avoids the filtering on "is_deleted", etc)

• Lists is InnoDB. (This gets "clustering".)

    • "sequence" is typically the date of the article, but could be some other ordering.

    • "topic" should probably be normalized, but that is not critical to this discussion.

    • "article_id" is a link to the bulky row in another table(s) that provide all the details about the article.

    The queries

    Find the latest 10 articles for a topic:

    You must not have any WHERE condition touching columns in Articles.

When you mark an article for deletion, you must remove it from Lists:

    I emphasize "must" because flags and other filtering is often the root of performance issues.

    Why it works

    By now, you may have discovered why it works.

    The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:

    • 1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".

    • 10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.) The PRIMARY KEY, and using InnoDB, makes these quite efficient.

OK, you pay a small price for this elsewhere, when an article is removed from Lists:

    • 1 disk hit: INDEX(article_id) - finding a few ids

    • A few more disk hits to DELETE rows from Lists. This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: lists

    This page is licensed: CC BY-SA / Gnu FDL

    WAIT/NOWAIT

    Set the lock wait timeout. See WAIT and NOWAIT.

    Defragmenting

OPTIMIZE TABLE works for InnoDB (in older versions, only if the innodb_file_per_table server system variable is set), Aria, MyISAM and ARCHIVE tables, and should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.

    This statement requires SELECT and INSERT privileges for the table.

    By default, OPTIMIZE TABLE statements are written to the binary log and will be replicated. The NO_WRITE_TO_BINLOG keyword (LOCAL is an alias) will ensure the statement is not written to the binary log.

    OPTIMIZE TABLE statements are not logged to the binary log if read_only is set. See also Read-Only Replicas.

OPTIMIZE TABLE is also supported for partitioned tables. You can use ALTER TABLE ... OPTIMIZE PARTITION to optimize one or more partitions.

You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. With other storage engines, OPTIMIZE TABLE does nothing by default, and returns this message: "The storage engine for the table doesn't support optimize". However, if the server has been started with the --skip-new option, OPTIMIZE TABLE is linked to ALTER TABLE, and recreates the table. This operation frees the unused space and updates index statistics.

The Aria storage engine supports progress reporting for this statement.

    If a MyISAM table is fragmented, concurrent inserts will not be performed until an OPTIMIZE TABLE statement is executed on that table, unless the concurrent_insert server system variable is set to ALWAYS.

    Updating an InnoDB fulltext index

    When rows are added or deleted to an InnoDB fulltext index, the index is not immediately re-organized, as this can be an expensive operation. Change statistics are stored in a separate location. The fulltext index is only fully re-organized when an OPTIMIZE TABLE statement is run.

    By default, an OPTIMIZE TABLE will defragment a table. In order to use it to update fulltext index statistics, the innodb_optimize_fulltext_only system variable must be set to 1. This is intended to be a temporary setting and should be reset to 0 once the fulltext index has been re-organized.

    Since fulltext re-organization can take a long time, the innodb_ft_num_word_optimize variable limits the re-organization to a number of words (2000 by default). You can run multiple OPTIMIZE statements to fully re-organize the index.
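A minimal sketch of the procedure (the articles table name is illustrative):

SET GLOBAL innodb_optimize_fulltext_only = 1;
OPTIMIZE TABLE articles;  -- may need to be repeated to cover all words
SET GLOBAL innodb_optimize_fulltext_only = 0;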

    Defragmenting InnoDB tablespaces

MariaDB merged the Facebook/Kakao defragmentation patch, allowing one to use OPTIMIZE TABLE to defragment InnoDB tablespaces. For this functionality to be enabled, the innodb_defragment system variable must be enabled. No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by innodb-defragment-n-pages) and tries to move records so that pages are full of records, and then frees pages that are fully empty after the operation. Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.

    See Defragmenting InnoDB Tablespaces for more details.

    See Also

    • Optimize Table in InnoDB with ALGORITHM set to INPLACE

    • Optimize Table in InnoDB with ALGORITHM set to NOCOPY

    • Optimize Table in InnoDB with ALGORITHM set to INSTANT

    This page is licensed: GPLv2, originally from fill_help_tables.sql

    Queueing master event to the relay log

    Event is being copied to the relay log after being read, where it can be processed by the SQL thread.

    Reconnecting after a failed binlog dump request

    Attempting to reconnect to the primary after a previously failed binary log dump request.

    Reconnecting after a failed master event read

    Attempting to reconnect to the primary after a previously failed request. After successfully connecting, the state will change to Waiting for master to send event.

    Registering slave on master

    Registering the replica on the primary, which only occurs very briefly after establishing a connection with the primary.

    Requesting binlog dump

    Requesting the contents of the binary logs from the given log file name and position. Only occurs very briefly after establishing a connection with the primary.

    Waiting for master to send event

Waiting for binary log events to arrive after successfully connecting. If there are no new events on the primary, this state can persist for as many seconds as specified by the slave_net_timeout system variable, after which the thread will reconnect. In older versions, the time was counted from SLAVE START; in later versions, it is counted since reading the last event.

    Waiting for slave mutex on exit

    Waiting for replica mutex while the thread is stopping. Only occurs very briefly.

    Waiting for the slave SQL thread to free enough relay log space.

    Relay log has reached its maximum size, determined by relay_log_space_limit (no limit by default), so waiting for the SQL thread to free up space by processing enough relay log events.

    Waiting for master update

    State before connecting to primary.

    Waiting to reconnect after a failed binlog dump request

    Waiting to reconnect after a binary log dump request has failed due to disconnection. The length of time in this state is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.

    Waiting to reconnect after a failed master event read

    Sleeping while waiting to reconnect after a disconnection error. The time in seconds is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.


    Ignored Indexes

    Ignored indexes allow indexes to be visible and maintained without being used by the optimizer. This feature is comparable to MySQL 8’s "invisible indexes."

    This feature is available from MariaDB 10.6.

    Ignored indexes are indexes that are visible and maintained, but which are not used by the optimizer. MySQL 8 has a similar feature which they call "invisible indexes".

    Syntax

By default, an index is not ignored. One can mark an existing index as ignored (or not ignored) with an ALTER TABLE statement:
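A minimal sketch, assuming a table t1 with an index idx_b:

ALTER TABLE t1 ALTER INDEX idx_b IGNORED;
ALTER TABLE t1 ALTER INDEX idx_b NOT IGNORED;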

It is also possible to specify the IGNORED attribute when creating an index with a CREATE TABLE or CREATE INDEX statement:
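A minimal sketch, again with illustrative table and index names:

CREATE TABLE t2 (a INT, b INT, INDEX idx_b (b) IGNORED);

CREATE INDEX idx_a ON t2 (a) IGNORED;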

A table's primary key cannot be ignored. This applies both to an explicitly defined primary key and to an implicit primary key - if there is no explicit primary key defined but the table has a unique key containing only NOT NULL columns, the first of such keys becomes the implicitly defined primary key.

    Handling for ignored indexes

The optimizer treats ignored indexes as if they didn't exist. They are not used in query plans or as a source of statistical information. Also, an attempt to use an ignored index in a USE INDEX, FORCE INDEX, or IGNORE INDEX hint will result in an error - the same as would occur if one used the name of a non-existent index.

Information about whether or not indexes are ignored can be viewed in the IGNORED column in the Information Schema STATISTICS table or the SHOW INDEX statement.

    Intended Usage

    The primary use case is as follows: a DBA sees an index that seems to have little or no usage and considers whether to remove it. Dropping the index is a risk as it may still be needed in a few cases. For example, the optimizer may rely on the estimates provided by the index without using the index in query plans. If dropping an index causes an issue, it will take a while to re-create the index. On the other hand, marking the index as ignored (or not ignored) is instant, so the suggested workflow is:

    1. Mark the index as ignored

    2. Check if everything continues to work

    3. If not, mark the index as not ignored.

    4. If everything continues to work, one can safely drop the index.

    Examples

The optimizer does not make use of an index when it is ignored, while if the index is not ignored (the default), the optimizer will consider it in the query plan, as shown in the EXPLAIN output.

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_adjust_secondary_key_costs

    optimizer_adjust_secondary_key_costs

• Description: Gives the user the ability to affect how the costs for secondary keys using ref are calculated, in the few cases when MariaDB 10.6 up to MariaDB 10.11 makes a sub-optimal choice when optimizing ref access, either for key lookups or GROUP BY. ref, as used by EXPLAIN, means that the optimizer is using a key lookup on one value to find the matching rows from a table. Unused from MariaDB 11.0. In later releases the variable was changed from a number to a set of strings, and disable_forced_index_in_group_by (value 4) was added.

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: set

    • Default Value: fix_reuse_range_for_ref, fix_card_multiplier

• Range: 0 to 63, or any combination of adjust_secondary_key_cost, disable_max_seek, disable_forced_index_in_group_by, fix_innodb_cardinality, fix_reuse_range_for_ref, fix_card_multiplier

    • Introduced: ,

optimizer_adjust_secondary_key_costs will be obsolete starting from MariaDB 11.0, as the new optimizer in 11.0 does not have the max_seek optimization and already uses cost-based choices for index usage with GROUP BY.

The value for optimizer_adjust_secondary_key_costs is one or more of the following:

    Value
    Version added
    Old behavior
    Change when variable is used

    One can set all options with:

    Explanations of the old behavior in MariaDB 10.x

    The reason for the max_seek optimization was originally to ensure that MariaDB would use a key instead of a table scan. This works well for a lot of queries, but can cause problems when a table scan is a better choice, such as when one would have to scan more than 1/4 of the rows in the table (in which case a table scan is better).

    See Also

    • The system variable.

    This page is licensed: CC BY-SA / Gnu FDL

    Rowid Filtering Optimization

    The target use case for rowid filtering is as follows:

    • a table uses ref access on index IDX1

    • but it also has a fairly restrictive range predicate on another index IDX2.

    In this case, it is advantageous to:

    • Do an index-only scan on index IDX2 and collect rowids of index records into a data structure that allows filtering (let's call it $FILTER).

    • When doing ref access on IDX1, check $FILTER before reading the full record.

    Example

    Consider a query

Suppose the condition on l_shipdate is very restrictive, which means the lineitem table should go first in the join order. Then, the optimizer can use the o_orderkey=l_orderkey equality to do an index lookup to get the order the line item is from. On the other hand, o_totalprice BETWEEN ... can also be rather selective.

    With filtering, the query plan would be:

Note that the orders table has "Using rowid filter". The type column has "|filter", and the key column shows the index that is used to construct the filter. The rows column shows the expected filter selectivity; here it is 5%.

    ANALYZE FORMAT=JSON output for table orders will show

    Note the rowid_filter element. It has a range element inside it. selectivity_pct is the expected selectivity, accompanied by the r_selectivity_pct showing the actual observed selectivity.

    Details

    • The optimizer makes a cost-based decision about when the filter should be used.

    • The filter data structure is currently an ordered array of rowids. (a Bloom filter would be better here and will probably be introduced in the future versions).

• The optimization needs to be supported by the storage engine. At the moment, it is supported by InnoDB and MyISAM. It is not supported for partitioned tables.

    Limitations

    • Rowid Filtering can't be used with a backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query and decides to handle it with a backward-ordered index scan, it will disable Rowid Filtering.

    Control

Rowid filtering can be switched on/off using the rowid_filter flag in the optimizer_switch variable. By default, the optimization is enabled.
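For example, to switch it off for the current session:

SET optimizer_switch='rowid_filter=off';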

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_join_limit_pref_ratio Optimization

    Basics

This optimization is off (0) by default. When it is enabled, MariaDB will consider a join order that may shorten query execution time based on the ORDER BY ... LIMIT n clause. For small values of n, this may improve performance.

    Set the value of optimizer_join_limit_pref_ratio to a non-zero value to enable this option (higher values are more conservative, recommended value is 100), or set to 0 (the default value) to disable it.
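For example, to enable it with the recommended ratio for the current session (it can also be set globally or in the configuration file):

SET SESSION optimizer_join_limit_pref_ratio=100;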

    Detailed description

    Problem setting

    By default, the MariaDB optimizer picks a join order without considering the ORDER BY ... LIMIT clause, when present.

    For example, consider a query looking at latest 10 orders together with customers who made them:

The two possible join orders are customer->orders and orders->customer.

    The customer->orders plan computes a join between all customers and orders, saves that result into a temporary table, and then uses filesort to get the 10 most recent orders. This query plan doesn't benefit from the fact that just 10 orders are needed.

However, in contrast, the orders->customers plan uses an index to read rows in the ORDER BY order. The query can stop execution once it finds 10 order-and-customer combinations, which is much faster than computing the entire join. With this new optimization enabled, the optimizer can take the ORDER BY ... LIMIT into account and prefer the plan that can stop early, once it has the 10 combinations.

    Plans with LIMIT shortcuts are difficult to estimate

It is fundamentally difficult to produce a reliable estimate for ORDER BY ... LIMIT shortcuts. Let's take an example from the previous section to see why. This query searches for the last 10 orders that were shipped by air:
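A sketch of such a query, assuming illustrative orders and customer tables with customer_id, ship_mode and order_date columns:

SELECT *
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
WHERE o.ship_mode = 'air'
ORDER BY o.order_date DESC
LIMIT 10;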

    Suppose we know beforehand that 50% of orders are shipped by air. Assuming there's no correlation between date and shipping method, orders->customer plan will need to scan 20 orders before we find 10 that are shipped by air. But if there is correlation, then we may need to scan up to (total_orders*0.5 + 10) before we find first 10 orders that are shipped by air. Scanning about 50% of all orders can be expensive.

This situation worsens when the query has constructs whose selectivity is not known. For example, if the WHERE condition contains a predicate whose selectivity the optimizer cannot estimate, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or whether we will need to enumerate all rows before we find #LIMIT matches.

    Providing guidelines to the optimizer

    Due to these challenges, the optimization is not enabled by default.

When running a mostly OLTP workload where query WHERE conditions have suitable indexes or are not very selective, ORDER BY ... LIMIT queries will typically find matching rows quickly. In this case, it makes sense to give the optimizer the following guidance: prefer the join order that can short-cut on the LIMIT whenever it promises a speedup of at least X times.

The value of X is given to the optimizer via the optimizer_join_limit_pref_ratio setting. Higher values carry less risk. The recommended value is 100: prefer the LIMIT join order if it promises at least a 100x speedup.

    References

• MDEV-34720 introduces the optimizer_join_limit_pref_ratio optimization

• MDEV-18079 is about future development that would make the optimizer handle such cases without user guidance.

    This page is licensed: CC BY-SA / Gnu FDL

    Virtual Column Support in the Optimizer

    This feature is available from MariaDB 11.8.

    The optimizer can recognize use of indexed virtual column expressions in the WHERE clause and use them to construct range and ref(const) accesses.

    Index Condition Pushdown

Index Condition Pushdown is an optimization that is applied for access methods that access table data through indexes: range, ref, eq_ref, and ref_or_null (among others).

    The idea is to check part of the WHERE condition that refers to index fields (we call it Pushed Index Condition) as soon as we've accessed the index. If the Pushed Index Condition is not satisfied, we won't need to read the whole table record.

    Index Condition Pushdown is on by default. To disable it, set its optimizer_switch flag like so:
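For example:

SET optimizer_switch='index_condition_pushdown=off';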

    When Index Condition Pushdown is used, EXPLAIN will show "Using index condition":

    Duplicate Weedout Strategy

DuplicateWeedout is an execution strategy for semi-join subqueries.

    The idea

The idea is to run the semi-join (a query which uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.

    Suppose, you have a query where you're looking for countries which have more than 33% percent of their population in one big city:

First, we run a regular inner join between the City and Country tables:

    not_null_range_scan Optimization

    The NOT NULL range scan optimization enables the optimizer to construct range scans from NOT NULL conditions that it was able to infer from the WHERE clause.

The optimization is not enabled by default; one needs to set an optimizer_switch flag (not_null_range_scan) to enable it.
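For example, to enable it for the current session:

SET optimizer_switch='not_null_range_scan=on';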

    Description

    A basic (but slightly artificial) example:

The WHERE condition in this form cannot be used for range scans. However, one can infer that it will reject rows that have NULL for weight. That is, the optimizer can infer an additional weight IS NOT NULL condition and use it to construct a range scan on an index that covers the weight column.
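A minimal sketch of the idea, with illustrative table and column names:

CREATE TABLE shipments (
  weight DECIMAL(10,2),
  price  DECIMAL(10,2),
  KEY (weight)
);

-- weight*2 < price can only be true when weight is not NULL, so with the
-- not_null_range_scan flag enabled the optimizer can infer "weight IS NOT NULL"
-- and consider a range scan over KEY(weight):
EXPLAIN SELECT * FROM shipments WHERE shipments.weight*2 < shipments.price;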

    SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
    +---------+---------------+------------+
    | 1 <=> 1 | NULL <=> NULL | 1 <=> NULL |
    +---------+---------------+------------+
    |       1 |             1 |          0 |
    +---------+---------------+------------+
    
    SELECT 1 = 1, NULL = NULL, 1 = NULL;
    +-------+-------------+----------+
    | 1 = 1 | NULL = NULL | 1 = NULL |
    +-------+-------------+----------+
    |     1 |        NULL |     NULL |
    +-------+-------------+----------+
    SET STATEMENT max_statement_time=100 FOR 
      SELECT field1 FROM table_name ORDER BY field1;
    SELECT MAX_STATEMENT_TIME=2 * FROM t1;
    MariaDB [world]> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
      AS big_city WHERE big_city.Country='DEU';
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    | id | select_type | table | type | possible_keys      | key     | key_len | ref   | rows | Extra                              |
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    |  1 | SIMPLE      | City  | ref  | Population,Country | Country | 3       | const |   90 | Using index condition; Using where |
    +----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
    1 row in set (0.00 sec)
    SET @@optimizer_switch='derived_merge=OFF'
    SELECT * 
    FROM
       Country, 
       (SELECT 
           SUM(City.Population) AS urban_population, 
           City.Country 
        FROM City 
        GROUP BY City.Country 
        HAVING 
        urban_population > 1*1000*1000
       ) AS cities_in_country
    WHERE 
      Country.Code=cities_in_country.Country AND Country.Continent='Europe';
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    | id | select_type | table      | type | possible_keys     | key       | key_len | ref                | rows | Extra                           |
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    |  1 | PRIMARY     | Country    | ref  | PRIMARY,continent | continent | 17      | const              |   60 | Using index condition           |
    |  1 | PRIMARY     | <derived2> | ref  | key0              | key0      | 3       | world.Country.Code |   17 |                                 |
    |  2 | DERIVED     | City       | ALL  | NULL              | NULL      | NULL    | NULL               | 4079 | Using temporary; Using filesort |
    +----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
    SET optimizer_switch='derived_with_keys=off'
    SELECT * FROM Country 
    WHERE Country.code IN (SELECT City.Country 
                           FROM City 
                           WHERE City.Population > 1*1000*1000)
          AND Country.continent='Europe'
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 1*1000*1000)
        AND Country.continent='Europe';
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    | id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    |  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
    |  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
    +----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
    2 rows in set (0.00 sec)
    MySQL [world]> EXPLAIN SELECT * FROM Country  WHERE Country.code IN 
      (select City.Country from City where City.Population > 1*1000*1000) 
       AND Country.continent='Europe';
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    | id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    |  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
    |  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
    +----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
    2 rows in set (0.01 sec)
    SELECT  a.*
        FROM  Articles a
        JOIN  Lists s ON s.article_id = a.article_id
        WHERE  s.topic = ?
        ORDER BY  s.sequence DESC
        LIMIT  10;
    DELETE  FROM  Lists
        WHERE  article_id = ?;
    OPTIMIZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE
        tbl_name [, tbl_name] ...
        [WAIT n | NOWAIT]
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY customer_id
    HAVING
      customer_id=1 
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE
      order_date BETWEEN '2017-10-01' AND '2017-10-31' AND
      customer_id=1
    GROUP BY customer_id
    # Time: 140714 18:30:39
    # User@Host: root[root] @ localhost []
    # Thread_id: 3  Schema: test  QC_hit: No
    # Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
    # Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
    # Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
    SET TIMESTAMP=1405348239;
    SELECT * FROM t1 WHERE col1 BETWEEN 10 AND 20 ORDER BY col2 LIMIT 100;
    CREATE TABLE t1 (
      key1 VARCHAR(32) COLLATE utf8mb4_general_ci,
      ...
      KEY(key1)
    );
    EXPLAIN SELECT * FROM t1 WHERE UPPER(key1)='ABC'
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | const | 1    | Using where; Using index |
    +------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
    EXPLAIN SELECT * FROM t0,t1 WHERE upper(t1.key1)=t0.col;
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra       |
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL        | 10   | Using where |
    |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | test.t0.col | 1    | Using index |
    +------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
    "join_optimization": {
            "select_id": 1,
            "steps": [
              {
                "sargable_casefold_removal": {
                  "before": "ucase(t1.key1) = t0.col",
                  "after": "t1.key1 = t0.col"
                }
              },
    Example

    Consider this table with data in JSON format:

    In order to do efficient queries over data in JSON, you can add a virtual column, and an index on that column:

    Before MariaDB 11.8, you had to use vcol1 in the WHERE clause. Now, you can use the virtual column expression, too:

    General Considerations

    • In MariaDB, one has to create a virtual column and then create an index over it. Other databases allow creating an index directly over an expression: create index on t1((col1+col2)). This is not yet supported in MariaDB (MDEV-35853).

    • The WHERE clause must use the exact same expression as in the virtual column definition.

    • The optimization is implemented in a way similar to MySQL – the optimizer finds potentially useful occurrences of vcol_expr in the WHERE clause and replaces them with vcol_name.

    • In the optimizer trace, the rewrites are shown like this:

    The following improvements are available from MariaDB 12.1.

    1. Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns when the virtual column expressions are covered by indexes that can be used.

    2. Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns expressions, by substitution of the virtual column expressions with virtual columns when the virtual columns are usable indexes themselves.

    3. The same improvements apply for single-table UPDATE or DELETE statements.
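    As a rough illustration of improvements 1 and 2, here is a sketch using a hypothetical table t2 with an indexed virtual column. From MariaDB 12.1, the optimizer may use the index on vcol2 to resolve the ORDER BY whether the query names the virtual column or repeats its defining expression:

    -- Hypothetical table; the names are illustrative only
    CREATE TABLE t2 (
      json_data JSON,
      vcol2 INT AS (CAST(JSON_VALUE(json_data, '$.price') AS INTEGER)),
      INDEX(vcol2)
    );

    -- Both forms may now read the vcol2 index in order instead of filesorting
    EXPLAIN SELECT * FROM t2 ORDER BY vcol2 LIMIT 10;
    EXPLAIN SELECT * FROM t2
    ORDER BY CAST(JSON_VALUE(json_data, '$.price') AS INTEGER) LIMIT 10;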

    Accessing JSON fields

    Cast the Value to the Desired Type

    SQL is a strongly-typed language, while JSON is weakly-typed. This means one must specify the desired datatype when accessing JSON data from SQL. In the above example, we declared vcol1 as INT and then used CAST(... AS INTEGER), both in the ALTER TABLE and in the WHERE clause of the SELECT query:

    Specify the Collation for Strings

    When extracting string values, CAST is not necessary, as JSON_VALUE returns strings. However, you must take into account collations. Consider this column declared as JSON:

    The collation of json_data is utf8mb4_bin. The collation of JSON_VALUE(json_data, ...) is utf8mb4_bin, too.

    Most use cases require a more commonly-used collation. It is possible to achieve that using the COLLATE clause:

    References

    • MDEV-35616: Add basic optimizer support for virtual columns

    This page is licensed: CC BY-SA / Gnu FDL

    ALTER TABLE table_name ALTER {KEY|INDEX} [IF EXISTS] key_name [NOT] IGNORED;
    CREATE TABLE table_name (
      ...
      INDEX index_name (...) [NOT] IGNORED
      ...
    );
    CREATE INDEX index_name ON tbl_name (...) [NOT] IGNORED;
    CREATE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b));
    ALTER TABLE t1 ALTER INDEX k1 IGNORED;
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT);
    CREATE INDEX k1 ON t1(b) IGNORED;
    SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_NAME = 't1'\G
    *************************** 1. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 0
     INDEX_SCHEMA: test
       INDEX_NAME: PRIMARY
     SEQ_IN_INDEX: 1
      COLUMN_NAME: id
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: 
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: NO
    *************************** 2. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 1
     INDEX_SCHEMA: test
       INDEX_NAME: k1
     SEQ_IN_INDEX: 1
      COLUMN_NAME: b
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: YES
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: YES
    SHOW INDEXES FROM t1\G
    *************************** 1. row ***************************
            Table: t1
       Non_unique: 0
         Key_name: PRIMARY
     Seq_in_index: 1
      Column_name: id
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: NO
    *************************** 2. row ***************************
            Table: t1
       Non_unique: 1
         Key_name: k1
     Seq_in_index: 1
      Column_name: b
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: YES
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: YES
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1    | Using filesort |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    
    ALTER TABLE t1 ALTER INDEX k1 NOT IGNORED;
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    |    1 | SIMPLE      | t1    | index | NULL          | k1   | 5       | NULL | 1    | Using index |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    SELECT ...
    FROM orders JOIN lineitem ON o_orderkey=l_orderkey
    WHERE
      l_shipdate BETWEEN '1997-01-01' AND '1997-01-31' AND
      o_totalprice between 200000 and 230000;
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineitem
             type: range
    possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
              key: i_l_shipdate
          key_len: 4
              ref: NULL
             rows: 98
            Extra: Using index condition
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
             type: eq_ref|filter
    possible_keys: PRIMARY,i_o_totalprice
              key: PRIMARY|i_o_totalprice
          key_len: 4|9
              ref: dbt3_s001.lineitem.l_orderkey
             rows: 1 (5%)
            Extra: Using where; Using rowid filter
    "table": {
          "table_name": "orders",
          "access_type": "eq_ref",
          "possible_keys": ["PRIMARY", "i_o_totalprice"],
          "key": "PRIMARY",
          "key_length": "4",
          "used_key_parts": ["o_orderkey"],
          "ref": ["dbt3_s001.lineitem.l_orderkey"],
          "rowid_filter": {
            "range": {
              "key": "i_o_totalprice",
              "used_key_parts": ["o_totalprice"]
            },
            "rows": 69,
            "selectivity_pct": 4.6,
            "r_rows": 71,
            "r_selectivity_pct": 10.417,
            "r_buffer_size": 53,
            "r_filling_time_ms": 0.0716
          }
    SELECT *
    FROM
      customer, orders
    WHERE
      customer.name=orders.customer_name
    ORDER BY
      orders.order_date DESC
    LIMIT 10
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    | id   | select_type | table    | type | possible_keys | key           | key_len | ref           | rows | Extra                                        |
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    |    1 | SIMPLE      | customer | ALL  | name          | NULL          | NULL    | NULL          | 9623 | Using where; Using temporary; Using filesort |
    |    1 | SIMPLE      | orders   | ref  | customer_name | customer_name | 103     | customer.name | 1    |                                              |
    +------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    | id   | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows | Extra       |
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    |    1 | SIMPLE      | orders   | index | customer_name | order_date | 4       | NULL                 | 10   | Using where |
    |    1 | SIMPLE      | customer | ref   | name          | name       | 103     | orders.customer_name | 1    |             |
    +------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
    SELECT *
    FROM
      customer, orders
    WHERE
      customer.name=orders.customer_name
      AND orders.shipping_method='Airplane'
    ORDER BY
      orders.order_date DESC
    LIMIT 10
    orders.shipping_method='%Airplane%'
    Do consider the query plan using LIMIT short-cutting 
    and prefer it if it promises at least X times speedup.
    CREATE TABLE t1 (json_data JSON);
    INSERT INTO t1 VALUES('{"column1": 1234}'); 
    INSERT INTO t1 ...
    ALTER TABLE t1
      ADD COLUMN vcol1 INT AS (cast(json_value(json_data, '$.column1') AS INTEGER)),
      ADD INDEX(vcol1);
    -- This uses the index before 11.8:
    EXPLAIN SELECT * FROM t1 WHERE vcol1=100;
    -- Starting from 11.8, this uses the index, too:
    EXPLAIN SELECT * FROM t1 
    WHERE cast(json_value(json_data, '$.column1') AS INTEGER)=100;
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id   | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |    1 | SIMPLE      | t1    | ref  | vcol1         | vcol1 | 5       | const | 1    |       |
    +------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    "virtual_column_substitution": {
                  "condition": "WHERE",
                  "resulting_condition": "t1.vcol1 = 100"
                }
    ALTER TABLE t1
      ADD COLUMN vcol1 INT AS (CAST(json_value(json_data, '$.column1') AS INTEGER)) ...
    SELECT ...  WHERE ... CAST(json_value(json_data, '$.column1') AS INTEGER) ...;
    CREATE TABLE t1 ( 
      json_data JSON 
      ...
    ALTER TABLE t1
      ADD col1 VARCHAR(100) COLLATE utf8mb4_uca1400_ai_ci AS
      (json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci),
      ADD INDEX(col1);
    ...
    SELECT  ... 
    WHERE
      json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci='string-value';

    fix_innodb_cardinality

    By default, InnoDB doubles the cardinality of its indexes in an effort to force index usage over table scans. This can cause the optimizer to create sub-optimal plans for ranges or index entries that cover a big part of the table.

    Using this option removes the doubling of cardinality in InnoDB. fix_innodb_cardinality is recommended only as a server startup option, as it takes effect for a table when the table is first used. See for details.

    fix_reuse_range_for_ref

    Old behavior: the number of estimated rows for 'ref' did not always match the cost from the range optimizer.

    New behavior: use the cost from the range optimizer for 'ref' if all used key parts are constants. The old code did not always do this.

    fix_card_multiplier

    Old behavior: index selectivity can be bigger than 1.0 if index statistics are not up to date. Not on by default.

    New behavior: ensure that the calculated index selectivity is never bigger than 1.0. Having an index selectivity bigger than 1.0 causes MariaDB to believe that there are more rows in the table than in reality, which can cause wrong plans. This option is on by default.

    adjust_secondary_key_cost

    MariaDB 10.6.17

    Old behavior: ref costs are limited by max_seeks.

    New behavior: the secondary key costs for ref are updated to be at least five times the clustered primary key costs if a clustered primary key exists.

    disable_max_seek

    MariaDB 10.6.17

    Old behavior: the ref cost on secondary keys is limited to max_seek = min('number of expected rows' / 10, scan_time * 3).

    New behavior: disable the 'max_seek' optimization and do a slight adjustment of the filter cost.

    disable_forced_index_in_group_by

    MariaDB 10.6.18

    Old behavior: use a rule-based choice when deciding whether to use an index to resolve GROUP BY.


    New behavior: the choice is now cost based.

    The Idea Behind Index Condition Pushdown

    In disk-based storage engines, making an index lookup is done in two steps, like shown on the picture:

    index-access-2phases

    Index Condition Pushdown optimization tries to cut down the number of full record reads by checking whether index records satisfy part of the WHERE condition that can be checked for them:

    index-access-with-icp

    How much speed will be gained depends on

    • How many records will be filtered out

    • How expensive it was to read them

    The former depends on the query and the dataset. The latter is generally bigger when table records are on disk and/or are big, especially when they have blobs.

    Example Speedup

    I used DBT-3 benchmark data with scale factor 1. Since the benchmark defines very few indexes, we added a multi-column index (Index Condition Pushdown is usually useful with multi-column indexes: the first component(s) are what index access is done for, and the subsequent components hold columns that we read and check conditions on).

    The query was to find big (l_quantity > 40) orders that were made in January 1993 that took more than 25 days to ship:

    EXPLAIN without Index Condition Pushdown:

    with Index Condition Pushdown:

    The speedup was:

    • Cold buffer pool: from 5 min down to 1 min

    • Hot buffer pool: from 0.19 sec down to 0.07 sec

    Status Variables

    There are two server status variables:

    • Handler_icp_attempts: number of times the pushed index condition was checked.

    • Handler_icp_match: number of times the condition matched.

    That way, the value Handler_icp_attempts - Handler_icp_match shows the number of records that the server did not have to read because of Index Condition Pushdown.
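    For example, to check the counters after running a query:

    SHOW SESSION STATUS LIKE 'Handler_icp%';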

    Limitations

    • Currently, indexes on virtual columns can't be used for index condition pushdown. Instead, the generated column can be declared STORED; then index condition pushdown becomes possible (see the sketch after this list).

    • Index Condition Pushdown can't be used with backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query which can be handled by using a backward-ordered index scan, it will disable Index Condition Pushdown.
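    A minimal sketch of the STORED workaround from the first limitation above (hypothetical table and column names):

    CREATE TABLE t_icp (
      json_data JSON,
      -- Declared STORED so that an index on it can be used with Index Condition Pushdown
      price INT AS (CAST(JSON_VALUE(json_data, '$.price') AS INTEGER)) STORED,
      INDEX(price)
    );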

    Partitioned Tables

    Index condition pushdown support for partitioned tables was added in .

    This page is licensed: CC BY-SA / Gnu FDL

    and Country tables:
    duplicate-weedout-inner-join

    The Inner join produces duplicates. We have Germany three times, because it has three big cities. Now, lets put DuplicateWeedout into the picture:

    duplicate-weedout-diagram

    Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.

    DuplicateWeedout in action

    The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:

    This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).

    If we run the same query with semi-join optimizations disabled, we'll get:

    This plan will read (239 + 239*18) = 4541 rows, which is much slower.

    Factsheet

    • DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.

    • The strategy can handle correlated subqueries.

    • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

    • DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.

    • There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.

    See Also

    • Subquery Optimizations Map

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join subqueries

    and pass it to the range optimizer. The range optimizer can, in turn, evaluate whether it makes sense to construct range access from the condition:

    Here's another example that's more complex but is based on a real-world query. Consider a join query

    Here, the optimizer can infer the condition "return_id IS NOT NULL". If most of the orders are not returned (and so have NULL for return_id), one can use range access to scan only those orders that had a return.

    Controlling the Optimization

    The optimization is not enabled by default. One can enable it like so

    Optimizer Trace

    TODO.

    See Also

    • MDEV-15777 - JIRA bug report which resulted in the optimization

    • NULL Filtering Optimization is a related optimization in MySQL and MariaDB. It uses inferred NOT NULL conditions to perform filtering (but not index access)

    This page is licensed: CC BY-SA / Gnu FDL

    IP Range Table Performance

    The situation

    Your data includes a large set of non-overlapping 'ranges'. These could be IP addresses, datetimes (show times for a single station), zipcodes, etc.

    You have pairs of start and end values; one 'item' belongs to each such 'range'. So, instinctively, you create a table with start and end of the range, plus info about the item. Your queries involve a WHERE clause that compares for being between the start and end values.

    The problem

    Once you get a large set of items, performance degrades. You play with the indexes, but find nothing that works well. The indexes fail to lead to optimal functioning because the database does not understand that the ranges are non-overlapping.

    The solution

    I will present a solution that enforces the fact that items cannot have overlapping ranges. The solution builds a table to take advantage of that, then uses Stored Routines to get around the clumsiness imposed by it.

    Performance

    The instinctive solution often leads to scanning half the table to do just about anything, such as finding the item containing an 'address'. In complexity terms, this is Order(N).

    The solution here can usually get the desired information by fetching a single row, or a small number of rows. It is Order(1).

    In a large table, "counting the disk hits" is the important part of performance. Since InnoDB is used, and the PRIMARY KEY (clustered) is used, most operations hit only 1 block.

    Finding the 'block' where a given IP address lives:

    • For start of block: One single-row fetch using the PRIMARY KEY

    • For end of block: Ditto. The record containing this will be 'adjacent' to the other record.

    For allocating or freeing a block:

    • 2-7 SQL statements, hitting the clustered PRIMARY KEY for the rows containing and immediately adjacent to the block.

    • One SQL statement is a DELETE; it hits as many rows as are needed for the block.

    • The other statements hit one row each.

    Design decisions

    This is crucial to the design and its performance:

    • Having just one address in the row. These were alternative designs; they seemed to be no better, and possibly worse:

    • That one address could have been the 'end' address.

    • The routine parameters for a 'block' could have been the start of this block and the start of the next block.

    • The IPv4 parameters could have been dotted quads; I chose to keep the reference implementation simpler instead.

    The interesting work is in the Ips table, not the second table, so I focus on it. The inconvenience of JOINing to the second table is small compared to the performance gains.

    Details

    Two, not one, tables will be used. The first table (Ips in the reference implementations) is carefully designed to be optimal for all the basic operations needed. The second table contains other information about the 'owner' of each 'item'. In the reference implementations owner is an id used to JOIN the two tables. This discussion centers around Ips and how to efficiently map IP(s) to/from owner(s). The second table has "PRIMARY KEY(owner)".

    In addition to the two-table schema, there are a set of Stored Routines to encapsulate the necessary code.

    One row of Ips represents one 'item' by specifying the starting IP address and the 'owner'. The next row gives the starting IP address of the next "address block", thereby indirectly providing the ending address for the current block.

    This lack of explicitly stating the "end address" leads to some clumsiness. The stored routines hide it from the user.

    A special owner (indicated by '0') is reserved for "free" or "not-owned" blocks. Hence, sparse allocation of address blocks is no problem. Also, the 'free' owner is handled no differently than real owners, so there are no extra Stored Routines for such.
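    A minimal sketch of such a table and of the single-row lookups described above; the column names are illustrative and the reference implementation may differ:

    CREATE TABLE Ips (
      ip INT UNSIGNED NOT NULL,   -- start address of the block (IPv4 as a number)
      owner INT NOT NULL,         -- 0 = free / not owned
      PRIMARY KEY (ip)
    ) ENGINE=InnoDB;

    -- Find the block containing the address in @addr:
    -- one single-row fetch via the clustered PRIMARY KEY
    SELECT ip, owner
    FROM Ips
    WHERE ip <= @addr
    ORDER BY ip DESC
    LIMIT 1;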

    Links below give "reference" implementations for IPv4 and IPv6. You will need to make changes for non-IP situations, and may need to make changes even for IP situations.

    These are the main stored routines provided:

    • IpIncr, IpDecr -- for adding/subtracting 1

    • IpStore -- for allocating/freeing a range

    • IpOwner, IpRangeOwners, IpFindRanges, Owner2IpStarts, Owner2IpRanges -- for lookups

    • IpNext, IpEnd -- IP of start of next block, or end of current block

    None of the provided routines JOIN to the other table; you may wish to develop custom queries based on the given reference Stored Procedures.

    The Ips table's size is proportional to the number of blocks. A million 'owned' blocks may be 20-50MB. This varies due to

    • number of 'free' gaps (between zero and the number of owned blocks)

    • datatypes used for ip and owner

    • overhead

    Even 100M blocks is quite manageable on today's hardware. Once things are cached, most operations would take only a few milliseconds. A trillion blocks would work, but most operations would hit the disk a few times -- only a few times.

    Reference implementation of IPv4

    This is specific to IPv4 (32 bit, a la '192.168.1.255'). It can handle anywhere from 'nothing assigned' (1 row) to 'everything assigned' (4B rows) 'equally' well. That is, to ask the question "who owns '11.22.33.44'" is equally efficient regardless of how many blocks of IP addresses exist in the table. (OK, caching, disk hits, etc may make a slight difference.) The one function that can vary is the one that reassigns a range to a new owner. Its speed is a function of how many existing ranges need to be consumed, since those rows will be DELETEd. (It helps that they are, by schema design, 'clustered'.)

    Notes on the :

    • Externally, the user may use the dotted quad notation (11.22.33.44), but needs to convert to INT UNSIGNED for calling the Stored Procs.

    • The user is responsible for converting to/from the calling datatype (INT UNSIGNED) when accessing the stored routine; suggest /.

    • The internal datatype for addresses is the same as the calling datatype (INT UNSIGNED).

    • Adding and subtracting 1 (simple arithmetic).
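    For the dotted-quad conversion mentioned above, the built-in INET_ATON() and INET_NTOA() functions can be used, for example:

    SELECT INET_ATON('11.22.33.44');   -- 185999660
    SELECT INET_NTOA(185999660);       -- '11.22.33.44'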

    (The reference implementation does not handle CIDRs. Such should be easy to add on, by first turning the CIDR into an IP range.)

    Reference implementation of IPv6

    The code for handling IP address is more complex, but the overall structure is the same as for IPv4. Launch into it only if you need IPv6.

    Notes on the :

    • Externally, IPv6 has a complex string, VARCHAR(39) CHARACTER SET ASCII. The Stored Procedure IpStr2Hex() is provided.

    • The user is responsible for converting to/from the calling datatype (BINARY(16)) when accessing the stored routine; suggest /.

    • The internal datatype for addresses is the same as the calling datatype (BINARY(16)).

    • Communication with the Stored routines is via 32-char hex strings.
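    For illustration, INET6_ATON()/INET6_NTOA() convert between the string form and BINARY(16), and HEX()/UNHEX() convert between BINARY(16) and the 32-char hex strings used by the routines:

    SELECT HEX(INET6_ATON('2001:db8::1'));
    -- 20010DB8000000000000000000000001
    SELECT INET6_NTOA(UNHEX('20010DB8000000000000000000000001'));
    -- 2001:db8::1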

    The INET6* functions were first available in MySQL 5.6.3 and

    Adapting to a different non-IP 'address range' data

    • The external datatype for an 'address' should be whatever is convenient for the application.

    • The datatype for the 'address' in the table must be ordered, and should be as compact as possible.

    • You must write the Stored functions (IpIncr, IpDecr) for incrementing/decrementing an 'address'.

    • An 'owner' is an id of your choosing, but smaller is better.

    "Owner" needs a special value to represent "not owned". The reference implementations use "=" and "!=" to compare two 'owners'. Numeric values and strings work nicely with those operators; NULL does not. Hence, please do not use NULL for "not owned".

    Since the datatypes are pervasive in the stored routines, adapting a reference implementation to a different concept of 'address' would require multiple minor changes.

    The code enforces that consecutive blocks never have the same 'owner', so the table is of 'minimal' size. Your application can assume that such is always the case.

    Postlog

    Original writing -- Oct, 2012; Notes on INET6 functions -- May, 2015.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Semi-join Materialization Strategy

    Semi-join Materialization is a special kind of subquery materialization used for Semi-join subqueries. It actually includes two strategies:

    • Materialization/lookup

    • Materialization/scan

    The idea

    Consider a query that finds countries in Europe which have big cities:

    The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of the City.country field of big cities, and then do a join with countries in Europe:

    The join can be done in two directions:

    1. From the materialized table to countries in Europe

    2. From countries in Europe to the materialized table

    The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".

    If you run a join from Countries to the materialized table, the cheapest way to find a match in the materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".

    Semi-join materialization in action

    Materialization-Scan

    If we chose to look for cities with a population greater than 7 million, the optimizer will use Materialization-Scan and EXPLAIN will show this:

    Here, you can see:

    • There are still two SELECTs (look for columns with id=1 and id=2)

    • The second select (with id=2) has select_type=MATERIALIZED. This means it will be executed and its results will be stored in a temporary table with a unique key over all columns. The unique key is there to prevent the table from containing any duplicate records.

    The optimizer chose to do a full scan over the materialized table, so this is an example of a use of the Materialization-Scan strategy.

    As for execution costs, we're going to read 15 rows from table City, write 15 rows to materialized table, read them back (the optimizer assumes there won't be any duplicates), and then do 15 eq_ref accesses to table Country. In total, we'll do 45 reads and 15 writes.

    By comparison, if you run the EXPLAIN with semi-join optimizations disabled, you'll get this:

    ...which is a plan to do (239 + 239*15) = 3824 table reads.

    Materialization-Lookup

    Let's modify the query slightly and look for countries which have cities with a population over one million (instead of seven):

    The EXPLAIN output is similar to the one which used Materialization-scan, except that:

    • the <subquery2> table is accessed with the eq_ref access method

    • the access uses an index named distinct_key

    This means that the optimizer is planning to do index lookups into the materialized table. In other words, we're going to use the Materialization-lookup strategy.

    With optimizer_switch='semijoin=off,materialization=off', one will get this EXPLAIN:

    One can see that both plans will do a full scan on the Country table. For the second step, MariaDB will fill the materialized table (238 rows read from table City and written to the temporary table) and then do a unique key lookup for each record in table Country, which works out to 239 unique key lookups. In total, the second step will cost (239+238) = 477 reads and 238 temp.table writes.

    Execution of the latter (DEPENDENT SUBQUERY) plan reads 18 rows using an index on City.Country for each record it receives for table Country. This works out to a cost of (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with Materialization. By the way, MariaDB has an option to use such a query plan, too (see ), but it did not choose it.

    Subqueries with grouping

    MariaDB is able to use Semi-join materialization strategy when the subquery has grouping (other semi-join strategies are not applicable in this case).

    This allows for efficient execution of queries that search for the best/last element in a certain group.

    For example, let's find cities that have the biggest population on their continent:

    the cities are:

    Factsheet

    Semi-join materialization

    • Can be used for uncorrelated IN-subqueries. The subselect may use grouping and/or aggregate functions.

    • Is shown in EXPLAIN as type=MATERIALIZED for the subquery, and a line with table=<subqueryN> in the parent subquery.

    • Is enabled when one has both materialization=on and semijoin=on

    This page is licensed: CC BY-SA / Gnu FDL

    MariaDB 5.3 Optimizer Debugging

    MariaDB 5.3 has an optimizer debugging patch. The patch is pushed into:

    lp:maria-captains/maria/5.3-optimizer-debugging

    The patch is wrapped in #ifdef, but there is a #define straight in mysql_priv.h so simply compiling that tree should produce a binary with optimizer debugging enabled.

    The patch adds two system variables:

    • @@debug_optimizer_prefer_join_prefix

    • @@debug_optimizer_dupsweedout_penalized

    The variables are available as both session and global variables, and can also be set via the server command line.

    debug_optimizer_prefer_join_prefix

    If this variable is non-NULL, it is assumed to specify a join prefix as a comma-separated list of table aliases:

    The optimizer will try its best to build a join plan which matches the specified join prefix. It does this by comparing join prefixes it is considering with @@debug_optimizer_prefer_join_prefix, and multiplying cost by a million if the plan doesn't match the prefix.

    As a result, you can more-or-less control the join order. For example, let's take this query:

    and request a join order of C,A,B:

    We got it.

    Note that this is still a best-effort approach:

    • you won't be successful in forcing join orders which the optimizer considers invalid (e.g. for "t1 LEFT JOIN t2" you won't be able to get a join order of t2,t1).

    • The optimizer does various plan pruning and may discard the requested join order before it has a chance to find out that it is a million-times cheaper than any other.

    Semi-joins

    It is possible to force the join order of joins plus semi-joins. This may cause a different strategy to be used:

    Semi-join materialization is a somewhat special case, because "join prefix" is not exactly what you see in the EXPLAIN output. For semi-join materialization:

    • don't put "<subqueryN>" into @@debug_optimizer_prefer_join_prefix

    • instead, put all of the materialization tables into the place where you want the <subqueryN> line.

    • Attempts to control the join order inside the materialization nest will be unsuccessful. Example: we want A-C-B-AA:

    but we get A-B-C-AA.

    debug_optimizer_dupsweedout_penalized

    There are four semi-join execution strategies:

    1. FirstMatch

    2. Materialization

    3. LooseScan

    4. DuplicateWeedout

    The first three strategies have flags in @@optimizer_switch that can be used to disable them. The DuplicateWeedout strategy does not have a flag. This was done for a reason, as that strategy is the catch-all strategy and it can handle all kinds of subqueries, in all kinds of join orders. (We're slowly moving to the point where it will be possible to run with FirstMatch enabled and everything else disabled but we are not there yet.)

    Since DuplicateWeedout cannot be disabled, there are cases where it "gets in the way" by being chosen over the strategy you need. This is what debug_optimizer_dupsweedout_penalized is for. If you set:
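    (a sketch; it is assumed here that the variable takes a boolean value)

    SET debug_optimizer_dupsweedout_penalized=1;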

    ...the costs of query plans that use DuplicateWeedout will be multiplied by a million. This doesn't mean that you will get rid of DuplicateWeedout: due to Bug #898747 it is still possible for DuplicateWeedout to be used even if a cheaper plan exists. A partial remedy to this is to run with

    It is possible to use both debug_optimizer_dupsweedout_penalized and debug_optimizer_prefer_join_prefix at the same time. This should give you the desired strategy and join order.

    Further reading

    • See mysql-test/t/debug_optimizer.test (in the MariaDB source code) for examples

    This page is licensed: CC BY-SA / Gnu FDL

    Optimizer Hints

    This section details special comments you can add to SQL statements to influence the query optimizer, helping you manually select better execution plans for improved performance and query tuning.

    Optimizer hints are options available that affect the execution plan.

    SELECT Modifiers have been in MariaDB for a long time, while Expanded Optimizer Hints were introduced in MariaDB 12.0 and 12.1.

    See Also

    • Use optimizer_switch to enable/disable specific optimizations

    This page is licensed: CC BY-SA / Gnu FDL

    Data Sampling: Techniques for Efficiently Finding a Random Row

    Fetching random rows from a table (beyond ORDER BY RAND())

    The problem

    One would like to do "SELECT ... ORDER BY RAND() LIMIT 10" to get 10 rows at random. But this is slow. The optimizer does

    LIMIT ROWS EXAMINED

    Syntax

    Similar to the parameters of LIMIT, rows_limit can be either a prepared statement parameter or a stored program parameter.
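    The general shape of the clause (a sketch, consistent with the examples later in this section):

    SELECT ... FROM ... [WHERE ...] [GROUP BY ...] [ORDER BY ...]
      LIMIT [[offset,] row_count] ROWS EXAMINED rows_limit;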

    MariaDB [securedb]> select @@version;
    +-----------------------------------+
    | @@version                         |
    +-----------------------------------+
    | 10.6.20-16-MariaDB-enterprise-log |
    +-----------------------------------+
    1 row in set (0.001 sec)
    
    MariaDB [securedb]> select @@optimizer_adjust_secondary_key_costs;
    +---------------------------------------------+
    | @@optimizer_adjust_secondary_key_costs      |
    +---------------------------------------------+
    | fix_reuse_range_for_ref,fix_card_multiplier |
    +---------------------------------------------+
    SET @@optimizer_adjust_secondary_key_costs='all';
    SET optimizer_switch='index_condition_pushdown=off'
    MariaDB [test]> EXPLAIN SELECT * FROM tbl WHERE key_col1 BETWEEN 10 AND 11 AND key_col2 LIKE '%foo%';
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    | id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    |  1 | SIMPLE      | tbl   | range | key_col1      | key_col1 | 5       | NULL |    2 | Using index condition |
    +----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
    ALTER TABLE lineitem ADD INDEX s_r (l_shipdate, l_receiptdate);
    SELECT COUNT(*) FROM lineitem
    WHERE
      l_shipdate BETWEEN '1993-01-01' AND '1993-02-01' AND
      datediff(l_receiptdate,l_shipdate) > 25 AND
      l_quantity > 40;
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    | table    | type  | possible_keys | key | key_len | ref  | rows   | Extra       |
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    | lineitem | range | s_r           | s_r | 4       | NULL | 152064 | Using where |
    +----------+-------+---------------+-----+---------+------+--------+-------------+
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    | table    | type  | possible_keys | key | key_len | ref  | rows   | Extra                              |
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    | lineitem | range | s_r           | s_r | 4       | NULL | 152064 | Using index condition; Using where |
    +----------+-------+---------------+-----+---------+------+--------+------------------------------------+
    SELECT * 
    FROM Country 
    WHERE 
       Country.code IN (SELECT City.Country
                        FROM City 
                        WHERE 
                          City.Population > 0.33 * Country.Population AND 
                          City.Population > 1*1000*1000);
    EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 0.33 * Country.Population 
       AND City.Population > 1*1000*1000)\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: City
             type: range
    possible_keys: Population,Country
              key: Population
          key_len: 4
              ref: NULL
             rows: 238
            Extra: Using index condition; Start temporary
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: Country
             type: eq_ref
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 3
              ref: world.City.Country
             rows: 1
            Extra: Using where; End temporary
    2 rows in set (0.00 sec)
    EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where City.Population > 0.33 * Country.Population 
        AND City.Population > 1*1000*1000)\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: Country
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 239
            Extra: Using where
    *************************** 2. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: City
             type: index_subquery
    possible_keys: Population,Country
              key: Country
          key_len: 3
              ref: func
             rows: 18
            Extra: Using where
    2 rows in set (0.00 sec)
    CREATE TABLE items (
      price  DECIMAL(8,2),
      weight DECIMAL(8,2),
      ...
      INDEX(weight)
    );
    -- Find items that cost more than 1000 $currency_units per kg:
    SET optimizer_switch='not_null_range_scan=ON';
    EXPLAIN
    SELECT * FROM items WHERE items.price > items.weight / 1000;
    weight IS NOT NULL
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key    | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    |    1 | SIMPLE      | items | range | NULL          | weight | 5       | NULL | 1    | Using where |
    +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
    -- Find orders that were returned
    SELECT * FROM current_orders AS O, order_returns AS RET
    WHERE 
      O.return_id= RET.id;
    SET optimizer_switch='not_null_range_scan=ON';
    The first select received the table name <subquery2>. This is the table that we got as a result of the materialization of the select with id=2.
    in the optimizer_switch variable.
  • The materialization=on|off flag is shared with Non-semijoin materialization.

  • FirstMatch Strategy
  • Fetch all the rows -- this is costly

  • Append RAND() to the rows

  • Sort the rows -- also costly

  • Pick the first 10.

  • All the algorithms given below are "fast", but most introduce flaws:

    • Bias -- some rows are more likely to be fetched than others.

    • Repetitions -- If two random sets contain the same row, they are likely to contain other dups.

    • Sometimes failing to fetch the desired number of rows.

    "Fast" means avoiding reading all the rows. There are many techniques that require a full table scan, or at least an index scan. They are not acceptable for this list. There is even a technique that averages half a scan; it is relegated to a footnote.

    Metrics

    Here's a way to measure performance without having a big table.
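    A sketch of one such measurement, using the session Handler counters:

    FLUSH STATUS;                          -- reset the session counters
    SELECT ...;                            -- the query being tested
    SHOW SESSION STATUS LIKE 'Handler%';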

    If some of the "Handler" numbers look like the number of rows in the table, then there was a table scan.

    None of the queries presented here need a full table (or index) scan. Each has a time proportional to the number of rows returned.

    Virtually all published algorithms involve a table scan. The previously published version of this blog had, embarrassingly, several algorithms that had table scans.

    Sometimes the scan can be avoided via a subquery. For example, the first of these will do a table scan; the second will not.
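    A sketch of that contrast, assuming an AUTO_INCREMENT id column (table and column names are illustrative). In the first form, RAND() is re-evaluated for every row, so the index on id cannot be used; in the second, the random id is computed once in a derived table and then fetched with a single primary-key lookup:

    -- Table scan: the comparison value is not a constant
    SELECT * FROM t
    WHERE id = FLOOR(1 + RAND() * (SELECT MAX(id) FROM t));

    -- No scan: the random id is evaluated once, then looked up by PRIMARY KEY
    SELECT t.*
    FROM t
    JOIN ( SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM t)) AS id ) AS r
      USING (id);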

    Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned

    • Requirement: AUTO_INCREMENT id

    • Requirement: No gaps in id
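    A sketch for this case, assuming @min and @max have been precalculated; the single-evaluation trick keeps it to one primary-key lookup:

    SELECT t.*
    FROM t
    JOIN ( SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id ) AS r
      USING (id);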

    (Of course, you might be able to simplify this. For example, min_id is likely to be 1. Or precalculate limits into @min and @max.)

    Case: Consecutive AUTO_INCREMENT without gaps, 10 rows

    • Requirement: AUTO_INCREMENT id

    • Requirement: No gaps in id

    • Flaw: Sometimes delivers fewer than 10 rows
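    A sketch under the stated assumptions, using the MariaDB SEQUENCE engine table seq_1_to_15 merely to generate 15 candidate ids (any 15-row source would do); the inner LIMIT is inflated because the FLOOR expression may produce duplicates:

    SELECT t.*
    FROM t
    JOIN (
      SELECT DISTINCT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
      FROM seq_1_to_15
      LIMIT 10
    ) AS r USING (id);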

    The FLOOR expression could lead to duplicates, hence the inflated inner LIMIT. There could (rarely) be so many duplicates that the inflated LIMIT leads to fewer than the desired 10 different rows. One approach to that Flaw is to rerun the query if it delivers too few rows.

    A variant:

    Again, ugly but fast, regardless of table size.

    Case: AUTO_INCREMENT with gaps, 1 or more rows returned

    • Requirement: AUTO_INCREMENT, possibly with gaps due to DELETEs, etc

    • Flaw: Only semi-random (rows do not have an equal chance of being picked), but it does partially compensate for the gaps

    • Flaw: The first and last few rows of the table are less likely to be delivered.
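    A sketch under the stated assumptions; the random starting point is computed once into a user variable, so the candidate step is a short range scan on the PRIMARY KEY rather than a full scan:

    SET @start := FLOOR(@min + (@max - @min + 1) * RAND());

    SELECT t.*
    FROM (
      SELECT id
      FROM t
      WHERE id >= @start      -- short PRIMARY KEY range scan
      ORDER BY id
      LIMIT 50                -- 50 "consecutive" existing ids (gaps allowed)
    ) AS candidates
    JOIN t USING (id)
    ORDER BY RAND()           -- only the 50 candidates are sorted
    LIMIT 10;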

    This gets 50 "consecutive" ids (possibly with gaps), then delivers a random 10 of them.

    Yes, it is complex, but yes, it is fast, regardless of the table size.

    Case: Extra FLOAT column for randomizing

    (Unfinished: need to check these.)

    Assuming rnd is a FLOAT (or DOUBLE) populated with RAND() and INDEXed:

    • Requirement: extra, indexed, FLOAT column

    • Flaw: Fetches 10 adjacent rows (according to rnd), hence not good randomness

    • Flaw: Near 'end' of table, can't find 10 rows.

    • These two variants attempt to resolve the end-of-table flaw:

    Case: UUID or MD5 column

    • Requirement: UUID/GUID/MD5/SHA1 column exists and is indexed.

    • Similar code/benefits/flaws to AUTO_INCREMENT with gaps.

    • Needs 7 random HEX digits:

    can be used as a start for adapting a gapped AUTO_INCREMENT case. If the field is BINARY instead of hex, then

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: random

    This page is licensed: CC BY-SA / Gnu FDL

    Description

    The purpose of this optimization is to provide the means to terminate the execution of SELECT statements which examine too many rows, and thus use too many resources. This is achieved through an extension of the LIMIT clause: LIMIT ROWS EXAMINED number_of_rows. Whenever possible, the semantics of LIMIT ROWS EXAMINED is the same as that of normal LIMIT (for instance for aggregate functions).

    The LIMIT ROWS EXAMINED clause is taken into account by the query engine only during query execution. Thus the clause is ignored in the following cases:

    • If a query is EXPLAIN-ed.

    • During query optimization.

    • During auxiliary operations such as writing to system tables (e.g. logs).

    The clause is not applicable to DELETE or UPDATE statements, and if used in those statements produces a syntax error.

    The effects of this clause are as follows:

    • The server counts the number of read, inserted, modified, and deleted rows during query execution. This takes into account the use of temporary tables, and sorting for intermediate query operations.

    • Once the counter exceeds the value specified in the LIMIT ROWS EXAMINED clause, query execution is terminated as soon as possible.

    • The effects of terminating the query because of LIMIT ROWS EXAMINED are as follows:

      • The result of the query is a subset of the complete query, depending on when the query engine detected that the limit was reached. The result may be empty if no result rows could be computed before reaching the limit.

      • A warning is generated of the form: "Query execution was interrupted. The query examined at least 100 rows, which exceeds LIMIT ROWS EXAMINED (20). The query result may be incomplete."

      • If query processing was interrupted during filesort, an error is returned in addition to the warning.

      • If a UNION was interrupted during execution of one of its queries, the last step of the UNION is still executed in order to produce a partial result.

      • Depending on the join and other execution strategies used for a query, the same query may produce no result at all, or a different subset of the complete result when terminated due to LIMIT ROWS EXAMINED.

      • If the query contains a GROUP BY clause, the last group where the limit was reached will be discarded.

    The LIMIT ROWS EXAMINED clause cannot be specified on a per-subquery basis. There can be only one LIMIT ROWS EXAMINED clause for the whole SELECT statement. If a SELECT statement contains several subqueries with LIMIT ROWS EXAMINED, the one that is parsed last is taken into account.

    Examples

    A simple example of the clause is:
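    For instance (a sketch of the kind of statement intended here; t1 and t2 are hypothetical tables):

    SELECT * FROM t1, t2 WHERE t1.a > t2.a
    LIMIT 10 ROWS EXAMINED 10000;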

    The LIMIT ROWS EXAMINED clause is global for the whole statement.

    If a composite query (such as UNION, or query with derived tables or with subqueries) contains more than one LIMIT ROWS EXAMINED, the last one parsed is taken into account. In this manner either the last or the outermost one is taken into account. For instance, in the query:
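    A hypothetical query of that shape, where the subquery specifies a limit of 0 and the outer select a limit of 11:

    SELECT * FROM t1
    WHERE c1 IN (SELECT c2 FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
    LIMIT ROWS EXAMINED 11;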

    The limit that is taken into account is 11, not 0.

    This page is licensed: CC BY-SA / Gnu FDL

    SELECT * FROM Country 
    WHERE Country.code IN (SELECT City.Country 
                           FROM City 
                           WHERE City.Population > 7*1000*1000)
          AND Country.continent='Europe'
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 7*1000*1000);
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    | id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    |  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
    |  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
    |  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
    +----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
    3 rows in set (0.01 sec)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 7*1000*1000);
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    | id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    |  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
    |  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
    +----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 1*1000*1000) ;
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    | id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    |  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
    |  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
    |  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
    +----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
    3 rows in set (0.00 sec)
    MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
      (select City.Country from City where  City.Population > 1*1000*1000) ;
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    | id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    |  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
    |  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
    +----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
    EXPLAIN 
    SELECT * FROM City 
    WHERE City.Population IN (SELECT max(City.Population) FROM City, Country 
                              WHERE City.Country=Country.Code 
                              GROUP BY Continent)
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    | id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    |    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
    |    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
    |    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
    |    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
    +------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
    4 rows in set (0.00 sec)
    +------+-------------------+---------+------------+
    | ID   | Name              | Country | Population |
    +------+-------------------+---------+------------+
    | 1024 | Mumbai (Bombay)   | IND     |   10500000 |
    | 3580 | Moscow            | RUS     |    8389200 |
    | 2454 | Macao             | MAC     |     437500 |
    |  608 | Cairo             | EGY     |    6789479 |
    | 2515 | Ciudad de México | MEX     |    8591309 |
    |  206 | São Paulo        | BRA     |    9968485 |
    |  130 | Sydney            | AUS     |    3276207 |
    +------+-------------------+---------+------------+
    SET debug_optimizer_prefer_join_prefix='tbl1,tbl2,tbl3';
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    |  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
    |  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    |  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
    |  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix=NULL;
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                      |
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                            |
    |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                |
    |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; FirstMatch(A) |
    +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
    3 rows in set (0.00 sec)
    
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Start temporary                                 |
    |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (flat, BNL join) |
    |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; End temporary                      |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
    3 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_prefer_join_prefix='A,C,B,AA';
    Query OK, 0 rows affected (0.00 sec)
    
    MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten AA WHERE A.a IN (SELECT B.a FROM ten B, ten C);
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    | id | select_type | table       | type   | possible_keys | key          | key_len | ref  | rows | Extra                              |
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    |  1 | PRIMARY     | A           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    |  1 | PRIMARY     | <subquery2> | eq_ref | distinct_key  | distinct_key | 5       | func |    1 |                                    |
    |  1 | PRIMARY     | AA          | ALL    | NULL          | NULL         | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
    |  2 | SUBQUERY    | B           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    |  2 | SUBQUERY    | C           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
    +----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
    5 rows in set (0.00 sec)
    MariaDB [test]> SET debug_optimizer_dupsweedout_penalized=TRUE;
    MariaDB [test]> SET optimizer_prune_level=0;
    FLUSH STATUS;
        SELECT ...;
        SHOW SESSION STATUS LIKE 'Handler%';
    SELECT *  FROM RandTest AS a
      WHERE id = FLOOR(@min + (@max - @min + 1) * RAND());  -- BAD: table scan
    SELECT *
     FROM RandTest AS a
     JOIN (
       SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id -- Good; single eval.
          ) b  USING (id);
    SELECT r.*
          FROM (
              SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
                  FROM (
                      SELECT MIN(id) AS min_id,
                             MAX(id) AS max_id
                          FROM RandTest
                       ) AS mm
               ) AS init
          JOIN  RandTest AS r  ON r.id = init.id;
    -- First select is one-time:
      SELECT @min := MIN(id),
             @max := MAX(id)
          FROM RandTest;
      SELECT DISTINCT *
          FROM RandTest AS a
          JOIN (
              SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
                  FROM RandTest
                  LIMIT 11    -- more than 10 (to compensate for dups)
               ) b  USING (id)
          LIMIT 10;           -- the desired number of rows
    SELECT r.*
          FROM (
              SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
                  FROM (
                      SELECT MIN(id) AS min_id,
                             MAX(id) AS max_id
                          FROM RandTest
                       ) AS mm
                  JOIN ( SELECT id dummy FROM RandTest LIMIT 11 ) z
               ) AS init
          JOIN  RandTest AS r  ON r.id = init.id
          LIMIT 10;
    -- First select is one-time:
    SELECT @min := MIN(id),
           @max := MAX(id)
        FROM RandTest;
    SELECT a.*
        FROM RandTest a
        JOIN ( SELECT id FROM
                ( SELECT id
                    FROM ( SELECT @min + (@max - @min + 1 - 50) * RAND() 
                      AS start FROM DUAL ) AS init
                    JOIN RandTest y
                    WHERE    y.id > init.start
                    ORDER BY y.id
                    LIMIT 50         -- Inflated to deal with gaps
                ) z ORDER BY RAND()
               LIMIT 10              -- number of rows desired (change to 1 if looking for a single row)
             ) r ON a.id = r.id;
    SELECT r.*
          FROM ( SELECT RAND() AS start FROM DUAL ) init
          JOIN RandTest r
          WHERE r.rnd >= init.start
          ORDER BY r.rnd
          LIMIT 10;
    SELECT r.*
          FROM ( SELECT RAND() * ( SELECT rnd
                            FROM RandTest
                            ORDER BY rnd DESC
                            LIMIT 10,1 ) AS start
               ) AS init
          JOIN RandTest r
          WHERE r.rnd > init.start
          ORDER BY r.rnd
          LIMIT 10;
    
    
      SELECT @start := RAND(),
             @cutoff := CAST(1.1 * 10 + 5 AS DECIMAL(20,8)) / TABLE_ROWS
          FROM information_schema.TABLES
          WHERE TABLE_SCHEMA = 'dbname'
            AND TABLE_NAME = 'RandTest'; -- 0.0030
      SELECT d.*
          FROM (
              SELECT a.id
                  FROM RandTest a
                  WHERE rnd BETWEEN @start AND @start + @cutoff
               ) sample
          JOIN RandTest d USING (id)
          ORDER BY rand()
          LIMIT 10;
    RIGHT( HEX( (1<<24) * (1+RAND()) ), 6)
    UNHEX(RIGHT( HEX( (1<<24) * (1+RAND()) ), 6))
    SELECT ... FROM ... WHERE ...
    [group_clause] [order_clause]
    LIMIT [[OFFSET,] row_count] ROWS EXAMINED rows_limit;
    SELECT * FROM t1, t2 LIMIT 10 ROWS EXAMINED 10000;
    SELECT * FROM t1
    WHERE c1 IN (SELECT * FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
    LIMIT ROWS EXAMINED 11;

The IPv6 parameters are 32-digit hex because that was simpler than BINARY(16) for a reference implementation.

    The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M) -- adjust if needed.

  • The address "Off the end" (255.255.255.255+1 - represented as NULL).

• The table is initialized to one row: (ip=0, owner=0), meaning "all addresses are free". See the comments in the code for more details.

  • Inside the Procedures, and in the Ips table, an address is stored as BINARY(16) for efficiency. HEX() and UNHEX() are used at the boundaries.

  • Adding/subtracting 1 is rather complex (see the code).

  • The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M); 'free' is represented by 0. You may need a bigger datatype.

  • The address "Off the end" (ffff.ffff.ffff.ffff.ffff.ffff.ffff.ffff+1 is represented by NULL).

• The table is initialized to one row: (UNHEX('00000000000000000000000000000000'), 0), meaning "all addresses are free".

  • You may need to decide on a canonical representation of IPv4 in IPv6. See the comments in the code for more details.

  • A special value (such as 0 or '') must be provided for 'free'.

  • The table must be initialized to one row: (SmallestAddress, Free)

  • InnoDB
    Reference implementation for IPv4
    INET_ATON
    INET_NTOA
    reference implementation for IPv6
    INET6_ATON
    INET6_NTOA
    Related blog
    Another approach
    Free IP tables
    Rick James' site
    ipranges

    Foreign Keys

    A foreign key is a database constraint that references columns in a parent table to enforce data integrity in a child table. When used, MariaDB checks to maintain these integrity rules.

    Overview

A foreign key is a constraint which can be used to enforce data integrity. It is composed of a column (or a set of columns) in a table called the child table, which references a column (or a set of columns) in a table called the parent table. If foreign keys are used, MariaDB performs checks to ensure that the integrity rules are always enforced. For a more exhaustive explanation, see .

    Foreign keys can only be used with storage engines that support them. The default InnoDB supports foreign keys.

    Partitioned tables cannot contain foreign keys, and cannot be referenced by a foreign key.

    Syntax

    Note: Until , MariaDB accepts the shortcut format with a REFERENCES clause only in ALTER TABLE and CREATE TABLE statements, but that syntax does nothing. For example:
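A hypothetical sketch of that shortcut (table and column names invented for illustration):

CREATE TABLE books (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  author_id INT UNSIGNED REFERENCES authors(id)
) ENGINE = InnoDB;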

MariaDB simply parses it without returning any error or warning, for compatibility with other DBMS's. However, only the syntax described below creates foreign keys. From , MariaDB will attempt to apply the constraint. See below.

    Foreign keys are created with or . The definition must follow this syntax:
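A sketch of the expected shape, based on the standard foreign key grammar (placeholders in lowercase):

[CONSTRAINT [symbol]] FOREIGN KEY [index_name] (index_col_name, ...)
  REFERENCES tbl_name (index_col_name, ...)
  [ON DELETE RESTRICT | CASCADE | SET NULL | NO ACTION]
  [ON UPDATE RESTRICT | CASCADE | SET NULL | NO ACTION]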

    The symbol clause, if specified, is used in error messages and must be unique in the database.

    The columns in the child table must be a BTREE (not HASH, RTREE, or FULLTEXT — see ) index, or the leftmost part of a BTREE index. Index prefixes are not supported (thus, and columns cannot be used as foreign keys). If MariaDB automatically creates an index for the foreign key (because it does not exist and is not explicitly created), its name will be index_name.

The referenced columns in the parent table must be an index, or a prefix of an index.

    The foreign key columns and the referenced columns must be of the same type, or similar types. For integer types, the size and sign must also be the same.

    Both the foreign key columns and the referenced columns can be columns. However, the ON UPDATE CASCADE, ON UPDATE SET NULL, ON DELETE SET NULL clauses are not allowed in this case.

    The parent and the child table must use the same storage engine, and must not be TEMPORARY or partitioned tables. They can be the same table.

    Constraints

If a foreign key exists, each row in the child table must match a row in the parent table. Multiple child rows can match the same parent row. A child row matches a parent row if all its foreign key values are identical to a parent row's values in the parent table. However, if at least one of the foreign key values is NULL, the row has no parent, but it is still allowed.

    MariaDB performs certain checks to guarantee that the data integrity is enforced:

    • Trying to insert non-matching rows (or update matching rows in a way that makes them non-matching rows) in the child table produces a 1452 error ( '23000').

    • When a row in the parent table is deleted and at least one child row exists, MariaDB performs an action which depends on the ON DELETE clause of the foreign key.

    • When a value in the column referenced by a foreign key changes and at least one child row exists, MariaDB performs an action which depends on the ON UPDATE clause of the foreign key.

    The allowed actions for ON DELETE and ON UPDATE are:

• RESTRICT: The change on the parent table is prevented. The statement terminates with a 1451 error ( '23000'). This is the default behavior for both ON DELETE and ON UPDATE.

    • NO ACTION: Synonym for RESTRICT.

The delete or update operations triggered by foreign keys do not activate triggers and are not counted in the and status variables.

    Foreign key constraints can be disabled by setting the server system variable to 0. This speeds up the insertion of large quantities of data.
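For example, assuming the variable in question is foreign_key_checks (it can be set per session):

SET SESSION foreign_key_checks = 0;
-- ... bulk load the data ...
SET SESSION foreign_key_checks = 1;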

    Metadata

    The table contains information about foreign keys. The individual columns are listed in the table.

    The InnoDB-specific Information Schema tables also contain information about the InnoDB foreign keys. The foreign key information is stored in the . Data about the individual columns are stored in .

Often the most human-readable way to get information about a table's foreign keys is the SHOW CREATE TABLE statement.

    Limitations

    Foreign keys have the following limitations in MariaDB:

    • Currently, foreign keys are only supported by InnoDB.

    • Cannot be used with views.

    • The SET DEFAULT action is not supported.

• Foreign key actions do not activate triggers.

    Examples

    Let's see an example. We will create an author table and a book table. Both tables have a primary key called id. book also has a foreign key composed by a field called author_id, which refers to the author primary key. The foreign key constraint name is optional, but we'll specify it because we want it to appear in error messages: fk_book_author.
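A minimal sketch of that schema (column names and types are assumptions; the ON DELETE / ON UPDATE actions match the ones discussed below):

CREATE TABLE author (
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE book (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  author_id SMALLINT UNSIGNED NOT NULL,
  CONSTRAINT fk_book_author
    FOREIGN KEY (author_id) REFERENCES author (id)
    ON DELETE CASCADE
    ON UPDATE RESTRICT
) ENGINE = InnoDB;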

    Now, if we try to insert a book with a non-existing author, we will get an error:
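For example (the values are invented; author 42 does not exist):

INSERT INTO book (title, author_id) VALUES ('Ghost book', 42);
-- fails with error 1452, naming the fk_book_author constraint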

    The error is very descriptive.

    Now, let's try to properly insert two authors and their books:
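For example (values invented):

INSERT INTO author (name) VALUES ('Maria Brown'), ('John Smith');

INSERT INTO book (title, author_id) VALUES
  ('The first book', 1),
  ('The second book', 2);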

    It worked!

    Now, let's delete the second author. When we created the foreign key, we specified ON DELETE CASCADE. This should propagate the deletion, and make the deleted author's books disappear:

    We also specified ON UPDATE RESTRICT. This should prevent us from modifying an author's id (the column referenced by the foreign key) if a child row exists:

    REFERENCES

    This page is licensed: CC BY-SA / Gnu FDL

    Defragmenting InnoDB Tablespaces

    Overview

    When rows are deleted from an InnoDB table, the rows are simply marked as deleted and not physically deleted. The free space is not returned to the operating system for re-use.

The purge thread will physically delete index keys and rows, but the free space introduced is still not returned to the operating system. This can lead to gaps in the pages. If you have variable length rows, new rows may be larger than old rows and cannot make use of the available space.

You can run OPTIMIZE TABLE or ALTER TABLE ... ENGINE=InnoDB to reconstruct the table. Unfortunately, running OPTIMIZE TABLE against an InnoDB table stored in the shared table-space file ibdata1 does two things:

    • Makes the table’s data and indexes contiguous inside ibdata1.

    • Increases the size of ibdata1 because the contiguous data and index pages are appended to ibdata1.

    InnoDB Defragmentation

    The feature described below has been deprecated in and was removed in . See and .

merged Facebook's defragmentation code prepared for MariaDB by Matt, Seong Uck Lee from Kakao. The only major difference from Facebook's code and Matt's patch is that MariaDB does not introduce new literals to SQL and makes no changes to the server code. Instead, is used and all code changes are inside the InnoDB/XtraDB storage engines.

    The behaviour of OPTIMIZE TABLE is unchanged by default, and to enable this new feature, you need to set the system variable to 1.
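For example (the table name is illustrative):

SET GLOBAL innodb_defragment = 1;
OPTIMIZE TABLE mytable;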

    No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by ) and tries to move records so that pages would be full of records and then frees pages that are fully empty after the operation.

    Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.

    A number of new system and status variables for controlling and monitoring the feature are introduced.

    System Variables

    • : Enable InnoDB defragmentation.

    • : Number of pages considered at once when merging multiple pages to defragment.

• : Number of defragment stats changes before the stats are written to persistent storage.

    • : Number of records of space that defragmentation should leave on the page.

    Status Variables

    • : Number of defragment re-compression failures

    • : Number of defragment failures.

    • : Number of defragment operations.

    Example

    After these CREATE and INSERT operations, the following information can be seen from the INFORMATION SCHEMA:

    Deleting three-quarters of the records, leaving gaps, and then optimizing:

    Now some pages have been freed, and some merged:

    See on the Mariadb.org blog for more details.

    This page is licensed: CC BY-SA / Gnu FDL

    Full-Text Index Overview

    MariaDB has support for full-text indexing and searching:

    • A full-text index in MariaDB is an index of type FULLTEXT, and it allows more options when searching for portions of text from a field.

    • Full-text indexes can be used only with MyISAM, Aria, InnoDB and Mroonga tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

    • Partitioned tables cannot contain fulltext indexes, even if the storage engine supports them.

    • A FULLTEXT index definition can be given in the statement when a table is created, or added later using or .

    • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

    Full-text searching is performed using syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name.

    Excluded Results

    • Partial words are excluded.

    • Words less than 4 (MyISAM) or 3 (InnoDB) characters in length will not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).

• Words longer than 84 characters in length will also not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).

    Relevance

    MariaDB calculates a relevance for each result, based on a number of factors, including the number of words in the index, the number of unique words in a row, the total number of words in both the index and the result, and the weight of the word. In English, 'cool' will be weighted less than 'dandy', at least at present! The relevance can be returned as part of a query simply by using the MATCH function in the field list.

    Types of Full-Text search

    IN NATURAL LANGUAGE MODE

    IN NATURAL LANGUAGE MODE is the default type of full-text search, and the keywords can be omitted. There are no special operators, and searches consist of one or more comma-separated keywords.

    Searches are returned in descending order of relevance.

    IN BOOLEAN MODE

    Boolean search permits the use of a number of special operators:

• * : The wildcard, indicating zero or more characters. It can only appear at the end of a word.

• " : Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).

• + : The word is mandatory in all rows returned.

• - : The word cannot appear in any row returned.

• < : The word that follows has a lower relevance than other words, although rows containing it will still match.

• > : The word that follows has a higher relevance than other words.

• () : Used to group words into subexpressions.

• ~ : The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row).

Searches are not returned in order of relevance, nor does the 50% limit apply. Stopwords and word minimum and maximum lengths still apply as usual.

    WITH QUERY EXPANSION

    A query expansion search is a modification of a natural language search. The search string is used to perform a regular natural language search. Then, words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. It can be useful when relying on implied knowledge within the data, for example that MariaDB is a database.

    Examples

    Creating a table, and performing a basic search:

    Multiple words:

Since 'Once' is a stopword, no result is returned:

    Inserting the word 'wicked' into more than half the rows excludes it from the results:

    Using IN BOOLEAN MODE to overcome the 50% limitation:

    Returning the relevance:

    WITH QUERY EXPANSION. In the following example, 'MariaDB' is always associated with the word 'database', so it is returned when query expansion is used, even though not explicitly requested.

    Partial word matching with IN BOOLEAN MODE:

    Using boolean operators

    See Also

    • For simpler searches of a substring in text columns, see the operator.

    This page is licensed: CC BY-SA / Gnu FDL

    Data Warehousing High Speed Ingestion

    The problem

    You are ingesting lots of data. Performance is bottlenecked in the INSERT area.

    This will be couched in terms of Data Warehousing, with a huge Fact table and Summary (aggregation) tables.

    Overview of solution

    • Have a separate staging table.

    • Inserts go into Staging.

    • Normalization and Summarization reads Staging, not Fact.

    • After normalizing, the data is copied from Staging to Fact.

    Staging is one (or more) tables in which the data lives only long enough to be handed off to Normalization, Summary, and the Fact tables.

Since we are probably talking about a billion-row table, shrinking the width of the Fact table by normalizing (as mentioned here) is worthwhile. Changing an INT to a MEDIUMINT will save a GB. Replacing a string by an id (normalizing) saves many GB. This helps disk space and cacheability, hence speed.

    Injection speed

    Some variations:

    • Big dump of data once an hour, versus continual stream of records.

    • The input stream could be single-threaded or multi-threaded.

    • You might have 3rd party software tying your hands.

    Generally the fastest injection rate can be achieved by "staging" the INSERTs in some way, then batch processing the staged records. This blog discusses various techniques for staging and batch processing.

    Normalization

    Let's say your Input has a host_name column, but you need to turn that into a smaller host_id in the Fact table. The "Normalization" table, as I call it, looks something like
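A sketch of such a mapping table (names and sizes are assumptions):

CREATE TABLE Hosts (
  host_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  host_name VARCHAR(99) NOT NULL,
  PRIMARY KEY (host_id),     -- lookup id -> name
  UNIQUE KEY (host_name)     -- lookup name -> id
) ENGINE = InnoDB;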

Here's how you can use Staging as an efficient way to achieve the swap from name to id.

    Staging has two fields (for this normalization example):

Meanwhile, the Fact table has:

    SQL #1 (of 2):
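A sketch, assuming Staging has a host_name column (plus a host_id column to be filled in) and Hosts is the mapping table above:

INSERT IGNORE INTO Hosts (host_name)
  SELECT DISTINCT s.host_name
    FROM Staging AS s
    LEFT JOIN Hosts AS h ON h.host_name = s.host_name
    WHERE h.host_id IS NULL;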

    By isolating this as its own transaction, we get it finished in a hurry, thereby minimizing blocking. By saying IGNORE, we don't care if other threads are 'simultaneously' inserting the same host_names.

    There is a subtle reason for the LEFT JOIN. If, instead, it were INSERT IGNORE..SELECT DISTINCT, then the INSERT would preallocate auto_increment ids for as many rows as the SELECT provides. This is very likely to "burn" a lot of ids, thereby leading to overflowing MEDIUMINT unnecessarily. The LEFT JOIN leads to finding just the new ids that are needed (except for the rare possibility of a 'simultaneous' insert by another thread). More rationale:

    SQL #2:
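A matching sketch that back-fills the ids into Staging:

UPDATE Staging AS s
  JOIN Hosts AS h USING (host_name)
  SET s.host_id = h.host_id;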

    This gets the IDs, whether already existing, set by another thread, or set by SQL #1.

    If the size of Staging changes depending on the busy versus idle times of the day, this pair of SQL statements has another comforting feature. The more rows in Staging, the more efficient the SQL runs, thereby helping compensate for the "busy" times.

    The companion folds SQL #2 into the INSERT INTO Fact. But you may need host_id for further normalization steps and/or Summarization steps, so this explicit UPDATE shown here is often better.

    Flip-flop staging

    The simple way to stage is to ingest for a while, then batch-process what is in Staging. But that leads to new records piling up waiting to be staged. To avoid that issue, have 2 processes:

    • one process (or set of processes) for INSERTing into Staging;

    • one process (or set of processes) to do the batch processing (normalization, summarization).

    To keep the processes from stepping on each other, we have a pair of staging tables:

    • Staging is being INSERTed into;

    • StageProcess is one being processed for normalization, summarization, and moving to the Fact table. A separate process does the processing, then swaps the tables:
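A sketch of the swap (using the two table names above):

DROP TABLE StageProcess;
CREATE TABLE StageProcess LIKE Staging;
RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;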

    This may not seem like the shortest way to do it, but has these features:

    • The DROP + CREATE might be faster than TRUNCATE, which is the desired effect.

    • The RENAME is atomic, so the INSERT process(es) never find that Staging is missing.

    A variant on the 2-table flip-flop is to have a separate Staging table for each Insertion process. The Processing process would run around to each Staging in turn.

    A variant on that would be to have a separate processing process for each Insertion process.

    The choice depends on which is faster (insertion or processing). There are tradeoffs; a single processing thread avoids some locks, but lacks some parallelism.

    Engine choice

Fact table -- InnoDB, if for no other reason than that a system crash would not need a REPAIR TABLE. (REPAIRing a billion-row table can take hours or days.)

    Normalization tables -- InnoDB, primarily because it can be done efficiently with 2 indexes, whereas, MyISAM would need 4 to achieve the same efficiency.

    Staging -- Lots of options here.

    • If you have multiple Inserters and a single Staging table, InnoDB is desirable due to row-level, not table-level, locking.

    • MEMORY may be the fastest and it avoids I/O. This is good for a single staging table.

    • For multiple Inserters, a separate Staging table for each Inserter is desired.

• For multiple Inserters into a single Staging table, InnoDB may be faster. (MEMORY does table-level locking.)

• With one non-InnoDB Staging table per Inserter, using an explicit LOCK TABLE avoids repeated implicit locks on each INSERT.

• But, if you are doing LOCK TABLE and the Processing thread is separate, an UNLOCK is necessary periodically to let the RENAME grab the table.

• "Batch INSERTs" (100-1000 rows per SQL) eliminates much of the issues of the above bullet items.

    Confused? Lost? There are enough variations in applications that make it impractical to predict what is best. Or, simply good enough. Your ingestion rate may be low enough that you don't hit the brick walls that I am helping you avoid.

    Should you do "CREATE TEMPORARY TABLE"? Probably not. Consider Staging as part of the data flow, not to be DROPped.

    Summarization

    This is mostly covered here: Summarize from the Staging table instead of the Fact table.

    Replication Issues

    Row Based Replication (RBR) is probably the best option.

    The following allows you to keep more of the Ingestion process in the Master, thereby not bogging down the Slave(s) with writes to the Staging table.

    • RBR

    • Staging is in a separate database

    • That database is not replicated (binlog-ignore-db on Master)

    • In the Processing steps, USE that database, reach into the main db via syntax like "MainDb.Hosts". (Otherwise, the binlog-ignore-db does the wrong thing.)

    That way

    • Writes to Staging are not replicated.

    • Normalization sends only the few updates to the normalization tables.

    • Summarization sends only the updates to the summary tables.

    • Flip-flop does not replicate the DROP, CREATE or RENAME.

    Sharding

You could possibly spread the data you are trying to ingest across multiple machines in a predictable way (sharding on hash, range, etc). Running "reports" on a sharded Fact table is a challenge unto itself. On the other hand, Summary Tables rarely get too big to manage on a single machine.

    For now, Sharding is beyond the scope of this blog.

    Push me vs pull me

    I have implicitly assumed the data is being pushed into the database. If, instead, you are "pulling" data from some source(s), then there are some different considerations.

    Case 1: An hourly upload; run via cron

    1. Grab the upload, parse it

    2. Put it into the Staging table

    3. Normalize -- each SQL in its own transaction (autocommit)

4. BEGIN

5. Summarize

6. Copy from Staging to Fact.

7. COMMIT

    If you need parallelism in Summarization, you will have to sacrifice the transactional integrity of steps 4-7.

    Caution: If these steps add up to more than an hour, you are in deep dodo.

    Case 2: You are polling for the data

It is probably reasonable to have multiple processes doing this, so the steps below are explicit about locking.

    1. Create a Staging table for this polling processor. Loop:

    2. With some locked mechanism, decide which 'thing' to poll.

    3. Poll for the data, pull it in, parse it. (Potentially polling and parsing are significantly costly)

4. Put it into the process-specific Staging table

5. Normalize -- each SQL in its own transaction (autocommit)

6. BEGIN

7. Summarize

8. Copy from Staging to Fact.

9. COMMIT

10. Declare that you are finished with this 'thing' (see step 1). EndLoop.

innodb_log_file_size should be larger than the change in the STATUS "Innodb_os_log_written" across the BEGIN...COMMIT transaction (for either Case).

    See also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Pagination Optimization

    The Desire

    You have a website with news articles, or a blog, or some other thing with a list of things that might be too long for a single page. So, you decide to break it into chunks of, say, 10 items and provide a [Next] button to go the next "page".

    You spot OFFSET and LIMIT in MariaDB and decide that is the obvious way to do it.

Note that the problem requirement needs a [Next] link on each page so that the user can 'page' through the data. He does not really need "GoTo Page #". Jumping to the [First] or [Last] page may be useful.

    The Problem

    All is well -- until you have 50,000 items in a list. And someone tries to walk through all 5000 pages. That 'someone' could be a search engine crawler.

    Where's the problem? Performance. Your web page is doing "SELECT ... OFFSET 49990 LIMIT 10" (or the equivalent "LIMIT 49990,10"). MariaDB has to find all 50,000 rows, step over the first 49,990, then deliver the 10 for that distant page.

    If it is a crawler ('spider') that read all the pages, then it actually touched about 125,000,000 items to read all 5,000 pages.

    Reading the entire table, just to get a distant page, can be so much I/O that it can cause timeouts on the web page. Or it can interfere with other activity, causing other things to be slow.

    Other Bugs

    In addition to a performance problem, ...

    • If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.

    • The pages are not easily bookmarked or sent to someone else because the contents shift over time.

• The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!

    What to Do?

    Hardware? No, that's just a bandaid. The data will continue to grow and even the new hardware won't handle it.

    Better INDEX? No. You must get away from reading the entire table to get the 5000th page.

    Build another table saying where the pages start? Get real! That would be a maintenance nightmare, and expensive.

    Bottom line: Don't use OFFSET; instead remember where you "left off".

    With INDEX(id), this suddenly becomes very efficient.

    Implementation -- Getting Rid of OFFSET

You are probably doing this now: ORDER BY datetime DESC LIMIT 49990,10. You probably have some unique id on the table. This can probably be used for "left off".

    Currently, the [Next] button probably has a url something like ?topic=xyz&page=4999&limit=10 The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed. The product of page*limit gives the OFFSET. (The "limit=10" might be in the url, or might be hard-coded; this choice is not relevant to this discussion.)

    The new variant would be ?topic=xyz&id=12345&limit=10. (Note: the 12345 is not computable from 4999.) By using INDEX(topic, id) you can efficiently say
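A sketch of the query behind that URL (the table name items is an assumption):

SELECT *
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 10;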

    That will hit only 10 rows. This is a huge improvement for later pages. Now for more details.

    Implementation -- "Left Off"

    What if there are exactly 10 rows left when you display the current page? It would make the UI nice if you grayed out the [Next] button, wouldn't it. (Or you could suppress the button all together.)

    How to do that? Instead of LIMIT 10, use LIMIT 11. That will give you the 10 items needed for the current page, plus an indication of whether there is another page. And the id for that page.

So, take the 11th id for the [Next] button: <a href=?topic=xyz&id=$id11&limit=10>Next</a>

    Implementation -- Links Beyond [Next]

    Let's extend the 11 trick to also find the next 5 pages and build links for them.

    Plan A is to say LIMIT 51. If you are on page 12, that would give you links for pages 13 (using 11th id) through pages 17 (51st).

    Plan B is to do two queries, one to get the 10 items for the current page, the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.

    Which plan to pick? It depends on many things, so benchmark.

    A Reasonable Set of Links

    Reaching forward and backward by 5 pages is not too much work. It would take two separate queries to find the ids in both directions. Also, having links that take you to the First and Last pages would be easy to do. No id is needed; they can be something like

    The UI would recognize those, then generate a SELECT with something like

    The last items would be delivered in reverse order. Either deal with that in the UI, or make the SELECT more complex:
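A sketch of both variants, assuming INDEX(topic, id):

-- Simple: returns the last 10 items, newest first
SELECT *
  FROM items
  WHERE topic = 'xyz'
  ORDER BY id DESC
  LIMIT 10;

-- More complex: re-sorts those 10 back into ascending order
SELECT i.*
  FROM ( SELECT id
           FROM items
           WHERE topic = 'xyz'
           ORDER BY id DESC
           LIMIT 10 ) AS last10
  JOIN items AS i USING (id)
  ORDER BY i.id;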

    Let's say you are on page 12 of lots of pages. It could show these links:

    where the ellipsis is really used. Some end cases:

    Why it Works

    The goal is to touch only the relevant rows, not all the rows leading up to the desired rows. This is nicely achieved, except for building links to the "next 5 pages". That may (or may not) be efficiently resolved by the simple SELECT id, discussed above. The reason that may not be efficient deals with the WHERE clause.

    Let's discuss the optimal and suboptimal indexes.

    For this discussion, I am assuming

    • The datetime field might have duplicates -- this can cause troubles

    • The id field is unique

    • The id field is close enough to datetime-ordered to be used instead of datetime.

    Very efficient -- it does all the work in the index:
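A sketch, assuming INDEX(topic, id). The first query is fully covered by the index; the second (SELECT *) must also fetch the data rows:

SELECT id
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 51;

SELECT *
  FROM items
  WHERE topic = 'xyz'
    AND id >= 12345
  ORDER BY id
  LIMIT 51;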

    That will hit at least 51 consecutive index entries, plus at least 51 randomly located data rows.

    Efficient -- back to the previous degree of efficiency:

    Note how all the '=' parts of the WHERE come first; then comes both the '>=' and 'ORDER BY', both on id. This means that the INDEX can be used for all the WHERE, plus the ORDER BY.

    "Items 11-20 Out of 12345"

    You lose the "out of" except when the count is small. Instead, say something like

    Alternatively... Only a few searches will have too many items to count. Keep another table with the search criteria and a count. This count can be computed daily (or hourly) by some background script. When discovering that the topic is a busy one, look it up in the table to get

    The background script would round the count off.

    The quick way to get an estimated number of rows for an InnoDB table is
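A sketch (schema and table names are placeholders):

SELECT TABLE_ROWS
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = 'dbname'
    AND TABLE_NAME = 'items';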

    However, it does not allow for the WHERE clause that you probably have.

    Complex WHERE, or JOIN

If the search criteria cannot be confined to an INDEX in a single table, this technique is doomed. I have another paper that discusses "Lists", which solves that (with extra development work), and even improves on what is discussed here.

    How Much Faster?

    This depends on

    • How many rows (total)

    • Whether the WHERE clause prevented the efficient use of the ORDER BY

• Whether the data is bigger than the cache. This last one kicks in when building one page requires reading more data from disk than can be cached. At that point, the problem goes from being CPU-bound to being I/O-bound. This is likely to suddenly slow down the loading of a page by a factor of 10.

    What is Lost

    • Cannot "jump to Page N", for an arbitrary N. Why do you want to do that?

    • Walking backward from the end does not know the page numbers.

    • The code is more complex.

    Postlog

    Designed about 2007; posted 2012.

    See Also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    How to Quickly Insert Data Into MariaDB

    This article describes different techniques for inserting data quickly into MariaDB.

    Background

    When inserting new data into MariaDB, the things that take time are (in order of importance):

    • Syncing data to disk (as part of the end of transactions)

    • Adding new keys. The larger the index, the more time it takes to keep keys updated.

    • Checking against foreign keys (if they exist).

    • Adding rows to the storage engine.

    • Sending data to the server.

    The following describes the different techniques (again, in order of importance) you can use to quickly insert data into a table.

    Disabling Keys

    You can temporarily disable updating of non-unique indexes. This is mostly useful when there are zero (or very few) rows in the table into which you are inserting data.

In many storage engines (at least MyISAM and Aria), ENABLE KEYS works by scanning through the row data and collecting keys, sorting them and then creating the index blocks. This is an order of magnitude faster than creating the index one row at a time and it also uses less key buffer memory.

    Note: When you insert into an empty table with or , MariaDB automatically does a before and an afterwards.

    When inserting big amounts of data, integrity checks are sensibly time-consuming. It is possible to disable the UNIQUE indexes and the checks using the and the system variables:
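For example, assuming the variables referred to are unique_checks and foreign_key_checks:

SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;
-- ... insert the data ...
SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;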

    For InnoDB tables, the can be temporarily set to 2, which is the fastest setting:
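For example:

SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- ... load the data ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;   -- restore the durable default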

    Also, if the table has or columns, you may want to drop them, insert all data, and recreate them.

    Loading Text Files

    The fastest way to insert data into MariaDB is through the command.

    The simplest form of the command is:
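A sketch (file and table names are placeholders):

LOAD DATA INFILE '/tmp/data.txt' INTO TABLE tbl_name;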

    You can also read a file locally on the machine where the client is running by using:
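For example (again with placeholder names):

LOAD DATA LOCAL INFILE '/tmp/data.txt' INTO TABLE tbl_name;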

    This is not as fast as reading the file on the server side, but the difference is not that big.

    LOAD DATA INFILE is very fast because:

    1. there is no parsing of SQL.

    2. data is read in big blocks.

    3. if the table is empty at the beginning of the operation, all non-unique indexes are disabled during the operation.

    4. the engine is told to cache rows first and then insert them in big blocks (At least MyISAM and Aria support this).

    Because of the above speed advantages there are many cases, when you need to insert many rows at a time, where it may be faster to create a file locally, add the rows there, and then use LOAD DATA INFILE to load them; compared to using INSERT to insert the rows.

You will also get progress reporting for LOAD DATA INFILE.

    mariadb-import

    You can import many files in parallel with (mysqlimport before ). For example:
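A sketch (database name, file names and thread count are placeholders; each file is loaded into the table named after the file):

mariadb-import --use-threads=10 -u root -p db_name /tmp/tbl1.txt /tmp/tbl2.txt /tmp/tbl3.txt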

Internally, mariadb-import uses LOAD DATA INFILE to read in the data.

    Inserting Data with INSERT Statements

    Using Big Transactions

When doing many inserts in a row, you should wrap them with BEGIN / COMMIT to avoid doing a full transaction (which includes a disk sync) for every row. For example, committing every 1000 inserts will speed up your inserts by almost 1000 times.
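For example (placeholder table and values):

BEGIN;
INSERT INTO tbl_name VALUES (1, 'a');
INSERT INTO tbl_name VALUES (2, 'b');
-- ... up to roughly 1000 inserts ...
COMMIT;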

The reason you may want many smaller BEGIN / COMMIT blocks instead of just one huge transaction is that the former uses up less transaction log space.

    Multi-Value Inserts

    You can insert many rows at once with multi-value row inserts:
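For example (placeholder table and values):

INSERT INTO tbl_name (a, b) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');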

    The limit for how much data you can have in one statement is controlled by the server variable.

    Inserting Data Into Several Tables at Once

    If you need to insert data into several tables at once, the best way to do so is to enable multi-row statements and send many inserts to the server at once:
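A sketch, assuming a parent/child pair of tables where the child row stores the parent's auto_increment id; the two statements are sent to the server as one batch:

INSERT INTO parent (name) VALUES ('item 1');
INSERT INTO child (parent_id, note) VALUES (LAST_INSERT_ID(), 'first note');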

LAST_INSERT_ID() is a function that returns the last auto_increment value inserted.

    By default, the command line mariadb client will send the above as multiple statements.

    To test this in the mariadb client you have to do:

    Note: for multi-query statements to work, your client must specify theCLIENT_MULTI_STATEMENTS flag to mysql_real_connect().

    Server Variables That Can be Used to Tune Insert Speed

    Option
    Description

    See for the full list of server variables.

    This page is licensed: CC BY-SA / Gnu FDL

    Lateral Derived Optimization

    MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

    Description

    The optimization's use case is

    • The query uses a derived table (or a VIEW, or a non-recursive CTE)

    • The derived table/View/CTE has a GROUP BY operation as its top-level operation

    • The query only needs data from a few GROUP BY groups

    An example of this: consider a VIEW that computes totals for each customer in October:
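A sketch of such a view (the underlying orders table and its columns are assumptions):

CREATE VIEW OCT_TOTALS AS
SELECT customer_id,
       SUM(amount) AS total_amt
  FROM orders
  WHERE order_date BETWEEN '2023-10-01' AND '2023-10-31'
  GROUP BY customer_id;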

    And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:
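And a sketch of the join (customer names are illustrative):

SELECT customer.name, OCT_TOTALS.total_amt
  FROM customer
  JOIN OCT_TOTALS USING (customer_id)
  WHERE customer.name IN ('Customer#1', 'Customer#2');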

    Before Lateral Derived optimization, MariaDB would execute the query as follows:

    1. Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.

    2. Join it with table customer.

    The EXPLAIN would look like so:

    It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)

    Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.

    The query plan will look as follows:

    1. Scan table customer and find customer_id for Customer#1 and Customer#2.

    2. For each customer_id, compute the October totals, for this specific customer.

    The EXPLAIN output will look like so:

    Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

    In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

    Note the "lateral": 1 member.

    Controlling the Optimization

    Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.

If you need to disable the optimization, it has an optimizer_switch flag. It can be disabled like so:
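For example (the flag is assumed here to be named split_materialized):

SET optimizer_switch = 'split_materialized=off';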

From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint, SPLIT_MATERIALIZED or NO_SPLIT_MATERIALIZED.

    For example, by default, this table and query makes use of the optimization:

    CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;

    INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');

    INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;

    ANALYZE TABLE t1;

EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

The optimization can be disabled with the hint as follows:

EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

(Before MariaDB 12.1, no optimizer hint is available.)

    References

    • Jira task:

    • Commit:

    This page is licensed: CC BY-SA / Gnu FDL

    Entity-Attribute-Value Implementation

    The desires

    • Open-ended set of "attributes" (key=value) for each "entity". That is, the list of attributes is not known at development time, and will grow in the future. (This makes one column per attribute impractical.)

    • "ad hoc" queries testing attributes.

    • Attribute values come in different types (numbers, strings, dates, etc.)

    • Scale to lots of entities, yet perform well.

    It goes by various names

    • EAV -- Entity - Attribute - Value

    • key-value

    • RDF -- This is a flavor of EAV

    • MariaDB has dynamic columns that look something like the solution below, with the added advantage of being able to index the columns otherwise hidden in the blob. (There are caveats.)

    Bad solution

    • Table with 3 columns: entity_id, key, value

    • The "value" is a string, or maybe multiple columns depending on datatype or other kludges.

    • a JOIN b ON a.entity=b.entity AND b.key='x' JOIN c ON ... WHERE a.value=... AND b.value=...

    The problems

    • The SELECTs get messy -- multiple JOINs

    • Datatype issues -- It's clumsy to be putting numbers into strings

• Numbers stored in VARCHARs do not compare 'correctly', especially for range tests.

    • Bulky.

    A solution

    Decide which columns need to be searched/sorted by SQL queries. No, you don't need all the columns to be searchable or sortable. Certain columns are frequently used for selection; identify these. You probably won't use all of them in all queries, but you will use some of them in every query.

The solution uses one table for all the EAV stuff. The columns include the searchable fields plus one BLOB. Searchable fields are declared appropriately (INT, TIMESTAMP, etc.). The BLOB contains a JSON encoding of all the extra fields.

The table should be InnoDB, hence it should have a PRIMARY KEY. The entity_id is the 'natural' PK. Add a small number of other indexes (often 'composite') on the searchable fields. PARTITIONing is unlikely to be of any use, unless the Entities should be purged after some time. (Example: News Articles)
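A sketch of such a table (column names and types are assumptions):

CREATE TABLE entities (
  entity_id INT UNSIGNED NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,   -- searchable
  category SMALLINT UNSIGNED NOT NULL,                    -- searchable
  attrs MEDIUMBLOB NOT NULL,     -- COMPRESSed JSON holding everything else
  PRIMARY KEY (entity_id),
  KEY (category, created)        -- composite index on searchable fields
) ENGINE = InnoDB;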

    But what about the ad hoc queries?

    You have included the most important fields to search on -- date, category, etc. These should filter the data down significantly. When you also need to filter on something more obscure, that will be handled differently. The application code will look at the BLOB for that; more on this later.

    Why it works

    • You are not really going to search on more than a few fields.

    • The disk footprint is smaller; Smaller --> More cacheable --> Faster

    • It needs no JOINs

    • The indexes are useful

    Details on the BLOB/JSON

• Build the extra (or all) key-value pairs in a hash (associative array) in your application. Encode it. COMPRESS it. Insert that string into the BLOB.

    • JSON is recommended, but not mandatory; it is simpler than XML. Other serializations (eg, YAML) could be used.

• COMPRESS the JSON and put it into a BLOB (or MEDIUMBLOB) instead of a TEXT field. Compression gives about 3x shrinkage.

    • When SELECTing, UNCOMPRESS the blob. Decode the string into a hash. You are now ready to interrogate/display any of the extra fields.

    Conclusions

    • Schema is reasonably compact (compression, real datatypes, less redundancy, etc, than EAV)

    • Queries are fast (since you have picked 'good' indexes)

    • Expandable (JSON is happy to have new fields)

    • Compatible (No 3rd party products, just supported products)

    Postlog

    Posted Jan, 2014; Refreshed Feb, 2016.

    • MariaDB's

    This looks very promising; I will need to do more research to see how much of this article is obviated by it: ,

If you insist on EAV, set optimizer_search_depth=1.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Multi Range Read Optimization

    Multi Range Read is an optimization aimed at improving performance for IO-bound queries that need to scan lots of rows.

    Multi Range Read can be used with

    • range access

    • ref and eq_ref access, when they are using

    as shown in this diagram:

    Compound (Composite) Indexes

    A mini-lesson in "compound indexes" ("composite indexes")

    This document starts out trivial and perhaps boring, but builds up to more interesting information, perhaps things you did not realize about how MariaDB and MySQL indexing works.

    This also explains (to some extent).

    (Most of this applies to other databases, too.)

    SELECT  *
            FROM  items
            WHERE  messy_filtering
            ORDER BY  date DESC
            OFFSET  $M  LIMIT $N

    MySQL 5.7 Has JSON datatype, plus functions to access parts

  • MongoDB, CouchDB -- and others -- Not SQL-based.

  • Dedupping the values is clumsy.

    The one table has one row per entity, and can grow as needed. (EAV needs many rows per entity.)
  • Performance is as good as the indexes you have on the 'searchable fields'.

  • Optionally, you can duplicate the indexed fields in the BLOB.

  • Values missing from 'searchable fields' would need to be NULL (or whatever), and the code would need to deal with such.

  • If you choose to use the JSON features of MariaDB or 5.7, you will have to forgo the compression feature described.

  • MySQL 5.7.8's JSON native JSON datatype uses a binary format for more efficient access.

  • Range tests work (unlike storing INTs in VARCHARs)

  • (Drawback) Cannot use the non-indexed attributes in WHERE or ORDER BY clauses, must deal with that in the app. (MySQL 5.7 partially alleviates this.)

  • VARCHAR
    BLOB
    INT
    TIMESTAMP
    InnoDB
    PARTITIONing
    BLOB
    BLOB
    MEDIUMBLOB
    TEXT
    Dynamic Columns
    MySQL 5.7's JSON
    Using MySQL as a Document Store in 5.7
    more DocStore discussion
    optimizer_search_depth=1
    Rick James' site
    eav
Stopwords are a list of common words such as "once" or "then" that are not reflected in the search results unless IN BOOLEAN MODE is used. The stopword list for MyISAM/Aria tables and InnoDB tables can differ. See for details and a full list, as well as for details on how to change the default list.
  • For MyISAM/Aria fulltext indexes only, if a word appears in more than half the rows, it is also excluded from the results of a fulltext search.

  • For InnoDB indexes, only committed rows appear - modifications from the current transaction do not apply.


    CREATE TABLE
    ALTER TABLE
    CREATE INDEX
    MATCH() ... AGAINST
    ft_min_word_length
    InnoDB
    innodb_ft_min_token_size
    ft_max_word_length
    InnoDB
    innodb_ft_max_token_size
    stopword
    LIKE
    stopwords


    The optimization can be disabled as follows:
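Presumably this is the optimizer_switch setting used later in this section:

    SET optimizer_switch = 'split_materialized=off';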

    EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G

    No optimizer hint is available.

    MATCH (col1,col2,...) AGAINST (expr [search_modifier])
    CREATE TABLE ft_myisam(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a time'),
      ('There was a wicked witch'), ('Who ate everybody up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    +--------------------------+
    | copy                     |
    +--------------------------+
    | There was a wicked witch |
    +--------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked,witch');
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    +---------------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('Once');
    Empty set (0.00 sec)
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a wicked time'),
      ('There was a wicked wicked witch'), ('Who ate everybody wicked up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    Empty set (0.00 sec)
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    | Once upon a wicked time         |
    | There was a wicked wicked witch |
    | Who ate everybody wicked up     |
    +---------------------------------+
    SELECT copy,MATCH(copy) AGAINST('witch') AS relevance 
      FROM ft_myisam WHERE MATCH(copy) AGAINST('witch');
    +---------------------------------+--------------------+
    | copy                            | relevance          |
    +---------------------------------+--------------------+
    | There was a wicked witch        | 0.6775632500648499 |
    | There was a wicked wicked witch | 0.5031757950782776 |
    +---------------------------------+--------------------+
    CREATE TABLE ft2(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft2(copy) VALUES
     ('MySQL vs MariaDB database'),
     ('Oracle vs MariaDB database'), 
     ('PostgreSQL vs MariaDB database'),
     ('MariaDB overview'),
     ('Foreign keys'),
     ('Primary keys'),
     ('Indexes'),
     ('Transactions'),
     ('Triggers');
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database');
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    +--------------------------------+
    3 rows in set (0.00 sec)
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database' WITH QUERY EXPANSION);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    4 rows in set (0.00 sec)
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('Maria*' IN BOOLEAN MODE);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('+MariaDB -database' 
      IN BOOLEAN MODE);
    +------------------+
    | copy             |
    +------------------+
    | MariaDB overview |
    +------------------+
    CREATE TABLE Hosts (
        host_id  MEDIUMINT UNSIGNED  NOT NULL AUTO_INCREMENT,
        host_name VARCHAR(99) NOT NULL,
        PRIMARY KEY (host_id),      -- for mapping one direction
        INDEX(host_name, host_id)   -- for mapping the other direction
    ) ENGINE=InnoDB;                -- InnoDB works best for Many:Many mapping table
    host_name VARCHAR(99) NOT NULL,     -- Comes from the insertion process
        host_id  MEDIUMINT UNSIGNED  NULL,  -- NULL to start with; see code below
    host_id  MEDIUMINT UNSIGNED NOT NULL,
    # This should not be in the main transaction, and it should be done with autocommit = ON
        # In fact, it could lead to strange errors if this were part
        #    of the main transaction and it ROLLBACKed.
        INSERT IGNORE INTO Hosts (host_name)
            SELECT DISTINCT s.host_name
                FROM Staging AS s
                LEFT JOIN Hosts AS n  ON n.host_name = s.host_name
                WHERE n.host_id IS NULL;
    # Also not in the main transaction, and it should be with autocommit = ON
        # This multi-table UPDATE sets the ids in Staging:
        UPDATE   Hosts AS n
        JOIN Staging AS s  ON s.host_name = n.host_name
            SET s.host_id = n.host_id
    DROP   TABLE StageProcess;
        CREATE TABLE StageProcess LIKE Staging;
        RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;
    # First page (latest 10 items):
        SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
    # Next page (second 10):
        SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10
    WHERE topic = 'xyz'
          AND id >= 1234
        ORDER BY id
        LIMIT 10
    <a href=?topic=xyz&id=FIRST&limit=10>First</a>
        <a href=?topic=xyz&id=LAST&limit=10>Last</a>
    WHERE topic = 'xyz'
        ORDER BY id ASC -- ASC for First; DESC for Last
        LIMIT 10
    ( SELECT ...
            WHERE topic = 'xyz'
            ORDER BY id DESC
            LIMIT 10
        ) ORDER BY id ASC
    [First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]
    # Page one of three:
        First [2] [3]
    # Page one of many:
        First [2] [3] [4] [5] ... [Last]
    # Page two of many:
        [First] 2 [3] [4] [5] ... [Last]
    # If you jump to the Last page, you don't know what page number it is.
    # So, the best you can do is perhaps:
    #    [First] ... [Prev] Last
    INDEX(topic, id)
        WHERE topic = 'xyz'
          AND id >= 876
        ORDER BY id ASC
        LIMIT 10,41
    That will hit 51 consecutive index entries, 0 data rows.
    
    Inefficient -- it must reach into the data:
        INDEX(topic, id)
        WHERE topic = 'xyz'
          AND id >= 876
          AND is_deleted = 0
        ORDER BY id ASC
        LIMIT 10,41
    INDEX(topic, is_deleted, id)
        WHERE topic = 'xyz'
          AND id >= 876
          AND is_deleted = 0
        ORDER BY id ASC
        LIMIT 10,41
    Items 11-20 out of Many
    Items 11-20 out of about 49,000
    SELECT  table_rows
            FROM  information_schema.TABLES
            WHERE  TABLE_SCHEMA = 'database_name'
              AND  TABLE_NAME = 'table_name'
    CREATE VIEW OCT_TOTALS AS
    SELECT
      customer_id,
      SUM(amount) AS TOTAL_AMT
    FROM orders
    WHERE
      order_date BETWEEN '2017-10-01' AND '2017-10-31'
    GROUP BY
      customer_id;
    SELECT *
    FROM
      customer, OCT_TOTALS
    WHERE
      customer.customer_id=OCT_TOTALS.customer_id AND
      customer.customer_name IN ('Customer#1', 'Customer#2')
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    | id   | select_type | table      | type  | possible_keys | key       | key_len | ref                       | rows  | Extra                    |
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    |    1 | PRIMARY     | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2     | Using where; Using index |
    |    1 | PRIMARY     | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 36    |                          |
    |    2 | DERIVED     | orders     | index | NULL          | o_cust_id | 4       | NULL                      | 36738 | Using where              |
    +------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    | id   | select_type     | table      | type  | possible_keys | key       | key_len | ref                       | rows | Extra                    |
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    |    1 | PRIMARY         | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2    | Using where; Using index |
    |    1 | PRIMARY         | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 2    |                          |
    |    2 | LATERAL DERIVED | orders     | ref   | o_cust_id     | o_cust_id | 4       | test.customer.customer_id | 1    | Using where              |
    +------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
    ...
            "table": {
              "table_name": "<derived2>",
              "access_type": "ref",
    ...
              "materialized": {
                "lateral": 1,
    SET optimizer_switch='split_materialized=off'
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: <derived2>
             type: ref
    possible_keys: key0
              key: key0
          key_len: 8
              ref: test.t1.n1,test.t1.n2
             rows: 1
            Extra: 
    *************************** 3. row ***************************
               id: 2
      select_type: LATERAL DERIVED
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: n1_c1_n2
          key_len: 4
              ref: test.t1.n1
             rows: 1
            Extra: Using where; Using index
    
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: t1
             type: ref
    possible_keys: c1,n1_c1_n2
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: <derived2>
             type: ref
    possible_keys: key0
              key: key0
          key_len: 8
              ref: test.t1.n1,test.t1.n2
             rows: 1
            Extra: 
    *************************** 3. row ***************************
               id: 2
      select_type: DERIVED
            table: t1
             type: ref
    possible_keys: c1
              key: c1
          key_len: 1
              ref: const
             rows: 2
            Extra: Using index condition; Using where; Using temporary; Using filesort

    Trying to drop a table that is referenced by a foreign key produces a 1217 error (SQLSTATE '23000').

  • A TRUNCATE TABLE against a table containing one or more foreign keys is executed as a DELETE without WHERE, so that the foreign keys are enforced for each row.

  • CASCADE
    : The change is allowed and propagates to the child table. For example, if a parent row is deleted, the child row is also deleted; if a parent row's ID changes, the child row's ID will also change.
  • SET NULL: The change is allowed, and the child row's foreign key columns are set to NULL.

  • SET DEFAULT: Only worked with PBXT. Similar to SET NULL, but the foreign key columns were set to their default values. If default values do not exist, an error is produced.

  • If ON UPDATE CASCADE recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT.

  • Indexed generated columns (both VIRTUAL and PERSISTENT) are not supported as InnoDB foreign key indexes.

  • Prior to MariaDB 12.1, foreign key names are required to be unique per database. From MariaDB 12.1, foreign key names are only required to be unique per table.


    innodb_defragment_fill_factor: Indicates how full defragmentation should fill a page.

  • innodb_defragment_frequency: Maximum times per second for defragmenting a single index.


    For empty tables, some transactional engines (like Aria) do not log the inserted data in the transaction log, because one can roll back the operation by just doing a TRUNCATE on the table.

  • innodb_buffer_pool_size: Increase this if you have many indexes in InnoDB/XtraDB tables.

  • key_buffer_size: Increase this if you have many indexes in MyISAM tables.

  • max_allowed_packet: Increase this to allow bigger multi-insert statements.

  • read_buffer_size: Read block size when reading a file with LOAD DATA.


    The Idea

    Case 1: Rowid Sorting for Range Access

    Consider a range query:
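For illustration, assume a hypothetical table tbl with an index on key1; a representative range query would be:

    SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;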

    When this query is executed, disk IO access pattern will follow the red line in this figure:

    no-mrr-access-pattern

    Execution will hit the table rows in random places, as marked with the blue line/numbers in the figure.

    When the table is sufficiently big, each table record read will need to actually go to disk (and not be served from the buffer pool or OS cache), and query execution will be too slow to be practical. For example, a 10,000 RPM disk drive is able to make 167 seeks per second, so in the worst case, query execution will be capped at reading about 167 records per second.

    SSD drives do not need to do disk seeks, so they will not be hurt as badly, however the performance will still be poor in many cases.

    Multi-Range-Read optimization aims to make disk access faster by sorting record read requests and then doing one ordered disk sweep. If one enables Multi Range Read, EXPLAIN will show that a "Rowid-ordered scan" is used:

    and the execution will proceed as follows:

    mrr-access-pattern

    Reading disk data sequentially is generally faster, because

    • Rotating drives do not have to move the head back and forth

    • One can take advantage of IO-prefetching done at various levels

    • Each disk page will be read exactly once, which means we won't rely on disk cache (or buffer pool) to save us from reading the same page multiple times.

    The above can make a huge difference on performance. There is also a catch, though:

    • If you're scanning small data ranges in a table that is sufficiently small so that it completely fits into the OS disk cache, then you may observe that the only effect of MRR is that extra buffering/sorting adds some CPU overhead.

    • LIMIT n and ORDER BY ... LIMIT n queries with small values of n may become slower. The reason is that MRR reads data in disk order, while ORDER BY ... LIMIT n wants first n records in index order.

    Case 2: Rowid Sorting for Batched Key Access

    Batched Key Access can benefit from rowid sorting in the same way as range access does. If one has a join that uses index lookups:
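Based on the next paragraph (lookups through t2.key1 = t1.col1), the join presumably has this shape, with t1 and t2 as illustrative table names:

    SELECT * FROM t1, t2 WHERE t2.key1 = t1.col1;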

    Execution of this query will cause table t2 to be hit in random locations by lookups made through t2.key1=t1.col. If you enable Multi Range and Batched Key Access, you will get table t2 to be accessed using a Rowid-ordered scan:
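As a hedged sketch, one way to enable both optimizations in MariaDB is via the optimizer_switch flags described later on this page, together with a join_cache_level that permits Batched Key Access:

    SET optimizer_switch = 'mrr=on,mrr_sort_keys=on';
    SET join_cache_level = 6;   -- levels 5 and above allow BKA join buffers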

    The benefits will be similar to those listed for range access.

    An additional source of speedup is this property: if there are multiple records in t1 that have the same value of t1.col1, then regular Nested-Loops join will make multiple index lookups for the same value of t2.key1=t1.col1. The lookups may or may not hit the cache, depending on how big the join is. With Batched Key Access and Multi-Range Read, no duplicate index lookups will be made.

    Case 3: Key Sorting for Batched Key Access

    Let us consider again the nested loop join example, with ref access on the second table:

    Execution of this query plan will cause random hits to be made into the index t2.key1, as shown in this picture:

    key-sorting-regular-nl-join

    In particular, on step #5 we'll read the same index page that we've read on step #2, and the page we've read on step #4 will be re-read on step#6. If all pages you're accessing are in the cache (in the buffer pool, if you're using InnoDB, and in the key cache, if you're using MyISAM), this is not a problem. However, if your hit ratio is poor and you're going to hit the disk, it makes sense to sort the lookup keys, like shown in this figure:

    key-sorting-join

    This is roughly what Key-ordered scan optimization does. In EXPLAIN, it looks as follows:

    ((TODO: a note about why sweep-read over InnoDB's clustered primary index scan (which is, actually the whole InnoDB table itself) will use Key-ordered scan algorithm, but not Rowid-ordered scan algorithm, even though conceptually they are the same thing in this case))

    Buffer Space Management

    As was shown above, Multi Range Read requires sort buffers to operate. The size of the buffers is limited by system variables. If MRR has to process more data than it can fit into its buffer, it will break the scan into multiple passes. The more passes are made, the less is the speedup though, so one needs to balance between having too big buffers (which consume lots of memory) and too small buffers (which limit the possible speedup).

    Range Access

    When MRR is used for range access, the size of its buffer is controlled by the mrr_buffer_size system variable. Its value specifies how much space can be used for each table. For example, if there is a query which is a 10-way join and MRR is used for each table, 10*@@mrr_buffer_size bytes may be used.

    Batched Key Access

    When Multi Range Read is used by Batched Key Access, then buffer space is managed by BKA code, which will automatically provide a part of its buffer space to MRR. You can control the amount of space used by BKA by setting

    • join_buffer_size to limit how much memory BKA uses for each table, and

    • join_buffer_space_limit to limit the total amount of memory used by BKA in the join.

    Status Variables

    There are three status variables related to Multi Range Read:

    • Handler_mrr_init: Counts how many Multi Range Read scans were performed

    • Handler_mrr_key_refills: Number of times key buffer was refilled (not counting the initial fill)

    • Handler_mrr_rowid_refills: Number of times rowid buffer was refilled (not counting the initial fill)

    Non-zero values of Handler_mrr_key_refills and/or Handler_mrr_rowid_refills mean that the Multi Range Read scan did not have enough memory and had to do multiple key/rowid sort-and-sweep passes. The greatest speedup is achieved when Multi Range Read runs everything in one pass. If you see lots of refills, it may be beneficial to increase the sizes of the relevant buffers: mrr_buffer_size, join_buffer_size, and join_buffer_space_limit.

    Effect on Other Status Variables

    When a Multi Range Read scan makes an index lookup (or some other "basic" operation), the counter of the "basic" operation, e.g. Handler_read_key, will also be incremented. This way, you can still see total number of index accesses, including those made by MRR. Per-user/table/index statistics counters also include the row reads made by Multi Range Read scans.

    Why Using Multi Range Read Can Cause Higher Values in Status Variables

    Multi Range Read is used for scans that do full record reads (i.e., they are not "Index only" scans). A regular non-index-only scan will read

    1. an index record, to get a rowid of the table record

    2. a table record

    Both actions will be done by making one call to the storage engine, so the effect of the call will be that the relevant Handler_read_XXX counter will be incremented BY ONE, and Innodb_rows_read will be incremented BY ONE.

    Multi Range Read will make separate calls for steps #1 and #2, causing TWO increments to Handler_read_XXX counters and TWO increments to Innodb_rows_read counter. To the uninformed, this looks as if Multi Range Read was making things worse. Actually, it doesn't - the query will still read the same index/table records, and actually Multi Range Read may give speedups because it reads data in disk order.

    Multi Range Read Factsheet

    • Multi Range Read is used by

      • range access method for range scans.

      • Batched Key Access for joins

    • Multi Range Read can cause slowdowns for small queries over small tables, so it is disabled by default.

    • There are two strategies:

      • Rowid-ordered scan

      • Key-ordered scan

    • You can tell if either of them is used by checking the Extra column in EXPLAIN output.

    • There are three flags you can switch ON:

      • mrr=on - enable MRR and rowid ordered scans

      • mrr_sort_keys=on - enable Key-ordered scans (you must also set mrr=on for this to have any effect)

      • mrr_cost_based=on - let the optimizer make a cost-based choice of whether to use MRR

    Differences from MySQL

    • MySQL supports only Rowid ordered scan strategy, which it shows in EXPLAIN as Using MRR.

    • EXPLAIN in MySQL shows Using MRR, while in MariaDB it may show

      • Rowid-ordered scan

      • Key-ordered scan

      • Key-ordered Rowid-ordered scan

    • MariaDB uses mrr_buffer_size as the limit of MRR buffer size for range access, while MySQL uses read_rnd_buffer_size.

    • MariaDB has three MRR counters: Handler_mrr_init, Handler_mrr_key_refills, and Handler_mrr_rowid_refills, while MySQL has only Handler_mrr_init, and it will only count MRR scans that were used by BKA. MRR scans used by range access are not counted.

    This page is licensed: CC BY-SA / Gnu FDL

    The query to discuss

    The question is "When was Andrew Johnson president of the US?".

    The available table Presidents looks like:

    ("Andrew Johnson" was picked for this lesson because of the duplicates.)

    What index(es) would be best for that question? More specifically, what would be best for the SELECT that answers it?
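Based on the columns discussed below (last_name, first_name, and the answer column term), the query presumably looks like:

    SELECT  term
        FROM  Presidents
        WHERE  last_name = 'Johnson'
          AND  first_name = 'Andrew';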

    Some INDEXes to try...

    • No indexes

    • INDEX(first_name), INDEX(last_name) (two separate indexes)

    • "Index Merge Intersect"

    • INDEX(last_name, first_name) (a "compound" index)

    • INDEX(last_name, first_name, term) (a "covering" index)

    • Variants

    No indexes

    Well, I am fudging a little here. I have a PRIMARY KEY on seq, but that has no advantage on the query we are studying.

    Implementation Details

    First, let's describe how InnoDB stores and uses indexes.

    • The data and the PRIMARY KEY are "clustered" together in one BTree.

    • A BTree lookup is quite fast and efficient. For a million-row table there might be 3 levels of BTree, and the top two levels are probably cached.

    • Each secondary index is in another BTree, with the PRIMARY KEY at the leaf.

    • Fetching 'consecutive' (according to the index) items from a BTree is very efficient because they are stored consecutively.

    • For the sake of simplicity, we can count each BTree lookup as 1 unit of work, and ignore scans for consecutive items. This approximates the number of disk hits for a large table in a busy system.

    For MyISAM, the PRIMARY KEY is not stored with the data, so think of it as being a secondary key (over-simplified).

    INDEX(first_name), INDEX(last_name)

    The novice, once he learns about indexing, decides to index lots of columns, one at a time. But...

    MariaDB rarely uses more than one index at a time in a query. So, it will analyze the possible indexes.

    • first_name -- there are 2 possible rows (one BTree lookup, then scan consecutively)

    • last_name -- there are 2 possible rows

    Let's say it picks last_name. Here are the steps for doing the SELECT:

    1. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'.

    2. Get the PRIMARY KEY (implicitly added to each secondary index in InnoDB); get (17, 36).

    3. Reach into the data using seq = (17, 36) to get the rows for Andrew Johnson and Lyndon B. Johnson.

    4. Use the rest of the WHERE clause to filter out all but the desired row.

    5. Deliver the answer (1865-1869).

    "Index Merge Intersect"

    OK, so you get really smart and decide that MariaDB should be smart enough to use both name indexes to get the answer. This is called "Intersect".

    1. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'; get (17, 36)

    2. Using INDEX(first_name), find 2 index entries with first_name = 'Andrew'; get (7, 17)

    3. "And" the two lists together (7,17) & (17,36) = (17)

    4. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    5. Deliver the answer (1865-1869).

    The EXPLAIN fails to give the gory details of how many rows were collected from each index, etc.

    INDEX(last_name, first_name)

    This is called a "compound" or "composite" index since it has more than one column.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    3. Deliver the answer (1865-1869).

    This is much better. In fact, this is usually the "best".

    "Covering": INDEX(last_name, first_name, term)

    Surprise! We can actually do a little better. A "Covering" index is one in which all of the fields of the SELECT are found in the index. It has the added bonus of not having to reach into the "data" to finish the task.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Deliver the answer (1865-1869). The "data" BTree is not touched; this is an improvement over "compound".

    Everything is similar to using "compound", except for the addition of "Using index".

    Variants

    • What would happen if you shuffled the fields in the WHERE clause? Answer: The order of ANDed things does not matter.

    • What would happen if you shuffled the fields in the INDEX? Answer: It may make a huge difference. More in a minute.

    • What if there are extra fields on the end? Answer: Minimal harm; possibly a lot of good (eg, 'covering').

    • Redundancy? That is, what if you have both of these: INDEX(a), INDEX(a,b)? Answer: Redundancy costs something on INSERTs; it is rarely useful for SELECTs.

    • Prefix? That is, INDEX(last_name(5), first_name(5)) Answer: Don't bother; it rarely helps, and often hurts. (The details are another topic.)

    More examples:

    Postlog

    Refreshed -- Oct, 2012; more links -- Nov 2016

    See also

    • Cookbook on designing the best index for a SELECT

    • Sheeri's discussion of Indexes

    • Slides on EXPLAIN

    • Mysql manual page on range accesses in composite indexes

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: index1

    This page is licensed: CC BY-SA / Gnu FDL


    Thread Groups in the Unix Implementation of the Thread Pool

    This article does not apply to the thread pool implementation on Windows. On Windows, MariaDB uses a native thread pool created with the CreateThreadpool API, which has its own methods to distribute threads between CPUs.

    On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. The thread_pool_size system variable defines the number of thread groups on a system. Generally speaking, the goal of the thread group implementation is to have one running thread on each CPU on the system at a time. Therefore, the default value of the thread_pool_size system variable is auto-sized to the number of CPUs on the system.

    When setting the thread_pool_size system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting its value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. It can be changed dynamically with SET GLOBAL. For example:
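A minimal illustration (the value 32 is arbitrary):

    SET GLOBAL thread_pool_size = 32;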

    It can also be set in a server option group in an option file prior to starting up the server. For example:
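A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_size=32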

    If you do not want MariaDB to use all CPUs on the system for some reason, then you can set it to a lower value than the number of CPUs. For example, this would make sense if the MariaDB Server process is limited to certain CPUs with a utility such as taskset on Linux.

    If you set the value to the number of CPUs and if you find that the CPUs are still underutilized, then try increasing the value.

    The thread_pool_size system variable tends to have the most visible performance effect. It is roughly equivalent to the number of threads that can run at the same time. In this case, run means use CPU, rather than sleep or wait. If a client connection needs to sleep or wait for some reason, then it wakes up another client connection in the thread group before it does so.

    One reason that CPU underutilization may occur in rare cases is that the thread pool is not always informed when a thread is going to wait. For example, some waits, such as a page fault or a miss in the OS buffer cache, cannot be detected by MariaDB.

    Distributing Client Connections Between Thread Groups

    When a new client connection is created, its thread group is determined using the following calculation:
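Based on the description of connection_id and the round-robin behavior below, the calculation is presumably:

    thread_group_id = connection_id % thread_pool_size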

    The connection_id value in the above calculation is the same monotonically increasing number that you can use to identify connections in SHOW PROCESSLIST output or the Information Schema PROCESSLIST table.

    This calculation should assign client connections to each thread group in a round-robin manner. In general, this should result in an even distribution of client connections among thread groups.

    Types of Threads

    Thread Group Threads

    Thread groups have two different kinds of threads: a listener thread and worker threads.

    • A thread group's worker threads actually perform work on behalf of client connections. A thread group can have many worker threads, but usually, only one will be actively running at a time. This is not always the case. For example, the thread group can become oversubscribed if the thread pool's timer thread detects that the thread group is stalled. This is explained more in the sections below.

  • A thread group's listener thread listens for I/O events and distributes work to the worker threads. If it detects that there is a request that needs to be worked on, then it can wake up a sleeping worker thread in the thread group, if any exist. If the listener thread is the only thread in the thread group, then it can also create a new worker thread. If there is only one request to handle, and if the thread_pool_dedicated_listener system variable is not enabled, then the listener thread can also become a worker thread and handle the request itself. This helps decrease the overhead that may be introduced by excessively waking up sleeping worker threads and excessively creating new worker threads.

    Global Threads

    The thread pool has one global thread: a timer thread. The timer thread performs tasks, such as:

    • Checks each thread group for stalls.

    • Ensures that each thread group has a listener thread.

    Thread Creation

    A new thread is created in a thread group in the scenarios listed below.

    In all of the scenarios below, the thread pool implementation prefers to wake up a sleeping worker thread that already exists in the thread group, rather than to create a new thread.

    Worker Thread Creation by Listener Thread

    A thread group's listener thread can create a new worker thread when it has more client connection requests to distribute, but no pre-existing worker threads are available to work on the requests. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time.

    A thread group's listener thread creates a new worker thread if all of the following conditions are met:

    • The listener thread receives a client connection request that needs to be worked on.

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads, so the listener thread should not become a worker thread.

    • There are no active worker threads in the thread group.

    • There are no sleeping worker threads in the thread group that the listener thread can wake up.

    Thread Creation by Worker Threads During Waits

    A thread group's worker thread can create a new worker thread when the thread has to wait on something, and the thread group has more client connection requests queued, but no pre-existing worker threads are available to work on them. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time. For most workloads, this tends to be the primary mechanism that creates new worker threads.

    A thread group's worker thread creates a new thread if all of the following conditions are met:

  • The worker thread has to wait on some request. For example, it might be waiting on disk I/O, or it might be waiting on a lock, or it might just be waiting for a query that called the SLEEP() function to finish.

    • There are no active worker threads in the thread group.

    • There are no sleeping worker threads in the thread group that the worker thread can wake up.

    • And one of the following conditions is also met:

    Listener Thread Creation by Timer Thread

    The thread pool's timer thread can create a new listener thread for a thread group when the thread group has more client connection requests that need to be distributed, but the thread group does not currently have a listener thread to distribute them. This can help to ensure that the thread group does not miss client connection requests because it has no listener thread.

    The thread pool's timer thread creates a new listener thread for a thread group if all of the following conditions are met:

    • The thread group has not handled any I/O events since the last check by the timer thread.

  • There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread, so that it can handle some client connection request. In this case, the new thread can become the thread group's listener thread.

    • There are no sleeping worker threads in the thread group that the timer thread can wake up.

    • And one of the following conditions is also met:

    Worker Thread Creation by Timer Thread during Stalls

    The thread pool's timer thread can create a new worker thread for a thread group when the thread group is stalled. This can help to ensure that a long query can't monopolize its thread group.

    The thread pool's timer thread creates a new worker thread for a thread group if all of the following conditions are met:

    • The timer thread thinks that the thread group is stalled. This means that the following conditions have been met:

      • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.

      • No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.

    • There are no sleeping worker threads in the thread group that the timer thread can wake up.

    Thread Creation Throttling

    In some of the scenarios listed above, a thread is only created within a thread group if no new threads have been created for the thread group within the throttling interval. The throttling interval depends on the number of threads that are already in the thread group.

    In later MariaDB versions, thread creation is not throttled until a thread group already contains more than a minimum number of threads. Beyond that point, a throttling interval (in milliseconds) applies, and it grows with the number of threads already in the thread group.

    Thread Group Stalls

    The thread pool has a feature that allows it to detect if a client connection is executing a long-running query that may be monopolizing its thread group. If a client connection were to monopolize its thread group, then that could prevent other client connections in the thread group from running their queries. In other words, the thread group would appear to be stalled.

    This stall detection feature is implemented by creating a timer thread that periodically checks if any of the thread groups are stalled. There is only a single timer thread for the entire thread pool. The thread_pool_stall_limit system variable defines the number of milliseconds between each stall check performed by the timer thread. The default value is 500. It can be changed dynamically with SET GLOBAL. For example:
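A minimal illustration (the value 300 is arbitrary):

    SET GLOBAL thread_pool_stall_limit = 300;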

    It can also be set in a server option group in an option file prior to starting up the server. For example:
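A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_stall_limit=300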

    The timer thread considers a thread group to be stalled if the following is true:

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.

    • No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.

    This indicates that the one or more client connections currently using the active worker threads may be monopolizing the thread group, and preventing the queued client connections from performing work. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel.

    The thread_pool_stall_limit system variable essentially defines the limit for what a "fast query" is. If a query takes longer than thread_pool_stall_limit milliseconds, then the thread pool is likely to think that it is too slow, and it will either wake up a sleeping worker thread or create a new worker thread to let another client connection in the thread group run a query in parallel.

    In general, changing the value of the thread_pool_stall_limit system variable has the following effect:

    • Setting it to higher values can help avoid starting too many parallel threads if you expect a lot of client connections to execute long-running queries.

    • Setting it to lower values can help prevent deadlocks.

    Thread Group Oversubscription

    If the timer thread were to detect a stall in a thread group, then it would either wake up a sleeping worker thread or create a new worker thread in that thread group. At that point, the thread group would have multiple active worker threads. In other words, the thread group would be oversubscribed.

    You might expect that the thread pool would shut down one of the worker threads when the stalled client connection finished what it was doing, so that the thread group would only have one active worker thread again. However, this does not always happen. Once a thread group is oversubscribed, the thread_pool_oversubscribe system variable defines the upper limit for when worker threads start shutting down after they finish work for client connections. The default value is 3. It can be changed dynamically with SET GLOBAL. For example:
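A minimal illustration (the value 10 is arbitrary):

    SET GLOBAL thread_pool_oversubscribe = 10;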

    It can also be set in a server option group in an option file prior to starting up the server. For example:
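A sketch of an option-file entry (the [mariadb] group name is one common choice):

    [mariadb]
    thread_pool_oversubscribe=10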

    To clarify, the thread_pool_oversubscribe system variable does not play any part in the creation of new worker threads. It is only used to determine how many worker threads should remain active in a thread group, once a thread group is already oversubscribed due to stalls.

    In general, the default value of 3 should be adequate for most users. Most users should not need to change the value of the thread_pool_oversubscribe system variable.

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Pool System and Status Variables

    This article describes the system and status variables used by the MariaDB thread pool. For a full description, see Thread Pool in MariaDB.

    System variables

    extra_max_connections

  • Description: The number of connections allowed on the extra_port.

      • See Thread Pool in MariaDB for more information.

    • Command line: --extra-max-connections=#

    • Scope: Global

    extra_port

    • Description: Extra port number to use for TCP connections in a one-thread-per-connection manner. If set to 0, then no extra port is used.

      • See Thread Pool in MariaDB for more information.

    • Command line: --extra-port=#

    thread_handling

  • Description: Determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads. On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.

      • When the default one-thread-per-connection mode is enabled, the server uses one thread to handle each client connection.

      • When the pool-of-threads mode is enabled, the server uses the thread pool to handle threads for client connections.

    thread_pool_dedicated_listener

  • Description: If set to 1, then each thread group will have its own dedicated listener, and the listener thread will not pick up work items. As a result, the queueing time and the actual queue size reported in the Information Schema thread pool tables will be more exact, since IO requests are immediately dequeued from poll, without delay.

      • This system variable is only meaningful on Unix.

    • Command line: thread-pool-dedicated-listener={0|1}

    thread_pool_exact_stats

  • Description: If set to 1, provides better queueing time statistics by using a high precision timestamp, at a small performance cost, for the time when the connection was added to the queue. This timestamp helps calculate the queueing time shown in the Information Schema thread pool tables.

      • This system variable is only meaningful on Unix.

    • Command line: thread-pool-exact-stats={0|1}

    thread_pool_idle_timeout

    • Description: The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?

      • This system variable is only meaningful on Unix.

      • The thread_pool_min_threads system variable is comparable for Windows.

    thread_pool_max_threads

  • Description: The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases.

      • On Unix, in rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

  • Scope: Global

  • Command line: --thread-pool-max-threads=#

    thread_pool_min_threads

  • Description: Minimum number of threads in the thread pool. In bursty environments, after a period of inactivity, threads would normally be retired. When the next burst arrives, it would take time to reach the optimal level. Setting this value higher than the default would prevent thread retirement even if inactive.

      • This system variable is only meaningful on Windows.

      • The thread_pool_idle_timeout system variable is comparable for Unix.

    thread_pool_oversubscribe

  • Description: Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread can have unrestricted access to the CPU while it is running, but it also means that there is additional overhead from putting threads to sleep or waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but it also means that there is less overhead from putting threads to sleep or waking them up.

      • See Thread Pool in MariaDB for more information.

    thread_pool_prio_kickup_timer

    • Description: Time in milliseconds before a dequeued low-priority statement is moved to the high-priority queue.

      • This system variable is only meaningful on Unix.

  • Command line: --thread-pool-prio-kickup-timer=#

    • Scope: Global

    thread_pool_priority

  • Description: Thread pool priority. High-priority connections usually start executing earlier than low-priority ones. If set to 'auto' (the default), the actual priority (low or high) is determined by whether or not the connection is inside a transaction.

    • Command line: --thread-pool-priority=#

    • Scope: Global,Connection

    • Data Type: enum

    thread_pool_size

  • Description: The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher.

      • See Thread Pool in MariaDB for more information.

      • This system variable is only meaningful on Unix.

    thread_pool_stall_limit

  • Description: The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread.

      • See Thread Pool in MariaDB for more information.

    Status variables

    Threadpool_idle_threads

  • Description: Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc.

      • This status variable is only meaningful on Unix.

    • Scope: Global, Session

  • Data Type: numeric

    Threadpool_threads

  • Description: Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

    • Scope: Global, Session

    • Data Type: numeric

    See Also

    This page is licensed: CC BY-SA / Gnu FDL

    Big DELETEs

    The problem

    How to DELETE lots of rows from a large table? Here is an example of purging items older than 30 days:
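For illustration, assuming a hypothetical table tbl with a timestamp column ts, the single-statement purge would be roughly:

    DELETE FROM tbl WHERE ts < CURRENT_DATE() - INTERVAL 30 DAY;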

    If there are millions of rows in the table, this statement may take minutes, maybe hours.

    Any suggestions on how to speed this up?

    Why it is a problem

  • MyISAM will lock the table during the entire operation, so nothing else can be done with the table.

  • InnoDB won't lock the table, but it will chew up a lot of resources, leading to sluggishness.

    • InnoDB has to write the undo information to its transaction logs; this significantly increases the I/O required.

  • Replication, being asynchronous, will effectively be delayed (on Slaves) while the DELETE is running.

    InnoDB and undo

    To be ready for a crash, a transactional engine such as InnoDB will record what it is doing to a log file. To make that somewhat less costly, the log file is sequentially written. If the log files you have (there are usually 2) fill up because the delete is really big, then the undo information spills into the actual data blocks, leading to even more I/O.

    Deleting in chunks avoids some of this excess overhead.

    Limited benchmarking of total delete elapsed time show two observations:

    • Total delete time approximately doubles above some 'chunk' size (as opposed to below that threshold). I do not have a formula relating the log file size with the threshold cutoff.

    • Chunk size below several hundred rows is slower. This is probably because the overhead of starting/ending each chunk dominates the timing.

    Solutions

    • PARTITION -- Requires some careful setup, but is excellent for purging a time-base series.

    • DELETE in chunks -- Carefully walk through the table N rows at a time.

    PARTITION

    The idea here is to have a sliding window of partitions. Let's say you need to purge news articles after 30 days. The "partition key" would be the datetime (or timestamp) column that is to be used for purging, and the PARTITIONs would be by RANGE. Every night, a cron job would come along and build a new partition for the next day, and drop the oldest partition.

    Dropping a partition is essentially instantaneous, much faster than deleting that many rows. However, you must design the table so that the entire partition can be dropped. That is, you cannot have some items living longer than others.

    PARTITION tables have a lot of restrictions, some are rather weird. You can either have no UNIQUE (or PRIMARY) key on the table, or every UNIQUE key must include the partition key. In this use case, the partition key is the datetime. It should not be the first part of the PRIMARY KEY (if you have a PRIMARY KEY).

    You can PARTITION InnoDB or MyISAM tables.

    Since two news articles could have the same timestamp, you cannot assume the partition key is sufficient for uniqueness of the PRIMARY KEY, so you need to find something else to help with that.

    Reference implementation for Partition maintenance

    Deleting in chunks

    Although the discussion in this section talks about DELETE, it can be used for any other "chunking", such as, say, UPDATE, or SELECT plus some complex processing.

    (This discussion applies to both MyISAM and InnoDB.)

    When deleting in chunks, be sure to avoid doing a table scan. The code below is good at that; it scans no more than 1001 rows in any one query. (The 1000 is tunable.)

    Assuming you have news articles that need to be purged, and you have a schema something like this:
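A minimal sketch consistent with the discussion (an AUTO_INCREMENT PRIMARY KEY id and a timestamp ts; the table name tbl is illustrative):

    CREATE TABLE tbl (
        id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
        ts  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        -- ... the rest of the news article columns ...
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;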

    Then, this pseudo-code is a good way to delete the rows older than 30 days:
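A sketch of the chunked walk over the PRIMARY KEY, 1000 ids at a time (loop control is shown as comments):

    SET @a = 0;
    -- loop: repeat the following until @a is past the end of the table
    DELETE FROM tbl
        WHERE id BETWEEN @a AND @a + 999
          AND ts < CURRENT_DATE() - INTERVAL 30 DAY;
    SET @a = @a + 1000;
    -- optionally SLEEP(1) between chunks to be nice to other connections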

    Notes (Most of these caveats will be covered later):

    • It uses the PK instead of the secondary key. This gives much better locality of disk hits, especially for InnoDB.

    • You could (should?) do something to avoid walking through recent days but doing nothing. Caution -- the code for this could be costly.

    • The 1000 should be tweaked so that the DELETE usually takes under, say, one second.

    • No INDEX on ts is needed. (This helps INSERTs a little.)

    If there are big gaps in id values (and there will be after the first purge), then:
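A sketch of the gap-tolerant walk, using @a as where you left off and @z as the far edge of the next chunk:

    -- find the id roughly 1000 rows past @a; NULL means the end of the table has been reached
    SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1;
    DELETE FROM tbl
        WHERE id >= @a
          AND id <  @z
          AND ts  < CURRENT_DATE() - INTERVAL 30 DAY;
    SET @a = @z;    -- left off here; repeat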

    That code works whether id is numeric or character, and it mostly works even if id is not UNIQUE. With a non-unique key, the risk is that you could be caught in a loop whenever @z==@a. That can be detected and fixed thus:

    The drawback is that there could be more than 1000 items with a single id. In most practical cases, that is unlikely.

    If you do not have a primary (or unique) key defined on the table, and you have an INDEX on ts, then consider
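Presumably something of this shape:

    DELETE FROM tbl
        WHERE ts < CURRENT_DATE() - INTERVAL 30 DAY
        LIMIT 1000;    -- repeat until fewer than 1000 rows are deleted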

    This technique is NOT recommended because the LIMIT leads to a warning on replication about it being non-deterministic (discussed below).

    InnoDB chunking recommendation

  • Have a 'reasonable' size for innodb_log_file_size.

    • Use AUTOCOMMIT=1 for the session doing the deletions.

    • Pick about 1000 rows for the chunk size.

    • Adjust the row count down if asynchronous replication (Statement Based) causes too much delay on the Slaves or hogs the table too much.

    Iterating through a compound key

    To perform the chunked deletes recommended above, you need a way to walk through the PRIMARY KEY. This can be difficult if the PK has more than one column in it.

    To efficiently do a compound 'greater than':

    Assume that you left off at ($g, $s) (and have handled that row):
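Using the Genus/species names mentioned below, the AND/OR construct is presumably a WHERE clause fragment like:

    WHERE ( Genus = '$g' AND species > '$s' )
       OR ( Genus > '$g' )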

    Addenda: The above AND/OR works well in older versions of MySQL; this works better in MariaDB and newer versions of MySQL:
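The row-constructor ('tuple') comparison would be:

    WHERE ( Genus, species ) > ( '$g', '$s' )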

    A caution about using @variables for strings. If, instead of '$g', you use @g, you need to be careful to make sure that @g has the same CHARACTER SET and COLLATION as Genus, else there could be a charset/collation conversion on the fly that prevents the use of the INDEX. Using the INDEX is vital for performance. It may require a COLLATE clause on SET NAMES and/or the @g in the SELECT.

    Reclaiming the disk space

    This is costly. (Switch to the PARTITION solution if practical.)

    MyISAM leaves gaps in the table (.MYD file); OPTIMIZE TABLE will reclaim the freed space after a big delete. But it may take a long time and lock the table.

    InnoDB is block-structured, organized in a BTree on the PRIMARY KEY. An isolated deleted row leaves a block less full. A lot of deleted rows can lead to coalescing of adjacent blocks. (Blocks are normally 16KB - see innodb_page_size.)

    In InnoDB, there is no practical way to reclaim the freed space from ibdata1, other than to reuse the freed blocks eventually.

    The only option with innodb_file_per_table = 0 is to dump ALL tables, remove ibdata*, restart, and reload. That is rarely worth the effort and time.

    InnoDB, even with innodb_file_per_table = 1, won't give space back to the OS, but at least it is only one table to rebuild with. In this case, something like this should work:
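The copy-and-swap rebuild alluded to here is presumably along these lines (table names are illustrative):

    CREATE TABLE New LIKE Main;
    INSERT INTO New SELECT * FROM Main;       -- this could take a long time
    RENAME TABLE Main TO Old, New TO Main;    -- atomic swap
    DROP TABLE Old;                           -- space freed up here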

    You do need enough disk space for both copies. You must not write to the table during the process.

    Deleting more than half a table

    The following technique can be used for any combination of

    • Deleting a large portion of the table more efficiently

    • Add PARTITIONing

  • Converting to InnoDB

    • Defragmenting

    This can be done by chunking, or (if practical) all at once:
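A sketch of the 'all at once' variant, keeping only the rows you want (the 30-day condition and table names are illustrative):

    -- Optional: SET GLOBAL innodb_file_per_table = ON;   (before creating New)
    CREATE TABLE New LIKE Main;
    -- Optional: ALTER TABLE New here to add PARTITIONing, change the ENGINE, etc.
    INSERT INTO New
        SELECT * FROM Main
            WHERE ts >= CURRENT_DATE() - INTERVAL 30 DAY;   -- just the rows to keep
    RENAME TABLE Main TO Old, New TO Main;
    DROP TABLE Old;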

    Notes:

    • You do need enough disk space for both copies.

    • You must not write to the table during the process. (Changes to Main may not be reflected in New.)

    Non-deterministic replication

    Any UPDATE, DELETE, etc., with LIMIT that is replicated to slaves (via Statement Based Replication) may cause inconsistencies between the Master and Slaves. This is because the actual order of the records found for updating/deleting may be different on the slave, thereby leading to a different subset being modified. To be safe, add ORDER BY to such statements. Moreover, be sure the ORDER BY is deterministic -- that is, the fields/expressions in the ORDER BY are unique.

    An example of an ORDER BY that does not quite work: Assume there are multiple rows for each 'date':

    Given that id is the PRIMARY KEY (or UNIQUE), this will be safe:

    Unfortunately, even with the ORDER BY, MySQL has a deficiency that leads to a bogus warning in mysqld.err. See Spurious "Statement is not safe to log in statement format." warnings

    Some of the above code avoids this spurious warning by doing

    That pair of statements guarantees no more than 1000 rows are touched, not the whole table.

    Replication and KILL

    If you KILL a DELETE (or any? query) on the master in the middle of its execution, what will be replicated?

    If it is InnoDB, the query should be rolled back. (Exceptions??)

    In MyISAM, rows are DELETEd as the statement is executed, and there is no provision for ROLLBACK. Some of the rows will be deleted, some won't. You probably have no clue of how much was deleted. In a single server, simply run the delete again. The delete is put into the binlog, but with error 1317. Since replication is supposed to keep the master and slave in sync, and since it has no clue of how to do that, replication stops and waits for manual intervention. In an HA (High Availability) system using replication, this is a minor disaster. Meanwhile, you need to go to each slave and verify that it is stuck for this reason, then do

    Then (presumably) re-executing the DELETE will finish the aborted task.

    (That is yet another reason to move all your tables from MyISAM to InnoDB.)

    SBR vs RBR; Galera

    TBD -- "Row Based Replication" may impact this discussion.

    Postlog

    The tips in this document apply to MySQL, MariaDB, and Percona.

    See also

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: deletebig

    This page is licensed: CC BY-SA / Gnu FDL

    Charset Narrowing Optimization

    The Charset Narrowing optimization handles equality comparisons like:

    It enables the optimizer to construct ref access to utf8mb3_key_column based on this equality. The optimization supports comparisons of columns that use utf8mb3_general_ci to expressions that use utf8mb4_general_ci.

    The optimization was introduced in MariaDB 10.6.16, MariaDB 10.11.6, and certain other releases, where it is OFF by default. In later releases, it is ON by default.

    Description

    MariaDB supports both the UTF8MB3 and UTF8MB4 character sets. It is possible to construct join queries that compare values in UTF8MB3 to UTF8MB4.

    Suppose we have the table users that uses UTF8MB4:

    and table orders that uses UTF8MB3:

    One can join users to orders on user_name:
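
    As a minimal sketch (the column definitions are illustrative, not the originals, but follow the column names used below: users.user_name_mb4 in UTF8MB4 and orders.user_name_mb3 in UTF8MB3):

    CREATE TABLE users (
      user_name_mb4 VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
    );
    CREATE TABLE orders (
      user_name_mb3 VARCHAR(100) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci,
      KEY (user_name_mb3)
    );

    SELECT * FROM users, orders WHERE orders.user_name_mb3 = users.user_name_mb4;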

    Internally the optimizer will handle the equality by converting the UTF8MB3 value into UTF8MB4 and then doing the comparison. One can see the call to CONVERT in EXPLAIN FORMAT=JSON output or Optimizer Trace:

    This produces the expected result but the query optimizer is not able to use the index over orders.user_name_mb3 to find matches for values of users.user_name_mb4.

    The EXPLAIN of the above query looks like this:

    The Charset Narrowing optimization enables the optimizer to perform the comparison between UTF8MB3 and UTF8MB4 values by "narrowing" the value in UTF8MB4 to UTF8MB3. The CONVERT call is no longer needed, and the optimizer is able to use the equality to construct ref access:

    Controlling the Optimization

    The optimization is controlled by an optimizer_switch flag. Specify:

    to enable the optimization.
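
    Assuming the flag is named cset_narrowing (the name used by MDEV-32113), that would look like:

    SET optimizer_switch='cset_narrowing=on';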

    References

    • MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref access

    • Blog post: Making "tbl.utf8mb3_key_column=utf8mb4_expr" sargable

    This page is licensed: CC BY-SA / Gnu FDL

    Pivoting in MariaDB

    The problem

    You want to "pivot" the data so that a linear list of values with two keys becomes a spreadsheet-like array. See examples, below.

    A solution

    The best solution is probably to do it in some form of client code (PHP, etc). MySQL and MariaDB do not have a syntax for SELECT that will do the work for you. The code provided here uses a stored procedure to generate code to pivot the data, and then runs the code.

    CREATE TABLE b(for_key INT REFERENCES a(not_key));
    [CONSTRAINT [symbol]] FOREIGN KEY
        [index_name] (index_col_name, ...)
        REFERENCES tbl_name (index_col_name,...)
        [ON DELETE reference_option]
        [ON UPDATE reference_option]
    
    reference_option:
        RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
    CREATE TABLE author (
      id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(100) NOT NULL
    ) ENGINE = InnoDB;
    
    CREATE TABLE book (
      id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      title VARCHAR(200) NOT NULL,
      author_id SMALLINT UNSIGNED NOT NULL,
      CONSTRAINT `fk_book_author`
        FOREIGN KEY (author_id) REFERENCES author (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT
    ) ENGINE = InnoDB;
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', 1);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    INSERT INTO author (name) VALUES ('Abdul Alhazred');
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', LAST_INSERT_ID());
    
    INSERT INTO author (name) VALUES ('H.P. Lovecraft');
    INSERT INTO book (title, author_id) VALUES
      ('The call of Cthulhu', LAST_INSERT_ID()),
      ('The colour out of space', LAST_INSERT_ID());
    DELETE FROM author WHERE name = 'H.P. Lovecraft';
    
    SELECT * FROM book;
    +----+--------------+-----------+
    | id | title        | author_id |
    +----+--------------+-----------+
    |  3 | Necronomicon |         1 |
    +----+--------------+-----------+
    UPDATE author SET id = 10 WHERE id = 1;
    ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails 
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    CREATE TABLE a(a_key INT PRIMARY KEY, not_key INT);
    
    CREATE TABLE b(for_key INT REFERENCES a(not_key));
    ERROR 1005 (HY000): Can't create table `test`.`b` 
      (errno: 150 "Foreign key constraint is incorrectly formed")
    
    CREATE TABLE c(for_key INT REFERENCES a(a_key));
    
    SHOW CREATE TABLE c;
    +-------+----------------------------------------------------------------------------------+
    | Table | Create Table                                                                     |
    +-------+----------------------------------------------------------------------------------+
    | c     | CREATE TABLE `c` (
      `for_key` INT(11) DEFAULT NULL,
      KEY `for_key` (`for_key`),
      CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
    +-------+----------------------------------------------------------------------------------+
    
    INSERT INTO a VALUES (1,10);
    Query OK, 1 row affected (0.004 sec)
    
    INSERT INTO c VALUES (10);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails 
      (`test`.`c`, CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`))
    
    INSERT INTO c VALUES (1);
    Query OK, 1 row affected (0.004 sec)
    
    SELECT * FROM c;
    +---------+
    | for_key |
    +---------+
    |       1 |
    +---------+
    [mysqld]
    ...
    innodb-defragment=1
    SET @@global.innodb_file_per_table = 1;
    SET @@global.innodb_defragment_n_pages = 32;
    SET @@global.innodb_defragment_fill_factor = 0.95;
    CREATE TABLE tb_defragment (
    pk1 BIGINT(20) NOT NULL,
    pk2 BIGINT(20) NOT NULL,
    fd4 TEXT,
    fd5 VARCHAR(50) DEFAULT NULL,
    PRIMARY KEY (pk1),
    KEY ix1 (pk2)
    ) ENGINE=InnoDB;
     
    DELIMITER //
    CREATE PROCEDURE innodb_insert_proc (repeat_count INT)
    BEGIN
      DECLARE current_num INT;
      SET current_num = 0;
      WHILE current_num < repeat_count DO
        INSERT INTO tb_defragment VALUES (current_num, 1, REPEAT('Abcdefg', 20), REPEAT('12345',5));
        INSERT INTO tb_defragment VALUES (current_num+1, 2, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        INSERT INTO tb_defragment VALUES (current_num+2, 3, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        INSERT INTO tb_defragment VALUES (current_num+3, 4, REPEAT('HIJKLM', 20), REPEAT('67890',5));
        SET current_num = current_num + 4;
      END WHILE;
    END//
    DELIMITER ;
    COMMIT;
     
    SET autocommit=0;
    CALL innodb_insert_proc(50000);
    COMMIT;
    SET autocommit=1;
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
    Value
    313
     
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
    Value
    72
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
    COUNT(stat_value)
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
    COUNT(stat_value)
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
    COUNT(stat_value)
    0
     
    SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables 
      WHERE engine LIKE 'InnoDB' AND table_name LIKE '%tb_defragment%';
    TABLE_NAME data_free_MB table_rows
    tb_defragment 4.00000000 50051
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` PRIMARY 25873 4739939
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` ix1 50071 1051775
    DELETE FROM tb_defragment WHERE pk2 BETWEEN 2 AND 4;
     
    OPTIMIZE TABLE tb_defragment;
    TABLE	Op	Msg_type	Msg_text
    test.tb_defragment	OPTIMIZE	status	OK
    SHOW status LIKE '%innodb_def%';
    Variable_name	Value
    Innodb_defragment_compression_failures	0
    Innodb_defragment_failures	1
    Innodb_defragment_count	4
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
    Value
    0
     
    SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
    Value
    0
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
    COUNT(stat_value)
    2
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
    COUNT(stat_value)
    2
     
    SELECT COUNT(stat_value) FROM mysql.innodb_index_stats 
      WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
    COUNT(stat_value)
    2
     
    SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables 
      WHERE engine LIKE 'InnoDB';
    TABLE_NAME data_free_MB table_rows
    innodb_index_stats 0.00000000 8
    innodb_table_stats 0.00000000 0
    tb_defragment 4.00000000 12431
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` PRIMARY 690 102145
     
    SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page 
      WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
    TABLE_NAME index_name SUM(number_records) SUM(data_size)
    `test`.`tb_defragment` ix1 5295 111263
    ALTER TABLE table_name DISABLE KEYS;
    BEGIN;
    ... inserting data WITH INSERT OR LOAD DATA ....
    COMMIT;
    ALTER TABLE table_name ENABLE KEYS;
    SET @@session.unique_checks = 0;
    SET @@session.foreign_key_checks = 0;
    SET @@global.innodb_autoinc_lock_mode = 2;
    LOAD DATA INFILE 'file_name' INTO TABLE table_name;
    LOAD DATA LOCAL INFILE 'file_name' INTO TABLE table_name;
    mariadb-import --use-threads=10 database text-file-name [text-file-name...]
    BEGIN;
    INSERT ...
    INSERT ...
    END;
    BEGIN;
    INSERT ...
    INSERT ...
    END;
    ...
    INSERT INTO table_name VALUES(1,"row 1"),(2, "row 2"),...;
    INSERT INTO table_name_1 (auto_increment_key, data) VALUES (NULL,"row 1");
    INSERT INTO table_name_2 (auto_increment, reference, data) VALUES (NULL, LAST_INSERT_ID(), "row 2");
    delimiter ;;
    SELECT 1; SELECT 2;;
    delimiter ;
    EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    |  1 | SIMPLE      | tbl   | range | key1          | key1 | 5       | NULL |  960 | Using index condition |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
    SET optimizer_switch='mrr=ON';
    Query OK, 0 rows affected (0.06 sec)
    
    EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                     |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    |  1 | SIMPLE      | tbl   | range | key1          | key1 | 5       | NULL |  960 | Using index condition; Rowid-ordered scan |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
    1 row in set (0.03 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra       |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 |             |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    2 rows in set (0.00 sec)
    SET optimizer_switch='mrr=ON';
    Query OK, 0 rows affected (0.06 sec)
    
    SET join_cache_level=6;
    Query OK, 0 rows affected (0.00 sec)
    
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra                                                  |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where                                            |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 | Using join buffer (flat, BKA join); Rowid-ordered scan |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
    2 rows in set (0.00 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra       |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    |  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL         | 1000 | Using where |
    |  1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | test.t1.col1 |    1 |             |
    +----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
    SET optimizer_switch='mrr=ON,mrr_sort_keys=ON';
    Query OK, 0 rows affected (0.00 sec)
    
    SET join_cache_level=6;
    Query OK, 0 rows affected (0.02 sec)
    EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            TABLE: t1
             type: ALL
    possible_keys: a
              KEY: NULL
          key_len: NULL
              ref: NULL
             ROWS: 1000
            Extra: USING WHERE
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            TABLE: t2
             type: ref
    possible_keys: key1
              KEY: key1
          key_len: 5
              ref: test.t1.col1
             ROWS: 1
            Extra: USING JOIN buffer (flat, BKA JOIN); KEY-ordered Rowid-ordered scan
    2 rows in set (0.00 sec)
    +-----+------------+----------------+-----------+
    | seq | last_name  | first_name     | term      |
    +-----+------------+----------------+-----------+
    |   1 | Washington | George         | 1789-1797 |
    |   2 | Adams      | John           | 1797-1801 |
    ...
    |   7 | Jackson    | Andrew         | 1829-1837 |
    ...
    |  17 | Johnson    | Andrew         | 1865-1869 |
    ...
    |  36 | Johnson    | Lyndon B.      | 1963-1969 |
    ...
    SELECT  term
            FROM  Presidents
            WHERE  last_name = 'Johnson'
              AND  first_name = 'Andrew';
    SHOW CREATE TABLE Presidents \G
    CREATE TABLE `presidents` (
      `seq` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
      `last_name` VARCHAR(30) NOT NULL,
      `first_name` VARCHAR(30) NOT NULL,
      `term` VARCHAR(9) NOT NULL,
      PRIMARY KEY (`seq`)
    ) ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8
    
    EXPLAIN  SELECT  term
       FROM  Presidents
       WHERE  last_name = 'Johnson'
       AND  first_name = 'Andrew';
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | Presidents | ALL  | NULL          | NULL | NULL    | NULL |   44 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    
    # Or, using the other form of display:  EXPLAIN ... \G
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ALL        <-- Implies table scan
    possible_keys: NULL
              key: NULL       <-- Implies that no index is useful, hence table scan
          key_len: NULL
              ref: NULL
             rows: 44         <-- That's about how many rows in the table, so table scan
            Extra: Using where
    EXPLAIN  SELECT  term
      FROM  Presidents
      WHERE  last_name = 'Johnson'
      AND  first_name = 'Andrew'  \G
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: last_name, first_name
              key: last_name
          key_len: 92                 <-- VARCHAR(30) utf8 may need 2+3*30 bytes
              ref: const
             rows: 2                  <-- Two 'Johnson's
            Extra: Using where
    id: 1
      select_type: SIMPLE
            table: Presidents
             type: index_merge
    possible_keys: first_name,last_name
              key: first_name,last_name
          key_len: 92,92
              ref: NULL
             rows: 1
            Extra: Using intersect(first_name,last_name); Using where
    ALTER TABLE Presidents
            (DROP old indexes AND...)
            ADD INDEX compound(last_name, first_name);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: compound
              key: compound
          key_len: 184             <-- The length of both fields
              ref: const,const     <-- The WHERE clause gave constants for both
             rows: 1               <-- Goodie!  It homed in on the one row.
            Extra: Using where
    ... ADD INDEX covering(last_name, first_name, term);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: covering
              key: covering
          key_len: 184
              ref: const,const
             rows: 1
            Extra: Using where; Using index   <-- Note
    INDEX(last, first)
        ... WHERE last = '...' -- good (even though `first` is unused)
        ... WHERE first = '...' -- index is useless
    
        INDEX(first, last), INDEX(last, first)
        ... WHERE first = '...' -- 1st index is used
        ... WHERE last = '...' -- 2nd index is used
        ... WHERE first = '...' AND last = '...' -- either could be used equally well
    
        INDEX(last, first)
        Both of these are handled by that one INDEX:
        ... WHERE last = '...'
        ... WHERE last = '...' AND first = '...'
    
        INDEX(last), INDEX(last, first)
        In light of the above example, don't bother including INDEX(last).
    SET GLOBAL thread_pool_size=32;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    DELETE FROM tbl WHERE 
      ts < CURRENT_DATE() - INTERVAL 30 DAY
    utf8mb3_key_column=utf8mb4_expression

    If your PRIMARY KEY is compound, the code gets messier.

  • This code will not work without a numeric PRIMARY or UNIQUE key.

  • Read on, we'll develop messier code to deal with most of these caveats.


    You can edit the SQL generated by the stored procedure to tweak the output in a variety of ways. Or you can tweak the stored procedure to generate what you would prefer.

    Reference code for solution

    'Source' this into the mysql commandline tool:

    Then do a CALL, like in the examples, below.

    Variants

    I thought about having several extra options for variations, but decided that would be too messy. Instead, here are instructions for implementing the variations, either by capturing the SELECT that was output by the Stored Procedure, or by modifying the SP, itself.

    • The data is strings (not numeric) -- Remove "SUM" (but keep the expression); remove the SUM...AS TOTAL line.

    • If you want blank output instead of 0 -- Currently the code says "SUM(IF(... 0))"; change the 0 to NULL, then wrap the SUM: IFNULL(SUM(...), ''). Note that this will distinguish between a zero total (showing '0') and no data (blank). (See the sketch after this list.)

    • Fancier output -- Use PHP/VB/Java/etc.

    • No Totals at the bottom -- Remove the WITH ROLLUP line from the SELECT.

    • No Total for each row -- Remove the SUM...AS Total line from the SELECT.

    • Change the order of the columns -- Modify the ORDER BY 1 ('1' meaning first column) in the SELECT DISTINCT in the SP.

    • Example: ORDER BY FIND_IN_SET(DAYOFWEEK(...), 'Sun,Mon,Tue,Wed,Thu,Fri,Sat')
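
    For instance, the blank-output tweak for a single column, using the details table from Example 2 below, might look like this (the remaining columns would be changed the same way):

    SELECT MONTH(ts) AS Month,
           IFNULL(SUM(IF(HOUR(ts) = 11, enwh/1000, NULL)), '') AS "11"
      FROM details
      WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 YEAR
      GROUP BY MONTH(ts);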

    Notes about "base_cols":

    • Multiple columns on the left, such as an ID and its meaning -- This is already handled by allowing base_cols to be a commalist like 'id, meaning'

    • You cannot call the SP with "foo AS 'blah'" in hopes of changing the labels, but you could edit the SELECT to achieve that goal.

    Notes about the "Totals":

    • If "base_cols" is more than one column, WITH ROLLUP will be subtotals as well as a grand total.

    • NULL shows up in the Totals row in the "base_cols" column; this can be changed via something like IFNULL(..., 'Totals').

    Example 1 - Population vs Latitude in US

    Notice how Alaska (AK) has populations in high latitudes and Hawaii (HI) in low latitudes.

    Example 2 - Home Solar Power Generation

    This gives the power (kWh) generated by hour and month for 2012.

    Other variations made the math go wrong. (Note that there is no CAST to FLOAT.)

    While I was at it, I gave an alias to change "MONTH(ts)" to just "Month".

    So, I edited the SQL to this and ran it:

    -- Which gave cleaner output:

    Midday in the summer is the best time for solar panels, as you would expect. 1-2pm in July was the best.

    Postlog

    Posted, Feb. 2015

    See Also

    • Brawley's notes. Rick James graciously allowed us to use this article in the documentation. Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: pivot

    This page is licensed: CC BY-SA / Gnu FDL

    CREATE TABLE tbl (
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ts TIMESTAMP,
      ...
      PRIMARY KEY(id)
    );
    @a = 0
       LOOP
          DELETE FROM tbl
             WHERE id BETWEEN @a AND @a+999
               AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
          SET @a = @a + 1000
          sleep 1  -- be a nice guy
       UNTIL end of table
    @a = SELECT MIN(id) FROM tbl
       LOOP
          SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
          IF @z IS NULL
             EXIT LOOP  -- last chunk
          DELETE FROM tbl
             WHERE id >= @a
               AND id <  @z
               AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
          SET @a = @z
          sleep 1  -- be a nice guy, especially in replication
       ENDLOOP
       # Last chunk:
       DELETE FROM tbl
          WHERE id >= @a
            AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    ...
          SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
          IF @z == @a
             SELECT @z := id FROM tbl WHERE id > @a ORDER BY id LIMIT 1
       ...
    LOOP
          DELETE FROM tbl
             WHERE ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
             ORDER BY ts   -- to use the index, and to make it deterministic
             LIMIT 1000
       UNTIL no rows deleted
    INDEX(Genus, species)
       SELECT/DELETE ...
          WHERE Genus >= '$g' AND ( species  > '$s' OR Genus > '$g' )
          ORDER BY Genus, species
          LIMIT ...
    WHERE ( Genus = '$g' AND species > '$s' ) OR ( Genus > '$g' )
    CREATE TABLE new LIKE main;
       INSERT INTO new SELECT * FROM main;  -- This could take a long time
       RENAME TABLE main TO old, new TO main;   -- Atomic swap
       DROP TABLE old;   -- Space freed up here
    -- Optional:  SET GLOBAL innodb_file_per_table = ON;
       CREATE TABLE New LIKE Main;
       -- Optional:  ALTER TABLE New ADD PARTITION BY RANGE ...;
       -- Do this INSERT..SELECT all at once, or with chunking:
       INSERT INTO New
          SELECT * FROM Main
             WHERE ...;  -- just the rows you want to keep
       RENAME TABLE main TO Old, New TO Main;
       DROP TABLE Old;   -- Space freed up here
    DELETE FROM tbl ORDER BY date LIMIT 111
    DELETE FROM tbl ORDER BY date, id LIMIT 111
    SELECT @z := ... LIMIT 1000,1;  -- not replicated
       DELETE ... BETWEEN @a AND @z;   -- deterministic
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
       START SLAVE;
    DELIMITER //
    DROP   PROCEDURE IF EXISTS Pivot //
    CREATE PROCEDURE Pivot(
        IN tbl_name VARCHAR(99),       -- table name (or db.tbl)
        IN base_cols VARCHAR(99),      -- column(s) on the left, separated by commas
        IN pivot_col VARCHAR(64),      -- name of column to put across the top
        IN tally_col VARCHAR(64),      -- name of column to SUM up
        IN where_clause VARCHAR(99),   -- empty string or "WHERE ..."
        IN order_by VARCHAR(99)        -- empty string or "ORDER BY ..."; usually the base_cols
        )
        DETERMINISTIC
        SQL SECURITY INVOKER
    BEGIN
        -- Find the distinct values
        -- Build the SUM()s
        SET @subq = CONCAT('SELECT DISTINCT ', pivot_col, ' AS val ',
                        ' FROM ', tbl_name, ' ', where_clause, ' ORDER BY 1');
        -- select @subq;
    
        SET @cc1 = "CONCAT('SUM(IF(&p = ', &v, ', &t, 0)) AS ', &v)";
        SET @cc2 = REPLACE(@cc1, '&p', pivot_col);
        SET @cc3 = REPLACE(@cc2, '&t', tally_col);
        -- select @cc2, @cc3;
        SET @qval = CONCAT("'\"', val, '\"'");
        -- select @qval;
        SET @cc4 = REPLACE(@cc3, '&v', @qval);
        -- select @cc4;
    
        SET SESSION group_concat_max_len = 10000;   -- just in case
        SET @stmt = CONCAT(
                'SELECT  GROUP_CONCAT(', @cc4, ' SEPARATOR ",\n")  INTO @sums',
                ' FROM ( ', @subq, ' ) AS top');
         select @stmt;
        PREPARE _sql FROM @stmt;
        EXECUTE _sql;                      -- Intermediate step: build SQL for columns
        DEALLOCATE PREPARE _sql;
        -- Construct the query and perform it
        SET @stmt2 = CONCAT(
                'SELECT ',
                    base_cols, ',\n',
                    @sums,
                    ',\n SUM(', tally_col, ') AS Total'
                '\n FROM ', tbl_name, ' ',
                where_clause,
                ' GROUP BY ', base_cols,
                '\n WITH ROLLUP',
                '\n', order_by
            );
        select @stmt2;                    -- The statement that generates the result
        PREPARE _sql FROM @stmt2;
    EXECUTE _sql;                     -- The resulting pivot table output
        DEALLOCATE PREPARE _sql;
        -- For debugging / tweaking, SELECT the various @variables after CALLing.
    END;
    //
    DELIMITER ;
    -- Sample input:
    +-------+----------------------+---------+------------+
    | state | city                 | lat     | population |
    +-------+----------------------+---------+------------+
    | AK    | Anchorage            | 61.2181 |     276263 |
    | AK    | Juneau               | 58.3019 |      31796 |
    | WA    | Monroe               | 47.8556 |      15554 |
    | WA    | Spanaway             | 47.1042 |      25045 |
    | PR    | Arecibo              | 18.4744 |      49189 |
    | MT    | Kalispell            | 48.1958 |      18018 |
    | AL    | Anniston             | 33.6597 |      23423 |
    | AL    | Scottsboro           | 34.6722 |      14737 |
    | HI    | Kaneohe              | 21.4181 |      35424 |
    | PR    | Candelaria           | 18.4061 |      17632 |
    ...
    
    -- Call the Stored Procedure:
    CALL Pivot('World.US', 'state', '5*FLOOR(lat/5)', 'population', '', '');
    
    -- SQL generated by the SP:
    SELECT state,
    SUM(IF(5*FLOOR(lat/5) = "15", population, 0)) AS "15",
    SUM(IF(5*FLOOR(lat/5) = "20", population, 0)) AS "20",
    SUM(IF(5*FLOOR(lat/5) = "25", population, 0)) AS "25",
    SUM(IF(5*FLOOR(lat/5) = "30", population, 0)) AS "30",
    SUM(IF(5*FLOOR(lat/5) = "35", population, 0)) AS "35",
    SUM(IF(5*FLOOR(lat/5) = "40", population, 0)) AS "40",
    SUM(IF(5*FLOOR(lat/5) = "45", population, 0)) AS "45",
    SUM(IF(5*FLOOR(lat/5) = "55", population, 0)) AS "55",
    SUM(IF(5*FLOOR(lat/5) = "60", population, 0)) AS "60",
    SUM(IF(5*FLOOR(lat/5) = "70", population, 0)) AS "70",
     SUM(population) AS Total
     FROM World.US  GROUP BY state
     WITH ROLLUP
    
    -- Output from that SQL (also comes out of the SP):
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    | state | 15      | 20     | 25       | 30       | 35       | 40       | 45      | 55    | 60     | 70   | Total     |
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    | AK    |       0 |      0 |        0 |        0 |        0 |        0 |       0 | 60607 | 360765 | 4336 |    425708 |
    | AL    |       0 |      0 |        0 |  1995225 |        0 |        0 |       0 |     0 |      0 |    0 |   1995225 |
    | AR    |       0 |      0 |        0 |   595537 |   617361 |        0 |       0 |     0 |      0 |    0 |   1212898 |
    | AZ    |       0 |      0 |        0 |  4708346 |   129989 |        0 |       0 |     0 |      0 |    0 |   4838335 |
    ...
    | FL    |       0 |  34706 |  9096223 |  1440916 |        0 |        0 |       0 |     0 |      0 |    0 |  10571845 |
    | GA    |       0 |      0 |        0 |  2823939 |        0 |        0 |       0 |     0 |      0 |    0 |   2823939 |
    | HI    |   43050 | 752983 |        0 |        0 |        0 |        0 |       0 |     0 |      0 |    0 |    796033 |
    ...
    | WY    |       0 |      0 |        0 |        0 |        0 |   277480 |       0 |     0 |      0 |    0 |    277480 |
    | NULL  | 1792991 | 787689 | 16227033 | 44213344 | 47460670 | 61110822 | 7105143 | 60607 | 360765 | 4336 | 179123400 |
    +-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
    -- Sample input:
    +---------------------+------+
    | ts                  | enwh |
    +---------------------+------+
    | 2012-06-06 11:00:00 |  523 |
    | 2012-06-06 11:05:00 |  526 |
    | 2012-06-06 11:10:00 |  529 |
    | 2012-06-06 11:15:00 |  533 |
    | 2012-06-06 11:20:00 |  537 |
    | 2012-06-06 11:25:00 |  540 |
    | 2012-06-06 11:30:00 |  542 |
    | 2012-06-06 11:35:00 |  543 |
    Note that it is a reading in watts for each 5 minutes.
    So, summing is needed to get the breakdown by month and hour.
    
    -- Invoke the SP:
    CALL Pivot('details',    -- Table
               'MONTH(ts)',  -- `base_cols`, to put on left; SUM up over the month
               'HOUR(ts)',   -- `pivot_col` to put across the top; SUM up entries across the hour
               'enwh/1000',  -- The data -- watts converted to KWh
               "WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year",  -- Limit to one year
               '');          -- assumes that the months stay in order
    
    -- The SQL generated:
    SELECT MONTH(ts),
    SUM(IF(HOUR(ts) = "5", enwh/1000, 0)) AS "5",
    SUM(IF(HOUR(ts) = "6", enwh/1000, 0)) AS "6",
    SUM(IF(HOUR(ts) = "7", enwh/1000, 0)) AS "7",
    SUM(IF(HOUR(ts) = "8", enwh/1000, 0)) AS "8",
    SUM(IF(HOUR(ts) = "9", enwh/1000, 0)) AS "9",
    SUM(IF(HOUR(ts) = "10", enwh/1000, 0)) AS "10",
    SUM(IF(HOUR(ts) = "11", enwh/1000, 0)) AS "11",
    SUM(IF(HOUR(ts) = "12", enwh/1000, 0)) AS "12",
    SUM(IF(HOUR(ts) = "13", enwh/1000, 0)) AS "13",
    SUM(IF(HOUR(ts) = "14", enwh/1000, 0)) AS "14",
    SUM(IF(HOUR(ts) = "15", enwh/1000, 0)) AS "15",
    SUM(IF(HOUR(ts) = "16", enwh/1000, 0)) AS "16",
    SUM(IF(HOUR(ts) = "17", enwh/1000, 0)) AS "17",
    SUM(IF(HOUR(ts) = "18", enwh/1000, 0)) AS "18",
    SUM(IF(HOUR(ts) = "19", enwh/1000, 0)) AS "19",
    SUM(IF(HOUR(ts) = "20", enwh/1000, 0)) AS "20",
     SUM(enwh/1000) AS Total
     FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year GROUP BY MONTH(ts)
     WITH ROLLUP
    
    -- That generated decimal places that I did not like:
    | MONTH(ts) | 5      | 6       | 7        | 8        | 9         | 10        | 11        | 12        | 13        | 14        | 15        | 16       | 17       | 18       | 19      | 20     | Total      |
    +-----------+--------+---------+----------+----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+----------+----------+---------+--------+------------+
    |         1 | 0.0000 |  0.0000 |   1.8510 |  21.1620 |   52.3190 |   73.0420 |   89.3220 |   97.0190 |   88.9720 |   75.4970 |   50.9270 |  12.5130 |   0.5990 |   0.0000 |  0.0000 | 0.0000 |   563.2230 |
    |         2 | 0.0000 |  0.0460 |   5.9560 |  35.6330 |   72.4710 |   96.5130 |  112.7770 |  126.0850 |  117.1540 |   96.7160 |   72.5900 |  33.6230 |   4.7650 |   0.0040 |  0.0000 | 0.0000 |   774.3330 |
    SELECT MONTH(ts) AS 'Month',
    ROUND(SUM(IF(HOUR(ts) = "5", enwh, 0))/1000) AS "5",
    ...
    ROUND(SUM(IF(HOUR(ts) = "20", enwh, 0))/1000) AS "20",
     ROUND(SUM(enwh)/1000) AS Total
     FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 YEAR
     GROUP BY MONTH(ts)
     WITH ROLLUP;
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
    | Month | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | 13   | 14   | 15   | 16   | 17   | 18   | 19   | 20   | Total |
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
    |     1 |    0 |    0 |    2 |   21 |   52 |   73 |   89 |   97 |   89 |   75 |   51 |   13 |    1 |    0 |    0 |    0 |   563 |
    |     2 |    0 |    0 |    6 |   36 |   72 |   97 |  113 |  126 |  117 |   97 |   73 |   34 |    5 |    0 |    0 |    0 |   774 |
    |     3 |    0 |    0 |    9 |   46 |   75 |  105 |  121 |  122 |  128 |  126 |  105 |   71 |   33 |   10 |    0 |    0 |   952 |
    |     4 |    0 |    1 |   14 |   63 |  111 |  146 |  171 |  179 |  177 |  158 |  141 |  105 |   65 |   26 |    3 |    0 |  1360 |
    |     5 |    0 |    4 |   21 |   78 |  128 |  162 |  185 |  199 |  196 |  187 |  166 |  130 |   81 |   36 |    8 |    0 |  1581 |
    |     6 |    0 |    4 |   17 |   71 |  132 |  163 |  182 |  191 |  193 |  182 |  161 |  132 |   89 |   43 |   10 |    1 |  1572 |
    |     7 |    0 |    3 |   17 |   57 |  121 |  160 |  185 |  197 |  199 |  189 |  168 |  137 |   92 |   44 |   11 |    1 |  1581 |
    |     8 |    0 |    1 |   11 |   48 |  104 |  149 |  171 |  183 |  187 |  179 |  156 |  121 |   76 |   32 |    5 |    0 |  1421 |
    |     9 |    0 |    0 |    6 |   32 |   77 |  127 |  151 |  160 |  159 |  148 |  124 |   93 |   47 |   12 |    1 |    0 |  1137 |
    |    10 |    0 |    0 |    1 |   16 |   54 |   85 |  107 |  115 |  119 |  106 |   85 |   56 |   17 |    2 |    0 |    0 |   763 |
    |    11 |    0 |    0 |    5 |   30 |   57 |   70 |   84 |   83 |   76 |   64 |   35 |    8 |    1 |    0 |    0 |    0 |   512 |
    |    12 |    0 |    0 |    2 |   17 |   39 |   54 |   67 |   75 |   64 |   58 |   31 |    4 |    0 |    0 |    0 |    0 |   411 |
    |  NULL |    0 |   13 |  112 |  516 | 1023 | 1392 | 1628 | 1728 | 1703 | 1570 | 1294 |  902 |  506 |  203 |   38 |    2 | 12629 |
    +-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+

    mrr_cost_based=on - enable cost-based choice whether to use MRR. Currently not recommended, because the cost model is not sufficiently tuned yet.


    And one of the following conditions is also met:

    • The entire thread pool has fewer than thread_pool_max_threads.

    • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • The entire thread pool has fewer than thread_pool_max_threads.

  • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • And one of the following conditions is also met:

    • There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads. In this case, the new thread is intended to be a worker thread.

    • There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread, so that it can handle client connection requests. In this case, the new thread can become the thread group's listener thread.

  • The entire thread pool has fewer than thread_pool_max_threads.

  • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • If the thread group already has active worker threads, then the following condition also needs to be met:

    • A worker thread has not been created for the thread group within the throttling interval.

  • And one of the following conditions is also met:

    • The entire thread pool has fewer than thread_pool_max_threads.

    • There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.

  • A worker thread has not been created for the thread group within the throttling interval.

    Number of threads in thread group | Throttling interval
    0-(1 + thread_pool_oversubscribe)  | 0
    4-7                                | 50 * THROTTLING_FACTOR
    8-15                               | 100 * THROTTLING_FACTOR
    16-65536                           | 20 * THROTTLING_FACTOR


    Dynamic: Yes

  • Data Type: numeric

  • Default Value: 1

  • Range: 1 to 100000

  • Scope: Global

  • Dynamic: No

  • Data Type: numeric

  • Default Value: 0

  • When the pool-of-threads mode is enabled, the server uses the thread pool for client connections.
  • When the no-threads mode is enabled, the server uses a single thread for all client connections, which is really only usable for debugging.

  • Command line: --thread-handling=name

  • Scope: Global

  • Dynamic: No

  • Data Type: enumeration

  • Default Value: one-thread-per-connection (non-Windows), pool-of-threads (Windows)

  • Valid Values: no-threads, one-thread-per-connection, pool-of-threads.

  • Documentation: Using the thread pool.

  • Notes: In MySQL, the thread pool is only available in MySQL Enterprise. In MariaDB it's available in all versions.

  • Scope:

  • Dynamic:

  • Data Type: boolean

  • Default Value: 0

  • Introduced: MariaDB 10.5.0

  • Scope:
  • Dynamic:

  • Data Type: boolean

  • Default Value: 0

  • Introduced: MariaDB 10.5.0

  • Command line: thread-pool-idle-timeout=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 60

  • Documentation: Using the thread pool.

  • Command line: thread-pool-max-threads=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 65536

  • Range: 1 to 65536

  • Documentation: Using the thread pool.

  • Command line: thread-pool-min-threads=#
  • Data Type: numeric

  • Default Value: 1

  • Documentation: Using the thread pool.

  • This is primarily for internal use, and most users should not need to change it.
  • This system variable is only meaningful on Unix.

  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 3

  • Range: 1 to 65536

  • Documentation: Using the thread pool.

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 1000

  • Range: 0 to 4294967295

  • Introduced: MariaDB 10.2.2

  • Documentation: Using the thread pool.

  • Default Value: auto

  • Valid Values: high, low, auto.

  • Introduced: MariaDB 10.2.2

  • Documentation: Using the thread pool.

  • Command line: --thread-pool-size=#
  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: Based on the number of processors (but see MDEV-7806).

  • Range: 1 to 128

  • Documentation: Using the thread pool.

  • This system variable is only meaningful on Unix.
  • Note that if you are migrating from the MySQL Enterprise thread pool plugin, then the unit used in their implementation is 10ms, not 1ms.

  • Command line: --thread-pool-stall-limit=#

  • Scope: Global

  • Dynamic: Yes

  • Data Type: numeric

  • Default Value: 500

  • Range: 1 to 4294967295

  • Documentation: Using the thread pool.


    Data Warehousing Summary Tables

    Preface

    This document discusses the creation and maintenance of "Summary Tables". It is a companion to the document on Data Warehousing Techniques.

    The basic terminology ("Fact Table", etc.) is covered in that document.

    Summary tables for data warehouse "reports"

    Summary tables are a performance necessity for large tables. MariaDB and MySQL do not provide any automated way to create such, so I am providing techniques here.

    (Other vendors provide something similar with "materialized views".)

    When you have millions or billions of rows, it takes a long time to summarize the data to present counts, totals, averages, etc, in a size that is readily digestible by humans. By computing and saving subtotals as the data comes in, one can make "reports" run much faster. (I have seen 10x to 1000x speedups.) The subtotals go into a "summary table". This document guides you on efficiency in both creating and using such tables.

    General structure of a summary table

    A summary table includes two sets of columns:

    • Main KEY: date + some dimension(s)

    • Subtotals: COUNT(*), SUM(...), ...; but not AVG()

    The "date" might be a DATE (a 3-byte native datatype), or an hour, or some other time interval. A 3-byte MEDIUMINT UNSIGNED 'hour' can be derived from a DATETIME or TIMESTAMP via

    The "dimensions" (a DW term) are some of the columns of the "Fact" table. Examples: Country, Make, Product, Category, Host Non-dimension examples: Sales, Quantity, TimeSpent

    There would be one or more indexes, usually starting with some dimensions and ending with the date field. By ending with the date, one can efficiently get a range of days/weeks/etc. even when each row summarizes only one day.

    There will typically be a "few" summary tables. Often one summary table can serve multiple purposes sufficiently efficiently.

    As a rule of thumb, a summary table will have one-tenth the number of rows as the Fact table. (This number is very loose.)

    Example

    Let's talk about a large chain of car dealerships. The Fact table has all the sales with columns such as datetime, salesman_id, city, price, customer_id, make, model, model_year. One Summary table might focus on sales:
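
    A hypothetical sketch of such a table, built from the Fact columns mentioned above (make, price, and the datetime); the names are illustrative:

    CREATE TABLE sales_summary (
      dy DATE NOT NULL,                   -- the date part of the key
      make VARCHAR(30) NOT NULL,          -- a dimension from the Fact table
      ct INT UNSIGNED NOT NULL,           -- COUNT(*)
      sum_price DECIMAL(12,2) NOT NULL,   -- SUM(price)
      PRIMARY KEY (make, dy),             -- dimension(s) first, date last
      KEY (dy)
    ) ENGINE=InnoDB;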

    When to augment the summary table(s)?

    "Augment" in this section means to add new rows into the summary table or increment the counts in existing rows.

    Plan A: "While inserting" rows into the Fact table, augment the summary table(s). This is simple, and workable for a smaller DW database (under 10 Fact table rows per second). For larger DW databases, Plan A likely to be too costly to be practical.

    Plan B: "Periodically", via cron or an EVENT.

    Plan C: "As needed". That is, when someone asks for a report, the code first updates the summary tables that will be needed.

    Plan D: "Hybrid" of B and C. C, by itself, can led to long delays for the report. By also doing B, those delays can be kept low.

    Plan E: (This is not advised.) "Rebuild" the entire summary table from the entire Fact table. The cost of this is prohibitive for large tables. However, Plan E may be needed when you decide to change the columns of a Summary Table, or discover a flaw in the computations. To lessen the impact of an entire rebuild, adapt the chunking techniques described for big DELETEs earlier in this document.

    Plan F: "Staging table". This is primarily for very high speed ingestion. It is mentioned briefly in this blog, and discussed more thoroughly in the companion blog: High Speed Ingestion

    Summarizing while Inserting (one row at a time)

    IODKU (Insert On Duplicate Key Update) will update an existing row or create a new row. It knows which to do based on the Summary table's PRIMARY KEY.
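
    As a hypothetical illustration against the sales_summary sketch above:

    INSERT INTO sales_summary (dy, make, ct, sum_price)
      VALUES (CURRENT_DATE, 'Toyota', 1, 25000.00)
      ON DUPLICATE KEY UPDATE
        ct        = ct + 1,
        sum_price = sum_price + VALUES(sum_price);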

    Caution: This approach is costly, and will not scale to an ingestion rate of over, say, 10 rows per second (Or maybe 50/second on SSDs). More discussion later.

    Summarizing periodically vs as-needed

    If your reports need to be up-to-the-second, you need "as needed" or "hybrid". If your reports have less urgency (eg, weekly reports that don't include 'today'), then "periodically" might be best.

    For daily summaries, augmenting the summary tables could be done right after midnight. But beware of data coming in "late".

    For both "periodic" and "as needed", you need a definitive way of keeping track of where you "left off".

    Case 1: You insert into the Fact table first and it has an AUTO_INCREMENT id: Grab MAX(id) as the upper bound for summarizing and put it either into some other secure place (an extra table), or put it into the row(s) in the Summary table as you insert them. (Caveat: AUTO_INCREMENT ids do not work well in multi-master, including Galera, setups.)

    Case 2: If you are using a 'staging' table, there is no issue. (More on staging tables later.)

    Summarizing while batch inserting

    This applies to multi-row (batch) INSERT and LOAD DATA.

    The Fact table needs an AUTO_INCREMENT id, and you need to be able to find the exact range of ids inserted. (This may be impractical in any multi-master setup.)

    Then perform bulk summarization using an INSERT .. SELECT over that id range (see the sketch below).
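
    A plausible form, reusing the hypothetical sales_summary sketch above (the Fact table is called fact here for illustration; @first_id and @last_id bound the batch of ids just inserted):

    INSERT INTO sales_summary (dy, make, ct, sum_price)
      SELECT DATE(`datetime`), make, COUNT(*), SUM(price)
        FROM fact
        WHERE id BETWEEN @first_id AND @last_id
        GROUP BY DATE(`datetime`), make
      ON DUPLICATE KEY UPDATE
        ct        = ct + VALUES(ct),
        sum_price = sum_price + VALUES(sum_price);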

    Summarizing when using a staging table

    Load the data (via INSERTs or LOAD DATA) en masse into a "staging table". Then perform batch summarization from the Staging table, and batch copy from the Staging table to the Fact table. Note that the Staging table is handy for batching "normalization" during ingestion.

    Summary table: PK or not?

    Let's say your summary table has a DATE, dy, and a dimension, foo. The question is: Should (foo, dy) be the PRIMARY KEY? Or a non-UNIQUE index?

    Case 1: PRIMARY KEY (foo, dy) and summarization is in lock step with, say, changes in dy.

    This case is clean and simple -- until you get to endcases. How will you handle the case of data arriving 'late'? Maybe you will need to recalculate some chunks of data? If so, how?

    Case 2: (foo, dy) is a non-UNIQUE INDEX.

    This case is clean and simple, but it can clutter the summary table because multiple rows can occur for a given (foo, dy) pair. The report will always have to SUM up values because it cannot assume there is only one row, even when it is reporting on a single foo for a single dy. This forced-SUM is not really bad -- you should do it anyway; that way all your reports are written with one pattern.

    Case 3: PRIMARY KEY (foo, dy) and summarization can happen anytime.

    Since you should be using InnoDB, there needs to be an explicit PRIMARY KEY. One approach when you do not have a 'natural' PK is this:

    This case pushes the complexity onto the summarization by doing an IODKU.

    Advice? Avoid Case 1; too messy. Case 2 is ok if the extra rows are not too common. Case 3 may be the closest to "one size fits all".

    Averages, etc.

    When summarizing, include COUNT(*) AS ct and SUM(foo) AS sum_foo. When reporting, the "average" is computed as SUM(sum_foo) / SUM(ct). That is mathematically correct.

    Exception... Let's say you are looking at weather temperatures. And your monitoring station gets the temperature periodically, but unreliably. That is, the number of readings for a day varies. Further, you decide that the easiest way to compensate for the inconsistency is to do something like: compute the average temperature for each day, then average those across the month (or other timeframe).

    Formula for Standard Deviation:
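
    Computed from the retained sums, the usual (population) form is the following; the author's exact expression may have differed slightly (e.g. an n-1 correction):

    SELECT SQRT( SUM(sum_foo2)/SUM(ct) - POW(SUM(sum_foo)/SUM(ct), 2) ) AS std_dev
      FROM summary_table;   -- summary_table is a placeholder name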

    Where sum_foo2 is SUM(foo * foo) from the summary table. sum_foo and sum_foo2 should be FLOAT. FLOAT gives you about 7 significant digits, which is more than enough for things like average and standard deviation. FLOAT occupies 4 bytes. DOUBLE would give you more precision, but occupies 8 bytes. INT and BIGINT are not practical because they may lead to complaints about overflow.

    Staging table

    The idea here is to first load a set of Fact records into a "staging table", with the following characteristics (at least):

    • The table is repeatedly populated and truncated

    • Inserts could be individual or batched, and from one or many clients

    • SELECTs will be table scans, so no indexes needed

    • Inserting will be fast (InnoDB may be the fastest)

    If you have bulk inserts (Batch INSERT or LOAD DATA) then consider doing the normalization and summarization immediately after each bulk insert.

    More details: High Speed Ingestion

    Extreme design

    Here is a more complex way to design the system, with the goal of even more scaling.

    • Use master-slave setup: ingest into master; report from slave(s).

    • Feed ingestion through a staging table (as described above)

    • Single-source of data: ENGINE=MEMORY; multiple sources: InnoDB

    Explanation and comments:

    • ROW + ignore_db avoids replicating Staging, yet replicates the INSERTs based on it. Hence, it lightens the write load on the Slaves

    • If using MEMORY, remember that it is volatile -- recover from a crash by starting the ingestion over.

    • To aid with debugging, TRUNCATE or re-CREATE Staging at the start of the next cycle.

    • Staging needs no indexes -- all operations read all rows from it.

    Stats on the system that this 'extreme design' came from: Fact Table: 450GB, 100M rows/day (batch of 4M/hour), 60 day retention (60+24 partitions), 75B/row, 7 summary tables, under 10 minutes to ingest and summarize the hourly batch. The INSERT..SELECT handled over 20K rows/sec going into the Fact table. Spinning drives (not SSD) with RAID-10.

    "Left Off"

    One technique involves summarizing some of the data, then recording where you "left off", so that next time, you can start there. There are some subtle issues with "left off" that you should be cautious of.

    If you use a DATETIME or TIMESTAMP as "left off", beware of multiple rows with the same value.

    • Plan A: Use a compound "left off" (eg, TIMESTAMP + ID). This is messy, error prone, etc.

    • Plan B: WHERE ts >= $left_off AND ts < $max_ts -- avoids dups, but has other problems (below)

    • Separate threads could COMMIT TIMESTAMPs out of order.

    If you use an AUTO_INCREMENT as "left off" beware of:

    • In InnoDB, separate threads could COMMIT ids in the 'wrong' order.

    • Multi-master setups (including Galera and InnoDB Cluster) could lead to ordering issues.

    So, nothing works, at least not in a multi-threaded environment?

    If you can live with an occasional hiccup (skipped record), then maybe this is 'not a problem' for you.

    The "Flip-Flop Staging" is a safe alternative, optionally combined with the "Extreme Design".

    Flip-flop staging

    If you have many threads simultaneously INSERTing into one staging table, then here is an efficient way to handle a large load: Have a process that flips that staging table with another, identical, staging table, and performs bulk normalization, Fact insertion, and bulk summarization.

The flipping step uses a fast, atomic RENAME TABLE.

    Here is a sketch of the code:
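# Prep for flip:
    CREATE TABLE new LIKE Staging;

    # Swap (flip) Staging tables:
    RENAME TABLE Staging TO old, new TO Staging;

    # Normalize new `foo`s (autocommit = 1):
    INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...

    # Loop, to allow for possible deadlocks, etc:
    WHILE ...
    START TRANSACTION;

        # Add to Fact:
        INSERT INTO Fact ... FROM old JOIN Foos ...

        # Summarize:
        INSERT INTO Summary ... FROM old ... GROUP BY ...

    COMMIT;
    end-WHILE

    # Cleanup:
    DROP TABLE old;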

Meanwhile, ingestion can continue writing to Staging. The ingestion INSERTs will conflict with the RENAME, but the conflict is resolved gracefully, silently, and quickly.

    How fast should you flip-flop? Probably the best scheme is to

    • Have a job that flip-flops in a tight loop (no delay, or a small delay, between iterations), and

    • Have a CRON that serves only as a "keep-alive" to restart the job if it dies.

    If Staging is 'big', an iteration will take longer, but run more efficiently. Hence, it is self-regulating.

In a Galera (or InnoDB Cluster?) environment, each node could be receiving input. If you can afford to lose a few rows, have Staging be a non-replicated MEMORY table. Otherwise, have one Staging table per node, using InnoDB; it will be more secure, but slower and not without problems. In particular, if a node dies completely, you somehow need to process its Staging table.

    Multiple summary tables

    • Look at the reports you will need.

    • Design a summary table for each.

    • Then look at the summary tables -- you are likely to find some similarities.

    • Merge similar ones.

To look at what a report needs, look at the WHERE clause that would provide the data. Some examples, assuming data about service records for automobiles (the GROUP BY gives a clue of what each report might be about):

    1. WHERE make = ? AND model_year = ? GROUP BY service_date, service_type

    2. WHERE make = ? AND model = ? GROUP BY service_date, service_type

    3. WHERE service_type = ? GROUP BY make, model, service_date

    4. WHERE service_date between ? and ? GROUP BY make, model, model_year

You need to allow for 'ad hoc' queries? Well, look at all the ad hoc queries -- they all have a date range, plus they nail down one or two other things. (I rarely see something as ugly as '%CL%' for nailing down another dimension.) So, start by thinking of date plus one or two other dimensions as the 'key' into a new summary table. Then comes the question of what data might be desired -- counts, sums, etc. Eventually you have a small set of summary tables. Then build a front end that allows users to pick only from those possibilities. It should encourage use of the existing summary tables, not be truly 'open ended'.

    Later, another 'requirement' may surface. So, build another summary table. Of course, it may take a day to initially populate it.

    Games on summary tables

Does one ever need to summarize a summary table? Yes, but only in extreme situations. Usually a 'weekly' report can be derived from a 'daily' summary table; building a separate weekly summary table is rarely worth the effort.

    Would one ever PARTITION a Summary Table? Yes, in extreme situations, such as the table being large, and

    • Need to purge old data (unlikely), or

    • 'Recent' data is usually requested, and the index(es) fail to prevent table scans (rare). ("Partition pruning" to the rescue.)

    See also

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    Examples

    This page is licensed: CC BY-SA / Gnu FDL

    Data Warehousing Techniques

    Preface

    This document discusses techniques for improving performance for data-warehouse-like tables in MariaDB and MySQL.

    • How to load large tables.

    • Normalization.

    • Developing 'summary tables' to make 'reports' efficient.

    • Purging old data.

Details on summary tables are covered in the companion document: Summary Tables.

    Terminology

    This list mirrors "Data Warehouse" terminology.

    • Fact table -- The one huge table with the 'raw' data.

    • Summary table -- a redundant table of summarized data that could be derived from the Fact table; used for efficiency

    • Dimension -- columns that identify aspects of the dataset (region, country, user, SKU, zipcode, ...)

    • Normalization table (dimension table) -- mapping between strings and ids; used for space and speed.

    Fact table

    Techniques that should be applied to the huge Fact table.

    • id INT/BIGINT UNSIGNED NOT NULL AUTO_INCREMENT

    • PRIMARY KEY (id)

    • Probably no other INDEXes

    • Accessed only via id

    There are exceptions where the Fact table must be accessed to retrieve multiple rows. However, you should minimize the number of INDEXes on the table because they are likely to be costly on INSERT.

    Why keep the Fact table?

    Once you have built the Summary table(s), there is not much need for the Fact table. One option that you should seriously consider is to not have a Fact table. Or, at least, you could purge old data from it sooner than you purge the Summary tables. Maybe even keep the Summary tables forever.

    Case 1: You need to find the raw data involved in some event. But how will you find those row(s)? This is where a secondary index may be required.

    If a secondary index is bigger than can be cached in RAM, and if the column(s) being indexed is random, then each row inserted may cause a disk hit to update the index. This limits insert speed to something like 100 rows per second (on ordinary disks). Multiple random indexes slow down insertion further. RAID striping and/or SSDs speed up insertion. Write caching helps, but only for bursts.

Case 2: You need some event, but you did not plan ahead with the optimal INDEX. Well, if the data is PARTITIONed on date, then even if you only have a rough idea of when the event occurred, "partition pruning" will keep the query from being too terribly slow.

    Case 3: Over time, the application is likely to need new 'reports', which may lead to a new Summary table. At this point, it would be handy to scan through the old data to fill up the new table.

    Case 4: You find a flaw in the summarization, and need to rebuild an existing Summary table.

Cases 3 and 4 both need the "raw" data. But they don't necessarily need the data sitting in a database table. It could be in the pre-database format (such as log files). So, consider not building the Fact table, but simply keeping the raw data, compressed, on some file system.

    Batching the load of the Fact table

    When talking about billions of rows in the Fact table, it is essentially mandatory that you "batch" the inserts. There are two main ways:

    • INSERT INTO Fact (.,.,.) VALUES (.,.,.), (.,.,.), ...; -- "Batch insert"

    • LOAD DATA ...;

    A third way is to INSERT or LOAD into a Staging table, then

    • INSERT INTO Fact SELECT * FROM Staging; This INSERT..SELECT allows you to do other things, such as normalization. More later.

    Batched INSERT Statement

    Chunk size should usually be 100-1000 rows.

    • 100-1000 rows per INSERT will run 10 times as fast as single-row INSERTs.

    • Beyond 100, you may be interfering with replication and SELECTs.

    • Beyond 1000, you are into diminishing returns -- virtually no further performance gains.

    • Don't go past, say, 1MB for the constructed INSERT statement. This deals with packet sizes, etc. (1MB is unlikely to be hit for a Fact table.) Decide whether your application should lean toward the 100 or the 1000.

    If your data is coming in continually, and you are adding a batching layer, let's do some math. Compute your ingestion rate -- R rows per second.

    • If R < 10 (= 1M/day = 300M/year) -- single-row INSERTs would probably work fine (that is, batching is optional)

    • If R < 100 (3B records per year) -- secondary indexes on Fact table may be ok

    • If R < 1000 (100M records/day) -- avoid secondary indexes on Fact table.

    • If R > 1000 -- Batching may not work. Decide how long (S seconds) you can stall loading the data in order to collect a batch of rows.

    If batching seems viable, then design the batching layer to gather for S seconds or 100-1000 rows, whichever comes first.

    (Note: Similar math applies to rapid UPDATEs of a table.)

    Normalization (Dimension) table

    Normalization is important in Data Warehouse applications because it significantly cuts down on the disk footprint and improves performance. There are other reasons for normalizing, but space is the important one for DW.

    Here is a typical pattern for a Dimension table:
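CREATE TABLE Emails (
        email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- don't make bigger than needed
        email VARCHAR(...) NOT NULL,
        PRIMARY KEY (email),  -- for looking up one way
        INDEX(email_id)  -- for looking up the other way (UNIQUE is not needed)
    ) ENGINE = InnoDB;  -- to get clustering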

    Notes:

    • MEDIUMINT is 3 bytes with UNSIGNED range of 0..16M; pick SMALLINT, INT, etc, based on a conservative estimate of how many 'foo's you will eventually have.

    • datatype sizes

    • There may be more than one VARCHAR in the table. Example: For cities, you might have City and Country.

    • InnoDB is better than MyISAM here because of the way the two keys are structured.

    Batched normalization

    I bring this up as a separate topic because of some of the subtle issues that can happen.

    You may be tempted to do
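INSERT IGNORE INTO Foos
        SELECT DISTINCT foo FROM Staging;  -- not wise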

    It has the problem of "burning" AUTO_INCREMENT ids. This is because MariaDB pre-allocates ids before getting to "IGNORE". That could rapidly increase the AUTO_INCREMENT values beyond what you expected.

    Better is this...
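INSERT IGNORE INTO Foos
        SELECT DISTINCT foo
            FROM Staging
            LEFT JOIN Foos ON Foos.foo = Staging.foo
            WHERE Foos.foo_id IS NULL;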

    Notes:

    • The LEFT JOIN .. IS NULL finds the foos that are not yet in Foos.

    • This INSERT..SELECT must not be done inside the transaction with the rest of the processing. Otherwise, you add to deadlock risks, leading to burned ids.

    • IGNORE is used in case you are doing the INSERT from multiple processes simultaneously.

    Once that INSERT is done, this will find all the foo_ids it needs:
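INSERT INTO Fact (..., foo_id, ...)
        SELECT ..., Foos.foo_id, ...
            FROM Staging
            JOIN Foos ON Foos.foo = Staging.foo;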

    An advantage of "Batched Normalization" is that you can summarize directly from the Staging table. Two approaches:

    Case 1: PRIMARY KEY (dy, foo) and summarization is in lock step with, say, changes in dy.

    • This approach can have troubles if new data arrives after you have summarized the day's data.

    Case 2: (dy, foo) is a non-UNIQUE INDEX.

    • Same code as Case 1.

    • By having the index be non-UNIQUE, delayed data simply shows up as extra rows.

    • You need to take care to avoid summarizing the data twice. (The id on the Fact table may be a good tool for that.)

    Case 3: PRIMARY KEY (dy, foo) and summarization can happen anytime.
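For Case 3, the summarization can be done at any time with IODKU (INSERT .. ON DUPLICATE KEY UPDATE); a sketch, using the same illustrative columns as above:

INSERT INTO Summary (dy, foo, ct, blah_total)
        SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
            FROM Staging
            GROUP BY 1, 2
        ON DUPLICATE KEY UPDATE
            ct = ct + VALUE(ct),
            blah_total = blah_total + VALUE(blah_total);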

    Too many choices?

    This document lists a number of ways to do things. Your situation may lead to one approach being more/less acceptable. But, if you are thinking "Just tell me what to do!", then here:

    • Batch load the raw data into a temporary table (Staging).

    • Normalize from Staging -- use code in Case 3.

    • INSERT .. SELECT to move the data from Staging into the Fact table

    Those techniques should perform well and scale well in most cases. As you develop your situation, you may discover why I described alternative solutions.

    Purging old data

Typically the Fact table uses PARTITION BY RANGE (10-60 ranges of days/weeks/etc) and needs purging (DROP PARTITION) periodically. This discusses a safe/clean way to design the partitioning and do the DROPs: Purging PARTITIONs

    Master / slave

    For "read scaling", backup, and failover, use master-slave replication or something fancier. Do ingestion only on a single active master; it replicate to the slave(s). Generate reports on the slave(s).

    Sharding

    "Sharding" is the splitting of data across multiple servers. (In contrast, and have the same data on all servers, requiring all data to be written to all servers.)

    With the non-sharding techniques described here, terabyte(s) of data can be handled by a single machine. Tens of terabytes probably requires sharding.

    Sharding is beyond the scope of this document.

    How fast? How big?

    With the techniques described here, you may be able to achieve the following performance numbers. I say "may" because every data warehouse situation is different, and you may require performance-hurting deviations from what I describe here. I give multiple options for some aspects; these may cover some of your deviations.

One big performance killer is UUID/GUID keys. Since they are very 'random', updates of them (at scale) are limited to 1 row = 1 disk hit. Plain disks can handle only 100 hits/second. RAID and/or SSD can increase that to something like 1000 hits/sec. Huge amounts of RAM (for caching the random index) are a costly solution. It is possible to turn type-1 UUIDs into roughly-chronological keys, thereby mitigating the performance problems if the UUIDs are written/read with some chronological clustering. UUID discussion

    Hardware, etc:

    • Single SATA drive: 100 IOPs (Input/Output operations per second)

    • RAID with N physical drives -- 100*N IOPs (roughly)

    • SSD -- 5 times as fast as rotating media (in this context)

    • Batch INSERT -- 100-1000 rows is 10 times as fast as INSERTing 1 row at a time (see above)

    "Count the disk hits" -- back-of-envelope performance analysis

    • Random accesses to a table/index -- count each as a disk hit.

    • At-the-end accesses (INSERT chronologically or with AUTO_INCREMENT; range SELECT) -- count as zero hits.

    • In between (hot/popular ids, etc) -- count as something in between

    • For INSERTs, do the analysis on each index; add them up.

    More on Count the Disk Hits

    How fast?

    Look at your data; compute raw rows per second (or hour or day or year). There are about 30M seconds in a year; 86,400 seconds per day. Inserting 30 rows per second becomes a billion rows per year.

    10 rows per second is about all you can expect from an ordinary machine (after allowing for various overheads). If you have less than that, you don't have many worries, but still you should probably create Summary tables. If more than 10/sec, then batching, etc, becomes vital. Even on spiffy hardware, 100/sec is about all you can expect without utilizing the techniques here.

    Not so fast?

Let's say your insert rate is only one-tenth of your disk IOPs (eg, 10 rows/sec vs 100 IOPs). Also, let's say your data is not "bursty"; that is, the data comes in somewhat smoothly throughout the day.

    Note that 10 rows/sec (300M/year) implies maybe 30GB for data + indexes + normalization tables + summary tables for 1 year. I would call this "not so big".

Still, normalization and summarization are important. Normalization keeps the data from being, say, twice as big. Summarization speeds up the reports by orders of magnitude.

    Let's design and analyse a "simple ingestion scheme" for 10 rows/second, without 'batching'.
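A sketch of such a scheme (pseudocode mixed with SQL), processing one incoming row at a time:

# Normalize:
    $foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
    IF no $foo_id, THEN
        INSERT IGNORE INTO Foos ...

    # Inserts:
    BEGIN;
        INSERT INTO Fact ...;
        INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
    COMMIT;
    # (plus code to deal with errors on INSERTs or COMMIT)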

    Depending on the number and randomness of your indexes, etc, 10 Fact rows may (or may not) take less than 100 IOPs.

Also, note that as the data grows over time, random indexes will become less and less likely to be cached. That is, even if it runs fine with 1 year's worth of data, it may be in trouble with 2 years' worth.

    For those reasons, I started this discussion with a wide margin (10 rows versus 100 IOPs).

    References

    • Summary Tables

    See also

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Groupwise Max in MariaDB

    The problem

    You want to find the largest row in each group of rows. An example is looking for the largest city in each state. While it is easy to find the MAX(population) ... GROUP BY state, it is hard to find the name of the city associated with that population. Alas, MySQL and MariaDB do not have any syntax to provide the solution directly.

    This article is under construction, mostly for cleanup. The content is reasonably accurate during construction.

    The article presents two "good" solutions. They differ in ways that make neither of them 'perfect'; you should try both and weigh the pros and cons.

    Also, a few "bad" solutions will be presented, together with why they were rejected.

The MySQL manual gives 3 solutions; only the "Uncorrelated" one is "good"; the other two are "bad".

    Sample data

    To show how the various coding attempts work, I have devised this simple task: Find the largest city in each Canadian province. Here's a sample of the source data (5493 rows):

    Here's the desired output (13 rows):

    Duplicate max

One thing to consider is whether you want -- or do not want -- to see multiple rows for tied winners. For the dataset being used here, that would imply that the two largest cities in a province had identical populations. For this case, a duplicate would be unlikely. But there are many groupwise-max use cases where duplicates are likely.

    The two best algorithms differ in whether they show duplicates.

    Using an uncorrelated subquery

    Characteristics:

    • Superior performance, or medium performance, depending on the index and version

    • It will show duplicates

    • Needs an extra index

    • Probably requires 5.6

    An 'uncorrelated subquery':
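SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    JOIN
      ( SELECT  province, MAX(population) AS population
            FROM  Canada
            GROUP BY  province
      ) AS c2 USING (province, population)
    ORDER BY c1.province;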

    But this also 'requires' an extra index: INDEX(province, population). In addition, MySQL has not always been able to use that index effectively, hence the "requires 5.6". (I am not sure of the actual version.)

    Without that extra index, you would need 5.6, which has the ability to create indexes for subqueries. This is indicated by <auto_key0> in the EXPLAIN. Even so, the performance is worse with the auto-generated index than with the manually generated one.

    With neither the extra index, nor 5.6, this 'solution' would belong in 'The Duds' because it would run in O(N*N) time.

    Using @variables

    Characteristics:

    • Good performance

    • Does not show duplicates (picks one to show)

    • Consistent O(N) run time (N = number of input rows)

    • Only one scan of the data

    For your application, change the lines with comments.
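SELECT
        province, city, population   -- The desired columns
    FROM
      ( SELECT  @prev := '' ) init
    JOIN
      ( SELECT  province != @prev AS first,  -- `province` is the 'GROUP BY'
                @prev := province,           -- The 'GROUP BY'
                province, city, population   -- Also the desired columns
            FROM  Canada           -- The table
            ORDER BY
                province,          -- The 'GROUP BY'
                population DESC    -- ASC for MIN(population), DESC for MAX
      ) x
    WHERE  first
    ORDER BY  province;     -- Whatever you like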

    The duds

'Correlated subquery' (from the MySQL doc):
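SELECT  province, city, population
    FROM  Canada AS c1
    WHERE  population =
      ( SELECT  MAX(c2.population)
            FROM  Canada AS c2
            WHERE  c2.province = c1.province
      )
    ORDER BY  province;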

    O(N*N) (that is, terrible) performance

LEFT JOIN (from the MySQL doc):
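SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    LEFT JOIN  Canada AS c2 ON c2.province = c1.province
      AND  c2.population > c1.population
    WHERE  c2.province IS NULL
    ORDER BY province;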

    Medium performance (2N-3N, depending on join_buffer_size).

With O(N*N) time, it will take one second to do a groupwise-max on a few thousand rows; a million rows could take hours.

    Top-N in each group

    This is a variant on "groupwise-max" wherein you desire the largest (or smallest) N items in each group. Do these substitutions for your use case:

    • province --> your 'GROUP BY'

    • Canada --> your table

    • 3 --> how many of each group to show

    • population --> your numeric field for determining "Top-N"

    Output:

    The performance of this is O(N), actually about 3N, where N is the number of source rows.

    EXPLAIN EXTENDED gives

Explanation, shown in the same order as the EXPLAIN, but numbered chronologically:

    3. Get the subquery id=2 (init)

    4. Scan the output from subquery id=3 (x)

    2. Subquery id=3 -- the table scan of Canada

    1. Subquery id=2 -- init, for simply initializing the two @variables

    Yes, it took two sorts, though probably in RAM.

    Main Handler values:

    Top-n in each group, take II

    This variant is faster than the previous, but depends on city being unique across the dataset. (from openark.org)

    Output. Note how there can be more than 3 cities per province:

    Main Handler values:

    Top-n using MyISAM

(This does not need your table to be MyISAM, but it does need a MyISAM tmp table for its 2-column PRIMARY KEY feature.) See the previous section for what changes to make for your use case.

    The main handler values (total of all operations):

    Both "Top-n" formulations probably take about the same amount of time.

    Windowing functions

Hot off the press from Percona Live... MariaDB 10.2 has "windowing functions", which make "groupwise max" much more straightforward.

    The code: TBD
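A minimal sketch, assuming MariaDB 10.2 or later and the same Canada table used above:

SELECT  province, city, population
    FROM
      ( SELECT  province, city, population,
                ROW_NUMBER() OVER (PARTITION BY province
                                   ORDER BY population DESC) AS rn
            FROM  Canada
      ) AS x
    WHERE  rn = 1            -- use rn <= 3 for a Top-N variant
    ORDER BY  province;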

    Postlog

Developed and first posted: Feb, 2015; MyISAM approach added: July, 2015; Openark's method added: Apr, 2016; Windowing: Apr, 2016

    I did not include the technique(s) using GROUP_CONCAT. They are useful in some situations with small datasets. They can be found in the references below.

    See also

    • This has some of these algorithms, plus some others:

    • Other references:

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Query Limits and Timeouts

    This article describes the different methods MariaDB provides to limit/timeout a query:

    LIMIT

    The LIMIT clause restricts the number of returned rows.

The LIMIT ROWS EXAMINED clause stops the query after 'rows_limit' number of rows have been examined.
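SELECT ... LIMIT ROWS EXAMINED rows_limit;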

    sql_safe_updates

If the sql_safe_updates variable is set, one can't execute an UPDATE or DELETE statement unless one specifies a key constraint in the WHERE clause or provides a LIMIT clause (or both).
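For example:

SET @@SQL_SAFE_UPDATES=1;
UPDATE tbl_name SET not_key_column=val;
-> ERROR 1175 (HY000): You are using safe update mode
   and you tried to update a table without a WHERE that uses a KEY column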

    sql_select_limit

The sql_select_limit variable acts as an automatic LIMIT row_count on any SELECT query. For example:
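SET @@SQL_SELECT_LIMIT=1000;
SELECT * FROM big_table;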

    The above is the same as:
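SELECT * FROM big_table LIMIT 1000;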

    max_join_size

If the max_join_size variable (also called sql_max_join_size) is set, then it will limit any SELECT statements that probably need to examine more than MAX_JOIN_SIZE rows. For example:
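SET @@MAX_JOIN_SIZE=1000;
SELECT COUNT(null_column) FROM big_table;
-> ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE rows;
   check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay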

    max_statement_time

    If the variable is set, any query (excluding stored procedures) taking longer than the value of max_statement_time (specified in seconds) to execute will be aborted. This can be set globally, by session, as well as per user and per query. See .
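For example, a minimal session-level setting (the value is illustrative):

SET SESSION max_statement_time=2;   -- abort statements in this session that run longer than 2 seconds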

    See Also

    • The lock_wait_timeout variable

    This page is licensed: CC BY-SA / Gnu FDL

    Thread Pool in MariaDB

    Problems That Thread Pools Solve

    The task of scalable server software (and a DBMS like MariaDB is an example of such software) is to maintain top performance with an increasing number of clients. MySQL traditionally assigned a thread for every client connection, and as the number of concurrent users grows this model shows performance drops. Many active threads are a performance killer, because increasing the number of threads leads to extensive context switching, bad locality for CPU caches, and increased contention for hot locks. An ideal solution that would help to reduce context switching is to maintain a lower number of threads than the number of clients. But this number should not be too low either, since we also want to utilize CPUs to their fullest, so ideally, there should be a single active thread for each CPU on the machine.

    thread_group_id = connection_id % thread_pool_size
    THROTTLING_FACTOR = thread_pool_stall_limit / MAX (500,thread_pool_stall_limit)
    SET GLOBAL thread_pool_stall_limit=300;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    thread_pool_stall_limit=300
    SET GLOBAL thread_pool_oversubscribe=10;
    [mariadb]
    ..
    thread_handling=pool-of-threads
    thread_pool_size=32
    thread_pool_stall_limit=300
    thread_pool_oversubscribe=10
    CREATE TABLE users (
      user_name_mb4 VARCHAR(100) COLLATE utf8mb4_general_ci,
      ...
    );
    CREATE TABLE orders (
      user_name_mb3 VARCHAR(100) COLLATE utf8mb3_general_ci,
      ...,
      INDEX idx1(user_name_mb3)
    );
    SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    CONVERT(orders.user_name_mb3 USING utf8mb4) = users.user_name_mb4
    EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    | id   | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra                                           |
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    |    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL | 1000  |                                                 |
    |    1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 10330 | Using where; Using join buffer (flat, BNL join) |
    +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
    SET optimizer_switch='cset_narrowing=ON';
    
    EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    | id   | select_type | table  | type | possible_keys | key  | key_len | ref                 | rows | Extra                 |
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    |    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL                | 1000 | Using where           |
    |    1 | SIMPLE      | orders | ref  | idx1          | idx1 | 303     | users.user_name_mb4 | 1    | Using index condition |
    +------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
    SET optimizer_switch='cset_narrowing=ON';
    SELECT ... LIMIT row_count
    OR
    SELECT ... LIMIT OFFSET, row_count
    OR
    SELECT ... LIMIT row_count OFFSET OFFSET
    LIMIT ROWS EXAMINED
    sql_safe_updates
    UPDATE
    DELETE
    sql_select_limit
    SELECT
    max_join_size
    max_statement_time
    Aborting statements that take longer than a certain time to execute
    WAIT and NOWAIT
    Aborting statements that take longer than a certain time to execute
    lock_wait_timeout
    SELECT ... LIMIT ROWS EXAMINED rows_limit;
    SET @@SQL_SAFE_UPDATES=1
    UPDATE tbl_name SET not_key_column=val;
    -> ERROR 1175 (HY000): You are using safe update mode 
      and you tried to update a table without a WHERE that uses a KEY column
    SET @@SQL_SELECT_LIMIT=1000
    SELECT * FROM big_table;
    SELECT * FROM big_table LIMIT 1000;
    SET @@MAX_JOIN_SIZE=1000;
    ->ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE ROWS; 
    SELECT COUNT(null_column) FROM big_table;
      CHECK your WHERE AND USE SET SQL_BIG_SELECTS=1 OR SET MAX_JOIN_SIZE=# IF the SELECT IS okay

    Normalization can be done in bulk, hence efficiently

  • Copying to the Fact table will be fast

  • Summarization can be done in bulk, hence efficiently

  • "Bursty" ingestion is smoothed by this process

  • Flip-flop a pair of Staging tables

  • Use binlog_ignore_db to avoid replicating staging -- necessitating putting it in a separate database.

  • Do the summarization from Staging

  • Load Fact via INSERT INTO Fact ... SELECT FROM Staging ...

  • Deleting in chunks
    data-warehousing-high-speed-ingestion|High Speed Ingestion
    SUM()
    binlog_format = ROW
    Rick James' site
    summarytables
    Galera

    Normalization -- The process of building the mapping ('New York City' <-> 123)

    All VARCHARs are "normalized"; ids are stored instead
  • ENGINE = InnoDB

  • All "reports" use summary tables, not the Fact table

  • Summary tables may be populated from ranges of id (other techniques described below)

  • If S < 0.1s -- May not be able to keep up

    The secondary key is effectively (email_id, email), hence 'covering' for certain queries.

  • It is OK to not specify an AUTO_INCREMENT to be UNIQUE.

  • Summarize from Staging to Summary table(s) via IODKU (Insert ... On Duplicate Key Update).
  • Drop the Staging

  • Purge "old" data -- Do not use DELETE or TRUNCATE, design so you can use DROP PARTITION (see above)

  • Think of each INDEX (except the PRIMARY KEY on InnoDB) as a separate table

  • Consider access patterns of each table/index: random vs at-the-end vs something in between

  • For SELECTs, do the analysis on the one index used, plus the table. (Use of 2 indexes is rare.) Insert cost, based on datatype of first column in an index:

  • AUTO_INCREMENT -- essentially 0 IOPs

  • DATETIME, TIMESTAMP -- essentially 0 for 'current' times

  • UUID/GUID -- 1 per insert (terrible)

  • Others -- depends on their patterns SELECT cost gets a little tricky:

  • Range on PRIMARY KEY -- think of it as getting 100 rows per disk hit.

  • IN on PRIMARY KEY -- 1 disk hit per item in IN

  • "=" -- 1 hit (for 1 row)

  • Secondary key -- First compute the hits for the index, then...

  • Think of each row as needing 1 disk hit.

  • However, if the rows are likely to be 'near' each other (based on the PRIMARY KEY), then it could be < 1 disk hit/row.

  • Summary Tables
    replication
    sec. 3.3.2: Dimensional Model and "Star schema"
    Rick James' site
    datawarehouse
    Galera
    If all goes well, it will run in O(M) where M is the number of output rows.
    city --> more field(s) you want to show
  • Change the SELECT and ORDER BY if you desire

  • DESC to get the 'largest'; ASC for the 'smallest'

  • Adding a large LIMIT to a subquery may make things work.

  • StackOverflow thread

  • row_number(), rank(), dense_rank()

  • Perentile blog

  • Peter Brawley's blog
    Jan Kneschke's blog from 2007
    StackOverflow discussion of 'Uncorrelated'
    Inner ORDER BY thrown away
    Rick James' site
    groupwise_max
    MariaDB Thread Pool Features

    MariaDB has a dynamic and adaptive thread pool, aimed at optimizing resource utilization and preventing deadlocks.

    For example, a thread may depend on another thread's completion, and they may block each other via locks and/or I/O. It is hard, and sometimes impossible, to predict how many threads are ideal or even sufficient to prevent deadlocks in every situation. MariaDB implements a dynamic and adaptive pool that takes care of creating new threads in times of high demand, and retiring threads if they have nothing to do. This is a complete reimplementation of the legacy pool-of-threads scheduler, with the following goals:

    • Make the pool dynamic, so that it will grow and shrink whenever required.

    • Minimize the amount of overhead that is required to maintain the thread pool itself.

    • Make the best use of underlying OS capabilities. For example, if a native thread pool implementation is available, it should be used. If not, the best I/O multiplexing method should be used.

    • Limit the resources used by threads.

    There are currently two different low-level implementations – depending on OS. One implementation is designed specifically for Windows which utilizes a native CreateThreadpool API. The second implementation is primarily intended to be used in Unix-like systems. Because the implementations are different, some system variables differ between Windows and Unix.

    When to Use the Thread Pool

    Thread pools are most efficient in situations where queries are relatively short and the load is CPU-bound, such as in OLTP workloads. If the workload is not CPU-bound, then you might still benefit from limiting the number of threads to save memory for the database memory buffers.

    When the Thread Pool is Less Efficient

    There are special, rare cases where the thread pool is likely to be less efficient.

    • If you have a very bursty workload, then the thread pool may not work well for you. These tend to be workloads in which there are long periods of inactivity followed by short periods of very high activity by many users. These also tend to be workloads in which delays cannot be tolerated, so the throttling of thread creation that the thread pool uses is not ideal. Even in this situation, performance can be improved by tweaking how often threads are retired. For example, with thread_pool_idle_timeout on Unix, or with thread_pool_min_threads on Windows.

    • If you have many concurrent, long, non-yielding queries, then the thread pool may not work well for you. In this context, a "non-yielding" query is one that never waits or which does not indicate waits to the thread pool. These kinds of workloads are mostly used in data warehouse scenarios. Long-running, non-yielding queries will delay execution of other queries. However, the thread pool has stall detection to prevent them from totally monopolizing the thread pool. See Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls for more information. Even when the whole thread pool is blocked by non-yielding queries, you can still connect to the server through the extra-port TCP/IP port.

    • If you rely on the fact that simple queries always finish quickly, no matter how loaded your database server is, then the thread pool may not work well for you. When the thread pool is enabled on a busy server, even simple queries might be queued to be executed later. This means that a statement that takes very little time to execute on its own, even a simple SELECT 1, might take a bit longer when the thread pool is enabled than with one-thread-per-connection if it gets queued.

    Configuring the Thread Pool

    The thread_handling system variable is the primary system variable that is used to configure the thread pool.

    There are several other system variables as well, which are described in the sections below. Many of the system variables documented below are dynamic, meaning that they can be changed with SET GLOBAL on a running server.

    Generally, there is no need to tweak many of these system variables. The goal of the thread pool was to provide good performance out-of-the box. However, the system variable values can be changed, and we intended to expose as many knobs from the underlying implementation as we could. Feel free to tweak them as you see fit.

If you find any issues with any of the default behavior, then we encourage you to report a bug.

    See Thread Pool System and Status Variables for the full list of the thread pool's system variables.

    Configuring the Thread Pool on Unix

    On Unix, if you would like to use the thread pool, then you can use the thread pool by setting the thread_handling system variable to pool-of-threads in a server option group in an option file prior to starting up the server. For example:
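[mariadb]
..
thread_handling=pool-of-threads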

    The following system variables can also be configured on Unix:

    • thread_pool_size – The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    • thread_pool_max_threads – The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases. In rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks. The default value is 65536.

    • thread_pool_stall_limit – The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread. See Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls for more information.

    • thread_pool_oversubscribe – Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread can have unrestricted access to the CPU while it's running, but it also means that there is additional overhead from putting threads to sleep or waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but it also means that there is less overhead from putting threads to sleep or waking them up. This is primarily for internal use, and it is not meant to be changed for most users. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    • thread_pool_idle_timeout – The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?

    Configuring the Thread Pool on Windows

    The Windows implementation of the thread pool uses a native thread pool created with the CreateThreadpool API.

    On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.

    However, if you would like to use the old one thread per-connection behavior on Windows, then you can use that by setting the thread_handling system variable to one-thread-per-connection in a server option group in an option file prior to starting up the server. For example:
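[mariadb]
..
thread_handling=one-thread-per-connection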

On older versions of Windows, such as XP and 2003, pool-of-threads is not implemented, and the server will silently switch to using the legacy one-thread-per-connection method.

    The native CreateThreadpool API allows applications to set the minimum and maximum number of threads in the pool. The following system variables can be used to configure those values on Windows:

    • thread_pool_min_threads – The minimum number of threads in the pool. The default is 1. This is applicable in a special case of very "bursty" workloads. Imagine having longer periods of inactivity after periods of high activity. While the thread pool is idle, Windows may decide to retire pool threads (based on experimentation, this seems to happen after a thread has been idle for 1 minute). The next time high load comes, it could take some milliseconds or seconds until the thread pool size stabilizes again at the optimal value. To avoid thread retirement, set this parameter to a higher value.

    • thread_pool_max_threads – The maximum number of threads in the pool. Threads are not created when this value is reached. The default is 1000. This parameter can be used to prevent the creation of new threads when the pool can have short periods where many or all clients are blocked (for example, with FLUSH TABLES WITH READ LOCK, high contention on row locks, or similar). New threads are created if a blocking situation occurs (such as after a throttling interval), but sometimes you want to cap the number of threads, if you're familiar with the application and need to, for example, save memory. If your application constantly pegs at 500 threads, it might be a strong indicator of high contention in the application, and the thread pool does not help much.

    Configuring Priority Scheduling

    It is possible to configure connection prioritization. The priority behavior is configured by the thread_pool_priority system variable.

    By default, if thread_pool_priority is set to auto, then queries would be given a higher priority, in case the current connection is inside a transaction. This allows the running transaction to finish faster, and has the effect of lowering the number of transactions running in parallel. The default setting will generally improve throughput for transactional workloads. But it is also possible to explicitly set the priority for the current connection to either 'high' or 'low'.
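For example, a connection that should always be treated as high priority could set the following (assuming a MariaDB version that supports this variable):

SET SESSION thread_pool_priority='high';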

    There is also a mechanism in place to ensure that higher priority connections are not monopolizing the worker threads in the pool (which would cause indefinite delays for low priority connections). On Unix, low priority connections are put into the high priority queue after the timeout specified by the thread_pool_prio_kickup_timer system variable.

    Configuring the Extra Port

    MariaDB allows you to configure an extra port for administrative connections. This is primarily intended to be used in situations where all threads in the thread pool are blocked, and you still need a way to access the server. However, it can also be used to ensure that monitoring systems (including MaxScale's monitors) always have access to the system, even when all connections on the main port are used. This extra port uses the old one-thread-per-connection thread handling.

    You can enable this and configure a specific port by setting the extra_port system variable.

    You can configure a specific number of connections for this port by setting the extra_max_connections system variable.

    These system variables can be set in a server option group in an option file prior to starting up the server. For example:
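A minimal sketch; the port number and connection limit are illustrative:

[mariadb]
..
extra_port=3307
extra_max_connections=10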

    Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.

    Monitoring Thread Pool Activity

    Currently there are two status variables exposed to monitor pool activity.

Variable
    Description

    Threadpool_threads
    Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.

    Threadpool_idle_threads
    Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc. This status variable is only meaningful on Unix.

    Thread Groups in the Unix Implementation of the Thread Pool

    On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. See Thread Groups in the Unix Implementation of the Thread Pool for more information.

    Fixing a Blocked Thread Pool

    When using global locks, even with a high value on the thread_pool_max_threads system variable, it is still possible to block the entire pool.

Imagine the case where a client performs FLUSH TABLES WITH READ LOCK and then pauses. If the number of other clients connecting to the server to start write operations then exceeds the maximum number of threads allowed in the pool, it can block the server. This makes it impossible to issue the UNLOCK TABLES statement. It can also block MaxScale from monitoring the server.

    To mitigate the issue, MariaDB allows you to configure an extra port for administrative connections. See Configuring the Extra Port for information on how to configure this.

    Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.

    This ensures that your administrators can access the server in cases where the number of threads is already equal to the configured value of the thread_pool_max_threads system variable, and all threads are blocked. It also ensures that MaxScale can still access the server in such situations for monitoring information.

    Once you are connected to the extra port, you can solve the issue by increasing the value on the thread_pool_max_threads system variable, or by killing the offending connection, (that is, the connection that holds the global lock, which would be in the sleep state).

    Information Schema

    The following Information Schema tables relate to the thread pool:

    • Information Schema THREAD_POOL_GROUPS Table

    • Information Schema THREAD_POOL_QUEUES Table

    • Information Schema THREAD_POOL_STATS Table

    • Information Schema THREAD_POOL_WAITS Table

    MariaDB Thread Pool vs Oracle MySQL Enterprise Thread Pool

    Commercial editions of MySQL since 5.5 include an Oracle MySQL Enterprise thread pool implemented as a plugin, which delivers similar functionality. A detailed discussion about the design of the feature is at Mikael Ronstrom's blog. Here is the summary of similarities and differences, based on the above materials.

    Similarities

    • On Unix, both MariaDB and Oracle MySQL Enterprise Thread Pool will partition client connections into groups. The thread_pool_size parameter thus has the same meaning for both MySQL and MariaDB.

    • Both implementations use a similar scheme for checking thread stalls, and both have the same parameter name, thread_pool_stall_limit (though in MariaDB it is measured in millisecond units, not 10ms units like in Oracle MySQL).

    Differences

    • The Windows implementation is completely different – MariaDB's uses native Windows thread pooling, while Oracle's implementation relies on WSAPoll() (a function provided by Windows for convenience when porting Unix applications). As a consequence of relying on WSAPoll(), Oracle's implementation does not work with named pipes and shared memory connections.

    • MariaDB uses the most efficient I/O multiplexing facilities for each operating system: Windows (the I/O completion port is used internally by the native thread pool), Linux (epoll), Solaris (event ports), FreeBSD and OSX (kevent). Oracle uses optimized I/O multiplexing only on Linux, with epoll, and uses poll() otherwise.

    • Unlike the Oracle MySQL Enterprise Thread Pool, MariaDB's thread pool is built in, not a plugin.

    MariaDB Thread Pool vs Percona Thread Pool

Percona's implementation is a port of MariaDB's thread pool with some added features. In particular, Percona added priority scheduling to its 5.5-5.7 releases. MariaDB and Percona priority scheduling work in a similar fashion, but there are some differences in details.

    • MariaDB's thread_pool_priority=auto, high, low correspond to Percona's thread_pool_high_prio_mode=transactions, statements, none.

    • Percona has a thread_pool_high_prio_tickets connection variable to allow every nth low priority query to be put into the high priority queue. MariaDB does not have a corresponding setting.

    • MariaDB has a thread_pool_prio_kickup_timer setting, which Percona does not have.

    Running Benchmarks

When running sysbench, or other benchmarks that create many threads on the same machine as the server, it is advisable to run the benchmark driver and the server on different CPUs to get realistic results. Running lots of driver threads and only a few server threads on the same CPUs has the effect that the OS scheduler will schedule the benchmark driver threads to run with much higher probability than the server threads; that is, the driver will pre-empt the server. Use "taskset –c" on Linux, and "set /affinity" on Windows, to separate benchmark driver and server CPUs, as the preferred method to fix this situation.

    A possible alternative on Unix (if taskset or a separate machine running the benchmark is not desired for some reason) would be to increase thread_pool_size to make the server threads more "competitive" against the client threads.

    When running sysbench, a good rule of thumb could be to give 1/4 of all CPUs to the sysbench, and 3/4 of CPUs to mariadbd. It is also good idea to run sysbench and mariadbd on different NUMA nodes, if possible.

    Notes

    The thread_cache_size system variable is not used when the thread pool is used and the Threads_cached status variable will have a value of 0.

    See Also

    • Thread Pool System and Status Variables

    This page is licensed: CC BY-SA / Gnu FDL

    optimizer_switch

    optimizer_switch is a server variable that one can use to enable/disable specific optimizations.

    Syntax

    To set or unset the various optimizations, use the following syntax:
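SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';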

    The cmd takes the following format:

Syntax
    Description

    default
    Reset all optimizations to their default values.

    optimization_name=default
    Set the named optimization to its default value.

    optimization_name=on
    Enable the named optimization.

    optimization_name=off
    Disable the named optimization.

    There is no need to list all flags - only those that are specified in the command will be affected.

    Available Flags

    Below is a list of all optimizer_switch flags available in MariaDB:

    Flag and MariaDB default
    Supported in MariaDB since

    Defaults

    From version
    Default optimizer_switch setting

    See Also

    This page is licensed: CC BY-SA / Gnu FDL

    FLOOR(UNIX_TIMESTAMP(dt) / 3600)
       FROM_UNIXTIME(hour * 3600)
    PRIMARY KEY(city, datetime),
       Aggregations: ct, sum_price
       
       # Core of INSERT..SELECT:
       DATE(datetime) AS DATE, city, COUNT(*) AS ct, SUM(price) AS sum_price
       
       # Reporting average price FOR last month, broken down BY city:
       SELECT city,
              SUM(sum_price) / SUM(ct) AS 'AveragePrice'
          FROM SalesSummary
          WHERE datetime BETWEEN ...
          GROUP BY city;
       
       # Monthly sales, nationwide, FROM same summary TABLE:
SELECT MONTH(datetime) AS 'Month',
           SUM(ct)         AS 'TotalSalesCount',
           SUM(sum_price)  AS 'TotalDollars'
          FROM SalesSummary
          WHERE datetime BETWEEN ...
          GROUP BY MONTH(datetime);
       # This might benefit FROM a secondary INDEX(datetime)
    INSERT INTO Fact ...;
        INSERT INTO Summary (..., ct, foo, ...) VALUES (..., 1, foo, ...)
            ON DUPLICATE KEY UPDATE ct = ct+1, sum_foo = sum_foo + VALUES(foo), ...;
    FROM Fact
       WHERE id BETWEEN min_id AND max_id
    id INT UNSIGNED AUTO_INCREMENT NOT NULL,
       ...
       PRIMARY KEY(foo, dy, id),  -- `id` added to make unique
       INDEX(id)                  -- sufficient to keep AUTO_INCREMENT happy
    SQRT( SUM(sum_foo2)/SUM(ct) - POWER(SUM(sum_foo)/SUM(ct), 2) )
    # Prep FOR flip:
        CREATE TABLE new LIKE Staging;
    
        # Swap (flip) Staging tables:
        RENAME TABLE Staging TO old, new TO Staging;
    
        # Normalize new `foo`s:
        # (autocommit = 1)
    INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...
    
        # Prep FOR possible deadlocks, etc
        WHILE...
        START TRANSACTION;
    
        # ADD TO Fact:
        INSERT INTO Fact ... FROM old JOIN Foos ...
    
        # Summarize:
        INSERT INTO Summary ... FROM old ... GROUP BY ...
    
        COMMIT;
        end-WHILE
    
        # Cleanup:
        DROP TABLE old;
    CREATE TABLE Emails (
            email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- don't make bigger than needed
            email VARCHAR(...) NOT NULL,
            PRIMARY KEY (email),  -- for looking up one way
            INDEX(email_id)  -- for looking up the other way (UNIQUE is not needed)
        ) ENGINE = InnoDB;  -- to get clustering
    INSERT IGNORE INTO Foos
            SELECT DISTINCT foo FROM Staging;  -- not wise
    INSERT IGNORE INTO Foos
            SELECT DISTINCT foo
                FROM Staging
                LEFT JOIN Foos ON Foos.foo = Staging.foo
                WHERE Foos.foo_id IS NULL;
    INSERT INTO Fact (..., foo_id, ...)
            SELECT ..., Foos.foo_id, ...
                FROM Staging
                JOIN Foos ON Foos.foo = Staging.foo;
    INSERT INTO Summary (dy, foo, ct, blah_total)
            SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
                FROM Staging
                GROUP BY 1, 2;
INSERT INTO Summary (dy, foo, ct, blah_total)
            SELECT  DATE(dt) AS dy, foo,
                    COUNT(*) AS ct, SUM(blah) AS blah_total
                FROM Staging
                GROUP BY 1, 2
            ON DUPLICATE KEY UPDATE
                ct = ct + VALUE(ct),
                blah_total = blah_total + VALUE(blah_total);
    # Normalize:
        $foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
        IF NO $foo_id, THEN
            INSERT IGNORE INTO Foos ...
    
        # Inserts:
        BEGIN;
            INSERT INTO Fact ...;
            INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
        COMMIT;
        # (plus code TO deal WITH errors ON INSERTs OR COMMIT)
    +------------------+----------------+------------+
    | province         | city           | population |
    +------------------+----------------+------------+
    | Saskatchewan     | Rosetown       |       2309 |
    | British Columbia | Chilliwack     |      51942 |
    | Nova Scotia      | Yarmouth       |       7500 |
    | Alberta          | Grande Prairie |      41463 |
    | Quebec           | Sorel          |      33591 |
    | Ontario          | Moose Factory  |       2060 |
    | Ontario          | Bracebridge    |       8238 |
    | British Columbia | Nanaimo        |      84906 |
    | Manitoba         | Neepawa        |       3151 |
    | Alberta          | Grimshaw       |       2560 |
    | Saskatchewan     | Carnduff       |        950 |
    ...
    +---------------------------+---------------+------------+
    | province                  | city          | population |
    +---------------------------+---------------+------------+
    | Alberta                   | Calgary       |     968475 |
    | British Columbia          | Vancouver     |    1837970 |
    | Manitoba                  | Winnipeg      |     632069 |
    | New Brunswick             | Saint John    |      87857 |
    | Newfoundland and Labrador | Corner Brook  |      18693 |
    | Northwest Territories     | Yellowknife   |      15866 |
    | Nova Scotia               | Halifax       |     266012 |
    | Nunavut                   | Iqaluit       |       6124 |
    | Ontario                   | Toronto       |    4612187 |
    | Prince Edward Island      | Charlottetown |      42403 |
    | Quebec                    | Montreal      |    3268513 |
    | Saskatchewan              | Saskatoon     |     198957 |
    | Yukon                     | Whitehorse    |      19616 |
    +---------------------------+---------------+------------+
    SELECT  c1.province, c1.city, c1.population
        FROM  Canada AS c1
        JOIN
          ( SELECT  province, MAX(population) AS population
                FROM  Canada
                GROUP BY  province
          ) AS c2 USING (province, population)
        ORDER BY c1.province;
    SELECT
            province, city, population   -- The desired columns
        FROM
          ( SELECT  @prev := '' ) init
        JOIN
          ( SELECT  province != @prev AS first,  -- `province` is the 'GROUP BY'
                    @prev := province,           -- The 'GROUP BY'
                    province, city, population   -- Also the desired columns
                FROM  Canada           -- The table
                ORDER BY
                    province,          -- The 'GROUP BY'
                    population DESC    -- ASC for MIN(population), DESC for MAX
          ) x
        WHERE  first
        ORDER BY  province;     -- Whatever you like
    SELECT  province, city, population
        FROM  Canada AS c1
        WHERE  population =
          ( SELECT  MAX(c2.population)
                FROM  Canada AS c2
                WHERE  c2.province= c1.province
          )
        ORDER BY  province;
    SELECT  c1.province, c1.city, c1.population
        FROM  Canada AS c1
        LEFT JOIN  Canada AS c2 ON c2.province = c1.province
          AND  c2.population > c1.population
        WHERE  c2.province IS NULL
        ORDER BY province;
    SELECT
            province, n, city, population
        FROM
          ( SELECT  @prev := '', @n := 0 ) init
        JOIN
          ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                    @prev := province,
                    province, city, population
                FROM  Canada
                ORDER BY
                    province   ASC,
                    population DESC
          ) x
        WHERE  n <= 3
        ORDER BY  province, n;
    +---------------------------+------+------------------+------------+
    | province                  | n    | city             | population |
    +---------------------------+------+------------------+------------+
    | Alberta                   |    1 | Calgary          |     968475 |
    | Alberta                   |    2 | Edmonton         |     822319 |
    | Alberta                   |    3 | Red Deer         |      73595 |
    | British Columbia          |    1 | Vancouver        |    1837970 |
    | British Columbia          |    2 | Victoria         |     289625 |
    | British Columbia          |    3 | Abbotsford       |     151685 |
    | Manitoba                  |    1 | ...
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    | id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    |  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using filesort |
    |  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using where    |
    |  3 | DERIVED     | Canada     | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using filesort |
    |  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
    +----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
    | Handler_read_rnd           | 39    |
    | Handler_read_rnd_next      | 10971 |
    | Handler_write              | 5485  |  -- #rows in Canada (+1)
    SELECT  province, city, population
            FROM  Canada
            JOIN
              ( SELECT  GROUP_CONCAT(top_in_province) AS top_cities
                    FROM
                      ( SELECT  SUBSTRING_INDEX(
                                       GROUP_CONCAT(city ORDER BY  population DESC),
                                ',', 3) AS top_in_province
                            FROM  Canada
                            GROUP BY  province
                      ) AS x
              ) AS y
            WHERE  FIND_IN_SET(city, top_cities)
            ORDER BY  province, population DESC;
    | Alberta                   | Calgary          |     968475 |
    | Alberta                   | Edmonton         |     822319 |
    | Alberta                   | Red Deer         |      73595 |
    | British Columbia          | Vancouver        |    1837970 |
    | British Columbia          | Victoria         |     289625 |
    | British Columbia          | Abbotsford       |     151685 |
    | British Columbia          | Sydney           |          0 | -- Wrong: matched only because Nova Scotia's second-largest city is also named Sydney
    | Manitoba                  | Winnipeg         |     632069 |
    | Handler_read_next          | 5484  | -- table size
    | Handler_read_rnd_next      | 5500  | -- table size + number of provinces
    | Handler_write              | 14    | -- number of provinces (+1)
    -- build tmp table to get numbering
        -- (Assumes auto_increment_increment = 1)
        CREATE TEMPORARY TABLE t (
            nth MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
            PRIMARY KEY(province, nth)
        ) ENGINE=MyISAM
            SELECT province, NULL AS nth, city, population
                FROM Canada
                ORDER BY population DESC;
        -- Output the biggest 3 cities in each province:
        SELECT province, nth, city, population
            FROM t
            WHERE nth <= 3
            ORDER BY province, nth;
    
    +---------------------------+-----+------------------+------------+
    | province                  | nth | city             | population |
    +---------------------------+-----+------------------+------------+
    | Alberta                   |   1 | Calgary          |     968475 |
    | Alberta                   |   2 | Edmonton         |     822319 |
    | Alberta                   |   3 | Red Deer         |      73595 |
    | British Columbia          |   1 | Vancouver        |    1837970 |
    | British Columbia          |   2 | Victoria         |     289625 |
    | British Columbia          |   3 | Abbotsford       |     151685 |
    | Manitoba                  |  ...
    
    SELECT FOR CREATE:
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra          |
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    |  1 | SIMPLE      | Canada | ALL  | NULL          | NULL | NULL    | NULL | 5484 | Using filesort |
    +----+-------------+--------+------+---------------+------+---------+------+------+----------------+
    Other SELECT:
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    |  1 | SIMPLE      | t     | index | NULL          | PRIMARY | 104     | NULL |   22 | Using where |
    +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | Handler_read_rnd_next      | 10970 |
    | Handler_write              | 5484  |  -- number of rows in Canada (write tmp table)
    [mariadb]
    ...
    thread_handling=pool-of-threads
    [mariadb]
    ...
    thread_handling=one-thread-per-connection
    [mariadb]
    ...
    extra_port = 8385
    extra_max_connections = 10
    $ mariadb -u root -P 8385 -p
    $ mariadb -u root -P 8385 -p
    SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';

    default

    duplicateweedout=on

    engine_condition_pushdown=off

    (deprecated in , removed in )


    index_merge=on

    index_merge_intersection=on

    index_merge_sort_union=on

    index_merge_union=on

    materialization=on (semi-join, non-semi-join)

    default

    Reset all optimizations to their default values.

    optimization_name=default

    Set the specified optimization to its default value.

    optimization_name=on

    Enable the specified optimization.

    optimization_name=off

    Disable the specified optimization.
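
    For example, a sketch of adjusting a couple of flags for the current session (the flag names are taken from the default lists on this page; any valid optimizer_switch flag works the same way):

    SET SESSION optimizer_switch='index_merge_intersection=off,derived_merge=default';
    SELECT @@optimizer_switch;   -- verify the resulting settings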

    condition_pushdown_for_derived=on

    condition_pushdown_for_subquery=on

    condition_pushdown_from_having=on

    cset_narrowing=on/off

    MariaDB 10.6.16, MariaDB 10.11.6, , and

    derived_merge=on

    derived_with_keys=on

    MariaDB 12.0.1

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, duplicateweedout=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off, sargable_casefold=on

    MariaDB 10.6.16, MariaDB 10.11.6, , and

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on

    MariaDB 10.6.13, MariaDB 10.11.3

    index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=off

    Quickly finding optimizer_switch values that are on or off
    The optimizer converts certain big IN predicates into IN subqueries
    optimizer_adjust_secondary_key_cost
    Optimizer hints in SELECT

    thread_pool_stall_limit
    thread_pool_max_threads
    Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls
    thread_pool_oversubscribe
    Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Oversubscription
    thread_pool_idle_timeout
    Threadpool_threads
    thread_pool_max_threads
    Threadpool_idle_threads

    General Thread States

    This article documents the major general thread states. More specific lists related to delayed inserts, replication, the query cache and the event scheduler are listed in:

    • Event Scheduler Thread States

    • Query Cache Thread States

    • Master Thread States

    These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads table.

    Value
    Description

    This page is licensed: CC BY-SA / Gnu FDL

    Expanded New-Style Optimizer Hints

    New-style optimizer hints were introduced in MariaDB 12.0 and 12.1.

    Description

    Each individual hint consists of a hint name and its arguments. In case there are no arguments, the () parentheses are still present:

    converting HEAP to Aria

    Converting an internal MEMORY temporary table into an on-disk Aria temporary table.

    converting HEAP to MyISAM

    Converting an internal MEMORY temporary table into an on-disk MyISAM temporary table.

    copy to tmp table

    A new table has been created as part of an ALTER TABLE statement, and rows are about to be copied into it.

    Copying to group table

    Sorting the rows by group and copying to a temporary table, which occurs when a statement has different GROUP BY and ORDER BY criteria.

    Copying to tmp table

    Copying to a temporary table in memory.

    Copying to tmp table on disk

    Copying to a temporary table on disk, as the resultset is too large to fit into memory.

    Creating index

    Processing an ALTER TABLE ... ENABLE KEYS for an Aria or MyISAM table.

    Creating sort index

    Processing a SELECT statement that is resolved using an internal temporary table.

    creating table

    Creating a table (temporary or non-temporary).

    Creating tmp table

    Creating a temporary table (in memory or on-disk).

    deleting from main table

    Deleting from the first table in a multi-table DELETE, saving columns and offsets for use in deleting from the other tables.

    deleting from reference tables

    Deleting matched rows from secondary reference tables as part of a multi-table DELETE.

    discard_or_import_tablespace

    Processing an ALTER TABLE ... IMPORT TABLESPACE or ALTER TABLE ... DISCARD TABLESPACE statement.

    end

    State before the final cleanup of an ALTER TABLE, CREATE VIEW, DELETE, INSERT, SELECT, or UPDATE statement.

    executing

    Executing a statement.

    Execution of init_command

    Executing statements specified by the --init_command option of the mariadb client.

    filling schema table

    A table in the information_schema database is being built.

    freeing items

    Freeing items from the query cache after executing a command. Usually followed by the cleaning up state.

    Flushing tables

    Executing a FLUSH TABLES statement and waiting for other threads to close their tables.

    FULLTEXT initialization

    Preparing to run a full-text search. This includes running the full-text search (MATCH ... AGAINST) and creating a list of the results in memory.

    init

    About to initialize an ALTER TABLE, DELETE, INSERT, SELECT, or UPDATE statement. Could be performing query cache cleanup, or flushing the binary log or InnoDB log.

    Killed

    Thread will abort next time it checks the kill flag. Requires waiting for any locks to be released.

    Locked

    Query has been locked by another query.

    logging slow query

    Writing the statement to the slow query log.

    NULL

    State used for SHOW PROCESSLIST.

    login

    Connection thread has not yet been authenticated.

    manage keys

    Enabling or disabling a table index.

    Opening table[s]

    Trying to open a table. Usually very quick unless the limit set by table_open_cache has been reached, or an ALTER TABLE or LOCK TABLE is in progress.

    optimizing

    Server is performing initial optimizations for a query.

    preparing

    State occurring during query optimization.

    Purging old relay logs

    Relay logs that are no longer needed are being removed.

    query end

    Query has finished being processed, but items have not yet been freed (the freeing items state).

    Reading file

    Server is reading the file (for example, during LOAD DATA INFILE).

    Reading from net

    Server is reading a network packet.

    Removing duplicates

    Duplicated rows being removed before sending to the client. This happens when SELECT DISTINCT is used in a way that the distinct operation could not be optimized at an earlier point.

    removing tmp table

    Removing an internal temporary table after processing a SELECT statement.

    rename

    Renaming a table.

    rename result table

    Renaming a table that results from an ALTER TABLE statement having created a new table.

    Reopen tables

    Table is being re-opened after thread obtained a lock but the underlying table structure had changed, so the lock was released.

    Repair by sorting

    Indexes are being created with the use of a sort. Much faster than the related Repair with keycache.

    Repair done

    Multi-threaded repair has been completed.

    Repair with keycache

    Indexes are being created through the key cache, one-by-one. Much slower than the related Repair by sorting.

    Rolling back

    A transaction is being rolled back.

    Saving state

    New table state is being saved. For example, after analyzing a MyISAM table, the key distributions, rowcount etc. are saved to the .MYI file.

    Searching rows for update

    Finding matching rows before performing an UPDATE, which is needed when the UPDATE would change the index used for the UPDATE.

    Sending data

    Sending data to the client as part of processing a SELECT statement or other statements that return data, such as INSERT ... RETURNING. Often the longest-occurring state, as it also includes all reading from tables and disk read activities. Where an aggregation or un-indexed filtering occurs, significantly more rows are read than are sent to the client.

    setup

    Setting up an ALTER TABLE operation.

    Sorting for group

    Sorting as part of a GROUP BY.

    Sorting for order

    Sorting as part of an ORDER BY.

    Sorting index

    Sorting index pages as part of a table optimization operation.

    Sorting result

    Processing a SELECT statement using a non-temporary table.

    statistics

    Calculating statistics as part of deciding on a query execution plan. Usually a brief state unless the server is disk-bound.

    System lock

    Requesting or waiting for an external lock for a specific table. The storage engine determines what kind of external lock to use. For example, the MyISAM storage engine uses file-based locks. However, MyISAM's external locks are disabled by default, due to the default value of the skip_external_locking system variable. Transactional storage engines such as InnoDB also register the transaction or statement with MariaDB's transaction coordinator while in this thread state. See MDEV-19391 for more information about that.

    Table lock

    About to request a table's internal lock after acquiring the table's external lock. This thread state occurs after the System lock thread state.

    update

    About to start updating table.

    Updating

    Searching for and updating rows in a table.

    updating main table

    Updating the first table in a multi-table update, and saving columns and offsets for use in the other tables.

    updating reference tables

    Updating the secondary (reference) tables in a multi-table update

    updating status

    This state occurs after a query's execution is complete. If the query's execution time exceeds long_query_time, then Slow_queries is incremented, and if the slow query log is enabled, then the query is logged. If the SERVER_AUDIT plugin is enabled, then the query is also logged into the audit log at this stage. If the userstats plugin is enabled, then CPU statistics are also updated at this stage.

    User lock

    About to request or waiting for an advisory lock from a GET_LOCK() call. For SHOW PROFILE, means requesting a lock only.

    User sleep

    A SLEEP() call has been invoked.

    Waiting for commit lock

    FLUSH TABLES WITH READ LOCK is waiting for a commit lock, or a statement resulting in an explicit or implicit commit is waiting for a read lock to be released. This state was called Waiting for all running commits to finish in earlier versions.

    Waiting for global read lock

    Waiting for a global read lock.

    Waiting for table level lock

    External lock acquired, and internal lock about to be requested. Occurs after the System lock state. In earlier versions, this was called Table lock.

    Waiting for xx lock

    Waiting to obtain a lock of type xx.

    Waiting on cond

    Waiting for an unspecified condition to occur.

    Writing to net

    Writing a packet to the network.

    After create

    The function that created (or tried to create) a table (temporary or non-temporary) has just ended.

    Analyzing

    Calculating table key distributions, such as when running an ANALYZE TABLE statement.

    checking permissions

    Checking to see whether the permissions are adequate to perform the statement.

    Checking table

    Checking the table.

    cleaning up

    Preparing to reset state variables and free memory after executing a command.

    closing tables

    Flushing the changes to disk and closing the table. This state will only persist if the disk is full or under extremely high load.

    Slave Connection Thread States
    Slave I/O Thread States
    Slave SQL Thread States
    SHOW PROCESSLIST
    Information Schema PROCESSLIST Table
    Performance Schema threads Table
    Incorrect hints produce warnings (a setting to make them errors is not implemented yet).

    Hints that are not ignored are kept in the query text (you can see them in SHOW PROCESSLIST, Slow Query Log, EXPLAIN EXTENDED). Hints that were incorrect and were ignored are removed from there.

    Hint Hierarchy

    Hints can be:

    • global - they apply to whole query;

    • table-level - they apply to a table;

    • index-level - they apply to an index in a table.

    Table-Level Hints

    Index-Level Hints

    Index-level hints apply to indexes. Possible syntax variants are:

    Effect of Optimizer Hints

    The optimizer can be controlled by

    1. server variables - optimizer_switch, join_cache_level, and so forth;

    2. old-style hints;

    3. new-style hints.

    Old-style hints do not overlap with server variable settings.

    New-style hints are more specific than server variable settings, so they override the server variable settings.

    Hints are "narrowly interpreted" and "best effort" - if a hint dictates to do something, for example:

    It means: when considering a query plan that involves using t1_index1 in a way that allows MRR, use MRR. If the query planning is such that the use of t1_index1 doesn't allow MRR to be used, it won't be used.

    The optimizer may also consider using t1_index2 and pick that over using t1_index1. In such cases, the hint is effectively ignored and no warning is given.

    Query Block Naming

    The QB_NAME hint is used to assign a name to the query block the hint is in. The Query block is either a SELECT statement or a top-level construct of an UPDATE or DELETE statement.

    The name can then be used

    • to refer to the query block;

    • to refer to a table in the query block as table_name@query_block_name.

    Query block scope is the whole statement. It is invalid to use the same name for multiple query blocks. You can refer to the query block "down into subquery", "down into derived table", "up to the parent" and "to a right sibling in the UNION". You cannot refer "to a left sibling in a UNION".
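
    A minimal sketch (t1 and t2 are hypothetical tables): the subquery is named subq with QB_NAME, and a hint in the outer query refers to the subquery's table through that name:

    SELECT /*+ BKA(t2@subq) */ *
        FROM t1
        WHERE t1.a IN (SELECT /*+ QB_NAME(subq) */ t2.a FROM t2);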

    Hints inside views are not supported yet. You can neither use hints in VIEW definitions, nor control query plans inside non-merged views. (This is because QB_NAME bindings are done "early", before we know that some tables are views.)

    SELECT#N NAMES

    Besides the given name, any query block is given a name select#n (where #n stands for a number). You can see it when running EXPLAIN EXTENDED:

    It is not possible to use it in the hint text:

    QB_NAME in CTEs

    Hints that control @name will control the first use of the CTE (common table expression).

    Available Expanded Optimizer Hints

    NO_ROWID_FILTER

    This hint is available from MariaDB 12.1.

    Does not consider ROWID filter for the scope of the hint (all tables in the query block, specific table, and specific indexes). See ROWID_FILTER for details.

    NO_SPLIT_MATERIALIZED

    This hint is available from MariaDB 12.1.

    When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.

    NO_SPLIT_MATERIALIZED(X) disables the use of split-materialized optimization in the context of X :

    ROWID_FILTER

    This hint is available from MariaDB 12.1.

    Like NO_RANGE_OPTIMIZATION or MRR, this hint can be applied to:

    • Query blocks — NO_ROWID_FILTER()

    • Table — NO_ROWID_FILTER(table_name)

    • Specific indexes — NO_ROWID_FILTER(table_name index1 index2 ...)

    Forces the use of ROWID_FILTER for the table index it targets:

    • For query blocks and tables, it enables the use of the ROWID filter, assuming it is disabled globally.

    • For indexes, it forces its use, regardless of the costs. The following query forces the use of the ROWID filter made from t1.idx1 if the chosen plan allows so (that is, if the access method to t1 allows it):

    Assuming the optimizer would pick idx2 for table t1 if the hint was not used, this could result in the usage of both idx2 and idx1 if the hint is used. That might become more expensive than a full table scan, or result in a change of the join order.

    Therefore, do not "blindly" use this filter, but rather make sure its use doesn't have a negative impact as described.

    SPLIT_MATERIALIZED

    This hint is available from MariaDB 12.1.

    When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.

    SPLIT_MATERIALIZED(X) enables and forces the use of split-materialized optimization in the context of X, unless it is impossible to do (for instance, because a table is not a materialized derived table).

    The following hints are available from MariaDB 12.0, unless indicated otherwise.

    Hints are placed after the main statement verb.

    They can also appear after the SELECT keyword in any subquery:

    There can be one or more hints separated with space:

    JOIN_INDEX and NO_JOIN_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for an access method (range, ref, etc.). Equivalent to FORCE INDEX FOR JOIN and IGNORE INDEX FOR JOIN.

    GROUP_INDEX and NO_GROUP_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for index scans for GROUP BY operations. Equivalent to FORCE INDEX FOR GROUP BY and IGNORE INDEX FOR GROUP BY.

    ORDER_INDEX and NO_ORDER_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes for sorting rows. Equivalent to FORCE INDEX FOR ORDER BY and IGNORE INDEX FOR ORDER BY.

    INDEX and NO_INDEX

    This hint is available from MariaDB 12.1.

    An index-level hint that enables or disables the specified indexes, for all scopes (join access method, GROUP BY, or sorting). Equivalent to FORCE INDEX and IGNORE INDEX.

    Syntax

    Behavior

    The hints operate by modifying the set of keys the optimizer considers for SELECT statements. The specific behavior depends on whether specific index keys are provided within the hint.

    INDEX_MERGE and NO_INDEX_MERGE

    This hint is available from MariaDB 12.2.

    The INDEX_MERGE and NO_INDEX_MERGE optimizer hints provide granular control over the optimizer's use of index merge strategies. They allow users to override the optimizer's cost-based calculations and global switch settings, to force or prevent the merging of indexes for specific tables.

    Syntax

    Behavior

    The hints operate by modifying the set of keys the optimizer considers for merge operations. The specific behavior depends on whether specific index keys are provided within the hint.

    INDEX_MERGE Hint

    This hint instructs the optimizer to employ an index merge strategy.

    • Without arguments: When specified as INDEX_MERGE(tbl), the optimizer considers all available keys for that table and selects the cheapest index merge combination.

    • With specific keys: When specified with keys, for instance, INDEX_MERGE(tbl key1, key2), the optimizer considers only the listed keys for the merge operation. All other keys are excluded from consideration for index merging.

    The INDEX_MERGE hint overrides the global optimizer_switch. Even if a specific strategy (such as index_merge_intersection) is disabled globally, the hint forces the optimizer to attempt the strategy using the specified keys.

    NO_INDEX_MERGE Hint

    This hint instructs the optimizer to avoid index merge strategies.

    • Without arguments: When specified as NO_INDEX_MERGE(tbl), index merge optimizations are completely disabled for the specified table.

    • With specific keys: When specified with keys, for instance, NO_INDEX_MERGE(tbl key1), the listed keys are excluded from consideration. The optimizer may still perform a merge using other available keys. However, if excluding the listed keys leaves insufficient row-ordered retrieval (ROR) scans available, no merge is performed.

    Algorithm Selection and Limitations

    While these hints control which keys are candidates for merging, they do not directly dictate the specific merge algorithm (Intersection, Union, or Sort-Union).

    • Indirect Control: You can influence the strategy indirectly by combining these hints with optimizer_switch settings, but specific algorithm selection is not guaranteed.

    • Invalid Hints: If a hint directs the optimizer to use specific indexes, but those indexes do not provide sufficient ROR scans to form a valid plan, the server is unable to honor the hint. In this scenario, the server emits a warning.

    Examples

    In the following examples, the index_merge_intersection switch is globally disabled. However, the INDEX_MERGE hint forces the optimizer to consider specific keys (f2 and f4), resulting in an intersection strategy.

    In the first query below, we disable intersection with NO_INDEX_MERGE, and the behavior is reflected in the EXPLAIN output. The query after that shows the hint enabling a merge: an intersection of f3 and f4 is used. In the last example, a different intersection is used: f3 and PRIMARY.

    No intersection (no merged indexes):

    Intersection of keys f3, f4:

    Intersection of keys PRIMARY, f3:

    NO_RANGE_OPTIMIZATION

    This hint is available from MariaDB 12.1.

    An index-level hint that disables range optimization for certain index(es):

    NO_ICP

    This hint is available from MariaDB 12.0.

    An index-level hint that disables Index Condition Pushdown for the indexes. ICP+BKA is disabled as well.

    MRR and NO_MRR

    This hint is available from MariaDB 12.0.

    Index-level hints to force or disable use of MRR.

    This controls:

    • MRR optimization for range access;

    • BKA.

    BKA() and NO_BKA()

    This hint is available from MariaDB 12.0.

    Query block or table-level hints.

    BKA() also enables MRR to make BKA possible. (This is different from session variables, where you need to enable MRR separately). This also enables BKAH.

    BNL() and NO_BNL()

    This hint is available from MariaDB 12.0.

    Controls BNL-H.

    The implementation is such that the BNL() hint effectively increases join_cache_level up to 4 for the table(s) it applies to.

    MAX_EXECUTION_TIME()

    This hint is available from MariaDB 12.0.

    Global-level hint to limit query execution time.

    A query that doesn't finish in the time specified will be aborted with an error.

    If @@max_statement_time is set, the hint will be ignored and a warning produced. Note that this contradicts the stated principle that "new-style hints are more specific than server variable settings, so they override the server variable settings".
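
    For example, assuming a hypothetical orders table, the following aborts with an error if the query runs for more than one second (the argument is in milliseconds):

    SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*) FROM orders;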

    SPLIT_MATERIALIZED(X) and NO_SPLIT_MATERIALIZED(X)

    This hint is available from MariaDB 12.1.

    Enables or disables the use of the Split Materialized Optimization (also called the Lateral Derived Optimization).

    DERIVED_CONDITION_PUSHDOWN and NO_DERIVED_CONDITION_PUSHDOWN

    This hint is available from MariaDB 12.1.

    Enables or disables the use of condition pushdown for derived tables.

    MERGE and NO_MERGE

    This hint is available from MariaDB 12.1.

    Table-level hint that enables the use of merging, or disables and uses materialization, for the specified tables, views or common table expressions.

    SUBQUERY

    This hint is available from MariaDB 12.0.

    Query block-level hint.

    This controls non-semi-join subqueries. The parameter specifies which subquery to use. Use of this hint disables conversion of subquery into semi-join.

    For details, see the Subquery Hints section.

    SEMIJOIN and NO_SEMIJOIN

    This hint is available from MariaDB 12.0.

    Query block-level hints.

    This controls the conversion of subqueries to semi-joins and which semi-join strategies are allowed.

    where the strategy is one of DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.

    Hints are placed after the main statement verb.

    They can also appear after the SELECT keyword in any subquery:

    There can be one or more hints separated with space:

    Join Order Hints

    Join order hints are available from MariaDB 12.0.

    Syntax of the JOIN_FIXED_ORDER hint:

    Syntax of other join-order hints:

    Available Join Order Hints

    For the following join order hint syntax,

    • tbl is the name of a table used in the statement. A hint that names tables applies to all tables that it names. The JOIN_FIXED_ORDER hint names no tables and applies to all tables in the FROM clause of the query block in which it occurs;

    • query_block_name is the query block to which the hint applies. If the hint includes no leading @query_block_name, it applies to the query block in which it occurs. When using the tbl@query_block_name syntax, the hint applies to the named table in the named query block. To assign a name to a query block, see Query Block Naming above.

    General notes:

    • If a table has an alias, hints must refer to the alias, not the table name.

    • Table names in hints cannot be qualified with schema names.

    JOIN_FIXED_ORDER([@query_block_name])

    Forces the optimizer to join tables using the order in which they appear in the FROM clause. This is the same as specifying SELECT STRAIGHT_JOIN.

    JOIN_ORDER([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order. The hint applies to the named tables. The optimizer may place tables that are not named anywhere in the join order, including between specified tables.

    • Alternative syntax: JOIN_ORDER(tbl[@query_block_name] [, tbl[@query_block_name]] ...)

    JOIN_PREFIX([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order for the first tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables after the named tables.

    • Alternative syntax: JOIN_PREFIX(tbl[@query_block_name] [, tbl[@query_block_name]] ...)

    JOIN_SUFFIX([@query_block_name] tbl [, tbl] ...)

    Instructs the optimizer to join tables using the specified table order for the last tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables before the named tables.
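
    A hedged example of the hints above, using hypothetical tables t1, t2 and t3: the JOIN_PREFIX hint asks the optimizer to begin the join with t3 and then t1, leaving t2 to be placed after them:

    SELECT /*+ JOIN_PREFIX(t3, t1) */ *
        FROM t1
        JOIN t2 ON t2.a = t1.a
        JOIN t3 ON t3.b = t2.b;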

    Subquery Hints

    Subquery hints are available from MariaDB 12.0.

    Overview

    Subquery hints determine:

    • If semijoin transformations are to be used;

    • Which semijoin strategies are permitted;

    • When semijoins are not used, whether to use subquery materialization or IN-to-EXISTS transformations.

    Syntax

    • hint_name: The following hint names are permitted to enable or disable the named semijoin strategies: SEMIJOIN, NO_SEMIJOIN.

    • strategy: Enable or disable a semi-join strategy. The following strategy names are permitted: DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.

    Strategies

    For SEMIJOIN hints, if no strategies are named, semi-join is used based on the strategies enabled according to the optimizer_switch system variable, if possible. If strategies are named, but inapplicable for the statement, DUPSWEEDOUT is used.

    For NO_SEMIJOIN hints, semi-join is not used if no strategies are named. If named strategies rule out all applicable strategies for the statement, DUPSWEEDOUT is used.

    If a subquery is nested within another, and both are merged into a semi-join of an outer query, any specification of semi-join strategies for the innermost query are ignored. SEMIJOIN and NO_SEMIJOIN hints can still be used to enable or disable semi-join transformations for such nested subqueries.

    If DUPSWEEDOUT is disabled, the optimizer may generate a query plan that is far from optimal.

    Examples

    Syntax of hints that affect whether to use subquery materialization or IN-to-EXISTS transformations:

    The hint name is always SUBQUERY.

    For SUBQUERY hints, these strategy values are permitted: INTOEXISTS, MATERIALIZATION.

    For semi-join and SUBQUERY hints, a leading @query_block_name specifies the query block to which the hint applies. If the hint includes no leading @query_block_name, the hint applies to the query block in which it occurs. To assign a name to a query block, see Naming Query Blocks.

    If a hint comment contains multiple subquery hints, the first is used. If there are other following hints of that type, they produce a warning. Following hints of other types are silently ignored.

    MariaDB 12.0
    exists_to_in=on
    firstmatch=on
    index_condition_pushdown=on
    hash_join_cardinality=off
    MariaDB 10.6.13
    MDEV-30812
    index_merge_sort_intersection=off
    in_to_exists=on
    loosescan=on
    semi-join
    non-semi-join
    mrr=off
    mrr_cost_based=off
    mrr_sort_keys=off
    not_null_range_scan=off
    orderby_uses_equalities=on
    partial_match_rowid_merge=on
    partial_match_table_scan=on
    rowid_filter=on
    sargable_casefold=on
    semijoin=on
    split_materialized=on
    subquery_cache=on
    table_elimination=on

    Query Cache

    The query cache stores results of SELECT queries so that if the identical query is received in future, the results can be quickly returned.

    This is extremely useful in high-read, low-write environments (such as most websites). It does not scale well in environments with high throughput on multi-core machines, so it is disabled by default.

    Note that the query cache cannot be enabled in certain environments. See the Limitations section below.

    Setting Up the Query Cache

    Unless MariaDB has been specifically built without the query cache, the query cache will always be available, although inactive. The have_query_cache server variable will show whether the query cache is available.

    hint:  hint_name([arguments])
    hint_name([table_name [, table_name] ...])
    hint_name(table_name [index_name [, index_name] ...])
    
    hint_name(table_name@query_block [index_name [, index_name] ...])
    
    hint_name(@query_block  table_name [index_name [, index_name] ...])
    SELECT  /*+ MRR(t1 t1_index1) */  ... FROM t1 ...
    SELECT /*+ QB_NAME(foo) */ select_list FROM ...
    Note 1003 SELECT /*+ NO_RANGE_OPTIMIZATION(t3@select#1 PRIMARY) */ ...
    SELECT /*+ BKA(tbl1@`select#1`) */ 1 FROM tbl1 ...;
    /*+ NO_ROWID_FILTER([table_name [index_name [ ... ] ]] ) */
    SELECT
      /*+ NO_SPLIT_MATERIALIZED(CUST_TOTALS) */
      ...
    FROM
      customer,
      (SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
    WHERE
       customer.c_custkey= o_custkey AND
       customer.country='FI';
    /*+ ROWID_FILTER( [table_name [index_name [ ...] ]]) */
    SELECT /*+ ROWID_FILTER(t1 idx1) */
    ...
    SELECT
      /*+ SPLIT_MATERIALIZED(CUST_TOTALS) */
      ...
    FROM
      customer,
      (SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
    WHERE
       customer.c_custkey= o_custkey AND
       customer.country='FI';
    UPDATE /*+ hints */ table ...;
    DELETE /*+ hints */ FROM table... ;
    SELECT /*+ hints */  ...
    SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)
    hints:  hint hint ...
    /*+ INDEX(table_name [index_name, ...]) */
    /*+ NO_INDEX(table_name [index_name, ...]) */
    /*+ INDEX_MERGE(table_name [index_name, ...]) */
    /*+ NO_INDEX_MERGE(table_name [index_name, ...]) */
    MariaDB [test]> EXPLAIN SELECT /*+ NO_INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: ref
    possible_keys: PRIMARY,f3,f4
              key: f3
          key_len: 9
              ref: const,const
             rows: 1
            Extra: Using index condition; Using where
    1 row in set (0.009 sec)
    MariaDB [test]> EXPLAIN SELECT /*+ INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: index_merge
    possible_keys: PRIMARY,f3,f4
              key: f3,f4
          key_len: 9,9
              ref: NULL
             rows: 1
            Extra: Using intersect(f3,f4); Using where; Using index
    1 row in set (0.010 sec)
    MariaDB [test]> EXPLAIN SELECT COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: index_merge
    possible_keys: PRIMARY,f3,f4
              key: f3,PRIMARY
          key_len: 9,4
              ref: NULL
             rows: 1
            Extra: Using intersect(f3,PRIMARY); Using where
    1 row in set (0.006 sec)
    SELECT /*+ NO_RANGE_OPTIMIZATION(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ NO_ICP(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ MRR(tbl index1 index2) */  * FROM tbl ... 
    
    SELECT /*+ NO_MRR(tbl index1 index2) */  * FROM tbl ...
    SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ...  ;
    SUBQUERY([@query_block_name] MATERIALIZATION)
    
    SUBQUERY([@query_block_name] INTOEXISTS)
    [NO_]SEMIJOIN([@query_block_name] [strategy [, strategy] ...])
    UPDATE /*+ hints */ table ...;
    DELETE /*+ hints */ FROM table... ;
    SELECT /*+ hints */  ...
    SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)
    hints:  hint hint ...
    hint_name([@query_block_name])
    hint_name([@query_block_name] tbl_name [, tbl_name] ...)
    hint_name(tbl_name[@query_block_name] [, tbl_name[@query_block_name]] ...)
    hint_name([@query_block_name] [strategy [, strategy] ...])
    SELECT /*+ NO_SEMIJOIN(@subquery1 FIRSTMATCH, LOOSESCAN) */ * FROM t2
      WHERE t2.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3);
    SELECT /*+ SEMIJOIN(@subquery1 MATERIALIZATION, DUPSWEEDOUT) */ * FROM t2
      WHERE t2.a IN (SELECT /*+ QB_NAME(subquery1) */ a FROM t3);
    SUBQUERY([@query_block_name] strategy)
    SELECT id, a IN (SELECT /*+ SUBQUERY(MATERIALIZATION) */ a FROM t1) FROM t2;
    SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ SUBQUERY(INTOEXISTS) */ a FROM t1);
    Optimizer Hints for Naming Query Blocks
    If have_query_cache is set to NO, you cannot enable the query cache unless you rebuild or reinstall a version of MariaDB with the cache available.

    To see if the cache is enabled, view the query_cache_type server variable. It is disabled by default; enable it by setting query_cache_type to 1 (ON).

    The query_cache_size is set to 1MB by default. Set the cache to a larger size if needed.
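
    For example (the values below are only illustrative; pick a size suited to your workload and memory budget):

    SET GLOBAL query_cache_type = 1;                  -- enable the cache (ON)
    SET GLOBAL query_cache_size = 16 * 1024 * 1024;   -- e.g. 16MB, allocated in 1024-byte blocks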

    The query_cache_type is automatically set to ON if the server is started with the query_cache_size set to a non-zero (and non-default) value.

    See Limiting the size of the Query Cache below for details.

    How the Query Cache Works

    When the query cache is enabled and a new SELECT query is processed, the query cache is examined to see if the query appears in the cache.

    Queries are considered identical if they use the same database, same protocol version and same default character set. Prepared statements are always considered as different to non-prepared statements, see Query cache internal structure for more info.

    If the identical query is not found in the cache, the query will be processed normally and then stored, along with its result set, in the query cache. If the query is found in the cache, the results will be pulled from the cache, which is much quicker than processing it normally.

    Queries are examined in a case-sensitive manner, so a query written with different letter case is treated as a different query. Comments are also considered and can make otherwise identical queries differ, as the sketch below shows.
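
    For example (illustrative statements against a hypothetical table t):

    SELECT * FROM t;
    -- is cached separately from:
    select * from t;

    /* retry */ SELECT * FROM t;
    -- is cached separately from:
    SELECT * FROM t;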

    See the query_cache_strip_comments server variable for an option to strip comments before searching.

    Each time changes are made to the data in a table, all affected results in the query cache are cleared. It is not possible to retrieve stale data from the query cache.

    When the space allocated to query cache is exhausted, the oldest results will be dropped from the cache.

    When using query_cache_type=ON, and the query specifies SQL_NO_CACHE (case-insensitive), the server will not cache the query and will not fetch results from the query cache.

    When using query_cache_type=DEMAND and the query specifies SQL_CACHE, the server will cache the query.

    Queries Stored in the Query Cache

    If the query_cache_type system variable is set to 1, or ON, all queries fitting the size constraints will be stored in the cache unless they contain a SQL_NO_CACHE clause, or are of a nature that caching makes no sense, for example making use of a function that returns the current time. Queries with SQL_NO_CACHE will not attempt to acquire query cache lock.

    If certain functions are present in a query, it will not be cached; examples include NOW(), CURDATE(), RAND(), UUID() and LAST_INSERT_ID(). Queries with these functions are sometimes called 'non-deterministic'; this should not be confused with the use of the term in other contexts.

    A query will also not be added to the cache if:

    • It is of the form:

      • SELECT SQL_NO_CACHE ...

      • SELECT ... INTO OUTFILE ...

      • SELECT ... INTO DUMPFILE ...

      • SELECT ... FOR UPDATE

      • SELECT * FROM ... WHERE autoincrement_column IS NULL

      • SELECT ... LOCK IN SHARE MODE

    • It uses TEMPORARY table

    • It uses no tables at all

    • It generates a warning

    • The user has a column-level privilege on any table in the query

    • It accesses a table from INFORMATION_SCHEMA, mysql or the performance_schema database

    • It makes use of user or local variables

    • It makes use of stored functions

    • It makes use of user-defined functions

    • It is inside a transaction with the SERIALIZABLE isolation level

    • It queries a table inside a transaction after the same table triggered a query cache invalidation via INSERT, UPDATE or DELETE

    The query itself can also specify that it is not to be stored in the cache by using the SQL_NO_CACHE attribute. Query-level control is an effective way to use the cache more optimally.

    It is also possible to specify that no queries must be stored in the cache unless the query requires it. To do this, the query_cache_type server variable must be set to 2, or DEMAND. Then, only queries with the SQL_CACHE attribute are cached.
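
    For example (t1 is a hypothetical table):

    SELECT SQL_NO_CACHE * FROM t1;   -- never stored in, or served from, the cache
    -- With query_cache_type=2 (DEMAND), only queries that ask for it are cached:
    SELECT SQL_CACHE * FROM t1;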

    Limiting the Size of the Query Cache

    There are two main ways to limit the size of the query cache. First, the overall size in bytes is determined by the query_cache_size server variable. About 40KB is needed for various query cache structures.

    The query cache size is allocated in 1024 byte-blocks, thus it should be set to a multiple of 1024.

    The query result is stored using a minimum block size of query_cache_min_res_unit. Two factors should be weighed when choosing a value: inserting each new result block locks the query cache, so a small value increases locking and fragmentation while wasting less memory for small results, whereas a large value reduces locking but wastes more memory for small results. Test with your workload to fine-tune this variable.

    If the strict mode is enabled, setting the query cache size to an invalid value will cause an error. Otherwise, it will be set to the nearest permitted value, and a warning will be triggered.

    The ideal size of the query cache is very dependent on the specific needs of each system. Setting a value too small will result in query results being dropped from the cache when they could potentially be re-used later. Setting a value too high could result in reduced performance due to lock contention, as the query cache is locked during updates.

    The second way to limit the cache is to have a maximum size for each set of query results. This prevents a single query with a huge result set taking up most of the available memory and knocking a large number of smaller queries out of the cache. This is determined by the query_cache_limit server variable.

    If you attempt to set a query cache size that is too small (the threshold depends on the architecture), the resizing will fail and the query cache will be set to zero, for example:
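
    (The value below is illustrative; the exact threshold depends on the architecture.)

    SET GLOBAL query_cache_size = 40000;
    SHOW WARNINGS;   -- reports that the query cache was resized to 0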

    Examining the Query Cache

    A number of status variables provide information about the query cache.
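
    For example, to list them all:

    SHOW STATUS LIKE 'Qcache%';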

    Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory.

    A poorly performing cache is indicated when more queries have been added, and more queries have been dropped, than have actually been used.

    Results returned by the query cache count towards Com_select (see MDEV-4981).

    The QUERY_CACHE_INFO plugin creates the QUERY_CACHE_INFO table in the INFORMATION_SCHEMA, allowing you to examine the contents of the query cache.

    Query Cache Fragmentation

    The Query Cache uses blocks of variable length, and over time may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. FLUSH QUERY CACHE will defragment the query cache without dropping any queries; after this, there will only be one free block.
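
    A sketch of checking the effect:

    SHOW STATUS LIKE 'Qcache_free_blocks';   -- may be high relative to Qcache_total_blocks
    FLUSH QUERY CACHE;
    SHOW STATUS LIKE 'Qcache_free_blocks';   -- should now report 1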

    Emptying and disabling the Query Cache

    To empty or clear all results from the query cache, use RESET QUERY CACHE. FLUSH TABLES will have the same effect.

    Setting either query_cache_type or query_cache_size to 0 will disable the query cache, but to free up the most resources, set both to 0 when you wish to disable caching.

    Limitations

    • The query cache needs to be disabled in order to use OQGRAPH.

    • The query cache is not used by the Spider storage engine (amongst others).

    LOCK TABLES and the Query Cache

    The query cache can be used when tables have a write lock (which may seem confusing since write locks should avoid table reads). This behaviour can be changed by setting the query_cache_wlock_invalidate system variable to ON, in which case each write lock will invalidate the table query cache. Setting to OFF, the default, means that cached queries can be returned even when a table lock is being held. For example:
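
    A sketch of the default (OFF) behavior, with a hypothetical table t1 and two connections:

    -- connection 1
    LOCK TABLES t1 WRITE;

    -- connection 2: with query_cache_wlock_invalidate=OFF, a previously cached
    -- result for this query can still be returned despite the write lock
    SELECT * FROM t1;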

    Transactions and the Query Cache

    The query cache handles transactions. Internally, a flag (FLAGS_IN_TRANS) is set to 0 when a query is executed outside a transaction, and to 1 when the query is inside a transaction (BEGIN / COMMIT / ROLLBACK). This flag is part of the "query cache hash"; in other words, a query inside a transaction is treated as different from the same query outside a transaction.

    Queries that change rows (INSERT / UPDATE / DELETE / TRUNCATE) inside a transaction invalidate all cached queries for the table, and turn off the query cache for the changed table. Even before the transaction ends with COMMIT / ROLLBACK, the query cache remains turned off for that table, to allow row-level locking and the expected consistency level.

    Examples:
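
    A sketch with a hypothetical table t1: the UPDATE inside the transaction invalidates cached results for t1 and keeps the cache turned off for that table until the transaction ends:

    BEGIN;
    SELECT * FROM t1 WHERE id = 1;          -- inside a transaction: distinct cache entry (FLAGS_IN_TRANS=1)
    UPDATE t1 SET b = b + 1 WHERE id = 1;   -- invalidates cached queries on t1
    SELECT * FROM t1 WHERE id = 1;          -- not served from the query cache
    COMMIT;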

    Query Cache Internal Structure

    Internally, each flag that can change a result using the same query is a different query. For example, using the latin1 charset and using the utf8 charset with the same query are treated as different queries by the query cache.

    Some fields that differentiate queries are (from "Query_cache_query_flags" internal structure) :

    • query (string)

    • current database schema name (string)

    • client long flag (0/1)

    • client protocol 4.1 (0/1)

    • protocol type (internal value)

    • more results exists (protocol flag)

    • in trans (inside transaction or not)

    • autocommit ( session variable)

    • pkt_nr (protocol flag)

    • character set client ( session variable)

    • character set results ( session variable)

    • collation connection ( session variable)

    • limit ( session variable)

    • time zone ( session variable)

    • sql_mode ( session variable)

    • max_sort_length ( session variable)

    • group_concat_max_len ( session variable)

    • default_week_format ( session variable)

    • div_precision_increment ( session variable)

    • lc_time_names ( session variable)

    Timeout and Mutex Contention

When searching for a query inside the query cache, a try_lock function waits with a timeout of 50ms. If the lock fails, the query isn't executed via the query cache. This timeout is hard-coded (MDEV-6766 proposes two variables to tune it).

From sql_cache.cc, the "try_lock" function using TIMEOUT:

    When inserting a query inside the query cache or aborting a query cache insert (using the KILL command for example), a try_lock function waits until the query cache returns; no timeout is used in this case.

    When two processes execute the same query, only the last process stores the query result. All other processes increase the Qcache_not_cached status variable.

    SQL_NO_CACHE and SQL_CACHE

    There are two aspects to the query cache: placing a query in the cache, and retrieving it from the cache.

1. Adding a query to the query cache. This is done automatically for cacheable queries (see Queries Stored in the Query Cache) when the query_cache_type system variable is set to 1 (ON) and the query contains no SQL_NO_CACHE clause, or when the query_cache_type system variable is set to 2 (DEMAND) and the query contains the SQL_CACHE clause.

    2. Retrieving a query from the cache. This is done after the server receives the query and before the query parser. In this case one point should be considered:

When using SQL_NO_CACHE, it should appear directly after the first SELECT:

    Don't use it like this:

The second query will still be checked against the cache; the query cache only looks for SQL_NO_CACHE/SQL_CACHE directly after the first SELECT. (More info at MDEV-6631.)
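A small illustration of the two caching modes (the table name t is only a placeholder):

SET GLOBAL query_cache_type = ON;      -- mode 1: cache every cacheable query
SELECT SQL_NO_CACHE * FROM t;          -- opt out of the cache for this query

SET GLOBAL query_cache_type = DEMAND;  -- mode 2: cache only on request
SELECT SQL_CACHE * FROM t;             -- cached only because SQL_CACHE is present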

    This page is licensed: CC BY-SA / Gnu FDL


    Building the best INDEX for a given SELECT

    The problem

    You have a SELECT and you want to build the best INDEX for it. This blog is a "cookbook" on how to do that task.

    • A short algorithm that works for many simpler SELECTs and helps in complex queries.

    • Examples of the algorithm, plus digressions into exceptions and variants

• Finally, a long list of "other cases".

    The hope is that a newbie can quickly get up to speed, and his/her INDEXes will no longer smack of "newbie".

    Many edge cases are explained, so even an expert may find something useful here.

    Algorithm

    Here's the way to approach creating an INDEX, given a SELECT. Follow the steps below, gathering columns to put in the INDEX in order. When the steps give out, you usually have the 'perfect' index.

    1. Given a WHERE with a bunch of expressions connected by AND: Include the columns (if any), in any order, that are compared to a constant and not hidden in a function.

    2. You get one more chance to add to the INDEX; do the first of these that applies:

    • 2a. One column used in a 'range' -- BETWEEN, '>', LIKE w/o leading wildcard, etc.

    • 2b. All columns, in order, of the GROUP BY.

    • 2c. All columns, in order, of the ORDER BY if there is no mixing of ASC and DESC.
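For instance, applying these steps to a hypothetical query (table and column names are assumptions):

-- SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped' ORDER BY created_at;
-- Step 1: customer_id and status (both compared to constants with '=')
-- Step 2c: created_at (no range and no GROUP BY, so the ORDER BY column is added)
ALTER TABLE orders ADD INDEX (customer_id, status, created_at);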

    Digression

    This blog assumes you know the basic idea behind having an INDEX. Here is a refresher on some of the key points.

Virtually all INDEXes in MySQL are structured as BTrees. BTrees are very efficient for:

    • Given a key, find the corresponding row(s);

    • "Range scans" -- That is start at one value for the key and repeatedly find the "next" (or "previous") row.

    A PRIMARY KEY is a UNIQUE KEY; a UNIQUE KEY is an INDEX. ("KEY" == "INDEX".)

    InnoDB "clusters" the PRIMARY KEY with the data. Hence, given the value of the PK ("PRIMARY KEY"), after drilling down the BTree to find the index entry, you have all the columns of the row when you get there. A "secondary key" (any UNIQUE or INDEX other than the PK) in InnoDB first drills down the BTree for the secondary index, where it finds a copy of the PK. Then it drills down the PK to find the row.

    Every InnoDB table has a PRIMARY KEY. While there is a default if you do not specify one, it is best to explicitly provide a PK.

For completeness: MyISAM works differently. All indexes (including the PK) are in separate BTrees. The leaf nodes of such BTrees have a pointer (usually a byte offset) into the data file.

All discussion here assumes InnoDB tables; however, most statements apply to other engines.

    First, some examples

Think of a list of names, sorted by last_name, then first_name. You have undoubtedly seen such lists, and they often have other information such as address and phone number. Suppose you wanted to look me up. If you remember my full name ('James' and 'Rick'), it is easy to find my entry. If you remembered only my last name ('James') and first initial ('R'), you would quickly zoom in on the Jameses and find the Rs among them. There, you might remember 'Rick' and ignore 'Ronald'. But suppose you remembered my first name ('Rick') and only my last initial ('J'). Now you are in trouble. You would be scanning all the Js -- Jones, Rick; Johnson, Rick; Jamison, Rick; etc, etc. That's much less efficient.

    Those equate to

    Think about this example as I talk about "=" versus "range" in the Algorithm, below.

    Algorithm, step 1 (WHERE "column = const")

    • WHERE aaa = 123 AND ... : an INDEX starting with aaa is good.

    • WHERE aaa = 123 AND bbb = 456 AND ... : an INDEX starting with aaa and bbb is good. In this case, it does not matter whether aaa or bbb comes first in the INDEX.

    • xxx IS NULL : this acts like "= const" for this discussion.

Note that the expression must be of the form column_name = (constant). These do not apply to this step in the Algorithm: DATE(dt) = '...', LOWER(s) = '...', CAST(s ...) = '...', x = '...' COLLATE ...

    (If there are no "=" parts AND'd in the WHERE clause, move on to step 2 without any columns in your putative INDEX.)

    Algorithm, step 2

    Find the first of 2a / 2b / 2c that applies; use it; then quit. If none apply, then you are through gathering columns for the index.

    In some cases it is optimal to do step 1 (all equals) plus step 2c (ORDER BY).

    Algorithm, step 2a (one range)

    A "range" shows up as

    • aaa >= 123 -- any of <, <=, >=, >; but not <>, !=

    • aaa BETWEEN 22 AND 44

    • sss LIKE 'blah%' -- but not sss LIKE '%blah'

    If there are more parts to the WHERE clause, you must stop now.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa >= 123 AND bbb = 1 ⇒ INDEX(bbb, aaa) (WHERE order does not matter; INDEX order does)

    • WHERE aaa >= 123 ⇒ INDEX(aaa)

    • WHERE aaa >= 123 AND ccc > 'xyz' ⇒ INDEX(aaa) or INDEX(ccc) (only one range)

    Algorithm, step 2b (GROUP BY)

    If there is a GROUP BY, all the columns of the GROUP BY should now be added, in the specified order, to the INDEX you are building. (I do not know what happens if one of the columns is already in the INDEX.)

    If you are GROUPing BY an expression (including function calls), you cannot use the GROUP BY; stop.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa = 123 AND bbb = 1 GROUP BY ccc ⇒ INDEX(bbb, aaa, ccc) or INDEX(aaa, bbb, ccc) (='s first, in any order; then the GROUP BY)

    • WHERE aaa >= 123 GROUP BY xxx ⇒ INDEX(aaa) (You should have stopped with Step 2a)

    • GROUP BY x,y ⇒ INDEX(x,y) (no WHERE)

    Algorithm, step 2c (ORDER BY)

    If there is a ORDER BY, all the columns of the ORDER BY should now be added, in the specified order, to the INDEX you are building.

    If there are multiple columns in the ORDER BY, and there is a mixture of ASC and DESC, do not add the ORDER BY columns; they won't help; stop.

    If you are ORDERing BY an expression (including function calls), you cannot use the ORDER BY; stop.

    Complete examples (assume nothing else comes after the snippet)

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ddd ⇒ INDEX(aaa, ccc) -- should have stopped with Step 2b

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc ⇒ INDEX(aaa, ccc) -- the ccc will be used for both GROUP BY and ORDER BY

    • WHERE aaa = 123 ORDER BY xxx ASC, yyy DESC ⇒ INDEX(aaa) -- mixture of ASC and DESC.

The following are especially good. Normally a LIMIT cannot be applied until after lots of rows are gathered and then sorted according to the ORDER BY. But, if the INDEX gets all the way through the ORDER BY, only (OFFSET + LIMIT) rows need to be gathered. So, in these cases, you win the lottery with your new index:

    • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

    • WHERE aaa = 123 ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

    • ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc)

    (It does not make much sense to have a LIMIT without an ORDER BY, so I do not discuss that case.)

    Algorithm end

You have collected a few columns; put them in an INDEX and ADD it to the table. That will often produce a "good" index for the SELECT you have. Below are some other suggestions that may be relevant.

    An example of the Algorithm being 'wrong':

This would (according to the Algorithm) call for INDEX(flag). However, indexing a column that has only two (or a small number of) values is almost always useless. This is called 'low cardinality'. The Optimizer would prefer to do a table scan rather than bounce between the index BTree and the data.

    On the other hand, the Algorithm is 'right' with

    That would call for a compound index starting with a flag: INDEX(flag, date). Such an index is likely to be very beneficial. And it is likely to be more beneficial than INDEX(date).

If your resulting INDEX includes column(s) that are likely to be UPDATEd, note that the UPDATE will have extra work to remove a 'row' from one place in the INDEX's BTree and insert a 'row' back into the BTree. For example:

    There are too many variables to say whether it is better to keep the index or to toss it.

    In this case, shortening the index may be beneficial:

    Changing to INDEX(z) would make for less work for the UPDATE, but might hurt some SELECT. It depends on the frequency of each, plus many more factors.

    Limitations

    (There are exceptions to some of these.)

    • You may not create an index bigger than 3KB.

• You may not include a column wider than some limit (767 bytes -- e.g., VARCHAR(255) CHARACTER SET utf8).

    • You can deal with big fields using "prefix" indexing; but see below.

    • You should not have more than 5 columns in an index. (This is just a Rule of Thumb; nothing prevents having more.)

    Flags and low cardinality

    INDEX(flag) is almost never useful if flag has very few values. More specifically, when you say WHERE flag = 1 and "1" occurs more than 20% of the time, such an index will be shunned. The Optimizer would prefer to scan the table instead of bouncing back and forth between the index and the data for more than 20% of the rows.

    ("20%" is really somewhere between 10% and 30%, depending on the phase of the moon.)

    "Covering" indexes

    A "Covering" index is an index that contains all the columns in the SELECT. It is special in that the SELECT can be completed by looking only at the INDEX BTree. (Since InnoDB's PRIMARY KEY is clustered with the data, "covering" is of no benefit when considering at the PRIMARY KEY.)

    Mini-cookbook:

    1. Gather the list of column(s) according to the "Algorithm", above.

    2. Add to the end of the list the rest of the columns seen in the SELECT, in any order.

    Examples:

    • SELECT x FROM t WHERE y = 5; ⇒ INDEX(y,x) -- The algorithm said just INDEX(y)

    • SELECT x,z FROM t WHERE y = 5 AND q = 7; ⇒ INDEX(y,q,x,z) -- y and q in either order (Algorithm), then x and z in either order (covering).

    • SELECT x FROM t WHERE y > 5 AND q > 7; ⇒ INDEX(y,q,x) -- y or q first (that's as far as the Algorithm goes), then the other two fields afterwards.
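Spelling the second example out as DDL (a sketch; the index name is an assumption):

-- SELECT x,z FROM t WHERE y = 5 AND q = 7;
ALTER TABLE t ADD INDEX covering_y_q_x_z (y, q, x, z);  -- WHERE columns first, then the SELECTed columns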

    The speedup you get might be minor, or it might be spectacular; it is hard to predict.

    But...

    • It is not wise to build an index with lots of columns. Let's cut it off at 5 (Rule of Thumb).

    • Prefix indexes cannot 'cover', so don't use them anywhere in a 'covering' index.

    • There are limits (3KB?) on how 'wide' an index can be, so "covering" may not be possible.

    Redundant/excessive indexes

    INDEX(a,b) can find anything that INDEX(a) could find. So you don't need both. Get rid of the shorter one.

    If you have lots of SELECTs and they generate lots of INDEXes, this may cause a different problem. Each index must be updated (sooner or later) for each INSERT. More indexes ⇒ slower INSERTs. Limit the number of indexes on a table to about 6 (Rule of Thumb).

    Notice in the cookbook how it says "in any order" in a few places. If, for example, you have both of these (in different SELECTs):

    • WHERE a=1 AND b=2 begs for either INDEX(a,b) or INDEX(b,a)

    • WHERE a>1 AND b=2 begs only for INDEX(b,a) Include only INDEX(b,a) since it handles both cases with only one INDEX.

Suppose you have a lot of indexes, including (a,b,c,dd) and (a,b,c,ee). Those are getting rather long. Consider either picking one of them, or having simply (a,b,c). Sometimes the selectivity of (a,b,c) is so good that tacking on 'dd' or 'ee' does not make enough difference to matter.

    Optimizer picks ORDER BY

    The main cookbook skips over an important optimization that is sometimes used. The optimizer will sometimes ignore the WHERE and, instead, use an INDEX that matches the ORDER BY. This, of course, needs to be a perfect match -- all columns, in the same order. And all ASC or all DESC.

    This becomes especially beneficial if there is a LIMIT.

    But there is a problem. There could be two situations, and the Optimizer is sometimes not smart enough to see which case applies:

    • If the WHERE does very little filtering, fetching the rows in ORDER BY order avoids a sort and has little wasted effort (because of 'little filtering'). Using the INDEX matching the ORDER BY is better in this case.

    • If the WHERE does a lot of filtering, the ORDER BY is wasting a lot of time fetching rows only to filter them out. Using an INDEX matching the WHERE clause is better.

    What should you do? If you think the "little filtering" is likely, then create an index with the ORDER BY columns in order and hope that the Optimizer uses it when it should.

    OR

    Cases...

    • WHERE a=1 OR a=2 -- This is turned into WHERE a IN (1,2) and optimized that way.

    • WHERE a=1 OR b=2 usually cannot be optimized.

    • WHERE x.a=1 OR y.b=2 This is even worse because of using two different tables.

    A workaround is to use UNION. Each part of the UNION is optimized separately. For the second case:

    Now the query can take good advantage of two different indexes. Note: "Index merge" might kick in on the original query, but it is not necessarily any faster than the UNION. Sister blog on compound indexes, including 'Index Merge'

    The third case (OR across 2 tables) is similar to the second.

    If you originally had a LIMIT, UNION gets complicated. If you started with ORDER BY z LIMIT 190, 10, then the UNION needs to be

    TEXT / BLOB

    You cannot directly index a TEXT or BLOB or large VARCHAR or large BINARY column. However, you can use a "prefix" index: INDEX(foo(20)). This says to index the first 20 characters of foo. But... It is rarely useful.

    Example of a prefix index:

    The index for me would contain 'Ja', 'Rick'. That's not useful for distinguishing between 'Jamison', 'Jackson', 'James', etc., so the index is so close to useless that the optimizer often ignores it.

    Probably never do UNIQUE(foo(20)) because this applies a uniqueness constraint on the first 20 characters of the column, not the whole column!

    Dates

    DATE, DATETIME, etc. are tricky to compare against.

    Some tempting, but inefficient, techniques:

date_col LIKE '2016-01%' -- must convert date_col to a string, so acts like a function
LEFT(date_col, 7) = '2016-01' -- hiding the column in a function
DATE(date_col) = 2016 -- hiding the column in a function

All must do a full scan. (On the other hand, it can be handy to use GROUP BY LEFT(date_col, 7) for monthly grouping, but that is not an INDEX issue.)

    This is efficient, and can use an index:

    This case works because both right-hand values are converted to constants, then it is a "range". I like the design pattern with INTERVAL because it avoids computing the last day of the month. And it avoids tacking on '23:59:59', which is wrong if you have microsecond times. (And other cases.)

    EXPLAIN Key_len

    Perform EXPLAIN SELECT... (and EXPLAIN FORMAT=JSON SELECT... if you have 5.6.5). Look at the Key that it chose, and the Key_len. From those you can deduce how many columns of the index are being used for filtering. (JSON makes it easier to get the answer.) From that you can decide whether it is using as much of the INDEX as you thought. Caveat: Key_len only covers the WHERE part of the action; the non-JSON output won't easily say whether GROUP BY or ORDER BY was handled by the index.
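A hypothetical check (the table, index, and column types are assumptions):

EXPLAIN SELECT x FROM t WHERE y = 5 AND q = 7;
-- If the chosen key is INDEX(y, q, x) and y and q are 4-byte INT NOT NULL columns,
-- then key_len = 8 tells you both y and q were used for filtering.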

    IN

    IN (1,99,3) is sometimes optimized as efficiently as "=", but not always. Older versions of MySQL did not optimize it as well as newer versions. (5.6 is possibly the main turning point.)

    IN ( SELECT ... )

From version 4.1 through 5.5, IN ( SELECT ... ) was very poorly optimized. The SELECT was effectively re-evaluated every time. Often it can be transformed into a JOIN, which works much faster. Here is a pattern to follow:

    The SELECT expressions will need "a." prefixing the column names.

    Alas, there are cases where the pattern is hard to follow.

    5.6 does some optimizing, but probably not as good as the JOIN.

    If there is a JOIN or GROUP BY or ORDER BY LIMIT in the subquery, that complicates the JOIN in new format. So, it might be better to use this pattern:

    Caveat: If you end up with two subqueries JOINed together, note that neither has any indexes, hence performance can be very bad. (5.6 improves on it by dynamically creating indexes for subqueries.)

There is work going on in MariaDB and Oracle 5.7 in relation to "NOT IN", "NOT EXISTS", and "LEFT JOIN .. IS NULL"; here is an old discussion on the topic. So, what I say here may not be the final word.

    Explode/Implode

    When you have a JOIN and a GROUP BY, you may have the situation where the JOIN exploded more rows than the original query (due to many:many), but you wanted only one row from the original table, so you added the GROUP BY to implode back to the desired set of rows.

    This explode + implode, itself, is costly. It would be better to avoid them if possible.

    Sometimes the following will work.

    Using DISTINCT or GROUP BY to counteract the explosion

When using the second table just to check for existence:

    Many-to-many mapping table

    Do it this way.

    Notes:

    • Lack of an AUTO_INCREMENT id for this table -- The PK given is the 'natural' PK; there is no good reason for a surrogate.

    • "MEDIUMINT" -- This is a reminder that all INTs should be made as small as is safe (smaller ⇒ faster). Of course the declaration here must match the definition in the table being linked to.

    • "UNSIGNED" -- Nearly all INTs may as well be declared non-negative

    • "NOT NULL" -- Well, that's true, isn't it?

To conditionally INSERT new links, use IODKU (INSERT ... ON DUPLICATE KEY UPDATE), as sketched below.

    Note that if you had an AUTO_INCREMENT in this table, IODKU would "burn" ids quite rapidly.
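A sketch of such a conditional insert, using the XtoY mapping table defined above (the specific ids are placeholders):

INSERT INTO XtoY (x_id, y_id)
    VALUES (12, 34)
    ON DUPLICATE KEY UPDATE x_id = x_id;   -- no-op when the link already exists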

    Subqueries and UNIONs

    Each subquery SELECT and each SELECT in a UNION can be considered separately for finding the optimal INDEX.

    Exception: In a "correlated" ("dependent") subquery, the part of the WHERE that depends on the outside table is not easily factored into the INDEX generation. (Cop out!)

    JOINs

The first step is to decide what order the optimizer will go through the tables. If you cannot figure it out, then you may need to be pessimistic and create two indexes for each table -- one assuming the table will be used first, one assuming that it will come later in the table order.

    The optimizer usually starts with one table and extracts the data needed from it. As it finds a useful (that is, matches the WHERE clause, if any) row, it reaches into the 'next' table. This is called NLJ ("Nested Loop Join"). The process of filtering and reaching to the next table continues through the rest of the tables.

    The optimizer usually picks the "first" table based on these hints:

    • STRAIGHT_JOIN forces the table order.

    • The WHERE clause limits which rows needed (whether indexed or not).

    • The table to the "left" in a LEFT JOIN usually comes before the "right" table. (By looking at the table definitions, the optimizer may decide that "LEFT" is irrelevant.)

    • The current INDEXes will encourage an order.

    Running EXPLAIN tells you the table order that the Optimizer is very likely to use today. After adding a new INDEX, the optimizer may pick a different table order. You should anticipate the order changing, guess at what order makes the most sense, and build the INDEXes accordingly. Then rerun EXPLAIN to see if the Optimizer's brain was on the same wavelength you were on.

    You should build the INDEX for the "first" table based on any parts of the WHERE, GROUP BY, and ORDER BY clauses that are relevant to it. If a GROUP/ORDER BY mentions a different table, you should ignore that clause.

    The second (and subsequent) table will be reached into based on the ON clause. (Instead of using commajoin, please write JOINs with the JOIN keyword and ON clause!) In addition, there could be parts of the WHERE clause that are relevant. GROUP/ORDER BY are not to be considered in writing the optimal INDEX for subsequent tables.
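A sketch of these guidelines for a two-table join (table and column names are assumptions, and the optimizer is assumed to start with a):

-- SELECT ... FROM a JOIN b ON b.x = a.x WHERE a.status = 'open' ORDER BY a.created_at;
ALTER TABLE a ADD INDEX (status, created_at);  -- "first" table: its WHERE part, then its ORDER BY
ALTER TABLE b ADD INDEX (x);                   -- subsequent table: the ON column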

    PARTITIONing

    PARTITIONing is rarely a substitute for a good INDEX.

PARTITION BY RANGE is a technique that is sometimes useful when indexing fails to be good enough. In a two-dimensional situation such as nearness in a geographical sense, one dimension can partially be handled by partition pruning; then the other dimension can be handled by a regular index (preferably the PRIMARY KEY). This goes into more detail:

    FULLTEXT

    FULLTEXT is now implemented in InnoDB as well as MyISAM. It provides a way to search for "words" in TEXT columns. This is much faster (when it is applicable) than col LIKE '%word%'.

A query such as WHERE x = 1 AND MATCH (...) AGAINST (...) always(?) uses the FULLTEXT index first. That is, the whole Algorithm is invalidated when one of the ANDs is a MATCH.
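A minimal sketch (table and column names are assumptions):

ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);
SELECT id FROM articles
 WHERE category = 3
   AND MATCH(body) AGAINST('pizza');   -- the FULLTEXT index is consulted first; category filters afterwards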

    Signs of a Newbie

    • No "compound" (aka "composite") indexes

    • No PRIMARY KEY

    • Redundant indexes (especially blatant is PRIMARY KEY(id), KEY(id))

• Most or all columns individually indexed ("But I indexed everything")

    Speeding up wp_postmeta

    The published table (see Wikipedia) is

    The problems:

    • The AUTO_INCREMENT provides no benefit; in fact it slows down most queries and clutters disk.

    • Much better is PRIMARY KEY(post_id, meta_key) -- clustered, handles both parts of usual JOIN.

    • BIGINT is overkill, but that can't be fixed without changing other tables.

    • VARCHAR(255) can be a problem in 5.6 with utf8mb4; see workarounds below.

    The solutions:

    Postlog

    Initial posting: March, 2015; Refreshed Feb, 2016; Add DATE June, 2016; Add WP example May, 2017.

    The tips in this document apply to MySQL, MariaDB, and Percona.

    See also

    • Some info in the MySQL manual:

    • A short, but complicated,

This blog is the consolidation of a Percona tutorial I gave in 2013, plus many years of experience in fixing thousands of slow queries on hundreds of systems. I apologize that this does not tell you how to create INDEXes for all SELECTs. Some are just too complex.

    Rick James graciously allowed us to use this article in the documentation.

His site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    Latitude/Longitude Indexing

    The problem

    You want to find the nearest 10 pizza parlors, but you cannot figure out how to do it efficiently in your huge database. Database indexes are good at one-dimensional indexing, but poor at two-dimensions.

    You might have tried

    • INDEX(lat), INDEX(lon) -- but the optimizer used only one

    • INDEX(lat,lon) -- but it still had to work too hard

    • Sometimes you ended up with a full table scan -- Yuck.

WHERE SQRT(...) < ... -- No chance of using any index.

    WHERE lat BETWEEN ... AND lng BETWEEN... -- This has some chance of using such indexes.

    The goal is to look only at records "close", in both directions, to the target lat/lng.

    A solution -- first, the principles

PARTITIONs in MariaDB and MySQL sort of give you a way to have two clustered indexes. So, if we could slice up (partition) the globe in one dimension and use ordinary indexing in the other dimension, maybe we can get something approximating a 2D index. This 2D approach keeps the number of disk hits significantly lower than 1D approaches, thereby speeding up "find nearest" queries.

    It works. Not perfectly, but better than the alternatives.

What to PARTITION on? It seems like latitude or longitude would be a good idea. Note that a degree of longitude varies in width, from 69 miles (111 km) at the equator to 0 at the poles. So, latitude seems like a better choice.

    How many PARTITIONs? It does not matter a lot. Some thoughts:

    • 90 partitions - 2 degrees each. (I don't like tables with too many partitions; 90 seems like plenty.)

    • 50-100 - evenly populated. (This requires code. For 2.7M placenames, 85 partitions varied from 0.5 degrees to very wide partitions at the poles.)

• Don't have more than 100 partitions; there are inefficiencies in the partition implementation.

How to PARTITION? Well, MariaDB and MySQL are very picky. So FLOAT / DOUBLE are out. DECIMAL is out. So, we are stuck with some kludge. Essentially, we need to convert Lat/Lng to some size of INT and use PARTITION BY RANGE.

    Representation choices

    To get to a datatype that can be used in PARTITION, you need to "scale" the latitude and longitude. (Consider only the *INTs; the other datatypes are included for comparison)

    (Sorted by resolution)

    What these mean...

Deg*100 (SMALLINT) -- you take the lat/lng, multiply by 100, round, and store into a SMALLINT. That will take 2 bytes for each dimension, for a total of 4 bytes. Two items might be 1570 meters apart, but register as having the same latitude and longitude.

DECIMAL(4,2) for latitude and DECIMAL(5,2) for longitude will take 2+3 bytes and have no better resolution than Deg*100.

    SMALLINT scaled -- Convert latitude into a SMALLINT SIGNED by doing (degrees / 90 * 32767) and rounding; longitude by (degrees / 180 * 32767).

FLOAT has 24 significant bits; DOUBLE has 53. (They don't work with PARTITIONing but are included for completeness. Often people use DOUBLE without realizing how much of an overkill it is, and how much space it takes.)

Sure, you could do Deg*1000 and other "in between" cases, but there is no advantage. Deg*1000 takes as much space as Deg*10000, but has less resolution.

So, go down the list to see how much resolution you need, then pick an encoding you are comfortable with. However, since we are about to use latitude as a "partition key", it must be limited to one of the INTs. For the sample code, I will use Deg*10000 (MEDIUMINT).
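For instance, converting raw coordinates to the Deg*10000 representation (the sample coordinates are placeholders):

SET @lat := ROUND(  37.7749 * 10000);   --   377749, fits in a MEDIUMINT
SET @lon := ROUND(-122.4194 * 10000);   -- -1224194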

    GCDist -- compute "great circle distance"

    GCDist is a helper FUNCTION that correctly computes the distance between two points on the globe.

    The code has been benchmarked at about 20 microseconds per call on a 2011-vintage PC. If you had to check a million points, that would take 20 seconds -- far too much for a web application. So, one goal of the Procedure that uses it will be to minimize the usage of this function. With the code presented here, the function need be called only a few dozen or few hundred times, except in pathological cases.

    Sure, you could use the Pythagorean formula. And it would work for most applications. But it does not take extra effort to do the GC. Furthermore, GC works across a pole and across the dateline. And, a Pythagorean function is not that much faster.

    For efficiency, GCDist understands the scaling you picked and has that stuff hardcoded. I am picking "Deg*10000", so the function expects 350000 for representing 35 degrees. If you choose a different scaling, you will need to change the code.

    GCDist() takes 4 scaled DOUBLEs -- lat1, lon1, lat2, lon2 -- and returns a scaled number of "degrees" representing the distance.

The table of representation choices says 52 feet of resolution for Deg*10000 and DECIMAL(x,4). Here is how it was calculated: to measure the diagonal between lat/lng (0,0) and (0.0001, 0.0001) (one 'unit in the last place'): GCDist(0,0,1,1) * 69.172 / 10000 * 5280 = 51.65, where

    • 69.172 miles/degree of latitude

    • 10000 units per degree for the scaling chosen

    • 5280 feet / mile.

    (No, this function does not compensate for the Earth being an oblate spheroid, etc.)

    Required table structure

    There will be one table (plus normalization tables as needed). The one table must be partitioned and indexed as indicated below.

    Fields and indexes

    • PARTITION BY RANGE(lat)

    • lat -- scaled latitude (see above)

    • lon -- scaled longitude

    • PRIMARY KEY(lon, lat, ...) -- lon must be first; something must be added to make it UNIQUE

    For most of this discussion, lat is assumed to be MEDIUMINT -- scaled from -90 to +90 by multiplying by 10000. Similarly for lon and -180 to +180.

    The PRIMARY KEY must

    • start with lon since the algorithm needs the "clustering" that InnoDB will provide, and

    • include lat somewhere, since it is the PARTITION key, and

    • contain something to make the key UNIQUE (lon+lat is unlikely to be sufficient).
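A minimal sketch of such a table, assuming the Deg*10000 (MEDIUMINT) scaling; it uses only a few wide partitions and a placeholder name column, whereas a real table would use many narrower latitude stripes and your own columns:

CREATE TABLE Locations (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    lat  MEDIUMINT NOT NULL,     -- scaled latitude:  degrees * 10000
    lon  MEDIUMINT NOT NULL,     -- scaled longitude: degrees * 10000
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (lon, lat, id),  -- lon first for the clustered range scan; id makes it UNIQUE
    INDEX (id)                   -- needed because id is AUTO_INCREMENT but not the PK
) ENGINE=InnoDB
PARTITION BY RANGE (lat) (
    PARTITION p0 VALUES LESS THAN (-600000),
    PARTITION p1 VALUES LESS THAN (-300000),
    PARTITION p2 VALUES LESS THAN (      0),
    PARTITION p3 VALUES LESS THAN ( 300000),
    PARTITION p4 VALUES LESS THAN ( 600000),
    PARTITION p5 VALUES LESS THAN MAXVALUE
);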

    The FindNearest PROCEDURE will do multiple SELECTs something like this:

    The query planner will

    • Do PARTITION "pruning" based on the latitude; then

• Within a PARTITION (which is effectively a table), use lon to do a 'clustered' range scan; then

    • Use the "condition" to filter down to the rows you desire, plus recheck lat. This design leads to very few disk blocks needing to be read, which is the main goal of the design.

    Note that this does not even call GCDist. That comes in the last pass when the ORDER BY and LIMIT are used.

The stored procedure has a loop. At least two SELECTs will be executed, but, with proper tuning, usually no more than about 6 SELECTs will be performed. Because of searching by the PRIMARY KEY, each SELECT hits only one block, sometimes more, of the table. Counting the number of blocks hit is a crude, but effective, way of comparing the performance of multiple designs. By comparison, a full table scan will probably touch thousands of blocks. A simple INDEX(lat) probably leads to hitting hundreds of blocks.

    Filtering... An argument to the FindNearest procedure includes a boolean expression ("condition") for a WHERE clause. If you don't need any filtering, pass in "1". To avoid "SQL injection", do not let web users put arbitrary expressions; instead, construct the "condition" from inputs they provide, thereby making sure it is safe.

    The algorithm

The algorithm is embodied in a stored procedure because of its complexity.

    • You feed it a starting width for a "square" and a number of items to find.

    • It builds a "square" around where you are.

    • A SELECT is performed to see how many items are in the square.

    • Loop, doubling the width of the square, until enough items are found.

    The next section ("Performance") should make this a bit clearer as it walks through some examples.

    Performance

    Because of all the variations, it is hard to get a meaningful benchmark. So, here is some hand-waving instead.

    Each SELECT is constrained by a "square" defined by a latitude range and a longitude range. (See the WHERE clause mentioned above, or in the sample code below.) Because of the way longitude lines warp, the longitude range of the "square" will be more degrees than the latitude range. Let's say the latitude partitioning is 3 degrees wide in the area where you are searching. That is over 200 miles (over 300km), so you are very likely to have a latitude range smaller than the partition width. Still, if you are reaching from the edge of a latitude stripe, the square could span two partitions. After partition pruning down to one (sometimes more) partition, the query is then constrained by a longitude range. (Remember, the PRIMARY KEY starts with lon.) If an InnoDB data block contains 100 rows (a handy Rule of Thumb), the select will touch one (or a few) block. If the square spans two (or more) partitions, then the same logic applies to each partition.

    So, scanning the square will involve as little as one block; rarely more than a few blocks. The number of blocks is mostly independent of the dataset size.

    The primary use case for this algorithm is when the data is significantly larger than will fit into cache (the buffer_pool). Hence, the main goal is to minimize the number of disk hits.

    Now let's look at some edge cases, and argue that the number of blocks is still better (usually) than with traditional indexing techniques.

    What if you are looking for Starbucks in a dense city? There would be dozens, maybe hundreds per square mile. If you start the guess at 100 miles, the SELECTs would be hitting lots of blocks -- not efficient. In this case, the "starting distance" should be small, say, 2 miles. Let's say your app wants the closest 10 stores. In this example, you would probably find more than 10 Starbucks within 2 miles in 1 InnoDB block in one partition. Even though there is a second SELECT to finish off the query, it would be hitting the same block. Total: One block hit == cheap.

    Let's say you start with a 5 mile square. Since there are upwards of 200 Starbucks within a 5-miles radius in some dense cities of the world, that might imply 300 in our "square". That maps to about 4 disk blocks, and a modest amount of CPU to chew through the 300 records. Still not bad.

    Now, suppose you are on an ocean liner somewhere in the Pacific. And there is one Starbucks onboard, but you are looking for the nearest 10. If you again start with 2 miles, it will take several iterations to find 10 sites. But, let's walk through it anyway. The first probe will hit one partition (maybe 2), and find just one hit. The second probe doubles the width of the square; 4 miles will still give you one hit -- the same hit in the same block, which is now cached, so we won't count it as a second disk I/O. Eventually the square will be wide enough to span multiple partitions. Each extra partition will be one new disk hit to discover no sites in the square. Finally, the square will hit Chile or Hawaii or Fiji and find some more sites, perhaps enough to stop the iteration. Since the main criteria in determining the number of disk hits is the number of partitions hit, we do not want to split the world into too many partitions. If there are, say, 40 partitions, then I have just described a case where there might be 20 disk hits.

    2-degree partitions might be good for a global table of stores or restaurants. A 5-mile starting distance might be good when filtering for Starbucks. 20 miles might be better for a department store.

Now, let's discuss the 'last' SELECT, wherein the square is expanded by SQRT(2) and the Great Circle formula is used to precisely order the N results. The SQRT(2) is in case the N items were all at the corners of the 'square'. Growing the square by this much allows us to catch any other sites that were just outside the old square.

    First, note that this 'last' SELECT is hitting the same block(s) that the iteration hit, plus possibly hitting some more blocks. It is hard to predict how many extra blocks might be hit. Here's a pathological case. You are in the middle of a desert; the square grows and grows. Eventually it finds N sites. There is a big city just outside the final square from the iterating. Now the 'last' SELECT kicks in, and it includes lots of sites in this big city. "Lots of sites" --> lots of blocks --> lots of disk hits.

    Discussion of reference code

    Here's the gist of the FindNearest().

    • Make a guess at how close to "me" to look.

    • See how many items are in a 'square' around me, after filtering.

    • If not enough, repeat, doubling the width of the square.

    • After finding enough, or giving up because we are looking "too far", make one last pass to get all the data, ORDERed and LIMITed

Note that the loop merely uses 'squares' of lat/lng ranges. This is crude, but works well with the partitioning and indexing, and avoids calling GCDist (until the last step). In the sample code, I picked 15 miles as the starting value. Adjusting this will have some impact on the Procedure's performance, but the impact will vary with the use cases. A rough way to set the radius is to guess what will find the desired LIMIT about half the time. (This value is hardcoded in the PROCEDURE.)

    Parameters passed into FindNearest():

    • your Latitude -- -90..90 (not scaled -- see hardcoded conversion in PROCEDURE)

    • your Longitude -- -180..180 (not scaled)

    • Start distance -- (miles or km) -- see discussion below

    • Max distance -- in miles or km -- see hardcoded conversion in PROCEDURE

The procedure will find the nearest items, up to Limit, that meet the Condition. But it will give up at Max distance. (If you are at the South Pole, why bother searching very far for the tenth pizza parlor?)

    Because of the "scaling", "hardcoding", "Condition", the table name, etc, this PROCEDURE is not truly generic; the code must be modified for each application. Yes, I could have designed it to pass all that stuff in. But what a mess.

    The "_start_dist" gives some control over the performance. Making this too small leads to extra iterations; too big leads to more rows being checked. If you choose to tune the Stored Procedure, do the following. "SELECT @iterations" after calling the SP for a number of typical values. If the value is usually 1, then decrease _start_dist. If it is usually 2 or more, then increase it.

    Timing: Under 10ms for "typical" usage; any dataset size. Slower for pathological cases (low min distance, high max distance, crossing dateline, bad filtering, cold cache, etc)

    End-cases:

    • By using GC distance, not Pythagoras, distances are 'correct' even near poles.

    • Poles -- Even if the "nearest" is almost 360 degrees away (longitude), it can find it.

    • Dateline -- There is a small, 'contained', piece of code for crossing the Dateline. Example: you are at +179 deg longitude, and the nearest item is at -179.

    The procedure returns one resultset, SELECT *, distance.

    • Only rows that meet your Condition, within Max distance are returned

    • At most Limit rows are returned

    • The rows will be ordered, "closest" first.

    • "dist" will be in miles or km (based on a hardcoded constant in the SP)

    Reference code, assuming deg*10000 and 'miles'

    This version is based on scaling "Deg*10000 (MEDIUMINT)".

    Postlog

    There is a "Haversine" algorithm that is twice as fast as the GCDist function here. But it has a fatal flaw of sometimes returning NULL for the distance between a point and itself. (This is because of computing a number slightly bigger than 1.0, then trying to take the ACOS of it.)

    See also

    Rick James graciously allowed us to use this article in the documentation.

His site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source:

    This page is licensed: CC BY-SA / Gnu FDL

    GUID/UUID Performance

    The problem

    GUIDs/UUIDs (Globally/Universally Unique Identifiers) are very random. Therefore, INSERTing into an index means jumping around a lot. Once the index is too big to be cached, most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second.


The problem described in this blog is mostly eliminated in MySQL 8.0 with the advent of a new function.

    SHOW VARIABLES LIKE 'have_query_cache';
    +------------------+-------+
    | Variable_name    | Value |
    +------------------+-------+
    | have_query_cache | YES   |
    +------------------+-------+
    SET GLOBAL query_cache_type = 1;
    SET GLOBAL query_cache_size = 2000000;
    SELECT * FROM t
    SELECT * from t
    /* retry */SELECT * FROM t
    /* retry2 */SELECT * FROM t
    SHOW VARIABLES LIKE 'query_cache_size';
    +------------------+----------+
    | Variable_name    | Value    |
    +------------------+----------+
    | query_cache_size | 67108864 |
    +------------------+----------+
    
    SET GLOBAL query_cache_size = 8000000;
    Query OK, 0 rows affected, 1 warning (0.03 sec)
    
    SHOW VARIABLES LIKE 'query_cache_size';
    +------------------+---------+
    | Variable_name    | Value   |
    +------------------+---------+
    | query_cache_size | 7999488 |
    +------------------+---------+
    SET GLOBAL query_cache_size=40000;
    Query OK, 0 rows affected, 2 warnings (0.03 sec)
    
    SHOW WARNINGS;
    +---------+------+-----------------------------------------------------------------+
    | Level   | Code | Message                                                         |
    +---------+------+-----------------------------------------------------------------+
    | Warning | 1292 | Truncated incorrect query_cache_size value: '40000'             |
    | Warning | 1282 | Query cache failed to set size 39936; new query cache size is 0 |
    +---------+------+-----------------------------------------------------------------+
    SHOW STATUS LIKE 'Qcache%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Qcache_free_blocks      | 1158     |
    | Qcache_free_memory      | 3760784  |
    | Qcache_hits             | 31943398 |
    | Qcache_inserts          | 42998029 |
    | Qcache_lowmem_prunes    | 34695322 |
    | Qcache_not_cached       | 652482   |
    | Qcache_queries_in_cache | 4628     |
    | Qcache_total_blocks     | 11123    |
    +-------------------------+----------+
    FLUSH QUERY CACHE;
    SHOW STATUS LIKE 'Qcache%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Qcache_free_blocks      | 1        |
    | Qcache_free_memory      | 6101576  |
    | Qcache_hits             | 31981126 |
    | Qcache_inserts          | 43002404 |
    | Qcache_lowmem_prunes    | 34696486 |
    | Qcache_not_cached       | 655607   |
    | Qcache_queries_in_cache | 4197     |
    | Qcache_total_blocks     | 8833     |
    +-------------------------+----------+
    1> SELECT * FROM T1
    +---+
    | a |
    +---+
    | 1 |
    +---+
    -- Here the query is cached
    
    -- From another connection execute:
    2> LOCK TABLES T1 WRITE;
    
    -- Expected result with: query_cache_wlock_invalidate = OFF
    1> SELECT * FROM T1
    +---+
    | a |
    +---+
    | 1 |
    +---+
    -- read from query cache
    
    
    -- Expected result with: query_cache_wlock_invalidate = ON
    1> SELECT * FROM T1
    -- Waiting Table Write Lock
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    BEGIN;
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=1>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=1>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    INSERT INTO T1 VALUES(2);  <invalidate queries FROM TABLE T1 AND disable query cache TO TABLE T1>
    SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    +---+
    SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    +---+
    COMMIT;  <query cache IS now turned ON TO T1 TABLE>
    SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=0>
    +---+
    | a |
    +---+
    | 1 |
    +---+
    struct timespec waittime;
            set_timespec_nsec(waittime,(ulong)(50000000L));  /* Wait for 50 msec */
            int res= mysql_cond_timedwait(&COND_cache_status_changed,
                                          &structure_guard_mutex, &waittime);
            if (res == ETIMEDOUT)
              break;
    SELECT SQL_NO_CACHE .... FROM (SELECT SQL_CACHE ...) AS temp_table
    SELECT SQL_CACHE .... FROM (SELECT SQL_NO_CACHE ...) AS temp_table

    MASTER_POS_WAIT()

    NOW()

    RAND()

    RELEASE_LOCK()

    SLEEP()

    SYSDATE()

    UNIX_TIMESTAMP() (no parameters)

    USER()

    UUID()

    UUID_SHORT()

    autocommit
    character_set_client
    character_set_results
    collation_connection
    sql_select_limit
    time_zone
    sql_mode
    max_sort_length
    group_concat_max_len
    default_week_format
    div_precision_increment
    lc_time_names
    BENCHMARK()
    CONNECTION_ID()
    CONVERT_TZ()
    CURDATE()
    CURRENT_DATE()
    CURRENT_TIME()
    CURRENT_TIMESTAMP()
    CURTIME()
    DATABASE()
    ENCRYPT()
    FOUND_ROWS()
    GET_LOCK()
    LAST_INSERT_ID()
    LOAD_FILE()

    WHERE t1.aa = 123 AND t2.bb = 456 -- You must only consider columns in the current table.

xxx IS NOT NULL

Add the column in the range to your putative INDEX.

    WHERE aaa >= 123 ORDER BY aaa ⇒ INDEX(aaa) -- Bonus: The ORDER BY will use the INDEX.
• WHERE aaa >= 123 ORDER BY aaa DESC ⇒ INDEX(aaa) -- Same Bonus.

  • WHERE aaa = 123 GROUP BY xxx, (a+b) ⇒ INDEX(aaa) -- expression in GROUP BY, so no use including even xxx.

    WHERE ccc > 432 ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc) -- This "range" is compatible with ORDER BY

    You should not have redundant indexes. (See below.)

    "InnoDB" -- More effecient than MyISAM because of the way the PRIMARY KEY is clustered with the data in InnoDB.

  • "INDEX(y_id, x_id)" -- The PRIMARY KEY makes it efficient to go one direction; this index makes the other direction efficient. No need to say UNIQUE; that would be extra effort on INSERTs.

• In the secondary index, saying just INDEX(y_id) would work because it would implicitly include x_id. But I would rather make it more obvious that I am hoping for a 'covering' index.

  • etc.

    "Commajoin" -- That is FROM a, b WHERE a.x=b.x instead of FROM a JOIN b ON a.x=b.x

    When would meta_key or meta_value ever be NULL?

    Some discussion of JOIN

  • Indexing 101: Optimizing MySQL queries on a single table (Stephane Combaudon - Percona)

  • A complex query, well explained.

  • More on prefix indexing
    Another variant
    IODKU
    Find nearest 10 pizza parlors
    Percona 2015 Tutorial Slides
    ORDER BY Optimization
    example
    MySQL manual page on range accesses in composite indexes
    Rick James' site
    random
    id -- (optional) you may need to identify the rows for your purposes; AUTO_INCREMENT if you like
  • INDEX(id) -- if id is AUTO_INCREMENT, then this plain INDEX (not UNIQUE, not PRIMARY KEY) is necessary

  • ENGINE=InnoDB -- so the PRIMARY KEY will be "clustered"

  • Other indexes -- keep to a minimum (this is a general performance rule for large tables)

  • Now, a 'last' SELECT is performed to get the exact distances, sort them (ORDER BY) and LIMIT to the desired number.

  • If spanning a pole or the dateline, a more complex SELECT is used.

  • Limit -- maximum number of items to return

  • Condition -- something to put after 'AND' (more discussion above)

  • Z-ordering
    SQRT(...)
    PARTITIONs
    FLOAT
    DOUBLE
    DECIMAL
    INT
    SMALLINT
    DECIMAL(4,2)
    FLOAT
    DOUBLE
    MEDIUMINT
    stored procedure
    stored procedure
    stored procedure
    Cities used for testing
    A forum thread
    StackOverflow discussion
    Sample
    Rick James' site
    latlng
    INDEX(last_name, first_name) -- the order of the list.
        WHERE last_name = 'James' AND first_name = 'Rick'  -- best case
        WHERE last_name = 'James' AND first_name LIKE 'R%' -- pretty good
        WHERE last_name LIKE 'J%' AND first_name = 'Rick'  -- pretty bad
    SELECT ... FROM t WHERE flag = true;
    SELECT ... FROM t WHERE flag = true AND date >= '2015-01-01';
    INDEX(x)
    UPDATE t SET x = ... WHERE ...;
    INDEX(z, x)
    UPDATE t SET x = ... WHERE ...;
    ( SELECT ... WHERE a=1 )   -- and have INDEX(a)
       UNION DISTINCT -- "DISTINCT" is assuming you need to get rid of dups
       ( SELECT ... WHERE b=2 )   -- and have INDEX(b)
       GROUP BY ... ORDER BY ...  -- whatever you had at the end of the original query
    ( SELECT ... LIMIT 200 )   -- Note: OFFSET 0, LIMIT 190+10
       UNION DISTINCT -- (or ALL)
       ( SELECT ... LIMIT 200 )
       LIMIT 190, 10              -- Same as originally
    INDEX(last_name(2), first_name)
    date_col >= '2016-01-01'
        AND date_col  < '2016-01-01' + INTERVAL 3 MONTH
    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN (
            SELECT  x
                FROM  b
                WHERE  test_b
                    );
    ⇒
    SELECT  ...
        FROM  a
        JOIN  b USING(x)
        WHERE  test_a
          AND  test_b;
    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN ( SELECT  x  FROM ... );
    ⇒
    SELECT  ...
        FROM  a
        JOIN        ( SELECT  x  FROM ... ) b
            USING(x)
        WHERE  test_a;
    SELECT  DISTINCT
            a.*,
            b.y
        FROM a
        JOIN b
    ⇒
    SELECT  a.*,
            ( SELECT GROUP_CONCAT(b.y) FROM b WHERE b.x = a.x ) AS ys
        FROM a
    SELECT  a.*
        FROM a
        JOIN b  ON b.x = a.x
        GROUP BY a.id
    ⇒
SELECT  a.*
        FROM a
        WHERE EXISTS ( SELECT *  FROM b  WHERE b.x = a.x )
    CREATE TABLE XtoY (
            # No surrogate id for this table
            x_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to one table
            y_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to the other table
            # Include other fields specific to the 'relation'
            PRIMARY KEY(x_id, y_id),            -- When starting with X
            INDEX      (y_id, x_id)             -- When starting with Y
        ) ENGINE=InnoDB;
    WHERE x = 1
          AND MATCH (...) AGAINST (...)
    CREATE TABLE wp_postmeta (
          meta_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
          post_id BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
          meta_key VARCHAR(255) DEFAULT NULL,
          meta_value LONGTEXT,
          PRIMARY KEY (meta_id),
          KEY post_id (post_id),
          KEY meta_key (meta_key)
        ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
    CREATE TABLE wp_postmeta (
            post_id BIGINT UNSIGNED NOT NULL,
            meta_key VARCHAR(255) NOT NULL,
            meta_value LONGTEXT NOT NULL,
            PRIMARY KEY(post_id, meta_key),
            INDEX(meta_key)
            ) ENGINE=InnoDB;
    Datatype           Bytes       resolution
       ------------------ -----  --------------------------------
       Deg*100 (SMALLINT)     4  1570 m    1.0 mi  Cities
       DECIMAL(4,2)/(5,2)     5  1570 m    1.0 mi  Cities
       SMALLINT scaled        4   682 m    0.4 mi  Cities
       Deg*10000 (MEDIUMINT)  6    16 m     52 ft  Houses/Businesses
       DECIMAL(6,4)/(7,4)     7    16 m     52 ft  Houses/Businesses
       MEDIUMINT scaled       6   2.7 m    8.8 ft
       FLOAT                  8   1.7 m    5.6 ft
       DECIMAL(8,6)/(9,6)     9    16cm    1/2 ft  Friends in a mall
       Deg*10000000 (INT)     8    16mm    5/8 in  Marbles
       DOUBLE                16   3.5nm     ...    Fleas on a dog
    WHERE lat    BETWEEN @my_lat - @dlat
                           AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
            AND lon    BETWEEN @my_lon - @dlon
                           AND @my_lon + @dlon   -- first part of PK
            AND condition                        -- filter out non-pizza parlors
    DELIMITER //
    
    DROP function IF EXISTS GCDist //
    CREATE FUNCTION GCDist (
            _lat1 DOUBLE,  -- Scaled Degrees north for one point
            _lon1 DOUBLE,  -- Scaled Degrees west for one point
            _lat2 DOUBLE,  -- other point
            _lon2 DOUBLE
        ) RETURNS DOUBLE
        DETERMINISTIC
        CONTAINS SQL  -- SQL but does not read or write
        SQL SECURITY INVOKER  -- No special privileges granted
    -- Input is a pair of latitudes/longitudes multiplied by 10000.
    --    For example, the south pole has latitude -900000.
    -- Multiply output by .0069172 to get miles between the two points
    --    or by .0111325 to get kilometers
    BEGIN
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
        DECLARE _rlat1 DOUBLE DEFAULT _deg2rad * _lat1;
        DECLARE _rlat2 DOUBLE DEFAULT _deg2rad * _lat2;
        -- compute as if earth's radius = 1.0
        DECLARE _rlond DOUBLE DEFAULT _deg2rad * (_lon1 - _lon2);
        DECLARE _m     DOUBLE DEFAULT COS(_rlat2);
        DECLARE _x     DOUBLE DEFAULT COS(_rlat1) - _m * COS(_rlond);
        DECLARE _y     DOUBLE DEFAULT               _m * SIN(_rlond);
        DECLARE _z     DOUBLE DEFAULT SIN(_rlat1) - SIN(_rlat2);
        DECLARE _n     DOUBLE DEFAULT SQRT(
                            _x * _x +
                            _y * _y +
                            _z * _z    );
        RETURN  2 * ASIN(_n / 2) / _deg2rad;   -- again--scaled degrees
    END;
    //
    DELIMITER ;
    
    DELIMITER //
    -- FindNearest (about my 6th approach)
    DROP PROCEDURE IF EXISTS FindNearest6 //
    CREATE
    PROCEDURE FindNearest (
            IN _my_lat DOUBLE,  -- Latitude of me [-90..90] (not scaled)
            IN _my_lon DOUBLE,  -- Longitude [-180..180]
            IN _START_dist DOUBLE,  -- Starting estimate of how far to search: miles or km
            IN _max_dist DOUBLE,  -- Limit how far to search: miles or km
            IN _limit INT,     -- How many items to try to get
            IN _condition VARCHAR(1111)   -- will be ANDed in a WHERE clause
        )
        DETERMINISTIC
    BEGIN
        -- lat and lng are in degrees -90..+90 and -180..+180
        -- All computations done in Latitude degrees.
        -- Thing to tailor
        --   *Locations* -- the table
        --   Scaling of lat, lon; here using *10000 in MEDIUMINT
        --   Table name
        --   miles versus km.
    
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
    
        -- Cannot use params in PREPARE, so switch to @variables:
        -- Hardcoded constant:
        SET @my_lat := _my_lat * 10000,
            @my_lon := _my_lon * 10000,
            @deg2dist := 0.0069172,  -- 69.172 for miles; 111.325 for km  *** (mi vs km)
            @start_deg := _start_dist / @deg2dist,  -- Start with this radius first (eg, 15 miles)
            @max_deg := _max_dist / @deg2dist,
            @cutoff := @max_deg / SQRT(2),  -- (slightly pessimistic)
            @dlat := @start_deg,  -- note: must stay positive
            @lon2lat := COS(_deg2rad * @my_lat),
            @iterations := 0;        -- just debugging
    
        -- Loop through, expanding search
        --   Search a 'square', repeat with bigger square until find enough rows
        --   If the inital probe found _limit rows, then probably the first
        --   iteration here will find the desired data.
        -- Hardcoded table name:
        -- This is the "first SELECT":
        SET @sql = CONCAT(
            "SELECT COUNT(*) INTO @near_ct
                FROM Locations
                WHERE lat    BETWEEN @my_lat - @dlat
                                 AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                  AND lon    BETWEEN @my_lon - @dlon
                                 AND @my_lon + @dlon   -- first part of PK
                  AND ", _condition);
        PREPARE _sql FROM @sql;
        MainLoop: LOOP
            SET @iterations := @iterations + 1;
            -- The main probe: Search a 'square'
            SET @dlon := ABS(@dlat / @lon2lat);  -- good enough for now  -- note: must stay positive
            -- Hardcoded constants:
            SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  -- near a Pole
            EXECUTE _sql;
            IF ( @near_ct >= _limit OR         -- Found enough
                 @dlat >= @cutoff ) THEN       -- Give up (too far)
                LEAVE MainLoop;
            END IF;
            -- Expand 'square':
            SET @dlat := LEAST(2 * @dlat, @cutoff);   -- Double the radius to search
        END LOOP MainLoop;
        DEALLOCATE PREPARE _sql;
    
        -- Out of loop because found _limit items, or going too far.
        -- Expand range by about 1.4 (but not past _max_dist),
        -- then fetch details on nearest 10.
    
        -- Hardcoded constant:
        SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                    @max_deg,
                    GCDist(ABS(@my_lat), @my_lon,
                           ABS(@my_lat) - @dlat, @my_lon - @dlon) );
                -- ABS: go toward equator to find farthest corner (also avoids poles)
                -- Dateline: not a problem (see GCDist code)
    
        -- Reach for longitude line at right angle:
        -- sin(dlon)*cos(lat) = sin(dlat)
        -- Hardcoded constant:
        SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                                 COS(_deg2rad * @my_lat))
                                / _deg2rad -- precise
                            , 3600001);    -- must be too near a pole
    
        -- This is the "last SELECT":
        -- Hardcoded constants:
        IF (ABS(@my_lon) + @dlon < 1800000 OR    -- Usual case - not crossing dateline
            ABS(@my_lat) + @dlat <  900000) THEN -- crossing pole, so dateline not an issue
            -- Hardcoded table name:
            SET @sql = CONCAT(
                "SELECT *,
                        @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @my_lon - @dlon
                                  AND @my_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, "
                    ORDER BY dist
                    LIMIT ", _limit
                            );
        ELSE
            -- Hardcoded constants and table name:
            -- Circle crosses dateline, do two SELECTs, one for each side
            SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
            SET @east_lon := @west_lon + 3600000;
            -- One of those will be beyond +/- 180; this gets points beyond the dateline
            SET @sql = CONCAT(
                "( SELECT *,
                        @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @west_lon - @dlon
                                  AND @west_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                UNION ALL
                ( SELECT *,
                        @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @east_lon - @dlon
                                  AND @east_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                ORDER BY dist
                LIMIT ", _limit
                            );
        END IF;
    
        PREPARE _sql FROM @sql;
        EXECUTE _sql;
        DEALLOCATE PREPARE _sql;
    END;
    //
    DELIMITER ;
    
Sample
    
Find the 5 cities with non-zero population (out of 3 million) nearest to (+35.15, -90.05). Start with a 10-mile bounding box and give up at 100 miles.
    
    CALL FindNearest(35.15, -90.05, 10, 100, 5, 'population > 0');
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | id      | lat    | lon     | country | ascii_city   | city         | state | population | @gcd_ct := 0 | dist                | @gcd_ct := @gcd_ct + 1 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | 3023545 | 351494 | -900489 | us      | memphis      | Memphis      | TN    |     641608 |            0 | 0.07478733189367963 |                      3 |
    | 2917711 | 351464 | -901844 | us      | west memphis | West Memphis | AR    |      28065 |            0 |   7.605683607627499 |                      2 |
    | 2916457 | 352144 | -901964 | us      | marion       | Marion       | AR    |       9227 |            0 |     9.3994963998986 |                      1 |
    | 3020923 | 352044 | -898739 | us      | bartlett     | Bartlett     | TN    |      43264 |            0 |  10.643941157860604 |                      7 |
    | 2974644 | 349889 | -900125 | us      | southaven    | Southaven    | MS    |      38578 |            0 |  11.344042217329935 |                      5 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    5 rows in set (0.00 sec)
    Query OK, 0 rows affected (0.04 sec)
    
    SELECT COUNT(*) FROM ll_table;
    +----------+
    | COUNT(*) |
    +----------+
    |  3173958 |
    +----------+
    1 row in set (5.04 sec)
    
    FLUSH STATUS;
    CALL...
    SHOW SESSION STATUS LIKE 'Handler%';
    
    SHOW session status LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_read_first         | 1     |
    | Handler_read_key           | 3     |
    | Handler_read_next          | 1307  |  -- some index, some tmp, but far less than 3 million.
    | Handler_read_rnd           | 5     |
    | Handler_read_rnd_next      | 13    |
    | Handler_write              | 12    |  -- it needed a tmp
    +----------------------------+-------+
    Why it is a problem

    A 'standard' GUID/UUID is composed of the time, machine identification and some other stuff. The combination should be unique, even without coordination between different computers that could be generating UUIDs simultaneously.

The top part of the GUID/UUID is the bottom part of the current time. The top part is the primary part of what would be used for placing the value in an ordered list (INDEX). This cycles in about 7.16 minutes: the low 32 bits of the 100-nanosecond timestamp wrap after 2^32 * 100 ns, which is roughly 429 seconds.

    Some math... If the index is small enough to be cached in RAM, each insert into the index is CPU only, with the writes being delayed and batched. If the index is 20 times as big as can be cached, then 19 out of 20 inserts will be a cache miss. (This math applies to any "random" index.)

    Second problem

    36 characters is bulky. If you are using that as a PRIMARY KEY in InnoDB and you have secondary keys, remember that each secondary key has an implicit copy of the PK, thereby making it bulky.

It is tempting to declare the UUID as VARCHAR(36). And, since you are probably thinking globally, you use CHARACTER SET utf8 (or utf8mb4). For utf8:

    • 2 - Overhead for VAR

    • 36 - chars

• 3 (or 4) bytes per character for utf8 (or utf8mb4)

So, the maximum length is 2 + 3*36 = 110 bytes (or 2 + 4*36 = 146 for utf8mb4). For temp tables, 108 (or 144) bytes are actually used if a MEMORY table is used.
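A sketch of the two declarations being compared (the table and column names are illustrative):

CREATE TABLE t_bulky   (uuid VARCHAR(36) CHARACTER SET utf8mb4, PRIMARY KEY (uuid));  -- up to 146 bytes per value
CREATE TABLE t_compact (uuid BINARY(16), PRIMARY KEY (uuid));                         -- always 16 bytes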

    To compress

    • utf8 is unnecessary (ascii would do); but this is obviated by the next two steps

    • Toss dashes

• UNHEX it. Now it will fit in 16 bytes: BINARY(16)
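A minimal sketch of those two steps, using the manual's sample value quoted in the next section:

SET @u := '6ccd780c-baba-1026-9564-0040f4311e29';
SELECT LENGTH(UNHEX(REPLACE(@u, '-', '')));  -- 16 bytes, so the result fits in BINARY(16)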

    Combining the problems and crafting a solution

But first, a caveat. This solution only works for "Time based" / "Version 1" UUIDs. They are recognizable by the "1" at the beginning of the third clump.

The manual's sample: 6ccd780c-baba-1026-9564-0040f4311e29. A more current value (after a few years): 49ea2de3-17a2-11e2-8346-001eecac3efa. Notice how the 3rd part has slowly changed over time? Let's rearrange the data, thus:

1026-baba-6ccd780c-9564-0040f4311e29
11e2-17a2-49ea2de3-8346-001eecac3efa
11e2-17ac-106762a5-8346-001eecac3efa -- after a few more minutes

    Now we have a number that increases nicely over time. Multiple sources won't be quite in time order, but they will be close. The "hot" spot for inserting into an INDEX(uuid) will be rather narrow, thereby making it quite cacheable and efficient.

    If your SELECTs tend to be for "recent" uuids, then they, too, will be easily cached. If, on the other hand, your SELECTs often reach for old uuids, they will be random and not well cached. Still, improving the INSERTs will help the system overall.

    Code to do it

    Let's make Stored Functions to do the messy work of the two actions:

    • Rearrange fields

• Convert to/from BINARY(16)

DELIMITER //

CREATE FUNCTION UuidToBin(_uuid BINARY(36))
    RETURNS BINARY(16)
    LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
RETURN
    UNHEX(CONCAT(
        SUBSTR(_uuid, 15, 4),
        SUBSTR(_uuid, 10, 4),
        SUBSTR(_uuid,  1, 8),
        SUBSTR(_uuid, 20, 4),
        SUBSTR(_uuid, 25) ));
//
CREATE FUNCTION UuidFromBin(_bin BINARY(16))
    RETURNS BINARY(36)
    LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
RETURN
    LCASE(CONCAT_WS('-',
        HEX(SUBSTR(_bin,  5, 4)),
        HEX(SUBSTR(_bin,  3, 2)),
        HEX(SUBSTR(_bin,  1, 2)),
        HEX(SUBSTR(_bin,  9, 2)),
        HEX(SUBSTR(_bin, 11))
             ));
//
DELIMITER ;

Then you would do things like:

-- Letting MySQL create the UUID:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...);

-- Creating the UUID elsewhere:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...);

-- Retrieving (point query using uuid):
SELECT ... FROM t WHERE uuid = UuidToBin(?);

-- Retrieving (other):
SELECT UuidFromBin(uuid), ... FROM t ...;

Do not flip the WHERE; this would be inefficient because it won't use INDEX(uuid):

WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29' -- NO

    TokuDB

TokuDB has been deprecated by its upstream maintainer. It is disabled from MariaDB 10.5 and has been removed in MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-term migration path.

    TokuDB is a viable engine if you must have UUIDs (even non-type-1) in a huge table. TokuDB is available in MariaDB as a 'standard' engine, making the barrier to entry very low. There are a small number of differences between InnoDB and TokuDB; I will not go into them here.

TokuDB, with its "fractal" indexing strategy, builds the indexes in stages. In contrast, InnoDB inserts index entries "immediately" — actually, that indexing is buffered by most of the size of the buffer_pool. To elaborate…

    When adding a record to an InnoDB table, here are (roughly) the steps performed to write the data (and PK) and secondary indexes to disk. (I leave out logging, provision for rollback, etc.) First the PRIMARY KEY and data:

    • Check for UNIQUEness constraints

    • Fetch the BTree block (normally 16KB) that should contain the row (based on the PRIMARY KEY).

    • Insert the row (overflow typically occurs 1% of the time; this leads to a block split).

• Leave the page "dirty" in the buffer_pool, hoping that more rows are added before it is bumped out of cache (buffer_pool). Note that for AUTO_INCREMENT and TIMESTAMP-based PKs, the "last" block in the data will be updated repeatedly before splitting; hence, this delayed write adds greatly to the efficiency. OTOH, a UUID will be very random; when the table is big enough, the block will almost always be flushed before a second insert occurs in that block. This is the inefficiency in UUIDs. Now for any secondary keys:

    • All the steps are the same, since an index is essentially a "table" except that the "data" is a copy of the PRIMARY KEY.

    • UNIQUEness must be checked immediately — cannot delay the read.

    • There are (I think) some other "delays" that avoid some I/O.

    Tokudb, on the other hand, does something like

• Write partially sorted data/index records to disk before finding out exactly where they belong.

    • In the background, combine these partially digested blocks. Repeat as needed.

    • Eventually move the info into the real table/indexes.

    If you are familiar with how sort-merge works, consider the parallels to Tokudb. Each "sort" does some work of ordering things; each "merge" is quite efficient.

    To summarize:

    • In the extreme (data/index much larger than buffer_pool), InnoDB must read-modify-write one 16KB disk block for each UUID entry.

    • Tokudb makes each I/O "count" by merging several UUIDs for each disk block. (Yeah, Toku rereads blocks, but it comes out ahead in the long run.)

    • Tokudb excels when the table is really big, which implies high ingestion rate.

    Wrapup

This shows three things for speeding up usage of GUIDs/UUIDs:

    • Shrink footprint (Smaller -> more cacheable -> faster).

• Rearrange the uuid to make a "hot spot" to improve cacheability.

    • Use TokuDB (MyRocks shares some architectural traits which may also be beneficial in handling UUIDs, but this is hypothetical and hasn't been tested)

    Note that the benefit of the "hot spot" is only partial:

    • Chronologically ordered (or approximately ordered) INSERTs benefit; random ones don't.

    • SELECTs/UPDATEs by "recent" uuids benefit; old ones don't benefit.

    Postlog

    Thanks to Trey for some of the ideas here.

    The tips in this document apply to MySQL, MariaDB, and Percona.

    Written Oct, 2012. Added TokuDB, Jan, 2015.

    See Also

    • UUID data type

    • Detailed discussion of UUID indexing

    • Graphical display of the random nature of UUID on PRIMARY KEY

    • Benchmarks, etc, by Karthik Appigatla

• NHibernate can generate sequential GUIDs, but it seems to be backwards.

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: uuid

    This page is licensed: CC BY-SA / Gnu FDL


    Index Hints: How to Force Query Plans

The optimizer is largely cost-based and will try to choose the optimal plan for any query. However, in some cases it does not have enough information to choose a perfect plan, and in these cases you may have to provide hints to force the optimizer to use another plan.

    You can examine the query plan for a SELECT by writing EXPLAIN before the statement. SHOW EXPLAIN shows the output of a running query. In some cases, its output can be closer to reality than EXPLAIN.
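For example, a minimal sketch (the query and the connection id 123 are illustrative; find the id of a running statement with SHOW PROCESSLIST):

EXPLAIN SELECT SUM(Population) FROM City WHERE CountryCode="SWE";
SHOW EXPLAIN FOR 123;  -- plan of the statement currently running in connection 123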

    For the following queries, we will use the world database for the examples.

    Setting up the World Example Database

Download world.sql.gz.

Install it with:

mariadb-admin create world
zcat world.sql.gz | ../client/mysql world

or

mariadb-admin create world
gunzip world.sql.gz
../client/mysql world < world.sql

    Forcing Join Order

You can force the join order by using STRAIGHT_JOIN in either the SELECT or the JOIN part.

The simplest way to force the join order is to put the tables in the correct order in the FROM clause and use SELECT STRAIGHT_JOIN like so:

SELECT STRAIGHT_JOIN SUM(City.Population) FROM Country,City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

If you only want to force the join order for a few tables, use STRAIGHT_JOIN in the FROM clause. When this is done, only tables connected with STRAIGHT_JOIN will have their order forced. For example:

SELECT SUM(City.Population) FROM Country STRAIGHT_JOIN City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

    In both of the above cases Country will be scanned first and for each matching country (one in this case) all rows in City will be checked for a match. As there is only one matching country this will be faster than the original query.

The output of EXPLAIN for the above cases is:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | Using where
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where; Using join buffer (flat, BNL join)

This is one of the few cases where ALL is ok, as the scan of the Country table will only find one matching row.

    Forcing Usage of a Specific Index for the WHERE Clause

    In some cases the optimizer may choose a non-optimal index or it may choose to not use an index at all, even if some index could theoretically be used.

    In these cases you have the option to either tell the optimizer to only use a limited set of indexes, ignore one or more indexes, or force the usage of some particular index.

    USE INDEX: Use a Limited Set of Indexes

You can limit which indexes are considered with the USE INDEX option:

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

    USE INDEX is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ref | CountryCode | CountryCode | 3 | const | 14 | Using where

If we had not used USE INDEX, the Name index would have been in possible_keys.

    IGNORE INDEX: Don't Use a Particular Index

You can tell the optimizer to not consider some particular index with the IGNORE INDEX option:

IGNORE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

This is used after the table name in the FROM clause:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ref | CountryCode | CountryCode | 3 | const | 14 | Using where

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

    FORCE INDEX: Forcing an Index

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | range | Name | Name | 35 | NULL | 4079 | Using where

FORCE INDEX works by only considering the given indexes (like with USE INDEX) but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

    Index Prefixes

When using index hints (USE INDEX, FORCE INDEX or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
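For example, a sketch based on the rule above, assuming the Name and CountryCode indexes created in the earlier examples; the prefix Countr can only match CountryCode:

EXPLAIN SELECT Name FROM City FORCE INDEX (Countr)
WHERE countrycode="SWE";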

    Forcing an Index to be Used for ORDER BY or GROUP BY

The optimizer will try to use indexes to resolve ORDER BY and GROUP BY.

You can use USE INDEX, IGNORE INDEX and FORCE INDEX as in the WHERE clause above to ensure that some specific index is used:

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

    This is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,COUNT(*) FROM City
FORCE INDEX FOR GROUP BY (Name)
WHERE population >= 10000000 GROUP BY Name;

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | index | NULL | Name | 35 | NULL | 4079 | Using where

Without the FORCE INDEX option we would have 'Using where; Using temporary; Using filesort' in the 'Extra' column, which means that the optimizer would have created a temporary table and sorted it.

    Help the Optimizer Optimize GROUP BY and ORDER BY

The optimizer uses several strategies to optimize GROUP BY and ORDER BY:

• Resolve with an index:

  • Scan the table in index order and output data as we go. (This only works if the ORDER BY / GROUP BY can be resolved by an index after constant propagation is done).

• Filesort:

  • Scan the table to be sorted and collect the sort keys in a temporary file.

  • Sort the keys + reference to row (with filesort).

  • Scan the table in sorted order.

• Use a temporary table for ORDER BY:

  • Create a temporary (in memory) table for the 'to-be-sorted' data. (If this gets bigger than max_heap_table_size or contains blobs then an Aria or MyISAM disk based table will be used).

  • Sort the keys + reference to row (with filesort).

  • Scan the table in sorted order.

A temporary table will always be used if the fields which will be sorted are not from the first table in the JOIN order.

• Use a temporary table for GROUP BY:

  • Create a temporary table to hold the GROUP BY result with an index that matches the GROUP BY fields.

  • Produce a result row.

  • If a row with the GROUP BY key exists in the temporary table, add the new result row to it. If not, create a new row.

  • Before sending the results to the user, sort the rows with filesort to get the results in GROUP BY order.

Forcing/Disallowing Temporary Tables to be Used for GROUP BY:

Using an in-memory table (as described above) is usually the fastest option for GROUP BY if the result set is small. It is not optimal if the result set is very big. You can tell the optimizer this by using SELECT SQL_SMALL_RESULT or SELECT SQL_BIG_RESULT.

For example:

EXPLAIN SELECT SQL_SMALL_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;

    produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using temporary; Using filesort

while:

EXPLAIN SELECT SQL_BIG_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;

    produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using filesort

    The difference is that with SQL_SMALL_RESULT a temporary table is used.

    Forcing Usage of Temporary Tables

    In some cases you may want to force the use of a temporary table for the result to free up the table/row locks for the used tables as quickly as possible.

You can do this with the SQL_BUFFER_RESULT option:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT SQL_BUFFER_RESULT Name,COUNT(*) AS Cities FROM City
GROUP BY Name HAVING Cities > 2;

    This produces:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | City | index | NULL | Name | 35 | NULL | 4079 | Using index; Using temporary

    Without SQL_BUFFER_RESULT, the above query would not use a temporary table for the result set.

    Optimizer Switch

MariaDB added an optimizer switch which allows you to specify which algorithms will be considered when optimizing a query.

See the optimizer section for more information about the different algorithms which are used.
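For example, a minimal sketch of toggling two of the join-cache algorithms for the current session (the two flags shown are just a sample of the available switches):

SET SESSION optimizer_switch = 'join_cache_hashed=off,outer_join_with_cache=on';
SELECT @@optimizer_switch;  -- verify the resulting settings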

    See Also

    This page is licensed: CC BY-SA / Gnu FDL


    Full-Text Index Stopwords

    Stopwords are used to provide a list of commonly-used words that can be ignored for the purposes of Full-text-indexes.

    Full-text indexes built in MyISAM and InnoDB have different stopword lists by default.

    MyISAM Stopwords

    For full-text indexes on MyISAM tables, by default, the list is built from the file storage/myisam/ft_static.c, and searched using the server's character set and collation. The ft_stopword_file system variable allows the default list to be overridden with words from another file, or for stopwords to be ignored altogether.

If the stopword list is changed, any existing full-text indexes need to be rebuilt.
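For example, a sketch of overriding the default list (the file path and table name are illustrative):

-- ft_stopword_file cannot be changed at runtime; set it in the server configuration:
--   [mysqld]
--   ft_stopword_file = /etc/mysql/my_stopwords.txt    -- or '' to ignore stopwords entirely
-- After a restart, rebuild any existing MyISAM full-text index, e.g.:
REPAIR TABLE articles QUICK;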

The following table shows the default list of stopwords, although you should always treat storage/myisam/ft_static.c as the definitive list. See the Fulltext Index Overview for more details, and Full-Text Indexes for related articles.

    InnoDB Stopwords

Stopwords on full-text indexes are only enabled if the innodb_ft_enable_stopword system variable is set (by default it is) at the time the index is created.

    The stopword list is determined as follows:

• If the innodb_ft_user_stopword_table system variable is set, that table is used as a stopword list.

• If innodb_ft_user_stopword_table is not set, the table set by innodb_ft_server_stopword_table is used.

• If neither variable is set, the built-in list is used, which can be viewed by querying the INNODB_FT_DEFAULT_STOPWORD table in the Information Schema.

In the first two cases, the specified table must exist at the time the system variable is set and the full-text index created. It must be an InnoDB table with a single VARCHAR column named VALUE.
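For example, a sketch of the first case (the database name, table name, and word values are illustrative):

CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE=InnoDB;
INSERT INTO mydb.my_stopwords (value) VALUES ('und'), ('la');
SET GLOBAL innodb_ft_user_stopword_table = 'mydb/my_stopwords';
-- Full-text indexes created (or rebuilt) after this point will use the custom list.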

The default InnoDB stopword list differs from the default MyISAM list, being much shorter, and contains the following words:

a, about, an, are, as, at, be, by, com, de, en, for, from, how, i, in, is, it, la, of, on, or, that, the, this, to, was, what, when, where, who, will, with, und, the, www

    This page is licensed: CC BY-SA / Gnu FDL


a's

able

about

above

according

accordingly

across

actually

after

afterwards

again

against

ain't

all

allow

allows

almost

    alone

    along

    already

    also

    although

    always

    am

    among

    amongst

    an

    and

    another

    any

    anybody

    anyhow

    anyone

    anything

    anyway

    anyways

    anywhere

    apart

    appear

    appreciate

    appropriate

    are

    aren't

    around

    as

    aside

    ask

    asking

    associated

    at

    available

    away

    awfully

    be

    became

    because

    become

    becomes

    becoming

    been

    before

    beforehand

    behind

    being

    believe

    below

    beside

    besides

    best

    better

    between

    beyond

    both

    brief

    but

    by

    c'mon

    c's

    came

    can

    can't

    cannot

    cant

    cause

    causes

    certain

    certainly

    changes

    clearly

    co

    com

    come

    comes

    concerning

    consequently

    consider

    considering

    contain

    containing

    contains

    corresponding

    could

    couldn't

    course

    currently

    definitely

    described

    despite

    did

    didn't

    different

    do

    does

    doesn't

    doing

    don't

    done

    down

    downwards

    during

    each

    edu

    eg

    eight

    either

    else

    elsewhere

    enough

    entirely

    especially

    et

    etc

    even

    ever

    every

    everybody

    everyone

    everything

    everywhere

    ex

    exactly

    example

    except

    far

    few

    fifth

    first

    five

    followed

    following

    follows

    for

    former

    formerly

    forth

    four

    from

    further

    furthermore

    get

    gets

    getting

    given

    gives

    go

    goes

    going

    gone

    got

    gotten

    greetings

    had

    hadn't

    happens

    hardly

    has

    hasn't

    have

    haven't

    having

    he

    he's

    hello

    help

    hence

    her

    here

    here's

    hereafter

    hereby

    herein

    hereupon

    hers

    herself

    hi

    him

    himself

    his

    hither

    hopefully

    how

    howbeit

    however

    i'd

    i'll

    i'm

    i've

    ie

    if

    ignored

    immediate

    in

    inasmuch

    inc

    indeed

    indicate

    indicated

    indicates

    inner

    insofar

    instead

    into

    inward

    is

    isn't

    it

    it'd

    it'll

    it's

    its

    itself

    just

    keep

    keeps

    kept

    know

    knows

    known

    last

    lately

    later

    latter

    latterly

    least

    less

    lest

    let

    let's

    like

    liked

    likely

    little

    look

    looking

    looks

    ltd

    mainly

    many

    may

    maybe

    me

    mean

    meanwhile

    merely

    might

    more

    moreover

    most

    mostly

    much

    must

    my

    myself

    name

    namely

    nd

    near

    nearly

    necessary

    need

    needs

    neither

    never

    nevertheless

    new

    next

    nine

    no

    nobody

    non

    none

    noone

    nor

    normally

    not

    nothing

    novel

    now

    nowhere

    obviously

    of

    off

    often

    oh

    ok

    okay

    old

    on

    once

    one

    ones

    only

    onto

    or

    other

    others

    otherwise

    ought

    our

    ours

    ourselves

    out

    outside

    over

    overall

    own

    particular

    particularly

    per

    perhaps

    placed

    please

    plus

    possible

    presumably

    probably

    provides

    que

    quite

    qv

    rather

    rd

    re

    really

    reasonably

    regarding

    regardless

    regards

    relatively

    respectively

    right

    said

    same

    saw

    say

    saying

    says

    second

    secondly

    see

    seeing

    seem

    seemed

    seeming

    seems

    seen

    self

    selves

    sensible

    sent

    serious

    seriously

    seven

    several

    shall

    she

    should

    shouldn't

    since

    six

    so

    some

    somebody

    somehow

    someone

    something

    sometime

    sometimes

    somewhat

    somewhere

    soon

    sorry

    specified

    specify

    specifying

    still

    sub

    such

    sup

    sure

    t's

    take

    taken

    tell

    tends

    th

    than

    thank

    thanks

    thanx

    that

    that's

    thats

    the

    their

    theirs

    them

    themselves

    then

    thence

    there

    there's

    thereafter

    thereby

    therefore

    therein

    theres

    thereupon

    these

    they

    they'd

    they'll

    they're

    they've

    think

    third

    this

    thorough

    thoroughly

    those

    though

    three

    through

    throughout

    thru

    thus

    to

    together

    too

    took

    toward

    towards

    tried

    tries

    truly

    try

    trying

    twice

    two

    un

    under

    unfortunately

    unless

    unlikely

    until

    unto

    up

    upon

    us

    use

    used

    useful

    uses

    using

    usually

    value

    various

    very

    via

    viz

    vs

    want

    wants

    was

    wasn't

    way

    we

    we'd

    we'll

    we're

    we've

    welcome

    well

    went

    were

    weren't

    what

    what's

    whatever

    when

    whence

    whenever

    where

    where's

    whereafter

    whereas

    whereby

    wherein

    whereupon

    wherever

    whether

    which

    while

    whither

    who

    who's

    whoever

    whole

    whom

    whose

    why

    will

    willing

    wish

    with

    within

    without

    won't

    wonder

    would

    wouldn't

    yes

    yet

    you

    you'd

    you'll

    you're

    you've

    your

    yours

    yourself

    yourselves

    zero
