Optimize MariaDB Server for high availability and performance. Learn about replication, clustering, load balancing, and configuration tuning for robust and efficient database solutions.
Optimizer hints are options that affect the execution plan.
Optimize MariaDB Server with system variables, configuring various parameters to fine-tune performance, manage resources, and adapt the database to your specific workload requirements.
Optimize MariaDB Server performance by tuning buffers, caches, and threads. This section covers essential configurations to maximize throughput and responsiveness for your database workloads.
Optimize MariaDB Server with the thread pool. This section explains how to manage connections and improve performance by efficiently handling concurrent client requests, reducing resource overhead.
Understand MariaDB Server thread states. This section explains the different states a thread can be in, helping you monitor and troubleshoot query execution and server performance.
Explore MariaDB Server's internal optimizations. This section delves into how the database engine enhances query execution, data storage, and overall performance through its core architecture.
Optimize MariaDB Server queries with indexes. This section covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.
Implement full-text indexes in MariaDB Server for efficient text search. This section guides you through creating and utilizing these indexes to optimize queries on large text datasets.
Optimize MariaDB Server performance and storage with compression. This section details how to apply data compression at various levels to reduce disk space and improve I/O efficiency.
Optimize MariaDB Server performance by refining your data structure. This section covers schema design, data types, and normalization techniques to improve query efficiency and storage utilization.
Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.
Optimize tables for enhanced performance. This section covers various techniques, including proper indexing, data types, and storage engine choices, to improve query speed and efficiency.
A large numeric value is stored in far fewer bytes than the equivalent string value. It is therefore faster to move and compare numeric data, so it's best to choose numeric columns for unique IDs and other similar fields.
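As a rough illustration (the table and column names below are made up for this example, not taken from the original page), keeping the key numeric keeps it short and cheap to compare:

CREATE TABLE users (
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- compact 4-byte numeric key
  username VARCHAR(64) NOT NULL,
  PRIMARY KEY (user_id)
);
-- Joins and lookups on user_id move and compare 4 bytes per value,
-- instead of a multi-byte string key such as username.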
This page is licensed: CC BY-SA / Gnu FDL
When values from different columns are compared, the comparison runs more quickly when the columns are of the same character set and collation. If they are different, the strings need to be converted while the query runs. So, where possible, declare string columns using the same character set and collation when you may need to compare them.
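For illustration (hypothetical tables, not from the original page), declaring the columns that will be compared with the same character set and collation avoids per-row conversion at query time:

CREATE TABLE customers (
  customer_code VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
);
CREATE TABLE orders_archive (
  customer_code VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
);
-- A join on customer_code can now compare the strings directly,
-- with no character set conversion while the query runs.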
ORDER BY and GROUP BY clauses can generate temporary tables in memory if the original table doesn't contain any BLOB fields. If a column is less than 8KB, you can use a binary VARCHAR rather than a BLOB.
This page is licensed: CC BY-SA / Gnu FDL
This article documents thread states that are related to scheduling and execution. These include the Event Scheduler thread, threads that terminate the Event Scheduler, and threads for executing events.
These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
This article documents thread states that are related to the query cache. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
This article documents thread states that are related to connection threads that occur on a replication replica. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
MEMORY tables are a good choice for data that needs to be accessed often and is rarely updated. Being in memory, they are not suitable for critical data or for persistent storage, but if data can be moved to memory for reading without needing to be regenerated often, if at all, they can provide a significant performance boost.
The MEMORY storage engine has a key feature in that it permits its indexes to be either B-tree or Hash. Choosing the best index type can lead to better performance. See the storage engine index types documentation for more on the characteristics of each index type.
This page is licensed: CC BY-SA / Gnu FDL
Optimize queries for peak performance. This section provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up your queries.
Discover effective optimization strategies for MariaDB Server queries. This section provides a variety of techniques and approaches to enhance query performance and overall database efficiency.
Optimize MariaDB Server performance with operating system tuning. This section covers configuring your OS for improved I/O, memory management, and network settings to maximize database efficiency.
Waiting for next activation
The event queue contains items, but the next activation is at some time in the future.
Waiting for scheduler to stop
Waiting for the event scheduler to stop after issuing SET GLOBAL event_scheduler=OFF.
Waiting on empty queue
Sleeping, as the event scheduler's queue is empty.
This page is licensed: CC BY-SA / Gnu FDL
Clearing
Thread is terminating.
Initialized
Thread has been initialized.
sending cached result to client
A result found in the query cache is being sent to the client.
storing result in query cache
Saving the result of a query into the query cache.
Waiting for query cache lock
Waiting to take a query cache lock.
This page is licensed: CC BY-SA / Gnu FDL
checking privileges on cached query
Checking whether the user has permission to access a result in the query cache.
checking query cache for query
Checking whether the current query exists in the query cache.
invalidating query cache entries
Marking query cache entries as invalid as the underlying tables have changed.
Reading master dump table data
After the table created by a master dump has been opened (the Opening master dump table state), the table is now being read.
Rebuilding the index on master dump table
After the table created by a master dump has been opened and read (the Reading master dump table data state), the index is built.
This page is licensed: CC BY-SA / Gnu FDL
Changing master
Processing a CHANGE MASTER TO statement.
Killing slave
Processing a STOP SLAVE statement.
Opening master dump table
A table has been created from a master dump and is now being opened.
This article documents thread states that are related to the handler thread that inserts the results of INSERT DELAYED statements.
These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
insert
About to insert rows into the table.
reschedule
Sleeping in order to let other threads function, after inserting a number of rows into the table.
This page is licensed: CC BY-SA / Gnu FDL
This article documents thread states that are related to replication master threads. These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
Finished reading one binlog; switching to next binlog
After completing one binary log, the next is being opened for sending to the slave.
Master has sent all binlog to slave; waiting for binlog to be updated
All events have been read from the binary logs and sent to the slave. Now waiting for the binary log to be updated with new events.
Sending binlog event to slave
This page is licensed: CC BY-SA / Gnu FDL
This article documents thread states that are related to replication slave SQL threads. These correspond to the Slave_SQL_State shown by SHOW SLAVE STATUS as well as the STATE values listed by the SHOW PROCESSLIST statement and the Information Schema PROCESSLIST as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
Apply log event
Log event is being applied.
Making temp file
Creating a temporary file containing the row data as part of a statement.
This page is licensed: CC BY-SA / Gnu FDL
The filesystem is not the most important aspect of MariaDB performance. More important are the available memory (RAM), the drive speed, and the system variable settings (see Hardware Optimization and System Variables).
Optimizing the filesystem can, however, make a noticeable difference in some cases. Among the best suited Linux filesystems are ext4, XFS and Btrfs. They are all included in the mainline Linux kernel and are widely supported and available on most Linux distributions.
The following theoretical file size and filesystem size limits apply to those filesystems:
Each has unique characteristics that are worth understanding to get the most from their usage.
It's unlikely you'll need to record file access time on a database server, and mounting your filesystem with this disabled can give an easy improvement in performance. To do so, use the noatime option.
If you need to keep access times for certain system files, these can be stored on a separate drive.
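A minimal sketch of an /etc/fstab entry with access-time updates disabled (the device name and mount point are placeholders):

# /etc/fstab: mount the data volume without recording file access times
/dev/sdb1  /var/lib/mysql  ext4  noatime  0 2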
Generally, we recommend not using NFS (Network File System) with MariaDB, for these reasons:
MariaDB data and log files on NFS volumes can become locked and unavailable for use. Locking issues may occur in cases where multiple instances of MariaDB access the same data directory, or when MariaDB is shut down improperly, for instance, due to a power outage. In particular, sharing a data directory among MariaDB instances is not recommended.
Data inconsistencies can occur due to messages received out of order or lost network traffic. To avoid this issue, use TCP with the hard and intr mount options.
Using NFS within a professional SAN environment or other storage system tends to offer greater reliability than using NFS outside of such an environment. However, NFS within a SAN environment may be slower than directly attached or bus-attached non-rotational storage.
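If NFS must be used anyway, a mount roughly along these lines applies the TCP, hard and intr options mentioned above (the server name, export path and mount point are placeholders):

mount -t nfs -o tcp,hard,intr nfs-server:/export/mariadb /var/lib/mysql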
This page is licensed: CC BY-SA / Gnu FDL
MariaDB deals with primary keys over nullable columns according to the SQL standards.
Take the following table structure:
CREATE TABLE t1(
c1 INT NOT NULL AUTO_INCREMENT,
c2 INT NULL DEFAULT NULL,
PRIMARY KEY(c1,c2)
);
Column c2 is part of a primary key, and thus it cannot be NULL.
Before MariaDB 10.1.7, MariaDB (as well as versions of MySQL before MySQL 5.7) would silently convert it into a NOT NULL column with a default value of 0.
Since MariaDB 10.1.7, the column is converted to NOT NULL, but without a default value. If we then attempt to insert a record without explicitly setting c2, a warning (or, in strict mode, an error) is thrown, for example:
MySQL, since 5.7, will abort such a CREATE TABLE with an error.
This behavior adheres to the SQL:2003 standard.
SQL-2003, Part II, “Foundation” says:
11.7 <unique constraint definition>, Syntax Rules
…
5) If the <unique constraint definition> specifies PRIMARY KEY, then for each <column name> in the explicit or implicit <unique column list> for which NOT NULL is not specified, NOT NULL is implicit in the <column definition>.
Essentially this means that all PRIMARY KEY columns are automatically converted to NOT NULL. Furthermore:
11.5 General Rules
…
3) When a site S is set to its default value,
…
b) If the data descriptor for the site includes a <default option>, then S is set to the value specified by that <default option>.
…
e) Otherwise, S is set to the null value.
There is no concept of “no default value” in the standard. Instead, a column always has an implicit default value of NULL. On insertion it might however fail the NOT NULL constraint. MariaDB and MySQL instead mark such a column as “not having a default value”. The end result is the same — a value must be specified explicitly or an INSERT will fail.
MariaDB since 10.1.7 behaves in a standard compatible manner — being part of a PRIMARY KEY, the nullable column gets an automatic NOT NULL constraint, on insertion one must specify a value for such a column. MariaDB before 10.1.7 was automatically assigning a default value of 0 — this behavior was non-standard. Issuing an error at CREATE TABLE time is also non-standard.
There is an edge case that may result in replication problems when replicating from a master server running a version from before this change to a slave server from after this change.
This page is licensed: CC BY-SA / Gnu FDL
HIGH_PRIORITY gives the statement a higher priority. If the table is locked, high priority SELECTs will be executed as soon as the lock is released, even if other statements are queued. HIGH_PRIORITY applies only to storage engines that use table-level locking only (MyISAM, MEMORY, MERGE). See the HIGH_PRIORITY and LOW_PRIORITY documentation for details.
If the query_cache_type system variable is set to 2 or DEMAND, and the current statement is cacheable, SQL_CACHE causes the query to be cached and SQL_NO_CACHE causes the query not to be cached. For UNIONs, SQL_CACHE or SQL_NO_CACHE should be specified for the first query. See also the query cache documentation for more detail and a list of the types of statements that aren't cacheable.
SQL_BUFFER_RESULT forces the optimizer to use a temporary table to process the result. This is useful to free locks as soon as possible.
SQL_SMALL_RESULT and SQL_BIG_RESULT tell the optimizer whether the result is expected to be very big or not. Usually, GROUP BY and DISTINCT operations are performed using a temporary table, and a temporary table is only inconvenient when the result is very big. The optimizer automatically estimates whether the result is too big, but you can force it to use a temporary table with SQL_SMALL_RESULT, or to avoid the temporary table with SQL_BIG_RESULT.
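A hedged illustration (the table name is made up for this example): SQL_BIG_RESULT hints that the grouped result will be large, so the optimizer can plan for a disk-based temporary table or filesort up front:

SELECT SQL_BIG_RESULT customer_id, COUNT(*)
FROM order_lines
GROUP BY customer_id;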
STRAIGHT_JOIN applies to join queries, and tells the optimizer that the tables must be read in the order they appear in the SELECT. For const and system tables this option is sometimes ignored.
SQL_CALC_FOUND_ROWS is only applied when using the LIMIT clause. If this option is used, MariaDB will count how many rows would match the query without the LIMIT clause. That number can be retrieved in the next query, using FOUND_ROWS().
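A short sketch of the pattern, using the City table from the examples later on this page:

SELECT SQL_CALC_FOUND_ROWS * FROM City ORDER BY Name LIMIT 10;
SELECT FOUND_ROWS();  -- number of rows the first query would have matched without the LIMIT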
USE INDEX, FORCE INDEX and IGNORE INDEX constrain the query planning to a specific index. For further information about some of these options, see the following sections.
Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).
FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
FORCE INDEX cannot force an ignored index to be used - it will be treated as if it doesn't exist.
This produces:
When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
See Index Hints: How to Force Query Plans for more details.
This page is licensed: CC BY-SA / Gnu FDL
You can limit which indexes are considered with the USE INDEX option.
The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.
USE INDEX is used after the table name in the FROM clause.
USE INDEX cannot use an ignored index - it will be treated as if it doesn't exist.
When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
This produces:
If we had not used USE INDEX, the Name index would have been in possible keys.
See Index Hints: How to Force Query Plans for more details.
This page is licensed: CC BY-SA / Gnu FDL
You can tell the optimizer to not consider a particular index with the IGNORE INDEX option.
The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.
Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.
When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
This is used after the table name in the FROM clause:
This produces:
See Index Hints: How to Force Query Plans for more details.
This page is licensed: CC BY-SA / Gnu FDL
This article documents thread states that are related to the connection thread that processes statements.
These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
index_merge is a method used by the optimizer to retrieve rows from a single table using several index scans. The results of the scans are then merged.
When using EXPLAIN, if index_merge is the plan chosen by the optimizer, it will show up in the "type" column. For example:
The "rows" column gives us a way to compare efficiency between index_merge and other plans.
It is sometimes necessary to discard index_merge in favor of a different plan to avoid a combinatorial explosion of possible range and/or index_merge strategies. But, the old logic in MySQL for when index_merge was rejected caused some good index_merge plans to not even be considered. Specifically,
additional AND predicates in WHERE clauses could cause an index_merge plan to be rejected in favor of a less efficient plan. The slowdown could be anywhere from 10x to over 100x. Here are two examples (based on the previous query) using MySQL:
LooseScan is an execution strategy for semi-join subqueries.
We will demonstrate the LooseScan strategy by example. Suppose, we're looking for countries that have satellites. We can get them using the following query (for the sake of simplicity we ignore satellites that are owned by consortiums of multiple countries):
Suppose, there is an index on Satellite.country_code. If we use that index, we will get satellites in the order of their owner country:
Prior to MariaDB 5.3, the index_merge access method supported union, sort-union, and intersection operations. Starting from MariaDB 5.3, the sort-intersection operation is also supported. This allows the use of index_merge in a broader number of cases.
This feature is disabled by default. To enable it, turn on the optimizer switch index_merge_sort_intersection like so:
This refers to the index_type definition when creating an index, i.e. BTREE, HASH or RTREE.
For more information on general types of indexes, such as primary keys, unique indexes etc, see Getting Started with Indexes.
MariaDB starting with 10.7
Compression plugins were added in a preview release of MariaDB 10.7.
The various MariaDB storage engines, such as InnoDB, RocksDB and Mroonga, can use different compression libraries.
Before MariaDB 10.7, each separate library would have to be compiled in to be available for use, resulting in numerous runtime/rpm/deb dependencies, most of which would never be used by users.
From MariaDB 10.7, five additional MariaDB compression libraries (besides the default zlib) are available as plugins (note that these affect InnoDB and Mroonga only; RocksDB still uses the compression algorithms from its own library):
bzip2
One can use the DISTINCT keyword to de-duplicate the arguments of an aggregate function. For example:
In order to compute this, MariaDB has to collect the values of col1 and remove the duplicates. This may be computationally expensive.
After the fix for this issue, the optimizer can detect certain cases when the argument of an aggregate function will not have duplicates, so de-duplication can be skipped.
MariaDB starting with 11.0
The hash_join_cardinality optimizer_switch flag was added in MariaDB 11.0 and is also available in maintenance releases of earlier series.
In MySQL and MariaDB, the output cardinality of a part of a query has historically been tied to the access method(s) used. This is different from the approach used in database textbooks, where the cardinality of "x JOIN y" is the same regardless of which access methods are used to compute it.
Consider a query joining customers with their orders:
In recent MariaDB releases, conditions in the form
are sargable, provided that
CMP is any of =, <=>, <, <=, >, >=.
INSERT INTO t1() VALUES();
Query OK, 1 row affected, 1 warning (0.00 sec)
Warning (Code 1364): Field 'c2' doesn't have a default value
SELECT * FROM t1;
+----+----+
| c1 | c2 |
+----+----+
| 1 | 0 |
+----+----+
USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])
IGNORE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])
This is another name for Lateral Derived Optimization.
This page is licensed: CC BY-SA / Gnu FDL
upgrading lock
Attempting to get lock on the table in order to insert rows.
Waiting for INSERT
Waiting for the delayed-insert connection thread to add rows to the queue.
An event has been read from the binary log, and is now being sent to the slave.
Waiting to finalize termination
State that only occurs very briefly while the thread is terminating.
Reading event from the relay log
Reading an event from the relay log in order to process the event.
Slave has read all relay log, waiting for the slave I/O thread to update it
All relay log events have been processed, now waiting for the I/O thread to write new events to the relay log.
Waiting for work from SQL thread
In parallel replication, the worker thread is waiting for more work from the SQL thread.
Waiting for prior transaction to start commit before starting next transaction
In parallel replication, the worker thread is waiting for prior transactions to start committing before it begins executing the next transaction.
Waiting for worker threads to be idle
Happens in parallel replication when moving to a new binary log after a master restart. All slave temporary files are deleted, and worker threads are restarted.
Waiting due to global read lock
In parallel replication when worker threads are waiting for a global read lock to be released.
Waiting for worker threads to pause for global read lock
FLUSH TABLES WITH READ LOCK is waiting for worker threads to finish what they are doing.
Waiting while replication worker thread pool is busy
Happens in parallel replication during a FLUSH TABLES WITH READ LOCK or when changing number of parallel workers.
Waiting for other master connection to process GTID received on multiple master connections
A worker thread noticed that there is already another thread executing the same GTID from another connection and it's waiting for the other to complete.
Waiting for slave mutex on exit
Thread is stopping. Only occurs very briefly.
Waiting for the next event in relay log
State before reading next event from the relay log.
Theoretical limits for ext4, XFS and Btrfs:
Max file size: ext4 16-256 TB, XFS 8 EB, Btrfs 16 EB
Max filesystem size: ext4 1 EB, XFS 8 EB, Btrfs 16 EB
Lock to access the delayed-insert handler thread has been received. Follows from the waiting for handler lock state and before the allocating local table state.
got old table
The initialization phase is over. Follows from the waiting for handler open state.
storing row into queue
Adding new row to the list of rows to be inserted by the delayed-insert handler thread.
waiting for delay_list
Initializing (trying to find the delayed-insert handler thread).
waiting for handler insert
Waiting for new inserts, as all inserts have been processed.
waiting for handler lock
Waiting for delayed insert-handler lock to access the delayed-insert handler thread.
waiting for handler open
Waiting for the delayed-insert handler thread to initialize. Follows from the Creating delayed handler state and before the got old table state.
This page is licensed: CC BY-SA / Gnu FDL
allocating local table
Preparing to allocate rows to the delayed-insert handler thread. Follows from the got handler lock state.
Creating delayed handler
Creating a handler for the delayed-inserts.
got handler lock
BTREE is generally the default index type. For MEMORY tables, HASH is the default. TokuDB uses a particular data structure called fractal trees, which is optimized for data that does not entirely fit in memory.
Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.
B-Tree Index Characteristics
B-tree indexes are used for column comparisons using the >, >=, =, <=, < or BETWEEN operators, as well as for LIKE comparisons that begin with a constant.
For example, the query SELECT * FROM Employees WHERE First_Name LIKE 'Maria%'; can make use of a B-tree index, while SELECT * FROM Employees WHERE First_Name LIKE '%aria'; cannot.
B-tree indexes also permit leftmost prefixing for searching of rows.
If the number of rows doesn't change, hash indexes occupy a fixed amount of memory, which is lower than the memory occupied by B-tree indexes.
Hash indexes, in contrast, can only be used for equality comparisons, so those using the = or <=> operators. They cannot be used for ordering, and provide no information to the optimizer on how many rows exist between two values.
Hash indexes do not permit leftmost prefixing - only the whole index can be used.
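A minimal sketch showing how the index type can be chosen explicitly for a MEMORY table (the table and column names are hypothetical):

CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  created DATETIME NOT NULL,
  PRIMARY KEY (session_id) USING HASH,   -- equality lookups only
  KEY created_idx (created) USING BTREE  -- supports range scans and ordering
) ENGINE=MEMORY;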
See SPATIAL for more information.
This page is licensed: CC BY-SA / Gnu FDL
Permitted index types by storage engine:
Aria: BTREE, RTREE
MyISAM: BTREE, RTREE
InnoDB: BTREE
MEMORY: HASH, BTREE
CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE City range Name Name 35 NULL 4079 Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE City ref CountryCode CountryCode 3 const 14 Using where
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE City ref CountryCode CountryCode 3 const 14 Using where
This is an extended version of the pool-of-threads code from MySQL 6.0. This allows you to use a limited set of threads to handle all queries, instead of the old 'one-thread-per-connection' style. In recent times, it's also been referred to as "thread pool" or "thread pooling", as this feature (in a different implementation) is available in Enterprise editions of MySQL (not in the Community edition).
This can be a very big win if most of your queries are short running queries and there are few table/row locks in your system.
To enable pool-of-threads you must first run configure with the --with-libevent option. (This is automatically done if you use any 'max' scripts in the BUILD directory):
When starting mysqld with the pool of threads code you should use:
Default values are:
One issue with pool-of-threads is that if all worker threads are doing work (like running long queries) or are blocked by a row/table lock, no new connections can be established, so you can't log in to find out what's wrong or to kill queries.
To help with this, we have introduced two new options for mysqld: extra_port and extra_max_connections:
If extra-port is <> 0, then you can connect extra_max_connections normal connections plus 1 extra SUPER user connection through the 'extra-port' TCP/IP port. These connections use the old one-thread-per-connection method.
To connect through the extra port, use:
This allows you to choose, on a per-connection basis, the optimal connection/thread model.
This page is licensed: CC BY-SA / Gnu FDL
Starting in MariaDB 5.3, the optimizer delays discarding potential index_merge plans until the point where it is really necessary.
By not discarding potential index_merge plans until absolutely necessary,
the two queries stay just as efficient as the original:
This new behavior is always on and there is no need to enable it. There are no known issues or gotchas with this new optimization.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB [ontime]> SELECT COUNT(*) FROM ontime;
+--------+
|count(*)|
+--------+
| 1578171|
+--------+
MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
| 1|SIMPLE |ontime|index_merge|Origin,Dest |Origin,Dest|6,6 |NULL|92800|Using union (Origin,Dest); Using where|
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+--------------------------------------+
MariaDB brought several improvements to the ORDER BY optimizer.
The fixes were made as a response to complaints by MariaDB customers, so they fix real-world optimization problems. The fixes are a bit hard to describe (as the ORDER BY optimizer is complicated), but here's a short description:
The ORDER BY optimizer:
Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402).
This also fixes MySQL Bug#12113.
Always uses “range” and (not full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657).
Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (this change also fixes a related upstream bug).
Takes full advantage of InnoDB's Extended Keys feature when checking if filesort() can be skipped.
In MySQL 5.7 changelog, one can find this passage:
Make switching of index due to small limit cost-based (WL#6986) : We have made the decision in make_join_select() of whether to switch to a new index in order to support "ORDER BY ... LIMIT N" cost-based. This work fixes Bug#73837.
MariaDB is not using Oracle's fix (we believe make_join_select is not the right place to do ORDER BY optimization), but the effect is the same.
This page is licensed: CC BY-SA / Gnu FDL
The LooseScan strategy doesn't really need ordering; what it needs is grouping. As illustrated above, satellites are grouped by country: for instance, all satellites owned by Australia come together, without being mixed with satellites of other countries. This makes it easy to select just one satellite from each group, which you can join with its country and get a list of countries without duplicates.
The EXPLAIN output for the above query looks as follows:
LooseScan avoids the production of duplicate record combinations by putting the subquery table first and using its index to select one record from multiple duplicates.
Hence, in order for LooseScan to be applicable, the subquery should look like:
or
LooseScan can handle correlated subqueries
LooseScan can be switched off by setting the loosescan=off flag in the optimizer_switch variable.
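Using the standard optimizer_switch syntax, disabling the strategy for the current session looks like this:

SET SESSION optimizer_switch='loosescan=off';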
This page is licensed: CC BY-SA / Gnu FDL

Prior to MariaDB 5.3, the index_merge access method had one intersection strategy called intersection. That strategy can only be used when the merged index scans produce rowid-ordered streams. In practice this means that an intersection could only be constructed from equality (=) conditions.
For example, the following query will use intersection:
but if you replace OriginState ='CA' with OriginState IN ('CA', 'GB') (which matches the same number of records), then intersection is not usable anymore:
The latter query would also run about five times slower (from 2.2 to 10.8 seconds) in our experiments.
In MariaDB 5.3, when index_merge_sort_intersection is enabled, index_merge intersection plans can be constructed from non-equality conditions:
In our tests, this query ran in 3.2 seconds, which is not as good as the case with two equalities, but still much better than 10.8 seconds we were getting without sort_intersect.
The sort_intersect strategy has higher overhead than intersect but is able to handle a broader set of WHERE conditions.
index_merge/sort_intersection works best on tables with lots of records and where intersections are sufficiently large (but still small enough to make a full table scan overkill).
The benefit is expected to be bigger for I/O-bound loads.
This page is licensed: CC BY-SA / Gnu FDL
lzma
lz4
lzo
snappy
Depending on how MariaDB was installed, the libraries may already be available for installation, or may first need to be installed as .deb or .rpm packages, for example:
Once available, install as a plugin, for example:
The compression algorithm can then be used, for example, in InnoDB compression:
When upgrading from a release without compression plugins, if a non-zlib compression algorithm was used, those tables will be unreadable until the appropriate compression library is installed. mariadb-upgrade should be run. The --force option (to run mariadb-check) or mariadb-check itself will indicate any problems with compression, for example:
or
In this case, the appropriate compression plugin should be installed, and the server restarted.
10.7 preview feature: Compression Provider Plugins (mariadb.org blog)
Add zstd as a compression plugin - MDEV-34290
This page is licensed: CC BY-SA / Gnu FDL
A basic example: if we're doing a select from one table, then the values of primary_key are already distinct:
If the SELECT has other constant tables, that's also ok, as they will not create duplicates.
The next step: a part of the primary key can be "bound" by the GROUP BY clause. Consider a query:
Suppose the table has PRIMARY KEY(pk1, pk2). Grouping by pk2 fixes the value of pk2 within each group. Then, the values of pk1 must be unique within each group, and de-duplication is not necessary.
EXPLAIN or EXPLAIN FORMAT=JSON do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for aggregator_type:
When de-duplication is necessary, it will show:
When de-duplication is not necessary, it will show:
This page is licensed: CC BY-SA / Gnu FDL
Suppose there is an index IDX on orders.customer_id. If the query plan is using this index to fetch orders for each customer, the optimizer will use index statistics from IDX to estimate the number of rows in customer-joined-with-orders.
On the other hand, if the optimizer considers a query plan that joins customer with orders without use of indexes, it will ignore the customer.id = orders.customer_id equality completely and will compute the
output cardinality as if customer was cross-joined with orders.
MariaDB supports Block Hash Join. It is not enabled by default; one needs to set join_cache_level to 3 or a bigger value to enable it.
Before MDEV-30812, Query optimization for Block Hash Join would work as described in the above example: It would assume that the join operation is a cross join.
MDEV-30812 introduces a new optimizer_switch flag, hash_join_cardinality. In MariaDB versions before 11.0, it is off by default.
If one sets it to ON, the optimizer will make use of column histograms when computing the cardinality of hash join operation output.
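A small sketch of enabling both settings for the current session, following the values described above:

SET SESSION join_cache_level=3;                            -- enable block hash join
SET SESSION optimizer_switch='hash_join_cardinality=on';   -- use histograms for hash join cardinality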
One can see the computation in the Optimizer Trace, search for hash_join_cardinality.
This page is licensed: CC BY-SA / Gnu FDL
indexed_date_col has a type of DATE, DATETIME or TIMESTAMP and is a part of some index.
One can swap the left and right hand sides of the equality: const_value CMP {DATE|YEAR}(indexed_date_col) is also handled.
Sargable here means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or use them to perform partition pruning.
Internally, the optimizer rewrites the condition to an equivalent condition which doesn't use YEAR or DATE functions.
For example, YEAR(date_col)=2023 is rewritten into date_col between '2023-01-01' and '2023-12-31'.
Similarly, DATE(datetime_col) <= '2023-06-01' is rewritten into datetime_col <= '2023-06-01 23:59:59'.
The optimization is always on; there is no optimizer_switch flag to control it.
The rewrite is logged as date_conds_into_sargable transformation. Example:
MDEV-8320: Allow index usage for DATE(datetime_column) = const
This page is licensed: CC BY-SA / Gnu FDL
This article documents thread states that are related to replica I/O threads. These correspond to the Slave_IO_State shown by SHOW REPLICA STATUS and the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table as well as the PROCESSLIST_STATE value listed in the Performance Schema threads Table.
Checking master version
Checking the primary's version, which only occurs very briefly after establishing a connection with the primary.
Connecting to master
Attempting to connect to primary.
This page is licensed: CC BY-SA / Gnu FDL
Understanding index statistics is crucial for the MariaDB query optimizer to efficiently execute queries. Accurate and current statistics guide the optimizer in choosing the best way to access data, similar to using a personal address book for quicker searches rather than a larger phone book. Up-to-date index statistics ensure optimized query performance.
The statistics primarily focus on groups of index elements with identical values. In a primary key, each index is unique, resulting in a group size of one. In a non-unique index, multiple keys may share the same value. The worst-case scenario involves large groups with identical values, such as an index on a boolean field.
MariaDB makes heavy use of the average group size statistic. For example, if there are 100 rows, and twenty groups with the same index values, the average group size would be five.
However, averages can be skewed by extremes, and the usual culprit is NULL values. The 100 rows may contain 19 groups with an average size of one, while the other 81 values are all NULL. MariaDB may think five is a good average group size and choose to use that index, and then end up having to read through 81 rows with identical keys, taking longer than an alternative plan.
There are three main approaches to the problem of NULLs. NULL index values can be treated as a single group (nulls_equal). This is usually fine, but if you have large numbers of NULLs the average group size is slanted higher, and the optimizer may miss using the index for ref accesses when it would be useful. This is the default used by InnoDB. The opposite approach is nulls_unequal, with each NULL forming its own group of one. Conversely, the average group size is slanted lower, and the optimizer may use the index for ref accesses when not suitable. This is the default used by the MyISAM and Aria storage engines. A third option, nulls_ignored, sees NULLs ignored altogether in index group calculations.
The default approaches can be changed by setting the aria_stats_method, innodb_stats_method and myisam_stats_method server variables.
The comparison operator used plays an important role. If two values are compared with the NULL-safe equality operator <=>, and both are NULL, 1 is returned. If the same values are compared with =, NULL is returned. For example:
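A quick sketch of the difference:

SELECT NULL <=> NULL, NULL = NULL;
-- <=> returns 1 (the values are considered equal), while = returns NULL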
MariaDB 10.0 introduced a way to gather statistics independently of the storage engine. See Engine-Independent Table Statistics.
Histogram-based statistics were introduced in MariaDB 10.0, and are collected by default from MariaDB 10.4.
The User Statistics plugin provides user, client, table and index usage statistics.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB 10.1 introduced the max_statement_time system variable. When set to a non-zero value, the server attempts to abort any queries taking longer than this time in seconds.
The abort is not immediate; the server checks the timer status at specific intervals during execution. Consequently, a query may run slightly longer than the specified time before being detected and stopped.
The default is zero, in which case no limit is applied. The aborted query has no effect on any larger transaction or connection context. The variable is of type double, so sub-second timeouts can be used; for example, a value of 0.01 gives a 10 millisecond timeout.
The value can be set globally or per session, as well as per user or per query (see below). Replicas are not affected by this variable; however, from MariaDB 10.10 there is slave_max_statement_time, which serves the same purpose on replicas only.
An associated status variable, max_statement_time_exceeded, stores the number of queries that have exceeded the execution time specified by max_statement_time, and a MAX_STATEMENT_TIME_EXCEEDED column was added to the CLIENT_STATISTICS and USER_STATISTICS Information Schema tables.
The feature was based upon a patch by Davi Arnaut.
Important Note on Reliability
MAX_STATEMENT_TIME relies on the execution thread checking the "killed" flag, which happens intermittently.
Long Running Operations: If a query enters a long processing phase where the flag is not checked (e.g., certain storage engine operations or complex calculations), it may continue running significantly past the limit.
max_statement_time can be stored per user with the GRANT ... MAX_STATEMENT_TIME syntax.
By using max_statement_time in conjunction with SET STATEMENT, it is possible to limit the execution time of individual queries. For example:
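A sketch of the per-statement pattern (the table and column names are illustrative):

SET STATEMENT max_statement_time=100 FOR
  SELECT field1 FROM table_name ORDER BY field1;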
max_statement_time per query
Individual queries can also be limited by adding a MAX_STATEMENT_TIME clause to the query. For example:
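Assuming the SELECT-level clause syntax referred to above, a minimal sketch (the table name is illustrative):

SELECT MAX_STATEMENT_TIME=2 * FROM long_running_table;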
max_statement_time does not work in embedded servers.
max_statement_time does not work for statements in a Galera cluster (see the related MDEV for discussion).
Check Intervals: The timeout is checked only at specific points during query execution. Queries stuck in operations where the check code path is not hit will not abort until they reach a checkpoint. This can result in query times exceeding the MAX_STATEMENT_TIME value.
MySQL 5.7.4 introduced similar functionality, but the MariaDB implementation differs in a number of ways.
The MySQL version of the variable, max_execution_time, is defined in milliseconds, not seconds.
MySQL's implementation can only kill SELECTs, while MariaDB's can kill any queries (excluding stored procedures).
MariaDB only introduced the max_statement_time_exceeded status variable, while MySQL also introduced a number of other variables which were not seen as necessary in MariaDB.
max_statement_time system variable
This page is licensed: CC BY-SA / Gnu FDL
If a query uses a derived table (or a view), the first action that the query optimizer will attempt is to apply the derived-table-merge-optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as the top-level operation. If it has a GROUP-BY, DISTINCT, or uses window functions, then derived-table-merge-optimization is not applicable.
In that case, the Condition Pushdown optimization is applicable.
Consider an example
The naive way to execute the above is to
Compute the OCT_TOTALS contents (for all customers).
Then, select the line with customer_id=1.
This is obviously inefficient, if there are 1000 customers, then one will be doing up to 1000 times more work than necessary.
However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.
Inside OCT_TOTALS, the added condition is put into its HAVING clause, so we end up with:
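A sketch of the intermediate form, reconstructed from the view definition shown later on this page (not the optimizer's literal output):

SELECT customer_id, SUM(amount) AS TOTAL_AMT
FROM orders
WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id
HAVING customer_id=1;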
Then, parts of the HAVING clause that refer to GROUP BY columns are moved into the WHERE clause:
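Roughly, the restriction then moves into the WHERE clause (again reconstructed from the view definition, not literal optimizer output):

SELECT customer_id, SUM(amount) AS TOTAL_AMT
FROM orders
WHERE customer_id=1
  AND order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id;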
Once a restriction like customer_id=1 is in the WHERE, the query optimizer can use it to construct efficient table access paths.
The optimization is enabled by default. One can disable it by setting the flag condition_pushdown_for_derived to OFF.
The pushdown from HAVING to WHERE part is controlled by condition_pushdown_from_having flag in .
From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint. In earlier versions, no optimizer hint is available.
Condition Pushdown through Window Functions
The feature is tracked in a Jira task in MariaDB's issue tracker.
This page is licensed: CC BY-SA / Gnu FDL
Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with a population greater than 10,000 people, and then, from these cities, to select those that are located in Germany, one could write this SQL:
For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:
It plans to do the following actions:
From left to right:
Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.
Put result of the subquery into a temporary table.
Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'
Executing a subquery like this is very inefficient, because the highly selective condition from the parent select, (Country='DEU'), is not used when scanning the base table City. We read too many records from the City table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.
If one runs this query in MariaDB 5.3+/MySQL 5.6, they get this:
From the above, one can see that:
The output has only one line. This means that the subquery has been merged into the top-level SELECT.
Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.
Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.
The optimization is enabled by default. It can be disabled with:
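Using the derived_merge flag named in the next paragraph, a sketch of disabling it:

SET optimizer_switch='derived_merge=off';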
Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem of EXPLAIN statements taking a very long time. Starting from MariaDB 5.3+ and MySQL 5.6+, EXPLAIN commands execute instantly, regardless of the derived_merge setting.
This page is licensed: CC BY-SA / Gnu FDL
If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and then parent select will treat it as a regular base table.
Before MariaDB 5.3/MySQL 5.6, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.
Consider a query: we want to find countries in Europe, that have more than one million people living in cities. This is accomplished with this query:
The EXPLAIN output for it will show:
One can see here that
table <derived2> is accessed through key0.
ref column shows world.Country.Code
if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country
The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.
The optimization is applied when the derived table could not be merged into its parent SELECT, which happens when the derived table doesn't meet the criteria for a mergeable VIEW.
The optimization is ON by default; it can be switched off like so:
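A sketch, assuming the optimizer_switch flag for this feature is named derived_with_keys:

SET optimizer_switch='derived_with_keys=off';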
See also the corresponding section in the MySQL 5.6 manual.
This page is licensed: CC BY-SA / Gnu FDL
When n is sufficiently small, the optimizer will use a priority queue for sorting. Before this optimization was ported to MariaDB, the alternative was, roughly speaking, to sort the entire output and then pick only the first n rows.
NOTE: The problem of choosing which index to use for query with ORDER BY ... LIMIT is a different problem, see optimizer_join_limit_pref_ratio-optimization.
There are two ways to check whether filesort has used a priority queue.
The first way is to check the Sort_priority_queue_sorts status variable. It shows the number of times that sorting was done through a priority queue. (The total number of times sorting was done is the sum of Sort_range and Sort_scan.)
The second way is to check the slow query log. When one uses the extended slow query log statistics and specifies log_slow_verbosity=query_plan, entries look like this:
Note the "Priority_queue: Yes" on the last comment line. (pt-query-digest is able to parse slow query logs with the Priority_queue field)
As for EXPLAIN, it will give no indication whether filesort uses priority queue or the generic quicksort and merge algorithm. Using filesort will be shown in both cases, by both MariaDB and MySQL.
See also the LIMIT optimization page in the MySQL 5.6 manual (search for "priority queue").
MySQL WorkLog entry
This page is licensed: CC BY-SA / Gnu FDL
In recent MariaDB releases, expressions in the form UPPER(key_col) = expr are sargable if key_col uses either the utf8mb3_general_ci or utf8mb4_general_ci collation.
UCASE is a synonym for UPPER so is covered as well.
Sargable means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or perform partition pruning.
Note that ref access is used.
An example with join:
Here, the optimizer was able to construct ref access.
The optimizer_switch variable has the flag sargable_casefold to turn the optimization on and off. The default is ON.
The optimization is implemented as a rewrite for a query's WHERE/ON conditions. It uses the sargable_casefold_removal object name in the trace:
The corresponding Jira task is "Make optimizer handle UCASE(varchar_col)=...".
An analog for LCASE is not possible; see the Jira task "Make optimizer handle LCASE(varchar_col)=..." for details.
This page is licensed: CC BY-SA / Gnu FDL
A thread can have any of the following COMMAND values (displayed by the COMMAND field listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST Table, as well as the PROCESSLIST_COMMAND value listed in the Performance Schema threads Table). These indicate the nature of the thread's activity.
FirstMatch is an execution strategy for semi-join subqueries.
It is very similar to how IN/EXISTS subqueries were executed in MySQL 5.x.
Let's take the usual example of a search for countries with big cities:
Suppose, our execution plan is to find countries in Europe, and then, for each found country, check if it has any big cities. Regular inner join execution will look as follows:
Consider a query with a WHERE clause:
the WHERE clause will compute to true only if col1=col2. This means that in the rest of the WHERE clause occurrences of col1 can be substituted with col2 (with some limitations which are discussed in the next section). This allows the optimizer to infer additional restrictions.
For example, a condition of the kind sketched below allows the optimizer to infer a new equality:
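A hypothetical sketch (the table aliases and column names are illustrative, not from the original example):

WHERE t1.col1 = t2.col2 AND t1.col1 = 123
-- since col1 = col2 must hold, the optimizer can infer the new equality: t2.col2 = 123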
The normal way to count "Unique Users" is to take large log files, sort by userid, dedup, and count. This requires a rather large amount of processing. Furthermore, the count derived cannot be rolled up. That is, daily counts cannot be added to get weekly counts -- some users will be counted multiple times.
So, the problem is to store the counts in such a way as to allow rolling up.
./configure --with-libevent
mysqld --thread-handling=pool-of-threads --thread-pool-size=20
thread-handling=one-thread-per-connection
thread-pool-size=20
--extra-port=# (Default 0)
--extra-max-connections=# (Default 1)
mysql --port='number-of-extra-port' --protocol=tcp
MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
+--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
| 1|SIMPLE |ontime|ref |Origin,Dest,SecurityDelay|SecurityDelay|5 |const|791546|Using where|
+--+-----------+------+----+-------------------------+-------------+-------+-----+------+-----------+
MySQL [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
+--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
| 1|SIMPLE |ontime|ALL |Origin,DepDelay,Dest|NULL|NULL |NULL|1583093|Using where|
+--+-----------+------+----+--------------------+----+-------+----+-------+-----------+
MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA');
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
| 1|SIMPLE |ontime|index_merge|Origin,Dest |Origin,Dest|6,6 |NULL|92800|Using union(Origin,Dest); Using where|
+--+-----------+------+-----------+-------------+-----------+-------+----+-----+-------------------------------------+
MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND securitydelay=0;
+--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
|id|select_type|table |type |possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
| 1|SIMPLE |ontime|index_merge|Origin,Dest,SecurityDelay|Origin,Dest|6,6 |NULL|92800|Using union(Origin,Dest); Using where|
+--+-----------+------+-----------+-------------------------+-----------+-------+----+-----+-------------------------------------+
MariaDB [ontime]> EXPLAIN SELECT * FROM ontime WHERE (Origin='SEA' OR Dest='SEA') AND depdelay < 12*60;
+--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
|id|select_type|table |type |possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
| 1|SIMPLE |ontime|index_merge|Origin,DepDelay,Dest|Origin,Dest|6,6 |NULL|92800|Using union(Origin,Dest); Using where|
+--+-----------+------+-----------+--------------------+-----------+-------+----+-----+-------------------------------------+
SELECT * FROM Country
WHERE
Country.code IN (SELECT country_code FROM Satellite)
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| 1 | PRIMARY | Satellite | index | country_code | country_code | 9 | NULL | 932 | Using where; Using index; LooseScan |
| 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.Satellite.country_code | 1 | Using index condition |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
expr IN (SELECT tbl.keypart1 FROM tbl ...)
expr IN (SELECT tbl.keypart2 FROM tbl WHERE tbl.keypart1=const AND ...)
SET optimizer_switch='index_merge_sort_intersection=on'
MySQL [ontime]> EXPLAIN SELECT AVG(arrdelay) FROM ontime WHERE depdel15=1 AND OriginState ='CA';
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
|id|select_type|table |type |possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
| 1|SIMPLE |ontime|index_merge|OriginState,DepDel15|OriginState,DepDel15|3,5 |NULL|76952|Using intersect(OriginState,DepDel15);Using where|
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
| 1|SIMPLE |ontime|ref |OriginState,DepDel15|DepDel15|5 |const|36926|Using where|
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
MySQL [ontime]> EXPLAIN SELECT avg(arrdelay) FROM ontime where depdel15=1 and OriginState IN ('CA', 'GB');
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
|id|select_type|table |type |possible_keys |key |key_len|ref |rows |Extra |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
| 1|SIMPLE |ontime|index_merge|OriginState,DepDel15|DepDel15,OriginState|5,3 |NULL|60754|Using sort_intersect(DepDel15,OriginState); Using where |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
apt-get install mariadb-plugin-provider-lz4
INSTALL SONAME 'provider_lz4';
SET GLOBAL innodb_compression_algorithm = lz4;
Warning : MariaDB tried to use the LZMA compression, but its provider plugin is not loaded
Error : Table 'test.t' doesn't exist in engine
status : Operation failed
Error : Table test/t is compressed with lzma, which is not currently loaded.
Please load the lzma provider plugin to open the table
error : Corrupt
SELECT COUNT(DISTINCT col1) FROM tbl1;
SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;
SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;
{
"prepare_sum_aggregators": {
"function": "count(distinct t1.col1)",
"aggregator_type": "distinct"
}
}
{
"prepare_sum_aggregators": {
"function": "count(distinct t1.pk1)",
"aggregator_type": "simple"
}
}
SELECT *
FROM
customer, orders, ...
WHERE
customer.id = orders.customer_id AND ...
YEAR(indexed_date_col) CMP const_value
DATE(indexed_date_col) CMP const_value
{
"transformation": "date_conds_into_sargable",
"before": "cast(t1.datetime_col as date) <= '2023-06-01'",
"after": "t1.datetime_col <= '2023-06-01 23:59:59'"
},
CREATE VIEW OCT_TOTALS AS
SELECT
customer_id,
SUM(amount) AS TOTAL_AMT
FROM orders
WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id;
SELECT * FROM OCT_TOTALS WHERE customer_id=1
SELECT *
FROM
(SELECT * FROM City WHERE Population > 10*1000) AS big_city
WHERE
big_city.Country='DEU'
mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000)
AS big_city WHERE big_city.Country='DEU' ;
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4068 | Using where |
| 2 | DERIVED | City | ALL | Population | NULL | NULL | NULL | 4079 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.60 sec)
UPPER(key_col) = expr
UPPER(key_col) IN (constant-list)
Let's think about what we can do with a hash of the userid. The hash could map to a bit in a bit string. A BIT_COUNT of the bit string would give the 1-bits, representing the number of users. But that bit string would have to be huge. What if we could use shorter bit strings? Then different userids would be folded into the same bit. Let's assume we can solve that.
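As a rough illustration (not part of the original article), one way to turn a userid into a bit position in SQL is to take a slice of an MD5 hash; the 8192-bit string size here is just an example:
-- Map a userid to a bit position in an 8192-bit (1 KB) bit string (illustrative only):
SELECT CONV(LEFT(MD5('user1234'), 8), 16, 10) % 8192 AS bit_position;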
Meanwhile, what about the rollup? The daily bit strings can be OR'd together to get a similar bit string for the week.
We have now figured out how to do the rollup, but have created another problem -- the counts are too low.
A sufficiently random hash (eg MD5) will fold userids into the same bits with a predictable frequency. We need to figure this out, and work backwards. That is, given that X percent of the bits are set, we need a formula that says approximately how many userids were used to get those bits.
I simulated the problem by generating random hashes and calculated the number of bits that would be set. Then, with the help of Eureqa software, I derived the formula:
Y = 0.5456*X + 0.6543*tan(1.39*X*X*X)
The formula is reasonably precise. It is usually within 1% of the correct value; rarely off by 2%.
Of course, if virtually all the bits are set, the formula can't be very precise. Hence, you need to plan to have the bit strings big enough to handle the expected number of Uniques. In practice, you can use less than 1 bit per Unique. This would be a huge space savings over trying to save all the userids.
Another suggestion... If you are rolling up over a big span of time (eg hourly -> monthly), the bit strings must all be the same length, and the monthly string must be big enough to handle the expected count. This is likely to lead to very sparse hourly bit strings. Hence, it may be prudent to compress the hourly strings.
Invented Nov, 2013; published Apr, 2014
Future: Rick is working on actual code (Sep, 2016) It is complicated by bit-wise operations being limited to BIGINT. However, with MySQL 8.0 (freshly released), the desired bit-wise operations can be applied to BLOB, greatly simplifying my code. I hope to publish the pre-8.0 code soon; 8.0 code later.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: uniques
This page is licensed: CC BY-SA / Gnu FDL
When using the COMPRESSED attribute, note that the maximum field length is reduced by 1 byte; for example, a BLOB can store up to 65535 bytes, while a BLOB COMPRESSED can store up to 65534 (65535-1). See MDEV-15592.
Description: Minimum column data length eligible for compression.
Command line: --column-compression-threshold=#
Scope: Global, Session
Dynamic: Yes
Data Type: numeric
Default Value: 100
Range: 0 to 4294967295
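For example, to compress only values of at least 50 bytes in the current session (50 is an arbitrary illustrative threshold):
SET SESSION column_compression_threshold = 50;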
Description: zlib compression level (1 gives best speed, 9 gives best compression).
Command line: --column-compression-zlib-level=#
Scope: Global, Session
Dynamic: Yes
Data Type: numeric
Default Value: 6
Range: 1 to 9
Description: The strategy parameter is used to tune the compression algorithm. Use the value DEFAULT_STRATEGY for normal data, FILTERED for data produced by a filter (or predictor), HUFFMAN_ONLY to force Huffman encoding only (no string match), or RLE to limit match distances to one (run-length encoding). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between DEFAULT_STRATEGY and HUFFMAN_ONLY. RLE is designed to be almost as fast as HUFFMAN_ONLY, but give better compression for PNG image data. The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately. FIXED prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.
Command line: --column-compression-zlib-strategy=#
Scope: Global, Session
Dynamic: Yes
Data Type: enum
Default Value: DEFAULT_STRATEGY
Valid Values: DEFAULT_STRATEGY, FILTERED, HUFFMAN_ONLY, RLE, FIXED
Description: If set to 1 (0 is default), generate zlib header and trailer and compute adler32 check value. It can be used with storage engines that don't provide data integrity verification to detect data corruption.
Command line: --column-compression-zlib-wrap{=0|1}
Scope: Global, Session
Dynamic: Yes
Data Type: boolean
Default Value: OFF
Description: Incremented each time field data is compressed.
Scope: Global, Session
Data Type: numeric
Description: Incremented each time field data is decompressed.
Scope: Global, Session
Data Type: numeric
The only supported method currently is zlib.
The CSV storage engine stores data uncompressed on-disk even if the COMPRESSED attribute is present.
It is not possible to create indexes over compressed columns.
Storage-independent column compression is different to InnoDB Page Compression in a number of ways.
It is storage engine independent, while InnoDB page compression applies to InnoDB only.
By being specific to a column, one can access non-compressed fields without the decompression overhead.
Only zlib is available, while InnoDB page compression can offer alternative compression algorithms.
It is not recommended to use multiple forms of compression over the same data.
It is intended for compressing large blobs, while InnoDB page compression is suitable for a more general case.
Columns cannot be indexed, while with InnoDB page compression indexes are possible as usual.
This page is licensed: CC BY-SA / Gnu FDL
Knowing that col1=col2 and col1=123 allows the optimizer to infer that col2=123; similarly, knowing col1=col2 and col1 < 10 allows it to infer that col2 < 10.
There are some limitations to where one can do the substitution, though.
The first and obvious example is the string datatype and collations. Most commonly-used collations in SQL are "case-insensitive", that is 'A'='a'. Also, most collations have a "PAD SPACE" attribute, which means that comparison ignores the spaces at the end of the value, 'a'='a '.
Now, consider a query:
Here col1=col2, so the values are "equal". At the same time LENGTH(col1)=2 while LENGTH(col2)=4, which means one can't perform the substitution for the argument of LENGTH(...).
It's not only collations. There are similar phenomena when equality compares columns of different datatypes. The exact criteria for when they happen are rather convoluted.
The take-away is: sometimes, X=Y does not mean that one can replace any reference to X with Y.
What one CAN do is still replace the occurrence in the comparisons <, >, >=, <=, etc.
This is how we get two kinds of substitution:
Identity substitution: X=Y, and any occurrence of X can be replaced with Y.
Comparison substitution: X=Y, and an occurrence of X in a comparison (X<Z) can be replaced with Y (Y<Z).
(A draft description): Let's look at how Equality Propagation is integrated with the rest of the query optimization process.
First, multiple-equalities are built (TODO example from optimizer trace)
If multiple-equality includes a constant, fields are substituted with a constant if possible.
From this point, all optimizations like range optimization, ref access, etc make use of multiple equalities: when they see a reference to tableX.columnY somewhere, they also look at all the columns that tableX.columnY is equal to.
After the join order is picked, the optimizer walks through the WHERE clause and substitutes each field reference with the "best" one - the one that can be checked as soon as possible.
Then, the parts of the WHERE condition are attached to the tables where they can be checked.
Consider a query:
Suppose, there is an INDEX(col1). MariaDB optimizer is able to figure out that it can use an index on col1 (or sort by the value of col1) in order to resolve ORDER BY col2.
Look at these elements:
condition_processing
attaching_conditions_to_tables
Equality propagation doesn't just happen at the top of the WHERE clause. It is done "at all levels" where a level is:
A top level of the WHERE clause.
If the WHERE clause has an OR clause, each branch of the OR clause.
The top level of any ON expression
(the same as above about OR-levels)
This page is licensed: CC BY-SA / Gnu FDL
CREATE TABLE cmp (i TEXT COMPRESSED);
CREATE TABLE cmp2 (i TEXT COMPRESSED=zlib);
WHERE col1=col2 AND ...
WHERE col1=col2 AND col1=123
WHERE col1=col2 AND col1 < 10
INSERT INTO t1 (col1, col2) VALUES ('ab', 'ab ');
SELECT * FROM t1 WHERE col1=col2 AND LENGTH(col1)=2
SELECT ... FROM ... WHERE col1=col2 ORDER BY col2
Resource Protection: Because the abort is not guaranteed to be instantaneous or strictly enforced in all code paths, MAX_STATEMENT_TIME should not be relied upon as the sole mechanism for preventing resource exhaustion (such as filling up temporary disk space).
SELECT MAX_STATEMENT_TIME = N ...
SET STATEMENT MAX_STATEMENT_TIME=N FOR...
Closing a prepared statement.
Connect
Replication slave is connected to its master.
Connect Out
Replication slave is in the process of connecting to its master.
Create DB
Executing an operation to create a database.
Daemon
Internal server thread rather than for servicing a client connection.
Debug
Generating debug information.
Delayed insert
A delayed-insert handler.
Drop DB
Executing an operation to drop a database.
Error
Error.
Execute
Executing a prepared statement.
Fetch
Fetching the results of an executed prepared statement.
Field List
Retrieving table column information.
Init DB
Selecting default database.
Kill
Killing another thread.
Long Data
Retrieving long data from the result of executing a prepared statement.
Ping
Handling a server ping request.
Prepare
Preparing a prepared statement.
Processlist
Preparing processlist information about server threads.
Query
Executing a statement.
Quit
In the process of terminating the thread.
Refresh
Flushing a table, logs or caches, or refreshing replication server information.
Register Slave
Registering a slave server.
Reset stmt
Resetting a prepared statement.
Set option
Setting or resetting a client statement execution option.
Sleep
Waiting for the client to send a new statement.
Shutdown
Shutting down the server.
Statistics
Preparing status information about the server.
Table Dump
Sending the contents of a table to a slave.
Time
Not used.
This page is licensed: CC BY-SA / Gnu FDL
Binlog Dump
Master thread for sending binary log contents to a slave.
Change user
Executing a change user operation.
Close stmt
Since Germany has two big cities (in this diagram), it will be put into the query output twice. This is not correct, SELECT ... FROM Country should not produce the same country record twice. The FirstMatch strategy avoids the production of duplicates by short-cutting execution as soon as the first genuine match is found:
Note that the short-cutting has to take place after "Using where" has been applied. It would have been wrong to short-cut after we found Trier.
The EXPLAIN for the above query will look as follows:
FirstMatch(Country) in the Extra column means that as soon as we have produced one matching record combination, short-cut the execution and jump back to the Country table.
FirstMatch's query plan is very similar to one you would get in MySQL:
and these two particular query plans will execute in the same time.
The general idea behind the FirstMatch strategy is the same as the one behind the IN->EXISTS transformation, however, FirstMatch has several advantages:
Equality propagation works across semi-join bounds, but not subquery bounds. Therefore, converting a subquery to semi-join and using FirstMatch can still give a better execution plan. (TODO example)
There is only one way to apply the IN->EXISTS strategy and MySQL will do it unconditionally. With FirstMatch, the optimizer can make a choice between whether it should run the FirstMatch strategy as soon as all tables used in the subquery are in the join prefix, or at some later point in time. (TODO: example)
The FirstMatch strategy works by executing the subquery and short-cutting its execution as soon as the first match is found.
This means, subquery tables must be after all of the parent select's tables that are referred from the subquery predicate.
EXPLAIN shows FirstMatch as "FirstMatch(tableN)".
The strategy can handle correlated subqueries.
But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.
Use of the FirstMatch strategy is controlled with the firstmatch=on|off flag in the variable.
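For example, to disable the strategy for the current session:
SET optimizer_switch='firstmatch=off';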
In-depth material:
This page is licensed: CC BY-SA / Gnu FDL
Tag
Provider (of news article)
Manufacturer (of item for sale)
Ticker (financial stock)
Variants on "news article"
Item for sale
Blog comment
Blog thread
Variants on "latest"
Publication date (unix_timestamp)
Most popular (keep the count)
Most emailed (keep the count)
Manual ranking (1..10 -- 'top ten')
Variants on "10" - there is nothing sacred about "10" in this discussion.
Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.
If there are multiple topics for each article, you need a many-to-many table.
You have a flag "is_deleted" that needs filtering on.
You want to "paginate" the list (ten articles per page, for as many pages as necessary).
First, let me give you the solution, then I will elaborate on why it works well.
One new table called, say, Lists.
Lists has exactly 3 columns: topic, article_id, sequence
Lists has exactly 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)
Only viewable articles are in Lists. (This avoids the filtering on "is_deleted", etc)
Lists is InnoDB. (This gets "clustering".)
"sequence" is typically the date of the article, but could be some other ordering.
"topic" should probably be normalized, but that is not critical to this discussion.
"article_id" is a link to the bulky row in another table(s) that provide all the details about the article.
Find the latest 10 articles for a topic:
You must not have any WHERE condition touching columns in Articles.
When you mark an article for deletion, you must remove it from Lists:
I emphasize "must" because flags and other filtering is often the root of performance issues.
By now, you may have discovered why it works.
The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:
1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".
10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.) The PRIMARY KEY, and using InnoDB, makes these quite efficient.
OK, you pay for this by removing things that you should avoid.
1 disk hit: INDEX(article_id) - finding a few ids
A few more disk hits to DELETE rows from Lists. This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: lists
This page is licensed: CC BY-SA / Gnu FDL
Set the lock wait timeout. See WAIT and NOWAIT.
OPTIMIZE TABLE works for InnoDB (before , only if the innodb_file_per_table server system variable is set), Aria, MyISAM and ARCHIVE tables, and should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length
rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a
linked list and subsequent INSERT operations reuse old row positions.
This statement requires SELECT and INSERT privileges for the table.
By default, OPTIMIZE TABLE statements are written to the binary log and will be replicated. The NO_WRITE_TO_BINLOG keyword (LOCAL is an alias) will ensure the statement is not written to the binary log.
OPTIMIZE TABLE statements are not logged to the binary log if read_only is set. See also Read-Only Replicas.
OPTIMIZE TABLE is also supported for partitioned tables. You
can use[ALTER TABLE](../../../../reference/sql-statements-and-structure/sql-statements/data-definition/alter/alter-table.md) ... OPTIMIZE PARTITION
to optimize one or more partitions.
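For example (the table and partition names are only illustrative):
ALTER TABLE logs OPTIMIZE PARTITION p2023;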
You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. With other storage engines, OPTIMIZE TABLE does nothing by default, and returns this message: " The storage engine for the table doesn't support optimize". However, if the server has been started with the --skip-new option, OPTIMIZE TABLE is linked to ALTER TABLE, and recreates the table. This operation frees the unused space and updates index statistics.
The Aria storage engine supports progress reporting for this statement.
If a MyISAM table is fragmented, concurrent inserts will not be performed until an OPTIMIZE TABLE statement is executed on that table, unless the concurrent_insert server system variable is set to ALWAYS.
When rows are added to or deleted from an InnoDB fulltext index, the index is not immediately re-organized, as this can be an expensive operation. Change statistics are stored in a separate location. The fulltext index is only fully re-organized when an OPTIMIZE TABLE statement is run.
By default, an OPTIMIZE TABLE will defragment a table. In order to use it to update fulltext index statistics, the innodb_optimize_fulltext_only system variable must be set to 1. This is intended to be a temporary setting and should be reset to 0 once the fulltext index has been re-organized.
Since fulltext re-organization can take a long time, the innodb_ft_num_word_optimize variable limits the re-organization to a number of words (2000 by default). You can run multiple OPTIMIZE statements to fully re-organize the index.
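A typical sequence might be (the table name is an example):
SET GLOBAL innodb_optimize_fulltext_only = 1;
OPTIMIZE TABLE articles;   -- re-organizes up to innodb_ft_num_word_optimize words per run
SET GLOBAL innodb_optimize_fulltext_only = 0;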
merged the Facebook/Kakao defragmentation patch, allowing one to use OPTIMIZE TABLE to defragment InnoDB tablespaces. For this functionality to be enabled, the innodb_defragment system variable must be enabled. No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by innodb-defragment-n-pages) and tries to move records so that pages would be full of records and then frees pages that are fully empty after the operation. Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.
See Defragmenting InnoDB Tablespaces for more details.
This page is licensed: GPLv2, originally from fill_help_tables.sql
Queueing master event to the relay log
Event is being copied to the relay log after being read, where it can be processed by the SQL thread.
Reconnecting after a failed binlog dump request
Attempting to reconnect to the primary after a previously failed binary log dump request.
Reconnecting after a failed master event read
Attempting to reconnect to the primary after a previously failed request. After successfully connecting, the state will change to Waiting for master to send event.
Registering slave on master
Registering the replica on the primary, which only occurs very briefly after establishing a connection with the primary.
Requesting binlog dump
Requesting the contents of the binary logs from the given log file name and position. Only occurs very briefly after establishing a connection with the primary.
Waiting for master to send event
Waiting for binary log events to arrive after successfully connecting. If there are no new events on the primary, this state can persist for as many seconds as specified by the slave_net_timeout system variable, after which the thread will reconnect. Prior to , , , , and , the time was from SLAVE START. From these versions, the time is since reading the last event.
Waiting for slave mutex on exit
Waiting for replica mutex while the thread is stopping. Only occurs very briefly.
Waiting for the slave SQL thread to free enough relay log space.
Relay log has reached its maximum size, determined by relay_log_space_limit (no limit by default), so waiting for the SQL thread to free up space by processing enough relay log events.
Waiting for master update
State before connecting to primary.
Waiting to reconnect after a failed binlog dump request
Waiting to reconnect after a binary log dump request has failed due to disconnection. The length of time in this state is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.
Waiting to reconnect after a failed master event read
Sleeping while waiting to reconnect after a disconnection error. The time in seconds is determined by the MASTER_CONNECT_RETRY clause of the CHANGE MASTER TO statement.

Ignored indexes are indexes that are visible and maintained, but which are not used by the optimizer. MySQL 8 has a similar feature which they call "invisible indexes".
By default, an index is not ignored. One can mark an existing index as ignored (or not ignored) with an ALTER TABLE statement:
It is also possible to specify the IGNORED attribute when creating an index with a CREATE TABLE or CREATE INDEX statement:
A table's primary key cannot be ignored. This applies both to an explicitly defined primary key and to an implicit primary key: if there is no explicit primary key but the table has a unique key containing only NOT NULL columns, the first such key becomes the implicitly defined primary key.
The optimizer treats ignored indexes as if they did not exist. They are not used in query plans, nor as a source of statistical information.
Also, an attempt to use an ignored index in a USE INDEX, FORCE INDEX, or IGNORE INDEX hint will result in an error - the same as if one had used the name of a non-existent index.
Information about whether or not indexes are ignored can be viewed in the IGNORED column of the INFORMATION_SCHEMA.STATISTICS table or the SHOW INDEXES statement.
The primary use case is as follows: a DBA sees an index that seems to have little or no usage and considers whether to remove it. Dropping the index is a risk as it may still be needed in a few cases. For example, the optimizer may rely on the estimates provided by the index without using the index in query plans. If dropping an index causes an issue, it will take a while to re-create the index. On the other hand, marking the index as ignored (or not ignored) is instant, so the suggested workflow is:
Mark the index as ignored
Check if everything continues to work
If not, mark the index as not ignored.
If everything continues to work, one can safely drop the index.
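For example, the workflow above might look like this (table and index names are hypothetical):
ALTER TABLE orders ALTER INDEX idx_customer IGNORED;      -- stop the optimizer from using it
-- ... observe the workload for a while ...
ALTER TABLE orders ALTER INDEX idx_customer NOT IGNORED;  -- revert instantly if something regresses
-- ALTER TABLE orders DROP INDEX idx_customer;            -- or drop it once proven safe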
The optimizer does not make use of an index when it is ignored, while if the index is not ignored (the default), the optimizer will consider it in the query plan, as shown in the EXPLAIN output.
This page is licensed: CC BY-SA / Gnu FDL
optimizer_adjust_secondary_key_costs
Description: Gives the user the ability to affect how the costs for secondary keys using ref are calculated in the few cases when MariaDB 10.6 up to MariaDB 10.11 makes a sub-optimal choice when optimizing ref access, either for key lookups or GROUP BY. ref, as used by EXPLAIN, means that the optimizer is using a key lookup on one value to find the matching rows from a table. Unused from MariaDB 11.0. In later releases the variable was changed from a number to a set of strings, and disable_forced_index_in_group_by (value 4) was added.
Scope: Global, Session
Dynamic: Yes
Data Type: set
Default Value: fix_reuse_range_for_ref, fix_card_multiplier
Range: 0 to 63 or any combination of adjust_secondary_key_cost, disable_max_seek or disable_forced_index_in_group_by, fix_innodb_cardinality,fix_reuse_range_for_ref, fix_card_multiplier
Introduced: ,
Starting with MariaDB 11.0, optimizer_adjust_secondary_key_costs is obsolete, as the new optimizer does not have the max_seek optimization and already makes cost-based choices for index usage with GROUP BY.
The value for optimizer_adjust_secondary_key_costs is one or more of the following:
One can set all options with:
The reason for the max_seek optimization was originally to ensure that MariaDB would use a key instead of a table scan. This works well for a lot of queries, but can cause problems when a table scan is a better choice, such as when one would have to scan more than 1/4 of the rows in the table (in which case a table scan is better).
The system variable.
This page is licensed: CC BY-SA / Gnu FDL
The target use case for rowid filtering is as follows:
a table uses ref access on index IDX1
but it also has a fairly restrictive range predicate on another index IDX2.
In this case, it is advantageous to:
Do an index-only scan on index IDX2 and collect rowids of index records into a data structure that allows filtering (let's call it $FILTER).
When doing ref access on IDX1, check $FILTER before reading the full record.
Consider a query
Suppose the condition on l_shipdate is very restrictive, which means lineitem table should go first in the join order. Then, the optimizer can use o_orderkey=l_orderkey equality to do an index lookup to get the order the line item is from. On the other hand o_totalprice between ... can also be rather selective.
With filtering, the query plan would be:
Note that table orders has "Using rowid filter". The type column has "|filter", the key column shows the index that is used to construct the filter. rows column shows the expected filter selectivity, it is 5%.
ANALYZE FORMAT=JSON output for table orders will show
Note the rowid_filter element. It has a range element inside it. selectivity_pct is the expected selectivity, accompanied by the r_selectivity_pct showing the actual observed selectivity.
The optimizer makes a cost-based decision about when the filter should be used.
The filter data structure is currently an ordered array of rowids. (a Bloom filter would be better here and will probably be introduced in the future versions).
The optimization needs to be supported by the storage engine. At the moment, it is supported by and . It is not supported in .
Rowid Filtering can't be used with a backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query and decides to handle it with a backward-ordered index scan, it will disable Rowid Filtering.
Rowid filtering can be switched on/off using rowid_filter flag in the variable. By default, the optimization is enabled.
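For example, to disable it for the current session:
SET optimizer_switch='rowid_filter=off';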
This page is licensed: CC BY-SA / Gnu FDL
Off (0) by default. When this optimization is enabled, MariaDB will consider a join order that may shorten query execution time based on the ORDER BY ... LIMIT n clause. For small values of n, this may improve performance.
Set the value of optimizer_join_limit_pref_ratio to a non-zero value to enable this option (higher values are more conservative, recommended value is 100), or set to 0 (the default value) to disable it.
By default, the MariaDB optimizer picks a join order without considering the ORDER BY ... LIMIT clause, when present.
For example, consider a query looking at latest 10 orders together with customers who made them:
The two possible plans are:customer->orders:
and orders->customer:
The customer->orders plan computes a join between all customers and orders, saves that result into a temporary table, and then uses filesort to get the 10 most recent orders. This query plan doesn't benefit from the fact that just 10 orders are needed.
In contrast, the orders->customer plan uses an index to read rows in ORDER BY order. The query can stop execution once it finds 10 order-and-customer combinations, which is much faster than computing the entire join. With this new optimization enabled, the optimizer can choose such a plan and leverage ORDER BY ... LIMIT to stop early once it has the 10 combinations.
It is fundamentally difficult to produce a reliable estimate for ORDER BY ... LIMIT shortcuts. Let's take an example from the previous section to see why. This query searches for last 10 orders that were shipped by air:
Suppose we know beforehand that 50% of orders are shipped by air.
Assuming there's no correlation between date and shipping method, orders->customer plan will need to scan 20 orders before we find 10 that are shipped by air.
But if there is correlation, then we may need to scan up to (total_orders*0.5 + 10) before we find first 10 orders that are shipped by air. Scanning about 50% of all orders can be expensive.
This situation worsens when the query has constructs whose selectivity is not known. For example, suppose the WHERE condition was
in this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows or we will need to enumerate all rows before we find #LIMIT matches.
Due to these challenges, the optimization is not enabled by default.
When running a mostly OLTP workload such that query WHERE conditions have suitable indexes or are not very selective, then any ORDER BY ... LIMIT queries will typically find matching rows quickly. In this case, it makes sense to give the following guidance to the optimizer:
The value of X is given to the optimizer via optimizer_join_limit_pref_ratio setting.
Higher values carry less risk. The recommended value is 100: prefer the LIMIT join order if it
promises at least 100x speedup.
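For example, to enable the optimization for the current session with the recommended ratio:
SET SESSION optimizer_join_limit_pref_ratio = 100;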
introduces optimizer_join_limit_pref_ratio optimization
is about future development that would make the optimizer handle such cases without user guidance.
This page is licensed: CC BY-SA / Gnu FDL
Index Condition Pushdown is an optimization that is applied for access methods that access table data through indexes: range, ref, eq_ref, ref_or_null, and .
The idea is to check part of the WHERE condition that refers to index fields (we call it Pushed Index Condition) as soon as we've accessed the index. If the Pushed Index Condition is not satisfied, we won't need to read the whole table record.
Index Condition Pushdown is on by default. To disable it, set its optimizer_switch flag like so:
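SET optimizer_switch='index_condition_pushdown=off';
(index_condition_pushdown is the relevant optimizer_switch flag; set it back to on to re-enable the optimization.)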
When Index Condition Pushdown is used, EXPLAIN will show "Using index condition":
DuplicateWeedout is an execution strategy for .
The idea is to run the semi-join (a query that uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.
Suppose you have a query where you're looking for countries which have more than 33% of their population in one big city:
First, we run a regular inner join between the City and Country tables:
The NOT NULL range scan optimization enables the optimizer to construct range scans from NOT NULL conditions that it was able to infer from the WHERE clause.
The optimization appeared in . It is not enabled by default; one needs to set an optimizer_switch flag to enable it.
A basic (but slightly artificial) example:
The WHERE condition in this form cannot be used for range scans. However, one can infer that it will reject rows that have NULL for weight. That is, one can infer an additional condition:
SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
+---------+---------------+------------+
| 1 <=> 1 | NULL <=> NULL | 1 <=> NULL |
+---------+---------------+------------+
| 1 | 1 | 0 |
+---------+---------------+------------+
SELECT 1 = 1, NULL = NULL, 1 = NULL;
+-------+-------------+----------+
| 1 = 1 | NULL = NULL | 1 = NULL |
+-------+-------------+----------+
| 1 | NULL | NULL |
+-------+-------------+----------+
SET STATEMENT max_statement_time=100 FOR
SELECT field1 FROM table_name ORDER BY field1;
SELECT MAX_STATEMENT_TIME=2 * FROM t1;
MariaDB [world]> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000)
AS big_city WHERE big_city.Country='DEU';
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
| 1 | SIMPLE | City | ref | Population,Country | Country | 3 | const | 90 | Using index condition; Using where |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)
SET @@optimizer_switch='derived_merge=OFF'
SELECT *
FROM
Country,
(SELECT
SUM(City.Population) AS urban_population,
City.Country
FROM City
GROUP BY City.Country
HAVING
urban_population > 1*1000*1000
) AS cities_in_country
WHERE
Country.Code=cities_in_country.Country AND Country.Continent='Europe';
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
| 1 | PRIMARY | Country | ref | PRIMARY,continent | continent | 17 | const | 60 | Using index condition |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 3 | world.Country.Code | 17 | |
| 2 | DERIVED | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using temporary; Using filesort |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
SET optimizer_switch='derived_with_keys=off'
SELECT * FROM Country
WHERE Country.code IN (SELECT City.Country
FROM City
WHERE City.Population > 1*1000*1000)
AND Country.continent='Europe'
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 1*1000*1000)
AND Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| 1 | PRIMARY | Country | ref | PRIMARY,continent | continent | 17 | const | 60 | Using index condition |
| 1 | PRIMARY | City | ref | Population,Country | Country | 3 | world.Country.Code | 18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)
MySQL [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 1*1000*1000)
AND Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| 1 | PRIMARY | Country | ref | continent | continent | 17 | const | 60 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | City | index_subquery | Population,Country | Country | 3 | func | 18 | Using where |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)
SELECT a.*
FROM Articles a
JOIN Lists s ON s.article_id = a.article_id
WHERE s.topic = ?
ORDER BY s.sequence DESC
LIMIT 10;
DELETE FROM Lists
WHERE article_id = ?;
OPTIMIZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE
tbl_name [, tbl_name] ...
[WAIT n | NOWAIT]
SELECT
customer_id,
SUM(amount) AS TOTAL_AMT
FROM orders
WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id
HAVING
customer_id=1
SELECT
customer_id,
SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
order_date BETWEEN '2017-10-01' AND '2017-10-31' AND
customer_id=1
GROUP BY customer_id
# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3 Schema: test QC_hit: No
# Query_time: 0.053857 Lock_time: 0.000188 Rows_sent: 11 Rows_examined: 100011
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: Yes
SET TIMESTAMP=1405348239;
SET TIMESTAMP=1405348239;
SELECT * FROM t1 WHERE col1 BETWEEN 10 AND 20 ORDER BY col2 LIMIT 100;
CREATE TABLE t1 (
key1 VARCHAR(32) COLLATE utf8mb4_general_ci,
...
KEY(key1)
);
EXPLAIN SELECT * FROM t1 WHERE UPPER(key1)='ABC'
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | t1 | ref | key1 | key1 | 131 | const | 1 | Using where; Using index |
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
EXPLAIN SELECT * FROM t0,t1 WHERE upper(t1.key1)=t0.col;
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 1 | SIMPLE | t1 | ref | key1 | key1 | 131 | test.t0.col | 1 | Using index |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
"join_optimization": {
"select_id": 1,
"steps": [
{
"sargable_casefold_removal": {
"before": "ucase(t1.key1) = t0.col",
"after": "t1.key1 = t0.col"
}
},

Consider this table with data in JSON format:
In order to do efficient queries over data in JSON, you can add a virtual column, and an index on that column:
Before MariaDB 11.8, you had to use vcol1 in the WHERE clause. Now, you can use the virtual column expression, too:
In MariaDB, one has to create a virtual column and then create an index over it. Other databases allow creating an index directly over an expression: create index on t1((col1+col2)). This is not yet supported in MariaDB (MDEV-35853).
The WHERE clause must use the exact same expression as in the virtual column definition.
The optimization is implemented in a way similar to MySQL – the optimizer finds potentially useful occurrences of vcol_expr in the WHERE clause and replaces them with vcol_name.
In the optimizer trace, the rewrites are shown like this:
Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns when the virtual column expressions are covered by indexes that can be used.
Improved Optimizer plans for SELECT statements with ORDER BY or GROUP BY virtual columns expressions, by substitution of the virtual column expressions with virtual columns when the virtual columns are usable indexes themselves.
The same improvements apply for single-table UPDATE or DELETE statements.
SQL is a strongly-typed language, while JSON is weakly-typed. This means one must specify the desired datatype when accessing JSON data from SQL. In the above example, we declared vcol1 as INT and then used CAST(... AS INTEGER), both in the ALTER TABLE and in the WHERE clause of the SELECT query:
When extracting string values, CAST is not necessary, as JSON_VALUE returns strings. However, you must take into account collations. Consider this column declared as JSON:
The collation of json_data is utf8mb4_bin. The collation of JSON_VALUE(json_data, ...) is utf8mb4_bin, too.
Most use cases require a more commonly-used collation. It is possible to achieve that using the COLLATE clause:
MDEV-35616: Add basic optimizer support for virtual columns
This page is licensed: CC BY-SA / Gnu FDL
ALTER TABLE table_name ALTER {KEY|INDEX} [IF EXISTS] key_name [NOT] IGNORED;
CREATE TABLE table_name (
...
INDEX index_name ( ...) [NOT] IGNORED
...
CREATE INDEX index_name (...) [NOT] IGNORED ON tbl_name (...);
CREATE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b));
ALTER TABLE t1 ALTER INDEX k1 IGNORED;
CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT);
CREATE INDEX k1 ON t1(b) IGNORED;
SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_NAME = 't1'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 0
INDEX_SCHEMA: test
INDEX_NAME: PRIMARY
SEQ_IN_INDEX: 1
COLUMN_NAME: id
COLLATION: A
CARDINALITY: 0
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IGNORED: NO
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: k1
SEQ_IN_INDEX: 1
COLUMN_NAME: b
COLLATION: A
CARDINALITY: 0
SUB_PART: NULL
PACKED: NULL
NULLABLE: YES
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IGNORED: YES
SHOW INDEXES FROM t1\G
*************************** 1. row ***************************
Table: t1
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Ignored: NO
*************************** 2. row ***************************
Table: t1
Non_unique: 1
Key_name: k1
Seq_in_index: 1
Column_name: b
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Ignored: YES
CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
EXPLAIN SELECT * FROM t1 ORDER BY b;
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1 | Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
ALTER TABLE t1 ALTER INDEX k1 NOT IGNORED;
EXPLAIN SELECT * FROM t1 ORDER BY b;
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | index | NULL | k1 | 5 | NULL | 1 | Using index |
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
SELECT ...
FROM orders JOIN lineitem ON o_orderkey=l_orderkey
WHERE
l_shipdate BETWEEN '1997-01-01' AND '1997-01-31' AND
o_totalprice between 200000 and 230000;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: lineitem
type: range
possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
key: i_l_shipdate
key_len: 4
ref: NULL
rows: 98
Extra: Using index condition
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: orders
type: eq_ref|filter
possible_keys: PRIMARY,i_o_totalprice
key: PRIMARY|i_o_totalprice
key_len: 4|9
ref: dbt3_s001.lineitem.l_orderkey
rows: 1 (5%)
Extra: Using where; Using rowid filter
"table": {
"table_name": "orders",
"access_type": "eq_ref",
"possible_keys": ["PRIMARY", "i_o_totalprice"],
"key": "PRIMARY",
"key_length": "4",
"used_key_parts": ["o_orderkey"],
"ref": ["dbt3_s001.lineitem.l_orderkey"],
"rowid_filter": {
"range": {
"key": "i_o_totalprice",
"used_key_parts": ["o_totalprice"]
},
"rows": 69,
"selectivity_pct": 4.6,
"r_rows": 71,
"r_selectivity_pct": 10.417,
"r_buffer_size": 53,
"r_filling_time_ms": 0.0716
}
SELECT *
FROM
customer,ORDER
WHERE
customer.name=ORDER.customer_name
ORDER BY
ORDER.DATE DESC
LIMIT 10
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
| 1 | SIMPLE | customer | ALL | name | NULL | NULL | NULL | 9623 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | orders | ref | customer_name | customer_name | 103 | customer.name | 1 | |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
| 1 | SIMPLE | orders | index | customer_name | order_date | 4 | NULL | 10 | Using where |
| 1 | SIMPLE | customer | ref | name | name | 103 | orders.customer_name | 1 | |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
SELECT *
FROM
customer,ORDER
WHERE
customer.name=ORDER.customer_name
AND ORDER.shipping_method='Airplane'
ORDER BY
ORDER.DATE DESC
LIMIT 10
order.shipping_method='%Airplane%'
Do consider the query plan using LIMIT short-cutting
and prefer it if it promises at least X times speedup.
CREATE TABLE t1 (json_data JSON);
INSERT INTO t1 VALUES('{"column1": 1234}');
INSERT INTO t1 ...
ALTER TABLE t1
ADD COLUMN vcol1 INT AS (cast(json_value(json_data, '$.column1') AS INTEGER)),
ADD INDEX(vcol1);
-- This uses the index before 11.8:
EXPLAIN SELECT * FROM t1 WHERE vcol1=100;
-- Starting from 11.8, this uses the index, too:
EXPLAIN SELECT * FROM t1
WHERE cast(json_value(json_data, '$.column1') AS INTEGER)=100;
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| 1 | SIMPLE | t1 | ref | vcol1 | vcol1 | 5 | const | 1 | |
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
"virtual_column_substitution": {
"condition": "WHERE",
"resulting_condition": "t1.vcol1 = 100"
}
ALTER TABLE t1
ADD COLUMN vcol1 INT AS (CAST(json_value(json_data, '$.column1') AS INTEGER)) ...
SELECT ... WHERE ... CAST(json_value(json_data, '$.column1') AS INTEGER) ...;
CREATE TABLE t1 (
json_data JSON
...
ALTER TABLE t1
ADD col1 VARCHAR(100) COLLATE utf8mb4_uca1400_ai_ci AS
(json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci),
ADD INDEX(col1);
...
SELECT ...
WHERE
json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci='string-value';
fix_innodb_cardinality
By default InnoDB doubles the cardinality for indexes in an effort to force index usage over table scans. This can cause the optimizer to create sub-optimal plans for ranges or index entries that cover a big part of the table.
Using this option removes the doubling of cardinality in InnoDB. fix_innodb_cardinality is recommended to be used only as a server startup option, as it is enabled for a table at first usage. See for details.
fix_reuse_range_for_ref
Number of estimated rows for 'ref' did not always match costs from range optimizer
Use cost from range optimizer for 'ref' if all used key parts are constants. The old code did not always do this
fix_card_multiplier
Index selectivity can be bigger than 1.0 if index statistics is not up to date. Not on by default.
Ensure that the calculated index selectivity is never bigger than 1.0. Having index selectivity bigger than 1.0 causes MariaDB to believe that there is more rows in the table than in reality, which can cause wrong plans. This option is on by default.
adjust_secondary_key_cost
Limit ref costs by max_seeks
The secondary key costs for ref are updated to be at least five times the clustered primary key costs if a clustered primary key exists
disable_max_seek
ref cost on secondary keys is limited to max_seek = min('number of expected rows'/ 10, scan_time*3)
Disable 'max_seek' optimization and do a slight adjustment of filter cost
disable_forced_index_in_group_by
Use a rule-based choice when deciding to use an index to resolve GROUP BY
The choice is now cost based
In disk-based storage engines, making an index lookup is done in two steps, like shown on the picture:
Index Condition Pushdown optimization tries to cut down the number of full record reads by checking whether index records satisfy part of the WHERE condition that can be checked for them:
How much speed will be gained depends on
How many records will be filtered out
How expensive it was to read them
The former depends on the query and the dataset. The latter is generally bigger when table records are on disk and/or are big, especially when they have blobs.
I used DBT-3 benchmark data, with scale factor=1. Since the benchmark defines very few indexes, we've added a multi-column index (index condition pushdown is usually useful with multi-column indexes: the first component(s) is what index access is done for, the subsequent have columns that we read and check conditions on).
The query was to find big (l_quantity > 40) orders that were made in January 1993 that took more than 25 days to ship:
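A query along these lines (a sketch against the DBT-3 lineitem table, not necessarily the exact benchmark statement) would be:
SELECT COUNT(*)
FROM lineitem
WHERE l_shipdate BETWEEN '1993-01-01' AND '1993-01-31'
  AND DATEDIFF(l_receiptdate, l_shipdate) > 25
  AND l_quantity > 40;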
EXPLAIN without Index Condition Pushdown:
with Index Condition Pushdown:
The speedup was:
Cold buffer pool: from 5 min down to 1 min
Hot buffer pool: from 0.19 sec down to 0.07 sec
There are two server status variables:
Number of times pushed index condition was checked.
Number of times the condition was matched.
That way, the value Handler_icp_attempts - Handler_icp_match shows the number records that the server did not have to read because of Index Condition Pushdown.
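Both counters are visible via SHOW STATUS, for example:
SHOW GLOBAL STATUS LIKE 'Handler_icp%';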
Currently, indexes on virtual columns can't be used for index condition pushdown. Instead, the generated column can be declared STORED; then index condition pushdown becomes possible.
Index Condition Pushdown can't be used with backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query which can be handled by using a backward-ordered index scan, it will disable Index Condition Pushdown.
Index condition pushdown support for partitioned tables was added in .
This page is licensed: CC BY-SA / Gnu FDL
The inner join produces duplicates. We have Germany three times, because it has three big cities.
Now, lets put DuplicateWeedout into the picture:
Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.
The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:
This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).
If we run the same query with semi-join optimizations disabled, we'll get:
This plan will read (239 + 239*18) = 4541 rows, which is much slower.
DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.
The strategy can handle correlated subqueries.
But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.
DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.
There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.
This page is licensed: CC BY-SA / Gnu FDL
and pass it to the range optimizer. The range optimizer can, in turn, evaluate whether it makes sense to construct range access from the condition:
Here's another example that's more complex but is based on a real-world query. Consider a join query
Here, the optimizer can infer the condition "return_id IS NOT NULL". If most of the orders are not returned (and so have NULL for return_id), one can use range access to scan only those orders that had a return.
The optimization is not enabled by default. One can enable it like so:
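SET optimizer_switch='not_null_range_scan=on';
(not_null_range_scan is the optimizer_switch flag for this feature; it is off by default.)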
TODO.
MDEV-15777 - JIRA bug report which resulted in the optimization
NULL Filtering Optimization is a related optimization in MySQL and MariaDB. It uses inferred NOT NULL conditions to perform filtering (but not index access)
This page is licensed: CC BY-SA / Gnu FDL
Your data includes a large set of non-overlapping 'ranges'. These could be IP addresses, datetimes (show times for a single station), zipcodes, etc.
You have pairs of start and end values; one 'item' belongs to each such 'range'. So, instinctively, you create a table with start and end of the range, plus info about the item. Your queries involve a WHERE clause that compares for being between the start and end values.
Once you get a large set of items, performance degrades. You play with the indexes, but find nothing that works well. The indexes fail to lead to optimal functioning because the database does not understand that the ranges are non-overlapping.
I will present a solution that enforces the fact that items cannot have overlapping ranges. The solution builds a table to take advantage of that, then uses Stored Routines to get around the clumsiness imposed by it.
The instinctive solution often leads to scanning half the table to do just about anything, such as finding the item containing an 'address'. In complexity terms, this is Order(N).
The solution here can usually get the desired information by fetching a single row, or a small number of rows. It is Order(1).
In a large table, "counting the disk hits" is the important part of performance. Since InnoDB is used, and the PRIMARY KEY (clustered) is used, most operations hit only 1 block.
Finding the 'block' where a given IP address lives:
For start of block: One single-row fetch using the PRIMARY KEY
For end of block: Ditto. The record containing this will be 'adjacent' to the other record.
For allocating or freeing a block:
2-7 SQL statements, hitting the clustered PRIMARY KEY for the rows containing and immediately adjacent to the block.
One SQL statement is a DELETE; it hits as many rows as are needed for the block.
The other statements hit one row each.
This is crucial to the design and its performance:
Having just one address in the row. These were alternative designs; they seemed to be no better, and possibly worse:
That one address could have been the 'end' address.
The routine parameters for a 'block' could have been the start of this block and the start of the next block.
The IPv4 parameters could have been dotted quads; I chose to keep the reference implementation simpler instead.
The interesting work is in the Ips, not the second table, so I focus on it. The inconvenience of JOINing to the second table is small compared to the performance gains.
Two, not one, tables will be used. The first table (Ips in the reference implementations) is carefully designed to be optimal for all the basic operations needed. The second table contains other information about the 'owner' of each 'item'. In the reference implementations owner is an id used to JOIN the two tables. This discussion centers around Ips and how to efficiently map IP(s) to/from owner(s). The second table has "PRIMARY KEY(owner)".
In addition to the two-table schema, there are a set of Stored Routines to encapsulate the necessary code.
One row of Ips represents one 'item' by specifying the starting IP address and the 'owner'. The next row gives the starting IP address of the next "address block", thereby indirectly providing the ending address for the current block.
This lack of explicitly stating the "end address" leads to some clumsiness. The stored routines hide it from the user.
A special owner (indicated by '0') is reserved for "free" or "not-owned" blocks. Hence, sparse allocation of address blocks is no problem. Also, the 'free' owner is handled no differently than real owners, so there are no extra Stored Routines for such.
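As an illustration only (the reference implementations linked below differ in the details), the IPv4 flavor of Ips and the basic owner lookup might look like this:

CREATE TABLE Ips (
  ip INT UNSIGNED NOT NULL,            -- start of this block; the next row's ip marks its end
  owner MEDIUMINT UNSIGNED NOT NULL,   -- 0 means 'free'
  PRIMARY KEY (ip)
) ENGINE=InnoDB;

-- Who owns a given address? A single probe of the clustered PRIMARY KEY:
SELECT owner
  FROM Ips
  WHERE ip <= INET_ATON('11.22.33.44')
  ORDER BY ip DESC
  LIMIT 1;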
Links below give "reference" implementations for IPv4 and IPv6. You will need to make changes for non-IP situations, and may need to make changes even for IP situations.
These are the main stored routines provided:
IpIncr, IpDecr -- for adding/subtracting 1
IpStore -- for allocating/freeing a range
IpOwner, IpRangeOwners, IpFindRanges, Owner2IpStarts, Owner2IpRanges -- for lookups
IpNext, IpEnd -- IP of start of next block, or end of current block
None of the provided routines JOIN to the other table; you may wish to develop custom queries based on the given reference Stored Procedures.
The Ips table's size is proportional to the number of blocks. A million 'owned' blocks may be 20-50MB. This varies due to
number of 'free' gaps (between zero and the number of owned blocks)
datatypes used for ip and owner
overhead
Even 100M blocks is quite manageable in today's hardware. Once things are cached, most operations would take only a few milliseconds. A trillion blocks would work, but most operations would hit the disk a few times -- only a few times.
This is specific to IPv4 (32-bit, a la '196.168.1.255'). It can handle anywhere from 'nothing assigned' (1 row) to 'everything assigned' (4B rows) 'equally' well. That is, to ask the question "who owns '11.22.33.44'" is equally efficient regardless of how many blocks of IP addresses exist in the table. (OK, caching, disk hits, etc may make a slight difference.) The one function that can vary is the one that reassigns a range to a new owner. Its speed is a function of how many existing ranges need to be consumed, since those rows will be DELETEd. (It helps that they are, by schema design, 'clustered'.)
Notes on the :
Externally, the user may use the dotted quad notation (11.22.33.44), but needs to convert to INT UNSIGNED for calling the Stored Procs.
The user is responsible for converting to/from the calling datatype (INT UNSIGNED) when accessing the stored routine; suggest /.
The internal datatype for addresses is the same as the calling datatype (INT UNSIGNED).
Adding and subtracting 1 (simple arithmetic).
(The reference implementation does not handle CIDRs. Such should be easy to add on, by first turning the CIDR into an IP range.)
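The conversion suggested above is presumably done with INET_ATON() and INET_NTOA(); a minimal sketch:

SELECT INET_ATON('11.22.33.44');   -- dotted quad -> INT UNSIGNED, for calling the routines
SELECT INET_NTOA(185999660);       -- INT UNSIGNED -> dotted quad, for display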
The code for handling IPv6 addresses is more complex, but the overall structure is the same as for IPv4. Launch into it only if you need IPv6.
Notes on the :
Externally, IPv6 has a complex string, VARCHAR(39) CHARACTER SET ASCII. The Stored Procedure IpStr2Hex() is provided.
The user is responsible for converting to/from the calling datatype (BINARY(16)) when accessing the stored routine; suggest /.
The internal datatype for addresses is the same as the calling datatype (BINARY(16)).
Communication with the Stored routines is via 32-char hex strings.
The INET6* functions were first available in MySQL 5.6.3 and
Adapting to a different non-IP 'address range' data
The external datatype for an 'address' should be whatever is convenient for the application.
The datatype for the 'address' in the table must be ordered, and should be as compact as possible.
You must write the Stored functions (IpIncr, IpDecr) for incrementing/decrementing an 'address'.
An 'owner' is an id of your choosing, but smaller is better.
"Owner" needs a special value to represent "not owned". The reference implementations use "=" and "!=" to compare two 'owners'. Numeric values and strings work nicely with those operators; NULL does not. Hence, please do not use NULL for "not owned".
Since the datatypes are pervasive in the stored routines, adapting a reference implementation to a different concept of 'address' would require multiple minor changes.
The code enforces that consecutive blocks never have the same 'owner', so the table is of 'minimal' size. Your application can assume that such is always the case.
Original writing -- Oct, 2012; Notes on INET6 functions -- May, 2015.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
Semi-join Materialization is a special kind of subquery materialization used for Semi-join subqueries. It actually includes two strategies:
Materialization/lookup
Materialization/scan
Consider a query that finds countries in Europe which have big cities:
The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of the City.country field of big cities, and then do a join with countries in Europe:
The join can be done in two directions:
From the materialized table to countries in Europe
From countries in Europe to the materialized table
The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".
If you run a join from Countries to the materialized table, the cheapest way to find a match in the materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".
If we choose to look for cities with a population greater than 7 million, the optimizer will use Materialization-Scan and EXPLAIN will show this:
Here, you can see:
There are still two SELECTs (look for columns with id=1 and id=2)
The second select (with id=2) has select_type=MATERIALIZED. This means it will be executed and its results will be stored in a temporary table with a unique key over all columns. The unique key is there to prevent the table from containing any duplicate records.
The optimizer chose to do a full scan over the materialized table, so this is an example of a use of the Materialization-Scan strategy.
As for execution costs, we're going to read 15 rows from table City, write 15 rows to materialized table, read them back (the optimizer assumes there won't be any duplicates), and then do 15 eq_ref accesses to table Country. In total, we'll do 45 reads and 15 writes.
By comparison, if you run the EXPLAIN with semi-join optimizations disabled, you'll get this:
...which is a plan to do (239 + 239*15) = 3824 table reads.
Let's modify the query slightly and look for countries which have cities with a population over one million (instead of seven):
The EXPLAIN output is similar to the one which used Materialization-scan, except that:
the <subquery2> table is accessed with the eq_ref access method
the access uses an index named distinct_key
This means that the optimizer is planning to do index lookups into the materialized table. In other words, we're going to use the Materialization-lookup strategy.
With optimizer_switch='semijoin=off,materialization=off', one will get this EXPLAIN:
One can see that both plans will do a full scan on the Country table. For the second step, MariaDB will fill the materialized table (238 rows read from table City and written to the temporary table) and then do a unique key lookup for each record in table Country, which works out to 238 unique key lookups. In total, the second step will cost (239+238) = 477 reads and 238 temp.table writes.
Execution of the latter (DEPENDENT SUBQUERY) plan reads 18 rows using an index on City.Country for each record it receives for table Country. This works out to a cost of (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with Materialization. By the way, MariaDB has an option to use such a query plan, too (see ), but it did not choose it.
MariaDB is able to use Semi-join materialization strategy when the subquery has grouping (other semi-join strategies are not applicable in this case).
This allows for efficient execution of queries that search for the best/last element in a certain group.
For example, let's find cities that have the biggest population on their continent:
The cities found are:
Semi-join materialization
Can be used for uncorrelated IN-subqueries. The subselect may use grouping and/or aggregate functions.
Is shown in EXPLAIN as type=MATERIALIZED for the subquery, and a line with table=<subqueryN> in the parent subquery.
Is enabled when one has both materialization=on and semijoin=on
This page is licensed: CC BY-SA / Gnu FDL
MariaDB 5.3 has an optimizer debugging patch. The patch is pushed into:
lp:maria-captains/maria/5.3-optimizer-debugging
The patch is wrapped in #ifdef, but there is a #define straight in mysql_priv.h so simply compiling that tree should produce a binary with optimizer debugging enabled.
The patch adds two system variables:
@@debug_optimizer_prefer_join_prefix
@@debug_optimizer_dupsweedout_penalized
The variables are available as session and global variables, and are also settable via the server command line.
If this variable is non-NULL, it is assumed to specify a join prefix as a comma-separated list of table aliases:
The optimizer will try its best to build a join plan which matches the specified join prefix. It does this by comparing join prefixes it is considering with @@debug_optimizer_prefer_join_prefix, and multiplying cost by a million if the plan doesn't match the prefix.
As a result, you can more-or-less control the join order. For example, let's take this query:
and request a join order of C,A,B:
We got it.
Note that this is still a best-effort approach:
you won't be successful in forcing join orders which the optimizer considers invalid (e.g. for "t1 LEFT JOIN t2" you won't be able to get a join order of t2,t1).
The optimizer does various plan pruning and may discard the requested join order before it has a chance to find out that it is a million-times cheaper than any other.
It is possible to force the join order of joins plus semi-joins. This may cause a different strategy to be used:
Semi-join materialization is a somewhat special case, because "join prefix" is not exactly what you see in the EXPLAIN output. For semi-join materialization:
don't put "<subqueryN>" into @@debug_optimizer_prefer_join_prefix
instead, put all of the materialization tables into the place where you want the <subqueryN> line.
Attempts to control the join order inside the materialization nest will be unsuccessful. Example: we want A-C-B-AA:
but we get A-B-C-AA.
There are four semi-join execution strategies:
FirstMatch
Materialization
LooseScan
DuplicateWeedout
The first three strategies have flags in @@optimizer_switch that can be used to disable them. The DuplicateWeedout strategy does not have a flag. This was done for a reason, as that strategy is the catch-all strategy and it can handle all kinds of subqueries, in all kinds of join orders. (We're slowly moving to the point where it will be possible to run with FirstMatch enabled and everything else disabled but we are not there yet.)
Since DuplicateWeedout cannot be disabled, there are cases where it "gets in the way" by being chosen over the strategy you need. This is what debug_optimizer_dupsweedout_penalized is for. If you set:
...the costs of query plans that use DuplicateWeedout will be multiplied by a million. This doesn't mean that you will get rid of DuplicateWeedout — due to query plan pruning, it is still possible to have DuplicateWeedout used even if a cheaper plan exists. A partial remedy to this is to run with
It is possible to use both debug_optimizer_dupsweedout_penalized and debug_optimizer_prefer_join_prefix at the same time. This should give you the desired strategy and join order.
See mysql-test/t/debug_optimizer.test (in the MariaDB source code) for examples
This page is licensed: CC BY-SA / Gnu FDL
This section details special comments you can add to SQL statements to influence the query optimizer, helping you manually select better execution plans for improved performance and query tuning.
Optimizer hints are options available that affect the execution plan.
SELECT Modifiers have been in MariaDB for a long time, while Expanded Optimizer Hints were introduced in MariaDB 12.0 and 12.1.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB [securedb]> select @@version;
+-----------------------------------+
| @@version |
+-----------------------------------+
| 10.6.20-16-MariaDB-enterprise-log |
+-----------------------------------+
1 row in set (0.001 sec)
MariaDB [securedb]> select @@optimizer_adjust_secondary_key_costs;
+---------------------------------------------+
| @@optimizer_adjust_secondary_key_costs |
+---------------------------------------------+
| fix_reuse_range_for_ref,fix_card_multiplier |
+---------------------------------------------+
SET @@optimizer_adjust_secondary_key_costs='all';
SET optimizer_switch='index_condition_pushdown=off'
MariaDB [test]> EXPLAIN SELECT * FROM tbl WHERE key_col1 BETWEEN 10 AND 11 AND key_col2 LIKE '%foo%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| 1 | SIMPLE | tbl | range | key_col1 | key_col1 | 5 | NULL | 2 | Using index condition |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
ALTER TABLE lineitem ADD INDEX s_r (l_shipdate, l_receiptdate);
SELECT COUNT(*) FROM lineitem
WHERE
l_shipdate BETWEEN '1993-01-01' AND '1993-02-01' AND
datediff(l_receiptdate,l_shipdate) > 25 AND
l_quantity > 40;
-+----------+-------+----------------------+-----+---------+------+--------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
-+----------+-------+----------------------+-----+---------+------+--------+-------------+
| lineitem | range | s_r | s_r | 4 | NULL | 152064 | Using where |
-+----------+-------+----------------------+-----+---------+------+--------+-------------+
-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+
| lineitem | range | s_r | s_r | 4 | NULL | 152064 | Using index condition; Using where |
-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+
SELECT *
FROM Country
WHERE
Country.code IN (SELECT City.Country
FROM City
WHERE
City.Population > 0.33 * Country.Population AND
City.Population > 1*1000*1000);
EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 0.33 * Country.Population
AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
TABLE: City
type: RANGE
possible_keys: Population,Country
KEY: Population
key_len: 4
ref: NULL
ROWS: 238
Extra: USING INDEX CONDITION; Start temporary
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
TABLE: Country
type: eq_ref
possible_keys: PRIMARY
KEY: PRIMARY
key_len: 3
ref: world.City.Country
ROWS: 1
Extra: USING WHERE; End temporary
2 rows in set (0.00 sec)
EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 0.33 * Country.Population
AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
TABLE: Country
type: ALL
possible_keys: NULL
KEY: NULL
key_len: NULL
ref: NULL
ROWS: 239
Extra: USING WHERE
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
TABLE: City
type: index_subquery
possible_keys: Population,Country
KEY: Country
key_len: 3
ref: func
ROWS: 18
Extra: USING WHERE
2 rows in set (0.00 sec)
CREATE TABLE items (
price DECIMAL(8,2),
weight DECIMAL(8,2),
...
INDEX(weight)
);
-- Find items that cost more than 1000 $currency_units per kg:
SET optimizer_switch='not_null_range_scan=ON';
EXPLAIN
SELECT * FROM items WHERE items.price > items.weight / 1000;
weight IS NOT NULL
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | items | range | NULL | weight | 5 | NULL | 1 | Using where |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
-- Find orders that were returned
SELECT * FROM current_orders AS O, order_returns AS RET
WHERE
O.return_id= RET.id;
SET optimizer_switch='not_null_range_scan=ON';

<subquery2>: this is the table that we got as a result of the materialization of the select with id=2.
The materialization=on|off flag is shared with Non-semijoin materialization.

Fetch all the rows -- this is costly
Append RAND() to the rows
Sort the rows -- also costly
Pick the first 10.
All the algorithms given below are "fast", but most introduce flaws:
Bias -- some rows are more likely to be fetched than others.
Repetitions -- If two random sets contain the same row, they are likely to contain other dups.
Sometimes failing to fetch the desired number of rows.
"Fast" means avoiding reading all the rows. There are many techniques that require a full table scan, or at least an index scan. They are not acceptable for this list. There is even a technique that averages half a scan; it is relegated to a footnote.
Here's a way to measure performance without having a big table.
If some of the "Handler" numbers look like the number of rows in the table, then there was a table scan.
None of the queries presented here need a full table (or index) scan. Each has a time proportional to the number of rows returned.
Virtually all published algorithms involve a table scan. The previously published version of this blog had, embarrassingly, several algorithms that had table scans.
Sometimes the scan can be avoided via a subquery. For example, the first of these will do a table scan; the second will not.
Requirement: AUTO_INCREMENT id
Requirement: No gaps in id
(Of course, you might be able to simplify this. For example, min_id is likely to be 1. Or precalculate limits into @min and @max.)
Requirement: AUTO_INCREMENT id
Requirement: No gaps in id
Flaw: Sometimes delivers fewer than 10 rows
The FLOOR expression could lead to duplicates, hence the inflated inner LIMIT. There could (rarely) be so many duplicates that the inflated LIMIT leads to fewer than the desired 10 different rows. One approach to that Flaw is to rerun the query if it delivers too few rows.
A variant:
Again, ugly but fast, regardless of table size.
Requirement: AUTO_INCREMENT, possibly with gaps due to DELETEs, etc
Flaw: Only semi-random (rows do not have an equal chance of being picked), but it does partially compensate for the gaps
Flaw: The first and last few rows of the table are less likely to be delivered.
This gets 50 "consecutive" ids (possibly with gaps), then delivers a random 10 of them.
Yes, it is complex, but yes, it is fast, regardless of the table size.
(Unfinished: need to check these.)
Assuming rnd is a FLOAT (or DOUBLE) populated with RAND() and INDEXed:
Requirement: extra, indexed, FLOAT column
Flaw: Fetches 10 adjacent rows (according to rnd), hence not good randomness
Flaw: Near 'end' of table, can't find 10 rows.
These two variants attempt to resolve the end-of-table flaw:
Requirement: UUID/GUID/MD5/SHA1 column exists and is indexed.
Similar code/benefits/flaws to AUTO_INCREMENT with gaps.
Needs 7 random HEX digits:
can be used as a start for adapting a gapped AUTO_INCREMENT case. If the field is BINARY instead of hex, then
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: random
This page is licensed: CC BY-SA / Gnu FDL
The purpose of this optimization is to provide the means to terminate the execution of SELECT statements which examine too many rows, and thus use too many resources. This is achieved through an extension of the LIMIT clause —LIMIT ROWS EXAMINED number_of_rows. Whenever possible the semantics of LIMIT ROWS EXAMINED is the same as that of normal LIMIT (for instance for aggregate functions).
The LIMIT ROWS EXAMINED clause is taken into account by the query engine only during query execution. Thus the clause is ignored in the following cases:
If a query is EXPLAIN-ed.
During query optimization.
During auxiliary operations such as writing to system tables (e.g. logs).
The clause is not applicable to DELETE or UPDATE statements, and if used in those statements produces a syntax error.
The effects of this clause are as follows:
The server counts the number of read, inserted, modified, and deleted rows during query execution. This takes into account the use of temporary tables, and sorting for intermediate query operations.
Once the counter exceeds the value specified in the LIMIT ROWS EXAMINED clause, query execution is terminated as soon as possible.
The effects of terminating the query because of LIMIT ROWS EXAMINED are as follows:
The result of the query is a subset of the complete query, depending on when the query engine detected that the limit was reached. The result may be empty if no result rows could be computed before reaching the limit.
A warning is generated of the form: "Query execution was interrupted. The query examined at least 100 rows, which exceeds LIMIT ROWS EXAMINED (20). The query result may be incomplete."
If query processing was interrupted during filesort, an error is returned in addition to the warning.
If a UNION was interrupted during execution of one of its queries, the last step of the UNION is still executed in order to produce a partial result.
Depending on the join and other execution strategies used for a query, the same query may produce no result at all, or a different subset of the complete result when terminated due to LIMIT ROWS EXAMINED.
If the query contains a GROUP BY clause, the last group where the limit was reached will be discarded.
The LIMIT ROWS EXAMINED clause cannot be specified on a per-subquery basis. There can be only one LIMIT ROWS EXAMINED clause for the whole SELECT statement. If a SELECT statement contains several subqueries with LIMIT ROWS EXAMINED, the one that is parsed last is taken into account.
A simple example of the clause is:
The LIMIT ROWS EXAMINED clause is global for the whole statement.
If a composite query (such as UNION, or query with derived tables or with subqueries) contains more than one LIMIT ROWS EXAMINED, the last one parsed is taken into account. In this manner either the last or the outermost one is taken into account. For instance, in the query:
The limit that is taken into account is 11, not 0.
This page is licensed: CC BY-SA / Gnu FDL
SELECT * FROM Country
WHERE Country.code IN (SELECT City.Country
FROM City
WHERE City.Population > 7*1000*1000)
AND Country.continent='Europe'
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 15 | |
| 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.City.Country | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 7*1000*1000);
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| 1 | PRIMARY | Country | ALL | NULL | NULL | NULL | NULL | 239 | Using where |
| 2 | DEPENDENT SUBQUERY | City | range | Population,Country | Population | 4 | NULL | 15 | Using index condition; Using where |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| 1 | PRIMARY | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 3 | func | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)
MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN
(select City.Country from City where City.Population > 1*1000*1000) ;
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| 1 | PRIMARY | Country | ALL | NULL | NULL | NULL | NULL | 239 | Using where |
| 2 | DEPENDENT SUBQUERY | City | index_subquery | Population,Country | Country | 3 | func | 18 | Using where |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
EXPLAIN
SELECT * FROM City
WHERE City.Population IN (SELECT max(City.Population) FROM City, Country
WHERE City.Country=Country.Code
GROUP BY Continent)
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 239 | |
| 1 | PRIMARY | City | ref | Population | Population | 4 | <subquery2>.max(City.Population) | 1 | |
| 2 | MATERIALIZED | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | Using temporary |
| 2 | MATERIALIZED | City | ref | Country | Country | 3 | world.Country.Code | 18 | |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
4 rows in set (0.00 sec)
+------+-------------------+---------+------------+
| ID | Name | Country | Population |
+------+-------------------+---------+------------+
| 1024 | Mumbai (Bombay) | IND | 10500000 |
| 3580 | Moscow | RUS | 8389200 |
| 2454 | Macao | MAC | 437500 |
| 608 | Cairo | EGY | 6789479 |
| 2515 | Ciudad de México | MEX | 8591309 |
| 206 | São Paulo | BRA | 9968485 |
| 130 | Sydney | AUS | 3276207 |
+------+-------------------+---------+------------+
SET debug_optimizer_prefer_join_prefix='tbl1,tbl2,tbl3';
MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | C | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
3 rows in set (0.00 sec)
MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| 1 | SIMPLE | C | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
3 rows in set (0.00 sec)
MariaDB [test]> SET debug_optimizer_prefer_join_prefix=NULL;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
| 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | B | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 1 | PRIMARY | C | ALL | NULL | NULL | NULL | NULL | 10 | Using where; FirstMatch(A) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
3 rows in set (0.00 sec)
MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | PRIMARY | C | ALL | NULL | NULL | NULL | NULL | 10 | Start temporary |
| 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (flat, BNL join) |
| 1 | PRIMARY | B | ALL | NULL | NULL | NULL | NULL | 10 | Using where; End temporary |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
3 rows in set (0.00 sec)
MariaDB [test]> SET debug_optimizer_prefer_join_prefix='A,C,B,AA';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten AA WHERE A.a IN (SELECT B.a FROM ten B, ten C);
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
| 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | func | 1 | |
| 1 | PRIMARY | AA | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
| 2 | SUBQUERY | B | ALL | NULL | NULL | NULL | NULL | 10 | |
| 2 | SUBQUERY | C | ALL | NULL | NULL | NULL | NULL | 10 | |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
5 rows in set (0.00 sec)
MariaDB [test]> SET debug_optimizer_dupsweedout_penalized=TRUE;
MariaDB [test]> SET optimizer_prune_level=0;
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
SELECT * FROM RandTest AS a
WHERE id = FLOOR(@min + (@max - @min + 1) * RAND()); -- BAD: table scan
SELECT *
FROM RandTest AS a
JOIN (
SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id -- Good; single eval.
) b USING (id);
SELECT r.*
FROM (
SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
FROM (
SELECT MIN(id) AS min_id,
MAX(id) AS max_id
FROM RandTest
) AS mm
) AS init
JOIN RandTest AS r ON r.id = init.id;
-- First select is one-time:
SELECT @min := MIN(id),
@max := MAX(id)
FROM RandTest;
SELECT DISTINCT *
FROM RandTest AS a
JOIN (
SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
FROM RandTest
LIMIT 11 -- more than 10 (to compensate for dups)
) b USING (id)
LIMIT 10; -- the desired number of rows
SELECT r.*
FROM (
SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
FROM (
SELECT MIN(id) AS min_id,
MAX(id) AS max_id
FROM RandTest
) AS mm
JOIN ( SELECT id dummy FROM RandTest LIMIT 11 ) z
) AS init
JOIN RandTest AS r ON r.id = init.id
LIMIT 10;
-- First select is one-time:
SELECT @min := MIN(id),
@max := MAX(id)
FROM RandTest;
SELECT a.*
FROM RandTest a
JOIN ( SELECT id FROM
( SELECT id
FROM ( SELECT @min + (@max - @min + 1 - 50) * RAND()
AS start FROM DUAL ) AS init
JOIN RandTest y
WHERE y.id > init.start
ORDER BY y.id
LIMIT 50 -- Inflated to deal with gaps
) z ORDER BY RAND()
LIMIT 10 -- number of rows desired (change to 1 if looking for a single row)
) r ON a.id = r.id;
SELECT r.*
FROM ( SELECT RAND() AS start FROM DUAL ) init
JOIN RandTest r
WHERE r.rnd >= init.start
ORDER BY r.rnd
LIMIT 10;
SELECT r.*
FROM ( SELECT RAND() * ( SELECT rnd
FROM RandTest
ORDER BY rnd DESC
LIMIT 10,1 ) AS start
) AS init
JOIN RandTest r
WHERE r.rnd > init.start
ORDER BY r.rnd
LIMIT 10;
SELECT @start := RAND(),
@cutoff := CAST(1.1 * 10 + 5 AS DECIMAL(20,8)) / TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'dbname'
AND TABLE_NAME = 'RandTest'; -- 0.0030
SELECT d.*
FROM (
SELECT a.id
FROM RandTest a
WHERE rnd BETWEEN @start AND @start + @cutoff
) sample
JOIN RandTest d USING (id)
ORDER BY rand()
LIMIT 10;
RIGHT( HEX( (1<<24) * (1+RAND()) ), 6)
UNHEX(RIGHT( HEX( (1<<24) * (1+RAND()) ), 6))
SELECT ... FROM ... WHERE ...
[group_clause] [order_clause]
LIMIT [[OFFSET,] row_count] ROWS EXAMINED rows_limit;
SELECT * FROM t1, t2 LIMIT 10 ROWS EXAMINED 10000;
SELECT * FROM t1
WHERE c1 IN (SELECT * FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
LIMIT ROWS EXAMINED 11;
The IPv6 parameters are 32-digit hex strings because that was simpler than BINARY(16) for a reference implementation.
The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M) -- adjust if needed.
The address "Off the end" (255.255.255.255+1 - represented as NULL).
The table is initialized to one row: (ip=0, owner=0), meaning "all addresses are free". See the comments in the code for more details.
Inside the Procedures, and in the Ips table, an address is stored as BINARY(16) for efficiency. HEX() and UNHEX() are used at the boundaries.
Adding/subtracting 1 is rather complex (see the code).
The datatype of an 'owner' (MEDIUMINT UNSIGNED: 0..16M); 'free' is represented by 0. You may need a bigger datatype.
The address "Off the end" (ffff.ffff.ffff.ffff.ffff.ffff.ffff.ffff+1 is represented by NULL).
The table is initialized to one row: (UNHEX('00000000000000000000000000000000'), 0), meaning "all addresses are free".
You may need to decide on a canonical representation of IPv4 in IPv6. See the comments in the code for more details.
A special value (such as 0 or '') must be provided for 'free'.
The table must be initialized to one row: (SmallestAddress, Free)


A foreign key is a database constraint that references columns in a parent table to enforce data integrity in a child table. When used, MariaDB checks to maintain these integrity rules.
A foreign key is a constraint which can be used to enforce data integrity. It is composed of a column (or a set of columns) in a table called the child table, which references a column (or a set of columns) in a table called the parent table. If foreign keys are used, MariaDB performs some checks to ensure that the integrity rules are always enforced. For a more exhaustive explanation, see .
Foreign keys can only be used with storage engines that support them. The default InnoDB supports foreign keys.
Partitioned tables cannot contain foreign keys, and cannot be referenced by a foreign key.
Note: Until , MariaDB accepts the shortcut format with a REFERENCES clause only in ALTER TABLE and CREATE TABLE statements, but that syntax does nothing. For example:
MariaDB simply parses it without returning any error or warning, for compatibility with other DBMS's. However, only the syntax described below creates foreign keys. From , MariaDB will attempt to apply the constraint. See the below.
Foreign keys are created with or . The definition must follow this syntax:
The symbol clause, if specified, is used in error messages and must be unique in the database.
The columns in the child table must be a BTREE (not HASH, RTREE, or FULLTEXT — see ) index, or the leftmost part of a BTREE index. Index prefixes are not supported (thus, TEXT and BLOB columns cannot be used as foreign keys). If MariaDB automatically creates an index for the foreign key (because it does not exist and is not explicitly created), its name will be index_name.
The referenced columns in the parent table must be an index or a prefix of an index.
The foreign key columns and the referenced columns must be of the same type, or similar types. For integer types, the size and sign must also be the same.
Both the foreign key columns and the referenced columns can be columns. However, the ON UPDATE CASCADE, ON UPDATE SET NULL, ON DELETE SET NULL clauses are not allowed in this case.
The parent and the child table must use the same storage engine, and must not be TEMPORARY or partitioned tables. They can be the same table.
If a foreign key exists, each row in the child table must match a row in the parent table. Multiple child rows can match the same parent row. A child row matches a parent row if all its foreign key values are identical to a parent row's values in the parent table. However, if at least one of the foreign key values is NULL, the row has no parent, but it is still allowed.
MariaDB performs certain checks to guarantee that the data integrity is enforced:
Trying to insert non-matching rows (or update matching rows in a way that makes them non-matching rows) in the child table produces a 1452 error ( '23000').
When a row in the parent table is deleted and at least one child row exists, MariaDB performs an action which depends on the ON DELETE clause of the foreign key.
When a value in the column referenced by a foreign key changes and at least one child row exists, MariaDB performs an action which depends on the ON UPDATE clause of the foreign key.
The allowed actions for ON DELETE and ON UPDATE are:
RESTRICT: The change on the parent table is prevented. The statement terminates with a 1451 error ( '23000'). This is the default behavior for both ON DELETE and ON UPDATE.
NO ACTION: Synonym for RESTRICT.
The delete or update operations triggered by foreign keys do not activate and are not counted in the and status variables.
Foreign key constraints can be disabled by setting the server system variable to 0. This speeds up the insertion of large quantities of data.
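The variable referred to here is presumably foreign_key_checks; a typical bulk-load pattern looks like this:

SET foreign_key_checks = 0;
-- ... bulk INSERT / LOAD DATA ...
SET foreign_key_checks = 1;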
The table contains information about foreign keys. The individual columns are listed in the table.
The InnoDB-specific Information Schema tables also contain information about the InnoDB foreign keys. The foreign key information is stored in the . Data about the individual columns are stored in .
The most human-readable way to get information about a table's foreign keys sometimes is the statement.
Foreign keys have the following limitations in MariaDB:
Currently, foreign keys are only supported by InnoDB.
Cannot be used with views.
The SET DEFAULT action is not supported.
Foreign keys actions do not activate .
Let's see an example. We will create an author table and a book table. Both tables have a primary key called id. book also has a foreign key composed by a field called author_id, which refers to the author primary key. The foreign key constraint name is optional, but we'll specify it because we want it to appear in error messages: fk_book_author.
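The original statements are not reproduced in this copy; a sketch of such a schema (the column types are illustrative) could be:

CREATE TABLE author (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE book (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  author_id MEDIUMINT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  CONSTRAINT fk_book_author
    FOREIGN KEY (author_id) REFERENCES author (id)
    ON DELETE CASCADE
    ON UPDATE RESTRICT
) ENGINE=InnoDB;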
Now, if we try to insert a book with a non-existing author, we will get an error:
The error is very descriptive.
Now, let's try to properly insert two authors and their books:
It worked!
Now, let's delete the second author. When we created the foreign key, we specified ON DELETE CASCADE. This should propagate the deletion, and make the deleted author's books disappear:
We also specified ON UPDATE RESTRICT. This should prevent us from modifying an author's id (the column referenced by the foreign key) if a child row exists:
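A sketch of how those two clauses behave with the tables above (the sample data is hypothetical):

INSERT INTO author (name) VALUES ('Leo Tolstoy'), ('Jane Austen');
INSERT INTO book (title, author_id) VALUES ('Anna Karenina', 1), ('Emma', 2);

DELETE FROM author WHERE id = 2;   -- ON DELETE CASCADE also removes 'Emma'
SELECT title FROM book;            -- only 'Anna Karenina' is left

UPDATE author SET id = 10 WHERE id = 1;
-- Fails with error 1451: ON UPDATE RESTRICT forbids changing a referenced id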
This page is licensed: CC BY-SA / Gnu FDL
When rows are deleted from an InnoDB table, the rows are simply marked as deleted and not physically deleted. The free space is not returned to the operating system for re-use.
The purge thread will physically delete index keys and rows, but the free space introduced is still not returned to the operating system. This can lead to gaps in the pages. If you have variable-length rows, new rows may be larger than old rows and cannot make use of the available space.
You can run OPTIMIZE TABLE or ALTER TABLE ... ENGINE=InnoDB to reconstruct the table. Unfortunately running OPTIMIZE TABLE against an InnoDB table stored in the shared table-space file ibdata1 does two things:
Makes the table’s data and indexes contiguous inside ibdata1.
Increases the size of ibdata1 because the contiguous data and index pages are appended to ibdata1.
The feature described below has been deprecated in and was removed in . See and .
merged Facebook's defragmentation code prepared for MariaDB by Matt, Seong Uck Lee from Kakao. The only major difference between Facebook's code (Matt's patch) and MariaDB is that MariaDB introduces no new literals to SQL and makes no changes to the server code. Instead, OPTIMIZE TABLE is used and all code changes are inside the InnoDB/XtraDB storage engines.
The behaviour of OPTIMIZE TABLE is unchanged by default, and to enable this new feature, you need to set the system variable to 1.
No new tables are created and there is no need to copy data from old tables to new tables. Instead, this feature loads n pages (determined by ) and tries to move records so that pages would be full of records and then frees pages that are fully empty after the operation.
Note that tablespace files (including ibdata1) will not shrink as the result of defragmentation, but one will get better memory utilization in the InnoDB buffer pool as there are fewer data pages in use.
A number of new system and status variables for controlling and monitoring the feature are introduced.
: Enable InnoDB defragmentation.
: Number of pages considered at once when merging multiple pages to defragment.
: Number of defragment stats changes there are before the stats are written to persistent storage.
: Number of records of space that defragmentation should leave on the page.
: Number of defragment re-compression failures
: Number of defragment failures.
: Number of defragment operations.
After these CREATE and INSERT operations, the following information can be seen from the INFORMATION SCHEMA:
Deleting three-quarters of the records, leaving gaps, and then optimizing:
Now some pages have been freed, and some merged:
See on the Mariadb.org blog for more details.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB has support for full-text indexing and searching:
A full-text index in MariaDB is an index of type FULLTEXT, and it allows more options when searching for portions of text from a field.
Partitioned tables cannot contain fulltext indexes, even if the storage engine supports them.
A FULLTEXT index definition can be given in the statement when a table is created, or added later using or .
For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.
Full-text searching is performed using syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name.
Partial words are excluded.
Words less than 4 (MyISAM) or 3 (InnoDB) characters in length will not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).
Words longer than 84 characters in length will also not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).
MariaDB calculates a relevance for each result, based on a number of factors, including the number of words in the index, the number of unique words in a row, the total number of words in both the index and the result, and the weight of the word. In English, 'cool' will be weighted less than 'dandy', at least at present! The relevance can be returned as part of a query simply by using the MATCH function in the field list.
IN NATURAL LANGUAGE MODE is the default type of full-text search, and the keywords can be omitted. There are no special operators, and searches consist of one or more comma-separated keywords.
Searches are returned in descending order of relevance.
Boolean search permits the use of a number of special operators:
Searches are not returned in order of relevance, nor does the 50% limit apply. Stopwords and word minimum and maximum lengths still apply as usual.
A query expansion search is a modification of a natural language search. The search string is used to perform a regular natural language search. Then, words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. It can be useful when relying on implied knowledge within the data, for example that MariaDB is a database.
Creating a table, and performing a basic search:
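The original example code is not reproduced in this copy; a minimal sketch with a hypothetical table could be:

CREATE TABLE ft_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  copy TEXT,
  PRIMARY KEY (id),
  FULLTEXT (copy)
) ENGINE=InnoDB;

INSERT INTO ft_demo (copy) VALUES
  ('Once upon a time'),
  ('There was a wicked witch'),
  ('Who ate everybody up');

SELECT * FROM ft_demo WHERE MATCH(copy) AGAINST('witch');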
Multiple words:
Since 'Once' is a , no result is returned:
Inserting the word 'wicked' into more than half the rows excludes it from the results:
Using IN BOOLEAN MODE to overcome the 50% limitation:
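For example, continuing the hypothetical table above:

SELECT * FROM ft_demo
WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);  -- boolean mode ignores the 50% threshold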
Returning the relevance:
WITH QUERY EXPANSION. In the following example, 'MariaDB' is always associated with the word 'database', so it is returned when query expansion is used, even though not explicitly requested.
Partial word matching with IN BOOLEAN MODE:
Using boolean operators
For simpler searches of a substring in text columns, see the operator.
This page is licensed: CC BY-SA / Gnu FDL
You are ingesting lots of data. Performance is bottlenecked in the INSERT area.
This will be couched in terms of Data Warehousing, with a huge Fact table and Summary (aggregation) tables.
Have a separate staging table.
Inserts go into Staging.
Normalization and Summarization reads Staging, not Fact.
After normalizing, the data is copied from Staging to Fact.
Staging is one (or more) tables in which the data lives only long enough to be handed off to Normalization, Summary, and the Fact tables.
Since we are probably talking about a billion-row table, shrinking the width of the Fact table by normalizing (as mentioned here) is worthwhile. Changing an to a will save a GB. Replacing a string by an id (normalizing) saves many GB. This helps disk space and cacheability, hence speed.
Some variations:
Big dump of data once an hour, versus continual stream of records.
The input stream could be single-threaded or multi-threaded.
You might have 3rd party software tying your hands.
Generally the fastest injection rate can be achieved by "staging" the INSERTs in some way, then batch processing the staged records. This blog discusses various techniques for staging and batch processing.
Let's say your Input has a host_name column, but you need to turn that into a smaller host_id in the Fact table. The "Normalization" table, as I call it, looks something like
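(The original definition is not reproduced in this copy; this sketch assumes the column names used in the rest of the discussion.)

CREATE TABLE Hosts (
  host_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- no bigger than needed
  host_name VARCHAR(99) NOT NULL,
  PRIMARY KEY (host_id),      -- maps id -> name
  UNIQUE KEY (host_name)      -- maps name -> id
) ENGINE=InnoDB;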
Here's how you can use Staging as an efficient way to achieve the swap from name to id.
Staging has two fields (for this normalization example):
Meanwhile, the Fact table has:
SQL #1 (of 2):
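(The statement itself is not reproduced in this copy; this sketch matches the description below and assumes Staging has a host_name column.)

INSERT IGNORE INTO Hosts (host_name)
  SELECT DISTINCT s.host_name
    FROM Staging AS s
    LEFT JOIN Hosts AS h ON h.host_name = s.host_name
    WHERE h.host_id IS NULL;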
By isolating this as its own transaction, we get it finished in a hurry, thereby minimizing blocking. By saying IGNORE, we don't care if other threads are 'simultaneously' inserting the same host_names.
There is a subtle reason for the LEFT JOIN. If, instead, it were INSERT IGNORE..SELECT DISTINCT, then the INSERT would preallocate auto_increment ids for as many rows as the SELECT provides. This is very likely to "burn" a lot of ids, thereby leading to overflowing MEDIUMINT unnecessarily. The LEFT JOIN leads to finding just the new ids that are needed (except for the rare possibility of a 'simultaneous' insert by another thread). More rationale:
SQL #2:
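(Again a sketch, assuming Staging also has a nullable host_id column to be filled in.)

UPDATE Staging AS s
  JOIN Hosts AS h ON h.host_name = s.host_name
  SET s.host_id = h.host_id;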
This gets the IDs, whether already existing, set by another thread, or set by SQL #1.
If the size of Staging changes depending on the busy versus idle times of the day, this pair of SQL statements has another comforting feature. The more rows in Staging, the more efficient the SQL runs, thereby helping compensate for the "busy" times.
The companion folds SQL #2 into the INSERT INTO Fact. But you may need host_id for further normalization steps and/or Summarization steps, so this explicit UPDATE shown here is often better.
The simple way to stage is to ingest for a while, then batch-process what is in Staging. But that leads to new records piling up waiting to be staged. To avoid that issue, have 2 processes:
one process (or set of processes) for INSERTing into Staging;
one process (or set of processes) to do the batch processing (normalization, summarization).
To keep the processes from stepping on each other, we have a pair of staging tables:
Staging is being INSERTed into;
StageProcess is one being processed for normalization, summarization, and moving to the Fact table.
A separate process does the processing, then swaps the tables:
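One possible swap sequence (a sketch; the exact statements in the original article may differ):

DROP TABLE IF EXISTS StageProcess;
CREATE TABLE StageProcess LIKE Staging;
-- Atomic swap: the full Staging becomes StageProcess; the fresh, empty table becomes Staging.
RENAME TABLE Staging TO StageTmp,
             StageProcess TO Staging,
             StageTmp TO StageProcess;
-- ...then normalize, summarize, and copy StageProcess into the Fact table...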
This may not seem like the shortest way to do it, but has these features:
The DROP + CREATE might be faster than TRUNCATE, which is the desired effect.
The RENAME is atomic, so the INSERT process(es) never find that Staging is missing.
A variant on the 2-table flip-flop is to have a separate Staging table for each Insertion process. The Processing process would run around to each Staging in turn.
A variant on that would be to have a separate processing process for each Insertion process.
The choice depends on which is faster (insertion or processing). There are tradeoffs; a single processing thread avoids some locks, but lacks some parallelism.
Fact table -- , if for no other reason than that a system crash would not need a REPAIR TABLE. (REPAIRing a billion-row table can take hours or days.)
Normalization tables -- InnoDB, primarily because it can be done efficiently with 2 indexes, whereas, MyISAM would need 4 to achieve the same efficiency.
Staging -- Lots of options here.
If you have multiple Inserters and a single Staging table, InnoDB is desirable due to row-level, not table-level, locking.
MEMORY may be the fastest and it avoids I/O. This is good for a single staging table.
For multiple Inserters, a separate Staging table for each Inserter is desired.
For multiple Inserters into a single Staging table, InnoDB may be faster. (MEMORY does table-level locking.)
Confused? Lost? There are enough variations in applications that make it impractical to predict what is best. Or, simply good enough. Your ingestion rate may be low enough that you don't hit the brick walls that I am helping you avoid.
Should you do "CREATE TEMPORARY TABLE"? Probably not. Consider Staging as part of the data flow, not to be DROPped.
This is mostly covered here: Summarize from the Staging table instead of the Fact table.
Row Based Replication (RBR) is probably the best option.
The following allows you to keep more of the Ingestion process in the Master, thereby not bogging down the Slave(s) with writes to the Staging table.
RBR
Staging is in a separate database
That database is not replicated (binlog-ignore-db on Master)
In the Processing steps, USE that database, reach into the main db via syntax like "MainDb.Hosts". (Otherwise, the binlog-ignore-db does the wrong thing.)
That way
Writes to Staging are not replicated.
Normalization sends only the few updates to the normalization tables.
Summarization sends only the updates to the summary tables.
Flip-flop does not replicate the DROP, CREATE or RENAME.
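A minimal sketch of that setup (StagingDb is a placeholder name for the separate database; MainDb is the main database referred to above):
# On the Master (option file):
[mariadb]
binlog_format    = ROW
binlog-ignore-db = StagingDb

# In the Processing steps:
USE StagingDb;              # writes to Staging are then filtered out of the binlog
UPDATE MainDb.Hosts ... ;   # reach into the main db with schema-qualified names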
You could possibly spread the data you are trying to ingest across multiple machines in a predictable way (sharding on hash, range, etc). Running "reports" on a sharded Fact table is a challenge unto itself. On the other hand, Summary Tables rarely get too big to manage on a single machine.
For now, Sharding is beyond the scope of this blog.
I have implicitly assumed the data is being pushed into the database. If, instead, you are "pulling" data from some source(s), then there are some different considerations.
Case 1: An hourly upload; run via cron
Grab the upload, parse it
Put it into the Staging table
Normalize -- each SQL in its own transaction (autocommit)
BEGIN
If you need parallelism in Summarization, you will have to sacrifice the transactional integrity of steps 4-7.
Caution: If these steps add up to more than an hour, you are in deep dodo.
Case 2: You are polling for the data
It is probably reasonable to have multiple processes doing this, so the steps below are explicit about locking.
Create a Staging table for this polling processor. Loop:
With some locked mechanism, decide which 'thing' to poll.
Poll for the data, pull it in, parse it. (Potentially polling and parsing are significantly costly)
Put it into the process-specific Staging table
innodb_log_file_size should be larger than the change in the STATUS "Innodb_os_log_written" across the BEGIN...COMMIT transaction (for either Case).
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
You have a website with news articles, or a blog, or some other thing with a list of things that might be too long for a single page. So, you decide to break it into chunks of, say, 10 items and provide a [Next] button to go to the next "page".
You spot OFFSET and LIMIT in MariaDB and decide that is the obvious way to do it.
Note that the problem requirement needs a [Next] link on each page so that the user can 'page' through the data. He does not really need "GoTo Page #". Jump to the [First] or [Last] page may be useful.
All is well -- until you have 50,000 items in a list. And someone tries to walk through all 5000 pages. That 'someone' could be a search engine crawler.
Where's the problem? Performance. Your web page is doing "SELECT ... OFFSET 49990 LIMIT 10" (or the equivalent "LIMIT 49990,10"). MariaDB has to find all 50,000 rows, step over the first 49,990, then deliver the 10 for that distant page.
If it is a crawler ('spider') that read all the pages, then it actually touched about 125,000,000 items to read all 5,000 pages.
Reading the entire table, just to get a distant page, can be so much I/O that it can cause timeouts on the web page. Or it can interfere with other activity, causing other things to be slow.
In addition to a performance problem, ...
If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.
The pages are not easily bookmarked or sent to someone else because the contents shift over time.
The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!
Hardware? No, that's just a bandaid. The data will continue to grow and even the new hardware won't handle it.
Better INDEX? No. You must get away from reading the entire table to get the 5000th page.
Build another table saying where the pages start? Get real! That would be a maintenance nightmare, and expensive.
Bottom line: Don't use OFFSET; instead remember where you "left off".
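For example (schematically; the rest of the WHERE clause is whatever your application needs):
# First page (latest 10 items):
SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
# Next page (the 10 just before where we "left off"):
SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10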
With INDEX(id), this suddenly becomes very efficient.
You are probably doing this now: ORDER BY datetime DESC LIMIT 49990,10. You probably have some unique id on the table; this can probably be used for "left off".
Currently, the [Next] button probably has a url something like ?topic=xyz&page=4999&limit=10 The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed. The product of page*limit gives the OFFSET. (The "limit=10" might be in the url, or might be hard-coded; this choice is not relevant to this discussion.)
The new variant would be ?topic=xyz&id=12345&limit=10. (Note: the 12345 is not computable from 4999.) By using INDEX(topic, id) you can efficiently say
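WHERE topic = 'xyz'
  AND id >= 1234
ORDER BY id
LIMIT 10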
That will hit only 10 rows. This is a huge improvement for later pages. Now for more details.
What if there are exactly 10 rows left when you display the current page? It would make the UI nicer if you grayed out the [Next] button, wouldn't it? (Or you could suppress the button altogether.)
How to do that? Instead of LIMIT 10, use LIMIT 11. That will give you the 10 items needed for the current page, plus an indication of whether there is another page. And the id for that page.
So, take the 11th id for the [Next] button: <a href=?topic=xyz&id=$id11&limit=10>Next</a>
Let's extend the 11 trick to also find the next 5 pages and build links for them.
Plan A is to say LIMIT 51. If you are on page 12, that would give you links for pages 13 (using 11th id) through pages 17 (51st).
Plan B is to do two queries, one to get the 10 items for the current page, the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.
Which plan to pick? It depends on many things, so benchmark.
Reaching forward and backward by 5 pages is not too much work. It would take two separate queries to find the ids in both directions. Also, having links that take you to the First and Last pages would be easy to do. No id is needed; they can be something like
The UI would recognize those, then generate a SELECT with something like
The last items would be delivered in reverse order. Either deal with that in the UI, or make the SELECT more complex:
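( SELECT ...
    WHERE topic = 'xyz'
    ORDER BY id DESC
    LIMIT 10
) ORDER BY id ASC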
Let's say you are on page 12 of lots of pages. It could show these links:
where the ellipsis is really used. Some end cases:
The goal is to touch only the relevant rows, not all the rows leading up to the desired rows. This is nicely achieved, except for building links to the "next 5 pages". That may (or may not) be efficiently resolved by the simple SELECT id, discussed above. The reason that may not be efficient deals with the WHERE clause.
Let's discuss the optimal and suboptimal indexes.
For this discussion, I am assuming
The datetime field might have duplicates -- this can cause troubles
The id field is unique
The id field is close enough to datetime-ordered to be used instead of datetime.
Very efficient -- it does all the work in the index:
That will hit at least 51 consecutive index entries, plus at least 51 randomly located data rows.
Efficient -- back to the previous degree of efficiency:
Note how all the '=' parts of the WHERE come first; then comes both the '>=' and 'ORDER BY', both on id. This means that the INDEX can be used for all the WHERE, plus the ORDER BY.
You lose the "out of" except when the count is small. Instead, say something like
Alternatively... Only a few searches will have too many items to count. Keep another table with the search criteria and a count. This count can be computed daily (or hourly) by some background script. When discovering that the topic is a busy one, look it up in the table to get
The background script would round the count off.
The quick way to get an estimated number of rows for an InnoDB table is
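SELECT table_rows
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'database_name'
      AND TABLE_NAME = 'table_name';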
However, it does not allow for the WHERE clause that you probably have.
If the search criteria cannot be confined to an INDEX in a single table, this technique is doomed. I have another paper that discusses "Lists", which solves that (with extra development work), and even improves on what is discussed here.
This depends on
How many rows (total)
Whether the WHERE clause prevented the efficient use of the ORDER BY
Whether the data is bigger than the cache. This last one kicks in when building one page requires reading more data from disk than can be cached. At that point, the problem goes from being CPU-bound to being I/O-bound. This is likely to suddenly slow down the loading of a page by a factor of 10.
Cannot "jump to Page N", for an arbitrary N. Why do you want to do that?
Walking backward from the end does not know the page numbers.
The code is more complex.
Designed about 2007; posted 2012.
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
This article describes different techniques for inserting data quickly into MariaDB.
When inserting new data into MariaDB, the things that take time are (in order of importance):
Syncing data to disk (as part of the end of transactions)
Adding new keys. The larger the index, the more time it takes to keep keys updated.
Checking against foreign keys (if they exist).
Adding rows to the storage engine.
Sending data to the server.
The following describes the different techniques (again, in order of importance) you can use to quickly insert data into a table.
You can temporarily disable updating of non-unique indexes. This is mostly useful when there are zero (or very few) rows in the table into which you are inserting data.
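A sketch (tbl_name is a placeholder):
ALTER TABLE tbl_name DISABLE KEYS;
-- ... bulk INSERTs / LOAD DATA here ...
ALTER TABLE tbl_name ENABLE KEYS;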
In many storage engines (at least MyISAM and Aria), ENABLE KEYS works by scanning through the row data, collecting the keys, sorting them, and then creating the index blocks. This is an order of magnitude faster than creating the index one row at a time, and it also uses less key buffer memory.
Note: When you insert into an empty table with INSERT or LOAD DATA, MariaDB automatically does a DISABLE KEYS before and an ENABLE KEYS afterwards.
When inserting big amounts of data, integrity checks are noticeably time-consuming. It is possible to disable the UNIQUE indexes and the foreign key checks using the unique_checks and foreign_key_checks system variables:
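SET unique_checks = 0;
SET foreign_key_checks = 0;
-- ... load the data ...
SET unique_checks = 1;
SET foreign_key_checks = 1;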
For InnoDB tables, the innodb_flush_log_at_trx_commit system variable can be temporarily set to 2, which is the fastest setting:
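SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- ... load the data ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- restore full durability afterwards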
Also, if the table has or columns, you may want to drop them, insert all data, and recreate them.
The fastest way to insert data into MariaDB is through the LOAD DATA INFILE command.
The simplest form of the command is:
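LOAD DATA INFILE 'file_name' INTO TABLE tbl_name;  -- file_name and tbl_name are placeholders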
You can also read a file locally on the machine where the client is running by using:
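LOAD DATA LOCAL INFILE 'file_name' INTO TABLE tbl_name;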
This is not as fast as reading the file on the server side, but the difference is not that big.
LOAD DATA INFILE is very fast because:
there is no parsing of SQL.
data is read in big blocks.
if the table is empty at the beginning of the operation, all non-unique indexes are disabled during the operation.
the engine is told to cache rows first and then insert them in big blocks (At least MyISAM and Aria support this).
Because of the above speed advantages there are many cases, when you need to insert many rows at a time, where it may be faster to create a file locally, add the rows there, and then use LOAD DATA INFILE to load them; compared to using INSERT to insert the rows.
You will also get progress reporting for LOAD DATA INFILE.
You can import many files in parallel with mariadb-import (previously called mysqlimport). For example:
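# A sketch; the thread count, database name, and file names are placeholders:
mariadb-import --local --use-threads=4 db_name /tmp/table1.tsv /tmp/table2.tsv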
Internally, mariadb-import uses LOAD DATA INFILE to read in the data.
When doing many inserts in a row, you should wrap them with BEGIN / END to avoid doing a full transaction (which includes a disk sync) for every row. For example, doing a begin/end every 1000 inserts will speed up your inserts by almost 1000 times.
The reason why you may want to have many BEGIN/END statements instead of just one is that the former will use up less transaction log space.
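For example (a sketch; tbl_name is a placeholder, and the transaction is ended with COMMIT):
BEGIN;
INSERT INTO tbl_name VALUES (1, 'a');
INSERT INTO tbl_name VALUES (2, 'b');
-- ... up to roughly 1000 rows per transaction ...
COMMIT;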
You can insert many rows at once with multi-value row inserts:
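-- tbl_name and its columns are placeholders:
INSERT INTO tbl_name (a, b) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');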
The limit for how much data you can have in one statement is controlled by the max_allowed_packet server variable.
If you need to insert data into several tables at once, the best way to do so is to enable multi-row statements and send many inserts to the server at once:
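-- A sketch with hypothetical tables, sent to the server as one batch:
INSERT INTO ships (name) VALUES ('Titanic');
INSERT INTO ship_details (ship_id, built) VALUES (LAST_INSERT_ID(), 1911);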
LAST_INSERT_ID() is a function that returns the last auto_increment value inserted.
By default, the command line mariadb client will send the above as multiple statements.
To test this in the mariadb client you have to do:
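-- A sketch: change the client delimiter so that ';' no longer ends the input,
-- letting several statements be sent to the server together:
DELIMITER ;;
INSERT INTO ships (name) VALUES ('Titanic'); INSERT INTO ship_details (ship_id, built) VALUES (LAST_INSERT_ID(), 1911);;
DELIMITER ;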
Note: for multi-query statements to work, your client must specify the CLIENT_MULTI_STATEMENTS flag to mysql_real_connect().
See for the full list of server variables.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.
The optimization's use case is
The query uses a derived table (or a VIEW, or a non-recursive CTE)
The derived table/View/CTE has a GROUP BY operation as its top-level operation
The query only needs data from a few GROUP BY groups
An example of this: consider a VIEW that computes totals for each customer in October:
And a query that does a join with the customer table to get October totals for "Customer#1" and Customer#2:
Before Lateral Derived optimization, MariaDB would execute the query as follows:
Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.
Join it with table customer.
The EXPLAIN would look like so:
It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)
Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables.
This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.
The query plan will look as follows:
Scan table customer and find customer_id for Customer#1 and Customer#2.
For each customer_id, compute the October totals, for this specific customer.
The EXPLAIN output will look like so:
Note the line with id=2: select_type is LATERAL DERIVED. And table orders uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.
In EXPLAIN FORMAT=JSON output, the optimization is shown like so:
Note the "lateral": 1 member.
Lateral Derived is enabled by default. The optimizer will make a cost-based decision whether the optimization should be used.
If you need to disable the optimization, it has an optimizer_switch flag. It can be disabled like so:
From MariaDB 12.1, it is possible to enable or disable the optimization with an optimizer hint (NO_SPLIT_MATERIALIZED in the example below).
For example, by default, this table and query makes use of the optimization:
CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;
INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');
INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;
ANALYZE TABLE t1;
EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G
Jira task:
Commit:
This page is licensed: CC BY-SA / Gnu FDL
Open-ended set of "attributes" (key=value) for each "entity". That is, the list of attributes is not known at development time, and will grow in the future. (This makes one column per attribute impractical.)
"ad hoc" queries testing attributes.
Attribute values come in different types (numbers, strings, dates, etc.)
Scale to lots of entities, yet perform well.
It goes by various names
EAV -- Entity - Attribute - Value
key-value
RDF -- This is a flavor of EAV
MariaDB has dynamic columns that look something like the solution below, with the added advantage of being able to index the columns otherwise hidden in the blob. (There are caveats.)
Table with 3 columns: entity_id, key, value
The "value" is a string, or maybe multiple columns depending on datatype or other kludges.
a JOIN b ON a.entity=b.entity AND b.key='x' JOIN c ON ... WHERE a.value=... AND b.value=...
The SELECTs get messy -- multiple JOINs
Datatype issues -- It's clumsy to be putting numbers into strings
Numbers stored in strings do not compare 'correctly', especially for range tests.
Bulky.
Decide which columns need to be searched/sorted by SQL queries. No, you don't need all the columns to be searchable or sortable. Certain columns are frequently used for selection; identify these. You probably won't use all of them in all queries, but you will use some of them in every query.
The solution uses one table for all the EAV stuff. The columns include the searchable fields plus one BLOB. Searchable fields are declared appropriately (INT, DATETIME, etc). The BLOB contains a JSON encoding of all the extra fields.
The table should be InnoDB, hence it should have a PRIMARY KEY. The entity_id is the 'natural' PK. Add a small number of other indexes (often 'composite') on the searchable fields. PARTITIONing is unlikely to be of any use, unless the Entities should be purged after some time. (Example: News Articles)
You have included the most important fields to search on -- date, category, etc. These should filter the data down significantly. When you also need to filter on something more obscure, that will be handled differently. The application code will look at the BLOB for that; more on this later.
You are not really going to search on more than a few fields.
The disk footprint is smaller; Smaller --> More cacheable --> Faster
It needs no JOINs
The indexes are useful
Build the extra (or all) key-value pairs in a hash (associative array) in your application. Encode it. COMPRESS it. Insert that string into the BLOB.
JSON is recommended, but not mandatory; it is simpler than XML. Other serializations (eg, YAML) could be used.
COMPRESS the JSON and put it into a BLOB (or VARBINARY) instead of a TEXT field. Compression gives about 3x shrinkage.
When SELECTing, UNCOMPRESS the blob. Decode the string into a hash. You are now ready to interrogate/display any of the extra fields.
Schema is reasonably compact (compression, real datatypes, less redundancy, etc, than EAV)
Queries are fast (since you have picked 'good' indexes)
Expandable (JSON is happy to have new fields)
Compatible (No 3rd party products, just supported products)
Posted Jan, 2014; Refreshed Feb, 2016.
MariaDB's dynamic columns: this looks very promising; I will need to do more research to see how much of this article is obviated by it.
If you insist on EAV, set .
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
Multi Range Read is an optimization aimed at improving performance for IO-bound queries that need to scan lots of rows.
Multi Range Read can be used with
range access
ref and eq_ref access, when they are using
as shown in this diagram:
This document starts out trivial and perhaps boring, but builds up to more interesting information, perhaps things you did not realize about how MariaDB and MySQL indexing works.
This also explains (to some extent).
(Most of this applies to other databases, too.)
SELECT *
FROM items
WHERE messy_filtering
ORDER BY date DESC
OFFSET $M LIMIT $N
MySQL 5.7 has a JSON datatype, plus functions to access parts of it.
MongoDB, CouchDB -- and others -- Not SQL-based.
Dedupping the values is clumsy.
Performance is as good as the indexes you have on the 'searchable fields'.
Optionally, you can duplicate the indexed fields in the BLOB.
Values missing from 'searchable fields' would need to be NULL (or whatever), and the code would need to deal with such.
If you choose to use the JSON features of MariaDB or 5.7, you will have to forgo the compression feature described.
MySQL 5.7.8's JSON native JSON datatype uses a binary format for more efficient access.
(Drawback) Cannot use the non-indexed attributes in WHERE or ORDER BY clauses, must deal with that in the app. (MySQL 5.7 partially alleviates this.)
For MyISAM/Aria fulltext indexes only, if a word appears in more than half the rows, it is also excluded from the results of a fulltext search.
For InnoDB indexes, only committed rows appear - modifications from the current transaction do not apply.
*   The wildcard, indicating zero or more characters. It can only appear at the end of a word.
"   Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).
+   The word is mandatory in all rows returned.
-   The word cannot appear in any row returned.
<   The word that follows has a lower relevance than other words, although rows containing it will still match.
>   The word that follows has a higher relevance than other words.
()  Used to group words into subexpressions.
~   The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row).
With one non-InnoDB Staging table per Inserter, using an explicit LOCK TABLE avoids repeated implicit locks on each INSERT.
But, if you are doing LOCK TABLE and the Processing thread is separate, an UNLOCK is necessary periodically to let the RENAME grab the table.
"Batch INSERTs" (100-1000 rows per SQL) eliminates much of the issues of the above bullet items.
Copy from Staging to Fact.
COMMIT
Normalize -- each SQL in its own transaction (autocommit)
BEGIN
Summarize
Copy from Staging to Fact.
COMMIT
Declare that you are finished with this 'thing' (see step 1) EndLoop.
EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G
No optimizer hint is available.
MATCH (col1,col2,...) AGAINST (expr [search_modifier])
CREATE TABLE ft_myisam(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
INSERT INTO ft_myisam(copy) VALUES ('Once upon a time'),
('There was a wicked witch'), ('Who ate everybody up');
SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
+--------------------------+
| copy |
+--------------------------+
| There was a wicked witch |
+--------------------------+
SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked,witch');
+---------------------------------+
| copy |
+---------------------------------+
| There was a wicked witch |
+---------------------------------+
SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('Once');
Empty set (0.00 sec)
INSERT INTO ft_myisam(copy) VALUES ('Once upon a wicked time'),
('There was a wicked wicked witch'), ('Who ate everybody wicked up');
SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
Empty set (0.00 sec)
SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);
+---------------------------------+
| copy |
+---------------------------------+
| There was a wicked witch |
| Once upon a wicked time |
| There was a wicked wicked witch |
| Who ate everybody wicked up |
+---------------------------------+
SELECT copy,MATCH(copy) AGAINST('witch') AS relevance
FROM ft_myisam WHERE MATCH(copy) AGAINST('witch');
+---------------------------------+--------------------+
| copy | relevance |
+---------------------------------+--------------------+
| There was a wicked witch | 0.6775632500648499 |
| There was a wicked wicked witch | 0.5031757950782776 |
+---------------------------------+--------------------+
CREATE TABLE ft2(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
INSERT INTO ft2(copy) VALUES
('MySQL vs MariaDB database'),
('Oracle vs MariaDB database'),
('PostgreSQL vs MariaDB database'),
('MariaDB overview'),
('Foreign keys'),
('Primary keys'),
('Indexes'),
('Transactions'),
('Triggers');
SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database');
+--------------------------------+
| copy |
+--------------------------------+
| MySQL vs MariaDB database |
| Oracle vs MariaDB database |
| PostgreSQL vs MariaDB database |
+--------------------------------+
3 rows in set (0.00 sec)
SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database' WITH QUERY EXPANSION);
+--------------------------------+
| copy |
+--------------------------------+
| MySQL vs MariaDB database |
| Oracle vs MariaDB database |
| PostgreSQL vs MariaDB database |
| MariaDB overview |
+--------------------------------+
4 rows in set (0.00 sec)
SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('Maria*' IN BOOLEAN MODE);
+--------------------------------+
| copy |
+--------------------------------+
| MySQL vs MariaDB database |
| Oracle vs MariaDB database |
| PostgreSQL vs MariaDB database |
| MariaDB overview |
+--------------------------------+
SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('+MariaDB -database'
IN BOOLEAN MODE);
+------------------+
| copy |
+------------------+
| MariaDB overview |
+------------------+
CREATE TABLE Hosts (
host_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
host_name VARCHAR(99) NOT NULL,
PRIMARY KEY (host_id), -- for mapping one direction
INDEX(host_name, host_id) -- for mapping the other direction
) ENGINE=InnoDB; -- InnoDB works best for Many:Many mapping table
host_name VARCHAR(99) NOT NULL, -- Comes from the insertion process
host_id MEDIUMINT UNSIGNED NULL, -- NULL to start with; see code below
host_id MEDIUMINT UNSIGNED NOT NULL,
# This should not be in the main transaction, and it should be done with autocommit = ON
# In fact, it could lead to strange errors if this were part
# of the main transaction and it ROLLBACKed.
INSERT IGNORE INTO Hosts (host_name)
SELECT DISTINCT s.host_name
FROM Staging AS s
LEFT JOIN Hosts AS n ON n.host_name = s.host_name
WHERE n.host_id IS NULL;
# Also not in the main transaction, and it should be with autocommit = ON
# This multi-table UPDATE sets the ids in Staging:
UPDATE Hosts AS n
JOIN Staging AS s ON s.host_name = n.host_name
SET s.host_id = n.host_id;
DROP TABLE StageProcess;
CREATE TABLE StageProcess LIKE Staging;
RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;
# First page (latest 10 items):
SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
# Next page (second 10):
SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10
WHERE topic = 'xyz'
AND id >= 1234
ORDER BY id
LIMIT 10
<a href=?topic=xyz&id=FIRST&limit=10>First</a>
<a href=?topic=xyz&id=LAST&limit=10>Last</a>
WHERE topic = 'xyz'
ORDER BY id ASC -- ASC for First; DESC for Last
LIMIT 10
( SELECT ...
WHERE topic = 'xyz'
ORDER BY id DESC
LIMIT 10
) ORDER BY id ASC
[First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]
# Page one of three:
First [2] [3]
# Page one of many:
First [2] [3] [4] [5] ... [Last]
# Page two of many:
[First] 2 [3] [4] [5] ... [Last]
# If you jump to the Last page, you don't know what page number it is.
# So, the best you can do is perhaps:
# [First] ... [Prev] Last
INDEX(topic, id)
WHERE topic = 'xyz'
AND id >= 876
ORDER BY id ASC
LIMIT 10,41
That will hit 51 consecutive index entries, 0 data rows.
Inefficient -- it must reach into the data:
INDEX(topic, id)
WHERE topic = 'xyz'
AND id >= 876
AND is_deleted = 0
ORDER BY id ASC
LIMIT 10,41
INDEX(topic, is_deleted, id)
WHERE topic = 'xyz'
AND id >= 876
AND is_deleted = 0
ORDER BY id ASC
LIMIT 10,41
Items 11-20 out of Many
Items 11-20 out of about 49,000
SELECT table_rows
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'database_name'
AND TABLE_NAME = 'table_name'
CREATE VIEW OCT_TOTALS AS
SELECT
customer_id,
SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY
customer_id;
SELECT *
FROM
customer, OCT_TOTALS
WHERE
customer.customer_id=OCT_TOTALS.customer_id AND
customer.customer_name IN ('Customer#1', 'Customer#2')
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| 1 | PRIMARY | customer | range | PRIMARY,name | name | 103 | NULL | 2 | Using where; Using index |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | test.customer.customer_id | 36 | |
| 2 | DERIVED | orders | index | NULL | o_cust_id | 4 | NULL | 36738 | Using where |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| 1 | PRIMARY | customer | range | PRIMARY,name | name | 103 | NULL | 2 | Using where; Using index |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | test.customer.customer_id | 2 | |
| 2 | LATERAL DERIVED | orders | ref | o_cust_id | o_cust_id | 4 | test.customer.customer_id | 1 | Using where |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
...
"table": {
"table_name": "<derived2>",
"access_type": "ref",
...
"materialized": {
"lateral": 1,SET optimizer_switch='split_materialized=off'*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ref
possible_keys: key0
key: key0
key_len: 8
ref: test.t1.n1,test.t1.n2
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: LATERAL DERIVED
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: n1_c1_n2
key_len: 4
ref: test.t1.n1
rows: 1
Extra: Using where; Using index
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ref
possible_keys: key0
key: key0
key_len: 8
ref: test.t1.n1,test.t1.n2
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: t1
type: ref
possible_keys: c1
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesort
Trying to drop a table that is referenced by a foreign key produces a 1217 error (SQLSTATE '23000').
A TRUNCATE TABLE against a table containing one or more foreign keys is executed as a DELETE without WHERE, so that the foreign keys are enforced for each row.
CASCADESET NULL: The change is allowed, and the child row's foreign key columns are set to NULL.
SET DEFAULT: Only worked with PBXT. Similar to SET NULL, but the foreign key columns were set to their default values. If default values do not exist, an error is produced.
If ON UPDATE CASCADE recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT.
Indexed generated columns (both VIRTUAL and PERSISTENT) are not supported as InnoDB foreign key indexes.
Prior to MariaDB 12.1, foreign key names are required to be unique per database. From MariaDB 12.1, foreign key names are only required to be unique per table.
innodb_defragment_fill_factor: Indicates how full defragmentation should fill a page.
innodb_defragment_frequency: Maximum times per second for defragmenting a single index.
for empty tables, some transactional engines (like Aria) do not log the inserted data in the transaction log because one can roll back the operation by just doing a TRUNCATE on the table.
Increase this if you have many indexes in InnoDB/XtraDB tables
Increase this if you have many indexes in MyISAM tables
Increase this to allow bigger multi-insert statements
Read block size when reading a file with LOAD DATA
Consider a range query:
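For example (tbl and key1 are placeholder names; key1 is assumed to be indexed):
SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;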
When this query is executed, disk IO access pattern will follow the red line in this figure:
Execution will hit the table rows in random places, as marked with the blue line/numbers in the figure.
When the table is sufficiently big, each table record read will need to actually go to disk (and not be served from the buffer pool or OS cache), and query execution will be too slow to be practical. For example, a 10,000 RPM disk drive is able to make 167 seeks per second, so in the worst case, query execution will be capped at reading about 167 records per second.
SSD drives do not need to do disk seeks, so they will not be hurt as badly, however the performance will still be poor in many cases.
Multi-Range-Read optimization aims to make disk access faster by sorting record read requests and then doing one ordered disk sweep. If one enables Multi Range Read, EXPLAIN will show that a "Rowid-ordered scan" is used:
and the execution will proceed as follows:
Reading disk data sequentially is generally faster, because
Rotating drives do not have to move the head back and forth
One can take advantage of IO-prefetching done at various levels
Each disk page will be read exactly once, which means we won't rely on disk cache (or buffer pool) to save us from reading the same page multiple times.
The above can make a huge difference on performance. There is also a catch, though:
If you're scanning small data ranges in a table that is sufficiently small so that it completely fits into the OS disk cache, then you may observe that the only effect of MRR is that extra buffering/sorting adds some CPU overhead.
LIMIT n and ORDER BY ... LIMIT n queries with small values of n may become slower. The reason is that MRR reads data in disk order, while ORDER BY ... LIMIT n wants first n records in index order.
Batched Key Access can benefit from rowid sorting in the same way as range access does. If one has a join that uses index lookups:
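For example (using the t1, t2, col1, and key1 names referred to below):
SELECT * FROM t1, t2 WHERE t2.key1 = t1.col1;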
Execution of this query will cause table t2 to be hit in random locations by lookups made through t2.key1=t1.col. If you enable Multi Range and Batched Key Access, you will get table t2 to be accessed using a Rowid-ordered scan:
The benefits will be similar to those listed for range access.
An additional source of speedup is this property: if there are multiple records in t1 that have the same value of t1.col1, then regular Nested-Loops join will make multiple index lookups for the same value of t2.key1=t1.col1. The lookups may or may not hit the cache, depending on how big the join is. With Batched Key Access and Multi-Range Read, no duplicate index lookups will be made.
Let us consider again the nested loop join example, with ref access on the second table:
Execution of this query plan will cause random hits to be made into the index t2.key1, as shown in this picture:
In particular, on step #5 we'll read the same index page that we've read on step #2, and the page we've read on step #4 will be re-read on step#6. If all pages you're accessing are in the cache (in the buffer pool, if you're using InnoDB, and in the key cache, if you're using MyISAM), this is not a problem. However, if your hit ratio is poor and you're going to hit the disk, it makes sense to sort the lookup keys, like shown in this figure:
This is roughly what Key-ordered scan optimization does. In EXPLAIN, it looks as follows:
((TODO: a note about why sweep-read over InnoDB's clustered primary index scan (which is, actually the whole InnoDB table itself) will use Key-ordered scan algorithm, but not Rowid-ordered scan algorithm, even though conceptually they are the same thing in this case))
As was shown above, Multi Range Read requires sort buffers to operate. The size of the buffers is limited by system variables. If MRR has to process more data than it can fit into its buffer, it will break the scan into multiple passes. The more passes are made, the less is the speedup though, so one needs to balance between having too big buffers (which consume lots of memory) and too small buffers (which limit the possible speedup).
When MRR is used for range access, the size of its buffer is controlled by the mrr_buffer_size system variable. Its value specifies how much space can be used for each table. For example, if there is a query which is a 10-way join and MRR is used for each table, 10*@@mrr_buffer_size bytes may be used.
When Multi Range Read is used by Batched Key Access, then buffer space is managed by BKA code, which will automatically provide a part of its buffer space to MRR. You can control the amount of space used by BKA by setting
join_buffer_size to limit how much memory BKA uses for each table, and
join_buffer_space_limit to limit the total amount of memory used by BKA in the join.
There are three status variables related to Multi Range Read:
Counts how many Multi Range Read scans were performed
Number of times key buffer was refilled (not counting the initial fill)
Number of times rowid buffer was refilled (not counting the initial fill)
Non-zero values of Handler_mrr_key_refills and/or Handler_mrr_rowid_refills mean that the Multi Range Read scan did not have enough memory and had to do multiple key/rowid sort-and-sweep passes. The greatest speedup is achieved when Multi Range Read runs everything in one pass; if you see lots of refills, it may be beneficial to increase the sizes of the relevant buffers: mrr_buffer_size, join_buffer_size, and join_buffer_space_limit.
When a Multi Range Read scan makes an index lookup (or some other "basic" operation), the counter of the "basic" operation, e.g. Handler_read_key, will also be incremented. This way, you can still see total number of index accesses, including those made by MRR. Per-user/table/index statistics counters also include the row reads made by Multi Range Read scans.
Multi Range Read is used for scans that do full record reads (i.e., they are not "Index only" scans). A regular non-index-only scan will read
an index record, to get a rowid of the table record
a table record
Both actions will be done by making one call to the storage engine, so the effect of the call will be that the relevant Handler_read_XXX counter will be incremented BY ONE, and Innodb_rows_read will be incremented BY ONE.
Multi Range Read will make separate calls for steps #1 and #2, causing TWO increments to Handler_read_XXX counters and TWO increments to Innodb_rows_read counter. To the uninformed, this looks as if Multi Range Read was making things worse. Actually, it doesn't - the query will still read the same index/table records, and actually Multi Range Read may give speedups because it reads data in disk order.
Multi Range Read is used by
range access method for range scans.
Batched Key Access for joins
Multi Range Read can cause slowdowns for small queries over small tables, so it is disabled by default.
There are two strategies:
Rowid-ordered scan
Key-ordered scan
You can tell if either of them is used by checking the Extra column in EXPLAIN output.
There are three flags you can switch ON:
mrr=on - enable MRR and rowid ordered scans
mrr_sort_keys=on - enable Key-ordered scans (you must also set mrr=on for this to have any effect)
MySQL supports only Rowid ordered scan strategy, which it shows in EXPLAIN as Using MRR.
EXPLAIN in MySQL shows Using MRR, while in MariaDB it may show
Rowid-ordered scan
Key-ordered scan
Key-ordered Rowid-ordered scan
MariaDB uses mrr_buffer_size as a limit of MRR buffer size for range access, while MySQL uses read_rnd_buffer_size.
MariaDB has three MRR counters: Handler_mrr_init, Handler_mrr_extra_rowid_sorts, Handler_mrr_extra_key_sorts, while MySQL has only Handler_mrr_init, and it will only count MRR scans that were used by BKA. MRR scans used by range access are not counted.
This page is licensed: CC BY-SA / Gnu FDL
The question is "When was Andrew Johnson president of the US?".
The available table Presidents looks like:
("Andrew Johnson" was picked for this lesson because of the duplicates.)
What index(es) would be best for that question? More specifically, what would be best for
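That is, roughly this query (column names as used in this article; term holds the years in office):
SELECT  term
    FROM  Presidents
    WHERE  last_name = 'Johnson'
      AND  first_name = 'Andrew';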
Some INDEXes to try...
No indexes
INDEX(first_name), INDEX(last_name) (two separate indexes)
"Index Merge Intersect"
INDEX(last_name, first_name) (a "compound" index)
INDEX(last_name, first_name, term) (a "covering" index)
Variants
Well, I am fudging a little here. I have a PRIMARY KEY on seq, but that has no advantage on the query we are studying.
First, let's describe how InnoDB stores and uses indexes.
The data and the PRIMARY KEY are "clustered" together in one BTree.
A BTree lookup is quite fast and efficient. For a million-row table there might be 3 levels of BTree, and the top two levels are probably cached.
Each secondary index is in another BTree, with the PRIMARY KEY at the leaf.
Fetching 'consecutive' (according to the index) items from a BTree is very efficient because they are stored consecutively.
For the sake of simplicity, we can count each BTree lookup as 1 unit of work, and ignore scans for consecutive items. This approximates the number of disk hits for a large table in a busy system.
For MyISAM, the PRIMARY KEY is not stored with the data, so think of it as being a secondary key (over-simplified).
The novice, once he learns about indexing, decides to index lots of columns, one at a time. But...
MariaDB rarely uses more than one index at a time in a query. So, it will analyze the possible indexes.
first_name -- there are 2 possible rows (one BTree lookup, then scan consecutively)
last_name -- there are 2 possible rows
Let's say it picks last_name. Here are the steps for doing the SELECT:
Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'.
Get the PRIMARY KEY (implicitly added to each secondary index in InnoDB); get (17, 36).
Reach into the data using seq = (17, 36) to get the rows for Andrew Johnson and Lyndon B. Johnson.
Use the rest of the WHERE clause filter out all but the desired row.
Deliver the answer (1865-1869).
OK, so you get really smart and decide that MariaDB should be smart enough to use both name indexes to get the answer. This is called "Intersect".
Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'; get (17, 36)
Using INDEX(first_name), find 2 index entries with first_name = 'Andrew'; get (7, 17)
"And" the two lists together (7,17) & (17,36) = (17)
Reach into the data using seq = (17) to get the row for Andrew Johnson.
Deliver the answer (1865-1869).
The EXPLAIN fails to give the gory details of how many rows collected from each index, etc.
This is called a "compound" or "composite" index since it has more than one column.
Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).
Reach into the data using seq = (17) to get the row for Andrew Johnson.
Deliver the answer (1865-1869). This is much better. In fact this is usually the "best".
Surprise! We can actually do a little better. A "Covering" index is one in which all of the fields of the SELECT are found in the index. It has the added bonus of not having to reach into the "data" to finish the task.
Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).
Deliver the answer (1865-1869). The "data" BTree is not touched; this is an improvement over "compound".
Everything is similar to using "compound", except for the addition of "Using index".
What would happen if you shuffled the fields in the WHERE clause? Answer: The order of ANDed things does not matter.
What would happen if you shuffled the fields in the INDEX? Answer: It may make a huge difference. More in a minute.
What if there are extra fields on the end? Answer: Minimal harm; possibly a lot of good (eg, 'covering').
Redundancy? That is, what if you have both of these: INDEX(a), INDEX(a,b)? Answer: Redundancy costs something on INSERTs; it is rarely useful for SELECTs.
Prefix? That is, INDEX(last_name(5), first_name(5)) Answer: Don't bother; it rarely helps, and often hurts. (The details are another topic.)
Refreshed -- Oct, 2012; more links -- Nov 2016
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: index1
This page is licensed: CC BY-SA / Gnu FDL
This article does not apply to the thread pool implementation on Windows. On Windows, MariaDB uses a native thread pool created with the CreateThreadpool APl, which has its own methods to distribute threads between CPUs.
On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. The thread_pool_size system variable defines the number of thread groups on a system. Generally speaking, the goal of the thread group implementation is to have one running thread on each CPU on the system at a time. Therefore, the default value of the thread_pool_size system variable is auto-sized to the number of CPUs on the system.
When setting the thread_pool_size system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting its value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. It can be changed dynamically with SET GLOBAL. For example:
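SET GLOBAL thread_pool_size = 32;  -- 32 is just an example value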
It can also be set in a server option group in an option file prior to starting up the server. For example:
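[mariadb]
thread_pool_size = 32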
If you do not want MariaDB to use all CPUs on the system for some reason, then you can set it to a lower value than the number of CPUs. For example, this would make sense if the MariaDB Server process is limited to certain CPUs with the taskset utility on Linux.
If you set the value to the number of CPUs and if you find that the CPUs are still underutilized, then try increasing the value.
The thread_pool_size system variable tends to have the most visible performance effect. It is roughly equivalent to the number of threads that can run at the same time. In this case, run means use CPU, rather than sleep or wait. If a client connection needs to sleep or wait for some reason, then it wakes up another client connection in the thread group before it does so.
One reason that CPU underutilization may occur in rare cases is that the thread pool is not always informed when a thread is going to wait. For example, some waits, such as a page fault or a miss in the OS buffer cache, cannot be detected by MariaDB.
When a new client connection is created, its thread group is determined using the following calculation:
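thread_group_id = connection_id % thread_pool_size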
The connection_id value in the above calculation is the same monotonically increasing number that you can use to identify connections in SHOW PROCESSLIST output or the information_schema.PROCESSLIST table.
This calculation should assign client connections to each thread group in a round-robin manner. In general, this should result in an even distribution of client connections among thread groups.
Thread groups have two different kinds of threads: a listener thread and worker threads.
A thread group's worker threads actually perform work on behalf of client connections. A thread group can have many worker threads, but usually, only one will be actively running at a time. This is not always the case. For example, the thread group can become oversubscribed if the thread pool's timer thread detects that the thread group is stalled. This is explained more in the sections below.
A thread group's listener thread listens for I/O events and distributes work to the worker threads. If it detects that there is a request that needs to be worked on, then it can wake up a sleeping worker thread in the thread group, if any exist. If the listener thread is the only thread in the thread group, then it can also create a new worker thread. If there is only one request to handle, and if the thread_pool_dedicated_listener system variable is not enabled, then the listener thread can also become a worker thread and handle the request itself. This helps decrease the overhead that may be introduced by excessively waking up sleeping worker threads and excessively creating new worker threads.
The thread pool has one global thread: a timer thread. The timer thread performs tasks, such as:
Checks each thread group for stalls.
Ensures that each thread group has a listener thread.
A new thread is created in a thread group in the scenarios listed below.
In all of the scenarios below, the thread pool implementation prefers to wake up a sleeping worker thread that already exists in the thread group, rather than to create a new thread.
A thread group's listener thread can create a new worker thread when it has more client connection requests to distribute, but no pre-existing worker threads are available to work on the requests. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time.
A thread group's listener thread creates a new worker thread if all of the following conditions are met:
The listener thread receives a client connection request that needs to be worked on.
There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads, so the listener thread should not become a worker thread.
There are no active worker threads in the thread group.
There are no sleeping worker threads in the thread group that the listener thread can wake up.
A thread group's worker thread can create a new worker thread when the thread has to wait on something, and the thread group has more client connection requests queued, but no pre-existing worker threads are available to work on them. This can help to ensure that the thread group always has enough threads to keep one worker thread active at a time. For most workloads, this tends to be the primary mechanism that creates new worker threads.
A thread group's worker thread creates a new thread if all of the following conditions are met:
The worker thread has to wait on some request. For example, it might be waiting on disk I/O, or it might be waiting on a lock, or it might just be waiting for a query that called the SLEEP() function to finish.
There are no active worker threads in the thread group.
There are no sleeping worker threads in the thread group that the worker thread can wake up.
And one of the following conditions is also met:
The thread pool's timer thread can create a new listener thread for a thread group when the thread group has more client connection requests that need to be distributed, but the thread group does not currently have a listener thread to distribute them. This can help to ensure that the thread group does not miss client connection requests because it has no listener thread.
The thread pool's timer thread creates a new listener thread for a thread group if all of the following conditions are met:
The thread group has not handled any I/O events since the last check by the timer thread.
There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread, so that it could handle some client connection request. In this case, the new thread can become the thread group's listener thread.
There are no sleeping worker threads in the thread group that the timer thread can wake up.
And one of the following conditions is also met:
The thread pool's timer thread can create a new worker thread for a thread group when the thread group is stalled. This can help to ensure that a long query can't monopolize its thread group.
The thread pool's timer thread creates a new worker thread for a thread group if all of the following conditions are met:
The timer thread thinks that the thread group is stalled. This means that the following conditions have been met:
There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.
No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.
There are no sleeping worker threads in the thread group that the timer thread can wake up.
In some of the scenarios listed above, a thread is only created within a thread group if no new threads have been created for the thread group within the throttling interval. The throttling interval depends on the number of threads that are already in the thread group.
In and later, thread creation is not throttled until a thread group has more than 1 + threads:
The throttling factor is calculated like this (see for more information):
The thread pool has a feature that allows it to detect if a client connection is executing a long-running query that may be monopolizing its thread group. If a client connection were to monopolize its thread group, then that could prevent other client connections in the thread group from running their queries. In other words, the thread group would appear to be stalled.
This stall detection feature is implemented by creating a timer thread that periodically checks if any of the thread groups are stalled. There is only a single timer thread for the entire thread pool. The thread_pool_stall_limit system variable defines the number of milliseconds between each stall check performed by the timer thread. The default value is 500. It can be changed dynamically with SET GLOBAL. For example:
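SET GLOBAL thread_pool_stall_limit = 300;  -- 300 ms is just an example value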
It can also be set in a server option group in an option file prior to starting up the server. For example:
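[mariadb]
thread_pool_stall_limit = 300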
The timer thread considers a thread group to be stalled if the following is true:
There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads.
No client connection requests have been allowed to be dequeued to run since the last stall check by the timer thread.
This indicates that the one or more client connections currently using the active worker threads may be monopolizing the thread group, and preventing the queued client connections from performing work. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel.
The thread_pool_stall_limit system variable essentially defines the limit for what a "fast query" is. If a query takes longer than thread_pool_stall_limit, then the thread pool is likely to think that it is too slow, and it will either wake up a sleeping worker thread or create a new worker thread to let another client connection in the thread group run a query in parallel.
In general, changing the value of the thread_pool_stall_limit system variable has the following effect:
Setting it to higher values can help avoid starting too many parallel threads if you expect a lot of client connections to execute long-running queries.
Setting it to lower values can help prevent deadlocks.
If the timer thread were to detect a stall in a thread group, then it would either wake up a sleeping worker thread or create a new worker thread in that thread group. At that point, the thread group would have multiple active worker threads. In other words, the thread group would be oversubscribed.
You might expect that the thread pool would shut down one of the worker threads when the stalled client connection finished what it was doing, so that the thread group would only have one active worker thread again. However, this does not always happen. Once a thread group is oversubscribed, the thread_pool_oversubscribe system variable defines the upper limit for when worker threads start shutting down after they finish work for client connections. The default value is 3. It can be changed dynamically with SET GLOBAL. For example:
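SET GLOBAL thread_pool_oversubscribe = 10;  -- 10 is just an example value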
It can also be set in a server option group in an option file prior to starting up the server. For example:
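[mariadb]
thread_pool_oversubscribe = 10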
To clarify, the thread_pool_oversubscribe system variable does not play any part in the creation of new worker threads. The thread_pool_oversubscribe system variable is only used to determine how many worker threads should remain active in a thread group, once a thread group is already oversubscribed due to stalls.
In general, the default value of 3 should be adequate for most users. Most users should not need to change the value of the thread_pool_oversubscribe system variable.
This page is licensed: CC BY-SA / Gnu FDL
This article describes the system and status variables used by the MariaDB thread pool. For a full description, see Thread Pool in MariaDB.
extra_max_connections
Description: The number of connections allowed on the extra_port.
See for more information.
Command line: --extra-max-connections=#
Scope: Global
extra_port
Description: Extra port number to use for TCP connections in a one-thread-per-connection manner. If set to 0, then no extra port is used.
See for more information.
Command line: --extra-port=#
thread_handling
Description: Determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads. On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.
When the default one-thread-per-connection mode is enabled, the server uses one thread to handle each client connection.
When the pool-of-threads
thread_pool_dedicated_listener
Description: If set to 1, then each thread group will have its own dedicated listener, and the listener thread will not pick up work items. As a result, the queueing time and the actual queue size reported in the corresponding Information Schema table will be more exact, since IO requests are immediately dequeued from poll, without delay.
This system variable is only meaningful on Unix.
Command line: thread-pool-dedicated-listener={0|1}
thread_pool_exact_stats
Description: If set to 1, provides better queueing time statistics by using a high precision timestamp, at a small performance cost, for the time when the connection was added to the queue. This timestamp helps calculate the queueing time reported in the corresponding Information Schema table.
This system variable is only meaningful on Unix.
Command line: thread-pool-exact-stats={0|1}
thread_pool_idle_timeout
Description: The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?
This system variable is only meaningful on Unix.
The thread_pool_min_threads system variable is comparable for Windows.
thread_pool_max_threads
Description: The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases.
On Unix, in rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.
Scope:
Command line:
thread_pool_min_threads
Description: Minimum number of threads in the thread pool. In bursty environments, after a period of inactivity, threads would normally be retired. When the next burst arrives, it would take time to reach the optimal level. Setting this value higher than the default would prevent thread retirement even if inactive.
This system variable is only meaningful on Windows.
The thread_pool_idle_timeout system variable is comparable for Unix.
thread_pool_oversubscribe
Description: Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread has unrestricted access to the CPU while it's running, but there is additional overhead from putting threads to sleep and waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but there is less overhead from putting threads to sleep and waking them up.
See for more information.
thread_pool_prio_kickup_timer
Description: Time in milliseconds before a dequeued low-priority statement is moved to the high-priority queue.
This system variable is only meaningful on Unix.
Command line: thread-pool-kickup-timer=#
Scope: Global
thread_pool_priority
Description: Thread pool priority. High-priority connections usually start executing earlier than low-priority ones. If set to 'auto' (the default), the actual priority (low or high) is determined by whether or not the connection is inside a transaction.
Command line: --thread-pool-priority=#
Scope: Global,Connection
Data Type: enum
thread_pool_size
Description: The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the maximum value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the maximum value is either 128 or the value that was set at system startup, whichever is higher.
See for more information.
This system variable is only meaningful on Unix.
thread_pool_stall_limit
Description: The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread.
See for more information.
Threadpool_idle_threads
Description: Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc.
This status variable is only meaningful on Unix.
Scope: Global, Session
Data Type: numeric
Threadpool_threads
Description: Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.
Scope: Global, Session
Data Type: numeric
This page is licensed: CC BY-SA / Gnu FDL
How to DELETE lots of rows from a large table? Here is an example of purging items older than 30 days:
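DELETE FROM tbl WHERE
    ts < CURRENT_DATE() - INTERVAL 30 DAY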
If there are millions of rows in the table, this statement may take minutes, maybe hours.
Any suggestions on how to speed this up?
MyISAM will lock the table during the entire operation, so nothing else can be done with the table.
InnoDB won't lock the table, but it will chew up a lot of resources, leading to sluggishness.
InnoDB has to write the undo information to its transaction logs; this significantly increases the I/O required.
Replication, being asynchronous, will effectively be delayed (on slaves) while the DELETE is running.
To be ready for a crash, a transactional engine such as InnoDB will record what it is doing to a log file. To make that somewhat less costly, the log file is sequentially written. If the log files you have (there are usually 2) fill up because the delete is really big, then the undo information spills into the actual data blocks, leading to even more I/O.
Deleting in chunks avoids some of this excess overhead.
Limited benchmarking of total delete elapsed time shows two observations:
Total delete time approximately doubles above some 'chunk' size (as opposed to below that threshold). I do not have a formula relating the log file size with the threshold cutoff.
Chunk size below several hundred rows is slower. This is probably because the overhead of starting/ending each chunk dominates the timing.
Solutions
PARTITION -- Requires some careful setup, but is excellent for purging a time-base series.
DELETE in chunks -- Carefully walk through the table N rows at a time.
The idea here is to have a sliding window of partitions. Let's say you need to purge news articles after 30 days. The "partition key" would be the datetime (or timestamp) that is to be used for purging, and the PARTITIONs would be "range". Every night, a cron job would come along and build a new partition for the next day, and drop the oldest partition.
Dropping a partition is essentially instantaneous, much faster than deleting that many rows. However, you must design the table so that the entire partition can be dropped. That is, you cannot have some items living longer than others.
PARTITION tables have a lot of restrictions, some are rather weird. You can either have no UNIQUE (or PRIMARY) key on the table, or every UNIQUE key must include the partition key. In this use case, the partition key is the datetime. It should not be the first part of the PRIMARY KEY (if you have a PRIMARY KEY).
You can PARTITION InnoDB or MyISAM tables.
Since two news articles could have the same timestamp, you cannot assume the partition key is sufficient for uniqueness of the PRIMARY KEY, so you need to find something else to help with that.
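As an illustration only (the table name, columns, partition names, and dates here are assumptions, not taken from the original), a daily sliding window might be set up and maintained like this:
CREATE TABLE news (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ts DATETIME NOT NULL,
    title VARCHAR(200) NOT NULL,
    PRIMARY KEY (id, ts)      -- every UNIQUE key must include the partition key (ts)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ts)) (
    PARTITION p20150301 VALUES LESS THAN (TO_DAYS('2015-03-02')),
    PARTITION p20150302 VALUES LESS THAN (TO_DAYS('2015-03-03'))
    -- ... one partition per day ...
);
-- Nightly cron job: add a partition for the next day, then drop the oldest one.
ALTER TABLE news ADD PARTITION (PARTITION p20150401 VALUES LESS THAN (TO_DAYS('2015-04-02')));
ALTER TABLE news DROP PARTITION p20150301;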
Reference implementation for Partition maintenance
Although the discussion in this section talks about DELETE, it can be used for any other "chunking", such as, say, UPDATE, or SELECT plus some complex processing.
(This discussion applies to both MyISAM and InnoDB.)
When deleting in chunks, be sure to avoid doing a table scan. The code below is good at that; it scans no more than 1001 rows in any one query. (The 1000 is tunable.)
Assuming you have news articles that need to be purged, and you have a schema something like
Then, this pseudo-code is a good way to delete the rows older than 30 days:
Notes (Most of these caveats will be covered later):
It uses the PK instead of the secondary key. This gives much better locality of disk hits, especially for InnoDB.
You could (should?) do something to avoid walking through recent days but doing nothing. Caution -- the code for this could be costly.
The 1000 should be tweaked so that the DELETE usually takes under, say, one second.
No INDEX on ts is needed. (This helps INSERTs a little.)
If there are big gaps in id values (and there will be after the first purge), then use the variant that first locates the end of each chunk with a SELECT; see the second pseudo-code block near the end of this page.
That code works whether id is numeric or character, and it mostly works even if id is not UNIQUE. With a non-unique key, the risk is that you could be caught in a loop whenever @z==@a. That can be detected and fixed thus:
The drawback is that there could be more than 1000 items with a single id. In most practical cases, that is unlikely.
If you do not have a primary (or unique) key defined on the table, and you have an INDEX on ts, then consider deleting in a loop with ORDER BY ts and a LIMIT (see the LOOP example near the end of this page).
This technique is NOT recommended because the LIMIT leads to a warning on replication about it being non-deterministic (discussed below).
Have a 'reasonable' size for innodb_log_file_size.
Use AUTOCOMMIT=1 for the session doing the deletions.
Pick about 1000 rows for the chunk size.
Adjust the row count down if asynchronous replication (Statement Based) causes too much delay on the Slaves or hogs the table too much.
To perform the chunked deletes recommended above, you need a way to walk through the PRIMARY KEY. This can be difficult if the PK has more than one column in it.
To do a compound 'greater than' efficiently:
Assume that you left off at ($g, $s) (and have handled that row):
Addenda: The above AND/OR works well in older versions of MySQL; this works better in MariaDB and newer versions of MySQL:
A caution about using @variables for strings. If, instead of '$g', you use @g, you need to be careful to make sure that @g has the same CHARACTER SET and COLLATION as Genus, else there could be a charset/collation conversion on the fly that prevents the use of the INDEX. Using the INDEX is vital for performance. It may require a COLLATE clause on SET NAMES and/or the @g in the SELECT.
This is costly. (Switch to the PARTITION solution if practical.)
MyISAM leaves gaps in the table (.MYD file); OPTIMIZE TABLE will reclaim the freed space after a big delete. But it may take a long time and lock the table.
InnoDB is block-structured, organized in a BTree on the PRIMARY KEY. An isolated deleted row leaves a block less full. A lot of deleted rows can lead to coalescing of adjacent blocks. (Blocks are normally 16KB - see innodb_page_size.)
In InnoDB, there is no practical way to reclaim the freed space from ibdata1, other than to reuse the freed blocks eventually.
The only option with innodb_file_per_table = 0 is to dump ALL tables, remove ibdata*, restart, and reload. That is rarely worth the effort and time.
InnoDB, even with innodb_file_per_table = 1, won't give space back to the OS, but at least it is only one table to rebuild with. In this case, something like this should work:
You do need enough disk space for both copies. You must not write to the table during the process.
The following technique can be used for any combination of
Deleting a large portion of the table more efficiently
Add PARTITIONing
Converting to
Defragmenting
This can be done by chunking, or (if practical) all at once:
Notes:
You do need enough disk space for both copies.
You must not write to the table during the process. (Changes to Main may not be reflected in New.)
Any UPDATE, DELETE, etc with LIMIT that is replicated to slaves (via statement-based replication) may cause inconsistencies between the Master and Slaves. This is because the actual order of the records discovered for updating/deleting may be different on the slave, thereby leading to a different subset being modified. To be safe, add ORDER BY to such statements. Moreover, be sure the ORDER BY is deterministic -- that is, the fields/expressions in the ORDER BY are unique.
An example of an ORDER BY that does not quite work: Assume there are multiple rows for each 'date':
Given that id is the PRIMARY KEY (or UNIQUE), this will be safe:
Unfortunately, even with the ORDER BY, MySQL has a deficiency that leads to a bogus warning in mysqld.err. See Spurious "Statement is not safe to log in statement format." warnings
Some of the above code avoids this spurious warning by splitting the work into a SELECT followed by a deterministic DELETE:
That pair of statements guarantees no more than 1000 rows are touched, not the whole table.
If you KILL a DELETE (or any? query) on the master in the middle of its execution, what will be replicated?
If it is InnoDB, the query should be rolled back. (Exceptions??)
In MyISAM, rows are DELETEd as the statement is executed, and there is no provision for ROLLBACK. Some of the rows will be deleted, some won't. You probably have no clue of how much was deleted. In a single server, simply run the delete again. The delete is put into the binlog, but with error 1317. Since replication is supposed to keep the master and slave in sync, and since it has no clue of how to do that, replication stops and waits for manual intervention. In a HA (High Availability) system using replication, this is a minor disaster. Meanwhile, you need to go to each slave and verify that it is stuck for this reason, then do
Then (presumably) re-executing the DELETE will finish the aborted task.
(That is yet another reason to move all your tables to InnoDB.)
TBD -- "Row Based Replication" may impact this discussion.
The tips in this document apply to MySQL, MariaDB, and Percona.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
The Charset Narrowing optimization handles equality comparisons like:
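utf8mb3_key_column=utf8mb4_expression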
It enables the optimizer to construct ref access to utf8mb3_key_column based on this equality. The optimization supports comparisons of columns that use utf8mb3_general_ci to expressions that use utf8mb4_general_ci.
The optimization was introduced in MariaDB 10.6.16, MariaDB 10.11.6, and the corresponding releases of the later series, where it is OFF by default. In newer releases it is ON by default.
MariaDB supports both the UTF8MB3 and UTF8MB4 . It is possible to construct join queries that compare values in UTF8MB3 to UTF8MB4.
Suppose we have a table users that uses UTF8MB4:
and table orders that uses UTF8MB3:
One can join users to orders on user_name:
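A minimal sketch of such tables and the join (the definitions are illustrative; the column names user_name_mb4 and user_name_mb3 match the EXPLAIN output shown below):
CREATE TABLE users (
    user_name_mb4 VARCHAR(100) COLLATE utf8mb4_general_ci
) DEFAULT CHARSET=utf8mb4;

CREATE TABLE orders (
    user_name_mb3 VARCHAR(100) COLLATE utf8mb3_general_ci,
    KEY user_name_mb3 (user_name_mb3)
) DEFAULT CHARSET=utf8mb3;

SELECT * FROM orders, users
WHERE orders.user_name_mb3 = users.user_name_mb4;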
Internally the optimizer will handle the equality by converting the UTF8MB3 value into UTF8MB4 and then doing the comparison. One can see the call to CONVERT in EXPLAIN FORMAT=JSON output or Optimizer Trace:
This produces the expected result but the query optimizer is not able to use the index over orders.user_name_mb3 to find matches for values of users.user_name_mb4.
The EXPLAIN of the above query looks like this:
The Charset Narrowing optimization enables the optimizer to perform the comparison between UTF8MB3 and UTF8MB4 values by "narrowing" the value in UTF8MB4 to UTF8MB3. The CONVERT call is no longer needed, and the optimizer is able to use the equality to construct ref access:
The optimization is controlled by an optimizer_switch flag. Specify:
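SET optimizer_switch='cset_narrowing=on';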
to enable the optimization.
: utf8mb3_key_col=utf8mb4_value cannot be used for ref access
Blog post:
This page is licensed: CC BY-SA / Gnu FDL
You want to "pivot" the data so that a linear list of values with two keys becomes a spreadsheet-like array. See examples, below.
The best solution is probably to do it in some form of client code (PHP, etc). MySQL and MariaDB do not have a syntax for SELECT that will do the work for you. The code provided here uses a stored procedure to generate the desired SELECT for you.
CREATE TABLE b(for_key INT REFERENCES a(not_key));
[CONSTRAINT [symbol]] FOREIGN KEY
[index_name] (index_col_name, ...)
REFERENCES tbl_name (index_col_name,...)
[ON DELETE reference_option]
[ON UPDATE reference_option]
reference_option:
RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
CREATE TABLE author (
id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL
) ENGINE = InnoDB;
CREATE TABLE book (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(200) NOT NULL,
author_id SMALLINT UNSIGNED NOT NULL,
CONSTRAINT `fk_book_author`
FOREIGN KEY (author_id) REFERENCES author (id)
ON DELETE CASCADE
ON UPDATE RESTRICT
) ENGINE = InnoDB;
INSERT INTO book (title, author_id) VALUES ('Necronomicon', 1);
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
(`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`)
REFERENCES `author` (`id`) ON DELETE CASCADE)
INSERT INTO author (name) VALUES ('Abdul Alhazred');
INSERT INTO book (title, author_id) VALUES ('Necronomicon', LAST_INSERT_ID());
INSERT INTO author (name) VALUES ('H.P. Lovecraft');
INSERT INTO book (title, author_id) VALUES
('The call of Cthulhu', LAST_INSERT_ID()),
('The colour out of space', LAST_INSERT_ID());
DELETE FROM author WHERE name = 'H.P. Lovecraft';
SELECT * FROM book;
+----+--------------+-----------+
| id | title | author_id |
+----+--------------+-----------+
| 3 | Necronomicon | 1 |
+----+--------------+-----------+
UPDATE author SET id = 10 WHERE id = 1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
(`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`)
REFERENCES `author` (`id`) ON DELETE CASCADE)
CREATE TABLE a(a_key INT PRIMARY KEY, not_key INT);
CREATE TABLE b(for_key INT REFERENCES a(not_key));
ERROR 1005 (HY000): Can't create table `test`.`b`
(errno: 150 "Foreign key constraint is incorrectly formed")
CREATE TABLE c(for_key INT REFERENCES a(a_key));
SHOW CREATE TABLE c;
+-------+----------------------------------------------------------------------------------+
| Table | Create Table |
+-------+----------------------------------------------------------------------------------+
| c | CREATE TABLE `c` (
`for_key` INT(11) DEFAULT NULL,
KEY `for_key` (`for_key`),
CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------+
INSERT INTO a VALUES (1,10);
Query OK, 1 row affected (0.004 sec)
INSERT INTO c VALUES (10);
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
(`test`.`c`, CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`))
INSERT INTO c VALUES (1);
Query OK, 1 row affected (0.004 sec)
SELECT * FROM c;
+---------+
| for_key |
+---------+
| 1 |
+---------+
[mysqld]
...
innodb-defragment=1
SET @@global.innodb_file_per_table = 1;
SET @@global.innodb_defragment_n_pages = 32;
SET @@global.innodb_defragment_fill_factor = 0.95;
CREATE TABLE tb_defragment (
pk1 BIGINT(20) NOT NULL,
pk2 BIGINT(20) NOT NULL,
fd4 TEXT,
fd5 VARCHAR(50) DEFAULT NULL,
PRIMARY KEY (pk1),
KEY ix1 (pk2)
) ENGINE=InnoDB;
DELIMITER //
CREATE PROCEDURE innodb_insert_proc (repeat_count INT)
BEGIN
DECLARE current_num INT;
SET current_num = 0;
WHILE current_num < repeat_count DO
INSERT INTO tb_defragment VALUES (current_num, 1, REPEAT('Abcdefg', 20), REPEAT('12345',5));
INSERT INTO tb_defragment VALUES (current_num+1, 2, REPEAT('HIJKLM', 20), REPEAT('67890',5));
INSERT INTO tb_defragment VALUES (current_num+2, 3, REPEAT('HIJKLM', 20), REPEAT('67890',5));
INSERT INTO tb_defragment VALUES (current_num+3, 4, REPEAT('HIJKLM', 20), REPEAT('67890',5));
SET current_num = current_num + 4;
END WHILE;
END//
DELIMITER ;
COMMIT;
SET autocommit=0;
CALL innodb_insert_proc(50000);
COMMIT;
SET autocommit=1;
SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
Value
313
SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
Value
72
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
COUNT(stat_value)
0
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
COUNT(stat_value)
0
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
COUNT(stat_value)
0
SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables
WHERE engine LIKE 'InnoDB' AND table_name LIKE '%tb_defragment%';
TABLE_NAME data_free_MB table_rows
tb_defragment 4.00000000 50051
SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
TABLE_NAME index_name SUM(number_records) SUM(data_size)
`test`.`tb_defragment` PRIMARY 25873 4739939
SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
TABLE_NAME index_name SUM(number_records) SUM(data_size)
`test`.`tb_defragment` ix1 50071 1051775
DELETE FROM tb_defragment WHERE pk2 BETWEEN 2 AND 4;
OPTIMIZE TABLE tb_defragment;
TABLE Op Msg_type Msg_text
test.tb_defragment OPTIMIZE status OK
SHOW status LIKE '%innodb_def%';
Variable_name Value
Innodb_defragment_compression_failures 0
Innodb_defragment_failures 1
Innodb_defragment_count 4
SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name = 'PRIMARY';
Value
0
SELECT COUNT(*) AS Value FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name = 'ix1';
Value
0
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_pages_freed');
COUNT(stat_value)
2
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_page_split');
COUNT(stat_value)
2
SELECT COUNT(stat_value) FROM mysql.innodb_index_stats
WHERE table_name LIKE '%tb_defragment%' AND stat_name IN ('n_leaf_pages_defrag');
COUNT(stat_value)
2
SELECT table_name, data_free/1024/1024 AS data_free_MB, table_rows FROM information_schema.tables
WHERE engine LIKE 'InnoDB';
TABLE_NAME data_free_MB table_rows
innodb_index_stats 0.00000000 8
innodb_table_stats 0.00000000 0
tb_defragment 4.00000000 12431
SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'PRIMARY';
TABLE_NAME index_name SUM(number_records) SUM(data_size)
`test`.`tb_defragment` PRIMARY 690 102145
SELECT table_name, index_name, SUM(number_records), SUM(data_size) FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%tb_defragment%' AND index_name LIKE 'ix1';
TABLE_NAME index_name SUM(number_records) SUM(data_size)
`test`.`tb_defragment` ix1 5295 111263
ALTER TABLE table_name DISABLE KEYS;
BEGIN;
... inserting data WITH INSERT OR LOAD DATA ....
COMMIT;
ALTER TABLE table_name ENABLE KEYS;
SET @@session.unique_checks = 0;
SET @@session.foreign_key_checks = 0;
SET @@global.innodb_autoinc_lock_mode = 2;
LOAD DATA INFILE 'file_name' INTO TABLE table_name;
LOAD DATA LOCAL INFILE 'file_name' INTO TABLE table_name;
mariadb-import --use-threads=10 database text-file-name [text-file-name...]
BEGIN;
INSERT ...
INSERT ...
END;
BEGIN;
INSERT ...
INSERT ...
END;
INSERT INTO table_name VALUES(1,"row 1"),(2, "row 2"),...;
INSERT INTO table_name_1 (auto_increment_key, data) VALUES (NULL,"row 1");
INSERT INTO table_name_2 (auto_increment, reference, data) VALUES (NULL, LAST_INSERT_ID(), "row 2");
delimiter ;;
SELECT 1; SELECT 2;;
delimiter ;
EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| 1 | SIMPLE | tbl | range | key1 | key1 | 5 | NULL | 960 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
SET optimizer_switch='mrr=ON';
Query OK, 0 rows affected (0.06 sec)
EXPLAIN SELECT * FROM tbl WHERE tbl.key1 BETWEEN 1000 AND 2000;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | tbl | range | key1 | key1 | 5 | NULL | 960 | Using index condition; Rowid-ordered scan |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------------------------------------+
1 row in set (0.03 sec)
EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 1 | SIMPLE | t2 | ref | key1 | key1 | 5 | test.t1.col1 | 1 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
2 rows in set (0.00 sec)
SET optimizer_switch='mrr=ON';
Query OK, 0 rows affected (0.06 sec)
SET join_cache_level=6;
Query OK, 0 rows affected (0.00 sec)
EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
+----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 1 | SIMPLE | t2 | ref | key1 | key1 | 5 | test.t1.col1 | 1 | Using join buffer (flat, BKA join); Rowid-ordered scan |
+----+-------------+-------+------+---------------+------+---------+--------------+------+--------------------------------------------------------+
2 rows in set (0.00 sec)
EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 1 | SIMPLE | t2 | ref | key1 | key1 | 5 | test.t1.col1 | 1 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+
SET optimizer_switch='mrr=ON,mrr_sort_keys=ON';
Query OK, 0 rows affected (0.00 sec)
SET join_cache_level=6;
Query OK, 0 rows affected (0.02 sec)
EXPLAIN SELECT * FROM t1,t2 WHERE t2.key1=t1.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
TABLE: t1
type: ALL
possible_keys: a
KEY: NULL
key_len: NULL
ref: NULL
ROWS: 1000
Extra: USING WHERE
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
TABLE: t2
type: ref
possible_keys: key1
KEY: key1
key_len: 5
ref: test.t1.col1
ROWS: 1
Extra: USING JOIN buffer (flat, BKA JOIN); KEY-ordered Rowid-ordered scan
2 rows in set (0.00 sec)
+-----+------------+----------------+-----------+
| seq | last_name | first_name | term |
+-----+------------+----------------+-----------+
| 1 | Washington | George | 1789-1797 |
| 2 | Adams | John | 1797-1801 |
...
| 7 | Jackson | Andrew | 1829-1837 |
...
| 17 | Johnson | Andrew | 1865-1869 |
...
| 36 | Johnson | Lyndon B. | 1963-1969 |
...
SELECT term
FROM Presidents
WHERE last_name = 'Johnson'
AND first_name = 'Andrew';
SHOW CREATE TABLE Presidents \G
CREATE TABLE `presidents` (
`seq` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
`last_name` VARCHAR(30) NOT NULL,
`first_name` VARCHAR(30) NOT NULL,
`term` VARCHAR(9) NOT NULL,
PRIMARY KEY (`seq`)
) ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8
EXPLAIN SELECT term
FROM Presidents
WHERE last_name = 'Johnson'
AND first_name = 'Andrew';
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | Presidents | ALL | NULL | NULL | NULL | NULL | 44 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
# Or, using the other form of display: EXPLAIN ... \G
id: 1
select_type: SIMPLE
table: Presidents
type: ALL <-- Implies table scan
possible_keys: NULL
key: NULL <-- Implies that no index is useful, hence table scan
key_len: NULL
ref: NULL
rows: 44 <-- That's about how many rows in the table, so table scan
Extra: Using where
EXPLAIN SELECT term
FROM Presidents
WHERE last_name = 'Johnson'
AND first_name = 'Andrew' \G
select_type: SIMPLE
table: Presidents
type: ref
possible_keys: last_name, first_name
key: last_name
key_len: 92 <-- VARCHAR(30) utf8 may need 2+3*30 bytes
ref: const
rows: 2 <-- Two 'Johnson's
Extra: Using where
id: 1
select_type: SIMPLE
table: Presidents
type: index_merge
possible_keys: first_name,last_name
key: first_name,last_name
key_len: 92,92
ref: NULL
rows: 1
Extra: Using intersect(first_name,last_name); Using where
ALTER TABLE Presidents
(DROP old indexes AND...)
ADD INDEX compound(last_name, first_name);
id: 1
select_type: SIMPLE
table: Presidents
type: ref
possible_keys: compound
key: compound
key_len: 184 <-- The length of both fields
ref: const,const <-- The WHERE clause gave constants for both
rows: 1 <-- Goodie! It homed in on the one row.
Extra: Using where
... ADD INDEX covering(last_name, first_name, term);
id: 1
select_type: SIMPLE
table: Presidents
type: ref
possible_keys: covering
key: covering
key_len: 184
ref: const,const
rows: 1
Extra: Using where; Using index <-- Note
INDEX(last, first)
... WHERE last = '...' -- good (even though `first` is unused)
... WHERE first = '...' -- index is useless
INDEX(first, last), INDEX(last, first)
... WHERE first = '...' -- 1st index is used
... WHERE last = '...' -- 2nd index is used
... WHERE first = '...' AND last = '...' -- either could be used equally well
INDEX(last, first)
Both of these are handled by that one INDEX:
... WHERE last = '...'
... WHERE last = '...' AND first = '...'
INDEX(last), INDEX(last, first)
In light of the above example, don't bother including INDEX(last).
SET GLOBAL thread_pool_size=32;
[mariadb]
..
thread_handling=pool-of-threads
thread_pool_size=32DELETE FROM tbl WHERE
ts < CURRENT_DATE() - INTERVAL 30 DAY
If your PRIMARY KEY is compound, the code gets messier.
This code will not work without a numeric PRIMARY or UNIQUE key.
Read on, we'll develop messier code to deal with most of these caveats.
You can edit the SQL generated by the stored procedure to tweak the output in a variety of ways. Or you can tweak the stored procedure to generate what you would prefer.
'Source' this into the mysql commandline tool:
Then do a CALL, like in the examples, below.
I thought about having several extra options for variations, but decided that would be too messy. Instead, here are instructions for implementing the variations, either by capturing the SELECT that was output by the Stored Procedure, or by modifying the SP, itself.
The data is strings (not numeric) -- Remove "SUM" (but keep the expression); remove the SUM...AS TOTAL line.
If you want blank output instead of 0 -- Currently the code says "SUM(IF(... 0))"; change the 0 to NULL, then wrap the SUM: IFNULL(SUM(...), ''). Note that this will distinguish between a zero total (showing '0') and no data (blank).
Fancier output -- Use PHP/VB/Java/etc.
No Totals at the bottom -- Remove the WITH ROLLUP line from the SELECT.
No Total for each row -- Remove the SUM...AS Total line from the SELECT.
Change the order of the columns -- Modify the ORDER BY 1 ('1' meaning first column) in the SELECT DISTINCT in the SP.
Example: ORDER BY FIND_IN_SET(DAYOFWEEK(...), 'Sun,Mon,Tue,Wed,Thu,Fri,Sat')
Notes about "base_cols":
Multiple columns on the left, such as an ID and its meaning -- This is already handled by allowing base_cols to be a commalist like 'id, meaning'
You cannot call the SP with "foo AS 'blah'" in hopes of changing the labels, but you could edit the SELECT to achieve that goal.
Notes about the "Totals":
If "base_cols" is more than one column, WITH ROLLUP will be subtotals as well as a grand total.
NULL shows up in the Totals row in the "base_cols" column; this can be changed via something like IFNULL(..., 'Totals').
Example 1 - Population vs Latitude in US
Notice how Alaska (AK) has populations in high latitudes and Hawaii (HI) in low latitudes.
Example 2 - Home Solar Power Generation
This gives the power (KWh) generated by hour and month for 2012.
Other variations made the math go wrong. (Note that there is no CAST to FLOAT.)
While I was at it, I gave an alias to change "MONTH(ts)" to just "Month".
So, I edited the SQL to this and ran it:
-- Which gave cleaner output:
Midday in the summer is the best time for solar panels, as you would expect. 1-2pm in July was the best.
Posted, Feb. 2015
Brawley's notes
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: pivot
This page is licensed: CC BY-SA / Gnu FDL
CREATE TABLE tbl
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ts TIMESTAMP,
...
PRIMARY KEY(id)@a = 0
LOOP
DELETE FROM tbl
WHERE id BETWEEN @a AND @a+999
AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
SET @a = @a + 1000
sleep 1 -- be a nice guy
UNTIL end of table@a = SELECT MIN(id) FROM tbl
LOOP
SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
IF @z IS NULL
EXIT LOOP -- last chunk
DELETE FROM tbl
WHERE id >= @a
AND id < @z
AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
SET @a = @z
sleep 1 -- be a nice guy, especially in replication
ENDLOOP
# Last chunk:
DELETE FROM tbl
WHERE id >= @a
AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
...
SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
IF @z == @a
SELECT @z := id FROM tbl WHERE id > @a ORDER BY id LIMIT 1
...
LOOP
DELETE FROM tbl
WHERE ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY ts -- to use the index, and to make it deterministic
LIMIT 1000
UNTIL no rows deleted
INDEX(Genus, species)
SELECT/DELETE ...
WHERE Genus >= '$g' AND ( species > '$s' OR Genus > '$g' )
ORDER BY Genus, species
LIMIT ...
WHERE ( Genus = '$g' AND species > '$s' ) OR Genus > '$g'
CREATE TABLE new LIKE main;
INSERT INTO new SELECT * FROM main; -- This could take a long time
RENAME TABLE main TO old, new TO main; -- Atomic swap
DROP TABLE old; -- Space freed up here
-- Optional: SET GLOBAL innodb_file_per_table = ON;
CREATE TABLE New LIKE Main;
-- Optional: ALTER TABLE New ADD PARTITION BY RANGE ...;
-- Do this INSERT..SELECT all at once, or with chunking:
INSERT INTO New
SELECT * FROM Main
WHERE ...; -- just the rows you want to keep
RENAME TABLE main TO Old, New TO Main;
DROP TABLE Old; -- Space freed up here
DELETE FROM tbl ORDER BY date LIMIT 111
DELETE FROM tbl ORDER BY date, id LIMIT 111
SELECT @z := ... LIMIT 1000,1; -- not replicated
DELETE ... BETWEEN @a AND @z; -- deterministic
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
DELIMITER //
DROP PROCEDURE IF EXISTS Pivot //
CREATE PROCEDURE Pivot(
IN tbl_name VARCHAR(99), -- table name (or db.tbl)
IN base_cols VARCHAR(99), -- column(s) on the left, separated by commas
IN pivot_col VARCHAR(64), -- name of column to put across the top
IN tally_col VARCHAR(64), -- name of column to SUM up
IN where_clause VARCHAR(99), -- empty string or "WHERE ..."
IN order_by VARCHAR(99) -- empty string or "ORDER BY ..."; usually the base_cols
)
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
-- Find the distinct values
-- Build the SUM()s
SET @subq = CONCAT('SELECT DISTINCT ', pivot_col, ' AS val ',
' FROM ', tbl_name, ' ', where_clause, ' ORDER BY 1');
-- select @subq;
SET @cc1 = "CONCAT('SUM(IF(&p = ', &v, ', &t, 0)) AS ', &v)";
SET @cc2 = REPLACE(@cc1, '&p', pivot_col);
SET @cc3 = REPLACE(@cc2, '&t', tally_col);
-- select @cc2, @cc3;
SET @qval = CONCAT("'\"', val, '\"'");
-- select @qval;
SET @cc4 = REPLACE(@cc3, '&v', @qval);
-- select @cc4;
SET SESSION group_concat_max_len = 10000; -- just in case
SET @stmt = CONCAT(
'SELECT GROUP_CONCAT(', @cc4, ' SEPARATOR ",\n") INTO @sums',
' FROM ( ', @subq, ' ) AS top');
select @stmt;
PREPARE _sql FROM @stmt;
EXECUTE _sql; -- Intermediate step: build SQL for columns
DEALLOCATE PREPARE _sql;
-- Construct the query and perform it
SET @stmt2 = CONCAT(
'SELECT ',
base_cols, ',\n',
@sums,
',\n SUM(', tally_col, ') AS Total'
'\n FROM ', tbl_name, ' ',
where_clause,
' GROUP BY ', base_cols,
'\n WITH ROLLUP',
'\n', order_by
);
select @stmt2; -- The statement that generates the result
PREPARE _sql FROM @stmt2;
EXECUTE _sql; -- The resulting pivot table ouput
DEALLOCATE PREPARE _sql;
-- For debugging / tweaking, SELECT the various @variables after CALLing.
END;
//
DELIMITER ;
-- Sample input:
+-------+----------------------+---------+------------+
| state | city | lat | population |
+-------+----------------------+---------+------------+
| AK | Anchorage | 61.2181 | 276263 |
| AK | Juneau | 58.3019 | 31796 |
| WA | Monroe | 47.8556 | 15554 |
| WA | Spanaway | 47.1042 | 25045 |
| PR | Arecibo | 18.4744 | 49189 |
| MT | Kalispell | 48.1958 | 18018 |
| AL | Anniston | 33.6597 | 23423 |
| AL | Scottsboro | 34.6722 | 14737 |
| HI | Kaneohe | 21.4181 | 35424 |
| PR | Candelaria | 18.4061 | 17632 |
...
-- Call the Stored Procedure:
CALL Pivot('World.US', 'state', '5*FLOOR(lat/5)', 'population', '', '');
-- SQL generated by the SP:
SELECT state,
SUM(IF(5*FLOOR(lat/5) = "15", population, 0)) AS "15",
SUM(IF(5*FLOOR(lat/5) = "20", population, 0)) AS "20",
SUM(IF(5*FLOOR(lat/5) = "25", population, 0)) AS "25",
SUM(IF(5*FLOOR(lat/5) = "30", population, 0)) AS "30",
SUM(IF(5*FLOOR(lat/5) = "35", population, 0)) AS "35",
SUM(IF(5*FLOOR(lat/5) = "40", population, 0)) AS "40",
SUM(IF(5*FLOOR(lat/5) = "45", population, 0)) AS "45",
SUM(IF(5*FLOOR(lat/5) = "55", population, 0)) AS "55",
SUM(IF(5*FLOOR(lat/5) = "60", population, 0)) AS "60",
SUM(IF(5*FLOOR(lat/5) = "70", population, 0)) AS "70",
SUM(population) AS Total
FROM World.US GROUP BY state
WITH ROLLUP
-- Output from that SQL (also comes out of the SP):
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
| state | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 55 | 60 | 70 | Total |
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
| AK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60607 | 360765 | 4336 | 425708 |
| AL | 0 | 0 | 0 | 1995225 | 0 | 0 | 0 | 0 | 0 | 0 | 1995225 |
| AR | 0 | 0 | 0 | 595537 | 617361 | 0 | 0 | 0 | 0 | 0 | 1212898 |
| AZ | 0 | 0 | 0 | 4708346 | 129989 | 0 | 0 | 0 | 0 | 0 | 4838335 |
...
| FL | 0 | 34706 | 9096223 | 1440916 | 0 | 0 | 0 | 0 | 0 | 0 | 10571845 |
| GA | 0 | 0 | 0 | 2823939 | 0 | 0 | 0 | 0 | 0 | 0 | 2823939 |
| HI | 43050 | 752983 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 796033 |
...
| WY | 0 | 0 | 0 | 0 | 0 | 277480 | 0 | 0 | 0 | 0 | 277480 |
| NULL | 1792991 | 787689 | 16227033 | 44213344 | 47460670 | 61110822 | 7105143 | 60607 | 360765 | 4336 | 179123400 |
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
-- Sample input:
+---------------------+------+
| ts | enwh |
+---------------------+------+
| 2012-06-06 11:00:00 | 523 |
| 2012-06-06 11:05:00 | 526 |
| 2012-06-06 11:10:00 | 529 |
| 2012-06-06 11:15:00 | 533 |
| 2012-06-06 11:20:00 | 537 |
| 2012-06-06 11:25:00 | 540 |
| 2012-06-06 11:30:00 | 542 |
| 2012-06-06 11:35:00 | 543 |
Note that it is a reading in watts for each 5 minutes.
So, summing is needed to get the breakdown by month and hour.
-- Invoke the SP:
CALL Pivot('details', -- Table
'MONTH(ts)', -- `base_cols`, to put on left; SUM up over the month
'HOUR(ts)', -- `pivot_col` to put across the top; SUM up entries across the hour
'enwh/1000', -- The data -- watts converted to KWh
"WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year", -- Limit to one year
''); -- assumes that the months stay in order
-- The SQL generated:
SELECT MONTH(ts),
SUM(IF(HOUR(ts) = "5", enwh/1000, 0)) AS "5",
SUM(IF(HOUR(ts) = "6", enwh/1000, 0)) AS "6",
SUM(IF(HOUR(ts) = "7", enwh/1000, 0)) AS "7",
SUM(IF(HOUR(ts) = "8", enwh/1000, 0)) AS "8",
SUM(IF(HOUR(ts) = "9", enwh/1000, 0)) AS "9",
SUM(IF(HOUR(ts) = "10", enwh/1000, 0)) AS "10",
SUM(IF(HOUR(ts) = "11", enwh/1000, 0)) AS "11",
SUM(IF(HOUR(ts) = "12", enwh/1000, 0)) AS "12",
SUM(IF(HOUR(ts) = "13", enwh/1000, 0)) AS "13",
SUM(IF(HOUR(ts) = "14", enwh/1000, 0)) AS "14",
SUM(IF(HOUR(ts) = "15", enwh/1000, 0)) AS "15",
SUM(IF(HOUR(ts) = "16", enwh/1000, 0)) AS "16",
SUM(IF(HOUR(ts) = "17", enwh/1000, 0)) AS "17",
SUM(IF(HOUR(ts) = "18", enwh/1000, 0)) AS "18",
SUM(IF(HOUR(ts) = "19", enwh/1000, 0)) AS "19",
SUM(IF(HOUR(ts) = "20", enwh/1000, 0)) AS "20",
SUM(enwh/1000) AS Total
FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year GROUP BY MONTH(ts)
WITH ROLLUP
-- That generated decimal places that I did not like:
| MONTH(ts) | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
| 15 | 16 | 17 | 18 | 19 | 20 | Total |
+-----------+--------+---------+----------+----------+-----------+-----------+-----------+-----------+-----------+------
-----+-----------+----------+----------+----------+---------+--------+------------+
| 1 | 0.0000 | 0.0000 | 1.8510 | 21.1620 | 52.3190 | 73.0420 | 89.3220 | 97.0190 | 88.9720 | 75.
4970 | 50.9270 | 12.5130 | 0.5990 | 0.0000 | 0.0000 | 0.0000 | 563.2230 |
| 2 | 0.0000 | 0.0460 | 5.9560 | 35.6330 | 72.4710 | 96.5130 | 112.7770 | 126.0850 | 117.1540 | 96.
7160 | 72.5900 | 33.6230 | 4.7650 | 0.0040 | 0.0000 | 0.0000 | 774.3330 |
SELECT MONTH(ts) AS 'Month',
ROUND(SUM(IF(HOUR(ts) = "5", enwh, 0))/1000) AS "5",
...
ROUND(SUM(IF(HOUR(ts) = "20", enwh, 0))/1000) AS "20",
ROUND(SUM(enwh)/1000) AS Total
FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 YEAR
GROUP BY MONTH(ts)
WITH ROLLUP;
+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
| Month | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Total |
+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
| 1 | 0 | 0 | 2 | 21 | 52 | 73 | 89 | 97 | 89 | 75 | 51 | 13 | 1 | 0 | 0 | 0 | 563 |
| 2 | 0 | 0 | 6 | 36 | 72 | 97 | 113 | 126 | 117 | 97 | 73 | 34 | 5 | 0 | 0 | 0 | 774 |
| 3 | 0 | 0 | 9 | 46 | 75 | 105 | 121 | 122 | 128 | 126 | 105 | 71 | 33 | 10 | 0 | 0 | 952 |
| 4 | 0 | 1 | 14 | 63 | 111 | 146 | 171 | 179 | 177 | 158 | 141 | 105 | 65 | 26 | 3 | 0 | 1360 |
| 5 | 0 | 4 | 21 | 78 | 128 | 162 | 185 | 199 | 196 | 187 | 166 | 130 | 81 | 36 | 8 | 0 | 1581 |
| 6 | 0 | 4 | 17 | 71 | 132 | 163 | 182 | 191 | 193 | 182 | 161 | 132 | 89 | 43 | 10 | 1 | 1572 |
| 7 | 0 | 3 | 17 | 57 | 121 | 160 | 185 | 197 | 199 | 189 | 168 | 137 | 92 | 44 | 11 | 1 | 1581 |
| 8 | 0 | 1 | 11 | 48 | 104 | 149 | 171 | 183 | 187 | 179 | 156 | 121 | 76 | 32 | 5 | 0 | 1421 |
| 9 | 0 | 0 | 6 | 32 | 77 | 127 | 151 | 160 | 159 | 148 | 124 | 93 | 47 | 12 | 1 | 0 | 1137 |
| 10 | 0 | 0 | 1 | 16 | 54 | 85 | 107 | 115 | 119 | 106 | 85 | 56 | 17 | 2 | 0 | 0 | 763 |
| 11 | 0 | 0 | 5 | 30 | 57 | 70 | 84 | 83 | 76 | 64 | 35 | 8 | 1 | 0 | 0 | 0 | 512 |
| 12 | 0 | 0 | 2 | 17 | 39 | 54 | 67 | 75 | 64 | 58 | 31 | 4 | 0 | 0 | 0 | 0 | 411 |
| NULL | 0 | 13 | 112 | 516 | 1023 | 1392 | 1628 | 1728 | 1703 | 1570 | 1294 | 902 | 506 | 203 | 38 | 2 | 12629 |
+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
mrr_cost_based=on - enable cost-based choice whether to use MRR. Currently not recommended, because cost model is not sufficiently tuned yet.





And one of the following conditions is also met:
The entire thread pool has fewer than thread_pool_max_threads.
There are fewer than two threads in the thread group. This is to guarantee that each thread group can have at least two threads, even if thread_pool_max_threads has already been reached or exceeded.
And one of the following conditions is also met:
There are more client connection requests in the thread group's work queue that the listener thread still needs to distribute to worker threads. In this case, the new thread is intended to be a worker thread.
There is currently no listener thread in the thread group. For example, if the thread_pool_dedicated_listener system variable is not enabled, then the thread group's listener thread can become a worker thread so that it can handle some client connection request. In this case, the new thread can become the thread group's listener thread.
If the thread group already has active worker threads, then the following condition also needs to be met:
A worker thread has not been created for the thread group within the throttling interval.
And one of the following conditions is also met:
Number of threads in the thread group -> throttling interval:
0 to (1 + thread_pool_oversubscribe): 0
4-7: 50 * THROTTLING_FACTOR
8-15: 100 * THROTTLING_FACTOR
16-65536: 20 * THROTTLING_FACTOR
Dynamic: Yes
Data Type: numeric
Default Value: 1
Range: 1 to 100000
Scope: Global
Dynamic: No
Data Type: numeric
Default Value: 0
When the no-threads mode is enabled, the server uses a single thread for all client connections, which is really only usable for debugging.
Command line: --thread-handling=name
Scope: Global
Dynamic: No
Data Type: enumeration
Default Value: one-thread-per-connection (non-Windows), pool-of-threads (Windows)
Valid Values: no-threads, one-thread-per-connection, pool-of-threads.
Documentation: Using the thread pool.
Notes: In MySQL, the thread pool is only available in MySQL Enterprise. In MariaDB it's available in all versions.
Scope:
Dynamic:
Data Type: boolean
Default Value: 0
Introduced: MariaDB 10.5.0
Dynamic:
Data Type: boolean
Default Value: 0
Introduced: MariaDB 10.5.0
thread-pool-idle-timeout=#
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 60
Documentation: Using the thread pool.
thread-pool-max-threads=#
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 65536
Range: 1 to 65536
Documentation: Using the thread pool.
thread-pool-min-threads=#
Data Type: numeric
Default Value: 1
Documentation: Using the thread pool.
This system variable is only meaningful on Unix.
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 3
Range: 1 to 65536
Documentation: Using the thread pool.
Dynamic: Yes
Data Type: numeric
Default Value: 1000
Range: 0 to 4294967295
Introduced: MariaDB 10.2.2
Documentation: Using the thread pool.
Default Value: auto
Valid Values: high, low, auto.
Introduced: MariaDB 10.2.2
Documentation: Using the thread pool.
--thread-pool-size=#
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: Based on the number of processors (but see MDEV-7806).
Range: 1 to 128
Documentation: Using the thread pool.
Note that if you are migrating from the MySQL Enterprise thread pool plugin, then the unit used in their implementation is 10ms, not 1ms.
Command line: --thread-pool-stall-limit=#
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 500
Range: 1 to 4294967295
Documentation: Using the thread pool.
This document discusses the creation and maintenance of "Summary Tables". It is a companion to the document on Data Warehousing Techniques.
The basic terminology ("Fact Table", etc.) is covered in that document.
Summary tables are a performance necessity for large tables. MariaDB and MySQL do not provide any automated way to create such, so I am providing techniques here.
(Other vendors provide something similar with "materialized views".)
When you have millions or billions of rows, it takes a long time to summarize the data to present counts, totals, averages, etc, in a size that is readily digestible by humans. By computing and saving subtotals as the data comes in, one can make "reports" run much faster. (I have seen 10x to 1000x speedups.) The subtotals go into a "summary table". This document guides you on efficiency in both creating and using such tables.
A summary table includes two sets of columns:
Main KEY: date + some dimension(s)
Subtotals: COUNT(*), SUM(...), ...; but not AVG()
The "date" might be a DATE (a 3-byte native datatype), or an hour, or some other time interval. A 3-byte MEDIUMINT UNSIGNED 'hour' can be derived from a DATETIME or TIMESTAMP via an expression such as the one sketched below.
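For example (an illustrative expression, not necessarily the exact one from the original):
FLOOR(UNIX_TIMESTAMP(dt) / 3600)    -- whole hours since 1970; fits in MEDIUMINT UNSIGNED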
The "dimensions" (a DW term) are some of the columns of the "Fact" table. Examples: Country, Make, Product, Category, Host. Non-dimension examples: Sales, Quantity, TimeSpent.
There would be one or more indexes, usually starting with some dimensions and ending with the date field. By ending with the date, one can efficiently get a range of days/weeks/etc. even when each row summarizes only one day.
There will typically be a "few" summary tables. Often one summary table can serve multiple purposes sufficiently efficiently.
As a rule of thumb, a summary table will have one-tenth the number of rows as the Fact table. (This number is very loose.)
Let's talk about a large chain of car dealerships. The Fact table has all the sales with columns such as datetime, salesman_id, city, price, customer_id, make, model, model_year. One Summary table might focus on sales:
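A sketch of what such a sales summary table might look like (the column names here are illustrative, not taken from the original):
CREATE TABLE SalesSummary (
    dy DATE NOT NULL,               -- the day being summarized
    make VARCHAR(20) NOT NULL,      -- a dimension
    city VARCHAR(30) NOT NULL,      -- another dimension
    ct INT UNSIGNED NOT NULL,       -- COUNT(*) of sales for this (make, city, dy)
    sum_price FLOAT NOT NULL,       -- SUM(price) for this (make, city, dy)
    PRIMARY KEY (make, city, dy),   -- dimensions first, date last
    INDEX (dy)
) ENGINE=InnoDB;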
"Augment" in this section means to add new rows into the summary table or increment the counts in existing rows.
Plan A: "While inserting" rows into the Fact table, augment the summary table(s). This is simple, and workable for a smaller DW database (under 10 Fact table rows per second). For larger DW databases, Plan A is likely to be too costly to be practical.
Plan B: "Periodically", via cron or an EVENT.
Plan C: "As needed". That is, when someone asks for a report, the code first updates the summary tables that will be needed.
Plan D: "Hybrid" of B and C. C, by itself, can lead to long delays for the report. By also doing B, those delays can be kept low.
Plan E: (This is not advised.) "Rebuild" the entire summary table from the entire Fact table. The cost of this is prohibitive for large tables. However, Plan E may be needed when you decide to change the columns of a Summary Table, or discover a flaw in the computations. To lessen the impact of an entire rebuild, adapt the chunking techniques discussed in the Big DELETEs section above.
Plan F: "Staging table". This is primarily for very high speed ingestion. It is mentioned briefly in this blog, and discussed more thoroughly in the companion blog: High Speed Ingestion
IODKU (Insert On Duplicate Key Update) will update an existing row or create a new row. It knows which to do based on the Summary table's PRIMARY KEY.
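For example, a single-row augmentation via IODKU might look like this (a sketch based on the illustrative SalesSummary table above; adjust names to your schema):
INSERT INTO SalesSummary (dy, make, city, ct, sum_price)
    VALUES ('2015-02-01', 'Toyota', 'San Jose', 1, 25000)
ON DUPLICATE KEY UPDATE
    ct = ct + 1,
    sum_price = sum_price + VALUES(sum_price);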
Caution: This approach is costly, and will not scale to an ingestion rate of over, say, 10 rows per second (Or maybe 50/second on SSDs). More discussion later.
If your reports need to be up-to-the-second, you need "as needed" or "hybrid". If your reports have less urgency (eg, weekly reports that don't include 'today'), then "periodically" might be best.
For daily summaries, augmenting the summary tables could be done right after midnight. But, beware of data coming "late".
For both "periodic" and "as needed", you need a definitive way of keeping track of where you "left off".
Case 1: You insert into the Fact table first and it has an AUTO_INCREMENT id: Grab MAX(id) as the upper bound for summarizing and put it either into some other secure place (an extra table), or put it into the row(s) in the Summary table as you insert them. (Caveat: AUTO_INCREMENT ids do not work well in multi-master, including Galera, setups.)
Case 2: If you are using a 'staging' table, there is no issue. (More on staging tables later.)
This applies to multi-row (batch) INSERT and LOAD DATA.
The Fact table needs an AUTO_INCREMENT id, and you need to be able to find the exact range of ids inserted. (This may be impractical in any multi-master setup.)
Then perform bulk summarization over that range of ids, as sketched below.
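A sketch of such a batch summarization, assuming an illustrative Fact table Sales with an AUTO_INCREMENT id, where $left_off and $max_id are placeholders supplied by your application:
INSERT INTO SalesSummary (dy, make, city, ct, sum_price)
    SELECT DATE(sale_time), make, city, COUNT(*), SUM(price)
        FROM Sales
        WHERE id > $left_off AND id <= $max_id
        GROUP BY DATE(sale_time), make, city
ON DUPLICATE KEY UPDATE
    ct = ct + VALUES(ct),
    sum_price = sum_price + VALUES(sum_price);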
Load the data (via INSERTs or LOAD DATA) en masse into a "staging table". Then perform batch summarization from the Staging table, and batch copy from the Staging table to the Fact table. Note that the Staging table is handy for batching "normalization" during ingestion.
Let's say your summary table has a DATE, dy, and a dimension, foo. The question is: Should (foo, dy) be the PRIMARY KEY? Or a non-UNIQUE index?
Case 1: PRIMARY KEY (foo, dy) and summarization is in lock step with, say, changes in dy.
This case is clean and simple -- until you get to endcases. How will you handle the case of data arriving 'late'? Maybe you will need to recalculate some chunks of data? If so, how?
Case 2: (foo, dy) is a non-UNIQUE INDEX.
This case is clean and simple, but it can clutter the summary table because multiple rows can occur for a given (foo, dy) pair. The report will always have to SUM up values because it cannot assume there is only one row, even when it is reporting on a single foo for a single dy. This forced SUM is not really bad -- you should do it anyway; that way all your reports are written with one pattern.
Case 3: PRIMARY KEY (foo, dy) and summarization can happen anytime.
Since you should be using InnoDB, there needs to be an explicit PRIMARY KEY. One approach when you do not have a 'natural' PK is to declare PRIMARY KEY (foo, dy) anyway and let the IODKU-style summarization fold new subtotals into any existing row.
This case pushes the complexity onto the summarization by doing a IODKU.
Advice? Avoid Case 1; too messy. Case 2 is ok if the extra rows are not too common. Case 3 may be the closest to "one size fits all".
When summarizing, include COUNT(*) AS ct and SUM(foo) AS sum_foo. When reporting, the "average" is computed as SUM(sum_foo) / SUM(ct). That is mathematically correct.
Exception... Let's say you are looking at weather temperatures. And your monitoring station gets the temp periodically, but unreliably. That is, the number of readings for a day varies. Further, you decide that the easiest way to compensate for the inconsistency is to do something like: Compute the avg temp for each day, then average those across the month (or other timeframe).
Formula for Standard Deviation:
Where sum_foo2 is SUM(foo * foo) from the summary table. sum_foo and sum_foo2 should be FLOAT. FLOAT gives you about 7 significant digits, which is more than enough for things like average and standard deviation. FLOAT occupies 4 bytes. DOUBLE would give you more precision, but occupies 8 bytes. INT and BIGINT are not practical because they may lead to complaints about overflow.
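Tying those pieces together, a sketch (table and column names are illustrative): store ct, sum_foo and sum_foo2 when summarizing, then derive the average and standard deviation at report time.

-- Summarize: keep the count, the sum, and the sum of squares.
INSERT INTO Summary (dy, ct, sum_foo, sum_foo2)
    SELECT DATE(dt), COUNT(*), SUM(foo), SUM(foo * foo)
    FROM Fact
    GROUP BY 1;

-- Report: average and standard deviation over any date range.
SELECT SUM(sum_foo) / SUM(ct)                              AS avg_foo,
       SQRT( SUM(sum_foo2)/SUM(ct)
             - POWER(SUM(sum_foo)/SUM(ct), 2) )            AS stddev_foo
FROM Summary
WHERE dy BETWEEN '2024-01-01' AND '2024-01-31';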
The idea here is to first load a set of Fact records into a "staging table", with the following characteristics (at least):
The table is repeatedly populated and truncated
Inserts could be individual or batched, and from one or many clients
SELECTs will be table scans, so no indexes needed
Inserting will be fast (InnoDB may be the fastest)
If you have bulk inserts (Batch INSERT or LOAD DATA) then consider doing the normalization and summarization immediately after each bulk insert.
More details: High Speed Ingestion
Here is a more complex way to design the system, with the goal of even more scaling.
Use master-slave setup: ingest into master; report from slave(s).
Feed ingestion through a staging table (as described above)
Single-source of data: ENGINE=MEMORY; multiple sources: InnoDB
Explanation and comments:
ROW + ignore_db avoids replicating Staging, yet replicates the INSERTs based on it. Hence, it lightens the write load on the Slaves
If using MEMORY, remember that it is volatile -- recover from a crash by starting the ingestion over.
To aid with debugging, TRUNCATE or re-CREATE Staging at the start of the next cycle.
Staging needs no indexes -- all operations read all rows from it.
Stats on the system that this 'extreme design' came from: Fact Table: 450GB, 100M rows/day (batch of 4M/hour), 60 day retention (60+24 partitions), 75B/row, 7 summary tables, under 10 minutes to ingest and summarize the hourly batch. The INSERT..SELECT handled over 20K rows/sec going into the Fact table. Spinning drives (not SSD) with RAID-10.
One technique involves summarizing some of the data, then recording where you "left off", so that next time, you can start there. There are some subtle issues with "left off" that you should be cautious of.
If you use a DATETIME or TIMESTAMP as "left off", beware of multiple rows with the same value.
Plan A: Use a compound "left off" (eg, TIMESTAMP + ID). This is messy, error prone, etc.
Plan B: WHERE ts >= $left_off AND ts < $max_ts -- avoids dups, but has other problems (below)
Separate threads could COMMIT TIMESTAMPs out of order.
If you use an AUTO_INCREMENT as "left off" beware of:
In InnoDB, separate threads could COMMIT ids in the 'wrong' order.
Multi-master (including Galera and InnoDB Cluster), could lead to ordering issues.
So, nothing works, at least not in a multi-threaded environment?
If you can live with an occasional hiccup (skipped record), then maybe this is 'not a problem' for you.
The "Flip-Flop Staging" is a safe alternative, optionally combined with the "Extreme Design".
If you have many threads simultaneously INSERTing into one staging table, then here is an efficient way to handle a large load: Have a process that flips that staging table with another, identical, staging table, and performs bulk normalization, Fact insertion, and bulk summarization.
The flipping step uses a fast, atomic, RENAME.
Here is a sketch of the code:
Meanwhile, ingestion can continue writing to Staging. The ingestion INSERTs will conflict with the RENAME, but will be resolved gracefully and silently and quickly.
How fast should you flip-flop? Probably the best scheme is to
Have a job that flip-flops in a tight loop (no delay, or a small delay, between iterations), and
Have a CRON that serves only as a "keep-alive" to restart the job if it dies.
If Staging is 'big', an iteration will take longer, but run more efficiently. Hence, it is self-regulating.
In a Galera (or InnoDB Cluster) environment, each node could be receiving input. If you can afford to lose a few rows, have Staging be a non-replicated MEMORY table. Otherwise, have one Staging per node and make it InnoDB; it will be more secure, but slower and not without problems. In particular, if a node dies completely, you somehow need to process its Staging table.
Look at the reports you will need.
Design a summary table for each.
Then look at the summary tables -- you are likely to find some similarities.
Merge similar ones.
To look at what a report needs, look at the WHERE clause that would provide the data. Some examples, assuming data about service records for automobiles (the GROUP BY gives a clue of what each report might be about; a sketch of a matching summary table follows these examples):
WHERE make = ? AND model_year = ? GROUP BY service_date, service_type
WHERE make = ? AND model = ? GROUP BY service_date, service_type
WHERE service_type = ? GROUP BY make, model, service_date
WHERE service_date between ? and ? GROUP BY make, model, model_year
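A sketch of one summary table that could serve those reports (the column names, data types, and aggregates are illustrative):

CREATE TABLE ServiceSummary (
    make         VARCHAR(30)       NOT NULL,
    model        VARCHAR(30)       NOT NULL,
    model_year   SMALLINT UNSIGNED NOT NULL,
    service_type VARCHAR(20)       NOT NULL,
    service_date DATE              NOT NULL,
    ct           INT UNSIGNED      NOT NULL,   -- COUNT(*) of matching service records
    sum_cost     DECIMAL(12,2)     NOT NULL,   -- SUM(cost)
    PRIMARY KEY (make, model, model_year, service_type, service_date),
    INDEX (make, model_year, service_date),            -- roughly matches the first WHERE
    INDEX (service_type, make, model, service_date)    -- roughly matches the third WHERE
) ENGINE=InnoDB;

Each report then reads SUM(ct), SUM(sum_cost), etc. from this one table instead of scanning the Fact table.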
You need to allow for 'ad hoc' queries? Well, look at all the ad hoc queries -- they all have a date range, plus nail down one or two other things. (I rarely see something as ugly as '%CL%' for nailing down another dimension.) So, start by thinking of date plus one or two other dimensions as the 'key' into a new summary table. Then comes the question of what data might be desired -- counts, sums, etc. Eventually you have a small set of summary tables. Then build a front end to allow them to pick only from those possibilities. It should encourage use of the existing summary tables, not be truly 'open ended'.
Later, another 'requirement' may surface. So, build another summary table. Of course, it may take a day to initially populate it.
Does one ever need to summarize a summary table? Yes, but only in extreme situations. Usually a 'weekly' report can be derived from a 'daily' summary table; building a separate weekly summary table is rarely worth the effort.
Would one ever PARTITION a Summary Table? Yes, in extreme situations, such as the table being large, and
Need to purge old data (unlikely), or
'Recent' data is usually requested, and the index(es) fail to prevent table scans (rare). ("Partition pruning" to the rescue.)
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
Examples
This page is licensed: CC BY-SA / Gnu FDL
This document discusses techniques for improving performance for data-warehouse-like tables in MariaDB and MySQL.
How to load large tables.
Developing 'summary tables' to make 'reports' efficient.
Purging old data.
Details on summary tables is covered in the companion document: .
This list mirrors "Data Warehouse" terminology.
Fact table -- The one huge table with the 'raw' data.
Summary table -- a redundant table of summarized data that could be derived from the Fact table; used for efficiency
Dimension -- columns that identify aspects of the dataset (region, country, user, SKU, zipcode, ...)
Normalization table (dimension table) -- mapping between strings and ids; used for space and speed.
Techniques that should be applied to the huge Fact table (a minimal table sketch follows this list).
id INT/BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
PRIMARY KEY (id)
Probably no other INDEXes
Accessed only via id
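Putting those points together, a minimal Fact table skeleton might look like this (a sketch; the non-id columns are illustrative, not from the article):

CREATE TABLE Fact (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    dt     DATETIME NOT NULL,
    foo_id MEDIUMINT UNSIGNED NOT NULL,   -- points into a normalization (dimension) table
    blah   DECIMAL(12,2) NOT NULL,        -- a measure to be summarized later
    PRIMARY KEY (id)                      -- accessed only via id; probably no other indexes
) ENGINE=InnoDB;

If you plan to purge old data with DROP PARTITION (discussed below), the date column would also have to be part of the PRIMARY KEY so that the table can be PARTITIONed BY RANGE on it.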
There are exceptions where the Fact table must be accessed to retrieve multiple rows. However, you should minimize the number of INDEXes on the table because they are likely to be costly on INSERT.
Once you have built the Summary table(s), there is not much need for the Fact table. One option that you should seriously consider is to not have a Fact table. Or, at least, you could purge old data from it sooner than you purge the Summary tables. Maybe even keep the Summary tables forever.
Case 1: You need to find the raw data involved in some event. But how will you find those row(s)? This is where a secondary index may be required.
If a secondary index is bigger than can be cached in RAM, and if the column(s) being indexed is random, then each row inserted may cause a disk hit to update the index. This limits insert speed to something like 100 rows per second (on ordinary disks). Multiple random indexes slow down insertion further. RAID striping and/or SSDs speed up insertion. Write caching helps, but only for bursts.
Case 2: You need some event, but you did not plan ahead with the optimal INDEX. Well, if the data is PARTITIONed on date, so even if you have a clue of when the event occurred, "partition pruning" will keep the query from being too terribly slow.
Case 3: Over time, the application is likely to need new 'reports', which may lead to a new Summary table. At this point, it would be handy to scan through the old data to fill up the new table.
Case 4: You find a flaw in the summarization, and need to rebuild an existing Summary table.
Cases 3 and 4 both need the "raw" data. But they don't necessarily need the data sitting in a database table. It could be in the pre-database format (such as log files). So, consider not building the Fact table, but simply keep the raw data, compressed, on some file system.
When talking about billions of rows in the Fact table, it is essentially mandatory that you "batch" the inserts. There are two main ways:
INSERT INTO Fact (.,.,.) VALUES (.,.,.), (.,.,.), ...; -- "Batch insert"
LOAD DATA ...;
A third way is to INSERT or LOAD into a Staging table, then
INSERT INTO Fact SELECT * FROM Staging; This INSERT..SELECT allows you to do other things, such as normalization. More later.
Chunk size should usually be 100-1000 rows.
Batching 100-1000 rows per INSERT will run about 10 times as fast as single-row inserts.
Beyond 100, you may be interfering with replication and SELECTs.
Beyond 1000, you are into diminishing returns -- virtually no further performance gains.
Don't go past, say, 1MB for the constructed INSERT statement. This deals with packet sizes, etc. (1MB is unlikely to be hit for a Fact table.) Decide whether your application should lean toward the 100 or the 1000.
If your data is coming in continually, and you are adding a batching layer, let's do some math. Compute your ingestion rate -- R rows per second.
If R < 10 (= 1M/day = 300M/year) -- single-row INSERTs would probably work fine (that is, batching is optional)
If R < 100 (3B records per year) -- secondary indexes on Fact table may be ok
If R < 1000 (100M records/day) -- avoid secondary indexes on Fact table.
If R > 1000 -- Batching may not work. Decide how long (S seconds) you can stall loading the data in order to collect a batch of rows.
If batching seems viable, then design the batching layer to gather for S seconds or 100-1000 rows, whichever comes first.
(Note: Similar math applies to rapid UPDATEs of a table.)
Normalization is important in Data Warehouse applications because it significantly cuts down on the disk footprint and improves performance. There are other reasons for normalizing, but space is the important one for DW.
Here is a typical pattern for a Dimension table:
Notes:
MEDIUMINT is 3 bytes with UNSIGNED range of 0..16M; pick SMALLINT, INT, etc, based on a conservative estimate of how many 'foo's you will eventually have.
datatype sizes
There may be more than one VARCHAR in the table. Example: For cities, you might have City and Country.
InnoDB is better than MyISAM because of the way the two keys are structured.
I bring this up as a separate topic because of some of the subtle issues that can happen.
You may be tempted to do
It has the problem of "burning" AUTO_INCREMENT ids. This is because MariaDB pre-allocates ids before getting to "IGNORE". That could rapidly increase the AUTO_INCREMENT values beyond what you expected.
Better is this...
Notes:
The LEFT JOIN .. IS NULL finds the foos that are not yet in Foos.
This INSERT..SELECT must not be done inside the transaction with the rest of the processing. Otherwise, you add to deadlock risks, leading to burned ids.
IGNORE is used in case you are doing the INSERT from multiple processes simultaneously.
Once that INSERT is done, this will find all the foo_ids it needs:
An advantage of "Batched Normalization" is that you can summarize directly from the Staging table. Two approaches:
Case 1: PRIMARY KEY (dy, foo) and summarization is in lock step with, say, changes in dy.
This approach can have troubles if new data arrives after you have summarized the day's data.
Case 2: (dy, foo) is a non-UNIQUE INDEX.
Same code as Case 1.
By having the index be non-UNIQUE, delayed data simply shows up as extra rows.
You need to take care to avoid summarizing the data twice. (The id on the Fact table may be a good tool for that.)
Case 3: PRIMARY KEY (dy, foo) and summarization can happen anytime.
This document lists a number of ways to do things. Your situation may lead to one approach being more/less acceptable. But, if you are thinking "Just tell me what to do!", then here:
Batch load the raw data into a temporary table (Staging).
Normalize from Staging -- use code in Case 3.
INSERT .. SELECT to move the data from Staging into the Fact table
Those techniques should perform well and scale well in most cases. As you develop your situation, you may discover why I described alternative solutions.
Typically the Fact table is PARTITION BY RANGE (10-60 ranges of days/weeks/etc) and needs purging (DROP PARTITION) periodically. This discusses a safe/clean way to design the partitioning and do the DROPs: Purging PARTITIONs
For "read scaling", backup, and failover, use master-slave replication or something fancier. Do ingestion only on a single active master; it replicate to the slave(s). Generate reports on the slave(s).
"Sharding" is the splitting of data across multiple servers. (In contrast, and have the same data on all servers, requiring all data to be written to all servers.)
With the non-sharding techniques described here, terabyte(s) of data can be handled by a single machine. Tens of terabytes probably requires sharding.
Sharding is beyond the scope of this document.
With the techniques described here, you may be able to achieve the following performance numbers. I say "may" because every data warehouse situation is different, and you may require performance-hurting deviations from what I describe here. I give multiple options for some aspects; these may cover some of your deviations.
One big performance killer is UUID/GUID keys. Since they are very 'random', updates of them (at scale) are limited to 1 row = 1 disk hit. Plain disks can handle only 100 hits/second. RAID and/or SSD can increase that to something like 1000 hits/sec. Huge amounts of RAM (for caching the random index) are a costly solution. It is possible to turn type-1 UUIDs into roughly-chronological keys, thereby mitigating the performance problems if the UUIDs are written/read with some chronological clustering. UUID discussion
Hardware, etc:
Single SATA drive: 100 IOPs (Input/Output operations per second)
RAID with N physical drives -- 100*N IOPs (roughly)
SSD -- 5 times as fast as rotating media (in this context)
Batch INSERT -- 100-1000 rows is 10 times as fast as INSERTing 1 row at a time (see above)
"Count the disk hits" -- back-of-envelope performance analysis
Random accesses to a table/index -- count each as a disk hit.
At-the-end accesses (INSERT chronologically or with AUTO_INCREMENT; range SELECT) -- count as zero hits.
In between (hot/popular ids, etc) -- count as something in between
For INSERTs, do the analysis on each index; add them up.
More on Count the Disk Hits
Look at your data; compute raw rows per second (or hour or day or year). There are about 30M seconds in a year; 86,400 seconds per day. Inserting 30 rows per second becomes a billion rows per year.
10 rows per second is about all you can expect from an ordinary machine (after allowing for various overheads). If you have less than that, you don't have many worries, but still you should probably create Summary tables. If more than 10/sec, then batching, etc, becomes vital. Even on spiffy hardware, 100/sec is about all you can expect without utilizing the techniques here.
Let's say your insert rate is only one-tenth of your disk IOPs (eg, 10 rows/sec vs 100 IOPs). Also, let's say your data is not "bursty"; that is, the data comes in somewhat smoothly throughout the day.
Note that 10 rows/sec (300M/year) implies maybe 30GB for data + indexes + normalization tables + summary tables for 1 year. I would call this "not so big".
Still, normalization and summarization are important. Normalization keeps the data from being, say, twice as big. Summarization speeds up the reports by orders of magnitude.
Let's design and analyse a "simple ingestion scheme" for 10 rows/second, without 'batching'.
Depending on the number and randomness of your indexes, etc, 10 Fact rows may (or may not) take less than 100 IOPs.
Also, note that as the data grows over time, random indexes will become less and less likely to be cached. That is, even if it runs fine with 1 year's worth of data, it may be in trouble with 2 years' worth.
For those reasons, I started this discussion with a wide margin (10 rows versus 100 IOPs).
Summary Tables
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
You want to find the largest row in each group of rows. An example is looking for the largest city in each state. While it is easy to find the MAX(population) ... GROUP BY state, it is hard to find the name of the city associated with that population. Alas, MySQL and MariaDB do not have any syntax to provide the solution directly.
This article is under construction, mostly for cleanup. The content is reasonably accurate during construction.
The article presents two "good" solutions. They differ in ways that make neither of them 'perfect'; you should try both and weigh the pros and cons.
Also, a few "bad" solutions will be presented, together with why they were rejected.
MySQL manual gives 3 solutions; only the "Uncorrelated" one is "good", the other two are "bad".
To show how the various coding attempts work, I have devised this simple task: Find the largest city in each Canadian province. Here's a sample of the source data (5493 rows):
Here's the desired output (13 rows):
One thing to consider is whether you want -- or do not want -- to see multiple rows for tied winners. For the dataset being used here, that would imply that the two largest cities in a province had identical populations. For this case, a duplicate would be unlikely. But there are many groupwise-max use cases where duplicates are likely.
The two best algorithms differ in whether they show duplicates.
Characteristics:
Superior performance or medium performance
It will show duplicates
Needs an extra index
Probably requires 5.6
An 'uncorrelated subquery':
But this also 'requires' an extra index: INDEX(province, population). In addition, MySQL has not always been able to use that index effectively, hence the "requires 5.6". (I am not sure of the actual version.)
Without that extra index, you would need 5.6, which has the ability to create indexes for subqueries. This is indicated by <auto_key0> in the EXPLAIN. Even so, the performance is worse with the auto-generated index than with the manually generated one.
With neither the extra index, nor 5.6, this 'solution' would belong in 'The Duds' because it would run in O(N*N) time.
Characteristics:
Good performance
Does not show duplicates (picks one to show)
Consistent O(N) run time (N = number of input rows)
Only one scan of the data
For your application, change the lines with comments.
'Correlated subquery' (from MySQL doc):
O(N*N) (that is, terrible) performance
LEFT JOIN (from MySQL doc):
Medium performance (2N-3N, depending on join_buffer_size).
For O(N*N) time,... It will take one second to do a groupwise-max on a few thousand rows; a million rows could take hours.
This is a variant on "groupwise-max" wherein you desire the largest (or smallest) N items in each group. Do these substitutions for your use case:
province --> your 'GROUP BY'
Canada --> your table
3 --> how many of each group to show
population --> your numeric field for determining "Top-N"
Output:
The performance of this is O(N), actually about 3N, where N is the number of source rows.
EXPLAIN EXTENDED gives
Explanation, shown in the same order as the EXPLAIN, but numbered chronologically:
3. Get the subquery id=2 (init)
4. Scan the output from subquery id=3 (x)
2. Subquery id=3 -- the table scan of Canada
1. Subquery id=2 -- init, for simply initializing the two @variables
Yes, it took two sorts, though probably in RAM.
Main Handler values:
This variant is faster than the previous, but depends on city being unique across the dataset. (from openark.org)
Output. Note how there can be more than 3 cities per province:
Main Handler values:
(This does not need your table to be MyISAM, but it does need MyISAM tmp table for its 2-column PRIMARY KEY feature.) See previous section for what changes to make for your use case.
The main handler values (total of all operations):
Both "Top-n" formulations probably take about the same amount of time.
Hot off the press from Percona Live... has "windowing functions", which make "groupwise max" much more straightforward.
The code: TBD
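The original article left the code as TBD. As a sketch (assuming MariaDB 10.2 or later, which added window functions), the groupwise-max could be written like this:

SELECT province, city, population
FROM ( SELECT province, city, population,
              ROW_NUMBER() OVER (PARTITION BY province
                                 ORDER BY population DESC) AS rn
       FROM Canada
     ) AS ranked
WHERE rn = 1            -- use rn <= 3 for a "Top-N in each group"
ORDER BY province;

ROW_NUMBER() picks a single winner per province (no duplicates); RANK() would instead show tied winners as multiple rows.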
Developed and first posted: Feb 2015; MyISAM approach added: July 2015; Openark's method added: Apr 2016; Windowing: Apr 2016
I did not include the technique(s) using GROUP_CONCAT. They are useful in some situations with small datasets. They can be found in the references below.
This has some of these algorithms, plus some others:
Other references:
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
This article describes the different methods MariaDB provides to limit/timeout a query:
The LIMIT clause restricts the number of returned rows.
Stops the query after 'rows_limit' number of rows have been examined.
If the sql_safe_updates variable is set, one can't execute an UPDATE or DELETE statement unless one specifies a key constraint in the WHERE clause or provides a LIMIT clause (or both).
sql_select_limit acts as an automatic LIMIT row_count on any SELECT query.
The above is the same as:
If the max_join_size variable (also called sql_max_join_size) is set, then it will limit any SELECT statements that probably need to examine more than MAX_JOIN_SIZE rows.
If the variable is set, any query (excluding stored procedures) taking longer than the value of max_statement_time (specified in seconds) to execute will be aborted. This can be set globally, by session, as well as per user and per query. See .
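For example (a sketch; the timeout value is arbitrary):

-- Abort any statement in this session that runs longer than 10 seconds:
SET SESSION max_statement_time = 10;

-- Or limit just one query, leaving the session setting alone:
SET STATEMENT max_statement_time = 10 FOR
    SELECT COUNT(*) FROM big_table;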
See also: the max_statement_time variable.
This page is licensed: CC BY-SA / Gnu FDL
The task of scalable server software (and a DBMS like MariaDB is an example of such software) is to maintain top performance with an increasing number of clients. MySQL traditionally assigned a thread for every client connection, and as the number of concurrent users grows this model shows performance drops. Many active threads are a performance killer, because increasing the number of threads leads to extensive context switching, bad locality for CPU caches, and increased contention for hot locks. An ideal solution that would help to reduce context switching is to maintain a lower number of threads than the number of clients. But this number should not be too low either, since we also want to utilize CPUs to their fullest, so ideally, there should be a single active thread for each CPU on the machine.
thread_group_id = connection_id % thread_pool_size
THROTTLING_FACTOR = thread_pool_stall_limit / MAX (500,thread_pool_stall_limit)
SET GLOBAL thread_pool_stall_limit=300;
[mariadb]
..
thread_handling=pool-of-threads
thread_pool_size=32
thread_pool_stall_limit=300
SET GLOBAL thread_pool_oversubscribe=10;
[mariadb]
..
thread_handling=pool-of-threads
thread_pool_size=32
thread_pool_stall_limit=300
thread_pool_oversubscribe=10
CREATE TABLE users (
user_name_mb4 VARCHAR(100) COLLATE utf8mb4_general_ci,
...
);
CREATE TABLE orders (
user_name_mb3 VARCHAR(100) COLLATE utf8mb3_general_ci,
...,
INDEX idx1(user_name_mb3)
);
SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
CONVERT(orders.user_name_mb3 USING utf8mb4) = users.user_name_mb4
EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 1000 | |
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 10330 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
SET optimizer_switch='cset_narrowing=ON';
EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 1 | SIMPLE | orders | ref | idx1 | idx1 | 303 | users.user_name_mb4 | 1 | Using index condition |
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
SET optimizer_switch='cset_narrowing=ON';
SELECT ... LIMIT row_count
OR
SELECT ... LIMIT OFFSET, row_count
OR
SELECT ... LIMIT row_count OFFSET OFFSET
SELECT ... LIMIT ROWS EXAMINED rows_limit;
SET @@SQL_SAFE_UPDATES=1
UPDATE tbl_name SET not_key_column=val;
-> ERROR 1175 (HY000): You are using safe update mode
and you tried to update a table without a WHERE that uses a KEY column
SET @@SQL_SELECT_LIMIT=1000
SELECT * FROM big_table;
SELECT * FROM big_table LIMIT 1000;
SET @@MAX_JOIN_SIZE=1000;
->ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE ROWS;
SELECT COUNT(null_column) FROM big_table;
CHECK your WHERE AND USE SET SQL_BIG_SELECTS=1 OR SET MAX_JOIN_SIZE=# IF the SELECT IS okay
Normalization can be done in bulk, hence efficiently
Copying to the Fact table will be fast
Summarization can be done in bulk, hence efficiently
"Bursty" ingestion is smoothed by this process
Flip-flop a pair of Staging tables
Use binlog_ignore_db to avoid replicating staging -- necessitating putting it in a separate database.
Do the summarization from Staging
Load Fact via INSERT INTO Fact ... SELECT FROM Staging ...
Normalization -- The process of building the mapping ('New York City' <-> 123)
ENGINE = InnoDB
All "reports" use summary tables, not the Fact table
Summary tables may be populated from ranges of id (other techniques described below)
If S < 0.1s -- May not be able to keep up
The secondary key is effectively (email_id, email), hence 'covering' for certain queries.
It is OK to not specify an AUTO_INCREMENT to be UNIQUE.
Staging to Summary table(s) via IODKU (Insert ... On Duplicate Key Update).
Drop the Staging
Purge "old" data -- Do not use DELETE or TRUNCATE, design so you can use DROP PARTITION (see above)
Think of each INDEX (except the PRIMARY KEY on InnoDB) as a separate table
Consider access patterns of each table/index: random vs at-the-end vs something in between
For SELECTs, do the analysis on the one index used, plus the table. (Use of 2 indexes is rare.)
Insert cost, based on datatype of first column in an index:
AUTO_INCREMENT -- essentially 0 IOPs
DATETIME, TIMESTAMP -- essentially 0 for 'current' times
UUID/GUID -- 1 per insert (terrible)
Others -- depends on their patterns
SELECT cost gets a little tricky:
Range on PRIMARY KEY -- think of it as getting 100 rows per disk hit.
IN on PRIMARY KEY -- 1 disk hit per item in IN
"=" -- 1 hit (for 1 row)
Secondary key -- First compute the hits for the index, then...
Think of each row as needing 1 disk hit.
However, if the rows are likely to be 'near' each other (based on the PRIMARY KEY), then it could be < 1 disk hit/row.
Change the SELECT and ORDER BY if you desire
DESC to get the 'largest'; ASC for the 'smallest'
Adding a large LIMIT to a subquery may make things work.
MariaDB has a dynamic and adaptive thread pool, aimed at optimizing resource utilization and preventing deadlocks.
For example, a thread may depend on another thread's completion, and they may block each other via locks and/or I/O. It is hard, and sometimes impossible, to predict how many threads are ideal or even sufficient to prevent deadlocks in every situation. MariaDB implements a dynamic and adaptive pool that takes care of creating new threads in times of high demand, and retiring threads if they have nothing to do. This is a complete reimplementation of the legacy pool-of-threads scheduler, with the following goals:
Make the pool dynamic, so that it will grow and shrink whenever required.
Minimize the amount of overhead that is required to maintain the thread pool itself.
Make the best use of underlying OS capabilities. For example, if a native thread pool implementation is available, it should be used. If not, the best I/O multiplexing method should be used.
Limit the resources used by threads.
There are currently two different low-level implementations – depending on OS. One implementation is designed specifically for Windows which utilizes a native CreateThreadpool API. The second implementation is primarily intended to be used in Unix-like systems. Because the implementations are different, some system variables differ between Windows and Unix.
Thread pools are most efficient in situations where queries are relatively short and the load is CPU-bound, such as in OLTP workloads. If the workload is not CPU-bound, then you might still benefit from limiting the number of threads to save memory for the database memory buffers.
There are special, rare cases where the thread pool is likely to be less efficient.
If you have a very bursty workload, then the thread pool may not work well for you. These tend to be workloads in which there are long periods of inactivity followed by short periods of very high activity by many users. These also tend to be workloads in which delays cannot be tolerated, so the throttling of thread creation that the thread pool uses is not ideal. Even in this situation, performance can be improved by tweaking how often threads are retired. For example, with thread_pool_idle_timeout on Unix, or with thread_pool_min_threads on Windows.
If you have many concurrent, long, non-yielding queries, then the thread pool may not work well for you. In this context, a "non-yielding" query is one that never waits or which does not indicate waits to the thread pool. These kinds of workloads are mostly used in data warehouse scenarios. Long-running, non-yielding queries will delay execution of other queries. However, the thread pool has stall detection to prevent them from totally monopolizing the thread pool. See Thread Groups in the Unix Implementation of the Thread Pool: Thread Group Stalls for more information. Even when the whole thread pool is blocked by non-yielding queries, you can still connect to the server through the extra-port TCP/IP port.
If you rely on the fact that simple queries always finish quickly, no matter how loaded your database server is, then the thread pool may not work well for you. When the thread pool is enabled on a busy server, even simple queries might be queued to be executed later. This means that even if the statement itself doesn't take much time to execute, even a simple SELECT 1, might take a bit longer when the thread pool is enabled than with one-thread-per-connection if it gets queued.
The thread_handling system variable is the primary system variable that is used to configure the thread pool.
There are several other system variables as well, which are described in the sections below. Many of the system variables documented below are dynamic, meaning that they can be changed with SET GLOBAL on a running server.
Generally, there is no need to tweak many of these system variables. The goal of the thread pool was to provide good performance out-of-the box. However, the system variable values can be changed, and we intended to expose as many knobs from the underlying implementation as we could. Feel free to tweak them as you see fit.
If you find any issues with any of the default behavior, then we encourage you to .
See Thread Pool System and Status Variables for the full list of the thread pool's system variables.
On Unix, if you would like to use the thread pool, then you can use the thread pool by setting the thread_handling system variable to pool-of-threads in a server option group in an option file prior to starting up the server. For example:
The following system variables can also be configured on Unix:
thread_pool_size – The number of thread groups in the thread pool, which determines how many statements can execute simultaneously. The default value is the number of CPUs on the system. When setting this system variable's value at system startup, the max value is 100000. However, it is not a good idea to set it that high. When setting this system variable's value dynamically, the max value is either 128 or the value that was set at system startup--whichever value is higher. See Thread Groups in the Unix Implementation of the Thread Pool for more information.
thread_pool_max_threads – The maximum number of threads in the thread pool. Once this limit is reached, no new threads will be created in most cases. In rare cases, the actual number of threads can slightly exceed this, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks. The default value is 65536.
thread_pool_stall_limit – The number of milliseconds between each stall check performed by the timer thread. The default value is 500. Stall detection is used to prevent a single client connection from monopolizing a thread group. When the timer thread detects that a thread group is stalled, it wakes up a sleeping worker thread in the thread group, if one is available. If there isn't one, then it creates a new worker thread in the thread group. This temporarily allows several client connections in the thread group to run in parallel. However, note that the timer thread will not create a new worker thread if the number of threads in the thread pool is already greater than or equal to the maximum defined by the thread_pool_max_threads variable, unless the thread group does not already have a listener thread. See Thread Groups in the Unix Implementation of the Thread Pool for more information.
thread_pool_oversubscribe – Determines how many worker threads in a thread group can remain active at the same time once a thread group is oversubscribed due to stalls. The default value is 3. Usually, a thread group only has one active worker thread at a time. However, the timer thread can add more active worker threads to a thread group if it detects a stall. There are trade-offs to consider when deciding whether to allow only one thread per CPU to run at a time, or whether to allow more than one thread per CPU to run at a time. Allowing only one thread per CPU means that the thread can have unrestricted access to the CPU while it is running, but it also means that there is additional overhead from putting threads to sleep or waking them up more frequently. Allowing more than one thread per CPU means that the threads have to share the CPU, but it also means that there is less overhead from putting threads to sleep or waking them up. This is primarily for internal use, and it is not meant to be changed for most users. See Thread Groups in the Unix Implementation of the Thread Pool for more information.
thread_pool_idle_timeout – The number of seconds before an idle worker thread exits. The default value is 60. If there is currently no work to do, how long should an idle thread wait before exiting?
The Windows implementation of the thread pool uses a native thread pool created with the CreateThreadpool API.
On Windows, if you would like to use the thread pool, then you do not need to do anything, because the default for the thread_handling system variable is already preset to pool-of-threads.
However, if you would like to use the old one thread per-connection behavior on Windows, then you can use that by setting the thread_handling system variable to one-thread-per-connection in a server option group in an option file prior to starting up the server. For example:
On older versions of Windows, such as XP and 2003, pool-of-threads is not
implemented, and the server will silently switch to using the legacy one-thread-per-connection method.
The native CreateThreadpool API allows applications to set the minimum and maximum number of threads in the pool. The following system variables can be used to configure those values on Windows:
thread_pool_min_threads – The minimum number of threads in the pool. Default is 1. This is applicable in a special case of very "bursty" workloads: imagine longer periods of inactivity after periods of high activity. While the thread pool is idle, Windows may decide to retire pool threads (based on experimentation, this seems to happen after a thread has been idle for 1 minute). The next time high load comes, it could take some milliseconds or seconds until the thread pool size stabilizes again at the optimal value. To avoid thread retirement, one could set the parameter to a higher value.
thread_pool_max_threads – The maximum number of threads in the pool. Threads are not created when this value is reached. The default is 1000. This parameter can be used to prevent the creation of new threads if the pool can have short periods where many or all clients are blocked (for example, with FLUSH TABLES WITH READ LOCK, high contention on row locks, or similar). New threads are created if a blocking situation occurs (such as after a throttling interval), but sometimes you want to cap the number of threads, if you’re familiar with the application and need to, for example, save memory. If your application constantly pegs at 500 threads, it might be a strong indicator for high contention in the application, and the thread pool does not help much.
It is possible to configure connection prioritization. The priority behavior is configured by the thread_pool_priority system variable.
By default, if thread_pool_priority is set to auto, then queries would be given a higher priority, in case the current connection is inside a transaction. This allows the running transaction to finish faster, and has the effect of lowering the number of transactions running in parallel. The default setting will generally improve throughput for transactional workloads. But it is also possible to explicitly set the priority for the current connection to either 'high' or 'low'.
There is also a mechanism in place to ensure that higher priority connections are not monopolizing the worker threads in the pool (which would cause indefinite delays for low priority connections). On Unix, low priority connections are put into the high priority queue after the timeout specified by the thread_pool_prio_kickup_timer system variable.
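For example, a connection that runs latency-sensitive statements could request high priority explicitly, while batch connections do the opposite (a sketch; whether this helps depends on your workload):

-- Per-session override of the default 'auto' behavior:
SET SESSION thread_pool_priority = 'high';

-- Background/batch connections could instead use:
SET SESSION thread_pool_priority = 'low';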
MariaDB allows you to configure an extra port for administrative connections. This is primarily intended to be used in situations where all threads in the thread pool are blocked, and you still need a way to access the server. However, it can also be used to ensure that monitoring systems (including MaxScale's monitors) always have access to the system, even when all connections on the main port are used. This extra port uses the old one-thread-per-connection thread handling.
You can enable this and configure a specific port by setting the extra_port system variable.
You can configure a specific number of connections for this port by setting the extra_max_connections system variable.
These system variables can be set in a server option group in an option file prior to starting up the server. For example:
Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.
Currently there are two status variables exposed to monitor pool activity.
Threadpool_threads – Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks.
Threadpool_idle_threads – Number of inactive threads in the thread pool. Threads become inactive for various reasons, such as by waiting for new work. However, an inactive thread is not necessarily one that has not been assigned work. Threads are also considered inactive if they are being blocked while waiting on disk I/O, or while waiting on a lock, etc. This status variable is only meaningful on Unix.
On Unix, the thread pool implementation uses objects called thread groups to divide up client connections into many independent sets of threads. See Thread Groups in the Unix Implementation of the Thread Pool for more information.
When using global locks, even with a high value on the thread_pool_max_threads system variable, it is still possible to block the entire pool.
Imagine the case where a client performs FLUSH TABLES WITH READ LOCK then pauses. If then the number of other clients connecting to the server to start write operations exceeds the maximum number of threads allowed in the pool, it can block the Server. This makes it impossible to issue the UNLOCK TABLES statement. It can also block MaxScale from monitoring the Server.
To mitigate the issue, MariaDB allows you to configure an extra port for administrative connections. See Configuring the Extra Port for information on how to configure this.
Once you have the extra port configured, you can use the mariadb client with the -P option to connect to the port.
This ensures that your administrators can access the server in cases where the number of threads is already equal to the configured value of the thread_pool_max_threads system variable, and all threads are blocked. It also ensures that MaxScale can still access the server in such situations for monitoring information.
Once you are connected to the extra port, you can solve the issue by increasing the value on the thread_pool_max_threads system variable, or by killing the offending connection, (that is, the connection that holds the global lock, which would be in the sleep state).
The following Information Schema tables relate to the thread pool:
Commercial editions of MySQL since 5.5 include an Oracle MySQL Enterprise thread pool implemented as a plugin, which delivers similar functionality. A detailed discussion about the design of the feature is at Mikael Ronstrom's blog. Here is the summary of similarities and differences, based on the above materials.
On Unix, both MariaDB and Oracle MySQL Enterprise Thread Pool will partition client connections into groups. The thread_pool_size parameter thus has the same meaning for both MySQL and MariaDB.
Both implementations use a similar scheme for checking thread stalls, and both have the same parameter name, thread_pool_stall_limit (though in MariaDB it is measured in millisecond units, not 10ms units like in Oracle MySQL).
The Windows implementation is completely different – MariaDB's uses native Windows thread pooling, while Oracle's relies on WSAPoll() (a function provided to simplify porting Unix applications). As a consequence of relying on WSAPoll(), Oracle's implementation does not work with named pipes and shared memory connections.
MariaDB uses the most efficient I/O multiplexing facilities for each operating system: Windows (the I/O completion port is used internally by the native thread pool), Linux (epoll), Solaris (event ports), FreeBSD and OSX (kevent). Oracle uses optimized I/O multiplexing only on Linux, with epoll, and uses poll() otherwise.
Unlike the Oracle MySQL Enterprise Thread Pool, MariaDB's thread pool is built in, not a plugin.
Percona's implementation is a port of the MariaDB's thread pool with some added features. In particular, Percona added priority scheduling to its 5.5-5.7 releases. MariaDB and Percona priority scheduling works in a similar fashion, but there are some differences in details.
MariaDB's thread_pool_priority values auto, high, and low correspond to Percona's thread_pool_high_prio_mode values transactions, statements, and none.
Percona has a thread_pool_high_prio_tickets connection variable to allow every nth low priority query to be put into the high priority queue. MariaDB does not have a corresponding setting.
MariaDB has a thread_pool_prio_kickup_timer setting, which Percona does not have.
When running sysbench (and perhaps other benchmarks) that create many threads on the same machine as the server, it is advisable to run the benchmark driver and the server on different CPUs to get realistic results. Running lots of driver threads and only a few server threads on the same CPUs will have the effect that the OS scheduler will schedule benchmark driver threads to run with much higher probability than the server threads; that is, the driver will pre-empt the server. The preferred way to fix this is to separate benchmark driver and server CPUs, using "taskset -c" on Linux and "start /affinity" on Windows.
A possible alternative on Unix (if taskset or a separate machine running the benchmark is not desired for some reason) would be to increase thread_pool_size to make the server threads more "competitive" against the client threads.
When running sysbench, a good rule of thumb could be to give 1/4 of all CPUs to the sysbench, and 3/4 of CPUs to mariadbd. It is also good idea to run sysbench and mariadbd on different NUMA nodes, if possible.
The thread_cache_size system variable is not used when the thread pool is used and the Threads_cached status variable will have a value of 0.
This page is licensed: CC BY-SA / Gnu FDL
optimizer_switch is a server variable that one can use to enable/disable specific optimizations.
To set or unset the various optimizations, use the following syntax:
The cmd takes the following format:
There is no need to list all flags - only those that are specified in the command will be affected.
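For example, to turn a single optimization off for the current session and later restore its default (a sketch; substitute whichever flag you are experimenting with):

SET SESSION optimizer_switch = 'rowid_filter=off';
-- ... run and EXPLAIN the query in question ...
SET SESSION optimizer_switch = 'rowid_filter=default';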
Below is a list of all optimizer_switch flags available in MariaDB:
This page is licensed: CC BY-SA / Gnu FDL
FLOOR(UNIX_TIMESTAMP(dt) / 3600)
FROM_UNIXTIME(hour * 3600)
PRIMARY KEY(city, datetime),
Aggregations: ct, sum_price
# Core of INSERT..SELECT:
DATE(datetime) AS DATE, city, COUNT(*) AS ct, SUM(price) AS sum_price
# Reporting average price FOR last month, broken down BY city:
SELECT city,
SUM(sum_price) / SUM(ct) AS 'AveragePrice'
FROM SalesSummary
WHERE datetime BETWEEN ...
GROUP BY city;
# Monthly sales, nationwide, FROM same summary TABLE:
SELECT MONTH(datetime) AS 'Month',
SUM(ct) AS 'TotalSalesCount'
SUM(sum_price) AS 'TotalDollars'
FROM SalesSummary
WHERE datetime BETWEEN ...
GROUP BY MONTH(datetime);
# This might benefit FROM a secondary INDEX(datetime)
INSERT INTO Fact ...;
INSERT INTO Summary (..., ct, foo, ...) VALUES (..., 1, foo, ...)
ON DUPLICATE KEY UPDATE ct = ct+1, sum_foo = sum_foo + VALUES(foo), ...;
FROM Fact
WHERE id BETWEEN min_id AND max_id
id INT UNSIGNED AUTO_INCREMENT NOT NULL,
...
PRIMARY KEY(foo, dy, id), -- `id` added to make unique
INDEX(id) -- sufficient to keep AUTO_INCREMENT happy
SQRT( SUM(sum_foo2)/SUM(ct) - POWER(SUM(sum_foo)/SUM(ct), 2) )
# Prep FOR flip:
CREATE TABLE new LIKE Staging;
# Swap (flip) Staging tables:
RENAME TABLE Staging TO old, new TO Staging;
# Normalize new `foo`s:
# (autocommit = 1)
INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...
# Prep FOR possible deadlocks, etc
WHILE...
START TRANSACTION;
# ADD TO Fact:
INSERT INTO Fact ... FROM old JOIN Foos ...
# Summarize:
INSERT INTO Summary ... FROM old ... GROUP BY ...
COMMIT;
end-WHILE
# Cleanup:
DROP TABLE old;
CREATE TABLE Emails (
email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT, -- don't make bigger than needed
email VARCHAR(...) NOT NULL,
PRIMARY KEY (email), -- for looking up one way
INDEX(email_id) -- for looking up the other way (UNIQUE is not needed)
) ENGINE = InnoDB; -- to get clustering
INSERT IGNORE INTO Foos
SELECT DISTINCT foo FROM Staging; -- not wise
INSERT IGNORE INTO Foos
SELECT DISTINCT foo
FROM Staging
LEFT JOIN Foos ON Foos.foo = Staging.foo
WHERE Foos.foo_id IS NULL;
INSERT INTO Fact (..., foo_id, ...)
SELECT ..., Foos.foo_id, ...
FROM Staging
JOIN Foos ON Foos.foo = Staging.foo;
INSERT INTO Summary (dy, foo, ct, blah_total)
    SELECT DATE(dt) AS dy, foo,
           COUNT(*) AS ct, SUM(blah) AS blah_total
    FROM Staging
    GROUP BY 1, 2;
INSERT INTO Summary (dy, foo, ct, blah_total)
    SELECT DATE(dt) AS dy, foo,
           COUNT(*) AS ct, SUM(blah) AS blah_total
    FROM Staging
    GROUP BY 1, 2
    ON DUPLICATE KEY UPDATE
        ct = ct + VALUES(ct),
        blah_total = blah_total + VALUES(blah_total);
# Normalize:
$foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
IF NO $foo_id, THEN
INSERT IGNORE INTO Foos ...
# Inserts:
BEGIN;
INSERT INTO Fact ...;
INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
COMMIT;
# (plus code TO deal WITH errors ON INSERTs OR COMMIT)
+------------------+----------------+------------+
| province | city | population |
+------------------+----------------+------------+
| Saskatchewan | Rosetown | 2309 |
| British Columbia | Chilliwack | 51942 |
| Nova Scotia | Yarmouth | 7500 |
| Alberta | Grande Prairie | 41463 |
| Quebec | Sorel | 33591 |
| Ontario | Moose Factory | 2060 |
| Ontario | Bracebridge | 8238 |
| British Columbia | Nanaimo | 84906 |
| Manitoba | Neepawa | 3151 |
| Alberta | Grimshaw | 2560 |
| Saskatchewan | Carnduff | 950 |
...
+---------------------------+---------------+------------+
| province | city | population |
+---------------------------+---------------+------------+
| Alberta | Calgary | 968475 |
| British Columbia | Vancouver | 1837970 |
| Manitoba | Winnipeg | 632069 |
| New Brunswick | Saint John | 87857 |
| Newfoundland and Labrador | Corner Brook | 18693 |
| Northwest Territories | Yellowknife | 15866 |
| Nova Scotia | Halifax | 266012 |
| Nunavut | Iqaluit | 6124 |
| Ontario | Toronto | 4612187 |
| Prince Edward Island | Charlottetown | 42403 |
| Quebec | Montreal | 3268513 |
| Saskatchewan | Saskatoon | 198957 |
| Yukon | Whitehorse | 19616 |
+---------------------------+---------------+------------+
SELECT c1.province, c1.city, c1.population
FROM Canada AS c1
JOIN
( SELECT province, MAX(population) AS population
FROM Canada
GROUP BY province
) AS c2 USING (province, population)
ORDER BY c1.province;
SELECT
province, city, population -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT province != @prev AS first, -- `province` is the 'GROUP BY'
@prev := province, -- The 'GROUP BY'
province, city, population -- Also the desired columns
FROM Canada -- The table
ORDER BY
province, -- The 'GROUP BY'
population DESC -- ASC for MIN(population), DESC for MAX
) x
WHERE first
ORDER BY province; -- Whatever you like
SELECT province, city, population
FROM Canada AS c1
WHERE population =
( SELECT MAX(c2.population)
FROM Canada AS c2
WHERE c2.province= c1.province
)
ORDER BY province;
SELECT c1.province, c1.city, c1.population
FROM Canada AS c1
LEFT JOIN Canada AS c2 ON c2.province = c1.province
AND c2.population > c1.population
WHERE c2.province IS NULL
ORDER BY province;
SELECT
province, n, city, population
FROM
( SELECT @prev := '', @n := 0 ) init
JOIN
( SELECT @n := if(province != @prev, 1, @n + 1) AS n,
@prev := province,
province, city, population
FROM Canada
ORDER BY
province ASC,
population DESC
) x
WHERE n <= 3
ORDER BY province, n;
+---------------------------+------+------------------+------------+
| province | n | city | population |
+---------------------------+------+------------------+------------+
| Alberta | 1 | Calgary | 968475 |
| Alberta | 2 | Edmonton | 822319 |
| Alberta | 3 | Red Deer | 73595 |
| British Columbia | 1 | Vancouver | 1837970 |
| British Columbia | 2 | Victoria | 289625 |
| British Columbia | 3 | Abbotsford | 151685 |
| Manitoba | 1 | ...
+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 5484 | 100.00 | Using where |
| 3 | DERIVED | Canada | ALL | NULL | NULL | NULL | NULL | 5484 | 100.00 | Using filesort |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
| Handler_read_rnd | 39 |
| Handler_read_rnd_next | 10971 |
| Handler_write | 5485 | -- #rows in Canada (+1)
SELECT province, city, population
FROM Canada
JOIN
( SELECT GROUP_CONCAT(top_in_province) AS top_cities
FROM
( SELECT SUBSTRING_INDEX(
GROUP_CONCAT(city ORDER BY population DESC),
',', 3) AS top_in_province
FROM Canada
GROUP BY province
) AS x
) AS y
WHERE FIND_IN_SET(city, top_cities)
ORDER BY province, population DESC;
| Alberta | Calgary | 968475 |
| Alberta | Edmonton | 822319 |
| Alberta | Red Deer | 73595 |
| British Columbia | Vancouver | 1837970 |
| British Columbia | Victoria | 289625 |
| British Columbia | Abbotsford | 151685 |
| British Columbia | Sydney | 0 | -- Nova Scotia's second largest is Sydney
| Manitoba | Winnipeg | 632069 |
| Handler_read_next | 5484 | -- table size
| Handler_read_rnd_next | 5500 | -- table size + number of provinces
| Handler_write | 14 | -- number of provinces (+1)
-- build tmp table to get numbering
-- (Assumes auto_increment_increment = 1)
CREATE TEMPORARY TABLE t (
nth MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(province, nth)
) ENGINE=MyISAM
SELECT province, NULL AS nth, city, population
FROM Canada
ORDER BY population DESC;
-- Output the biggest 3 cities in each province:
SELECT province, nth, city, population
FROM t
WHERE nth <= 3
ORDER BY province, nth;
+---------------------------+-----+------------------+------------+
| province | nth | city | population |
+---------------------------+-----+------------------+------------+
| Alberta | 1 | Calgary | 968475 |
| Alberta | 2 | Edmonton | 822319 |
| Alberta | 3 | Red Deer | 73595 |
| British Columbia | 1 | Vancouver | 1837970 |
| British Columbia | 2 | Victoria | 289625 |
| British Columbia | 3 | Abbotsford | 151685 |
| Manitoba | ...
SELECT FOR CREATE:
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | Canada | ALL | NULL | NULL | NULL | NULL | 5484 | Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
Other SELECT:
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | NULL | PRIMARY | 104 | NULL | 22 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| Handler_read_rnd_next | 10970 |
| Handler_write | 5484 | -- number of rows in Canada (write tmp table)
[mariadb]
...
thread_handling=pool-of-threads
[mariadb]
...
thread_handling=one-thread-per-connection
[mariadb]
...
extra_port = 8385
extra_max_connections = 10
$ mariadb -u root -P 8385 -p
$ mariadb -u root -P 8385 -p
SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';
default
duplicateweedout=on
engine_condition_pushdown=off
(deprecated in , removed in )
index_merge=on
index_merge_intersection=on
index_merge_sort_union=on
index_merge_union=on
materialization=on (, )
default
Reset all optimizations to their default values.
optimization_name=default
Set the specified optimization to its default value.
optimization_name=on
Enable the specified optimization.
optimization_name=off
Disable the specified optimization.
MariaDB 10.6.16, MariaDB 10.11.6, , and
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, duplicateweedout=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off, sargable_casefold=on
MariaDB 10.6.16, MariaDB 10.11.6, , and
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on
index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=off
This article documents the major general thread states. More specific lists related to delayed inserts, replication, the query cache and the event scheduler are listed in:
These correspond to the STATE values listed by the SHOW PROCESSLIST statement or in the Information Schema PROCESSLIST table, as well as the PROCESSLIST_STATE value listed in the Performance Schema threads table.
This page is licensed: CC BY-SA / Gnu FDL
converting HEAP to Aria
Converting an internal temporary table into an on-disk temporary table.
converting HEAP to MyISAM
Converting an internal temporary table into an on-disk temporary table.
copy to tmp table
A new table has been created as part of an statement, and rows are about to be copied into it.
Copying to group table
Sorting the rows by group and copying to a temporary table, which occurs when a statement has different and criteria.
Copying to tmp table
Copying to a temporary table in memory.
Copying to tmp table on disk
Copying to a temporary table on disk, as the resultset is too large to fit into memory.
Creating index
Processing an for an or table.
Creating sort index
Processing a statement resolved using an internal temporary table.
creating table
Creating a table (temporary or non-temporary).
Creating tmp table
Creating a temporary table (in memory or on-disk).
deleting from main table
Deleting from the first table in a multi-table , saving columns and offsets for use in deleting from the other tables.
deleting from reference tables
Deleting matched rows from secondary reference tables as part of a multi-table .
discard_or_import_tablespace
Processing an or statement.
end
State before the final cleanup of an , , , , , or statement.
executing
Executing a statement.
Execution of init_command
Executing statements specified by the --init_command option.
filling schema table
A table in the database is being built.
freeing items
Freeing items from the after executing a command. Usually followed by the cleaning up state.
Flushing tables
Executing a statement and waiting for other threads to close their tables.
FULLTEXT initialization
Preparing to run a search. This includes running the fulltext search (MATCH ... AGAINST) and creating a list of the result in memory
init
About to initialize an , , , , or statement. Could be performing cleanup, or flushing the or InnoDB log.
Killed
Thread will abort next time it checks the kill flag. Requires waiting for any locks to be released.
Locked
Query has been locked by another query.
logging slow query
Writing statement to the .
NULL
State used for .
login
Connection thread has not yet been authenticated.
manage keys
Enabling or disabling a table index.
Opening table[s]
Trying to open a table. Usually very quick unless the limit set by has been reached, or an or is in progress.
optimizing
Server is performing initial optimizations for a query.
preparing
State occurring during query optimization.
Purging old relay logs
Relay logs that are no longer needed are being removed.
query end
Query has finished being processed, but items have not yet been freed (the freeing items state).
Reading file
Server is reading the file (for example during ).
Reading from net
Server is reading a network packet.
Removing duplicates
Duplicated rows being removed before sending to the client. This happens when SELECT DISTINCT is used in a way that the distinct operation could not be optimized at an earlier point.
removing tmp table
Removing an internal temporary table after processing a statement.
rename
Renaming a table.
rename result table
Renaming a table that results from an statement having created a new table.
Reopen tables
Table is being re-opened after thread obtained a lock but the underlying table structure had changed, so the lock was released.
Repair by sorting
Indexes are being created with the use of a sort. Much faster than the related Repair with keycache.
Repair done
Multi-threaded repair has been completed.
Repair with keycache
Indexes are being created through the key cache, one-by-one. Much slower than the related Repair by sorting.
Rolling back
A transaction is being rolled back.
Saving state
New table state is being saved. For example, after analyzing a table, the key distributions, rowcount etc. are saved to the .MYI file.
Searching rows for update
Finding matching rows before performing an , which is needed when the UPDATE would change the index used for the UPDATE
Sending data
Sending data to the client as part of processing a statement or other statements that return data. Often the longest-occurring state, as it also includes all reading from tables and disk read activity. Where aggregation or un-indexed filtering occurs, significantly more rows are read than are sent to the client.
setup
Setting up an operation.
Sorting for group
Sorting as part of a GROUP BY.
Sorting for order
Sorting as part of an ORDER BY.
Sorting index
Sorting index pages as part of a table optimization operation.
Sorting result
Processing a statement using a non-temporary table.
statistics
Calculating statistics as part of deciding on a query execution plan. Usually a brief state unless the server is disk-bound.
System lock
Requesting or waiting for an external lock for a specific table. The determines what kind of external lock to use. For example, the storage engine uses file-based locks. However, MyISAM's external locks are disabled by default, due to the default value of the system variable. Transactional storage engines such as also register the transaction or statement with MariaDB's while in this thread state. See for more information about that.
Table lock
About to request a table's internal lock after acquiring the table's external lock. This thread state occurs after the System lock thread state.
update
About to start updating table.
Updating
Searching for and updating rows in a table.
updating main table
Updating the first table in a multi-table update, and saving columns and offsets for use in the other tables.
updating reference tables
Updating the secondary (reference) tables in a multi-table update
updating status
This state occurs after a query's execution is complete. If the query's execution time exceeds , then is incremented, and if the is enabled, then the query is logged. If the plugin is enabled, then the query is also logged into the audit log at this stage. If the plugin is enabled, then CPU statistics are also updated at this stage.
User lock
About to request or waiting for an advisory lock from a call. For , means requesting a lock only.
User sleep
A call has been invoked.
Waiting for commit lock
is waiting for a commit lock, or a statement resulting in an explicit or implicit commit is waiting for a read lock to be released. This state was called Waiting for all running commits to finish in earlier versions.
Waiting for global read lock
Waiting for a global read lock.
Waiting for table level lock
External lock acquired, and internal lock about to be requested. Occurs after the System lock state. In earlier versions, this was called Table lock.
Waiting for xx lock
Waiting to obtain a lock of type xx.
Waiting on cond
Waiting for an unspecified condition to occur.
Writing to net
Writing a packet to the network.
After create
The function that created (or tried to create) a table (temporary or non-temporary) has just ended.
Analyzing
Calculating table key distributions, such as when running an ANALYZE TABLE statement.
checking permissions
Checking to see whether the permissions are adequate to perform the statement.
Checking table
Checking the table.
cleaning up
Preparing to reset state variables and free memory after executing a command.
closing tables
Flushing the changes to disk and closing the table. This state will only persist if the disk is full or under extremely high load.
Hints that are not ignored are kept in the query text (you can see them in SHOW PROCESSLIST, Slow Query Log, EXPLAIN EXTENDED). Hints that were incorrect and were ignored are removed from there.
Hints can be:
global - they apply to whole query;
table-level - they apply to a table;
index-level - they apply to an index in a table.
Index-level hints apply to indexes. Possible syntax variants are:
The optimizer can be controlled by
server variables - optimizer_switch, join_cache_level, and so forth;
old-style hints;
new-style hints.
Old-style hints do not overlap with server variable settings.
New-style hints are more specific than server variable settings, so they override the server variable settings.
Hints are "narrowly interpreted" and "best effort" - if a hint dictates to do something, for example:
It means: when considering a query plan that involves using t1_index1 in a way that allows MRR, use MRR. If the query plan is such that using t1_index1 does not allow MRR, MRR won't be used.
The optimizer may also consider using t1_index2 and pick that over using t1_index1. In such cases, the hint is effectively ignored and no warning is given.
The QB_NAME hint is used to assign a name to the query block the hint is in. The Query block is either a SELECT statement or a top-level construct of an UPDATE or DELETE statement.
The name can then be used
to refer to the query block;
to refer to a table in the query block as table_name@query_block_name.
Query block scope is the whole statement. It is invalid to use the same name for multiple query blocks. You can refer to the query block "down into subquery", "down into derived table", "up to the parent" and "to a right sibling in the UNION". You cannot refer "to a left sibling in a UNION".
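As a minimal sketch (table, column and index names here are assumed, not taken from the original examples), the subquery below is named subq and a hint in the outer block refers to one of its tables as t2@subq:

SELECT /*+ QB_NAME(top) NO_ICP(t2@subq ix_a) */ t1.*
FROM t1
WHERE t1.a IN (SELECT /*+ QB_NAME(subq) */ a FROM t2);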
Hints inside views are not supported yet. You can neither use hints in VIEW definitions, nor control query plans inside non-merged views. (This is because QB_NAME bindings are done "early", before we know that some tables are views.)
Besides the given name, any query block is given a name select#n (where #n stands for a number). You can see it when running EXPLAIN EXTENDED:
It is not possible to use it in the hint text:
Hints that control @name will control the first use of the CTE (common table expression).
Does not consider ROWID filter for the scope of the hint (all tables in the query block, specific table, and specific indexes). See ROWID_FILTER for details.
When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.
NO_SPLIT_MATERIALIZED(X) disables the use of split-materialized optimization in the context of X :
Like NO_RANGE_OPTIMIZATION or MRR, this hint can be applied to:
Query blocks — NO_ROWID_FILTER()
Table — NO_ROWID_FILTER(table_name)
Specific indexes — NO_ROWID_FILTER(table_name index1 index2 ...)
Forces the use of ROWID_FILTER for the table index it targets:
For query blocks and tables, it enables the use of the ROWID filter, assuming it is disabled globally.
For indexes, it forces its use, regardless of the costs. The following query forces the use of the ROWID filter made from t1.idx1 if the chosen plan allows so (that is, if the access method to t1 allows it):
Assuming the optimizer would pick idx2 for table t1 if the hint was not used, this could result in the usage of both idx2 and idx1 if the hint is used. That might become more expensive than a full table scan, or result in a change of the join order.
Therefore, do not "blindly" use this filter, but rather make sure its use doesn't have a negative impact as described.
When a derived table is materialized, MariaDB processes and stores the results of that derived table temporarily before joining it with other tables. The "lateral derived" optimization specifically looks for ways to optimize these types of derived tables. It does that by pushing a splitting condition down into the derived table, to limit the number of rows materialized into the derived table. The SPLIT_MATERIALIZED hint forces this behavior, while NO_SPLIT_MATERIALIZED prevents it.
SPLIT_MATERIALIZED(X) enables and forces the use of split-materialized optimization in the context of X, unless it is impossible to do (for instance, because a table is not a materialized derived table).
Hints are placed after the main statement verb.
They can also appear after the SELECT keyword in any subquery:
There can be one or more hints separated with space:
An index-level hint that enables or disables the specified indexes for an access method (range, ref, etc.). Equivalent to FORCE INDEX FOR JOIN and IGNORE INDEX FOR JOIN.
An index-level hint that enables or disables the specified indexes for index scans for GROUP BY operations. Equivalent to FORCE INDEX FOR GROUP BY and IGNORE INDEX FOR GROUP BY.
An index-level hint that enables or disables the specified indexes for sorting rows. Equivalent to FORCE INDEX FOR ORDER BY and IGNORE INDEX FOR ORDER BY.
An index-level hint that enables or disables the specified indexes, for all scopes (join access method, GROUP BY, or sorting). Equivalent to FORCE INDEX and IGNORE INDEX.
The hints operate by modifying the set of keys the optimizer considers for SELECT statements. The specific behavior depends on whether specific index keys are provided within the hint.
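A hedged sketch of the INDEX/NO_INDEX pair (table t1 and indexes ix_a, ix_b are assumed names):

SELECT /*+ NO_INDEX(t1 ix_a) */ * FROM t1 WHERE a = 5;           -- roughly IGNORE INDEX (ix_a)
SELECT /*+ INDEX(t1 ix_b) */ * FROM t1 WHERE a = 5 ORDER BY b;   -- roughly FORCE INDEX (ix_b)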
The INDEX_MERGE and NO_INDEX_MERGE optimizer hints provide granular control over the optimizer's use of index merge strategies. They allow users to override the optimizer's cost-based calculations and global switch settings, to force or prevent the merging of indexes for specific tables.
The hints operate by modifying the set of keys the optimizer considers for merge operations. The specific behavior depends on whether specific index keys are provided within the hint.
This hint instructs the optimizer to employ an index merge strategy.
Without arguments: When specified as INDEX_MERGE(tbl), the optimizer considers all available keys for that table and selects the cheapest index merge combination.
With specific keys: When specified with keys, for instance, INDEX_MERGE(tbl key1, key2), the optimizer considers only the listed keys for the merge operation. All other keys are excluded from consideration for index merging.
This hint instructs the optimizer to avoid index merge strategies.
Without arguments: When specified as NO_INDEX_MERGE(tbl), index merge optimizations are completely disabled for the specified table.
With specific keys: When specified with keys, for instance, NO_INDEX_MERGE(tbl key1), the listed keys are excluded from consideration. The optimizer may still perform a merge using other available keys. However, if excluding the listed keys leaves insufficient row-ordered retrieval (ROR) scans available, no merge is performed.
While these hints control which keys are candidates for merging, they do not directly dictate the specific merge algorithm (Intersection, Union, or Sort-Union).
Indirect Control: You can influence the strategy indirectly by combining these hints with optimizer_switch settings, but specific algorithm selection is not guaranteed.
Invalid Hints: If a hint directs the optimizer to use specific indexes, but those indexes do not provide sufficient ROR scans to form a valid plan, the server is unable to honor the hint. In this scenario, the server emits a warning.
In the following examples, the index_merge_intersection switch is globally disabled. However, the INDEX_MERGE hint forces the optimizer to consider specific keys (f2 and f4), resulting in an intersection strategy.
In the first query below, we disable index merge with NO_INDEX_MERGE, and the EXPLAIN output reflects this. The query after that uses the INDEX_MERGE hint to enable a merge, and an intersection of f3 and f4 is used. In the last example, a different intersection is used: f3, PRIMARY.
No intersection (no merged indexes):
Intersection of keys f3, f4:
Intersection of keys PRIMARY, f3:
An index-level hint that disables range optimization for certain index(es):
An index-level hint that disables Index Condition Pushdown for the indexes. ICP+BKA is disabled as well.
Index-level hints to force or disable use of MRR.
This controls:
MRR optimization for range access;
BKA.
Query block or table-level hints.
BKA() also enables MRR to make BKA possible. (This is different from session variables, where you need to enable MRR separately). This also enables BKAH.
Controls BNL-H.
The implementation is such that the BNL() hint effectively increases join_cache_level up to 4 for the table(s) it applies to.
Global-level hint to limit query execution time
A query that doesn't finish in the time specified will be aborted with an error.
If @@max_statement_time is set, the hint will be ignored and a warning produced. Note that this contradicts the stated principle that "new-style hints are more specific than server variable settings, so they override the server variable settings".
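For example (the table name and the 2000 ms limit are only illustrative):

SELECT /*+ MAX_EXECUTION_TIME(2000) */ COUNT(*) FROM orders;   -- aborted with an error if it runs longer than 2 seconds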
Enables or disables the use of the Split Materialized Optimization (also called the Lateral Derived Optimization).
Enables or disables the use of condition pushdown for derived tables.
Table-level hint that enables the use of merging, or disables and uses materialization, for the specified tables, views or common table expressions.
Query block-level hint.
This controls non-semi-join subqueries. The parameter specifies which subquery to use. Use of this hint disables conversion of subquery into semi-join.
For details, see the Subquery Hints section.
Query block-level hints.
This controls the conversion of subqueries to semi-joins and which semi-join strategies are allowed.
where the strategy is one of DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.
Hints are placed after the main statement verb.
They can also appear after the SELECT keyword in any subquery:
There can be one or more hints separated with space:
Syntax of the JOIN_FIXED_ORDER hint:
Syntax of other join-order hints:
For the following join order hint syntax,
tbl is the name of a table used in the statement. A hint that names tables applies to all tables that it names. The JOIN_FIXED_ORDER hint names no tables and applies to all tables in the FROM clause of the query block in which it occurs;
query_block_name is the query block to which the hint applies. If the hint includes no leading @query_block_name, it applies to the query block in which it occurs. When using the tbl@query_block_name syntax, the hint applies to the named table in the named query block. To assign a name to a query block, see the QB_NAME hint.
General notes:
If a table has an alias, hints must refer to the alias, not the table name.
Table names in hints cannot be qualified with schema names.
Forces the optimizer to join tables using the order in which they appear in the FROM clause. This is the same as specifying SELECT STRAIGHT_JOIN.
Instructs the optimizer to join tables using the specified table order. The hint applies to the named tables. The optimizer may place tables that are not named anywhere in the join order, including between specified tables.
Alternative syntax:
JOIN_ORDER(tbl[@query_block_name] [, tbl[@query_block_name]] ...)
Instructs the optimizer to join tables using the specified table order for the first tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables after the named tables.
Alternative syntax:
JOIN_PREFIX(tbl[@query_block_name] [, tbl[@query_block_name]] ...)
Instructs the optimizer to join tables using the specified table order for the last tables of the join execution plan. The hint applies to the named tables. The optimizer places all other tables before the named tables.
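A minimal sketch of a join-order hint (tables t1, t2, t3 are assumed names):

SELECT /*+ JOIN_ORDER(t2, t1) */ ...
FROM t1
JOIN t2 ON t2.a = t1.a
JOIN t3 ON t3.b = t2.b;   -- t2 must precede t1 in the plan; t3 may be placed anywhere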
Subquery hints determine:
If semijoin transformations are to be used;
Which semijoin strategies are permitted;
When semijoins are not used, whether to use subquery materialization or IN-to-EXISTS transformations.
hint_name: The following hint names are permitted to enable or disable the named semijoin strategies: SEMIJOIN, NO_SEMIJOIN.
strategy: Enable or disable a semi-join strategy. The following strategy names are permitted: DUPSWEEDOUT, FIRSTMATCH, LOOSESCAN, MATERIALIZATION.
For SEMIJOIN hints, if no strategies are named, semi-join is used based on the strategies enabled according to the optimizer_switch system variable, if possible. If strategies are named, but inapplicable for the statement, DUPSWEEDOUT is used.
For NO_SEMIJOIN hints, semi-join is not used if no strategies are named. If named strategies rule out all applicable strategies for the statement, DUPSWEEDOUT is used.
If a subquery is nested within another, and both are merged into a semi-join of an outer query, any specification of semi-join strategies for the innermost query are ignored. SEMIJOIN and NO_SEMIJOIN hints can still be used to enable or disable semi-join transformations for such nested subqueries.
If DUPSWEEDOUT is disabled, the optimizer may generate a query plan that is far from optimal.
Syntax of hints that affect whether to use subquery materialization or IN-to-EXISTS transformations:
The hint name is always SUBQUERY.
For SUBQUERY hints, these strategy values are permitted: INTOEXISTS, MATERIALIZATION.
For semi-join and SUBQUERY hints, a leading @query_block_name specifies the query block to which the hint applies. If the hint includes no leading @query_block_name, the hint applies to the query block in which it occurs. To assign a name to a query block, see Naming Query Blocks.
If a hint comment contains multiple subquery hints, the first is used. If there are other following hints of that type, they produce a warning. Following hints of other types are silently ignored.
The query cache stores results of SELECT queries so that if the identical query is received in future, the results can be quickly returned.
This is extremely useful in high-read, low-write environments (such as most websites). It does not scale well in environments with high throughput on multi-core machines, so it is disabled by default.
Note that the query cache cannot be enabled in certain environments. See .
Unless MariaDB has been specifically built without the query cache, the query cache will always be available, although inactive. The server variable will show whether the query cache is available.
hint: hint_name([arguments])

hint_name([table_name [table_name [,...]] )

hint_name(table_name [index_name [, index_name] ...])
hint_name(table_name@query_block [index_name [, index_name] ...])
hint_name(@query_block table_name [index_name [, index_name] ...])

SELECT /*+ MRR(t1 t1_index1) */ ... FROM t1 ...

SELECT /*+ QB_NAME(foo) */ select_list FROM ...

Note 1003 SELECT /*+ NO_RANGE_OPTIMIZATION(t3@select#1 PRIMARY) */ ...

SELECT /*+ BKA(tbl1@`select#1`) */ 1 FROM tbl1 ...;

/* +NO_ROWID_FILTER([table_name [index_name [ ... ] ]] ) */

SELECT
/*+ NO_SPLIT_MATERIALIZED(CUST_TOTALS) */
...
FROM
customer
(SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
WHERE
customer.c_custkey= o_custkey AND
customer.country='FI';

/* +ROWID_FILTER( [table_name [index_name [ ...] ]]) */

SELECT /*+ ROWID_FILTER(t1 idx1) */
    ...

SELECT
/*+ SPLIT_MATERIALIZED(CUST_TOTALS) */
...
FROM
customer
(SELECT SUM(amount), o_custkey FROM orders GROUP BY o_custkey) as CUST_TOTALS
WHERE
customer.c_custkey= o_custkey AND
customer.country='FI';

UPDATE /*+ hints */ table ...;
DELETE /*+ hints */ FROM table... ;
SELECT /*+ hints */ ...

SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)

hints: hint hint ...

/*+ INDEX(table_name [index_name, ...]) */
/*+ NO_INDEX(table_name [index_name, ...]) */

/*+ INDEX_MERGE(table_name [index_name, ...]) */
/*+ NO_INDEX_MERGE(table_name [index_name, ...]) */

MariaDB [test]> EXPLAIN SELECT /*+ NO_INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: PRIMARY,f3,f4
key: f3
key_len: 9
ref: const,const
rows: 1
Extra: Using index condition; Using where
1 row in set (0.009 sec)

MariaDB [test]> EXPLAIN SELECT /*+ INDEX_MERGE(t1 f2, f4, f3) */ COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index_merge
possible_keys: PRIMARY,f3,f4
key: f3,f4
key_len: 9,9
ref: NULL
rows: 1
Extra: Using intersect(f3,f4); Using where; Using index
1 row in set (0.010 sec)

MariaDB [test]> EXPLAIN SELECT COUNT(*) FROM t1 WHERE f4 = 'h' AND f3 = 'b' AND f5 = 'i'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index_merge
possible_keys: PRIMARY,f3,f4
key: f3,PRIMARY
key_len: 9,4
ref: NULL
rows: 1
Extra: Using intersect(f3,PRIMARY); Using where
1 row in set (0.006 sec)

SELECT /*+ NO_RANGE_OPTIMIZATION(tbl index1 index2) */ * FROM tbl ...

SELECT /*+ NO_ICP(tbl index1 index2) */ * FROM tbl ...

SELECT /*+ MRR(tbl index1 index2) */ * FROM tbl ...
SELECT /*+ NO_MRR(tbl index1 index2) */ * FROM tbl ...

SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ... ;

SUBQUERY([@query_block_name] MATERIALIZATION)
SUBQUERY([@query_block_name] INTOEXISTS)

[NO_]SEMIJOIN([@query_block_name] [strategy [, strategy] ...])

UPDATE /*+ hints */ table ...;
DELETE /*+ hints */ FROM table... ;
SELECT /*+ hints */ ...

SELECT * FROM t1 WHERE a IN (SELECT /*+ hints */ ...)

hints: hint hint ...

hint_name([@query_block_name])

hint_name([@query_block_name] tbl_name [, tbl_name] ...)
hint_name(tbl_name[@query_block_name] [, tbl_name[@query_block_name]] ...)

hint_name([@query_block_name] [strategy [, strategy] ...])

SELECT /*+ NO_SEMIJOIN(@subq1 FIRSTMATCH, LOOSESCAN) */ * FROM t2
    WHERE t2.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3);

SELECT /*+ SEMIJOIN(@subquery1 MATERIALIZATION, DUPSWEEDOUT) */ * FROM t2
    WHERE t2.a IN (SELECT /*+ QB_NAME(subquery1) */ a FROM t3);

SUBQUERY([@query_block_name] strategy)

SELECT id, a IN (SELECT /*+ SUBQUERY(MATERIALIZATION) */ a FROM t1) FROM t2;
SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ SUBQUERY(INTOEXISTS) */ a FROM t1);

No, you cannot enable the query cache unless you rebuild or reinstall a version of MariaDB with the cache available.

To see if the cache is enabled, view the query_cache_type server variable. It is disabled by default; enable it by setting query_cache_type to 1:
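For example, at runtime (it can equally be set in an option file):

SET GLOBAL query_cache_type = 1;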
The query_cache_size is set to 1MB by default. Set the cache to a larger size if needed, for example:
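For example (the value here is only an illustration):

SET GLOBAL query_cache_size = 16777216;   -- 16MB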
The query_cache_type is automatically set to ON if the server is started with the query_cache_size set to a non-zero (and non-default) value.
See Limiting the size of the Query Cache below for details.
When the query cache is enabled and a new SELECT query is processed, the query cache is examined to see if the query appears in the cache.
Queries are considered identical if they use the same database, the same protocol version and the same default character set. Prepared statements are always considered different from non-prepared statements; see Query cache internal structure for more info.
If the identical query is not found in the cache, the query will be processed normally and then stored, along with its result set, in the query cache. If the query is found in the cache, the results will be pulled from the cache, which is much quicker than processing it normally.
Queries are examined in a case-sensitive manner, so a query that differs from a cached one only in letter case is treated as a different query.
Comments are also part of the query text, so two queries that differ only in a comment are also treated as different; see the sketch below.
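For example (table name assumed), each pair below is stored as two separate cache entries:

SELECT * FROM customers;
select * FROM customers;

SELECT /* retry */ * FROM customers;
SELECT * FROM customers;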
See the query_cache_strip_comments server variable for an option to strip comments before searching.
Each time changes are made to the data in a table, all affected results in the query cache are cleared. It is not possible to retrieve stale data from the query cache.
When the space allocated to query cache is exhausted, the oldest results will be dropped from the cache.
When using query_cache_type=ON, and the query specifies SQL_NO_CACHE (case-insensitive), the server will not cache the query and will not fetch results from the query cache.
When using query_cache_type=DEMAND and the query specifies SQL_CACHE, the server will cache the query.
If the query_cache_type system variable is set to 1, or ON, all queries fitting the size constraints will be stored in the cache unless they contain a SQL_NO_CACHE clause, or are of a nature that caching makes no sense, for example making use of a function that returns the current time. Queries with SQL_NO_CACHE will not attempt to acquire query cache lock.
If any of the following functions are present in a query, it will not be cached. Queries with these functions are sometimes called 'non-deterministic'; do not confuse this with the use of the term in other contexts.
A query will also not be added to the cache if:
It is of the form:
SELECT SQL_NO_CACHE ...
SELECT ... INTO OUTFILE ...
SELECT ... INTO DUMPFILE ...
SELECT ... FOR UPDATE
SELECT * FROM ... WHERE autoincrement_column IS NULL
SELECT ... LOCK IN SHARE MODE
It uses TEMPORARY table
It uses no tables at all
It generates a warning
The user has a column-level privilege on any table in the query
It accesses a table from INFORMATION_SCHEMA, mysql or the performance_schema database
It makes use of user or local variables
It makes use of stored functions
It makes use of user-defined functions
It is inside a transaction with the SERIALIZABLE isolation level
It queries a table inside a transaction after that table's query cache entries were invalidated by an INSERT, UPDATE or DELETE
The query itself can also specify that it is not to be stored in the cache by using the SQL_NO_CACHE attribute. Query-level control is an effective way to use the cache more optimally.
It is also possible to specify that no queries must be stored in the cache unless the query requires it. To do this, the query_cache_type server variable must be set to 2, or DEMAND. Then, only queries with the SQL_CACHE attribute are cached.
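For example (table name assumed):

SELECT SQL_NO_CACHE * FROM orders WHERE id = 1;   -- never cached, even with query_cache_type=ON
SELECT SQL_CACHE * FROM orders WHERE id = 1;      -- cached when query_cache_type=DEMAND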
There are two main ways to limit the size of the query cache. First, the overall size in bytes is determined by the query_cache_size server variable. About 40KB is needed for various query cache structures.
The query cache size is allocated in 1024 byte-blocks, thus it should be set to a multiple of 1024.
The query result is stored using a minimum block size of query_cache_min_res_unit. Two effects should be weighed when choosing a value: inserting result blocks into the query cache takes a lock for each new block, so a small value increases locking and fragmentation while wasting less memory on small results, whereas a large value reduces locking but wastes more memory on small results. Test with your workload to fine-tune this variable.
If the strict mode is enabled, setting the query cache size to an invalid value will cause an error. Otherwise, it will be set to the nearest permitted value, and a warning will be triggered.
The ideal size of the query cache is very dependent on the specific needs of each system. Setting a value too small will result in query results being dropped from the cache when they could potentially be re-used later. Setting a value too high could result in reduced performance due to lock contention, as the query cache is locked during updates.
The second way to limit the cache is to have a maximum size for each set of query results. This prevents a single query with a huge result set taking up most of the available memory and knocking a large number of smaller queries out of the cache. This is determined by the query_cache_limit server variable.
If you attempt to set a query cache that is too small (the amount depends on the architecture), the resizing will fail and the query cache will be set to zero, for example :
A number of status variables provide information about the query cache.
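They can be inspected with SHOW STATUS; the values below are purely illustrative and are assumed only for the discussion that follows:

SHOW GLOBAL STATUS LIKE 'Qcache%';
-- Qcache_hits            12
-- Qcache_inserts         40
-- Qcache_lowmem_prunes   20
-- (other Qcache_* variables omitted)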
Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory.
The above example could indicate a poorly performing cache. More queries have been added, and more queries have been dropped, than have actually been used.
Results returned by the query cache count towards Com_select (see MDEV-4981).
The QUERY_CACHE_INFO plugin creates the QUERY_CACHE_INFO table in the INFORMATION_SCHEMA, allowing you to examine the contents of the query cache.
The Query Cache uses blocks of variable length, and over time may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. FLUSH QUERY CACHE will defragment the query cache without dropping any queries :
After this, there will only be one free block :
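For example:

FLUSH QUERY CACHE;
SHOW STATUS LIKE 'Qcache_free_blocks';   -- expect a value of 1 after defragmentation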
To empty or clear all results from the query cache, use RESET QUERY CACHE. FLUSH TABLES will have the same effect.
Setting either query_cache_type or query_cache_size to 0 will disable the query cache, but to free up the most resources, set both to 0 when you wish to disable caching.
The query cache can be used when tables have a write lock (which may seem confusing since write locks should avoid table reads). This behaviour can be changed by setting the query_cache_wlock_invalidate system variable to ON, in which case each write lock will invalidate the table query cache. Setting to OFF, the default, means that cached queries can be returned even when a table lock is being held. For example:
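A sketch (table name assumed), with query_cache_wlock_invalidate left at its default of OFF:

-- connection 1:
LOCK TABLES t1 WRITE;

-- connection 2:
SELECT * FROM t1 WHERE id = 1;   -- can still be answered from the query cache despite the write lock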
The query cache handles transactions. Internally a flag (FLAGS_IN_TRANS) is set to 0 when a query was executed outside a transaction, and to 1 when the query was inside a transaction (BEGIN / COMMIT / ROLLBACK). This flag is part of the "query cache hash"; in other words, a query inside a transaction is different from the same query outside a transaction.
Queries that change rows (INSERT / UPDATE / DELETE / TRUNCATE) inside a transaction invalidate all cached queries for the table and turn off query caching for the changed table. Even before the transaction ends with COMMIT / ROLLBACK, caching remains turned off for that table, to preserve row-level locking and the consistency level.
Examples:
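A minimal sketch (table name assumed):

START TRANSACTION;
SELECT * FROM t1 WHERE id = 1;     -- looked up / stored with the in-transaction flag set
UPDATE t1 SET b = 2 WHERE id = 1;  -- invalidates cached results for t1; caching of t1 stays off
COMMIT;                            -- normal caching of t1 can resume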
Internally, each flag that can change a result using the same query is a different query. For example, using the latin1 charset and using the utf8 charset with the same query are treated as different queries by the query cache.
Some fields that differentiate queries are (from "Query_cache_query_flags" internal structure) :
query (string)
current database schema name (string)
client long flag (0/1)
client protocol 4.1 (0/1)
protocol type (internal value)
more results exists (protocol flag)
in trans (inside transaction or not)
autocommit ( session variable)
pkt_nr (protocol flag)
character set client ( session variable)
character set results ( session variable)
collation connection ( session variable)
limit ( session variable)
time zone ( session variable)
sql_mode ( session variable)
max_sort_length ( session variable)
group_concat_max_len ( session variable)
default_week_format ( session variable)
div_precision_increment ( session variable)
lc_time_names ( session variable)
When searching for a query inside the query cache, a try_lock function waits with a timeout of 50ms. If the lock fails, the query isn't executed via the query cache. This timeout is hard-coded (MDEV-6766 include two variables to tune this timeout).
From the sql_cache.cc, function "try_lock" using TIMEOUT :
When inserting a query inside the query cache or aborting a query cache insert (using the KILL command for example), a try_lock function waits until the query cache returns; no timeout is used in this case.
When two processes execute the same query, only the last process stores the query result. All other processes increase the Qcache_not_cached status variable.
There are two aspects to the query cache: placing a query in the cache, and retrieving it from the cache.
Adding a query to the query cache. This is done automatically for cacheable queries (see Queries Stored in the Query Cache) when the query_cache_type system variable is set to 1, or ON, and the query contains no SQL_NO_CACHE clause, or when the query_cache_type system variable is set to 2, or DEMAND, and the query contains the SQL_CACHE clause.
Retrieving a query from the cache. This is done after the server receives the query and before the query parser. In this case one point should be considered:
When using SQL_NO_CACHE, it should appear directly after the first (outermost) SELECT.
The query cache only checks whether SQL_NO_CACHE/SQL_CACHE appears after the first SELECT, so if the keyword is placed only inside a subquery it is ignored and the query is still checked against the cache. (More info at MDEV-6631.) A sketch of both forms follows.
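A sketch of both forms (table names assumed):

SELECT SQL_NO_CACHE * FROM t1 WHERE a IN (SELECT a FROM t2);   -- recognized: the keyword follows the first SELECT
SELECT * FROM t1 WHERE a IN (SELECT SQL_NO_CACHE a FROM t2);   -- not recognized: still checked against the cache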
This page is licensed: CC BY-SA / Gnu FDL
You have a SELECT and you want to build the best INDEX for it. This blog is a "cookbook" on how to do that task.
A short algorithm that works for many simpler SELECTs and helps in complex queries.
Examples of the algorithm, plus digressions into exceptions and variants
Finally a long list of "other cases".
The hope is that a newbie can quickly get up to speed, and his/her INDEXes will no longer smack of "newbie".
Many edge cases are explained, so even an expert may find something useful here.
Here's the way to approach creating an INDEX, given a SELECT. Follow the steps below, gathering columns to put in the INDEX in order. When the steps give out, you usually have the 'perfect' index.
Given a WHERE with a bunch of expressions connected by AND: Include the columns (if any), in any order, that are compared to a constant and not hidden in a function.
You get one more chance to add to the INDEX; do the first of these that applies:
2a. One column used in a 'range' -- BETWEEN, '>', LIKE w/o leading wildcard, etc.
2b. All columns, in order, of the GROUP BY.
2c. All columns, in order, of the ORDER BY if there is no mixing of ASC and DESC.
This blog assumes you know the basic idea behind having an INDEX. Here is a refresher on some of the key points.
Virtually all INDEXes in MySQL are structured as BTrees. BTrees allow very efficient operations for:
Given a key, find the corresponding row(s);
"Range scans" -- That is start at one value for the key and repeatedly find the "next" (or "previous") row.
A PRIMARY KEY is a UNIQUE KEY; a UNIQUE KEY is an INDEX. ("KEY" == "INDEX".)
InnoDB "clusters" the PRIMARY KEY with the data. Hence, given the value of the PK ("PRIMARY KEY"), after drilling down the BTree to find the index entry, you have all the columns of the row when you get there. A "secondary key" (any UNIQUE or INDEX other than the PK) in InnoDB first drills down the BTree for the secondary index, where it finds a copy of the PK. Then it drills down the PK to find the row.
Every InnoDB table has a PRIMARY KEY. While there is a default if you do not specify one, it is best to explicitly provide a PK.
For completeness: MyISAM works differently. All indexes (including the PK) are in separate BTrees. The leaf node of such BTrees have a pointer (usually a byte offset) into the data file.
All discussion here assumes InnoDB tables, however most statements apply to other Engines.
Think of a list of names, sorted by last_name, then first_name. You have undoubtedly seen such lists, and they often have other information such as address and phone number. Suppose you wanted to look me up. If you remember my full name ('James' and 'Rick'), it is easy to find my entry. If you remembered only my last name ('James') and first initial ('R'), you would quickly zoom in on the Jameses and find the Rs among them. There, you might remember 'Rick' and ignore 'Ronald'. But suppose you remembered my first name ('Rick') and only my last initial ('J'). Now you are in trouble. You would be scanning all the Js -- Jones, Rick; Johnson, Rick; Jamison, Rick; etc, etc. That's much less efficient.
Those equate to
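Roughly, and with assumed column names, the three lookups above correspond to:

WHERE last_name = 'James' AND first_name = 'Rick'     -- full match: efficient
WHERE last_name = 'James' AND first_name LIKE 'R%'    -- '=' plus a range: still efficient
WHERE last_name LIKE 'J%'  AND first_name = 'Rick'    -- range on the first column: scans all the Js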
Think about this example as I talk about "=" versus "range" in the Algorithm, below.
WHERE aaa = 123 AND ... : an INDEX starting with aaa is good.
WHERE aaa = 123 AND bbb = 456 AND ... : an INDEX starting with aaa and bbb is good. In this case, it does not matter whether aaa or bbb comes first in the INDEX.
xxx IS NULL : this acts like "= const" for this discussion.
Note that the expression must be of the form of column_name = (constant). These do not apply to this step in the Algorithm: DATE(dt) = '...', LOWER(s) = '...', CAST(s ...) = '...', x='...' COLLATE...
(If there are no "=" parts AND'd in the WHERE clause, move on to step 2 without any columns in your putative INDEX.)
Find the first of 2a / 2b / 2c that applies; use it; then quit. If none apply, then you are through gathering columns for the index.
In some cases it is optimal to do step 1 (all equals) plus step 2c (ORDER BY).
A "range" shows up as
aaa >= 123 -- any of <, <=, >=, >; but not <>, !=
aaa BETWEEN 22 AND 44
sss LIKE 'blah%' -- but not sss LIKE '%blah'
If there are more parts to the WHERE clause, you must stop now.
Complete examples (assume nothing else comes after the snippet)
WHERE aaa >= 123 AND bbb = 1 ⇒ INDEX(bbb, aaa) (WHERE order does not matter; INDEX order does)
WHERE aaa >= 123 ⇒ INDEX(aaa)
WHERE aaa >= 123 AND ccc > 'xyz' ⇒ INDEX(aaa) or INDEX(ccc) (only one range)
If there is a GROUP BY, all the columns of the GROUP BY should now be added, in the specified order, to the INDEX you are building. (I do not know what happens if one of the columns is already in the INDEX.)
If you are GROUPing BY an expression (including function calls), you cannot use the GROUP BY; stop.
Complete examples (assume nothing else comes after the snippet)
WHERE aaa = 123 AND bbb = 1 GROUP BY ccc ⇒ INDEX(bbb, aaa, ccc) or INDEX(aaa, bbb, ccc) (='s first, in any order; then the GROUP BY)
WHERE aaa >= 123 GROUP BY xxx ⇒ INDEX(aaa) (You should have stopped with Step 2a)
GROUP BY x,y ⇒ INDEX(x,y) (no WHERE)
If there is a ORDER BY, all the columns of the ORDER BY should now be added, in the specified order, to the INDEX you are building.
If there are multiple columns in the ORDER BY, and there is a mixture of ASC and DESC, do not add the ORDER BY columns; they won't help; stop.
If you are ORDERing BY an expression (including function calls), you cannot use the ORDER BY; stop.
Complete examples (assume nothing else comes after the snippet)
WHERE aaa = 123 GROUP BY ccc ORDER BY ddd ⇒ INDEX(aaa, ccc) -- should have stopped with Step 2b
WHERE aaa = 123 GROUP BY ccc ORDER BY ccc ⇒ INDEX(aaa, ccc) -- the ccc will be used for both GROUP BY and ORDER BY
WHERE aaa = 123 ORDER BY xxx ASC, yyy DESC ⇒ INDEX(aaa) -- mixture of ASC and DESC.
The following are especially good. Normally a LIMIT cannot be applied until after lots of rows are gathered and then sorted according to the ORDER BY. But, if the INDEX gets all the way through the ORDER BY, only (OFFSET + LIMIT) rows need to be gathered. So, in these cases, you win the lottery with your new index:
WHERE aaa = 123 GROUP BY ccc ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)
WHERE aaa = 123 ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)
ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc)
(It does not make much sense to have a LIMIT without an ORDER BY, so I do not discuss that case.)
You have collected a few columns; put them in INDEX and ADD that to the table. That will often produce a "good" index for the SELECT you have. Below are some other suggestions that may be relevant.
An example of the Algorithm being 'wrong':
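A sketch (table and column names assumed):

SELECT ... FROM t WHERE flag = true;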
This would (according to the Algorithm) call for INDEX(flag). However, indexing a column that has two (or a small number of) values is almost always useless. This is called 'low cardinality'. The Optimizer would prefer to do a table scan rather than bounce between the index BTree and the data.
On the other hand, the Algorithm is 'right' with
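a query such as this sketch (names and the date constant assumed):

SELECT ... FROM t WHERE flag = true AND date >= '2025-01-01';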
That would call for a compound index starting with a flag: INDEX(flag, date). Such an index is likely to be very beneficial. And it is likely to be more beneficial than INDEX(date).
If your resulting INDEX includes column(s) that are likely to be UPDATEd, note that the UPDATE will have extra work to remove a 'row' from one place in the INDEX's BTree and insert a 'row' back into the BTree. For example:
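A sketch with assumed names, where last_seen is part of an index and is updated frequently:

UPDATE t SET last_seen = NOW() WHERE user_id = 123;   -- every such UPDATE must also move the row's entry within the index BTree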
There are too many variables to say whether it is better to keep the index or to toss it.
In this case, shortening the index may be beneficial:
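A sketch with assumed names, where x is the frequently updated column:

INDEX(z, x)                          -- current index
UPDATE t SET x = x + 1 WHERE ...;    -- forces index maintenance on every update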
Changing to INDEX(z) would make for less work for the UPDATE, but might hurt some SELECT. It depends on the frequency of each, plus many more factors.
(There are exceptions to some of these.)
You may not create an index bigger than 3KB.
You may not include a column that equates to bigger than some value (767 bytes -- VARCHAR(255) CHARACTER SET utf8).
You can deal with big fields using "prefix" indexing; but see below.
You should not have more than 5 columns in an index. (This is just a Rule of Thumb; nothing prevents having more.)
INDEX(flag) is almost never useful if flag has very few values. More specifically, when you say WHERE flag = 1 and "1" occurs more than 20% of the time, such an index will be shunned. The Optimizer would prefer to scan the table instead of bouncing back and forth between the index and the data for more than 20% of the rows.
("20%" is really somewhere between 10% and 30%, depending on the phase of the moon.)
A "Covering" index is an index that contains all the columns in the SELECT. It is special in that the SELECT can be completed by looking only at the INDEX BTree. (Since InnoDB's PRIMARY KEY is clustered with the data, "covering" is of no benefit when considering at the PRIMARY KEY.)
Mini-cookbook:
Gather the list of column(s) according to the "Algorithm", above.
Add to the end of the list the rest of the columns seen in the SELECT, in any order.
Examples:
SELECT x FROM t WHERE y = 5; ⇒ INDEX(y,x) -- The algorithm said just INDEX(y)
SELECT x,z FROM t WHERE y = 5 AND q = 7; ⇒ INDEX(y,q,x,z) -- y and q in either order (Algorithm), then x and z in either order (covering).
SELECT x FROM t WHERE y > 5 AND q > 7; ⇒ INDEX(y,q,x) -- y or q first (that's as far as the Algorithm goes), then the other two fields afterwards.
The speedup you get might be minor, or it might be spectacular; it is hard to predict.
But...
It is not wise to build an index with lots of columns. Let's cut it off at 5 (Rule of Thumb).
Prefix indexes cannot 'cover', so don't use them anywhere in a 'covering' index.
There are limits (3KB?) on how 'wide' an index can be, so "covering" may not be possible.
INDEX(a,b) can find anything that INDEX(a) could find. So you don't need both. Get rid of the shorter one.
If you have lots of SELECTs and they generate lots of INDEXes, this may cause a different problem. Each index must be updated (sooner or later) for each INSERT. More indexes ⇒ slower INSERTs. Limit the number of indexes on a table to about 6 (Rule of Thumb).
Notice in the cookbook how it says "in any order" in a few places. If, for example, you have both of these (in different SELECTs):
WHERE a=1 AND b=2 begs for either INDEX(a,b) or INDEX(b,a)
WHERE a>1 AND b=2 begs only for INDEX(b,a)
Include only INDEX(b,a) since it handles both cases with only one INDEX.
Suppose you have a lot of indexes, including (a,b,c,dd) and (a,b,c,ee). Those are getting rather long. Consider either picking one of them, or having simply (a,b,c). Sometimes the selectivity of (a,b,c) is so good that tacking on 'dd' or 'ee' does not make enough difference to matter.
The main cookbook skips over an important optimization that is sometimes used. The optimizer will sometimes ignore the WHERE and, instead, use an INDEX that matches the ORDER BY. This, of course, needs to be a perfect match -- all columns, in the same order. And all ASC or all DESC.
This becomes especially beneficial if there is a LIMIT.
But there is a problem. There could be two situations, and the Optimizer is sometimes not smart enough to see which case applies:
If the WHERE does very little filtering, fetching the rows in ORDER BY order avoids a sort and has little wasted effort (because of 'little filtering'). Using the INDEX matching the ORDER BY is better in this case.
If the WHERE does a lot of filtering, the ORDER BY is wasting a lot of time fetching rows only to filter them out. Using an INDEX matching the WHERE clause is better.
What should you do? If you think the "little filtering" is likely, then create an index with the ORDER BY columns in order and hope that the Optimizer uses it when it should.
Cases...
WHERE a=1 OR a=2 -- This is turned into WHERE a IN (1,2) and optimized that way.
WHERE a=1 OR b=2 usually cannot be optimized.
WHERE x.a=1 OR y.b=2 This is even worse because of using two different tables.
A workaround is to use UNION. Each part of the UNION is optimized separately. For the second case:
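A sketch for the second case (table and column names assumed), where INDEX(a) can serve one part and INDEX(b) the other:

SELECT ... FROM t WHERE a = 1
UNION DISTINCT
SELECT ... FROM t WHERE b = 2;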
Now the query can take good advantage of two different indexes. Note: "Index merge" might kick in on the original query, but it is not necessarily any faster than the UNION. Sister blog on compound indexes, including 'Index Merge'
The third case (OR across 2 tables) is similar to the second.
If you originally had a LIMIT, UNION gets complicated. If you started with ORDER BY z LIMIT 190, 10, then the UNION needs to be
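something like this sketch (names assumed): each part collects the first 190 + 10 = 200 rows, and the outer ORDER BY ... LIMIT whittles them down:

( SELECT ... FROM t WHERE a = 1 ORDER BY z LIMIT 200 )
UNION DISTINCT
( SELECT ... FROM t WHERE b = 2 ORDER BY z LIMIT 200 )
ORDER BY z LIMIT 190, 10;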
You cannot directly index a TEXT or BLOB or large VARCHAR or large BINARY column. However, you can use a "prefix" index: INDEX(foo(20)). This says to index the first 20 characters of foo. But... It is rarely useful.
Example of a prefix index:
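A sketch consistent with the sentence that follows (column names assumed):

INDEX(last_name(2), first_name)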
The index for me would contain 'Ja', 'Rick'. That's not useful for distinguishing between 'Jamison', 'Jackson', 'James', etc., so the index is so close to useless that the optimizer often ignores it.
Probably never do UNIQUE(foo(20)) because this applies a uniqueness constraint on the first 20 characters of the column, not the whole column!
DATE, DATETIME, etc. are tricky to compare against.
Some tempting, but inefficient, techniques:
date_col LIKE '2016-01%' -- must convert date_col to a string, so acts like a functionLEFT(date_col, 4) = '2016-01' -- hiding the column in functionDATE(date_col) = 2016 -- hiding the column in function
All must do a full scan. (On the other hand, it can handy to use GROUP BY LEFT(date_col, 7) for monthly grouping, but that is not an INDEX issue.)
This is efficient, and can use an index:
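For example:

WHERE date_col >= '2016-01-01'
  AND date_col <  '2016-01-01' + INTERVAL 1 MONTH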
This case works because both right-hand values are converted to constants, then it is a "range". I like the design pattern with INTERVAL because it avoids computing the last day of the month. And it avoids tacking on '23:59:59', which is wrong if you have microsecond times. (And other cases.)
Perform EXPLAIN SELECT... (and EXPLAIN FORMAT=JSON SELECT... if you have 5.6.5). Look at the Key that it chose, and the Key_len. From those you can deduce how many columns of the index are being used for filtering. (JSON makes it easier to get the answer.) From that you can decide whether it is using as much of the INDEX as you thought. Caveat: Key_len only covers the WHERE part of the action; the non-JSON output won't easily say whether GROUP BY or ORDER BY was handled by the index.
IN (1,99,3) is sometimes optimized as efficiently as "=", but not always. Older versions of MySQL did not optimize it as well as newer versions. (5.6 is possibly the main turning point.)
IN ( SELECT ... )
From version 4.1 through 5.5, IN ( SELECT ... ) was very poorly optimized. The SELECT was effectively re-evaluated every time. Often it can be transformed into a JOIN, which works much faster. Here is a pattern to follow:
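SELECT ...
    FROM a
    WHERE test_a
      AND x IN (
        SELECT x
            FROM b
            WHERE test_b
      );
⇒
SELECT ...
    FROM a
    JOIN b USING(x)
    WHERE test_a
      AND test_b;

Here test_a and test_b stand for whatever filtering applies to each table.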
The SELECT expressions will need "a." prefixing the column names.
Alas, there are cases where the pattern is hard to follow.
5.6 does some optimizing, but probably not as good as the JOIN.
If there is a JOIN or GROUP BY or ORDER BY LIMIT in the subquery, that complicates the JOIN in new format. So, it might be better to use this pattern:
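SELECT ...
    FROM a
    WHERE test_a
      AND x IN ( SELECT x FROM ... );
⇒
SELECT ...
    FROM a
    JOIN ( SELECT x FROM ... ) b  USING(x)
    WHERE test_a;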
Caveat: If you end up with two subqueries JOINed together, note that neither has any indexes, hence performance can be very bad. (5.6 improves on it by dynamically creating indexes for subqueries.)
There is work going on in MariaDB and Oracle 5.7, in relation to "NOT IN", "NOT EXISTS", and "LEFT JOIN..IS NULL"; here is an old discussion on the topic. So, what I say here may not be the final word.
When you have a JOIN and a GROUP BY, you may have the situation where the JOIN exploded more rows than the original query (due to many:many), but you wanted only one row from the original table, so you added the GROUP BY to implode back to the desired set of rows.
This explode + implode, itself, is costly. It would be better to avoid them if possible.
Sometimes the following will work.
Using DISTINCT or GROUP BY to counteract the explosion
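Schematically:

SELECT DISTINCT a.*, b.y
    FROM a
    JOIN b;
⇒
SELECT a.*,
       ( SELECT GROUP_CONCAT(b.y) FROM b WHERE b.x = a.x ) AS ys
    FROM a;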
When using second table just to check for existence:
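SELECT a.*
    FROM a
    JOIN b ON b.x = a.x
    GROUP BY a.id;
⇒
SELECT a.*
    FROM a
    WHERE EXISTS ( SELECT * FROM b WHERE b.x = a.x );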
For a many:many mapping (link) table, do it this way.
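CREATE TABLE XtoY (
    # No surrogate id for this table
    x_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to one table
    y_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to the other table
    # Include other fields specific to the 'relation'
    PRIMARY KEY(x_id, y_id),            -- When starting with X
    INDEX      (y_id, x_id)             -- When starting with Y
) ENGINE=InnoDB;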
Notes:
Lack of an AUTO_INCREMENT id for this table -- The PK given is the 'natural' PK; there is no good reason for a surrogate.
"MEDIUMINT" -- This is a reminder that all INTs should be made as small as is safe (smaller ⇒ faster). Of course the declaration here must match the definition in the table being linked to.
"UNSIGNED" -- Nearly all INTs may as well be declared non-negative
"NOT NULL" -- Well, that's true, isn't it?
To conditionally INSERT new links, use IODKU (INSERT ... ON DUPLICATE KEY UPDATE).
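A minimal sketch (the id values are placeholders); the do-nothing assignment simply suppresses the duplicate-key error:

INSERT INTO XtoY (x_id, y_id)
    VALUES (123, 456)
    ON DUPLICATE KEY UPDATE x_id = x_id;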
Note that if you had an AUTO_INCREMENT in this table, IODKU would "burn" ids quite rapidly.
Each subquery SELECT and each SELECT in a UNION can be considered separately for finding the optimal INDEX.
Exception: In a "correlated" ("dependent") subquery, the part of the WHERE that depends on the outside table is not easily factored into the INDEX generation. (Cop out!)
The first step is to decide what order the optimizer will go through the tables. If you cannot figure it out, then you may need to be pessimistic and create two indexes for each table -- one assuming the table will be used first, one assuming that it will come later in the table order.
The optimizer usually starts with one table and extracts the data needed from it. As it finds a useful (that is, matches the WHERE clause, if any) row, it reaches into the 'next' table. This is called NLJ ("Nested Loop Join"). The process of filtering and reaching to the next table continues through the rest of the tables.
The optimizer usually picks the "first" table based on these hints:
STRAIGHT_JOIN forces the table order.
The WHERE clause limits which rows needed (whether indexed or not).
The table to the "left" in a LEFT JOIN usually comes before the "right" table. (By looking at the table definitions, the optimizer may decide that "LEFT" is irrelevant.)
The current INDEXes will encourage an order.
Running EXPLAIN tells you the table order that the Optimizer is very likely to use today. After adding a new INDEX, the optimizer may pick a different table order. You should anticipate the order changing, guess at what order makes the most sense, and build the INDEXes accordingly. Then rerun EXPLAIN to see if the Optimizer's brain was on the same wavelength you were on.
You should build the INDEX for the "first" table based on any parts of the WHERE, GROUP BY, and ORDER BY clauses that are relevant to it. If a GROUP/ORDER BY mentions a different table, you should ignore that clause.
The second (and subsequent) table will be reached into based on the ON clause. (Instead of using commajoin, please write JOINs with the JOIN keyword and ON clause!) In addition, there could be parts of the WHERE clause that are relevant. GROUP/ORDER BY are not to be considered in writing the optimal INDEX for subsequent tables.
PARTITIONing is rarely a substitute for a good INDEX.
PARTITION BY RANGE is a technique that is sometimes useful when indexing fails to be good enough. In a two-dimensional situation such as nearness in a geographical sense, one dimension can partially be handled by partition pruning; then the other dimension can be handled by a regular index (preferably the PRIMARY KEY).
FULLTEXT is now implemented in InnoDB as well as MyISAM. It provides a way to search for "words" in TEXT columns. This is much faster (when it is applicable) than col LIKE '%word%'.
A query of the form WHERE x = 1 AND MATCH (...) AGAINST (...) always(?) uses the FULLTEXT index first. That is, the whole algorithm described above is invalidated when one of the ANDed conditions is a MATCH.
No "compound" (aka "composite") indexes
No PRIMARY KEY
Redundant indexes (especially blatant is PRIMARY KEY(id), KEY(id))
Most or all columns individually indexed ("But I indexed everything")
The published table (see Wikipedia) is
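CREATE TABLE wp_postmeta (
  meta_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  post_id BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
  meta_key VARCHAR(255) DEFAULT NULL,
  meta_value LONGTEXT,
  PRIMARY KEY (meta_id),
  KEY post_id (post_id),
  KEY meta_key (meta_key)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;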
The problems:
The AUTO_INCREMENT provides no benefit; in fact it slows down most queries and clutters disk.
Much better is PRIMARY KEY(post_id, meta_key) -- clustered, handles both parts of usual JOIN.
BIGINT is overkill, but that can't be fixed without changing other tables.
VARCHAR(255) can be a problem in 5.6 with utf8mb4; see workarounds below.
The solutions:
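Suggested replacement, applying the fixes above:

CREATE TABLE wp_postmeta (
  post_id BIGINT UNSIGNED NOT NULL,
  meta_key VARCHAR(255) NOT NULL,
  meta_value LONGTEXT NOT NULL,
  PRIMARY KEY(post_id, meta_key),
  INDEX(meta_key)
) ENGINE=InnoDB;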
Initial posting: March, 2015; Refreshed Feb, 2016; Add DATE June, 2016; Add WP example May, 2017.
The tips in this document apply to MySQL, MariaDB, and Percona.
Some info in the MySQL manual:
A short, but complicated,
This blog is the consolidation of a Percona tutorial I gave in 2013, plus many years of experience in fixing thousands of slow queries on hundreds of systems. I apologize that this does not tell you how to create INDEXes for all SELECTs. Some are just too complex.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
You want to find the nearest 10 pizza parlors, but you cannot figure out how to do it efficiently in your huge database. Database indexes are good at one-dimensional indexing, but poor at two-dimensions.
You might have tried
INDEX(lat), INDEX(lon) -- but the optimizer used only one
INDEX(lat,lon) -- but it still had to work too hard
Sometimes you ended up with a full table scan -- Yuck.
WHERE < ... -- No chance of using any index.
WHERE lat BETWEEN ... AND lng BETWEEN... -- This has some chance of using such indexes.
The goal is to look only at records "close", in both directions, to the target lat/lng.
PARTITIONs in MariaDB and MySQL sort of give you a way to have two clustered indexes. So, if we could slice up (partition) the globe in one dimension and use ordinary indexing in the other dimension, maybe we can get something approximating a 2D index. This 2D approach keeps the number of disk hits significantly lower than 1D approaches, thereby speeding up "find nearest" queries.
It works. Not perfectly, but better than the alternatives.
What to PARTITION on? It seems like latitude or longitude would be a good idea. Note that longitudes vary in width, from 69 miles (111 km) at the equator, to 0 at the poles. So, latitude seems like a better choice.
How many PARTITIONs? It does not matter a lot. Some thoughts:
90 partitions - 2 degrees each. (I don't like tables with too many partitions; 90 seems like plenty.)
50-100 - evenly populated. (This requires code. For 2.7M placenames, 85 partitions varied from 0.5 degrees to very wide partitions at the poles.)
Don't have more than 100 partitions, there are inefficiencies in the partition implementation.
How to PARTITION? Well, MariaDB and MySQL are very picky. FLOAT/DOUBLE are out. DECIMAL is out. So, we are stuck with some kludge. Essentially, we need to convert Lat/Lng to some size of INT and use PARTITION BY RANGE.
To get to a datatype that can be used in PARTITION, you need to "scale" the latitude and longitude. (Consider only the *INTs; the other datatypes are included for comparison)
(Sorted by resolution)
What these mean...
Deg*100 (SMALLINT) -- you take the lat/lng, multiply by 100, round, and store into a SMALLINT. That will take 2 bytes for each dimension, for a total of 4 bytes. Two items might be 1570 meters apart, but register as having the same latitude and longitude.
DECIMAL(4,2) for latitude and DECIMAL(5,2) for longitude will take 2+3 bytes and have no better resolution than Deg*100.
SMALLINT scaled -- Convert latitude into a SMALLINT SIGNED by doing (degrees / 90 * 32767) and rounding; longitude by (degrees / 180 * 32767).
FLOAT has 24 significant bits; DOUBLE has 53. (They don't work with PARTITIONing but are included for completeness. Often people use DOUBLE without realizing how much of an overkill it is, and how much space it takes.)
Sure, you could do Deg*1000 and other "in between" cases, but there is no advantage: Deg*1000 takes as much space as Deg*10000, but has less resolution.
So, go down the list to see how much resolution you need, then pick an encoding you are comfortable with. However, since we are about to use latitude as a "partition key", it must be limited to one of the INTs. For the sample code, I will use Deg*10000 (MEDIUMINT).
GCDist is a helper FUNCTION that correctly computes the distance between two points on the globe.
The code has been benchmarked at about 20 microseconds per call on a 2011-vintage PC. If you had to check a million points, that would take 20 seconds -- far too much for a web application. So, one goal of the Procedure that uses it will be to minimize the usage of this function. With the code presented here, the function need be called only a few dozen or few hundred times, except in pathological cases.
Sure, you could use the Pythagorean formula. And it would work for most applications. But it does not take extra effort to do the GC. Furthermore, GC works across a pole and across the dateline. And, a Pythagorean function is not that much faster.
For efficiency, GCDist understands the scaling you picked and has that stuff hardcoded. I am picking "Deg*10000", so the function expects 350000 for representing 35 degrees. If you choose a different scaling, you will need to change the code.
GCDist() takes 4 scaled DOUBLEs -- lat1, lon1, lat2, lon2 -- and returns a scaled number of "degrees" representing the distance.
The table of representation choices says 52 feet of resolution for Deg*10000 and DECIMAL(x,4). Here is how that was calculated: measuring the diagonal between lat/lng (0,0) and (0.0001, 0.0001) (one 'unit in the last place'): GCDist(0,0,1,1) * 69.172 / 10000 * 5280 = 51.65, where
69.172 miles/degree of latitude
10000 units per degree for the scaling chosen
5280 feet / mile.
(No, this function does not compensate for the Earth being an oblate spheroid, etc.)
There will be one table (plus normalization tables as needed). The one table must be partitioned and indexed as indicated below.
Fields and indexes
PARTITION BY RANGE(lat)
lat -- scaled latitude (see above)
lon -- scaled longitude
PRIMARY KEY(lon, lat, ...) -- lon must be first; something must be added to make it UNIQUE
For most of this discussion, lat is assumed to be MEDIUMINT -- scaled from -90 to +90 by multiplying by 10000. Similarly for lon and -180 to +180.
The PRIMARY KEY must
start with lon since the algorithm needs the "clustering" that InnoDB will provide, and
include lat somewhere, since it is the PARTITION key, and
contain something to make the key UNIQUE (lon+lat is unlikely to be sufficient).
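A minimal sketch of such a table; the id column, the exact column list, and the particular partition boundaries are only illustrative (in practice you would use many narrower latitude stripes, as discussed above):

CREATE TABLE Locations (
    id  INT UNSIGNED NOT NULL,       -- whatever is needed to make the PK unique
    lat MEDIUMINT NOT NULL,          -- scaled: degrees * 10000
    lon MEDIUMINT NOT NULL,          -- scaled: degrees * 10000
    -- ... other columns (name, population, ...) ...
    PRIMARY KEY (lon, lat, id)       -- lon first; lat included; id makes it unique
) ENGINE=InnoDB
PARTITION BY RANGE (lat) (
    PARTITION p0 VALUES LESS THAN (-600000),   -- south of latitude -60
    PARTITION p1 VALUES LESS THAN (-400000),
    -- ... more stripes ...
    PARTITION p9 VALUES LESS THAN MAXVALUE
);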
The FindNearest PROCEDURE will do multiple SELECTs something like this:
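Here @my_lat/@my_lon are the scaled coordinates of the target, and @dlat/@dlon are the current half-widths of the square, in scaled degrees:

SELECT ...
    FROM Locations
    WHERE lat BETWEEN @my_lat - @dlat
                  AND @my_lat + @dlat   -- PARTITION pruning and bounding box
      AND lon BETWEEN @my_lon - @dlon
                  AND @my_lon + @dlon   -- first part of PK
      AND condition                     -- filter out non-pizza parlors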
The query planner will
Do PARTITION "pruning" based on the latitude; then
Within a PARTITION (which is effectively a table), use lon to do a 'clustered' range scan; then
Use the "condition" to filter down to the rows you desire, plus recheck lat. This design leads to very few disk blocks needing to be read, which is the main goal of the design.
Note that this does not even call GCDist. That comes in the last pass when the ORDER BY and LIMIT are used.
The stored procedure has a loop. At least two SELECTs will be executed, but with proper tuning, usually no more than about 6 SELECTs will be performed. Because it searches by the PRIMARY KEY, each SELECT hits only one block of the table, sometimes a few. Counting the number of blocks hit is a crude, but effective, way of comparing the performance of multiple designs. By comparison, a full table scan will probably touch thousands of blocks. A simple INDEX(lat) probably leads to hitting hundreds of blocks.
Filtering... An argument to the FindNearest procedure includes a boolean expression ("condition") for a WHERE clause. If you don't need any filtering, pass in "1". To avoid "SQL injection", do not let web users put arbitrary expressions; instead, construct the "condition" from inputs they provide, thereby making sure it is safe.
The algorithm is embodied in a stored procedure because of its complexity.
You feed it a starting width for a "square" and a number of items to find.
It builds a "square" around where you are.
A SELECT is performed to see how many items are in the square.
Loop, doubling the width of the square, until enough items are found.
The next section ("Performance") should make this a bit clearer as it walks through some examples.
Because of all the variations, it is hard to get a meaningful benchmark. So, here is some hand-waving instead.
Each SELECT is constrained by a "square" defined by a latitude range and a longitude range. (See the WHERE clause mentioned above, or in the sample code below.) Because of the way longitude lines warp, the longitude range of the "square" will be more degrees than the latitude range. Let's say the latitude partitioning is 3 degrees wide in the area where you are searching. That is over 200 miles (over 300km), so you are very likely to have a latitude range smaller than the partition width. Still, if you are reaching from the edge of a latitude stripe, the square could span two partitions. After partition pruning down to one (sometimes more) partition, the query is then constrained by a longitude range. (Remember, the PRIMARY KEY starts with lon.) If an InnoDB data block contains 100 rows (a handy Rule of Thumb), the select will touch one (or a few) block. If the square spans two (or more) partitions, then the same logic applies to each partition.
So, scanning the square will involve as little as one block; rarely more than a few blocks. The number of blocks is mostly independent of the dataset size.
The primary use case for this algorithm is when the data is significantly larger than will fit into cache (the buffer_pool). Hence, the main goal is to minimize the number of disk hits.
Now let's look at some edge cases, and argue that the number of blocks is still better (usually) than with traditional indexing techniques.
What if you are looking for Starbucks in a dense city? There would be dozens, maybe hundreds per square mile. If you start the guess at 100 miles, the SELECTs would be hitting lots of blocks -- not efficient. In this case, the "starting distance" should be small, say, 2 miles. Let's say your app wants the closest 10 stores. In this example, you would probably find more than 10 Starbucks within 2 miles in 1 InnoDB block in one partition. Even though there is a second SELECT to finish off the query, it would be hitting the same block. Total: One block hit == cheap.
Let's say you start with a 5 mile square. Since there are upwards of 200 Starbucks within a 5-miles radius in some dense cities of the world, that might imply 300 in our "square". That maps to about 4 disk blocks, and a modest amount of CPU to chew through the 300 records. Still not bad.
Now, suppose you are on an ocean liner somewhere in the Pacific. And there is one Starbucks onboard, but you are looking for the nearest 10. If you again start with 2 miles, it will take several iterations to find 10 sites. But, let's walk through it anyway. The first probe will hit one partition (maybe 2), and find just one hit. The second probe doubles the width of the square; 4 miles will still give you one hit -- the same hit in the same block, which is now cached, so we won't count it as a second disk I/O. Eventually the square will be wide enough to span multiple partitions. Each extra partition will be one new disk hit to discover no sites in the square. Finally, the square will hit Chile or Hawaii or Fiji and find some more sites, perhaps enough to stop the iteration. Since the main criteria in determining the number of disk hits is the number of partitions hit, we do not want to split the world into too many partitions. If there are, say, 40 partitions, then I have just described a case where there might be 20 disk hits.
2-degree partitions might be good for a global table of stores or restaurants. A 5-mile starting distance might be good when filtering for Starbucks. 20 miles might be better for a department store.
Now, let's discuss the 'last' SELECT, wherein the square is expanded by SQRT(2) and it uses the Great Circle formula to precisely order the N results. The SQRT(2) is in case that the N items were all at the corners of the 'square'. Growing the square by this much allows us to catch any other sites that were just outside the old square.
First, note that this 'last' SELECT is hitting the same block(s) that the iteration hit, plus possibly hitting some more blocks. It is hard to predict how many extra blocks might be hit. Here's a pathological case. You are in the middle of a desert; the square grows and grows. Eventually it finds N sites. There is a big city just outside the final square from the iterating. Now the 'last' SELECT kicks in, and it includes lots of sites in this big city. "Lots of sites" --> lots of blocks --> lots of disk hits.
Here's the gist of the FindNearest().
Make a guess at how close to "me" to look.
See how many items are in a 'square' around me, after filtering.
If not enough, repeat, doubling the width of the square.
After finding enough, or giving up because we are looking "too far", make one last pass to get all the data, ORDERed and LIMITed
Note that the loop merely uses 'squares' of lat/lng ranges. This is crude, but works well with the partitioning and indexing, and avoids calling GCDist (until the last step). In the sample code, I picked 15 miles as the starting value. Adjusting this will have some impact on the Procedure's performance, but the impact will vary with the use cases. A rough way to set the starting radius is to guess what will find the desired LIMIT about half the time. (This value is hardcoded in the PROCEDURE.)
Parameters passed into FindNearest():
your Latitude -- -90..90 (not scaled -- see hardcoded conversion in PROCEDURE)
your Longitude -- -180..180 (not scaled)
Start distance -- (miles or km) -- see discussion below
Max distance -- in miles or km -- see hardcoded conversion in PROCEDURE
The function will find the nearest items, up to Limit that meet the Condition. But it will give up at Max distance. (If you are at the South Pole, why bother searching very far for the tenth pizza parlor?)
Because of the "scaling", "hardcoding", "Condition", the table name, etc, this PROCEDURE is not truly generic; the code must be modified for each application. Yes, I could have designed it to pass all that stuff in. But what a mess.
The "_start_dist" gives some control over the performance. Making this too small leads to extra iterations; too big leads to more rows being checked. If you choose to tune the Stored Procedure, do the following. "SELECT @iterations" after calling the SP for a number of typical values. If the value is usually 1, then decrease _start_dist. If it is usually 2 or more, then increase it.
Timing: Under 10ms for "typical" usage; any dataset size. Slower for pathological cases (low min distance, high max distance, crossing dateline, bad filtering, cold cache, etc)
End-cases:
By using GC distance, not Pythagoras, distances are 'correct' even near poles.
Poles -- Even if the "nearest" is almost 360 degrees away (longitude), it can find it.
Dateline -- There is a small, 'contained', piece of code for crossing the Dateline. Example: you are at +179 deg longitude, and the nearest item is at -179.
The procedure returns one resultset, SELECT *, distance.
Only rows that meet your Condition, within Max distance are returned
At most Limit rows are returned
The rows will be ordered, "closest" first.
"dist" will be in miles or km (based on a hardcoded constant in the SP)
This version is based on scaling "Deg*10000 (MEDIUMINT)".
There is a "Haversine" algorithm that is twice as fast as the GCDist function here. But it has a fatal flaw of sometimes returning NULL for the distance between a point and itself. (This is because of computing a number slightly bigger than 1.0, then trying to take the ACOS of it.)
Rick James graciously allowed us to use this article in the documentation.
has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
GUIDs/UUIDs (Globally/Universally Unique Identifiers) are very random. Therefore, INSERTing into an index means jumping around a lot. Once the index is too big to be cached, most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second.
This blog is mostly made obsolete in MySQL 8.0 with the advent of the function UUID_TO_BIN().
SHOW VARIABLES LIKE 'have_query_cache';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| have_query_cache | YES |
+------------------+-------+
SET GLOBAL query_cache_type = 1;
SET GLOBAL query_cache_size = 2000000;
SELECT * FROM t
SELECT * from t /* retry */
SELECT * FROM t /* retry2 */
SELECT * FROM t
SHOW VARIABLES LIKE 'query_cache_size';
+------------------+----------+
| Variable_name | Value |
+------------------+----------+
| query_cache_size | 67108864 |
+------------------+----------+
SET GLOBAL query_cache_size = 8000000;
Query OK, 0 rows affected, 1 warning (0.03 sec)
SHOW VARIABLES LIKE 'query_cache_size';
+------------------+---------+
| Variable_name | Value |
+------------------+---------+
| query_cache_size | 7999488 |
+------------------+---------+
SET GLOBAL query_cache_size=40000;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
SHOW WARNINGS;
+---------+------+-----------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect query_cache_size value: '40000' |
| Warning | 1282 | Query cache failed to set size 39936; new query cache size is 0 |
+---------+------+-----------------------------------------------------------------+
SHOW STATUS LIKE 'Qcache%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Qcache_free_blocks | 1158 |
| Qcache_free_memory | 3760784 |
| Qcache_hits | 31943398 |
| Qcache_inserts | 42998029 |
| Qcache_lowmem_prunes | 34695322 |
| Qcache_not_cached | 652482 |
| Qcache_queries_in_cache | 4628 |
| Qcache_total_blocks | 11123 |
+-------------------------+----------+
FLUSH QUERY CACHE;
SHOW STATUS LIKE 'Qcache%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 6101576 |
| Qcache_hits | 31981126 |
| Qcache_inserts | 43002404 |
| Qcache_lowmem_prunes | 34696486 |
| Qcache_not_cached | 655607 |
| Qcache_queries_in_cache | 4197 |
| Qcache_total_blocks | 8833 |
+-------------------------+----------+
1> SELECT * FROM T1
+---+
| a |
+---+
| 1 |
+---+
-- Here the query is cached
-- From another connection execute:
2> LOCK TABLES T1 WRITE;
-- Expected result with: query_cache_wlock_invalidate = OFF
1> SELECT * FROM T1
+---+
| a |
+---+
| 1 |
+---+
-- read from query cache
-- Expected result with: query_cache_wlock_invalidate = ON
1> SELECT * FROM T1
-- Waiting Table Write Lock
SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
+---+
| a |
+---+
| 1 |
+---+
BEGIN;
SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=1>
+---+
| a |
+---+
| 1 |
+---+
SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=1>
+---+
| a |
+---+
| 1 |
+---+
INSERT INTO T1 VALUES(2); <invalidate queries FROM TABLE T1 AND disable query cache TO TABLE T1>
SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
+---+
| a |
+---+
| 1 |
| 2 |
+---+
SELECT * FROM T1 <don't USE query cache, a normal query FROM innodb TABLE>
+---+
| a |
+---+
| 1 |
| 2 |
+---+
COMMIT; <query cache IS now turned ON TO T1 TABLE>
SELECT * FROM T1 <first INSERT TO query cache, USING FLAGS_IN_TRANS=0>
+---+
| a |
+---+
| 1 |
+---+
SELECT * FROM T1 <result FROM query cache, USING FLAGS_IN_TRANS=0>
+---+
| a |
+---+
| 1 |
+---+
struct timespec waittime;
set_timespec_nsec(waittime,(ulong)(50000000L)); /* Wait for 50 msec */
int res= mysql_cond_timedwait(&COND_cache_status_changed,
&structure_guard_mutex, &waittime);
if (res == ETIMEDOUT)
break;
SELECT SQL_NO_CACHE .... FROM (SELECT SQL_CACHE ...) AS temp_table
SELECT SQL_CACHE .... FROM (SELECT SQL_NO_CACHE ...) AS temp_table
UNIX_TIMESTAMP() (no parameters)
WHERE t1.aa = 123 AND t2.bb = 456 -- You must only consider columns in the current table.
xxx IS NOT NULL
Add the column in the range to your putative INDEX.
WHERE aaa >= 123 ORDER BY aaa ⇒ INDEX(aaa) -- Bonus: The ORDER BY will use the INDEX.
WHERE aaa >= 123 ORDER BY aaa DESC ⇒ INDEX(aaa) -- Same Bonus.
WHERE aaa = 123 GROUP BY xxx, (a+b) ⇒ INDEX(aaa) -- expression in GROUP BY, so no use including even xxx.
WHERE ccc > 432 ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc) -- This "range" is compatible with ORDER BY
You should not have redundant indexes. (See below.)
"InnoDB" -- More effecient than MyISAM because of the way the PRIMARY KEY is clustered with the data in InnoDB.
"INDEX(y_id, x_id)" -- The PRIMARY KEY makes it efficient to go one direction; this index makes the other direction efficient. No need to say UNIQUE; that would be extra effort on INSERTs.
In the secondary index, saying just INDEX(y_id) would work because it would implicitly include x_id. But I would rather make it more obvious that I am hoping for a 'covering' index.
etc.
FROM a, b WHERE a.x=b.x instead of FROM a JOIN b ON a.x=b.x
When would meta_key or meta_value ever be NULL?
Indexing 101: Optimizing MySQL queries on a single table (Stephane Combaudon - Percona)
INDEX(id) -- if id is AUTO_INCREMENT, then this plain INDEX (not UNIQUE, not PRIMARY KEY) is necessary
ENGINE=InnoDB -- so the PRIMARY KEY will be "clustered"
Other indexes -- keep to a minimum (this is a general performance rule for large tables)
Now, a 'last' SELECT is performed to get the exact distances, sort them (ORDER BY) and LIMIT to the desired number.
If spanning a pole or the dateline, a more complex SELECT is used.
Limit -- maximum number of items to return
Condition -- something to put after 'AND' (more discussion above)
INDEX(last_name, first_name) -- the order of the list.
WHERE last_name = 'James' AND first_name = 'Rick' -- best case
WHERE last_name = 'James' AND first_name LIKE 'R%' -- pretty good
WHERE last_name LIKE 'J%' AND first_name = 'Rick' -- pretty bad
SELECT ... FROM t WHERE flag = true;
SELECT ... FROM t WHERE flag = true AND date >= '2015-01-01';
INDEX(x)
UPDATE t SET x = ... WHERE ...;
INDEX(z, x)
UPDATE t SET x = ... WHERE ...;
( SELECT ... WHERE a=1 ) -- and have INDEX(a)
UNION DISTINCT -- "DISTINCT" is assuming you need to get rid of dups
( SELECT ... WHERE b=2 ) -- and have INDEX(b)
GROUP BY ... ORDER BY ... -- whatever you had at the end of the original query
( SELECT ... LIMIT 200 ) -- Note: OFFSET 0, LIMIT 190+10
UNION DISTINCT -- (or ALL)
( SELECT ... LIMIT 200 )
LIMIT 190, 10 -- Same as originally
INDEX(last_name(2), first_name)
date_col >= '2016-01-01'
AND date_col < '2016-01-01' + INTERVAL 3 MONTH
SELECT ...
FROM a
WHERE test_a
AND x IN (
SELECT x
FROM b
WHERE test_b
);
⇒
SELECT ...
FROM a
JOIN b USING(x)
WHERE test_a
AND test_b;
SELECT ...
FROM a
WHERE test_a
AND x IN ( SELECT x FROM ... );
⇒
SELECT ...
FROM a
JOIN ( SELECT x FROM ... ) b
USING(x)
WHERE test_a;
SELECT DISTINCT
a.*,
b.y
FROM a
JOIN b
⇒
SELECT a.*,
( SELECT GROUP_CONCAT(b.y) FROM b WHERE b.x = a.x ) AS ys
FROM a
SELECT a.*
FROM a
JOIN b ON b.x = a.x
GROUP BY a.id
⇒
SELECT a.*
FROM a
WHERE EXISTS ( SELECT * FROM b WHERE b.x = a.x )
CREATE TABLE XtoY (
# No surrogate id for this table
x_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to one table
y_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to the other table
# Include other fields specific to the 'relation'
PRIMARY KEY(x_id, y_id), -- When starting with X
INDEX (y_id, x_id) -- When starting with Y
) ENGINE=InnoDB;
WHERE x = 1
AND MATCH (...) AGAINST (...)
CREATE TABLE wp_postmeta (
meta_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
post_id BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
meta_key VARCHAR(255) DEFAULT NULL,
meta_value LONGTEXT,
PRIMARY KEY (meta_id),
KEY post_id (post_id),
KEY meta_key (meta_key)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE wp_postmeta (
post_id BIGINT UNSIGNED NOT NULL,
meta_key VARCHAR(255) NOT NULL,
meta_value LONGTEXT NOT NULL,
PRIMARY KEY(post_id, meta_key),
INDEX(meta_key)
) ENGINE=InnoDB;
Datatype Bytes resolution
------------------ ----- --------------------------------
Deg*100 (SMALLINT) 4 1570 m 1.0 mi Cities
DECIMAL(4,2)/(5,2) 5 1570 m 1.0 mi Cities
SMALLINT scaled 4 682 m 0.4 mi Cities
Deg*10000 (MEDIUMINT) 6 16 m 52 ft Houses/Businesses
DECIMAL(6,4)/(7,4) 7 16 m 52 ft Houses/Businesses
MEDIUMINT scaled 6 2.7 m 8.8 ft
FLOAT 8 1.7 m 5.6 ft
DECIMAL(8,6)/(9,6) 9 16cm 1/2 ft Friends in a mall
Deg*10000000 (INT) 8 16mm 5/8 in Marbles
DOUBLE 16 3.5nm ... Fleas on a dog
WHERE lat BETWEEN @my_lat - @dlat
AND @my_lat + @dlat -- PARTITION Pruning and bounding box
AND lon BETWEEN @my_lon - @dlon
AND @my_lon + @dlon -- first part of PK
AND condition -- filter out non-pizza parlors
DELIMITER //
DROP function IF EXISTS GCDist //
CREATE FUNCTION GCDist (
_lat1 DOUBLE, -- Scaled Degrees north for one point
_lon1 DOUBLE, -- Scaled Degrees west for one point
_lat2 DOUBLE, -- other point
_lon2 DOUBLE
) RETURNS DOUBLE
DETERMINISTIC
CONTAINS SQL -- SQL but does not read or write
SQL SECURITY INVOKER -- No special privileges granted
-- Input is a pair of latitudes/longitudes multiplied by 10000.
-- For example, the south pole has latitude -900000.
-- Multiply output by .0069172 to get miles between the two points
-- or by .0111325 to get kilometers
BEGIN
-- Hardcoded constant:
DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000; -- For scaled by 1e4 to MEDIUMINT
DECLARE _rlat1 DOUBLE DEFAULT _deg2rad * _lat1;
DECLARE _rlat2 DOUBLE DEFAULT _deg2rad * _lat2;
-- compute as if earth's radius = 1.0
DECLARE _rlond DOUBLE DEFAULT _deg2rad * (_lon1 - _lon2);
DECLARE _m DOUBLE DEFAULT COS(_rlat2);
DECLARE _x DOUBLE DEFAULT COS(_rlat1) - _m * COS(_rlond);
DECLARE _y DOUBLE DEFAULT _m * SIN(_rlond);
DECLARE _z DOUBLE DEFAULT SIN(_rlat1) - SIN(_rlat2);
DECLARE _n DOUBLE DEFAULT SQRT(
_x * _x +
_y * _y +
_z * _z );
RETURN 2 * ASIN(_n / 2) / _deg2rad; -- again--scaled degrees
END;
//
DELIMITER ;
DELIMITER //
-- FindNearest (about my 6th approach)
DROP PROCEDURE IF EXISTS FindNearest //
CREATE
PROCEDURE FindNearest (
IN _my_lat DOUBLE, -- Latitude of me [-90..90] (not scaled)
IN _my_lon DOUBLE, -- Longitude [-180..180]
IN _START_dist DOUBLE, -- Starting estimate of how far to search: miles or km
IN _max_dist DOUBLE, -- Limit how far to search: miles or km
IN _limit INT, -- How many items to try to get
IN _condition VARCHAR(1111) -- will be ANDed in a WHERE clause
)
DETERMINISTIC
BEGIN
-- lat and lng are in degrees -90..+90 and -180..+180
-- All computations done in Latitude degrees.
-- Thing to tailor
-- *Locations* -- the table
-- Scaling of lat, lon; here using *10000 in MEDIUMINT
-- Table name
-- miles versus km.
-- Hardcoded constant:
DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000; -- For scaled by 1e4 to MEDIUMINT
-- Cannot use params in PREPARE, so switch to @variables:
-- Hardcoded constant:
SET @my_lat := _my_lat * 10000,
@my_lon := _my_lon * 10000,
@deg2dist := 0.0069172, -- 69.172 for miles; 111.325 for km *** (mi vs km)
@start_deg := _start_dist / @deg2dist, -- Start with this radius first (eg, 15 miles)
@max_deg := _max_dist / @deg2dist,
@cutoff := @max_deg / SQRT(2), -- (slightly pessimistic)
@dlat := @start_deg, -- note: must stay positive
@lon2lat := COS(_deg2rad * @my_lat),
@iterations := 0; -- just debugging
-- Loop through, expanding search
-- Search a 'square', repeat with bigger square until find enough rows
-- If the inital probe found _limit rows, then probably the first
-- iteration here will find the desired data.
-- Hardcoded table name:
-- This is the "first SELECT":
SET @sql = CONCAT(
"SELECT COUNT(*) INTO @near_ct
FROM Locations
WHERE lat BETWEEN @my_lat - @dlat
AND @my_lat + @dlat -- PARTITION Pruning and bounding box
AND lon BETWEEN @my_lon - @dlon
AND @my_lon + @dlon -- first part of PK
AND ", _condition);
PREPARE _sql FROM @sql;
MainLoop: LOOP
SET @iterations := @iterations + 1;
-- The main probe: Search a 'square'
SET @dlon := ABS(@dlat / @lon2lat); -- good enough for now -- note: must stay positive
-- Hardcoded constants:
SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon); -- near a Pole
EXECUTE _sql;
IF ( @near_ct >= _limit OR -- Found enough
@dlat >= @cutoff ) THEN -- Give up (too far)
LEAVE MainLoop;
END IF;
-- Expand 'square':
SET @dlat := LEAST(2 * @dlat, @cutoff); -- Double the radius to search
END LOOP MainLoop;
DEALLOCATE PREPARE _sql;
-- Out of loop because found _limit items, or going too far.
-- Expand range by about 1.4 (but not past _max_dist),
-- then fetch details on nearest 10.
-- Hardcoded constant:
SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
@max_deg,
GCDist(ABS(@my_lat), @my_lon,
ABS(@my_lat) - @dlat, @my_lon - @dlon) );
-- ABS: go toward equator to find farthest corner (also avoids poles)
-- Dateline: not a problem (see GCDist code)
-- Reach for longitude line at right angle:
-- sin(dlon)*cos(lat) = sin(dlat)
-- Hardcoded constant:
SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
COS(_deg2rad * @my_lat))
/ _deg2rad -- precise
, 3600001); -- must be too near a pole
-- This is the "last SELECT":
-- Hardcoded constants:
IF (ABS(@my_lon) + @dlon < 1800000 OR -- Usual case - not crossing dateline
ABS(@my_lat) + @dlat < 900000) THEN -- crossing pole, so dateline not an issue
-- Hardcoded table name:
SET @sql = CONCAT(
"SELECT *,
@deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
FROM Locations
WHERE lat BETWEEN @my_lat - @dlat
AND @my_lat + @dlat -- PARTITION Pruning and bounding box
AND lon BETWEEN @my_lon - @dlon
AND @my_lon + @dlon -- first part of PK
AND ", _condition, "
HAVING dist <= ", _max_dist, "
ORDER BY dist
LIMIT ", _limit
);
ELSE
-- Hardcoded constants and table name:
-- Circle crosses dateline, do two SELECTs, one for each side
SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
SET @east_lon := @west_lon + 3600000;
-- One of those will be beyond +/- 180; this gets points beyond the dateline
SET @sql = CONCAT(
"( SELECT *,
@deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
FROM Locations
WHERE lat BETWEEN @my_lat - @dlat
AND @my_lat + @dlat -- PARTITION Pruning and bounding box
AND lon BETWEEN @west_lon - @dlon
AND @west_lon + @dlon -- first part of PK
AND ", _condition, "
HAVING dist <= ", _max_dist, " )
UNION ALL
( SELECT *,
@deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
FROM Locations
WHERE lat BETWEEN @my_lat - @dlat
AND @my_lat + @dlat -- PARTITION Pruning and bounding box
AND lon BETWEEN @east_lon - @dlon
AND @east_lon + @dlon -- first part of PK
AND ", _condition, "
HAVING dist <= ", _max_dist, " )
ORDER BY dist
LIMIT ", _limit
);
END IF;
PREPARE _sql FROM @sql;
EXECUTE _sql;
DEALLOCATE PREPARE _sql;
END;
//
DELIMITER ;
Sample
Find the 5 cities with non-zero population (out of 3 million) nearest to (+35.15, -90.15). Start with a 10-mile bounding box and give up at 100 miles.
CALL FindNearestLL(35.15, -90.05, 10, 100, 5, 'population > 0');
+---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
| id | lat | lon | country | ascii_city | city | state | population | @gcd_ct := 0 | dist | @gcd_ct := @gcd_ct + 1 |
+---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
| 3023545 | 351494 | -900489 | us | memphis | Memphis | TN | 641608 | 0 | 0.07478733189367963 | 3 |
| 2917711 | 351464 | -901844 | us | west memphis | West Memphis | AR | 28065 | 0 | 7.605683607627499 | 2 |
| 2916457 | 352144 | -901964 | us | marion | Marion | AR | 9227 | 0 | 9.3994963998986 | 1 |
| 3020923 | 352044 | -898739 | us | bartlett | Bartlett | TN | 43264 | 0 | 10.643941157860604 | 7 |
| 2974644 | 349889 | -900125 | us | southaven | Southaven | MS | 38578 | 0 | 11.344042217329935 | 5 |
+---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
5 rows in set (0.00 sec)
Query OK, 0 rows affected (0.04 sec)
SELECT COUNT(*) FROM ll_table;
+----------+
| COUNT(*) |
+----------+
| 3173958 |
+----------+
1 row in set (5.04 sec)
FLUSH STATUS;
CALL...
SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_read_first | 1 |
| Handler_read_key | 3 |
| Handler_read_next | 1307 | -- some index, some tmp, but far less than 3 million.
| Handler_read_rnd | 5 |
| Handler_read_rnd_next | 13 |
| Handler_write | 12 | -- it needed a tmp
+----------------------------+-------+
A 'standard' GUID/UUID is composed of the time, machine identification and some other stuff. The combination should be unique, even without coordination between different computers that could be generating UUIDs simultaneously.
The top part of the GUID/UUID is the bottom part of the current time. The top part is the primary part of what would be used for placing the value in an ordered list (INDEX). This cycles in about 7.16 minutes.
Some math... If the index is small enough to be cached in RAM, each insert into the index is CPU only, with the writes being delayed and batched. If the index is 20 times as big as can be cached, then 19 out of 20 inserts will be a cache miss. (This math applies to any "random" index.)
36 characters is bulky. If you are using that as a PRIMARY KEY in InnoDB and you have secondary keys, remember that each secondary key has an implicit copy of the PK, thereby making it bulky.
It is tempting to declare the UUID VARCHAR(36). And, since you probably are thinking globally, so you have CHARACTER SET utf8 (or utf8mb4). For utf8:
2 - Overhead for VAR
36 - chars
3 (or 4) bytes per character for utf8 (or utf8mb4). So, max length = 2 + 3*36 = 110 (or 2 + 4*36 = 146) bytes. For temp tables, 108 (or 144) bytes are actually used if a MEMORY table is used.
To compress
utf8 is unnecessary (ascii would do); but this is obviated by the next two steps
Toss dashes
UNHEX it. Now it will fit in 16 bytes: BINARY(16).
But first, a caveat. This solution only works for "Time based" / "Version 1" UUIDs. They are recognizable by the "1" at the beginning of the third clump.
The manual's sample: 6ccd780c-baba-1026-9564-0040f4311e29 . A more current value (after a few years): 49ea2de3-17a2-11e2-8346-001eecac3efa . Notice how the 3rd part has slowly changed over time? Let's rearrange the data, thus:
Now we have a number that increases nicely over time. Multiple sources won't be quite in time order, but they will be close. The "hot" spot for inserting into an INDEX(uuid) will be rather narrow, thereby making it quite cacheable and efficient.
If your SELECTs tend to be for "recent" uuids, then they, too, will be easily cached. If, on the other hand, your SELECTs often reach for old uuids, they will be random and not well cached. Still, improving the INSERTs will help the system overall.
Let's make Stored Functions to do the messy work of the two actions:
Rearrange fields
Convert to/from BINARY(16)
Then you would do things like
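Assuming a table t whose uuid column is BINARY(16), and the UuidToBin()/UuidFromBin() functions defined below:

-- Letting MySQL create the UUID:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...);
-- Creating the UUID elsewhere:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...);
-- Retrieving (point query using uuid):
SELECT ... FROM t WHERE uuid = UuidToBin(?);
-- Retrieving (other):
SELECT UuidFromBin(uuid), ... FROM t ...;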
Do not flip the WHERE; that would be inefficient because it won't use INDEX(uuid):
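WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29'   -- NO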
TokuDB has been deprecated by its upstream maintainer. It is disabled from MariaDB 10.5 and has been removed in MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-term migration path.
TokuDB is a viable engine if you must have UUIDs (even non-type-1) in a huge table. TokuDB is available in MariaDB as a 'standard' engine, making the barrier to entry very low. There are a small number of differences between InnoDB and TokuDB; I will not go into them here.
TokuDB, with its "fractal" indexing strategy, builds the indexes in stages. In contrast, InnoDB inserts index entries "immediately" -- actually that indexing is buffered by most of the size of the buffer_pool. To elaborate…
When adding a record to an InnoDB table, here are (roughly) the steps performed to write the data (and PK) and secondary indexes to disk. (I leave out logging, provision for rollback, etc.) First the PRIMARY KEY and data:
Check for UNIQUEness constraints
Fetch the BTree block (normally 16KB) that should contain the row (based on the PRIMARY KEY).
Insert the row (overflow typically occurs 1% of the time; this leads to a block split).
Leave the page "dirty" in the buffer_pool, hoping that more rows are added before it is bumped out of cache (the buffer_pool). Note that for AUTO_INCREMENT and TIMESTAMP-based PKs, the "last" block in the data will be updated repeatedly before splitting; hence, this delayed write adds greatly to the efficiency. OTOH, a UUID will be very random; when the table is big enough, the block will almost always be flushed before a second insert occurs in that block -- this is the inefficiency in UUIDs. Now for any secondary keys:
All the steps are the same, since an index is essentially a "table" except that the "data" is a copy of the PRIMARY KEY.
UNIQUEness must be checked immediately — cannot delay the read.
There are (I think) some other "delays" that avoid some I/O.
Tokudb, on the other hand, does something like
Write partially sorted data/index records to disk before finding out exactly where they belong.
In the background, combine these partially digested blocks. Repeat as needed.
Eventually move the info into the real table/indexes.
If you are familiar with how sort-merge works, consider the parallels to Tokudb. Each "sort" does some work of ordering things; each "merge" is quite efficient.
To summarize:
In the extreme (data/index much larger than buffer_pool), InnoDB must read-modify-write one 16KB disk block for each UUID entry.
Tokudb makes each I/O "count" by merging several UUIDs for each disk block. (Yeah, Toku rereads blocks, but it comes out ahead in the long run.)
Tokudb excels when the table is really big, which implies high ingestion rate.
This shows three things for speeding up usage of GUIDs/UUIDs:
Shrink footprint (Smaller -> more cacheable -> faster).
Rearrange uuid to make a "hot spot" to improve cachability.
Use TokuDB (MyRocks shares some architectural traits which may also be beneficial in handling UUIDs, but this is hypothetical and hasn't been tested)
Note that the benefit of the "hot spot" is only partial:
Chronologically ordered (or approximately ordered) INSERTs benefit; random ones don't.
SELECTs/UPDATEs by "recent" uuids benefit; old ones don't benefit.
Thanks to Trey for some of the ideas here.
The tips in this document apply to MySQL, MariaDB, and Percona.
Written Oct, 2012. Added TokuDB, Jan, 2015.
, but it seems to be backwards.
Rick James graciously allowed us to use this article in the documentation.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: uuid
This page is licensed: CC BY-SA / Gnu FDL
1026-baba-6ccd780c-9564-0040f4311e29
11e2-17a2-49ea2de3-8346-001eecac3efa
11e2-17ac-106762a5-8346-001eecac3efa -- after a few more minutes
DELIMITER //
CREATE FUNCTION UuidToBin(_uuid BINARY(36))
RETURNS BINARY(16)
LANGUAGE SQL DETERMINISTIC CONTAINS SQL SQL SECURITY INVOKER
RETURN
UNHEX(CONCAT(
SUBSTR(_uuid, 15, 4),
SUBSTR(_uuid, 10, 4),
SUBSTR(_uuid, 1, 8),
SUBSTR(_uuid, 20, 4),
SUBSTR(_uuid, 25) ));
//
CREATE FUNCTION UuidFromBin(_bin BINARY(16))
RETURNS BINARY(36)
LANGUAGE SQL DETERMINISTIC CONTAINS SQL SQL SECURITY INVOKER
RETURN
LCASE(CONCAT_WS('-',
HEX(SUBSTR(_bin, 5, 4)),
HEX(SUBSTR(_bin, 3, 2)),
HEX(SUBSTR(_bin, 1, 2)),
HEX(SUBSTR(_bin, 9, 2)),
HEX(SUBSTR(_bin, 11))
));
//
DELIMITER ;
-- Letting MySQL create the UUID:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...);
-- Creating the UUID elsewhere:
INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...);
-- Retrieving (point query using uuid):
SELECT ... FROM t WHERE uuid = UuidToBin(?);
-- Retrieving (other):
SELECT UuidFromBin(uuid), ... FROM t ...;WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29' -- NOThe optimizer is largely cost-based and will try to choose the optimal plan for any query. However in some cases it does not have enough information to choose a perfect plan and in these cases you may have to provide hints to force the optimizer to use another plan.
You can examine the query plan for a SELECT by writing EXPLAIN before the statement. SHOW EXPLAIN shows the output of a running query. In some cases, its output can be closer to reality than EXPLAIN.
For the following queries, we will use the world database for the examples.
Download it from
Install it with:
or
You can force the join order by using STRAIGHT_JOIN, either in the SELECT part or in the FROM clause.
The simplest way to force the join order is to put the tables in the correct order in the FROM clause and use SELECT STRAIGHT_JOIN like so:
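SELECT STRAIGHT_JOIN SUM(City.Population) FROM Country,City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";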
If you only want to force the join order for a few tables, use STRAIGHT_JOIN in the FROM clause. When this is done, only tables connected with STRAIGHT_JOIN will have their order forced. For example:
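SELECT SUM(City.Population) FROM Country STRAIGHT_JOIN City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";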
In both of the above cases Country will be scanned first and for each matching country (one in this case) all rows in City will be checked for a match. As there is only one matching country this will be faster than the original query.
The output of EXPLAIN for the above cases is:
This is one of the few cases where ALL is ok, as the scan of the Country table will only find one matching row.
In some cases the optimizer may choose a non-optimal index or it may choose to not use an index at all, even if some index could theoretically be used.
In these cases you have the option to either tell the optimizer to only use a limited set of indexes, ignore one or more indexes, or force the usage of some particular index.
You can limit which indexes are considered with the USE INDEX option.
The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.
USE INDEX is used after the table name in the FROM clause.
Example:
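Using the world.City table with indexes on Name and CountryCode:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);

EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";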
This produces:
If we had not used USE INDEX, the Name index would have been in possible keys.
You can tell the optimizer to not consider some particular index with the IGNORE INDEX option.
This is used after the table name in the FROM clause:
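EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";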
This produces:
The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.
Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.
Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index.)
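For example, forcing the Name index for a range condition on world.City:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";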
This produces:
FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
When using index hints (USE, FORCE or IGNORE), the index name value can also be an unambiguous prefix of an index name.
The optimizer will try to use indexes to resolve ORDER BY and GROUP BY.
You can use USE INDEX, IGNORE INDEX and FORCE INDEX as in the WHERE clause above
to ensure that a specific index is used:
This is used after the table name in the FROM clause.
Example:
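CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,COUNT(*) FROM City
FORCE INDEX FOR GROUP BY (Name)
WHERE population >= 10000000 GROUP BY Name;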
This produces:
Without the FORCE INDEX FOR GROUP BY option we would have 'Using where; Using temporary; Using filesort' in the 'Extra' column, which means that the optimizer would have created a temporary table and sorted it.
The optimizer uses several strategies to optimize GROUP BY and ORDER BY:
Resolve with an index:
Scan the table in index order and output data as we go. (This only works if the GROUP BY/ORDER BY can be resolved by an index after constant propagation is done).
Filesort:
Scan the table to be sorted and collect the sort keys in a temporary file.
A temporary table will always be used if the fields which will be sorted are not from the first table in the order.
Use a temporary table for GROUP BY:
Create a temporary table to hold the result with an index that matches the fields.
Produce a result row
If a row with the key exists in the temporary table, add the new result row to it. If not, create a new row.
Using an in-memory table (as described above) is usually the fastest option for GROUP BY if the result set is small. It is not optimal if the result set is very big. You can tell the optimizer this by using SELECT SQL_SMALL_RESULT or SELECT SQL_BIG_RESULT.
For example:
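EXPLAIN SELECT SQL_SMALL_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;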
produces:
while:
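EXPLAIN SELECT SQL_BIG_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;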
produces:
The difference is that with SQL_SMALL_RESULT a temporary table is used.
In some cases you may want to force the use of a temporary table for the result to free up the table/row locks for the used tables as quickly as possible.
You can do this with the SQL_BUFFER_RESULT option:
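CREATE INDEX Name ON City (Name);
EXPLAIN SELECT SQL_BUFFER_RESULT Name,COUNT(*) AS Cities FROM City
GROUP BY Name HAVING Cities > 2;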
This produces:
Without SQL_BUFFER_RESULT, the above query would not use a temporary table for the result set.
The optimizer_switch system variable allows you to specify which algorithms will be considered when optimizing a query.
See the section for more information about the different algorithms which are used.
This page is licensed: CC BY-SA / Gnu FDL
NULL
NULL
239
Using where
1
SIMPLE
City
ALL
NULL
NULL
NULL
NULL
4079
Using where; Using join buffer (flat, BNL join)
3
const
14
Using where
3
const
14
Using where
35
NULL
4079
Using where
35
NULL
4079
Using where
Sort the keys + reference to row (with filesort)
Scan the table in sorted order
Before sending the results to the user, sort the rows with filesort to get the results in the GROUP BY order.
NULL
NULL
4079
Using temporary; Using filesort
NULL
NULL
4079
Using filesort
35
NULL
4079
Using index; Using temporary
1
SIMPLE
Country
ALL
PRIMARY
1
SIMPLE
City
ref
CountryCode
1
SIMPLE
City
ref
CountryCode
1
SIMPLE
City
range
Name
1
SIMPLE
City
index
NULL
1
SIMPLE
City
ALL
NULL
1
SIMPLE
City
ALL
NULL
1
SIMPLE
City
index
NULL
NULL
CountryCode
CountryCode
Name
Name
NULL
NULL
Name
mariadb-admin create world
zcat world.sql.gz | ../client/mysql world
mariadb-admin create world
gunzip world.sql.gz
../client/mysql world < world.sql
SELECT STRAIGHT_JOIN SUM(City.Population) FROM Country,City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";SELECT SUM(City.Population) FROM Country STRAIGHT_JOIN City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";
USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";IGNORE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,COUNT(*) FROM City
FORCE INDEX FOR GROUP BY (Name)
WHERE population >= 10000000 GROUP BY Name;
EXPLAIN SELECT SQL_SMALL_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;
EXPLAIN SELECT SQL_BIG_RESULT Name,Count(*) AS Cities
FROM City GROUP BY Name HAVING Cities > 2;
CREATE INDEX Name ON City (Name);
EXPLAIN SELECT SQL_BUFFER_RESULT Name,COUNT(*) AS Cities FROM City
GROUP BY Name HAVING Cities > 2;
Stopwords are used to provide a list of commonly-used words that can be ignored for the purposes of full-text indexes.
Full-text indexes built in MyISAM and InnoDB have different stopword lists by default.
For full-text indexes on MyISAM tables, by default, the list is built from the file storage/myisam/ft_static.c, and searched using the server's character set and collation. The ft_stopword_file system variable allows the default list to be overridden with words from another file, or for stopwords to be ignored altogether.
If the stopword list is changed, any existing full-text indexes need to be rebuilt.
The following table shows the default list of stopwords, although you should always treat storage/myisam/ft_static.c as the definitive list. See the for more details, and for related articles.
Stopwords on InnoDB full-text indexes are only enabled if the innodb_ft_enable_stopword system variable is set (by default it is) at the time the index was created.
The stopword list is determined as follows:
If the innodb_ft_user_stopword_table system variable is set, that table is used as a stopword list.
If innodb_ft_user_stopword_table is not set, the table set by innodb_ft_server_stopword_table is used.
If neither variable is set, the built-in list is used, which can be viewed by querying the INNODB_FT_DEFAULT_STOPWORD table in the Information Schema.
In the first two cases, the specified table must exist at the time the system variable is set and the full-text index created. It must be an InnoDB table with a single column, a VARCHAR named value.
The default InnoDB stopword list differs from the default MyISAM list, being much shorter, and contains the following words:
This page is licensed: CC BY-SA / Gnu FDL
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
aside
ask
asking
associated
at
available
away
awfully
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
c'mon
c's
came
can
can't
cannot
cant
cause
causes
certain
certainly
changes
clearly
co
com
come
comes
concerning
consequently
consider
considering
contain
containing
contains
corresponding
could
couldn't
course
currently
definitely
described
despite
did
didn't
different
do
does
doesn't
doing
don't
done
down
downwards
during
each
edu
eg
eight
either
else
elsewhere
enough
entirely
especially
et
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
exactly
example
except
far
few
fifth
first
five
followed
following
follows
for
former
formerly
forth
four
from
further
furthermore
get
gets
getting
given
gives
go
goes
going
gone
got
gotten
greetings
had
hadn't
happens
hardly
has
hasn't
have
haven't
having
he
he's
hello
help
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
hi
him
himself
his
hither
hopefully
how
howbeit
however
i'd
i'll
i'm
i've
ie
if
ignored
immediate
in
inasmuch
inc
indeed
indicate
indicated
indicates
inner
insofar
instead
into
inward
is
isn't
it
it'd
it'll
it's
its
itself
just
keep
keeps
kept
know
knows
known
last
lately
later
latter
latterly
least
less
lest
let
let's
like
liked
likely
little
look
looking
looks
ltd
mainly
many
may
maybe
me
mean
meanwhile
merely
might
more
moreover
most
mostly
much
must
my
myself
name
namely
nd
near
nearly
necessary
need
needs
neither
never
nevertheless
new
next
nine
no
nobody
non
none
noone
nor
normally
not
nothing
novel
now
nowhere
obviously
of
off
often
oh
ok
okay
old
on
once
one
ones
only
onto
or
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
own
particular
particularly
per
perhaps
placed
please
plus
possible
presumably
probably
provides
que
quite
qv
rather
rd
re
really
reasonably
regarding
regardless
regards
relatively
respectively
right
said
same
saw
say
saying
says
second
secondly
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sensible
sent
serious
seriously
seven
several
shall
she
should
shouldn't
since
six
so
some
somebody
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specified
specify
specifying
still
sub
such
sup
sure
t's
take
taken
tell
tends
th
than
thank
thanks
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
theres
thereupon
these
they
they'd
they'll
they're
they've
think
third
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
to
together
too
took
toward
towards
tried
tries
truly
try
trying
twice
two
un
under
unfortunately
unless
unlikely
until
unto
up
upon
us
use
used
useful
uses
using
usually
value
various
very
via
viz
vs
want
wants
was
wasn't
way
we
we'd
we'll
we're
we've
welcome
well
went
were
weren't
what
what's
whatever
when
whence
whenever
where
where's
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
who's
whoever
whole
whom
whose
why
will
willing
wish
with
within
without
won't
wonder
would
wouldn't
yes
yet
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves
zero
is
it
la
of
on
or
that
the
this
to
was
what
when
where
who
will
with
und
the
www
a's
able
about
above
according
accordingly
across
actually
after
afterwards
again
against
ain't
all
allow
allows
a
about
an
are
as
at
be
by
com
de
en
for
from
how
i
in