
Optimization and Indexes

Optimize MariaDB Server queries with indexes. This section covers index types, creation, and best practices for leveraging them to significantly improve query performance and data retrieval speed.

Building the best INDEX for a given SELECT

The problem

You have a SELECT and you want to build the best INDEX for it. This blog is a "cookbook" on how to do that task.

  • A short algorithm that works for many simpler SELECTs and helps in complex queries.

  • Examples of the algorithm, plus digressions into exceptions and variants.

  • Finally, a long list of "other cases".

The hope is that a newbie can quickly get up to speed, and his/her INDEXes will no longer smack of "newbie".

Many edge cases are explained, so even an expert may find something useful here.

Algorithm

Here's the way to approach creating an INDEX, given a SELECT. Follow the steps below, gathering columns to put in the INDEX in order. When the steps give out, you usually have the 'perfect' index.

  1. Given a WHERE with a bunch of expressions connected by AND: Include the columns (if any), in any order, that are compared to a constant and not hidden in a function.

  2. You get one more chance to add to the INDEX; do the first of these that applies:

  • 2a. One column used in a 'range' -- BETWEEN, '>', LIKE w/o leading wildcard, etc.

  • 2b. All columns, in order, of the GROUP BY.

  • 2c. All columns, in order, of the ORDER BY if there is no mixing of ASC and DESC.

Digression

This blog assumes you know the basic idea behind having an INDEX. Here is a refresher on some of the key points.

Virtually all INDEXes in MySQL are structured as BTrees. BTrees allow very efficient operations for:

  • Given a key, find the corresponding row(s);

  • "Range scans" -- That is start at one value for the key and repeatedly find the "next" (or "previous") row.

A PRIMARY KEY is a UNIQUE KEY; a UNIQUE KEY is an INDEX. ("KEY" == "INDEX".)

InnoDB "clusters" the PRIMARY KEY with the data. Hence, given the value of the PK ("PRIMARY KEY"), after drilling down the BTree to find the index entry, you have all the columns of the row when you get there. A "secondary key" (any UNIQUE or INDEX other than the PK) in InnoDB first drills down the BTree for the secondary index, where it finds a copy of the PK. Then it drills down the PK to find the row.

Every InnoDB table has a PRIMARY KEY. While there is a default if you do not specify one, it is best to explicitly provide a PK.

For completeness: MyISAM works differently. All indexes (including the PK) are in separate BTrees. The leaf nodes of such BTrees have a pointer (usually a byte offset) into the data file.

All discussion here assumes InnoDB tables; however, most statements apply to other Engines.

First, some examples

Think of a list of names, sorted by last_name, then first_name. You have undoubtedly seen such lists, and they often have other information such as address and phone number. Suppose you wanted to look me up. If you remember my full name ('James' and 'Rick'), it is easy to find my entry. If you remembered only my last name ('James') and first initial ('R'), you would quickly zoom in on the Jameses and find the Rs among them. There, you might remember 'Rick' and ignore 'Ronald'. But suppose you remembered my first name ('Rick') and only my last initial ('J'). Now you are in trouble. You would be scanning all the Js -- Jones, Rick; Johnson, Rick; Jamison, Rick; etc., etc. That's much less efficient.

Those equate to

    INDEX(last_name, first_name) -- the order of the list.
    WHERE last_name = 'James' AND first_name = 'Rick'  -- best case
    WHERE last_name = 'James' AND first_name LIKE 'R%' -- pretty good
    WHERE last_name LIKE 'J%' AND first_name = 'Rick'  -- pretty bad

Think about this example as I talk about "=" versus "range" in the Algorithm, below.

Algorithm, step 1 (WHERE "column = const")

  • WHERE aaa = 123 AND ... : an INDEX starting with aaa is good.

  • WHERE aaa = 123 AND bbb = 456 AND ... : an INDEX starting with aaa and bbb is good. In this case, it does not matter whether aaa or bbb comes first in the INDEX.

  • xxx IS NULL : this acts like "= const" for this discussion.

  • WHERE t1.aa = 123 AND t2.bb = 456 -- You must only consider columns in the current table.

Note that the expression must be of the form column_name = (constant). These do not apply to this step in the Algorithm: DATE(dt) = '...', LOWER(s) = '...', CAST(s ...) = '...', x = '...' COLLATE ...
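For example, a minimal sketch of the rewrite, assuming dt is an indexed DATETIME column:

    -- Hidden in a function; cannot use INDEX(dt):
    WHERE DATE(dt) = '2016-01-01'
    -- Equivalent range; can use INDEX(dt):
    WHERE dt >= '2016-01-01'
      AND dt <  '2016-01-01' + INTERVAL 1 DAY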

(If there are no "=" parts AND'd in the WHERE clause, move on to step 2 without any columns in your putative INDEX.)

Algorithm, step 2

Find the first of 2a / 2b / 2c that applies; use it; then quit. If none apply, then you are through gathering columns for the index.

In some cases it is optimal to do step 1 (all equals) plus step 2c (ORDER BY).

Algorithm, step 2a (one range)

A "range" shows up as

  • aaa >= 123 -- any of <, <=, >=, >; but not <>, !=

  • aaa BETWEEN 22 AND 44

  • sss LIKE 'blah%' -- but not sss LIKE '%blah'

  • xxx IS NOT NULL

Add the column in the range to your putative INDEX.

If there are more parts to the WHERE clause, you must stop now.

Complete examples (assume nothing else comes after the snippet)

  • WHERE aaa >= 123 AND bbb = 1 ⇒ INDEX(bbb, aaa) (WHERE order does not matter; INDEX order does)

  • WHERE aaa >= 123 ⇒ INDEX(aaa)

  • WHERE aaa >= 123 AND ccc > 'xyz' ⇒ INDEX(aaa) or INDEX(ccc) (only one range)

  • WHERE aaa >= 123 ORDER BY aaa ⇒ INDEX(aaa) (Bonus: the ORDER BY will use the INDEX.)

  • WHERE aaa >= 123 ORDER BY aaa DESC ⇒ INDEX(aaa) (same Bonus)

Algorithm, step 2b (GROUP BY)

If there is a GROUP BY, all the columns of the GROUP BY should now be added, in the specified order, to the INDEX you are building. (I do not know what happens if one of the columns is already in the INDEX.)

If you are GROUPing BY an expression (including function calls), you cannot use the GROUP BY; stop.

Complete examples (assume nothing else comes after the snippet)

  • WHERE aaa = 123 AND bbb = 1 GROUP BY ccc ⇒ INDEX(bbb, aaa, ccc) or INDEX(aaa, bbb, ccc) (='s first, in any order; then the GROUP BY)

  • WHERE aaa >= 123 GROUP BY xxx ⇒ INDEX(aaa) (You should have stopped with Step 2a)

  • GROUP BY x,y ⇒ INDEX(x,y) (no WHERE)

  • WHERE aaa = 123 GROUP BY xxx, (a+b) ⇒ INDEX(aaa) (expression in the GROUP BY, so no use including even xxx)

Algorithm, step 2c (ORDER BY)

If there is an ORDER BY, all the columns of the ORDER BY should now be added, in the specified order, to the INDEX you are building.

If there are multiple columns in the ORDER BY, and there is a mixture of ASC and DESC, do not add the ORDER BY columns; they won't help; stop.

If you are ORDERing BY an expression (including function calls), you cannot use the ORDER BY; stop.

Complete examples (assume nothing else comes after the snippet)

  • WHERE aaa = 123 GROUP BY ccc ORDER BY ddd ⇒ INDEX(aaa, ccc) -- should have stopped with Step 2b

  • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc ⇒ INDEX(aaa, ccc) -- the ccc will be used for both GROUP BY and ORDER BY

  • WHERE aaa = 123 ORDER BY xxx ASC, yyy DESC ⇒ INDEX(aaa) -- mixture of ASC and DESC.

The following are especially good. Normally a LIMIT cannot be applied until after lots of rows are gathered and then sorted according to the ORDER BY. But, if the INDEX gets all the way through the ORDER BY, only (OFFSET + LIMIT) rows need to be gathered. So, in these cases, you win the lottery with your new index:

  • WHERE aaa = 123 GROUP BY ccc ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

  • WHERE aaa = 123 ORDER BY ccc LIMIT 10 ⇒ INDEX(aaa, ccc)

  • ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc)

  • WHERE ccc > 432 ORDER BY ccc LIMIT 10 ⇒ INDEX(ccc) -- this "range" is compatible with the ORDER BY

(It does not make much sense to have a LIMIT without an ORDER BY, so I do not discuss that case.)

Algorithm end

You have collected a few columns; put them in an INDEX and ADD that to the table. That will often produce a "good" index for the SELECT you have. Below are some other suggestions that may be relevant.
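For example, if the Algorithm yielded aaa then ccc (hypothetical table and column names):

    ALTER TABLE t ADD INDEX aaa_ccc (aaa, ccc);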

An example of the Algorithm being 'wrong':

    SELECT ... FROM t WHERE flag = true;

This would (according to the Algorithm) call for INDEX(flag). However, indexing a column that has two (or a small number of) values is almost always useless. This is called 'low cardinality'. The Optimizer would prefer to do a table scan rather than bounce between the index BTree and the data.

On the other hand, the Algorithm is 'right' with

    SELECT ... FROM t WHERE flag = true AND date >= '2015-01-01';

That would call for a compound index starting with a flag: INDEX(flag, date). Such an index is likely to be very beneficial. And it is likely to be more beneficial than INDEX(date).

If your resulting INDEX includes column(s) that are likely to be UPDATEd, note that the UPDATE will have extra work to remove a 'row' from one place in the INDEX's BTree and insert a 'row' back into the BTree. For example:

    INDEX(x)
    UPDATE t SET x = ... WHERE ...;

There are too many variables to say whether it is better to keep the index or to toss it.

In this case, shortening the index may be beneficial:

    INDEX(z, x)
    UPDATE t SET x = ... WHERE ...;

Changing to INDEX(z) would make for less work for the UPDATE, but might hurt some SELECTs. It depends on the frequency of each, plus many more factors.

Limitations

(There are exceptions to some of these.)

  • You may not create an index bigger than 3KB.

  • You may not include a column that equates to bigger than some value (767 bytes -- VARCHAR(255) CHARACTER SET utf8).

  • You can deal with big fields using "prefix" indexing; but see below.

  • You should not have more than 5 columns in an index. (This is just a Rule of Thumb; nothing prevents having more.)

  • You should not have redundant indexes. (See below.)

Flags and low cardinality

INDEX(flag) is almost never useful if flag has very few values. More specifically, when you say WHERE flag = 1 and "1" occurs more than 20% of the time, such an index will be shunned. The Optimizer would prefer to scan the table instead of bouncing back and forth between the index and the data for more than 20% of the rows.

("20%" is really somewhere between 10% and 30%, depending on the phase of the moon.)

"Covering" indexes

A "Covering" index is an index that contains all the columns in the SELECT. It is special in that the SELECT can be completed by looking only at the INDEX BTree. (Since InnoDB's PRIMARY KEY is clustered with the data, "covering" is of no benefit when considering at the PRIMARY KEY.)

Mini-cookbook:

  1. Gather the list of column(s) according to the "Algorithm", above.

  2. Add to the end of the list the rest of the columns seen in the SELECT, in any order.

Examples:

  • SELECT x FROM t WHERE y = 5; ⇒ INDEX(y,x) -- The algorithm said just INDEX(y)

  • SELECT x,z FROM t WHERE y = 5 AND q = 7; ⇒ INDEX(y,q,x,z) -- y and q in either order (Algorithm), then x and z in either order (covering).

  • SELECT x FROM t WHERE y > 5 AND q > 7; ⇒ INDEX(y,q,x) -- y or q first (that's as far as the Algorithm goes), then the other two fields afterwards.

The speedup you get might be minor, or it might be spectacular; it is hard to predict.

But...

  • It is not wise to build an index with lots of columns. Let's cut it off at 5 (Rule of Thumb).

  • Prefix indexes cannot 'cover', so don't use them anywhere in a 'covering' index.

  • There are limits (3KB?) on how 'wide' an index can be, so "covering" may not be possible.

Redundant/excessive indexes

INDEX(a,b) can find anything that INDEX(a) could find. So you don't need both. Get rid of the shorter one.
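A sketch, with hypothetical index names:

    -- INDEX(a) is redundant once INDEX(a,b) exists
    ALTER TABLE t
        DROP INDEX idx_a,
        ADD INDEX idx_a_b (a, b);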

If you have lots of SELECTs and they generate lots of INDEXes, this may cause a different problem. Each index must be updated (sooner or later) for each INSERT. More indexes ⇒ slower INSERTs. Limit the number of indexes on a table to about 6 (Rule of Thumb).

Notice in the cookbook how it says "in any order" in a few places. If, for example, you have both of these (in different SELECTs):

  • WHERE a=1 AND b=2 begs for either INDEX(a,b) or INDEX(b,a)

  • WHERE a>1 AND b=2 begs only for INDEX(b,a)

Include only INDEX(b,a), since it handles both cases with only one INDEX.

Suppose you have a lot of indexes, including (a,b,c,dd) and (a,b,c,ee). Those are getting rather long. Consider either picking one of them, or having simply (a,b,c). Sometimes the selectivity of (a,b,c) is so good that tacking on 'dd' or 'ee' does not make enough difference to matter.

Optimizer picks ORDER BY

The main cookbook skips over an important optimization that is sometimes used. The optimizer will sometimes ignore the WHERE and, instead, use an INDEX that matches the ORDER BY. This, of course, needs to be a perfect match -- all columns, in the same order. And all ASC or all DESC.

This becomes especially beneficial if there is a LIMIT.

But there is a problem. There could be two situations, and the Optimizer is sometimes not smart enough to see which case applies:

  • If the WHERE does very little filtering, fetching the rows in ORDER BY order avoids a sort and has little wasted effort (because of 'little filtering'). Using the INDEX matching the ORDER BY is better in this case.

  • If the WHERE does a lot of filtering, the ORDER BY is wasting a lot of time fetching rows only to filter them out. Using an INDEX matching the WHERE clause is better.

What should you do? If you think the "little filtering" is likely, then create an index with the ORDER BY columns in order and hope that the Optimizer uses it when it should.

OR

Cases...

  • WHERE a=1 OR a=2 -- This is turned into WHERE a IN (1,2) and optimized that way.

  • WHERE a=1 OR b=2 usually cannot be optimized.

  • WHERE x.a=1 OR y.b=2 -- this is even worse, because it involves two different tables.

A workaround is to use UNION. Each part of the UNION is optimized separately. For the second case:

    ( SELECT ... WHERE a=1 )   -- and have INDEX(a)
    UNION DISTINCT             -- "DISTINCT" is assuming you need to get rid of dups
    ( SELECT ... WHERE b=2 )   -- and have INDEX(b)
    GROUP BY ... ORDER BY ...  -- whatever you had at the end of the original query

Now the query can take good advantage of two different indexes. Note: "Index merge" might kick in on the original query, but it is not necessarily any faster than the UNION. See the sister blog on compound indexes, including 'Index Merge'.

The third case (OR across 2 tables) is similar to the second.

If you originally had a LIMIT, UNION gets complicated. If you started with ORDER BY z LIMIT 190, 10, then the UNION needs to be

    ( SELECT ... LIMIT 200 )   -- Note: OFFSET 0, LIMIT 190+10
    UNION DISTINCT             -- (or ALL)
    ( SELECT ... LIMIT 200 )
    LIMIT 190, 10              -- Same as originally

TEXT / BLOB

You cannot directly index a TEXT or BLOB or large VARCHAR or large BINARY column. However, you can use a "prefix" index: INDEX(foo(20)). This says to index the first 20 characters of foo. But... It is rarely useful.

Example of a prefix index:

    INDEX(last_name(2), first_name)

The index for me would contain 'Ja', 'Rick'. That's not useful for distinguishing between 'Jamison', 'Jackson', 'James', etc., so the index is so close to useless that the optimizer often ignores it.

Probably never do UNIQUE(foo(20)) because this applies a uniqueness constraint on the first 20 characters of the column, not the whole column!
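A sketch of the trap, with a hypothetical table:

    CREATE TABLE u (
        foo VARCHAR(100),
        UNIQUE(foo(20))
    );
    INSERT INTO u VALUES ('aaaaaaaaaaaaaaaaaaaa-one');  -- ok
    INSERT INTO u VALUES ('aaaaaaaaaaaaaaaaaaaa-two');  -- duplicate-key error:
                                                        -- the first 20 characters collide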

Dates

DATE, DATETIME, etc. are tricky to compare against.

Some tempting, but inefficient, techniques:

    date_col LIKE '2016-01%'       -- must convert date_col to a string, so acts like a function
    LEFT(date_col, 7) = '2016-01'  -- hiding the column in a function
    DATE(date_col) = 2016          -- hiding the column in a function

All must do a full scan. (On the other hand, it can be handy to use GROUP BY LEFT(date_col, 7) for monthly grouping, but that is not an INDEX issue.)

This is efficient, and can use an index:

    date_col >= '2016-01-01'
        AND date_col  < '2016-01-01' + INTERVAL 3 MONTH

This case works because both right-hand values are converted to constants, then it is a "range". I like the design pattern with INTERVAL because it avoids computing the last day of the month. And it avoids tacking on '23:59:59', which is wrong if you have microsecond times. (And other cases.)

EXPLAIN Key_len

Perform EXPLAIN SELECT ... (and EXPLAIN FORMAT=JSON SELECT ... if you have MySQL 5.6.5 or later). Look at the Key that it chose, and the Key_len. From those you can deduce how many columns of the index are being used for filtering. (JSON makes it easier to get the answer.) From that you can decide whether it is using as much of the INDEX as you thought. Caveat: Key_len only covers the WHERE part of the action; the non-JSON output won't easily say whether GROUP BY or ORDER BY was handled by the index.
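A hedged illustration, assuming a hypothetical table t with NULLable INT columns a and b and INDEX(a, b):

    EXPLAIN SELECT * FROM t WHERE a = 1 AND b > 5;
    -- Suppose the output shows key: a_b and key_len: 10.
    -- A NULLable INT key part takes 4+1 bytes, so key_len 10 means both a and b
    -- are used for filtering; key_len 5 would mean only a is used.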

IN

IN (1,99,3) is sometimes optimized as efficiently as "=", but not always. Older versions of MySQL did not optimize it as well as newer versions. (5.6 is possibly the main turning point.)

IN ( SELECT ... )

From version 4.1 through 5.5, IN ( SELECT ... ) was very poorly optimized. The SELECT was effectively re-evaluated every time. Often it can be transformed into a JOIN, which works much faster. Here is a pattern to follow:

    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN (
            SELECT  x
                FROM  b
                WHERE  test_b
                    );
    ⇒
    SELECT  ...
        FROM  a
        JOIN  b USING(x)
        WHERE  test_a
          AND  test_b;

The SELECT expressions will need "a." prefixing the column names.

Alas, there are cases where the pattern is hard to follow.

5.6 does some optimizing, but probably not as good as the JOIN.

If there is a JOIN or GROUP BY or ORDER BY LIMIT in the subquery, that complicates the JOIN in the new format. So, it might be better to use this pattern:

    SELECT  ...
        FROM  a
        WHERE  test_a
          AND  x IN ( SELECT  x  FROM ... );
    ⇒
    SELECT  ...
        FROM  a
        JOIN        ( SELECT  x  FROM ... ) b
            USING(x)
        WHERE  test_a;

Caveat: If you end up with two subqueries JOINed together, note that neither has any indexes, hence performance can be very bad. (5.6 improves on it by dynamically creating indexes for subqueries.)

There is work going on in MariaDB and Oracle 5.7 in relation to "NOT IN", "NOT EXISTS", and "LEFT JOIN ... IS NULL"; there is an old discussion on the topic. So, what I say here may not be the final word.

Explode/Implode

When you have a JOIN and a GROUP BY, you may have the situation where the JOIN exploded more rows than the original query (due to many:many), but you wanted only one row from the original table, so you added the GROUP BY to implode back to the desired set of rows.

This explode + implode, itself, is costly. It would be better to avoid them if possible.

Sometimes the following will work, using DISTINCT or GROUP BY to counteract the explosion:

    SELECT  DISTINCT
            a.*,
            b.y
        FROM a
        JOIN b
    ⇒
    SELECT  a.*,
            ( SELECT GROUP_CONCAT(b.y) FROM b WHERE b.x = a.x ) AS ys
        FROM a

When using the second table just to check for existence:

    SELECT  a.*
        FROM a
        JOIN b  ON b.x = a.x
        GROUP BY a.id
    ⇒
    SELECT  a.*
        FROM a
        WHERE EXISTS ( SELECT *  FROM b  WHERE b.x = a.x )

Many-to-many mapping table

Do it this way.

    CREATE TABLE XtoY (
            # No surrogate id for this table
            x_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to one table
            y_id MEDIUMINT UNSIGNED NOT NULL,   -- For JOINing to the other table
            # Include other fields specific to the 'relation'
            PRIMARY KEY(x_id, y_id),            -- When starting with X
            INDEX      (y_id, x_id)             -- When starting with Y
        ) ENGINE=InnoDB;

Notes:

  • Lack of an AUTO_INCREMENT id for this table -- The PK given is the 'natural' PK; there is no good reason for a surrogate.

  • "MEDIUMINT" -- This is a reminder that all INTs should be made as small as is safe (smaller ⇒ faster). Of course the declaration here must match the definition in the table being linked to.

  • "UNSIGNED" -- Nearly all INTs may as well be declared non-negative

  • "NOT NULL" -- Well, that's true, isn't it?

To conditionally INSERT new links, use IODKU (INSERT ... ON DUPLICATE KEY UPDATE).
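A minimal sketch of such an IODKU, assuming the XtoY table above and hypothetical id values:

    INSERT INTO XtoY (x_id, y_id)
        VALUES (123, 456)
        ON DUPLICATE KEY UPDATE x_id = x_id;  -- no-op when the link already exists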

Note that if you had an AUTO_INCREMENT in this table, IODKU would "burn" ids quite rapidly.

Subqueries and UNIONs

Each subquery SELECT and each SELECT in a UNION can be considered separately for finding the optimal INDEX.

Exception: In a "correlated" ("dependent") subquery, the part of the WHERE that depends on the outside table is not easily factored into the INDEX generation. (Cop out!)

JOINs

The first step is to decide what order the optimizer will go through the tables. If you cannot figure it out, then you may need to be pessimistic and create two indexes for each table -- one assuming the table will be used first, one assuming that it will come later in the table order.

The optimizer usually starts with one table and extracts the data needed from it. As it finds a useful (that is, matches the WHERE clause, if any) row, it reaches into the 'next' table. This is called NLJ ("Nested Loop Join"). The process of filtering and reaching to the next table continues through the rest of the tables.

The optimizer usually picks the "first" table based on these hints:

  • STRAIGHT_JOIN forces the table order.

  • The WHERE clause limits which rows needed (whether indexed or not).

  • The table to the "left" in a LEFT JOIN usually comes before the "right" table. (By looking at the table definitions, the optimizer may decide that "LEFT" is irrelevant.)

  • The current INDEXes will encourage an order.

Running EXPLAIN tells you the table order that the Optimizer is very likely to use today. After adding a new INDEX, the optimizer may pick a different table order. You should anticipate the order changing, guess at what order makes the most sense, and build the INDEXes accordingly. Then rerun EXPLAIN to see if the Optimizer's brain was on the same wavelength you were on.

You should build the INDEX for the "first" table based on any parts of the WHERE, GROUP BY, and ORDER BY clauses that are relevant to it. If a GROUP/ORDER BY mentions a different table, you should ignore that clause.

The second (and subsequent) table will be reached into based on the ON clause. (Instead of using commajoin -- that is, FROM a, b WHERE a.x=b.x instead of FROM a JOIN b ON a.x=b.x -- please write JOINs with the JOIN keyword and ON clause!) In addition, there could be parts of the WHERE clause that are relevant. GROUP/ORDER BY are not to be considered in writing the optimal INDEX for subsequent tables.
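As a sketch, with hypothetical tables a and b, and assuming the optimizer starts with a:

    SELECT  b.y
        FROM  a
        JOIN  b ON b.x = a.x
        WHERE  a.flag = 1
          AND  a.d >= '2016-01-01';
    -- For a (the "first" table): INDEX(flag, d)  -- Step 1 (=), then Step 2a (range)
    -- For b (reached via ON):    INDEX(x)        -- or INDEX(x, y) to make it covering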

PARTITIONing

PARTITIONing is rarely a substitute for a good INDEX.

PARTITION BY RANGE is a technique that is sometimes useful when indexing fails to be good enough. In a two-dimensional situation such as nearness in a geographical sense, one dimension can partially be handled by partition pruning; then the other dimension can be handled by a regular index (preferably the PRIMARY KEY). The "Find nearest 10 pizza parlors" discussion goes into more detail.
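A minimal sketch of the pattern, with a hypothetical table (one coarse dimension pruned by partitioning, the other handled by the PRIMARY KEY):

    CREATE TABLE locations (
        lat_band TINYINT NOT NULL,   -- coarse latitude band; used for partition pruning
        lon MEDIUMINT NOT NULL,      -- finer dimension; handled by the PRIMARY KEY
        id INT UNSIGNED NOT NULL,
        PRIMARY KEY (lat_band, lon, id)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (lat_band) (
        PARTITION p0 VALUES LESS THAN (10),
        PARTITION p1 VALUES LESS THAN (20),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );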

FULLTEXT

FULLTEXT is now implemented in InnoDB as well as MyISAM. It provides a way to search for "words" in TEXT columns. This is much faster (when it is applicable) than col LIKE '%word%'.

A query such as

    WHERE x = 1
          AND MATCH (...) AGAINST (...)

always(?) uses the FULLTEXT index first. That is, the whole Algorithm is invalidated when one of the ANDs is a MATCH.

Signs of a Newbie

  • No "compound" (aka "composite") indexes

  • No PRIMARY KEY

  • Redundant indexes (especially blatant is PRIMARY KEY(id), KEY(id))

  • Most or all columns individually indexed ("But I indexed everything")

Speeding up wp_postmeta

The published table (see Wikipedia) is

    CREATE TABLE wp_postmeta (
          meta_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
          post_id BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
          meta_key VARCHAR(255) DEFAULT NULL,
          meta_value LONGTEXT,
          PRIMARY KEY (meta_id),
          KEY post_id (post_id),
          KEY meta_key (meta_key)
        ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

The problems:

  • The AUTO_INCREMENT provides no benefit; in fact it slows down most queries and clutters disk.

  • Much better is PRIMARY KEY(post_id, meta_key) -- clustered, handles both parts of usual JOIN.

  • BIGINT is overkill, but that can't be fixed without changing other tables.

  • VARCHAR(255) can be a problem in 5.6 with utf8mb4; see workarounds below.

  • When would meta_key or meta_value ever be NULL?

The solutions:

    CREATE TABLE wp_postmeta (
            post_id BIGINT UNSIGNED NOT NULL,
            meta_key VARCHAR(255) NOT NULL,
            meta_value LONGTEXT NOT NULL,
            PRIMARY KEY(post_id, meta_key),
            INDEX(meta_key)
            ) ENGINE=InnoDB;

Postlog

Initial posting: March, 2015; Refreshed Feb, 2016; Add DATE June, 2016; Add WP example May, 2017.

The tips in this document apply to MySQL, MariaDB, and Percona.

See also

  • Some info in the MySQL manual: ORDER BY Optimization; MySQL manual page on range accesses in composite indexes

  • A short, but complicated, example

  • Indexing 101: Optimizing MySQL queries on a single table (Stephane Combaudon, Percona)

  • A complex query, well explained

  • Some discussion of JOIN

  • More on prefix indexing

  • Percona 2015 Tutorial Slides

This blog is the consolidation of a Percona tutorial I gave in 2013, plus many years of experience in fixing thousands of slow queries on hundreds of systems. I apologize that this does not tell you how to create INDEXes for all SELECTs. Some are just too complex.

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source:

This page is licensed: CC BY-SA / Gnu FDL


    Full-Text Indexes

    Implement full-text indexes in MariaDB Server for efficient text search. This section guides you through creating and utilizing these indexes to optimize queries on large text datasets.

    Primary Keys with Nullable Columns

    MariaDB deals with primary keys over nullable columns according to the SQL standards.

    Take the following table structure:

    CREATE TABLE t1(
      c1 INT NOT NULL AUTO_INCREMENT, 
      c2 INT NULL DEFAULT NULL, 
      PRIMARY KEY(c1,c2)
    );

    Column c2 is part of a primary key, and thus it cannot be NULL.

    Before MariaDB 10.1.7, MariaDB (as well as versions of MySQL before MySQL 5.7) would silently convert it into a NOT NULL column with a default value of 0.

    Since MariaDB 10.1.7, the column is converted to NOT NULL, but without a default value. If we then attempt to insert a record without explicitly setting c2, a warning (or, in strict mode, an error) will be thrown, for example:

    INSERT INTO t1() VALUES();
    Query OK, 1 row affected, 1 warning (0.00 sec)
    Warning (Code 1364): Field 'c2' doesn't have a default value

    SELECT * FROM t1;
    +----+----+
    | c1 | c2 |
    +----+----+
    |  1 |  0 |
    +----+----+

    MySQL, since 5.7, will abort such a CREATE TABLE with an error.

    The behavior adheres to the SQL 2003 standard.

    SQL-2003, Part II, “Foundation” says:

    11.7 <unique constraint definition>, Syntax Rules

    …

    5) If the <unique constraint definition> specifies PRIMARY KEY, then for each <column name> in the explicit or implicit <unique column list> for which NOT NULL is not specified, NOT NULL is implicit in the <column definition>.

    Essentially this means that all PRIMARY KEY columns are automatically converted to NOT NULL. Furthermore:

    11.5 <default clause>, General Rules

    …

    3) When a site S is set to its default value,

    …

    b) If the data descriptor for the site includes a <default clause>, then S is set to the value specified by that <default clause>.

    …

    e) Otherwise, S is set to the null value.

    There is no concept of “no default value” in the standard. Instead, a column always has an implicit default value of NULL. On insertion it might however fail the NOT NULL constraint. MariaDB and MySQL instead mark such a column as “not having a default value”. The end result is the same — a value must be specified explicitly or an INSERT will fail.

    MariaDB since 10.1.7 behaves in a standard-compatible manner -- being part of a PRIMARY KEY, the nullable column gets an automatic NOT NULL constraint, and on insertion one must specify a value for such a column. MariaDB before 10.1.7 was automatically assigning a default value of 0 -- this behavior was non-standard. Issuing an error at CREATE TABLE time is also non-standard.

    See Also

    • MDEV-12248 describes an edge-case that may result in replication problems when replicating from a master server before this change to a slave server after this change.

    This page is licensed: CC BY-SA / Gnu FDL


    Storage Engine Index Types

    This refers to the index_type definition when creating an index, i.e. BTREE, HASH or RTREE.

    For more information on general types of indexes, such as primary keys, unique indexes etc, go to Getting Started with Indexes.

    Storage Engine    Permitted Indexes
    MyISAM            BTREE, RTREE
    Aria              BTREE, RTREE
    InnoDB            BTREE
    MEMORY/HEAP       HASH, BTREE

    BTREE is generally the default index type. For MEMORY tables, HASH is the default. TokuDB uses a particular data structure called fractal trees, which is optimized for data that does not entirely fit memory.

    Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine, which lets you choose B-tree or hash indexes. See also B-Tree Index Characteristics.

    B-tree Indexes

    B-tree indexes are used for column comparisons using the =, >, >=, <, <= or BETWEEN operators, as well as for LIKE comparisons that begin with a constant.

    For example, the query SELECT * FROM Employees WHERE First_Name LIKE 'Maria%'; can make use of a B-tree index, while SELECT * FROM Employees WHERE First_Name LIKE '%aria'; cannot.

    B-tree indexes also permit leftmost prefixing for searching of rows.


    Hash Indexes

    Hash indexes, in contrast, can only be used for equality comparisons, so those using the = or <=> operators. They cannot be used for ordering, and provide no information to the optimizer on how many rows exist between two values.

    Hash indexes do not permit leftmost prefixing - only the whole index can be used. If the number of rows doesn't change, hash indexes occupy a fixed amount of memory, which is lower than the memory occupied by BTREE indexes.
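    For example, a minimal sketch of choosing the index type on a MEMORY table (hypothetical lookup table):

    CREATE TABLE lookup (
        k INT NOT NULL,
        v VARCHAR(30),
        INDEX USING HASH (k)   -- equality lookups only
    ) ENGINE=MEMORY;

    -- Can use the hash index:   WHERE k = 5
    -- Cannot use it:            WHERE k > 5, or ORDER BY k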

    R-tree Indexes

    See SPATIAL for more information.

    This page is licensed: CC BY-SA / Gnu FDL

    Index Statistics

    Index statistics provide crucial insights to the MariaDB query optimizer, guiding it in executing queries efficiently. Up-to-date index statistics ensure optimized query performance.

    How Index Statistics Help the Query Optimizer

    Understanding index statistics is crucial for the MariaDB query optimizer to efficiently execute queries. Accurate and current statistics guide the optimizer in choosing the best way to access data, similar to using a personal address book for quicker searches rather than a larger phone book. Up-to-date index statistics ensure optimized query performance.

    Ignored Indexes

    Ignored indexes allow indexes to be visible and maintained without being used by the optimizer. This feature is comparable to MySQL 8’s "invisible indexes."

    This feature is available from MariaDB 10.6.

    Ignored indexes are indexes that are visible and maintained, but which are not used by the optimizer. MySQL 8 has a similar feature which they call "invisible indexes".

    Syntax

    By default, an index is not ignored. One can mark an existing index as ignored (or not ignored) with an ALTER TABLE statement:

    ALTER TABLE table_name ALTER {KEY|INDEX} [IF EXISTS] key_name [NOT] IGNORED;

    It is also possible to specify the IGNORED attribute when creating an index with a CREATE TABLE, or CREATE INDEX statement:

    CREATE TABLE table_name (
      ...
      INDEX index_name ( ...) [NOT] IGNORED
      ...
    );

    CREATE INDEX index_name (...) [NOT] IGNORED ON tbl_name (...);

    A table's primary key cannot be ignored. This applies both to an explicitly defined primary key, and to an implicit primary key: if there is no explicit primary key defined but the table has a unique key containing only NOT NULL columns, the first of such keys becomes the implicitly defined primary key.

    Handling for ignored indexes

    The optimizer treats ignored indexes as if they didn't exist. They are not used in query plans, nor as a source of statistical information. Also, an attempt to use an ignored index in a USE INDEX, FORCE INDEX, or IGNORE INDEX hint will result in an error - the same as if one had used the name of a non-existent index.

    Information about whether or not indexes are ignored can be viewed in the IGNORED column in the Information Schema STATISTICS table or the SHOW INDEX statement.

    Intended Usage

    The primary use case is as follows: a DBA sees an index that seems to have little or no usage and considers whether to remove it. Dropping the index is a risk as it may still be needed in a few cases. For example, the optimizer may rely on the estimates provided by the index without using the index in query plans. If dropping an index causes an issue, it will take a while to re-create the index. On the other hand, marking the index as ignored (or not ignored) is instant, so the suggested workflow is:

    1. Mark the index as ignored

    2. Check if everything continues to work

    3. If not, mark the index as not ignored.

    4. If everything continues to work, one can safely drop the index.

    Examples

    The optimizer does not make use of an index when it is ignored, while if the index is not ignored (the default), the optimizer will consider it in the optimizer plan, as shown in the EXPLAIN output below. Any of the following creates an ignored index:

    CREATE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);

    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b));
    ALTER TABLE t1 ALTER INDEX k1 IGNORED;

    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT);
    CREATE INDEX k1 ON t1(b) IGNORED;

    SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_NAME = 't1'\G
    *************************** 1. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 0
     INDEX_SCHEMA: test
       INDEX_NAME: PRIMARY
     SEQ_IN_INDEX: 1
      COLUMN_NAME: id
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: 
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: NO
    *************************** 2. row ***************************
    TABLE_CATALOG: def
     TABLE_SCHEMA: test
       TABLE_NAME: t1
       NON_UNIQUE: 1
     INDEX_SCHEMA: test
       INDEX_NAME: k1
     SEQ_IN_INDEX: 1
      COLUMN_NAME: b
        COLLATION: A
      CARDINALITY: 0
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE: YES
       INDEX_TYPE: BTREE
          COMMENT: 
    INDEX_COMMENT: 
          IGNORED: YES
    SHOW INDEXES FROM t1\G
    *************************** 1. row ***************************
            Table: t1
       Non_unique: 0
         Key_name: PRIMARY
     Seq_in_index: 1
      Column_name: id
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: NO
    *************************** 2. row ***************************
            Table: t1
       Non_unique: 1
         Key_name: k1
     Seq_in_index: 1
      Column_name: b
        Collation: A
      Cardinality: 0
         Sub_part: NULL
           Packed: NULL
             Null: YES
       Index_type: BTREE
          Comment: 
    Index_comment: 
          Ignored: YES
    CREATE OR REPLACE TABLE t1 (id INT PRIMARY KEY, b INT, KEY k1(b) IGNORED);
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1    | Using filesort |
    +------+-------------+-------+------+---------------+------+---------+------+------+----------------+
    
    ALTER TABLE t1 ALTER INDEX k1 NOT IGNORED;
    
    EXPLAIN SELECT * FROM t1 ORDER BY b;
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    |    1 | SIMPLE      | t1    | index | NULL          | k1   | 5       | NULL | 1    | Using index |
    +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+

    This page is licensed: CC BY-SA / Gnu FDL
    Value Groups

    The statistics primarily focus on groups of index elements with identical values. In a primary key, every value is unique, resulting in a group size of one. In a non-unique index, multiple keys may share the same value. The worst-case scenario involves large groups with identical values, such as an index on a boolean field.

    MariaDB makes heavy use of the average group size statistic. For example, if there are 100 rows, and twenty groups with the same index values, the average group size would be five.

    However, averages can be skewed by extremes, and the usual culprit is NULL values. The 100 rows may have 19 groups with an average size of 1, while the other 81 values are all NULL -- one big group of 81. MariaDB may think five is a good average size and choose to use that index, and then end up having to read through 81 rows with identical keys, taking longer than an alternative.

    Dealing with NULLs

    There are three main approaches to the problem of NULLs:

    • nulls_equal: NULL index values are treated as a single group. This is usually fine, but if you have large numbers of NULLs, the average group size is slanted higher, and the optimizer may miss using the index for ref accesses when it would be useful. This is the default used by InnoDB.

    • nulls_unequal: the opposite approach, with each NULL forming its own group of one. Conversely, the average group size is slanted lower, and the optimizer may use the index for ref accesses when not suitable. This is the default used by the Aria and MyISAM storage engines.

    • nulls_ignored: NULLs are ignored altogether from index group calculations.

    The default approaches can be changed by setting the aria_stats_method, myisam_stats_method and innodb_stats_method server variables.
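    For instance, a minimal sketch (the accepted values are nulls_equal, nulls_unequal, and nulls_ignored):

    SHOW GLOBAL VARIABLES LIKE '%stats_method';
    SET GLOBAL innodb_stats_method = 'nulls_unequal';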

    Null-Safe and Regular Comparisons

    The comparison operator used plays an important role. If two values are compared with <=> (see the null-safe-equal comparison operator), and both are null, 1 is returned. If the same values are compared with = (see the equal comparison operator), null is returned. For example:

    SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
    +---------+---------------+------------+
    | 1 <=> 1 | NULL <=> NULL | 1 <=> NULL |
    +---------+---------------+------------+
    |       1 |             1 |          0 |
    +---------+---------------+------------+

    SELECT 1 = 1, NULL = NULL, 1 = NULL;
    +-------+-------------+----------+
    | 1 = 1 | NULL = NULL | 1 = NULL |
    +-------+-------------+----------+
    |     1 |        NULL |     NULL |
    +-------+-------------+----------+

    Engine-Independent Statistics

    MariaDB 10.0 introduced a way to gather statistics independently of the storage engine. See Engine-independent table statistics.

    Histogram-Based Statistics

    Histogram-Based Statistics were introduced in MariaDB 10.0, and are collected by default from MariaDB 10.4.3.

    See Also

    • User Statistics. This plugin provides user, client, table and index usage statistics.

    • InnoDB Persistent Statistics

    • Engine-independent Statistics

    • Histogram-based Statistics

    • Ignored Indexes

    This page is licensed: CC BY-SA / Gnu FDL


    Full-Text Index Overview

    MariaDB has support for full-text indexing and searching:

    • A full-text index in MariaDB is an index of type FULLTEXT, and it allows more options when searching for portions of text from a field.

    • Full-text indexes can be used only with MyISAM, Aria, InnoDB and Mroonga tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

    • Partitioned tables cannot contain fulltext indexes, even if the storage engine supports them.

    • A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE or CREATE INDEX.

    • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

    Full-text searching is performed using MATCH() ... AGAINST syntax:

    MATCH (col1,col2,...) AGAINST (expr [search_modifier])

    MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name.

    Excluded Results

    • Partial words are excluded.

    • Words less than 4 (MyISAM) or 3 (InnoDB) characters in length will not be stored in the fulltext index. This value can be adjusted by changing the ft_min_word_length system variable (or, for InnoDB, innodb_ft_min_token_size).

    • Words longer than 84 characters in length will also not be stored in the fulltext index. This value can be adjusted by changing the ft_max_word_length system variable (or, for InnoDB, innodb_ft_max_token_size).

    • Stopwords are a list of common words such as "once" or "then" that do not reflect in the search results unless IN BOOLEAN MODE is used. The stopword list for MyISAM/Aria tables and InnoDB tables can differ. See stopwords for details and a full list, as well as for details on how to change the default list.

    • For MyISAM/Aria fulltext indexes only, if a word appears in more than half the rows, it is also excluded from the results of a fulltext search.

    • For InnoDB indexes, only committed rows appear - modifications from the current transaction do not apply.

    Relevance

    MariaDB calculates a relevance for each result, based on a number of factors, including the number of words in the index, the number of unique words in a row, the total number of words in both the index and the result, and the weight of the word. In English, 'cool' will be weighted less than 'dandy', at least at present! The relevance can be returned as part of a query simply by using the MATCH function in the field list.

    Types of Full-Text search

    IN NATURAL LANGUAGE MODE

    IN NATURAL LANGUAGE MODE is the default type of full-text search, and the keywords can be omitted. There are no special operators, and searches consist of one or more comma-separated keywords.

    Searches are returned in descending order of relevance.

    IN BOOLEAN MODE

    Boolean search permits the use of a number of special operators:

    • * : The wildcard, indicating zero or more characters. It can only appear at the end of a word.

    • " : Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).

    • + : The word is mandatory in all rows returned.

    • - : The word cannot appear in any row returned.

    • < : The word that follows has a lower relevance than other words, although rows containing it will still match.

    • > : The word that follows has a higher relevance than other words.

    • () : Used to group words into subexpressions.

    • ~ : The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row).

    Searches are not returned in order of relevance, nor does the 50% limit apply. Stopwords and word minimum and maximum lengths still apply as usual.

    WITH QUERY EXPANSION

    A query expansion search is a modification of a natural language search. The search string is used to perform a regular natural language search. Then, words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. It can be useful when relying on implied knowledge within the data, for example that MariaDB is a database.

    Examples



    Creating a table, and performing a basic search:

    CREATE TABLE ft_myisam(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a time'),
      ('There was a wicked witch'), ('Who ate everybody up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    +--------------------------+
    | copy                     |
    +--------------------------+
    | There was a wicked witch |
    +--------------------------+
    Multiple words:

    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked,witch');
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    +---------------------------------+
    Since 'Once' is a stopword, no result is returned:

    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('Once');
    Empty set (0.00 sec)
    Inserting the word 'wicked' into more than half the rows excludes it from the results:

    INSERT INTO ft_myisam(copy) VALUES ('Once upon a wicked time'),
      ('There was a wicked wicked witch'), ('Who ate everybody wicked up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    Empty set (0.00 sec)
    Using IN BOOLEAN MODE to overcome the 50% limitation:

    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    | Once upon a wicked time         |
    | There was a wicked wicked witch |
    | Who ate everybody wicked up     |
    +---------------------------------+
    Returning the relevance:

    SELECT copy,MATCH(copy) AGAINST('witch') AS relevance 
      FROM ft_myisam WHERE MATCH(copy) AGAINST('witch');
    +---------------------------------+--------------------+
    | copy                            | relevance          |
    +---------------------------------+--------------------+
    | There was a wicked witch        | 0.6775632500648499 |
    | There was a wicked wicked witch | 0.5031757950782776 |
    +---------------------------------+--------------------+
    WITH QUERY EXPANSION. In the following example, 'MariaDB' is always associated with the word 'database', so it is returned when query expansion is used, even though not explicitly requested:

    CREATE TABLE ft2(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft2(copy) VALUES
     ('MySQL vs MariaDB database'),
     ('Oracle vs MariaDB database'), 
     ('PostgreSQL vs MariaDB database'),
     ('MariaDB overview'),
     ('Foreign keys'),
     ('Primary keys'),
     ('Indexes'),
     ('Transactions'),
     ('Triggers');
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database');
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    +--------------------------------+
    3 rows in set (0.00 sec)
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database' WITH QUERY EXPANSION);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    4 rows in set (0.00 sec)
    Partial word matching with IN BOOLEAN MODE:

    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('Maria*' IN BOOLEAN MODE);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    Using boolean operators:

    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('+MariaDB -database' 
      IN BOOLEAN MODE);
    +------------------+
    | copy             |
    +------------------+
    | MariaDB overview |
    +------------------+

    See Also

    • For simpler searches of a substring in text columns, see the LIKE operator.

    This page is licensed: CC BY-SA / Gnu FDL
    Compound (Composite) Indexes

    A mini-lesson in "compound indexes" ("composite indexes")

    This document starts out trivial and perhaps boring, but builds up to more interesting information, perhaps things you did not realize about how MariaDB and MySQL indexing works.

    This also explains EXPLAIN (to some extent).

    (Most of this applies to other databases, too.)

    The query to discuss

    The question is "When was Andrew Johnson president of the US?".

    The available table Presidents looks like:

    +-----+------------+----------------+-----------+
    | seq | last_name  | first_name     | term      |
    +-----+------------+----------------+-----------+
    |   1 | Washington | George         | 1789-1797 |
    |   2 | Adams      | John           | 1797-1801 |
    ...
    |   7 | Jackson    | Andrew         | 1829-1837 |
    ...
    |  17 | Johnson    | Andrew         | 1865-1869 |
    ...
    |  36 | Johnson    | Lyndon B.      | 1963-1969 |
    ...

    ("Andrew Johnson" was picked for this lesson because of the duplicates.)

    What index(es) would be best for that question? More specifically, what would be best for

    SELECT  term
            FROM  Presidents
            WHERE  last_name = 'Johnson'
              AND  first_name = 'Andrew';

    Some INDEXes to try...

    • No indexes

    • INDEX(first_name), INDEX(last_name) (two separate indexes)

    • "Index Merge Intersect"

    • INDEX(last_name, first_name) (a "compound" index)

    • INDEX(last_name, first_name, term) (a "covering" index)

    • Variants

    No indexes

    Well, I am fudging a little here. I have a PRIMARY KEY on seq, but that has no advantage on the query we are studying.

    SHOW CREATE TABLE Presidents \G
    CREATE TABLE `presidents` (
      `seq` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
      `last_name` VARCHAR(30) NOT NULL,
      `first_name` VARCHAR(30) NOT NULL,
      `term` VARCHAR(9) NOT NULL,
      PRIMARY KEY (`seq`)
    ) ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8

    EXPLAIN  SELECT  term
       FROM  Presidents
       WHERE  last_name = 'Johnson'
       AND  first_name = 'Andrew';
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | Presidents | ALL  | NULL          | NULL | NULL    | NULL |   44 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+

    # Or, using the other form of display:  EXPLAIN ... \G
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ALL        <-- Implies table scan
    possible_keys: NULL
              key: NULL       <-- Implies that no index is useful, hence table scan
          key_len: NULL
              ref: NULL
             rows: 44         <-- That's about how many rows in the table, so table scan
            Extra: Using where

    Implementation Details

    First, let's describe how InnoDB stores and uses indexes.

    • The data and the PRIMARY KEY are "clustered" together in one BTree.

    • A BTree lookup is quite fast and efficient. For a million-row table there might be 3 levels of BTree, and the top two levels are probably cached.

    • Each secondary index is in another BTree, with the PRIMARY KEY at the leaf.

    • Fetching 'consecutive' (according to the index) items from a BTree is very efficient because they are stored consecutively.

    • For the sake of simplicity, we can count each BTree lookup as 1 unit of work, and ignore scans for consecutive items. This approximates the number of disk hits for a large table in a busy system.

    For MyISAM, the PRIMARY KEY is not stored with the data, so think of it as being a secondary key (over-simplified).

    INDEX(first_name), INDEX(last_name)

    The novice, once he learns about indexing, decides to index lots of columns, one at a time. But...

    MariaDB rarely uses more than one index at a time in a query. So, it will analyze the possible indexes.

    • first_name -- there are 2 possible rows (one BTree lookup, then scan consecutively)

    • last_name -- there are 2 possible rows

    Let's say it picks last_name. Here are the steps for doing the SELECT:

    1. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'.

    2. Get the PRIMARY KEY (implicitly added to each secondary index in InnoDB); get (17, 36).

    3. Reach into the data using seq = (17, 36) to get the rows for Andrew Johnson and Lyndon B. Johnson.

    4. Use the rest of the WHERE clause to filter out all but the desired row.

    5. Deliver the answer (1865-1869).

    EXPLAIN  SELECT  term
      FROM  Presidents
      WHERE  last_name = 'Johnson'
      AND  first_name = 'Andrew'  \G
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: last_name, first_name
              key: last_name
          key_len: 92                 <-- VARCHAR(30) utf8 may need 2+3*30 bytes
              ref: const
             rows: 2                  <-- Two 'Johnson's
            Extra: Using where

    "Index Merge Intersect"

    OK, so you get really smart and decide that MariaDB should be smart enough to use both name indexes to get the answer. This is called "Intersect".

    1. Using INDEX(first_name), find 2 index entries with first_name = 'Andrew'; get (7, 17)

    2. Using INDEX(last_name), find 2 index entries with last_name = 'Johnson'; get (17, 36)

    3. "And" the two lists together (7,17) & (17,36) = (17)

    4. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    5. Deliver the answer (1865-1869).

    id: 1
      select_type: SIMPLE
            table: Presidents
             type: index_merge
    possible_keys: first_name,last_name
              key: first_name,last_name
          key_len: 92,92
              ref: NULL
             rows: 1
            Extra: Using intersect(first_name,last_name); Using where

    The EXPLAIN fails to give the gory details of how many rows were collected from each index, etc.

    INDEX(last_name, first_name)

    This is called a "compound" or "composite" index since it has more than one column.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Reach into the data using seq = (17) to get the row for Andrew Johnson.

    3. Deliver the answer (1865-1869).

    This is much better. In fact this is usually the "best".

    ALTER TABLE Presidents
            (DROP old indexes AND...)
            ADD INDEX compound(last_name, first_name);

               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: compound
              key: compound
          key_len: 184             <-- The length of both fields
              ref: const,const     <-- The WHERE clause gave constants for both
             rows: 1               <-- Goodie!  It homed in on the one row.
            Extra: Using where

    "Covering": INDEX(last_name, first_name, term)

    Surprise! We can actually do a little better. A "Covering" index is one in which all of the fields of the SELECT are found in the index. It has the added bonus of not having to reach into the "data" to finish the task.

    1. Drill down the BTree for the index to get to exactly the index row for Johnson+Andrew; get seq = (17).

    2. Deliver the answer (1865-1869).

    The "data" BTree is not touched; this is an improvement over "compound".

    ... ADD INDEX covering(last_name, first_name, term);

               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: covering
              key: covering
          key_len: 184
              ref: const,const
             rows: 1
            Extra: Using where; Using index   <-- Note

    Everything is similar to using "compound", except for the addition of "Using index".

    Variants

    • What would happen if you shuffled the fields in the WHERE clause? Answer: The order of ANDed things does not matter.

    • What would happen if you shuffled the fields in the INDEX? Answer: It may make a huge difference. More in a minute.

    • What if there are extra fields on the end? Answer: Minimal harm; possibly a lot of good (eg, 'covering').

    • Redundancy? That is, what if you have both of these: INDEX(a), INDEX(a,b)? Answer: Redundancy costs something on INSERTs; it is rarely useful for SELECTs.

    • Prefix? That is, INDEX(last_name(5), first_name(5)). Answer: Don't bother; it rarely helps, and often hurts. (The details are another topic.)

    More examples:

    INDEX(last, first)
        ... WHERE last = '...' -- good (even though `first` is unused)
        ... WHERE first = '...' -- index is useless

    INDEX(first, last), INDEX(last, first)
        ... WHERE first = '...' -- 1st index is used

    Postlog

    Refreshed -- Oct, 2012; more links -- Nov 2016

    See also

    • Cookbook on designing the best index for a SELECT

    • Sheeri's discussion of Indexes

    • Slides on EXPLAIN

    • MySQL manual page on range accesses in composite indexes

    Rick James graciously allowed us to use this article in the documentation.

    Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

    Original source: index1

    This page is licensed: CC BY-SA / Gnu FDL

    EXPLAIN
    +-----+------------+----------------+-----------+
    | seq | last_name  | first_name     | term      |
    +-----+------------+----------------+-----------+
    |   1 | Washington | George         | 1789-1797 |
    |   2 | Adams      | John           | 1797-1801 |
    ...
    |   7 | Jackson    | Andrew         | 1829-1837 |
    ...
    |  17 | Johnson    | Andrew         | 1865-1869 |
    ...
    |  36 | Johnson    | Lyndon B.      | 1963-1969 |
    ...
    SELECT  term
            FROM  Presidents
            WHERE  last_name = 'Johnson'
              AND  first_name = 'Andrew';
    SHOW CREATE TABLE Presidents \G
    CREATE TABLE `presidents` (
      `seq` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
      `last_name` VARCHAR(30) NOT NULL,
      `first_name` VARCHAR(30) NOT NULL,
      `term` VARCHAR(9) NOT NULL,
      PRIMARY KEY (`seq`)
    ) ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8
    
    EXPLAIN  SELECT  term
       FROM  Presidents
       WHERE  last_name = 'Johnson'
       AND  first_name = 'Andrew';
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | Presidents | ALL  | NULL          | NULL | NULL    | NULL |   44 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+------+-------------+
    
    # Or, using the other form of display:  EXPLAIN ... \G
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ALL        <-- Implies table scan
    possible_keys: NULL
              key: NULL       <-- Implies that no index is useful, hence table scan
          key_len: NULL
              ref: NULL
             rows: 44         <-- That's about how many rows in the table, so table scan
            Extra: Using where
    EXPLAIN  SELECT  term
      FROM  Presidents
      WHERE  last_name = 'Johnson'
      AND  first_name = 'Andrew'  \G
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: last_name, first_name
              key: last_name
          key_len: 92                 <-- VARCHAR(30) utf8 may need 2+3*30 bytes
              ref: const
             rows: 2                  <-- Two 'Johnson's
            Extra: Using where
    id: 1
      select_type: SIMPLE
            table: Presidents
             type: index_merge
    possible_keys: first_name,last_name
              key: first_name,last_name
          key_len: 92,92
              ref: NULL
             rows: 1
            Extra: Using intersect(first_name,last_name); Using where
ALTER TABLE Presidents
        DROP INDEX last_name,
        DROP INDEX first_name,
        ADD INDEX compound(last_name, first_name);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: compound
              key: compound
          key_len: 184             <-- The length of both fields
              ref: const,const     <-- The WHERE clause gave constants for both
             rows: 1               <-- Goodie!  It homed in on the one row.
            Extra: Using where
    ... ADD INDEX covering(last_name, first_name, term);
    
               id: 1
      select_type: SIMPLE
            table: Presidents
             type: ref
    possible_keys: covering
              key: covering
          key_len: 184
              ref: const,const
             rows: 1
            Extra: Using where; Using index   <-- Note
    INDEX(last, first)
        ... WHERE last = '...' -- good (even though `first` is unused)
        ... WHERE first = '...' -- index is useless
    
        INDEX(first, last), INDEX(last, first)
        ... WHERE first = '...' -- 1st index is used
        ... WHERE last = '...' -- 2nd index is used
        ... WHERE first = '...' AND last = '...' -- either could be used equally well
    
        INDEX(last, first)
        Both of these are handled by that one INDEX:
        ... WHERE last = '...'
        ... WHERE last = '...' AND first = '...'
    
        INDEX(last), INDEX(last, first)
        In light of the above example, don't bother including INDEX(last).

    Latitude/Longitude Indexing

    The problem

You want to find the nearest 10 pizza parlors, but you cannot figure out how to do it efficiently in your huge database. Database indexes are good at one-dimensional indexing, but poor at two dimensions.

    You might have tried

    • INDEX(lat), INDEX(lon) -- but the optimizer used only one

    • INDEX(lat,lon) -- but it still had to work too hard

• Sometimes you ended up with a full table scan -- Yuck.

• Z-ordering -- a clever 2D-to-1D mapping, but complex to implement.

WHERE SQRT(...) < ... -- No chance of using any index.

    WHERE lat BETWEEN ... AND lng BETWEEN... -- This has some chance of using such indexes.

    The goal is to look only at records "close", in both directions, to the target lat/lng.

    A solution -- first, the principles

PARTITIONs in MariaDB and MySQL sort of give you a way to have two clustered indexes. So, if we could slice up (partition) the globe in one dimension and use ordinary indexing in the other dimension, maybe we could get something approximating a 2D index. This 2D approach keeps the number of disk hits significantly lower than 1D approaches, thereby speeding up "find nearest" queries.

    It works. Not perfectly, but better than the alternatives.

What to PARTITION on? It seems like latitude or longitude would be a good idea. Note that a degree of longitude varies in width, from 69 miles (111 km) at the equator to 0 at the poles. So, latitude seems like a better choice.

    How many PARTITIONs? It does not matter a lot. Some thoughts:

    • 90 partitions - 2 degrees each. (I don't like tables with too many partitions; 90 seems like plenty.)

    • 50-100 - evenly populated. (This requires code. For 2.7M placenames, 85 partitions varied from 0.5 degrees to very wide partitions at the poles.)

• Don't have more than 100 partitions; there are inefficiencies in the partition implementation.

How to PARTITION? Well, MariaDB and MySQL are very picky. So FLOAT/DOUBLE are out. DECIMAL is out. So, we are stuck with some kludge. Essentially, we need to convert Lat/Lng to some size of INT and use PARTITION BY RANGE.
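As a sketch of that kludge (the coordinate values shown are hypothetical):

-- Scale degrees to an integer: Deg*10000 fits in a MEDIUMINT
-- (MEDIUMINT holds -8388608..8388607; we only need -1800000..1800000)
SELECT ROUND( 35.1494 * 10000) AS lat_scaled,   -- 351494
       ROUND(-90.0489 * 10000) AS lon_scaled;   -- -900489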

    Representation choices

    To get to a datatype that can be used in PARTITION, you need to "scale" the latitude and longitude. (Consider only the *INTs; the other datatypes are included for comparison)

(Sorted by resolution)

Datatype               Bytes  Resolution
---------------------  -----  --------------------------------
Deg*100 (SMALLINT)         4  1570 m    1.0 mi  Cities
DECIMAL(4,2)/(5,2)         5  1570 m    1.0 mi  Cities
SMALLINT scaled            4   682 m    0.4 mi  Cities
Deg*10000 (MEDIUMINT)      6    16 m     52 ft  Houses/Businesses
DECIMAL(6,4)/(7,4)         7    16 m     52 ft  Houses/Businesses
MEDIUMINT scaled           6   2.7 m    8.8 ft
FLOAT                      8   1.7 m    5.6 ft
DECIMAL(8,6)/(9,6)         9    16cm    1/2 ft  Friends in a mall
Deg*10000000 (INT)         8    16mm    5/8 in  Marbles
DOUBLE                    16   3.5nm     ...    Fleas on a dog

    What these mean...

Deg*100 (SMALLINT) -- you take the lat/lng, multiply by 100, round, and store into a SMALLINT. That will take 2 bytes for each dimension, for a total of 4 bytes. Two items might be 1570 meters apart, but register as having the same latitude and longitude.

DECIMAL(4,2) for latitude and DECIMAL(5,2) for longitude will take 2+3 bytes and have no better resolution than Deg*100.

    SMALLINT scaled -- Convert latitude into a SMALLINT SIGNED by doing (degrees / 90 * 32767) and rounding; longitude by (degrees / 180 * 32767).

FLOAT has 24 significant bits; DOUBLE has 53. (They don't work with PARTITIONing but are included for completeness. Often people use DOUBLE without realizing how much overkill it is, and how much space it takes.)

Sure, you could do Deg*1000 and other "in between" cases, but there is no advantage. Deg*1000 takes as much space as Deg*10000 (both need a MEDIUMINT), but has less resolution.

So, go down the list to see how much resolution you need, then pick an encoding you are comfortable with. However, since we are about to use latitude as a "partition key", it must be one of the INT types. For the sample code, I will use Deg*10000 (MEDIUMINT).

    GCDist -- compute "great circle distance"

    GCDist is a helper FUNCTION that correctly computes the distance between two points on the globe.

    The code has been benchmarked at about 20 microseconds per call on a 2011-vintage PC. If you had to check a million points, that would take 20 seconds -- far too much for a web application. So, one goal of the Procedure that uses it will be to minimize the usage of this function. With the code presented here, the function need be called only a few dozen or few hundred times, except in pathological cases.

Sure, you could use the Pythagorean formula. And it would work for most applications. But it takes little extra effort to do the GC version. Furthermore, GC works across a pole and across the dateline. And a Pythagorean function is not that much faster.

    For efficiency, GCDist understands the scaling you picked and has that stuff hardcoded. I am picking "Deg*10000", so the function expects 350000 for representing 35 degrees. If you choose a different scaling, you will need to change the code.

    GCDist() takes 4 scaled DOUBLEs -- lat1, lon1, lat2, lon2 -- and returns a scaled number of "degrees" representing the distance.
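For example (once the GCDist function below is created), the distance between the Memphis and West Memphis rows of the sample data converts to miles using the constant from the function's comments:

SELECT GCDist(351494, -900489, 351464, -901844) * .0069172 AS miles;
-- ~7.6, matching the West Memphis 'dist' in the sample output below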

The table of representation choices says 52 feet of resolution for Deg*10000 and DECIMAL(x,4). Here is how that was calculated, by measuring the diagonal between lat/lng (0,0) and (0.0001,0.0001) -- one 'unit in the last place': GCDist(0,0,1,1) * 69.172 / 10000 * 5280 = 51.65 feet (verified in the query after this list), where

    • 69.172 miles/degree of latitude

    • 10000 units per degree for the scaling chosen

    • 5280 feet / mile.
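That arithmetic can be checked directly once the function is created:

SELECT GCDist(0, 0, 1, 1) * 69.172 / 10000 * 5280 AS feet;  -- ~51.65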

    (No, this function does not compensate for the Earth being an oblate spheroid, etc.)

    Required table structure

    There will be one table (plus normalization tables as needed). The one table must be partitioned and indexed as indicated below.

    Fields and indexes

    • PARTITION BY RANGE(lat)

    • lat -- scaled latitude (see above)

    • lon -- scaled longitude

• PRIMARY KEY(lon, lat, ...) -- lon must be first; something must be added to make it UNIQUE

• id -- (optional) you may need to identify the rows for your purposes; AUTO_INCREMENT if you like

• INDEX(id) -- if id is AUTO_INCREMENT, then this plain INDEX (not UNIQUE, not PRIMARY KEY) is necessary

• ENGINE=InnoDB -- so the PRIMARY KEY will be "clustered"

• Other indexes -- keep to a minimum (this is a general performance rule for large tables)

    For most of this discussion, lat is assumed to be MEDIUMINT -- scaled from -90 to +90 by multiplying by 10000. Similarly for lon and -180 to +180.

    The PRIMARY KEY must

    • start with lon since the algorithm needs the "clustering" that InnoDB will provide, and

    • include lat somewhere, since it is the PARTITION key, and

• contain something to make the key UNIQUE (lon+lat is unlikely to be sufficient); see the sketch below.
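A minimal sketch of such a table (the table name, the payload column, and the partition boundaries are illustrative only; a real table would have many narrower stripes):

CREATE TABLE Locations (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    lat  MEDIUMINT NOT NULL,     -- latitude  * 10000
    lon  MEDIUMINT NOT NULL,     -- longitude * 10000
    name VARCHAR(99) NOT NULL,   -- hypothetical payload
    PRIMARY KEY (lon, lat, id),  -- lon first; id makes it UNIQUE
    INDEX (id)                   -- needed because id is AUTO_INCREMENT
) ENGINE=InnoDB
PARTITION BY RANGE (lat) (
    PARTITION p0 VALUES LESS THAN (-600000),   -- scaled degrees
    PARTITION p1 VALUES LESS THAN (-200000),
    PARTITION p2 VALUES LESS THAN ( 200000),
    PARTITION p3 VALUES LESS THAN ( 600000),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);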

The FindNearest PROCEDURE will do multiple SELECTs, each constrained by a bounding-box WHERE clause like the one at the start of the reference code below.

    The query planner will

    • Do PARTITION "pruning" based on the latitude; then

• Within a PARTITION (which is effectively a table), use lon to do a 'clustered' range scan; then

    • Use the "condition" to filter down to the rows you desire, plus recheck lat. This design leads to very few disk blocks needing to be read, which is the main goal of the design.

    Note that this does not even call GCDist. That comes in the last pass when the ORDER BY and LIMIT are used.

The stored procedure has a loop. At least two SELECTs will be executed, but with proper tuning, usually no more than about 6 SELECTs will be performed. Because each SELECT searches by the PRIMARY KEY, it hits only one block (sometimes more) of the table. Counting the number of blocks hit is a crude, but effective, way of comparing the performance of multiple designs. By comparison, a full table scan will probably touch thousands of blocks. A simple INDEX(lat) probably leads to hitting hundreds of blocks.

Filtering... An argument to the FindNearest procedure includes a boolean expression ("condition") for a WHERE clause. If you don't need any filtering, pass in "1". To avoid "SQL injection", do not let web users supply arbitrary expressions; instead, construct the "condition" from the inputs they provide, thereby making sure it is safe.
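A sketch of safe condition-building (the filter column and the user value are hypothetical):

-- Build the condition server-side from a validated number,
-- never by pasting raw user text into the string:
SET @min_pop := GREATEST(0, CAST('500' AS UNSIGNED));  -- '500' stands in for user input
CALL FindNearest(35.15, -90.05, 10, 100, 5,
                 CONCAT('population > ', @min_pop));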

    The algorithm

The algorithm is embodied in a stored procedure because of its complexity.

    • You feed it a starting width for a "square" and a number of items to find.

    • It builds a "square" around where you are.

    • A SELECT is performed to see how many items are in the square.

• Loop, doubling the width of the square, until enough items are found.

• Now, a 'last' SELECT is performed to get the exact distances, sort them (ORDER BY) and LIMIT to the desired number.

• If spanning a pole or the dateline, a more complex SELECT is used.

    The next section ("Performance") should make this a bit clearer as it walks through some examples.

    Performance

    Because of all the variations, it is hard to get a meaningful benchmark. So, here is some hand-waving instead.

Each SELECT is constrained by a "square" defined by a latitude range and a longitude range. (See the WHERE clause mentioned above, or in the sample code below.) Because of the way longitude lines converge, the longitude range of the "square" will be more degrees than the latitude range. Let's say the latitude partitioning is 3 degrees wide in the area where you are searching. That is over 200 miles (over 300 km), so you are very likely to have a latitude range smaller than the partition width. Still, if you are searching from near the edge of a latitude stripe, the square could span two partitions. After partition pruning down to one (sometimes more) partition, the query is then constrained by a longitude range. (Remember, the PRIMARY KEY starts with lon.) If an InnoDB data block contains 100 rows (a handy Rule of Thumb), the SELECT will touch one (or a few) blocks. If the square spans two (or more) partitions, then the same logic applies to each partition.

    So, scanning the square will involve as little as one block; rarely more than a few blocks. The number of blocks is mostly independent of the dataset size.

    The primary use case for this algorithm is when the data is significantly larger than will fit into cache (the buffer_pool). Hence, the main goal is to minimize the number of disk hits.

    Now let's look at some edge cases, and argue that the number of blocks is still better (usually) than with traditional indexing techniques.

    What if you are looking for Starbucks in a dense city? There would be dozens, maybe hundreds per square mile. If you start the guess at 100 miles, the SELECTs would be hitting lots of blocks -- not efficient. In this case, the "starting distance" should be small, say, 2 miles. Let's say your app wants the closest 10 stores. In this example, you would probably find more than 10 Starbucks within 2 miles in 1 InnoDB block in one partition. Even though there is a second SELECT to finish off the query, it would be hitting the same block. Total: One block hit == cheap.

Let's say you start with a 5-mile square. Since there are upwards of 200 Starbucks within a 5-mile radius in some dense cities of the world, that might imply 300 in our "square". That maps to about 4 disk blocks, and a modest amount of CPU to chew through the 300 records. Still not bad.

Now, suppose you are on an ocean liner somewhere in the Pacific. And there is one Starbucks onboard, but you are looking for the nearest 10. If you again start with 2 miles, it will take several iterations to find 10 sites. But, let's walk through it anyway. The first probe will hit one partition (maybe 2), and find just one hit. The second probe doubles the width of the square; 4 miles will still give you one hit -- the same hit in the same block, which is now cached, so we won't count it as a second disk I/O. Eventually the square will be wide enough to span multiple partitions. Each extra partition will be one new disk hit to discover no sites in the square. Finally, the square will hit Chile or Hawaii or Fiji and find some more sites, perhaps enough to stop the iteration. Since the main criterion in determining the number of disk hits is the number of partitions hit, we do not want to split the world into too many partitions. If there are, say, 40 partitions, then I have just described a case where there might be 20 disk hits.

    2-degree partitions might be good for a global table of stores or restaurants. A 5-mile starting distance might be good when filtering for Starbucks. 20 miles might be better for a department store.

Now, let's discuss the 'last' SELECT, wherein the square is expanded by SQRT(2) and the Great Circle formula is used to precisely order the N results. The SQRT(2) is in case the N items found so far were all at the corners of the 'square'; growing the square by this much lets us catch any other sites that were just outside the old square.

    First, note that this 'last' SELECT is hitting the same block(s) that the iteration hit, plus possibly hitting some more blocks. It is hard to predict how many extra blocks might be hit. Here's a pathological case. You are in the middle of a desert; the square grows and grows. Eventually it finds N sites. There is a big city just outside the final square from the iterating. Now the 'last' SELECT kicks in, and it includes lots of sites in this big city. "Lots of sites" --> lots of blocks --> lots of disk hits.

    Discussion of reference code

Here's the gist of the FindNearest() stored procedure:

    • Make a guess at how close to "me" to look.

    • See how many items are in a 'square' around me, after filtering.

    • If not enough, repeat, doubling the width of the square.

• After finding enough, or giving up because we are looking "too far", make one last pass to get all the data, ORDERed and LIMITed.

Note that the loop merely uses 'squares' of lat/lng ranges. This is crude, but it works well with the partitioning and indexing, and avoids calling GCDist (until the last step). In the sample code, I picked 15 miles as the starting value. Adjusting this will have some impact on the Procedure's performance, but the impact will vary with the use cases. A rough way to set the radius is to guess what will find the desired LIMIT about half the time. (This value is hardcoded in the PROCEDURE.)

    Parameters passed into FindNearest():

    • your Latitude -- -90..90 (not scaled -- see hardcoded conversion in PROCEDURE)

    • your Longitude -- -180..180 (not scaled)

    • Start distance -- (miles or km) -- see discussion below

• Max distance -- in miles or km -- see hardcoded conversion in PROCEDURE

• Limit -- maximum number of items to return

• Condition -- something to put after 'AND' (more discussion above)

The procedure will find the nearest items, up to Limit, that meet the Condition. But it will give up at Max distance. (If you are at the South Pole, why bother searching very far for the tenth pizza parlor?)

    Because of the "scaling", "hardcoding", "Condition", the table name, etc, this PROCEDURE is not truly generic; the code must be modified for each application. Yes, I could have designed it to pass all that stuff in. But what a mess.

    The "_start_dist" gives some control over the performance. Making this too small leads to extra iterations; too big leads to more rows being checked. If you choose to tune the Stored Procedure, do the following. "SELECT @iterations" after calling the SP for a number of typical values. If the value is usually 1, then decrease _start_dist. If it is usually 2 or more, then increase it.

Timing: under 10ms for "typical" usage, for any dataset size. Slower for pathological cases (low min distance, high max distance, crossing the dateline, bad filtering, cold cache, etc.).

    End-cases:

    • By using GC distance, not Pythagoras, distances are 'correct' even near poles.

    • Poles -- Even if the "nearest" is almost 360 degrees away (longitude), it can find it.

    • Dateline -- There is a small, 'contained', piece of code for crossing the Dateline. Example: you are at +179 deg longitude, and the nearest item is at -179.

    The procedure returns one resultset, SELECT *, distance.

    • Only rows that meet your Condition, within Max distance are returned

    • At most Limit rows are returned

    • The rows will be ordered, "closest" first.

    • "dist" will be in miles or km (based on a hardcoded constant in the SP)

    Reference code, assuming deg*10000 and 'miles'

    This version is based on scaling "Deg*10000 (MEDIUMINT)".

    Postlog

    There is a "Haversine" algorithm that is twice as fast as the GCDist function here. But it has a fatal flaw of sometimes returning NULL for the distance between a point and itself. (This is because of computing a number slightly bigger than 1.0, then trying to take the ACOS of it.)

See also

• Cities used for testing

• A forum thread

• StackOverflow discussion

    Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: latlng

    This page is licensed: CC BY-SA / Gnu FDL

    WHERE lat    BETWEEN @my_lat - @dlat
                           AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
            AND lon    BETWEEN @my_lon - @dlon
                           AND @my_lon + @dlon   -- first part of PK
            AND condition                        -- filter out non-pizza parlors
    DELIMITER //
    
    DROP function IF EXISTS GCDist //
    CREATE FUNCTION GCDist (
            _lat1 DOUBLE,  -- Scaled Degrees north for one point
            _lon1 DOUBLE,  -- Scaled Degrees west for one point
            _lat2 DOUBLE,  -- other point
            _lon2 DOUBLE
        ) RETURNS DOUBLE
        DETERMINISTIC
        CONTAINS SQL  -- SQL but does not read or write
        SQL SECURITY INVOKER  -- No special privileges granted
    -- Input is a pair of latitudes/longitudes multiplied by 10000.
    --    For example, the south pole has latitude -900000.
    -- Multiply output by .0069172 to get miles between the two points
    --    or by .0111325 to get kilometers
    BEGIN
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
        DECLARE _rlat1 DOUBLE DEFAULT _deg2rad * _lat1;
        DECLARE _rlat2 DOUBLE DEFAULT _deg2rad * _lat2;
        -- compute as if earth's radius = 1.0
        DECLARE _rlond DOUBLE DEFAULT _deg2rad * (_lon1 - _lon2);
        DECLARE _m     DOUBLE DEFAULT COS(_rlat2);
        DECLARE _x     DOUBLE DEFAULT COS(_rlat1) - _m * COS(_rlond);
        DECLARE _y     DOUBLE DEFAULT               _m * SIN(_rlond);
        DECLARE _z     DOUBLE DEFAULT SIN(_rlat1) - SIN(_rlat2);
        DECLARE _n     DOUBLE DEFAULT SQRT(
                            _x * _x +
                            _y * _y +
                            _z * _z    );
        RETURN  2 * ASIN(_n / 2) / _deg2rad;   -- again--scaled degrees
    END;
    //
    DELIMITER ;
    
    DELIMITER //
    -- FindNearest (about my 6th approach)
DROP PROCEDURE IF EXISTS FindNearest //
    CREATE
    PROCEDURE FindNearest (
            IN _my_lat DOUBLE,  -- Latitude of me [-90..90] (not scaled)
            IN _my_lon DOUBLE,  -- Longitude [-180..180]
        IN _start_dist DOUBLE,  -- Starting estimate of how far to search: miles or km
            IN _max_dist DOUBLE,  -- Limit how far to search: miles or km
            IN _limit INT,     -- How many items to try to get
            IN _condition VARCHAR(1111)   -- will be ANDed in a WHERE clause
        )
        DETERMINISTIC
    BEGIN
        -- lat and lng are in degrees -90..+90 and -180..+180
        -- All computations done in Latitude degrees.
        -- Thing to tailor
        --   *Locations* -- the table
        --   Scaling of lat, lon; here using *10000 in MEDIUMINT
        --   Table name
        --   miles versus km.
    
        -- Hardcoded constant:
        DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- For scaled by 1e4 to MEDIUMINT
    
        -- Cannot use params in PREPARE, so switch to @variables:
        -- Hardcoded constant:
        SET @my_lat := _my_lat * 10000,
            @my_lon := _my_lon * 10000,
            @deg2dist := 0.0069172,  -- 69.172 for miles; 111.325 for km  *** (mi vs km)
            @start_deg := _start_dist / @deg2dist,  -- Start with this radius first (eg, 15 miles)
            @max_deg := _max_dist / @deg2dist,
            @cutoff := @max_deg / SQRT(2),  -- (slightly pessimistic)
            @dlat := @start_deg,  -- note: must stay positive
            @lon2lat := COS(_deg2rad * @my_lat),
            @iterations := 0;        -- just debugging
    
        -- Loop through, expanding search
        --   Search a 'square', repeat with bigger square until find enough rows
    --   If the initial probe found _limit rows, then probably the first
        --   iteration here will find the desired data.
        -- Hardcoded table name:
        -- This is the "first SELECT":
        SET @sql = CONCAT(
            "SELECT COUNT(*) INTO @near_ct
                FROM Locations
                WHERE lat    BETWEEN @my_lat - @dlat
                                 AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                  AND lon    BETWEEN @my_lon - @dlon
                                 AND @my_lon + @dlon   -- first part of PK
                  AND ", _condition);
        PREPARE _sql FROM @sql;
        MainLoop: LOOP
            SET @iterations := @iterations + 1;
            -- The main probe: Search a 'square'
            SET @dlon := ABS(@dlat / @lon2lat);  -- good enough for now  -- note: must stay positive
            -- Hardcoded constants:
            SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  -- near a Pole
            EXECUTE _sql;
            IF ( @near_ct >= _limit OR         -- Found enough
                 @dlat >= @cutoff ) THEN       -- Give up (too far)
                LEAVE MainLoop;
            END IF;
            -- Expand 'square':
            SET @dlat := LEAST(2 * @dlat, @cutoff);   -- Double the radius to search
        END LOOP MainLoop;
        DEALLOCATE PREPARE _sql;
    
        -- Out of loop because found _limit items, or going too far.
        -- Expand range by about 1.4 (but not past _max_dist),
        -- then fetch details on nearest 10.
    
        -- Hardcoded constant:
        SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                    @max_deg,
                    GCDist(ABS(@my_lat), @my_lon,
                           ABS(@my_lat) - @dlat, @my_lon - @dlon) );
                -- ABS: go toward equator to find farthest corner (also avoids poles)
                -- Dateline: not a problem (see GCDist code)
    
        -- Reach for longitude line at right angle:
        -- sin(dlon)*cos(lat) = sin(dlat)
        -- Hardcoded constant:
        SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                                 COS(_deg2rad * @my_lat))
                                / _deg2rad -- precise
                            , 3600001);    -- must be too near a pole
    
        -- This is the "last SELECT":
        -- Hardcoded constants:
        IF (ABS(@my_lon) + @dlon < 1800000 OR    -- Usual case - not crossing dateline
            ABS(@my_lat) + @dlat <  900000) THEN -- crossing pole, so dateline not an issue
            -- Hardcoded table name:
            SET @sql = CONCAT(
                "SELECT *,
                        @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @my_lon - @dlon
                                  AND @my_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, "
                    ORDER BY dist
                    LIMIT ", _limit
                            );
        ELSE
            -- Hardcoded constants and table name:
            -- Circle crosses dateline, do two SELECTs, one for each side
            SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
            SET @east_lon := @west_lon + 3600000;
            -- One of those will be beyond +/- 180; this gets points beyond the dateline
            SET @sql = CONCAT(
                "( SELECT *,
                        @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @west_lon - @dlon
                                  AND @west_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                UNION ALL
                ( SELECT *,
                        @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                    FROM Locations
                    WHERE lat BETWEEN @my_lat - @dlat
                                  AND @my_lat + @dlat   -- PARTITION Pruning and bounding box
                      AND lon BETWEEN @east_lon - @dlon
                                  AND @east_lon + @dlon   -- first part of PK
                      AND ", _condition, "
                    HAVING dist <= ", _max_dist, " )
                ORDER BY dist
                LIMIT ", _limit
                            );
        END IF;
    
        PREPARE _sql FROM @sql;
        EXECUTE _sql;
        DEALLOCATE PREPARE _sql;
    END;
    //
    DELIMITER ;
Sample

Find the 5 cities with non-zero population (out of 3 million) nearest to (+35.15, -90.05). Start with a 10-mile bounding box and give up at 100 miles.
    CALL FindNearestLL(35.15, -90.05, 10, 100, 5, 'population > 0');
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | id      | lat    | lon     | country | ascii_city   | city         | state | population | @gcd_ct := 0 | dist                | @gcd_ct := @gcd_ct + 1 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    | 3023545 | 351494 | -900489 | us      | memphis      | Memphis      | TN    |     641608 |            0 | 0.07478733189367963 |                      3 |
    | 2917711 | 351464 | -901844 | us      | west memphis | West Memphis | AR    |      28065 |            0 |   7.605683607627499 |                      2 |
    | 2916457 | 352144 | -901964 | us      | marion       | Marion       | AR    |       9227 |            0 |     9.3994963998986 |                      1 |
    | 3020923 | 352044 | -898739 | us      | bartlett     | Bartlett     | TN    |      43264 |            0 |  10.643941157860604 |                      7 |
    | 2974644 | 349889 | -900125 | us      | southaven    | Southaven    | MS    |      38578 |            0 |  11.344042217329935 |                      5 |
    +---------+--------+---------+---------+--------------+--------------+-------+------------+--------------+---------------------+------------------------+
    5 rows in set (0.00 sec)
    Query OK, 0 rows affected (0.04 sec)
    
    SELECT COUNT(*) FROM ll_table;
    +----------+
    | COUNT(*) |
    +----------+
    |  3173958 |
    +----------+
    1 row in set (5.04 sec)
    
    FLUSH STATUS;
    CALL...
    SHOW SESSION STATUS LIKE 'Handler%';
    
    SHOW session status LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_read_first         | 1     |
    | Handler_read_key           | 3     |
    | Handler_read_next          | 1307  |  -- some index, some tmp, but far less than 3 million.
    | Handler_read_rnd           | 5     |
    | Handler_read_rnd_next      | 13    |
    | Handler_write              | 12    |  -- it needed a tmp
    +----------------------------+-------+

    Full-Text Index Stopwords

Stopwords are used to provide a list of commonly-used words that can be ignored for the purposes of full-text indexes.

    Full-text indexes built in MyISAM and InnoDB have different stopword lists by default.

    MyISAM Stopwords

    For full-text indexes on MyISAM tables, by default, the list is built from the file storage/myisam/ft_static.c, and searched using the server's character set and collation. The ft_stopword_file system variable allows the default list to be overridden with words from another file, or for stopwords to be ignored altogether.

    If the stopword list is changed, any existing full-text indexes need to be rebuilt
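For example, assuming a MyISAM table named articles with a FULLTEXT index (the name is hypothetical), the rebuild after changing ft_stopword_file and restarting the server could be:

REPAIR TABLE articles QUICK;  -- rebuilds the indexes with the new stopword list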

The following is the default list of stopwords, although you should always treat storage/myisam/ft_static.c as the definitive list. See the Fulltext Index Overview for more details, and Full-text-indexes for related articles.

a's, able, about, above, according, accordingly, across, actually, after,
afterwards, again, against, ain't, all, allow, allows, almost, alone, along,
already, also, although, always, am, among, amongst, an, and, another, any,
anybody, anyhow, anyone, anything, anyway, anyways, anywhere, apart, appear,
appreciate, appropriate, are, aren't, around, as, aside, ask, asking,
associated, at, available, away, awfully, be, became, because, become,
becomes, becoming, been, before, beforehand, behind, being, believe, below,
beside, besides, best, better, between, beyond, both, brief, but, by, c'mon,
c's, came, can, can't, cannot, cant, cause, causes, certain, certainly,
changes, clearly, co, com, come, comes, concerning, consequently, consider,
considering, contain, containing, contains, corresponding, could, couldn't,
course, currently, definitely, described, despite, did, didn't, different,
do, does, doesn't, doing, don't, done, down, downwards, during, each, edu,
eg, eight, either, else, elsewhere, enough, entirely, especially, et, etc,
even, ever, every, everybody, everyone, everything, everywhere, ex, exactly,
example, except, far, few, fifth, first, five, followed, following, follows,
for, former, formerly, forth, four, from, further, furthermore, get, gets,
getting, given, gives, go, goes, going, gone, got, gotten, greetings, had,
hadn't, happens, hardly, has, hasn't, have, haven't, having, he, he's,
hello, help, hence, her, here, here's, hereafter, hereby, herein, hereupon,
hers, herself, hi, him, himself, his, hither, hopefully, how, howbeit,
however, i'd, i'll, i'm, i've, ie, if, ignored, immediate, in, inasmuch,
inc, indeed, indicate, indicated, indicates, inner, insofar, instead, into,
inward, is, isn't, it, it'd, it'll, it's, its, itself, just, keep, keeps,
kept, know, knows, known, last, lately, later, latter, latterly, least,
less, lest, let, let's, like, liked, likely, little, look, looking, looks,
ltd, mainly, many, may, maybe, me, mean, meanwhile, merely, might, more,
moreover, most, mostly, much, must, my, myself, name, namely, nd, near,
nearly, necessary, need, needs, neither, never, nevertheless, new, next,
nine, no, nobody, non, none, noone, nor, normally, not, nothing, novel,
now, nowhere, obviously, of, off, often, oh, ok, okay, old, on, once, one,
ones, only, onto, or, other, others, otherwise, ought, our, ours,
ourselves, out, outside, over, overall, own, particular, particularly, per,
perhaps, placed, please, plus, possible, presumably, probably, provides,
que, quite, qv, rather, rd, re, really, reasonably, regarding, regardless,
regards, relatively, respectively, right, said, same, saw, say, saying,
says, second, secondly, see, seeing, seem, seemed, seeming, seems, seen,
self, selves, sensible, sent, serious, seriously, seven, several, shall,
she, should, shouldn't, since, six, so, some, somebody, somehow, someone,
something, sometime, sometimes, somewhat, somewhere, soon, sorry,
specified, specify, specifying, still, sub, such, sup, sure, t's, take,
taken, tell, tends, th, than, thank, thanks, thanx, that, that's, thats,
the, their, theirs, them, themselves, then, thence, there, there's,
thereafter, thereby, therefore, therein, theres, thereupon, these, they,
they'd, they'll, they're, they've, think, third, this, thorough,
thoroughly, those, though, three, through, throughout, thru, thus, to,
together, too, took, toward, towards, tried, tries, truly, try, trying,
twice, two, un, under, unfortunately, unless, unlikely, until, unto, up,
upon, us, use, used, useful, uses, using, usually, value, various, very,
via, viz, vs, want, wants, was, wasn't, way, we, we'd, we'll, we're,
we've, welcome, well, went, were, weren't, what, what's, whatever, when,
whence, whenever, where, where's, whereafter, whereas, whereby, wherein,
whereupon, wherever, whether, which, while, whither, who, who's, whoever,
whole, whom, whose, why, will, willing, wish, with, within, without,
won't, wonder, would, wouldn't, yes, yet, you, you'd, you'll, you're,
you've, your, yours, yourself, yourselves, zero

    InnoDB Stopwords

Stopwords on full-text indexes are only enabled if the innodb_ft_enable_stopword system variable is set (by default it is) at the time the index was created.

    The stopword list is determined as follows:

• If the innodb_ft_user_stopword_table system variable is set, that table is used as a stopword list.

• If innodb_ft_user_stopword_table is not set, the table set by innodb_ft_server_stopword_table is used.

• If neither variable is set, the built-in list is used, which can be viewed by querying the INNODB_FT_DEFAULT_STOPWORD table in the Information Schema.

In the first two cases, the specified table must exist at the time the system variable is set and the full-text index created. It must be an InnoDB table with a single column: a VARCHAR named value.
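A minimal sketch (the database and table names are hypothetical):

-- The stopword table must be InnoDB, with one VARCHAR column named `value`
CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO mydb.my_stopwords (value) VALUES ('the'), ('and'), ('or');
SET GLOBAL innodb_ft_user_stopword_table = 'mydb/my_stopwords';
-- Full-text indexes created from this point on will use that list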

The default InnoDB stopword list differs from the default MyISAM list, being much shorter. It contains the following words:

a, about, an, are, as, at, be, by, com, de, en, for, from, how, i, in, is,
it, la, of, on, or, that, the, this, to, was, what, when, where, who, will,
with, und, the, www

    This page is licensed: CC BY-SA / Gnu FDL

    Foreign Keys

    A foreign key is a database constraint that references columns in a parent table to enforce data integrity in a child table. When used, MariaDB checks to maintain these integrity rules.

    Overview

A foreign key is a constraint which can be used to enforce data integrity. It is composed of a column (or a set of columns) in a table called the child table, which references a column (or a set of columns) in a table called the parent table. If foreign keys are used, MariaDB performs some checks to enforce that the integrity rules are always enforced. For a more exhaustive explanation, see Relational databases: Foreign Keys.

Foreign keys can only be used with storage engines that support them. The default storage engine, InnoDB, supports foreign keys.

    Partitioned tables cannot contain foreign keys, and cannot be referenced by a foreign key.

    Syntax

Note: Until MariaDB 10.4, MariaDB accepts the shortcut format with a REFERENCES clause only in ALTER TABLE and CREATE TABLE statements, but that syntax does nothing. For example:

CREATE TABLE b(for_key INT REFERENCES a(not_key));

MariaDB simply parses it without returning any error or warning, for compatibility with other DBMSs. However, only the syntax described below creates foreign keys. From MariaDB 10.5, MariaDB will attempt to apply the constraint. See the Examples below.

Foreign keys are created with CREATE TABLE or ALTER TABLE. The definition must follow this syntax:

[CONSTRAINT [symbol]] FOREIGN KEY
    [index_name] (index_col_name, ...)
    REFERENCES tbl_name (index_col_name,...)
    [ON DELETE reference_option]
    [ON UPDATE reference_option]

reference_option:
    RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT

    The symbol clause, if specified, is used in error messages and must be unique in the database.

The columns in the child table must be a BTREE index (not HASH, RTREE, or FULLTEXT -- see SHOW INDEX), or the leftmost part of a BTREE index. Index prefixes are not supported (thus, TEXT and BLOB columns cannot be used as foreign keys). If MariaDB automatically creates an index for the foreign key (because it does not exist and is not explicitly created), its name will be index_name.

The referenced columns in the parent table must be an index or a prefix of an index.

    The foreign key columns and the referenced columns must be of the same type, or similar types. For integer types, the size and sign must also be the same.

Both the foreign key columns and the referenced columns can be PERSISTENT columns. However, the ON UPDATE CASCADE, ON UPDATE SET NULL, and ON DELETE SET NULL clauses are not allowed in this case.

    The parent and the child table must use the same storage engine, and must not be TEMPORARY or partitioned tables. They can be the same table.

    Constraints

If a foreign key exists, each row in the child table must match a row in the parent table. Multiple child rows can match the same parent row. A child row matches a parent row if all of its foreign key values are identical to a parent row's values in the parent table. However, if at least one of the foreign key values is NULL, the row has no parent, but it is still allowed.
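A short sketch of these matching rules (the table names are hypothetical; the author/book example appears later in this article):

CREATE TABLE parent (id INT NOT NULL PRIMARY KEY) ENGINE = InnoDB;
CREATE TABLE child (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT,   -- nullable on purpose
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE = InnoDB;

INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1, 1);     -- OK: matches parent row 1
INSERT INTO child VALUES (2, NULL);  -- OK: no parent, but still allowed
INSERT INTO child VALUES (3, 99);    -- ERROR 1452: no matching parent row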

    MariaDB performs certain checks to guarantee that the data integrity is enforced:

• Trying to insert non-matching rows (or update matching rows in a way that makes them non-matching rows) in the child table produces a 1452 error (SQLSTATE '23000').

    • When a row in the parent table is deleted and at least one child row exists, MariaDB performs an action which depends on the ON DELETE clause of the foreign key.

• When a value in the column referenced by a foreign key changes and at least one child row exists, MariaDB performs an action which depends on the ON UPDATE clause of the foreign key.

• Trying to drop a table that is referenced by a foreign key produces a 1217 error (SQLSTATE '23000').

• A TRUNCATE TABLE against a table containing one or more foreign keys is executed as a DELETE without WHERE, so that the foreign keys are enforced for each row.

    The allowed actions for ON DELETE and ON UPDATE are:

• RESTRICT: The change on the parent table is prevented. The statement terminates with a 1451 error (SQLSTATE '23000'). This is the default behavior for both ON DELETE and ON UPDATE.

• NO ACTION: Synonym for RESTRICT.

• CASCADE: The change is allowed and propagates to the child table. For example, if a parent row is deleted, the child row is also deleted; if a parent row's ID changes, the child row's ID will also change.

• SET NULL: The change is allowed, and the child row's foreign key columns are set to NULL.

• SET DEFAULT: Only worked with PBXT. Similar to SET NULL, but the foreign key columns were set to their default values. If default values do not exist, an error is produced.

If ON UPDATE CASCADE recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT.

The delete or update operations triggered by foreign keys do not activate triggers and are not counted in the Com_delete and Com_update status variables.

Foreign key constraints can be disabled by setting the foreign_key_checks server system variable to 0. This speeds up the insertion of large quantities of data.
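A sketch of a bulk load using it (the dump file path is hypothetical; note that re-enabling the variable does not re-check rows inserted while it was off):

SET foreign_key_checks = 0;  -- session scope: checks are skipped
SOURCE /tmp/big_dump.sql;    -- client command to load the data
SET foreign_key_checks = 1;  -- re-enable the checks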

    Metadata

The Information Schema REFERENTIAL_CONSTRAINTS table contains information about foreign keys. The individual columns are listed in the KEY_COLUMN_USAGE table.

The InnoDB-specific Information Schema tables also contain information about the InnoDB foreign keys. The foreign key information is stored in INNODB_SYS_FOREIGN. Data about the individual columns are stored in INNODB_SYS_FOREIGN_COLS.

The most human-readable way to get information about a table's foreign keys is often the SHOW CREATE TABLE statement.
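For example, a query like the following (the schema name is hypothetical) lists the foreign keys in one database:

SELECT CONSTRAINT_NAME, TABLE_NAME, REFERENCED_TABLE_NAME
  FROM information_schema.REFERENTIAL_CONSTRAINTS
  WHERE CONSTRAINT_SCHEMA = 'test';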

    Limitations

    Foreign keys have the following limitations in MariaDB:

    • Currently, foreign keys are only supported by InnoDB.

    • Cannot be used with views.

    • The SET DEFAULT action is not supported.

• Foreign key actions do not activate triggers.

• Indexed generated columns (both VIRTUAL and PERSISTENT) are not supported as InnoDB foreign key indexes.

• Prior to MariaDB 12.1, foreign key names are required to be unique per database. From MariaDB 12.1, foreign key names are only required to be unique per table.

    Examples

Let's see an example. We will create an author table and a book table. Both tables have a primary key called id. book also has a foreign key composed of a field called author_id, which refers to the author primary key. The foreign key constraint name is optional, but we'll specify it because we want it to appear in error messages: fk_book_author. The statements for this example appear in the code section at the end of this article.

    Now, if we try to insert a book with a non-existing author, we will get an error:

    The error is very descriptive.

    Now, let's try to properly insert two authors and their books:

    It worked!

    Now, let's delete the second author. When we created the foreign key, we specified ON DELETE CASCADE. This should propagate the deletion, and make the deleted author's books disappear:

    We also specified ON UPDATE RESTRICT. This should prevent us from modifying an author's id (the column referenced by the foreign key) if a child row exists:


    This page is licensed: CC BY-SA / Gnu FDL


    CREATE TABLE author (
      id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(100) NOT NULL
    ) ENGINE = InnoDB;
    
    CREATE TABLE book (
      id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      title VARCHAR(200) NOT NULL,
      author_id SMALLINT UNSIGNED NOT NULL,
      CONSTRAINT `fk_book_author`
        FOREIGN KEY (author_id) REFERENCES author (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT
    ) ENGINE = InnoDB;
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', 1);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    INSERT INTO author (name) VALUES ('Abdul Alhazred');
    INSERT INTO book (title, author_id) VALUES ('Necronomicon', LAST_INSERT_ID());
    
    INSERT INTO author (name) VALUES ('H.P. Lovecraft');
    INSERT INTO book (title, author_id) VALUES
      ('The call of Cthulhu', LAST_INSERT_ID()),
      ('The colour out of space', LAST_INSERT_ID());
    DELETE FROM author WHERE name = 'H.P. Lovecraft';
    
    SELECT * FROM book;
    +----+--------------+-----------+
    | id | title        | author_id |
    +----+--------------+-----------+
    |  3 | Necronomicon |         1 |
    +----+--------------+-----------+
    UPDATE author SET id = 10 WHERE id = 1;
    ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails 
     (`test`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
      REFERENCES `author` (`id`) ON DELETE CASCADE)
    CREATE TABLE a(a_key INT PRIMARY KEY, not_key INT);
    
    CREATE TABLE b(for_key INT REFERENCES a(not_key));
    ERROR 1005 (HY000): Can't create table `test`.`b` 
      (errno: 150 "Foreign key constraint is incorrectly formed")
    
    CREATE TABLE c(for_key INT REFERENCES a(a_key));
    
    SHOW CREATE TABLE c;
    +-------+----------------------------------------------------------------------------------+
    | Table | Create Table                                                                     |
    +-------+----------------------------------------------------------------------------------+
    | c     | CREATE TABLE `c` (
      `for_key` INT(11) DEFAULT NULL,
      KEY `for_key` (`for_key`),
      CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
    +-------+----------------------------------------------------------------------------------+
    
    INSERT INTO a VALUES (1,10);
    Query OK, 1 row affected (0.004 sec)
    
    INSERT INTO c VALUES (10);
    ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails 
      (`test`.`c`, CONSTRAINT `c_ibfk_1` FOREIGN KEY (`for_key`) REFERENCES `a` (`a_key`))
    
    INSERT INTO c VALUES (1);
    Query OK, 1 row affected (0.004 sec)
    
    SELECT * FROM c;
    +---------+
    | for_key |
    +---------+
    |       1 |
    +---------+