FULLTEXT Indexes with MariaDB Enterprise Server

Overview

Full-text search can be performed against an InnoDB table using full-text indexes. Some details about full-text indexes with InnoDB:

  • MariaDB Community Server can have up to 64 total indexes for a given table.

  • MariaDB Enterprise Server can have up to 128 total indexes for a given table.

  • InnoDB stores different parts of full-text indexes in multiple places. Part of the full-text index is stored the same tablespace file as the clustered index and data. Other parts of the full-text index are stored in auxiliary tables.

  • Full-text indexes are used to perform full-text search of columns that use the CHAR, VARCHAR, LONGTEXT, TEXT, MEDIUMTEXT, and TINYTEXT data types.

  • Performing a full-text search requires the use of the MATCH() ... AGAINST() clause.

  • Full-text indexes are inverted indexes, which map each word into a set of FTS_DOC_ID values that identify rows where the indexed columns contain the word. The partitioned internal tables FTS_%_INDEX_% are B+ trees, which are data structures that are very efficient for searching for exact values, performing range scans, and checking uniqueness.

  • By default, stop words will be excluded.

  • Certain word lengths will be excluded.

Compatibility

  • MariaDB Enterprise Server 10.2

  • MariaDB Enterprise Server 10.3

  • MariaDB Enterprise Server 10.4

  • MariaDB Enterprise Server 10.5

  • MariaDB Enterprise Server 10.6

Creating an InnoDB Table with a Single Column Full-text Index

Let's create an InnoDB table with a single column full-text index after confirming that the default storage engine is InnoDB:

  1. Connect to the server using MariaDB Client:

    $ mariadb --user=root
    
  2. Confirm that the default storage engine is InnoDB by checking the default_storage_engine system variable using the SHOW SESSION VARIABLES statement:

    SHOW SESSION VARIABLES
       LIKE 'default_storage_engine';
    
    +------------------------+--------+
    | Variable_name          | Value  |
    +------------------------+--------+
    | default_storage_engine | InnoDB |
    +------------------------+--------+
    
  3. If the database does not exist, then create the database for the table using the CREATE DATABASE statement:

    CREATE DATABASE hq_sales;
    
  4. Create the table using the CREATE TABLE statement and specify the full-text index with the FULLTEXT INDEX() clause:

    CREATE TABLE hq_sales.products (
       product_id BIGINT AUTO_INCREMENT NOT NULL,
       product_name VARCHAR(500),
       product_brand VARCHAR(500),
       product_description TEXT,
       PRIMARY KEY(product_id),
       FULLTEXT INDEX(product_description)
    );
    

Creating an InnoDB Table with a Composite Full-text Index

Let's create an InnoDB table with a composite (multi-column) full-text index after confirming that the default storage engine is InnoDB:

  1. Connect to the server using MariaDB Client:

    $ mariadb --user=root
    
  2. Confirm that the default storage engine is InnoDB by checking the default_storage_engine system variable using the SHOW SESSION VARIABLES statement:

    SHOW SESSION VARIABLES
       LIKE 'default_storage_engine';
    
    +------------------------+--------+
    | Variable_name          | Value  |
    +------------------------+--------+
    | default_storage_engine | InnoDB |
    +------------------------+--------+
    
  3. If the database does not exist, then create the database for the table using the CREATE DATABASE statement:

    CREATE DATABASE hq_sales;
    
  4. Create the table using the CREATE TABLE statement and specify the full-text index with the FULLTEXT INDEX() clause:

    CREATE TABLE hq_sales.products (
       product_id BIGINT AUTO_INCREMENT NOT NULL,
       product_name VARCHAR(500),
       product_brand VARCHAR(500),
       product_description TEXT,
       PRIMARY KEY(product_id),
       FULLTEXT INDEX(product_name, product_brand, product_description)
    );
    

Adding a Full-text Index to an InnoDB Table

Let's create an InnoDB table without a full-text index, and then add a full-text index to it:

  1. Connect to the server using MariaDB Client:

    $ mariadb --user=root
    
  2. Confirm that the default storage engine is InnoDB by checking the default_storage_engine system variable using the SHOW SESSION VARIABLES statement:

    SHOW SESSION VARIABLES
       LIKE 'default_storage_engine';
    
    +------------------------+--------+
    | Variable_name          | Value  |
    +------------------------+--------+
    | default_storage_engine | InnoDB |
    +------------------------+--------+
    
  3. If the database does not exist, then create the database for the table using the CREATE DATABASE statement:

    CREATE DATABASE hq_sales;
    
  4. Create the table without a primary key using the CREATE TABLE statement:

    CREATE TABLE hq_sales.products (
       product_id BIGINT AUTO_INCREMENT NOT NULL,
       product_name VARCHAR(500),
       product_brand VARCHAR(500),
       product_description TEXT,
       PRIMARY KEY(product_id)
    );
    
  5. Alter the table using the ALTER TABLE statement and specify the new secondary index with the ADD FULLTEXT INDEX() clause:

    ALTER TABLE hq_sales.products ADD FULLTEXT INDEX (product_description);
    
    • This operation will block concurrent writes to the table until the operation is complete.

    • If the table does not already have the special FTS_DOC_ID column, then InnoDB will add a hidden column with that name.

    • If the hidden FTS_DOC_ID column has to be added, then the table will be rebuilt.

Dropping a Full-text Index from an InnoDB Table

Let's drop the full-text index from the table created in the Creating an InnoDB Table with a Single Column Full-text Index section:

  1. Obtain the name of the index by joining the information_schema.INNODB_SYS_INDEXES, information_schema.INNODB_SYS_TABLES, and information_schema.INNODB_SYS_FIELDS tables:

    SELECT isi.NAME AS index_name, isf.NAME AS index_column
    FROM information_schema.INNODB_SYS_INDEXES isi
    JOIN information_schema.INNODB_SYS_TABLES ist
    ON isi.TABLE_ID = ist.TABLE_ID
    JOIN information_schema.INNODB_SYS_FIELDS isf
    ON isi.INDEX_ID = isf.INDEX_ID
    WHERE ist.NAME = 'hq_sales/products'
    ORDER BY isf.INDEX_ID, isf.POS;
    
    +---------------------+---------------------+
    | index_name          | index_column        |
    +---------------------+---------------------+
    | PRIMARY             | product_id          |
    | product_description | product_description |
    | FTS_DOC_ID_INDEX    | FTS_DOC_ID          |
    +---------------------+---------------------+
    
  2. Alter the table using the ALTER TABLE statement and specify the DROP INDEX clause:

    ALTER TABLE hq_sales.products DROP INDEX product_description;
    

Excluded Words with Full-text Search and InnoDB

InnoDB excludes some words from the full-text index.

Stop words

Stop words are incredibly common words that should not be included in searches, because they would match most data. For example, examples of English stop words would be "and", "or", "the", and "a".

If the innodb_ft_enable_stopword system variable is enabled, then InnoDB excludes stop words from full-text indexes. This system variable is enabled by default, so InnoDB excludes stop words by default.

Stop words have several potential sources:

If a custom table is used for stop words:

Word Lengths

InnoDB also excludes certain words from full-text indexes based on their lengths:

  • If a word is shorter than the value of the innodb_ft_min_token_size system variable, then it will be excluded from full-text search.

  • If a word is longer than the value of the innodb_ft_max_token_size system variable, then it will be excluded from full-text search.

Transactional Considerations

If a transaction affects a full-text index, then InnoDB only updates the index after the transaction has been committed.

Therefore, changes made by the current transaction are excluded from full-text search.

InnoDB's Full-text Index Caches

When a transaction that affects a full-text index is committed, InnoDB does not update the full-text index immediately. Instead, the changes to the index are written to a table-specific in-memory full-text index cache. When the table's full-text index cache exceeds the value of the innodb_ft_cache_size system variable, then the changes are written to the full-text index. The maximum total size of the full-text index caches for all tables cannot exceed the value of the innodb_ft_total_cache_size system variable.

When a full-text search is performed against a table, InnoDB does also search the table's in-memory full-text index cache, so the cached changes are not excluded from results.

Performing a Full-text Search of an InnoDB Table in Natural Language Mode

Natural language mode is the most basic mode of full-text search:

  • It is the default mode of full-text search.

  • It can be specified using IN NATURAL LANGUAGE MODE in the AGAINST() clause, but this is not required, since this mode is used by default.

  • It does not support any operators.

  • It searches the data for a list of comma-separated keywords.

  • By default, stop words will be excluded.

  • Certain word lengths will be excluded.

  • If a keyword is in over 50% of results, then it is excluded from results.

  • It returns results in descending order of relevance.

Let's perform a full-text search in natural language mode of the table created in the Creating an InnoDB Table with a Single Column Full-text Index section:

  1. Insert some data using the INSERT statement:

    INSERT INTO hq_sales.products
       (product_id, product_name, product_brand, product_description)
    VALUES
       (1, 'Gizmo', 'Widget Inc.', 'The best gizmo for performing work'),
       (2, 'Gadget', 'Strawberry Computer Inc.', 'A beautiful gadget that is also great for working'),
       (3, 'Contraption', 'Jungle Inc.', 'A fun contraption that will entertain everyone');
    
  2. Query the table using the SELECT statement with the MATCH() ... AGAINST() clause to perform a full-text search:

    SELECT *
    FROM hq_sales.products
    WHERE MATCH(product_description) AGAINST('work, working');
    
    • The columns to search are specified as an argument to MATCH().

    • The keywords to search for are specified as an argument to AGAINST(), with each keyword separated by a comma.

    Example output:

    +------------+--------------+--------------------------+---------------------------------------------------+
    | product_id | product_name | product_brand            | product_description                               |
    +------------+--------------+--------------------------+---------------------------------------------------+
    |          1 | Gizmo        | Widget Inc.              | The best gizmo for performing work                |
    |          2 | Gadget       | Strawberry Computer Inc. | A beautiful gadget that is also great for working |
    +------------+--------------+--------------------------+---------------------------------------------------+
    

Performing Full-text Search of an InnoDB Table in Boolean Mode

Boolean mode is a more advanced mode of full-text search:

  • It can be specified using IN BOOLEAN MODE in the AGAINST() clause.

  • It supports several operators, which are specified below.

  • It searches the data for a list of keywords that can contain the supported operators.

  • By default, stop words will be excluded.

  • Certain word lengths will be excluded.

  • If a keyword is in over 50% of results, then it is not excluded from results.

  • It does not return results in order of relevance.

Boolean mode adds support for several operators:

Operator

Description

+

The word is mandatory in all rows returned.

-

The word cannot appear in any row returned.

<

The word that follows has a lower relevance than other words, although rows containing it will still match

>

The word that follows has a higher relevance than other words.

()

Used to group words into sub-expressions.

~

The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row.

*

The wildcard, indicating zero or more characters. It can only appear at the end of a word.

"

Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).

Let's perform a full-text search of the table created in the Creating an InnoDB Table with a Single Column Full-text Index section:

  1. Insert some data using the INSERT statement:

    INSERT INTO hq_sales.products
       (product_id, product_name, product_brand, product_description)
    VALUES
       (1, 'Gizmo', 'Widget Inc.', 'The best gizmo for performing work'),
       (2, 'Gadget', 'Strawberry Computer Inc.', 'A beautiful gadget that is also great for working'),
       (3, 'Contraption', 'Jungle Inc.', 'A fun contraption that will entertain everyone');
    
  2. Query the table using the SELECT statement with the MATCH() ... AGAINST() clause to perform a full-text search:

    SELECT *
    FROM hq_sales.products
    WHERE MATCH(product_description) AGAINST('work* -gizmo <fun' IN BOOLEAN MODE);
    
    • The columns to search are specified as an argument to MATCH().

    • The keywords to search for are specified as an argument to AGAINST(), with each keyword separated by a comma.

    • To use boolean mode, specify IN BOOLEAN MODE after the keyword list.

    • When boolean mode is used, the keyword list can contain the supported operators listed above.

    Example output:

    +------------+--------------+--------------------------+---------------------------------------------------+
    | product_id | product_name | product_brand            | product_description                               |
    +------------+--------------+--------------------------+---------------------------------------------------+
    |          2 | Gadget       | Strawberry Computer Inc. | A beautiful gadget that is also great for working |
    |          3 | Contraption  | Jungle Inc.              | A fun contraption that will entertain everyone    |
    +------------+--------------+--------------------------+---------------------------------------------------+
    

Performing Full-text Search of an InnoDB Table with Query Expansion

Natural language mode with query expansion is similar to natural language mode of full-text search, but with some differences:

  • It can be specified using IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or just using WITH QUERY EXPANSION in the AGAINST() clause.

  • It does not support any operators.

  • It searches the data for a list of comma-separated keywords.

  • By default, stop words will be excluded.

  • Certain word lengths will be excluded.

  • If a keyword is in over 50% of results, then it is excluded from results.

  • It detects common words from the most relevant results, and then it performs another search using those words.

  • It returns results in descending order of relevance.