All pages
Powered by GitBook
1 of 3

Loading...

Loading...

Loading...

Full-Text Indexes

Implement full-text indexes in MariaDB Server for efficient text search. This section guides you through creating and utilizing these indexes to optimize queries on large text datasets.

Full-Text Index Stopwords

Stopwords are used to provide a list of commonly-used words that can be ignored for the purposes of Full-text-indexes.

Full-text indexes built in MyISAM and InnoDB have different stopword lists by default.

MyISAM Stopwords

For full-text indexes on MyISAM tables, by default, the list is built from the file storage/myisam/ft_static.c, and searched using the server's character set and collation. The ft_stopword_file system variable allows the default list to be overridden with words from another file, or for stopwords to be ignored altogether.

If the stopword list is changed, any existing full-text indexes need to be rebuilt

The following table shows the default list of stopwords, although you should always treat storage/myisam/ft_static.c as the definitive list. See the for more details, and for related articles.

InnoDB Stopwords

Stopwords on full-text indexes are only enabled if the system variable is set (by default it is) at the time the index was created.

The stopword list is determined as follows:

  • If the system variable is set, that table is used as a stopword list.

  • If innodb_ft_user_stopword_table is not set, the table set by is used.

  • If neither variable is set, the built-in list is used, which can be viewed by querying the in the .

In the first two cases, the specified table must exist at the time the system variable is set and the full-text index created. It must be an InnoDB table with a single column, a named VALUE.

The default InnoDB stopword list differs from the default MyISAM list, being much shorter, and contains the following words:

This page is licensed: CC BY-SA / Gnu FDL

almost

alone

along

already

also

although

always

am

among

amongst

an

and

another

any

anybody

anyhow

anyone

anything

anyway

anyways

anywhere

apart

appear

appreciate

appropriate

are

aren't

around

as

aside

ask

asking

associated

at

available

away

awfully

be

became

because

become

becomes

becoming

been

before

beforehand

behind

being

believe

below

beside

besides

best

better

between

beyond

both

brief

but

by

c'mon

c's

came

can

can't

cannot

cant

cause

causes

certain

certainly

changes

clearly

co

com

come

comes

concerning

consequently

consider

considering

contain

containing

contains

corresponding

could

couldn't

course

currently

definitely

described

despite

did

didn't

different

do

does

doesn't

doing

don't

done

down

downwards

during

each

edu

eg

eight

either

else

elsewhere

enough

entirely

especially

et

etc

even

ever

every

everybody

everyone

everything

everywhere

ex

exactly

example

except

far

few

fifth

first

five

followed

following

follows

for

former

formerly

forth

four

from

further

furthermore

get

gets

getting

given

gives

go

goes

going

gone

got

gotten

greetings

had

hadn't

happens

hardly

has

hasn't

have

haven't

having

he

he's

hello

help

hence

her

here

here's

hereafter

hereby

herein

hereupon

hers

herself

hi

him

himself

his

hither

hopefully

how

howbeit

however

i'd

i'll

i'm

i've

ie

if

ignored

immediate

in

inasmuch

inc

indeed

indicate

indicated

indicates

inner

insofar

instead

into

inward

is

isn't

it

it'd

it'll

it's

its

itself

just

keep

keeps

kept

know

knows

known

last

lately

later

latter

latterly

least

less

lest

let

let's

like

liked

likely

little

look

looking

looks

ltd

mainly

many

may

maybe

me

mean

meanwhile

merely

might

more

moreover

most

mostly

much

must

my

myself

name

namely

nd

near

nearly

necessary

need

needs

neither

never

nevertheless

new

next

nine

no

nobody

non

none

noone

nor

normally

not

nothing

novel

now

nowhere

obviously

of

off

often

oh

ok

okay

old

on

once

one

ones

only

onto

or

other

others

otherwise

ought

our

ours

ourselves

out

outside

over

overall

own

particular

particularly

per

perhaps

placed

please

plus

possible

presumably

probably

provides

que

quite

qv

rather

rd

re

really

reasonably

regarding

regardless

regards

relatively

respectively

right

said

same

saw

say

saying

says

second

secondly

see

seeing

seem

seemed

seeming

seems

seen

self

selves

sensible

sent

serious

seriously

seven

several

shall

she

should

shouldn't

since

six

so

some

somebody

somehow

someone

something

sometime

sometimes

somewhat

somewhere

soon

sorry

specified

specify

specifying

still

sub

such

sup

sure

t's

take

taken

tell

tends

th

than

thank

thanks

thanx

that

that's

thats

the

their

theirs

them

themselves

then

thence

there

there's

thereafter

thereby

therefore

therein

theres

thereupon

these

they

they'd

they'll

they're

they've

think

third

this

thorough

thoroughly

those

though

three

through

throughout

thru

thus

to

together

too

took

toward

towards

tried

tries

truly

try

trying

twice

two

un

under

unfortunately

unless

unlikely

until

unto

up

upon

us

use

used

useful

uses

using

usually

value

various

very

via

viz

vs

want

wants

was

wasn't

way

we

we'd

we'll

we're

we've

welcome

well

went

were

weren't

what

what's

whatever

when

whence

whenever

where

where's

whereafter

whereas

whereby

wherein

whereupon

wherever

whether

which

while

whither

who

who's

whoever

whole

whom

whose

why

will

willing

wish

with

within

without

won't

wonder

would

wouldn't

yes

yet

you

you'd

you'll

you're

you've

your

yours

yourself

yourselves

zero

is

it

la

of

on

or

that

the

this

to

was

what

when

where

who

will

with

und

the

www

a's

able

about

above

according

accordingly

across

actually

after

afterwards

again

against

ain't

all

allow

allows

a

about

an

are

as

at

be

by

com

de

en

for

from

how

i

in

Fulltext Index Overview
Full-text-indexes
innodb_ft_enable_stopword
innodb_ft_user_stopword_table
innodb_ft_server_stopword_table
INNODB_FT_DEFAULT_STOPWORD table
Information Schema
VARCHAR

Full-Text Index Overview

MariaDB has support for full-text indexing and searching:

  • A full-text index in MariaDB is an index of type FULLTEXT, and it allows more options when searching for portions of text from a field.

  • Full-text indexes can be used only with MyISAM, Aria, InnoDB and Mroonga tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

  • Partitioned tables cannot contain fulltext indexes, even if the storage engine supports them.

  • A FULLTEXT index definition can be given in the statement when a table is created, or added later using or .

  • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

Full-text searching is performed using syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name.

Excluded Results

  • Partial words are excluded.

  • Words less than 4 (MyISAM) or 3 (InnoDB) characters in length will not be stored in the fulltext index. This value can be adjusted by changing the system variable (or, for , ).

  • Words longer than 84 characters in length will also not be stored in the fulltext index. This values can be adjusted by changing the system variable (or, for , ).

Relevance

MariaDB calculates a relevance for each result, based on a number of factors, including the number of words in the index, the number of unique words in a row, the total number of words in both the index and the result, and the weight of the word. In English, 'cool' will be weighted less than 'dandy', at least at present! The relevance can be returned as part of a query simply by using the MATCH function in the field list.

Types of Full-Text search

IN NATURAL LANGUAGE MODE

IN NATURAL LANGUAGE MODE is the default type of full-text search, and the keywords can be omitted. There are no special operators, and searches consist of one or more comma-separated keywords.

Searches are returned in descending order of relevance.

IN BOOLEAN MODE

Boolean search permits the use of a number of special operators:

Operator
Description

Searches are not returned in order of relevance, and nor does the 50% limit apply. Stopwords and word minimum and maximum lengths still apply as usual.

WITH QUERY EXPANSION

A query expansion search is a modification of a natural language search. The search string is used to perform a regular natural language search. Then, words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. It can be useful when relying on implied knowledge within the data, for example that MariaDB is a database.

Examples

Creating a table, and performing a basic search:

Multiple words:

Since 'Once' is a , no result is returned:

Inserting the word 'wicked' into more than half the rows excludes it from the results:

Using IN BOOLEAN MODE to overcome the 50% limitation:

Returning the relevance:

WITH QUERY EXPANSION. In the following example, 'MariaDB' is always associated with the word 'database', so it is returned when query expansion is used, even though not explicitly requested.

Partial word matching with IN BOOLEAN MODE:

Using boolean operators

See Also

  • For simpler searches of a substring in text columns, see the operator.

This page is licensed: CC BY-SA / Gnu FDL

Stopwords are a list of common words such as "once" or "then" that do not reflect in the search results unless IN BOOLEAN MODE is used. The stopword list for MyISAM/Aria tables and InnoDB tables can differ. See
for details and a full list, as well as for details on how to change the default list.
  • For MyISAM/Aria fulltext indexes only, if a word appears in more than half the rows, it is also excluded from the results of a fulltext search.

  • For InnoDB indexes, only committed rows appear - modifications from the current transaction do not apply.

  • *

    The wildcard, indicating zero or more characters. It can only appear at the end of a word.

    "

    Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).

    +

    The word is mandatory in all rows returned.

    -

    The word cannot appear in any row returned.

    <

    The word that follows has a lower relevance than other words, although rows containing it will still match

    >

    The word that follows has a higher relevance than other words.

    ()

    Used to group words into subexpressions.

    ~

    The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row.

    CREATE TABLE
    ALTER TABLE
    CREATE INDEX
    MATCH() ... AGAINST
    ft_min_word_length
    InnoDB
    innodb_ft_min_token_size
    ft_max_word_length
    InnoDB
    innodb_ft_max_token_size
    stopword
    LIKE
    stopwords
    MATCH (col1,col2,...) AGAINST (expr [search_modifier])
    CREATE TABLE ft_myisam(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a time'),
      ('There was a wicked witch'), ('Who ate everybody up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    +--------------------------+
    | copy                     |
    +--------------------------+
    | There was a wicked witch |
    +--------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked,witch');
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    +---------------------------------+
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('Once');
    Empty set (0.00 sec)
    INSERT INTO ft_myisam(copy) VALUES ('Once upon a wicked time'),
      ('There was a wicked wicked witch'), ('Who ate everybody wicked up');
    
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked');
    Empty set (0.00 sec)
    SELECT * FROM ft_myisam WHERE MATCH(copy) AGAINST('wicked' IN BOOLEAN MODE);
    +---------------------------------+
    | copy                            |
    +---------------------------------+
    | There was a wicked witch        |
    | Once upon a wicked time         |
    | There was a wicked wicked witch |
    | Who ate everybody wicked up     |
    +---------------------------------+
    SELECT copy,MATCH(copy) AGAINST('witch') AS relevance 
      FROM ft_myisam WHERE MATCH(copy) AGAINST('witch');
    +---------------------------------+--------------------+
    | copy                            | relevance          |
    +---------------------------------+--------------------+
    | There was a wicked witch        | 0.6775632500648499 |
    | There was a wicked wicked witch | 0.5031757950782776 |
    +---------------------------------+--------------------+
    CREATE TABLE ft2(copy TEXT,FULLTEXT(copy)) ENGINE=MyISAM;
    
    INSERT INTO ft2(copy) VALUES
     ('MySQL vs MariaDB database'),
     ('Oracle vs MariaDB database'), 
     ('PostgreSQL vs MariaDB database'),
     ('MariaDB overview'),
     ('Foreign keys'),
     ('Primary keys'),
     ('Indexes'),
     ('Transactions'),
     ('Triggers');
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database');
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    +--------------------------------+
    3 rows in set (0.00 sec)
    
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('database' WITH QUERY EXPANSION);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    4 rows in set (0.00 sec)
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('Maria*' IN BOOLEAN MODE);
    +--------------------------------+
    | copy                           |
    +--------------------------------+
    | MySQL vs MariaDB database      |
    | Oracle vs MariaDB database     |
    | PostgreSQL vs MariaDB database |
    | MariaDB overview               |
    +--------------------------------+
    SELECT * FROM ft2 WHERE MATCH(copy) AGAINST('+MariaDB -database' 
      IN BOOLEAN MODE);
    +------------------+
    | copy             |
    +------------------+
    | MariaDB overview |
    +------------------+