Regular Expressions Functions

Learn about regular expression functions in MariaDB Server. This section details SQL functions for powerful pattern matching and manipulation of string data using regular expressions.

Regular Expressions Overview

Get an overview of regex usage. This page introduces the pattern matching capabilities and common metacharacters used in MariaDB regular expressions.

Regular Expressions allow MariaDB to perform complex pattern matching on a string. In many cases, the simple pattern matching provided by LIKE is sufficient. LIKE performs two kinds of matches:

_ - the underscore, matching a single character
% - the percentage sign, matching any number of characters.
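As a brief illustration of the two LIKE wildcards (the literal values here are invented for this sketch):

```sql
-- _ matches exactly one character
SELECT 'Maria' LIKE 'Mar_a';   -- returns 1
-- % matches any number of characters, including none
SELECT 'Maria' LIKE 'Ma%';     -- returns 1
-- LIKE has to match the whole string, so a bare prefix fails
SELECT 'Maria' LIKE 'Mar';     -- returns 0
```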

In other cases you may need more control over the returned matches, and will need to use regular expressions.

Regular expression matches are performed with the REGEXP function. RLIKE is a synonym for REGEXP.

Comparisons are performed on the byte value, so characters that are treated as equivalent by a collation, but do not have the same byte-value, such as accented characters, could evaluate as unequal.

Without any special characters, a regular expression match is true if the characters match. The match is case-insensitive, except in the case of BINARY strings.
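A minimal sketch of this behavior, assuming the connection uses a case-insensitive default collation:

```sql
SELECT 'Maria' REGEXP 'Maria';          -- returns 1
SELECT 'Maria' REGEXP 'maria';          -- returns 1 (case is ignored)
SELECT BINARY 'Maria' REGEXP 'maria';   -- returns 0 (binary strings compare case-sensitively)
```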

Note that the word being matched must match the whole pattern:
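Two statements of the following shape show the difference:

```sql
SELECT 'Maria' REGEXP 'Mari';   -- returns 1
SELECT 'Mari' REGEXP 'Maria';   -- returns 0
```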

The first returns true because the pattern "Mari" exists in the expression "Maria". When the order is reversed, the result is false, as the pattern "Maria" does not exist in the expression "Mari".

A match can be performed against more than one word with the | character. For example:
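A short sketch of alternation (the names are arbitrary sample values):

```sql
SELECT 'Maria' REGEXP 'Monty|Maria';      -- returns 1 (second alternative matches)
SELECT 'Maria' REGEXP 'Monty|Widenius';   -- returns 0 (neither alternative matches)
```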

Special Characters

The above examples introduce the syntax, but are not very useful on their own. It's the special characters that give regular expressions their power.

^

^ matches the beginning of a string (inside square brackets it can also mean NOT - see below):
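For instance, anchoring a pattern to the start of the string might look like this:

```sql
SELECT 'Maria' REGEXP '^Ma';   -- returns 1
SELECT 'Maria' REGEXP '^ia';   -- returns 0 ('ia' is present, but not at the start)
```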

$

$ matches the end of a string:
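The same idea applied to the end of the string, as an illustrative sketch:

```sql
SELECT 'Maria' REGEXP 'ia$';   -- returns 1
SELECT 'Maria' REGEXP 'Ma$';   -- returns 0 ('Ma' is present, but not at the end)
```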

.

. matches any single character:
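A sketch showing that each dot stands for exactly one character:

```sql
SELECT 'Maria' REGEXP 'Ma.ia';    -- returns 1 (the dot matches 'r')
SELECT 'Maria' REGEXP 'Ma..ia';   -- returns 0 (only one character sits between 'Ma' and 'ia')
```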

*

x* matches zero or more of a character x. In the examples below, it's the r character.
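A possible set of examples, using sample strings with one, zero, and three r characters:

```sql
SELECT 'Maria' REGEXP 'Mar*ia';     -- returns 1 (one r)
SELECT 'Maia' REGEXP 'Mar*ia';      -- returns 1 (zero r)
SELECT 'Marrria' REGEXP 'Mar*ia';   -- returns 1 (three r)
```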

+

x+ matches one or more of a character x. In the examples below, it's the r character.
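The same sample strings, now requiring at least one r:

```sql
SELECT 'Maria' REGEXP 'Mar+ia';     -- returns 1 (one r)
SELECT 'Maia' REGEXP 'Mar+ia';      -- returns 0 (no r)
SELECT 'Marrria' REGEXP 'Mar+ia';   -- returns 1 (three r)
```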

?

x? matches zero or one of a character x. In the examples below, it's the r character.
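And with the r made optional, as a sketch on the same sample strings:

```sql
SELECT 'Maria' REGEXP 'Mar?ia';     -- returns 1 (one r)
SELECT 'Maia' REGEXP 'Mar?ia';      -- returns 1 (zero r)
SELECT 'Marrria' REGEXP 'Mar?ia';   -- returns 0 (more than one r)
```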

()

(xyz) - combine a sequence, for example (xyz)+ or (xyz)*
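A small sketch of grouping, where the quantifier applies to the whole sequence ari rather than to a single character (the sample strings are invented):

```sql
SELECT 'Maria' REGEXP '(ari)+';        -- returns 1 (one occurrence of 'ari')
SELECT 'Mariari' REGEXP '^M(ari)+$';   -- returns 1 (two occurrences of 'ari')
SELECT 'Ma' REGEXP '^M(ari)*a$';       -- returns 1 (zero occurrences of 'ari')
```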

{}

x{n} and x{m,n} - this notation matches repeated instances of x. With x{n}, x must occur exactly n times. With x{m,n}, x can occur from m to n times. For example, to match zero or one instance of the string ari (which is identical to (ari)?), the following can be used:
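A sketch of both forms of the notation (the sample strings are illustrative):

```sql
SELECT 'Maria' REGEXP 'M(ari){0,1}a';    -- returns 1 (one 'ari')
SELECT 'Ma' REGEXP 'M(ari){0,1}a';       -- returns 1 (zero 'ari')
SELECT 'Marrria' REGEXP 'Mar{3}ia';      -- returns 1 (exactly three r)
SELECT 'Marrria' REGEXP 'Mar{1,2}ia';    -- returns 0 (three r is outside the 1 to 2 range)
```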

[]

[xy] groups characters for matching purposes. For example, to match either the p or the r character:
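For instance:

```sql
SELECT 'Maria' REGEXP 'Ma[pr]ia';   -- returns 1 (r is in the set)
SELECT 'Mapia' REGEXP 'Ma[pr]ia';   -- returns 1 (p is in the set)
SELECT 'Matia' REGEXP 'Ma[pr]ia';   -- returns 0 (t is not in the set)
```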

The square brackets also permit a range match, for example, to match any character from a-z, [a-z] is used. Numeric ranges are also permitted.

The following does not match, as r falls outside of the range a-p.
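A sketch of range matching, including the non-matching a-p case just described (the numeric sample string is invented):

```sql
SELECT 'Maria' REGEXP 'Ma[a-z]ia';   -- returns 1 (r is within a-z)
SELECT 'Maria' REGEXP 'Ma[a-p]ia';   -- returns 0 (r is outside a-p)
SELECT 'Maria10' REGEXP '[0-9]';     -- returns 1 (a numeric range)
```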

Inside square brackets, a leading ^ negates the set, so the set matches any character that is NOT listed. For example:
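A minimal sketch of a negated set:

```sql
SELECT 'Maria' REGEXP 'Ma[^p]ia';   -- returns 1 (r is not p)
SELECT 'Maria' REGEXP 'Ma[^r]ia';   -- returns 0 (r is excluded by the set)
```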

The [ and ] characters on their own can be literally matched inside a [] block, without escaping, as long as they immediately follow the opening bracket:
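Some illustrative cases, with a literal bracket placed directly after the opening bracket of the set:

```sql
SELECT '[Maria' REGEXP '[[]';    -- returns 1 (the set contains '[')
SELECT '[Maria' REGEXP '[]]';    -- returns 0 (the set contains ']', which the string lacks)
SELECT ']Maria' REGEXP '[]]';    -- returns 1 (the set contains ']')
SELECT ']Maria' REGEXP '[]a]';   -- returns 1 (the set contains ']' and 'a')
```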

Incorrect order, so no match:
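A sketch of the wrong ordering: here the first ] closes the set, so the pattern means "an a followed by a literal ]":

```sql
SELECT ']Maria' REGEXP '[a]]';   -- returns 0 (the string contains no 'a]' sequence)
```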

The - character can also be matched in the same way:
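For example, a hyphen placed first in the set is taken literally, whereas elsewhere it denotes a range:

```sql
SELECT '-Maria' REGEXP '[1-10]';    -- returns 0 (the set is the range 1-1 plus 0)
SELECT '-Maria' REGEXP '[-1-10]';   -- returns 1 (the leading '-' is a literal hyphen)
```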

Word boundaries

The :<: and :>: patterns match the beginning and the end of a word respectively. They are written inside a bracket pair, as [[:<:]] and [[:>:]]. For example:
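A sketch using the bracketed forms [[:<:]] and [[:>:]] (the sentence is a made-up sample value):

```sql
-- 'MariaDB' appears as a whole word
SELECT 'How do I upgrade MariaDB?' REGEXP '[[:<:]]MariaDB[[:>:]]';   -- returns 1
-- 'Maria' is only the start of a longer word, so the end-of-word test fails
SELECT 'How do I upgrade MariaDB?' REGEXP '[[:<:]]Maria[[:>:]]';     -- returns 0
```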

Character Classes

There are a number of shortcuts to match particular preset character classes. These are matched with the [:character_class:] pattern (inside a [] set). The following character classes exist:

Character Class

Description

For example:
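The sketch below uses two of the standard POSIX classes, alnum (letters and digits) and digit; the sample strings are invented:

```sql
SELECT 'Maria' REGEXP 'Mar[[:alnum:]]*';     -- returns 1
SELECT '2024' REGEXP '^[[:digit:]]+$';       -- returns 1 (digits only)
SELECT 'Maria' REGEXP '^[[:digit:]]+$';      -- returns 0 (no digits)
```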

Remember that matches are by default case-insensitive, unless a binary string is used, so the following example, which specifically looks for an uppercase character, counter-intuitively matches a lowercase character:
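A sketch of that behavior, assuming a case-insensitive default collation:

```sql
SELECT 'Mari' REGEXP 'Mar[[:upper:]]+';          -- returns 1 (lowercase i is accepted)
SELECT BINARY 'Mari' REGEXP 'Mar[[:upper:]]+';   -- returns 0 (binary string, so case matters)
```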

Character Names

There are also a number of shortcuts to match particular preset character names. These are matched with the [.character.] pattern (inside a [] set). The following character names exist:

Name

Character

For example:

Combining

The true power of regular expressions is unleashed when the above elements are combined to form more complex patterns. Regular expressions' reputation for complexity stems from the seeming complexity of multiple combined elements, when in reality it is simply a matter of understanding the characters and how they apply:
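A sketch of one such combination, a bracket set with a repetition count (the pattern is an illustrative choice, not necessarily the one the original page used):

```sql
SELECT 'Maria' REGEXP 'Ma[ir]{2}ia';   -- returns 0
```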

The first example fails to match, as while the Ma matches, either i or r only matches once before the ia characters at the end.
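A variation that drops the extra i from the end of the pattern (again an illustrative choice):

```sql
SELECT 'Maria' REGEXP 'Ma[ir]{2}a';   -- returns 1
```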

This example matches, as either i or r match exactly twice after the Ma, in this case one r and one i.

Escaping

With the large number of special characters, care needs to be taken to properly escape characters. Two backslash characters, \\ (one for the MariaDB parser, one for the regex library), are required to properly escape a character. For example:

To match the literal (Ma:
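A sketch assuming the default sql_mode, where the MariaDB parser treats the backslash as an escape character. The two commented-out statements are expected to fail, because the regex library receives an unbalanced opening parenthesis:

```sql
-- SELECT '(Maria)' REGEXP '(Ma';    -- fails: the ( opens a group that is never closed
-- SELECT '(Maria)' REGEXP '\(Ma';   -- fails: the parser consumes the single backslash
SELECT '(Maria)' REGEXP '\\(Ma';     -- returns 1 (the regex library receives \( )
```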

To match the literal r+ (the first two examples are incorrect, as they match r one or more times, not r+):
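A sketch under the same default sql_mode assumption; the string 'Maria' contains an r but no literal r+ sequence:

```sql
SELECT 'Maria' REGEXP 'r+';      -- returns 1 (+ acts as a quantifier)
SELECT 'Maria' REGEXP 'r\+';     -- returns 1 (the parser drops the backslash, leaving r+)
SELECT 'Maria' REGEXP 'r\\+';    -- returns 0 (looks for the literal text r+)
SELECT 'Mar+ia' REGEXP 'r\\+';   -- returns 1 (the literal text r+ is present)
```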

This page is licensed: CC BY-SA / Gnu FDL

REGEXP

Test if a string matches a regex. This operator returns 1 if the pattern is found in the string, and 0 otherwise.

Syntax

Description

Performs a pattern match of a string expression expr against a regular expression pattern.
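A minimal sketch of the operator form, written as expr REGEXP pattern (the sample values are invented):

```sql
SELECT 'Maria' REGEXP 'ar';         -- returns 1 (pattern found)
SELECT 'Maria' REGEXP 'xyz';        -- returns 0 (pattern not found)
SELECT 'Maria' NOT REGEXP 'xyz';    -- returns 1 (negated form)
SELECT NULL REGEXP 'ar';            -- returns NULL (a NULL operand yields NULL)
```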

REGEXP_INSTR

Return the index of a regex match. This function finds the starting position of the first substring that matches the given pattern.

Syntax

REGEXP_INSTR(subject, pattern)

Returns the position of the first occurrence of the regular expression pattern in the string subject, or 0 if pattern was not found.

The positions start with 1 and are measured in characters (i.e. not in bytes), which is important for multi-byte character sets. You can cast a multi-byte character set to BINARY to get offsets in bytes.

The function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

The collation case sensitivity can be overridden using the (?i) and (?-i) PCRE flags.

MariaDB uses the PCRE library for enhanced regular expression performance, and REGEXP_INSTR was introduced as part of this enhancement.

Examples
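A few illustrative calls; the last one assumes the client sends the literal in a multi-byte character set such as utf8, so that Ö counts as a single character:

```sql
SELECT REGEXP_INSTR('abc', 'b');     -- returns 2
SELECT REGEXP_INSTR('abc', 'x');     -- returns 0 (not found)
SELECT REGEXP_INSTR('BJÖRN', 'N');   -- returns 5 (position measured in characters)
```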

Casting a multi-byte character set as BINARY to get offsets in bytes:
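A sketch that assumes a utf8 connection character set, in which Ö occupies two bytes:

```sql
SELECT REGEXP_INSTR(BINARY 'BJÖRN', 'N') AS cast_utf8_to_binary;   -- returns 6 (position measured in bytes)
```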

Case sensitivity:
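The following sketch assumes a case-insensitive default collation. The utf8mb4 introducer and the utf8mb4_bin collation are choices made for this example:

```sql
SELECT REGEXP_INSTR('ABC', 'b');                                    -- returns 2 (case-insensitive collation)
SELECT REGEXP_INSTR(BINARY 'ABC', 'b');                             -- returns 0 (binary data is case-sensitive)
SELECT REGEXP_INSTR('ABC', '(?-i)b');                               -- returns 0 ((?-i) forces case-sensitive matching)
SELECT REGEXP_INSTR(_utf8mb4'ABC' COLLATE utf8mb4_bin, '(?i)b');    -- returns 2 ((?i) forces case-insensitive matching)
```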

REGEXP_REPLACE

Replace regex matches in a string. This function substitutes occurrences of a pattern with a specified replacement string.

Syntax

Description

REGEXP_REPLACE returns the string with occurrences of the pattern replaced by the replacement string.
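A short sketch assuming the three-argument form REGEXP_REPLACE(subject, pattern, replace) and the default sql_mode, under which a backreference is written with two backslashes:

```sql
-- Remove every digit
SELECT REGEXP_REPLACE('ab12cd', '[0-9]', '') AS remove_digits;              -- returns 'abcd'
-- Swap two words using backreferences to the captured groups
SELECT REGEXP_REPLACE('James Bond', '^(.*) (.*)$', '\\2, \\1') AS swapped;  -- returns 'Bond, James'
```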

REGEXP_SUBSTR

Return the substring matching a regex. This function extracts the actual part of the string that matches the given pattern.

Syntax

REGEXP_SUBSTR(subject,pattern)

Description

Returns the part of the string subject that matches the regular expression pattern, or an empty string if pattern was not found.

The function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

The collation case sensitivity can be overridden using the (?i) and (?-i) PCRE flags.

MariaDB uses the PCRE library for enhanced regular expression performance, and REGEXP_SUBSTR was introduced as part of this enhancement.

The default_regex_flags variable addresses the remaining compatibility differences between PCRE and the old regex library.

Examples
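Some illustrative calls, assuming a case-insensitive default collation:

```sql
SELECT REGEXP_SUBSTR('ab12cd', '[0-9]+');   -- returns '12'
SELECT REGEXP_SUBSTR('abc', 'x');           -- returns '' (no match)
SELECT REGEXP_SUBSTR('ABC', 'b');           -- returns 'B' (case-insensitive match)
SELECT REGEXP_SUBSTR('ABC', '(?-i)b');      -- returns '' ((?-i) forces case-sensitive matching)
```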

RLIKE

Synonym for REGEXP. This operator performs a regular expression match against a string argument.

Syntax

Description

RLIKE is a synonym for REGEXP.
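A brief sketch showing the two spellings used interchangeably:

```sql
SELECT 'Maria' RLIKE '^Ma';    -- returns 1
SELECT 'Maria' REGEXP '^Ma';   -- returns 1 (same result with REGEXP)
SELECT 'Maria' RLIKE '^ia';    -- returns 0
```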
