All pages
Powered by GitBook
1 of 8

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Regular Expressions Functions

Learn about regular expression functions in MariaDB Server. This section details SQL functions for powerful pattern matching and manipulation of string data using regular expressions.

Regular Expressions Overview

Get an overview of regex usage. This page introduces the pattern matching capabilities and common metacharacters used in MariaDB regular expressions.

Regular Expressions allow MariaDB to perform complex pattern matching on a string. In many cases, the simple pattern matching provided by LIKE is sufficient. LIKE performs two kinds of matches:

  • _ - the underscore, matching a single character

  • % - the percentage sign, matching any number of characters.

In other cases you may need more control over the returned matches, and will need to use regular expressions.

Regular expression matches are performed with the function. RLIKE is a synonym for REGEXP.

Comparisons are performed on the byte value, so characters that are treated as equivalent by a collation, but do not have the same byte-value, such as accented characters, could evaluate as unequal.

Without any special characters, a regular expression match is true if the characters match. The match is case-insensitive, except in the case of BINARY strings.

Note that the word being matched must match the whole pattern:

The first returns true because the pattern "Mari" exists in the expression "Maria". When the order is reversed, the result is false, as the pattern "Maria" does not exist in the expression "Mari"

A match can be performed against more than one word with the | character. For example:

Special Characters

The above examples introduce the syntax, but are not very useful on their own. It's the special characters that give regular expressions their power.

^

^ matches the beginning of a string (inside square brackets it can also mean NOT - see below):

$

$ matches the end of a string:

.

. matches any single character:

*

x* matches zero or more of a character x. In the examples below, it's the r character.

+

x+ matches one or more of a character x. In the examples below, it's the r character.

?

x? matches zero or one of a character x. In the examples below, it's the r character.

()

(xyz) - combine a sequence, for example (xyz)+ or (xyz)*

{}

x{n} and x{m,n} This notation is used to match many instances of the x. In the case of x{n} the match must be exactly that many times. In the case of x{m,n}, the match can occur from m to n times. For example, to match zero or one instance of the string ari (which is identical to (ari)?), the following can be used:

[]

[xy] groups characters for matching purposes. For example, to match either the p or the r character:

The square brackets also permit a range match, for example, to match any character from a-z, [a-z] is used. Numeric ranges are also permitted.

The following does not match, as r falls outside of the range a-p.

^

The ^ character means does NOT match, for example:

The [ and ] characters on their own can be literally matched inside a [] block, without escaping, as long as they immediately match the opening bracket:

Incorrect order, so no match:

The - character can also be matched in the same way:

Word boundaries

The :<: and :>: patterns match the beginning and the end of a word respectively. For example:

Character Classes

There are a number of shortcuts to match particular preset character classes. These are matched with the [:character_class:] pattern (inside a [] set). The following character classes exist:

Character Class
Description

For example:

Remember that matches are by default case-insensitive, unless a binary string is used, so the following example, specifically looking for an uppercase, counter-intuitively matches a lowercase character:

Character Names

There are also number of shortcuts to match particular preset character names. These are matched with the [.character.] pattern (inside a [] set). The following character classes exist:

Name
Character

For example:

Combining

The true power of regular expressions is unleashed when the above is combined, to form more complex examples. Regular expression's reputation for complexity stems from the seeming complexity of multiple combined regular expressions, when in reality, it's simply a matter of understanding the characters and how they apply:

The first example fails to match, as while the Ma matches, either i or r only matches once before the ia characters at the end.

This example matches, as either i or r match exactly twice after the Ma, in this case one r and one i.

Escaping

With the large number of special characters, care needs to be taken to properly escape characters. Two backslash characters, `` (one for the MariaDB parser, one for the regex library), are required to properly escape a character. For example:

To match the literal (Ma:

To match r+: The first two examples are incorrect, as they match r one or more times, not r+:

This page is licensed: CC BY-SA / Gnu FDL

lower

Lowercase alphabetic

print

Graphic or space characters

punct

Punctuation

space

Space, tab, newline, and carriage return

upper

Uppercase alphabetic

xdigit

Hexadecimal digit

006

BEL

007

alert

007

BS

010

backspace

'\b'

HT

011

tab

'\t'

LF

012

newline

'\n'

VT

013

vertical-tab

'\v'

FF

014

form-feed

'\f'

CR

015

carriage-return

'\r'

SO

016

SI

017

DLE

020

DC1

021

DC2

022

DC3

023

DC4

024

NAK

025

SYN

026

ETB

027

CAN

030

EM

031

SUB

032

ESC

033

IS4

034

FS

034

IS3

035

GS

035

IS2

036

RS

036

IS1

037

US

037

space

' '

exclamation-mark

'!'

quotation-mark

'"'

number-sign

'#'

dollar-sign

'$'

percent-sign

'%'

ampersand

'&'

apostrophe

'''

left-parenthesis

'('

right-parenthesis

')'

asterisk

'*'

plus-sign

'+'

comma

','

hyphen

'-'

hyphen-minus

'-'

period

'.'

full-stop

'.'

slash

'/'

solidus

'/'

zero

'0'

one

'1'

two

'2'

three

'3'

four

'4'

five

'5'

six

'6'

seven

'7'

eight

'8'

nine

'9'

colon

':'

semicolon

';'

less-than-sign

'<'

equals-sign

'='

greater-than-sign

'>'

question-mark

'?'

commercial-at

'@'

left-square-bracket

'['

backslash

''

reverse-solidus

''

right-square-bracket

']'

circumflex

'^'

circumflex-accent

'^'

underscore

'_'

low-line

'_'

grave-accent

'`'

left-brace

'{'

left-curly-bracket

'{'

vertical-line

'

right-brace

'}'

right-curly-bracket

'}'

tilde

''

DEL

177

alnum

Alphanumeric

alpha

Alphabetic

blank

Whitespace

cntrl

Control characters

digit

Digits

graph

Graphic characters

NUL

0

SOH

001

STX

002

ETX

003

EOT

004

ENQ

005

REGEXP

ACK

RLIKE

Synonym for REGEXP. This operator performs a regular expression match against a string argument.

Syntax

Description

RLIKE is a synonym for

SELECT 'Maria' REGEXP 'Maria';
+------------------------+
| 'Maria' REGEXP 'Maria' |
+------------------------+
|                      1 |
+------------------------+

SELECT 'Maria' REGEXP 'maria';
+------------------------+
| 'Maria' REGEXP 'maria' |
+------------------------+
|                      1 |
+------------------------+

SELECT BINARY 'Maria' REGEXP 'maria';
+-------------------------------+
| BINARY 'Maria' REGEXP 'maria' |
+-------------------------------+
|                             0 |
+-------------------------------+
SELECT 'Maria' REGEXP 'Mari';
+-----------------------+
| 'Maria' REGEXP 'Mari' |
+-----------------------+
|                     1 |
+-----------------------+

SELECT 'Mari' REGEXP 'Maria';
+-----------------------+
| 'Mari' REGEXP 'Maria' |
+-----------------------+
|                     0 |
+-----------------------+
SELECT 'Maria' REGEXP 'Monty|Maria';
+------------------------------+
| 'Maria' REGEXP 'Monty|Maria' |
+------------------------------+
|                            1 |
+------------------------------+
SELECT 'Maria' REGEXP '^Ma';
+----------------------+
| 'Maria' REGEXP '^Ma' |
+----------------------+
|                    1 |
+----------------------+
SELECT 'Maria' REGEXP 'ia$';
+----------------------+
| 'Maria' REGEXP 'ia$' |
+----------------------+
|                    1 |
+----------------------+
SELECT 'Maria' REGEXP 'Ma.ia';
+------------------------+
| 'Maria' REGEXP 'Ma.ia' |
+------------------------+
|                      1 |
+------------------------+

SELECT 'Maria' REGEXP 'Ma..ia';
+-------------------------+
| 'Maria' REGEXP 'Ma..ia' |
+-------------------------+
|                       0 |
+-------------------------+
SELECT 'Maria' REGEXP 'Mar*ia';
+-------------------------+
| 'Maria' REGEXP 'Mar*ia' |
+-------------------------+
|                       1 |
+-------------------------+

SELECT 'Maia' REGEXP 'Mar*ia';
+------------------------+
| 'Maia' REGEXP 'Mar*ia' |
+------------------------+
|                      1 |
+------------------------+

SELECT 'Marrria' REGEXP 'Mar*ia';
+---------------------------+
| 'Marrria' REGEXP 'Mar*ia' |
+---------------------------+
|                         1 |
+---------------------------+
SELECT 'Maria' REGEXP 'Mar+ia';
+-------------------------+
| 'Maria' REGEXP 'Mar+ia' |
+-------------------------+
|                       1 |
+-------------------------+

SELECT 'Maia' REGEXP 'Mar+ia';
+------------------------+
| 'Maia' REGEXP 'Mar+ia' |
+------------------------+
|                      0 |
+------------------------+

SELECT 'Marrria' REGEXP 'Mar+ia';
+---------------------------+
| 'Marrria' REGEXP 'Mar+ia' |
+---------------------------+
|                         1 |
+---------------------------+
SELECT 'Maria' REGEXP 'Mar?ia';
+-------------------------+
| 'Maria' REGEXP 'Mar?ia' |
+-------------------------+
|                       1 |
+-------------------------+

SELECT 'Maia' REGEXP 'Mar?ia';
+------------------------+
| 'Maia' REGEXP 'Mar?ia' |
+------------------------+
|                      1 |
+------------------------+

SELECT 'Marrria' REGEXP 'Mar?ia';
+---------------------------+
| 'Marrria' REGEXP 'Mar?ia' |
+---------------------------+
|                         0 |
+---------------------------+
SELECT 'Maria' REGEXP '(ari)+';
+-------------------------+
| 'Maria' REGEXP '(ari)+' |
+-------------------------+
|                       1 |
+-------------------------+
SELECT 'Maria' REGEXP '(ari){0,1}';
+-----------------------------+
| 'Maria' REGEXP '(ari){0,1}' |
+-----------------------------+
|                           1 |
+-----------------------------+
SELECT 'Maria' REGEXP 'Ma[pr]ia';
+---------------------------+
| 'Maria' REGEXP 'Ma[pr]ia' |
+---------------------------+
|                         1 |
+---------------------------+
SELECT 'Maria' REGEXP 'Ma[a-z]ia';
+----------------------------+
| 'Maria' REGEXP 'Ma[a-z]ia' |
+----------------------------+
|                          1 |
+----------------------------+
SELECT 'Maria' REGEXP 'Ma[a-p]ia';
+----------------------------+
| 'Maria' REGEXP 'Ma[a-p]ia' |
+----------------------------+
|                          0 |
+----------------------------+
SELECT 'Maria' REGEXP 'Ma[^p]ia';
+---------------------------+
| 'Maria' REGEXP 'Ma[^p]ia' |
+---------------------------+
|                         1 |
+---------------------------+

SELECT 'Maria' REGEXP 'Ma[^r]ia';
+---------------------------+
| 'Maria' REGEXP 'Ma[^r]ia' |
+---------------------------+
|                         0 |
+---------------------------+
SELECT '[Maria' REGEXP '[[]';
+-----------------------+
| '[Maria' REGEXP '[[]' |
+-----------------------+
|                     1 |
+-----------------------+

SELECT '[Maria' REGEXP '[]]';
+-----------------------+
| '[Maria' REGEXP '[]]' |
+-----------------------+
|                     0 |
+-----------------------+

SELECT ']Maria' REGEXP '[]]';
+-----------------------+
| ']Maria' REGEXP '[]]' |
+-----------------------+
|                     1 |
+-----------------------+

SELECT ']Maria' REGEXP '[]a]';
+------------------------+
| ']Maria' REGEXP '[]a]' |
+------------------------+
|                      1 |
+------------------------+
SELECT ']Maria' REGEXP '[a]]';
+------------------------+
| ']Maria' REGEXP '[a]]' |
+------------------------+
|                      0 |
+------------------------+
SELECT '-Maria' REGEXP '[1-10]';
+--------------------------+
| '-Maria' REGEXP '[1-10]' |
+--------------------------+
|                        0 |
+--------------------------+

SELECT '-Maria' REGEXP '[-1-10]';
+---------------------------+
| '-Maria' REGEXP '[-1-10]' |
+---------------------------+
|                         1 |
+---------------------------+
SELECT 'How do I upgrade MariaDB?' REGEXP '[[:<:]]MariaDB[[:>:]]';
+------------------------------------------------------------+
| 'How do I upgrade MariaDB?' REGEXP '[[:<:]]MariaDB[[:>:]]' |
+------------------------------------------------------------+
|                                                          1 |
+------------------------------------------------------------+

SELECT 'How do I upgrade MariaDB?' REGEXP '[[:<:]]Maria[[:>:]]';
+----------------------------------------------------------+
| 'How do I upgrade MariaDB?' REGEXP '[[:<:]]Maria[[:>:]]' |
+----------------------------------------------------------+
|                                                        0 |
+----------------------------------------------------------+
SELECT 'Maria' REGEXP 'Mar[[:alnum:]]*';
+--------------------------------+
| 'Maria' REGEXP 'Mar[:alnum:]*' |
+--------------------------------+
|                              1 |
+--------------------------------+
SELECT 'Mari' REGEXP 'Mar[[:upper:]]+';
+---------------------------------+
| 'Mari' REGEXP 'Mar[[:upper:]]+' |
+---------------------------------+
|                               1 |
+---------------------------------+

SELECT BINARY 'Mari' REGEXP 'Mar[[:upper:]]+';
+----------------------------------------+
| BINARY 'Mari' REGEXP 'Mar[[:upper:]]+' |
+----------------------------------------+
|                                      0 |
+----------------------------------------+
SELECT '|' REGEXP '[[.vertical-line.]]';
+----------------------------------+
| '|' REGEXP '[[.vertical-line.]]' |
+----------------------------------+
|                                1 |
+----------------------------------+
SELECT 'Maria' REGEXP 'Ma[ir]{2}ia';
+------------------------------+
| 'Maria' REGEXP 'Ma[ir]{2}ia' |
+------------------------------+
|                            0 |
+------------------------------+
SELECT 'Maria' REGEXP 'Ma[ir]{2}';
+----------------------------+
| 'Maria' REGEXP 'Ma[ir]{2}' |
+----------------------------+
|                          1 |
+----------------------------+
SELECT '(Maria)' REGEXP '(Ma';
ERROR 1139 (42000): Got error 'parentheses not balanced' from regexp

SELECT '(Maria)' REGEXP '\(Ma';
ERROR 1139 (42000): Got error 'parentheses not balanced' from regexp

SELECT '(Maria)' REGEXP '\\(Ma';
+--------------------------+
| '(Maria)' REGEXP '\\(Ma' |
+--------------------------+
|                        1 |
+--------------------------+
SELECT 'Mar+ia' REGEXP 'r+';
+----------------------+
| 'Mar+ia' REGEXP 'r+' |
+----------------------+
|                    1 |
+----------------------+

SELECT 'Maria' REGEXP 'r+';
+---------------------+
| 'Maria' REGEXP 'r+' |
+---------------------+
|                   1 |
+---------------------+

SELECT 'Maria' REGEXP 'r\\+';
+-----------------------+
| 'Maria' REGEXP 'r\\+' |
+-----------------------+
|                     0 |
+-----------------------+

SELECT 'Maria' REGEXP 'r+';
+---------------------+
| 'Maria' REGEXP 'r+' |
+---------------------+
|                   1 |
+---------------------+
.

This page is licensed: CC BY-SA / Gnu FDL

expr REGEXP pat, expr RLIKE pat
REGEXP

REGEXP_SUBSTR

Return the substring matching a regex. This function extracts the actual part of the string that matches the given pattern.

Syntax

REGEXP_SUBSTR(subject,pattern)

Description

Returns the part of the string subject that matches the regular expression pattern, or an empty string if pattern was not found.

The function follows the case sensitivity rules of the effective . Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

The collation case sensitivity can be overwritten using the (?i) and (?-i) PCRE flags.

MariaDB uses the library for enhanced regular expression performance, and REGEXP_SUBSTR was introduced as part of this enhancement.

The variable addresses the remaining compatibilities between PCRE and the old regex library.

Examples

This page is licensed: CC BY-SA / Gnu FDL

REGEXP_REPLACE

Replace regex matches in a string. This function substitutes occurrences of a pattern with a specified replacement string.

Syntax

Description

REGEXP_REPLACE returns the string

subject
with all occurrences of the regular expression
pattern
replaced by the string
replace
. If no occurrences are found, then
subject
is returned as is.

The replace string can have backreferences to the subexpressions in the form \N, where N is a number from 1 to 9.

The function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

The collation case sensitivity can be overwritten using the (?i) and (?-i) PCRE flags.

MariaDB uses the PCRE regular expression library for enhanced regular expression performance, and REGEXP_REPLACE was introduced as part of this enhancement.

The default_regex_flags variable addresses the remaining compatibilities between PCRE and the old regex library.

Examples

Backreferences to the subexpressions in the form , where N is a number from 1 to 9:

Case insensitive and case sensitive matches:

Overwriting the collation case sensitivity using the (?i) and (?-i) PCRE flags.

This page is licensed: CC BY-SA / Gnu FDL

REGEXP_REPLACE(subject, pattern, replace)
SELECT REGEXP_SUBSTR('ab12cd','[0-9]+');
-> 12

SELECT REGEXP_SUBSTR(
  'See https://mariadb.org/en/foundation/ for details',
  'https?://[^/]*');
-> https://mariadb.org
collation
PCRE regular expression
default_regex_flags

REGEXP

Test if a string matches a regex. This operator returns 1 if the pattern is found in the string, and 0 otherwise.

Syntax

Description

Performs a pattern match of a string expression expr against a patternpat. The pattern can be an extended regular expression. See for details on the syntax for regular expressions (see also ).

Returns 1 if expr matches pat or 0 if it doesn't match. If either expr or pat are NULL, the result is NULL.

The negative form also exists, as an alias for NOT (string REGEXP pattern). RLIKE and NOT RLIKE are synonyms for REGEXP and NOT REGEXP, originally provided for mSQL compatibility.

The pattern need not be a literal string. For example, it can be specified as a string expression or table column.

Note: Because MariaDB uses the C escape syntax in strings (for example, "\n" to represent the newline character), you must double any "" that you use in your REGEXP strings.

REGEXP is not case sensitive, except when used with binary strings.

The variable addresses the remaining compatibilities between PCRE and the old regex library.

Examples

default_regex_flags examples

MariaDB uses the variable to address the remaining compatibilities between PCRE and the old regex library.

The default behavior (multiline match is off)

Enabling the multiline option using the PCRE option syntax:

Enabling the multiline option using default_regex_flags

See Also

This page is licensed: GPLv2, originally from

SELECT REGEXP_REPLACE('ab12cd','[0-9]','') AS remove_digits;
-> abcd

SELECT REGEXP_REPLACE('<html><head><title>title</title><body>body</body></htm>', '<.+?>',' ')
AS strip_html;
-> title  body
SELECT REGEXP_REPLACE('James Bond','^(.*) (.*)$','\\2, \\1') AS reorder_name;
-> Bond, James
SELECT REGEXP_REPLACE('ABC','b','-') AS case_insensitive;
-> A-C

SELECT REGEXP_REPLACE('ABC' COLLATE utf8_bin,'b','-') AS case_sensitive;
-> ABC

SELECT REGEXP_REPLACE(BINARY 'ABC','b','-') AS binary_data;
-> ABC
SELECT REGEXP_REPLACE('ABC','(?-i)b','-') AS force_case_sensitive;
-> ABC

SELECT REGEXP_REPLACE(BINARY 'ABC','(?i)b','-') AS force_case_insensitive;
-> A-C
expr REGEXP pat, expr RLIKE pat
Regular Expressions Overview
PCRE Regular Expressions
NOT REGEXP
default_regex_flags
default_regex_flags
Operator Precedence
fill_help_tables.sql
SELECT 'Monty!' REGEXP 'm%y%%';
+-------------------------+
| 'Monty!' REGEXP 'm%y%%' |
+-------------------------+
|                       0 |
+-------------------------+

SELECT 'Monty!' REGEXP '.*';
+----------------------+
| 'Monty!' REGEXP '.*' |
+----------------------+
|                    1 |
+----------------------+

SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';
+---------------------------------------+
| 'new*\n*line' REGEXP 'new\\*.\\*line' |
+---------------------------------------+
|                                     1 |
+---------------------------------------+

SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
+----------------+-----------------------+
| 'a' REGEXP 'A' | 'a' REGEXP BINARY 'A' |
+----------------+-----------------------+
|              1 |                     0 |
+----------------+-----------------------+

SELECT 'a' REGEXP '^[a-d]';
+---------------------+
| 'a' REGEXP '^[a-d]' |
+---------------------+
|                   1 |
+---------------------+
SELECT 'a\nb\nc' RLIKE '^b$';
+---------------------------+
| '(?m)a\nb\nc' RLIKE '^b$' |
+---------------------------+
|                         0 |
+---------------------------+
SELECT 'a\nb\nc' RLIKE '(?m)^b$';
+---------------------------+
| 'a\nb\nc' RLIKE '(?m)^b$' |
+---------------------------+
|                         1 |
+---------------------------+
SET default_regex_flags='MULTILINE';
SELECT 'a\nb\nc' RLIKE '^b$';
+-----------------------+
| 'a\nb\nc' RLIKE '^b$' |
+-----------------------+
|                     1 |
+-----------------------+

PCRE - Perl Compatible Regular Expressions

Understand MariaDB's regex support. This concept page explains the PCRE library integration, detailing supported syntax, character classes, and special characters.

PCRE Versions

PCRE Version
Introduced
Maturity

10.34

PCRE Enhancements

MariaDB uses the PCRE library, which significantly improves the power of the operator.

The switch to PCRE added a number of features, including recursive patterns, named capture, look-ahead and look-behind assertions, non-capturing groups, non-greedy quantifiers, Unicode character properties, extended syntax for characters and character classes, multi-line matching, and many other.

These functions work with regular expressions: , , and .

Also, REGEXP/RLIKE, and the new functions, work correctly with all multi-byte supported by MariaDB, including East-Asian character sets (big5, gb2313, gbk, eucjp, eucjpms, cp932, ujis, euckr), and Unicode character sets (utf8, utf8mb4, ucs2, utf16, utf16le, utf32).

New Regular Expression Functions

  • - Replaces all occurrences of a pattern.

  • - Position of the first appearance of a regex.

  • - Returns the matching part of a string.

See the individual articles for more details and examples.

PCRE Syntax

In most cases PCRE is backward compatible with the old POSIX 1003.2 compliant regexp library (see ), so you won't need to change your applications that use SQL queries with the REGEXP/RLIKE predicate.

This section briefly describes the most important extended PCRE features. For more details, please refer to the documentation on the , or to the documentation which is bundled in the /pcre/doc/html/ directory of a MariaDB sources distribution. The pages pcresyntax.html and pcrepattern.html should be a good start. is another good resource to learn about PCRE and regular expressions generally.

Special Characters

PCRE supports the following escape sequences to match special characters:

Sequence
Description

Note, the backslash characters (here, and in all examples in the sections below) must be escaped with another backslash, unless you're using the NO_BACKSLASH_ESCAPES.

This example tests if a character has hex code 0x61:

Character Classes

PCRE supports the standard POSIX character classes such as alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit, with the following additional classes:

Class
Description

This example checks if the string consists of ASCII characters only:

Generic Character Types

Generic character types complement the POSIX character classes and serve to simplify writing patterns:

Class
Description

This example checks if the string consists of "word" characters only:

Unicode Character Properties

\p{xx} is a character with the xx property, and \P{xx} is a character without the xx property.

The property names represented by xx above are limited to the Unicode script names, the general category properties, and "Any", which matches any character (including newline). Those that are not part of an identified script are lumped together as "Common".

General Category Properties For \p and \P

Property
Description

This example checks if the string consists only of characters with property N (number):

Special Category Properties For \p and \P

Property
Description

The property Xuc matches any character that can be represented by a Universal Character Name (in C++ and other programming languages). These include $, @, ```, and all characters with Unicode code points greater than U+00A0, excluding the surrogates U+D800..U+DFFF.

Script Names For \p and \P

Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian, Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samaritan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, Yi.

This example checks if the string consists only of Greek characters:

Extended Unicode Grapheme Sequence

The \X escape sequence matches a character sequence that makes an "extended grapheme cluster", i.e. a composite character that consists of multiple Unicode code points.

One of the examples of a composite character can be a letter followed by non-spacing accent marks. This example demonstrates that U+0045 LATIN CAPITAL LETTER E followed by U+0302 COMBINING CIRCUMFLEX ACCENT followed by U+0323 COMBINING DOT BELOW together form an extended grapheme cluster:

See the for the other types of extended grapheme clusters.

Simple Assertions

An assertion specifies a certain condition that must match at a particular point, but without consuming characters from the subject string. In addition to the standard POSIX simple assertions ^ (that matches at the beginning of a line) and $ (that matches at the end of a line), PCRE supports a number of other assertions:

Assertion
Description

This example cuts a word that consists only of 3 characters from a string:

Notice that the two \b assertions checked the word boundaries but did not get into the matching pattern.

The \b assertions work well in the beginning and the end of the subject string:

By default, the ^ and $ assertions have the same meaning with \A, \Z, and \z. However, the meanings of ^ and $ can change in multiline mode (see below). By contrast, the meanings of \A, \Z, and \z are always the same; they are independent of the multiline mode.

Option Setting

A number of options that control the default match behavior can be changed within the pattern by a sequence of option letters enclosed between (? and ).

Option
Description

For example, (?im) sets case insensitive multiline matching.

A hyphen followed by the option letters unset the options. For example, (?-im) means case sensitive single line match.

A combined setting and unsetting is also possible, e.g. (?im-sx).

If an option is set outside of subpattern parentheses, the option applies to the remainder of the pattern that follows the option. If an option is set inside a subpattern, it applies to the part of this subpattern that follows the option.

In this example the pattern (?i)m((?-i)aria)db matches the words MariaDB, Mariadb, mariadb, but not MARIADB:

This example demonstrates that the (?x) option makes the regexp engine ignore all white spaces in the pattern (other than in a class).

Note, putting spaces into a pattern in combination with the (?x) option can be useful to split different logical parts of a complex pattern, to make it more readable.

Multiline Matching

Multiline matching changes the meaning of ^ and $ from "the beginning of the subject string" and "the end of the subject string" to "the beginning of any line in the subject string" and "the end of any line in the subject string" respectively.

This example checks if the subject string contains two consequent lines that fully consist of digits:

Notice the (?m) option in the beginning of the pattern, which switches to the multiline matching mode.

Newline Conventions

PCRE supports five line break conventions:

  • CR (\r) - a single carriage return character

  • LF (\n) - a single linefeed character

  • CRLF (\r\n) - a carriage return followed by a linefeed

By default, the newline convention is set to any Unicode newline sequence, which includes:

Sequence
Description

The newline convention can be set by starting a pattern with one of the following sequences:

Sequence
Description

The newline conversion affects the ^ and $ assertions, the interpretation of the dot metacharacter, and the behavior of .

Note, the new line convention does not affect the meaning of .

This example demonstrates that the dot metacharacter matches , because it is not a newline sequence anymore:

Newline Sequences

By default, the escape sequence matches any Unicode newline sequences.

The meaning of can be set by starting a pattern with one of the following sequences:

Sequence
Description

Comments

It's possible to include comments inside a pattern. Comments do not participate in the pattern matching. Comments start at the (?

sequence and continue up to the next closing parenthesis:

Quoting

POSIX uses the backslash to remove a special meaning from a character. PCRE introduces a syntax to remove special meaning from a sequence of characters. The characters inside \Q ... \E are treated literally, without their special meaning.

This example checks if the string matches a dollar sign followed by a parenthesized name (a variable reference in some languages):

Note that the leftmost dollar sign and the parentheses are used literally, while the rightmost dollar sign is still used to match the end of the string.

Resetting the Match Start

The escape sequence \K causes any previously matched characters to be excluded from the final matched sequence. For example, the pattern: (foo)\Kbar matches foobar, but reports that it has matched bar. This feature is similar to a look-behind assertion. However, in this case, the part of the subject before the real match does not have to be of fixed length:

Non-Capturing Groups

The question mark and the colon after the opening parenthesis create a non-capturing group: (?:...).

This example removes an optional article from a word, for example for better sorting of the results.

Note that the articles are listed inside the left parentheses using the alternation operator | but they do not produce a captured subpattern, so the word followed by the article is referenced by '1' in the third argument to the function. Using non-capturing groups can be useful to save numbers on the sup-patterns that won't be used in the third argument of , as well as for performance purposes.

Non-Greedy Quantifiers

By default, the repetition quantifiers ?, *, + and {n,m} are "greedy", that is, they try to match as much as possible. Adding a question mark after a repetition quantifier makes it "non-greedy", so the pattern matches the minimum number of times possible.

This example cuts C comments from a line:

The pattern without the non-greedy flag to the quantifier /[*].*[*]/ would match the entire string between the leftmost /* and the rightmost */.

Atomic Groups

A sequence inside (?>...) makes an atomic group. Backtracking inside an atomic group is prevented once it has matched; however, backtracking past to the previous items works normally.

Consider the pattern \d+foo applied to the subject string 123bar. Once the engine scans 123 and fails on the letter b, it would normally backtrack to 2 and try to match again, then fail and backtrack to 1 and try to match and fail again, and finally fail the entire pattern. In case of an atomic group (?>\d+)foo with the same subject string 123bar, the engine gives up immediately after the first failure to match foo. An atomic group with a quantifier can match all or nothing.

Atomic groups produce faster false results (i.e. in case when a long subject string does not match the pattern), because the regexp engine saves performance on backtracking. However, don't hurry to put everything into atomic groups. This example demonstrates the difference between atomic and non-atomic match:

The non-atomic pattern matches both abbc and abc, while the atomic pattern matches abbc only.

The atomic group (?>bc|b) in the above example can be "translated" as "if there is bc, then don't try to match as b". So b can match only if bc is not found.

Atomic groups are not capturing. To make an atomic group capturing, put it into parentheses:

Possessive quantifiers

An atomic group which ends with a quantifier can be rewritten using a so called "possessive quantifier" syntax by putting an additional + sign following the quantifier.

The pattern (?>\d+)foo from the previous section's example can be rewritten as \d++foo.

Absolute and Relative Numeric Backreferences

Backreferences match the same text as previously matched by a capturing group. Backreferences can be written using:

  • a backslash followed by a digit

  • the \g escape sequence followed by a positive or negative number

  • the \g escape sequence followed by a positive or negative number enclosed in braces

The following backreferences are identical and refer to the first capturing group:

  • \1

  • \g1

  • \g{1}

This example demonstrates a pattern that matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility":

This example removes doubled words that can unintentionally creep in when you edit a text in a text editor:

Note that all double words were removed, in the beginning, in the middle and in the end of the subject string.

A negative number in a \g sequence means a relative reference. Relative references can be helpful in long patterns, and also in patterns that are created by joining fragments together that contain references within themselves. The sequence \g{-1} is a reference to the most recently started capturing subpattern before \g.

In this example \g{-1} is equivalent to \2:

Named Subpatterns and Backreferences

Using numeric backreferences for capturing groups can be hard to track in a complicated regular expression. Also, the numbers can change if an expression is modified. To overcome these difficulties, PCRE supports named subpatterns.

A subpattern can be named in one of three ways: (?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing subpatterns from other parts of the pattern, can be made by name as well as by number.

Backreferences to a named subpattern can be written using the .NET syntax \k{name}, the Perl syntax \k<name> or \k'name' or \g{name}, or the Python syntax (?P=name).

This example tests if the string is a correct HTML tag:

Positive and Negative Look-Ahead and Look-Behind Assertions

Look-ahead and look-behind assertions serve to specify the context for the searched regular expression pattern. Note that the assertions only check the context, they do not capture anything themselves!

This example finds the letter which is not followed by another letter (negative look-ahead):

This example finds the letter which is followed by a digit (positive look-ahead):

This example finds the letter which does not follow a digit character (negative look-behind):

This example finds the letter which follows another letter character (positive look-behind):

Note that look-behind assertions can only be of fixed length; you cannot have repetition operators or alternations with different lengths:

Subroutine Reference and Recursive Patterns

PCRE supports a special syntax to recourse the entire pattern or its individual subpatterns:

Syntax
Description

This example checks for a correct additive arithmetic expression consisting of numbers, unary plus and minus, binary plus and minus, and parentheses:

The recursion is done using (?1) to call for the first parenthesized subpattern, which includes everything except the leading ^ and the trailing $.

The regular expression in the above example implements the following BNF grammar:

  1. <expression> ::= <term> [(<sign> <term>)...]

  2. <term> ::= [ <sign> ] <primary>

  3. <primary> ::= <number> | <left paren> <expression> <right paren>

Defining Subpatterns For Use By Reference

Use the (?(DEFINE)...) syntax to define subpatterns that can be referenced from elsewhere.

This example defines a subpattern with the name letters that matches one or more letters, which is further reused two times:

The above example can also be rewritten to define the digit part as a subpattern as well:

Conditional Subpatterns

There are two forms of conditional subpatterns:

The yes-pattern is used if the condition is satisfied, otherwise the no-pattern (if any) is used.

Conditions With Subpattern References

If a condition consists of a number, it makes a condition with a subpattern reference. Such a condition is true if a capturing subpattern corresponding to the number has previously matched.

This example finds an optionally parenthesized number in a string:

The ([(])? part makes a capturing subpattern that matches an optional opening parenthesis; the [0-9]+ part matches a number, and the (?(1)[)]) part matches a closing parenthesis, but only if the opening parenthesis has been previously found.

Other Kinds of Conditions

The other possible condition kinds are: recursion references and assertions. See the for details.

Matching Zero Bytes (0x00)

PCRE correctly works with zero bytes in the subject strings:

Zero bytes, however, are not supported literally in the pattern strings and should be escaped using the \xhh or \x{hh} syntax:

Other PCRE Features

PCRE provides other extended features that were not covered in this document, such as duplicate subpattern numbers, backtracking control, breaking utf-8 sequences into individual bytes, setting the match limit, setting the recursion limit, optimization control, recursion conditions, assertion conditions and more types of extended grapheme clusters. Please refer to the for details.

Enhanced regex was implemented as a GSoC 2013 project by Sudheera Palihakkara.

default_regex_flags Examples

The variable was introduced to address the remaining incompatibilities between PCRE and the old regex library. Here are some examples of its usage:

The default behaviour (multiline match is off)

Enabling the multiline option using the PCRE option syntax:

Enabling the miltiline option using default_regex_flags

See Also

This page is licensed: CC BY-SA / Gnu FDL

0x09 (TAB)

\ddd

character with octal code ddd

\xhh

character with hex code hh

\x{hhh..}

character with hex code hhh..

\s

a white space character

\S

a character that is not a white space character

\v

a vertical white space character

\V

a character that is not a vertical white space character

\w

a "word" character (same as [:word:])

\W

a "non-word" character

L

Letter

Ll

Lower case letter

Lm

Modifier letter

Lo

Other letter

Lt

Title case letter

Lu

Upper case letter

L&

Ll, Lu, or Lt

M

Mark

Mc

Spacing mark

Me

Enclosing mark

Mn

Non-spacing mark

N

Number

Nd

Decimal number

Nl

Letter number

No

Other number

P

Punctuation

Pc

Connector punctuation

Pd

Dash punctuation

Pe

Close punctuation

Pf

Final punctuation

Pi

Initial punctuation

Po

Other punctuation

Ps

Open punctuation

S

Symbol

Sc

Currency symbol

Sk

Modifier symbol

Sm

Mathematical symbol

So

Other symbol

Z

Separator

Zl

Line separator

Zp

Paragraph separator

Zs

Space separator

(?X)

extra PCRE functionality (e.g. force error on unknown escaped character)

(?-...)

unset option(s)

any of the previous three
  • any Unicode newline sequence

  • LS

    (U+2028, line separator)

    PS

    (U+2029, paragraph separator)

    \g

    call subpattern by name (Oniguruma)

    \g'name'

    call subpattern by name (Oniguruma)

    \g

    call subpattern by absolute number (Oniguruma)

    \g'n'

    call subpattern by absolute number (Oniguruma)

    \g<+n>

    call subpattern by relative number

    \g<-n>

    call subpattern by relative number

    \g'+n'

    call subpattern by relative number

    \g'-n'

    call subpattern by relative number

    <sign> ::= <plus sign> | <minus sign>

    Stable

    PCRE 8.43

    Stable

    PCRE 8.42

    , ,

    Stable

    PCRE 8.41

    , ,

    Stable

    PCRE 8.40

    , ,

    Stable

    PCRE 8.39

    ,

    Stable

    PCRE 8.38

    ,

    Stable

    PCRE 8.37

    ,

    Stable

    PCRE 8.36

    ,

    Stable

    PCRE 8.35

    ,

    Stable

    PCRE 8.34

    Stable

    \a

    0x07 (BEL)

    \cx

    "control-x", where x is any ASCII character

    \e

    0x1B (escape)

    \f

    0x0C (form feed)

    0x0A (newline)

    0x0D (carriage return)

    ascii

    any ASCII character (0x00..0x7F)

    word

    any "word" character (a letter, a digit, or an underscore)

    \d

    a decimal digit (same as [:digit:])

    \D

    a character that is not a decimal digit

    \h

    a horizontal white space character

    \H

    a character that is not a horizontal white space character

    a character that is not a new line

    a newline sequence

    C

    Other

    Cc

    Control

    Cf

    Format

    Cn

    Unassigned

    Co

    Private use

    Cs

    Surrogate

    Xan

    Alphanumeric: union of properties L and N

    Xps

    POSIX space: property Z or tab, NL, VT, FF, CR

    Xsp

    Perl space: property Z or tab, NL, FF, CR

    Xuc

    A character than can be represented by a Universal Character Name

    Xwd

    Perl word: property Xan or underscore

    \b

    matches at a word boundary

    \B

    matches when not at a word boundary

    \A

    matches at the start of the subject

    \Z

    matches at the end of the subject, also matches before a newline at the end of the subject

    \z

    matches only at the end of the subject

    \G

    matches at the first matching position in the subject

    (?i)

    case insensitive match

    (?m)

    multiline mode

    (?s)

    dotall mode (dot matches newline characters)

    (?x)

    extended (ignore white space)

    (?U)

    ungreedy (lazy) match

    (?J)

    allow named subpatterns with duplicate names

    LF

    (U+000A, carriage return)

    CR

    (U+000D, carriage return)

    CRLF

    (a carriage return followed by a linefeed)

    VT

    (U+000B, vertical tab)

    FF

    (U+000C, form feed)

    NEL

    (U+0085, next line)

    (*CR)

    carriage return

    (*LF)

    linefeed

    (*CRLF)

    carriage return followed by linefeed

    (*ANYCRLF)

    any of the previous three

    (*ANY)

    all Unicode newline sequences

    (*BSR_ANYCRLF)

    any of CR, LF or CRLF

    (*BSR_UNICODE)

    any Unicode newline sequence

    (?R)

    Recourse the entire pattern

    (?n)

    call subpattern by absolute number

    (?+n)

    call subpattern by relative number

    (?-n)

    call subpattern by relative number

    (?&name)

    call subpattern by name (Perl)

    (?P>name)

    call subpattern by name (Python)

    REGEXP/RLIKE
    REGEXP_REPLACE()
    REGEXP_INSTR()
    REGEXP_SUBSTR()
    character sets
    REGEXP_REPLACE(subject, pattern, replace)
    REGEXP_INSTR(subject, pattern)
    REGEXP_SUBSTR(subject,pattern)
    Regular Expressions Overview
    PCRE site
    Regular-Expressions.Info
    SQL_MODE
    PCRE documentation
    REGEXP_REPLACE()
    PCRE documentation
    PCRE documentation
    default_regex_flags
    MariaDB upgrades to PCRE-8.34
    PCRE2
    SELECT 'a' RLIKE '\\x{61}';
    -> 1
    SELECT 'abc' RLIKE '^[[:ascii:]]+$';
    -> 1
    SELECT 'abc' RLIKE '^\\w+$';
    -> 1
    SELECT '1¼①' RLIKE '^\\p{N}+$';
    -> 1
    SELECT 'ΣΦΩ' RLIKE '^\\p{Greek}+$';
    -> 1
    SELECT _ucs2 0x004503020323 RLIKE '^\\X$';
    -> 1
    SELECT REGEXP_SUBSTR('---abcd---xyz---', '\\b\\w{3}\\b');
    -> xyz
    SELECT REGEXP_SUBSTR('xyz', '\\b\\w{3}\\b');
    -> xyz
    SELECT 'MariaDB' RLIKE '(?i)m((?-i)aria)db';
    -> 1
    
    SELECT 'mariaDB' RLIKE '(?i)m((?-i)aria)db';
    -> 1
    
    SELECT 'Mariadb' RLIKE '(?i)m((?-i)aria)db';
    -> 1
    
    SELECT 'MARIADB' RLIKE '(?i)m((?-i)aria)db';
    -> 0
    SELECT 'ab' RLIKE '(?x)a b';
    -> 1
    SELECT 'abc\n123\n456\nxyz\n' RLIKE '(?m)^\\d+\\R\\d+$';
    -> 1
    SELECT 'a\nb' RLIKE '(*CR)a.b';
    -> 1
    SELECT 'ab12' RLIKE 'ab(?#expect digits)12';
    -> 1
    SELECT '$(abc)' RLIKE '^\\Q$(\\E\\w+\\Q)\\E$';
    -> 1
    SELECT REGEXP_SUBSTR('aaa123', '[a-z]*\\K[0-9]*');
    -> 123
    SELECT REGEXP_REPLACE('The King','(?:the|an|a)[^a-z]([a-z]+)','\\1');
    -> King
    SELECT REGEXP_REPLACE('/* Comment1 */ i+= 1; /* Comment2 */', '/[*].*?[*]/','');
    ->  i+= 1;
    SELECT 'abcc' RLIKE 'a(?>bc|b)c' AS atomic1;
    -> 1
    
    SELECT 'abc' RLIKE 'a(?>bc|b)c' AS atomic2;
    -> 0
     
    SELECT 'abcc' RLIKE 'a(bc|b)c' AS non_atomic1;
    -> 1
    
    SELECT 'abc' RLIKE 'a(bc|b)c' AS non_atomic2;
    -> 1
    SELECT REGEXP_REPLACE('abcc','a((?>bc|b))c','\\1');
    -> bc
    SELECT 'sense and sensibility' RLIKE '(sens|respons)e and \\1ibility';
    -> 1
    SELECT REGEXP_REPLACE('using using the the regexp regexp',
     '\\b(\\w+)\\s+\\1\\b','\\1');
    -> USING the regexp
    SELECT 'abc123def123' RLIKE '(abc(123)def)\\g{-1}';     
    -> 1
    
    SELECT 'abc123def123' RLIKE '(abc(123)def)\\2';
    -> 1
    SELECT '<a href="../">Up</a>' RLIKE '<(?<tag>[a-z][a-z0-9]*)[^>]*>[^<]*</(?P=tag)>';
    -> 1
    SELECT REGEXP_SUBSTR('ab1','[a-z](?![a-z])');
    -> b
    SELECT REGEXP_SUBSTR('ab1','[a-z](?=[0-9])');
    -> b
    SELECT REGEXP_SUBSTR('1ab','(?<![0-9])[a-z]');
    -> b
    SELECT REGEXP_SUBSTR('1ab','(?<=[a-z])[a-z]');
    -> b
    SELECT 'aaa' RLIKE '(?<=(a|bc))a';
    ERROR 1139 (42000): Got error 'lookbehind assertion is not fixed length at offset 10' from regexp
    SELECT '1+2-3+(+(4-1)+(-2)+(+1))' RLIKE  '^(([+-]?(\\d+|[(](?1)[)]))(([+-](?1))*))$';
    -> 1
    SELECT 'abc123xyz' RLIKE '^(?(DEFINE)(?<letters>[a-z]+))(?&letters)[0-9]+(?&letters)$';
    -> 1
    SELECT 'abc123xyz' RLIKE
     '^(?(DEFINE)(?<letters>[a-z]+)(?<digits>[0-9]+))(?&letters)(?&digits)(?&letters)$';
    -> 1
    (?(condition)yes-pattern)
    (?(condition)yes-pattern|no-pattern)
    SELECT REGEXP_SUBSTR('a(123)b', '([(])?[0-9]+(?(1)[)])');
    -> (123)
    SELECT 'a\0b' RLIKE '^a.b$';
    -> 1
    SELECT 'a\0b' RLIKE '^a\\x{00}b$';
    -> 1
    SELECT 'a\nb\nc' RLIKE '^b$';
    +---------------------------+
    | '(?m)a\nb\nc' RLIKE '^b$' |
    +---------------------------+
    |                         0 |
    +---------------------------+
    SELECT 'a\nb\nc' RLIKE '(?m)^b$';
    +---------------------------+
    | 'a\nb\nc' RLIKE '(?m)^b$' |
    +---------------------------+
    |                         1 |
    +---------------------------+
    SET default_regex_flags='MULTILINE';
    SELECT 'a\nb\nc' RLIKE '^b$';
    +-----------------------+
    | 'a\nb\nc' RLIKE '^b$' |
    +-----------------------+
    |                     1 |
    +-----------------------+
    MariaDB 10.5.1
    MariaDB 10.1.39
    MariaDB 10.2.15
    MariaDB 10.1.33
    MariaDB 10.0.35
    MariaDB 10.2.8
    MariaDB 10.1.26
    MariaDB 10.0.32
    MariaDB 10.2.5
    MariaDB 10.1.22
    MariaDB 10.0.30
    MariaDB 10.1.15
    MariaDB 10.0.26
    MariaDB 10.1.10
    MariaDB 10.0.23
    MariaDB 10.1.5
    MariaDB 10.0.18
    MariaDB 10.1.2
    MariaDB 10.0.15
    MariaDB 10.1.0
    MariaDB 10.0.12
    MariaDB 10.0.8

    REGEXP_INSTR

    Return the index of a regex match. This function finds the starting position of the first substring that matches the given pattern.

    Syntax

    REGEXP_INSTR(subject, pattern)

    Returns the position of the first occurrence of the regular expression pattern in the string subject, or 0 if pattern was not found.

    The positions start with 1 and are measured in characters (i.e. not in bytes), which is important for multi-byte character sets. You can cast a multi-byte character set to BINARY to get offsets in bytes.

    The function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

    The collation case sensitivity can be overwritten using the (?i) and (?-i) PCRE flags.

    MariaDB uses the library for enhanced regular expression performance, and REGEXP_INSTR was introduced as part of this enhancement.

    Examples

    Casting a multi-byte character set as BINARY to get offsets in bytes:

    Case sensitivity:

    This page is licensed: CC BY-SA / Gnu FDL

    SELECT REGEXP_INSTR('abc','b');
    -> 2
    
    SELECT REGEXP_INSTR('abc','x');
    -> 0
    
    SELECT REGEXP_INSTR('BJÖRN','N');
    -> 5
    PCRE regular expression
    SELECT REGEXP_INSTR(BINARY 'BJÖRN','N') AS cast_utf8_to_binary;
    -> 6
    SELECT REGEXP_INSTR('ABC','b');
    -> 2
    
    SELECT REGEXP_INSTR('ABC' COLLATE utf8_bin,'b');
    -> 0
    
    SELECT REGEXP_INSTR(BINARY'ABC','b');
    -> 0
    
    SELECT REGEXP_INSTR('ABC','(?-i)b');
    -> 0
    
    SELECT REGEXP_INSTR('ABC' COLLATE utf8_bin,'(?i)b');
    -> 2