Collations in MariaDB Enterprise Server

Overview

MariaDB Enterprise Server in SkySQL supports many different collations. A collation is a set of rules for sorting and comparing character sets.

The detail provided here applies to:

  • MariaDB Platform for Transactions

  • MariaDB Platform for Smart Transactions

Example of Collation

As an example, consider a list of names. The character set defines which characters can appear in each name. The collation defines the order in which the names would be sorted, and whether two names would be equal:

  • In a case-insensitive collation, "Mueller" and "mueller" would be considered the same name.

  • In a case-sensitive collation, "Mueller" and "mueller" would be considered different names.

  • In a German collation, "Mueller" and "Müller" would be considered the same name.

  • In an English collation, "Mueller" and "Müller" would be considered different names.

  • In a collation that ignores padding, "Mueller" and "Mueller " would be considered the same name.

  • In a no-pad collation, "Mueller" and "Mueller " would be considered different names.

Supported Collations

For the full list of supported collations, see Collations in Enterprise Server 10.5 and Collations in Enterprise Server 10.4.

SHOW COLLATION
   LIKE 'utf8mb4_%';
+------------------------------+---------+------+---------+----------+---------+
| Collation                    | Charset | Id   | Default | Compiled | Sortlen |
+------------------------------+---------+------+---------+----------+---------+
| utf8mb4_general_ci           | utf8mb4 |   45 | Yes     | Yes      |       1 |
| utf8mb4_bin                  | utf8mb4 |   46 |         | Yes      |       1 |
| utf8mb4_unicode_ci           | utf8mb4 |  224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci         | utf8mb4 |  225 |         | Yes      |       8 |
| utf8mb4_latvian_ci           | utf8mb4 |  226 |         | Yes      |       8 |
| utf8mb4_romanian_ci          | utf8mb4 |  227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci         | utf8mb4 |  228 |         | Yes      |       8 |
| utf8mb4_polish_ci            | utf8mb4 |  229 |         | Yes      |       8 |
| utf8mb4_estonian_ci          | utf8mb4 |  230 |         | Yes      |       8 |
| utf8mb4_spanish_ci           | utf8mb4 |  231 |         | Yes      |       8 |
| utf8mb4_swedish_ci           | utf8mb4 |  232 |         | Yes      |       8 |
| utf8mb4_turkish_ci           | utf8mb4 |  233 |         | Yes      |       8 |
| utf8mb4_czech_ci             | utf8mb4 |  234 |         | Yes      |       8 |
| utf8mb4_danish_ci            | utf8mb4 |  235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci        | utf8mb4 |  236 |         | Yes      |       8 |
| utf8mb4_slovak_ci            | utf8mb4 |  237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci          | utf8mb4 |  238 |         | Yes      |       8 |
| utf8mb4_roman_ci             | utf8mb4 |  239 |         | Yes      |       8 |
| utf8mb4_persian_ci           | utf8mb4 |  240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci         | utf8mb4 |  241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci         | utf8mb4 |  242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci           | utf8mb4 |  243 |         | Yes      |       8 |
| utf8mb4_german2_ci           | utf8mb4 |  244 |         | Yes      |       8 |
| utf8mb4_croatian_mysql561_ci | utf8mb4 |  245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci       | utf8mb4 |  246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci        | utf8mb4 |  247 |         | Yes      |       8 |
| utf8mb4_croatian_ci          | utf8mb4 |  608 |         | Yes      |       8 |
| utf8mb4_myanmar_ci           | utf8mb4 |  609 |         | Yes      |       8 |
| utf8mb4_thai_520_w2          | utf8mb4 |  610 |         | Yes      |       4 |
| utf8mb4_general_nopad_ci     | utf8mb4 | 1069 |         | Yes      |       1 |
| utf8mb4_nopad_bin            | utf8mb4 | 1070 |         | Yes      |       1 |
| utf8mb4_unicode_nopad_ci     | utf8mb4 | 1248 |         | Yes      |       8 |
| utf8mb4_unicode_520_nopad_ci | utf8mb4 | 1270 |         | Yes      |       8 |
+------------------------------+---------+------+---------+----------+---------+

Collation Suffixes

There are several different collation suffixes, which refer to collations with different types of rules:

Collation Suffix

Case-Sensitive

Case-Insensitive

Binary

Padded

ci

No

Yes

No

Yes

cs

Yes

No

No

Yes

bin

No

No

Yes

Yes

nopad_ci

No

Yes

No

No

nopad_cs

Yes

No

No

No

nopad_bin

No

No

Yes

No

Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table.

Topology

Default Value

Transactional Standalone

utf8mb4_unicode_ci

HA (Primary/Replica)

utf8mb4_unicode_ci

Galera

utf8mb4_unicode_ci

Analytical

utf8_general_ci

HTAP

utf8_general_ci

Checking the Server Collation

SHOW SESSION VARIABLES
   LIKE 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_unicode_ci |
+------------------+--------------------+

Effect of the Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table. To demonstrate this effect:

Database's Default Collation

A database's default collation is used to determine the collation of new tables within the database, unless a different collation is specified for a given table.

Checking a Database's Default Collation

SELECT DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8mb4_unicode_ci     |
+------------------------+

Setting a Database's Default Collation

CREATE DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The new database's default character set will be the character set associated with the specified collation.

Changing a Database's Default Collation

ALTER DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The database's default character set will be the character set associated with the specified collation.

Effect of a Database's Default Collation

A database's default collation is used to determine the default collation of new tables within the database, unless a different collation is specified for a given table. To demonstrate this effect:

Table's Default Collation

A table's default collation is used to determine the collation of new columns within the table, unless a different collation is specified for a given column.

Checking a Table's Default Collation

SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab';
+--------------------+
| TABLE_COLLATION    |
+--------------------+
| utf8mb4_unicode_ci |
+--------------------+

Setting a Table's Default Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
) DEFAULT COLLATE 'utf8mb4_bin';

The new table's default character set will be the character set associated with the specified collation.

Changing a Table's Default Collation

ALTER TABLE db1.tab
   DEFAULT COLLATE 'utf8mb4_bin';

The table's default character set will be the character set associated with the specified collation.

Effect of a Table's Default Collation

A table's default collation is used to determine the default collation of new columns within the table, unless a different collation is specified for a given column. To demonstrate this effect:

Column's Collation

A column's collation is used by the column to sort and compare its character data.

Checking a Column's collation

SELECT COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab'
   AND COLUMN_NAME='str';
+--------------------+
| COLLATION_NAME     |
+--------------------+
| utf8mb4_unicode_ci |
+--------------------+

Setting a Column's Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) COLLATE 'utf8mb4_bin'
);

The column's character set will be the character set associated with the specified collation.

Changing a Column's Collation

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   COLLATE 'utf8mb4_bin';

The column's character set will be the character set associated with the specified collation.

Binary Column Collations

For binary column collations, the collation compares the column's character data by comparing the binary values of each character.

For example, to see how the utf8mb4_bin collation would handle sorting strings that use the utf8mb4 character set:

Case-Insensitive Column Collations

For case-insensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as equal.

For example, to see how the utf8mb4_unicode_ci collation would handle sorting strings that use the utf8mb4 character set:

Case-Sensitive Column Collations

For case-sensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as different.

For example, to see how the latin1_general_cs collation would handle sorting strings that use the latin1 character set:

Connection Collation

The connection collation is used to sort and compare string literals.

Setting the Connection Collation

SET SESSION collation_connection='utf8mb4_bin';

Binary Connection Collations

For binary connection collations, the collation compares string literals by comparing the binary values of each character.

For example, to see how the utf8mb4_bin collation would handle sorting strings that use the utf8mb4 character set:

Case-Insensitive Connection Collations

For case-insensitive connection collations, the collation compares string literals by treating upper case and lower case characters of the same character as equal.

For example, to see how the utf8mb4_unicode_ci collation would handle sorting strings that use the utf8mb4 character set:

Case-Sensitive Connection Collations

For case-sensitive connection collations, the collation compares string literals by treating upper case and lower case characters of the same character as different.

For example, to see how the latin1_general_cs collation would handle sorting strings that use the latin1 character set: