Collations in MariaDB Xpand

Overview

MariaDB Xpand in SkySQL supports many different collations. A collation is a set of rules for sorting and comparing character sets.

Compatibility

The detail provided here applies to:

  • MariaDB Xpand 5.3

  • MariaDB Xpand 6.0

  • MariaDB Xpand 6.1

  • Distributed Transactions

Example of Collation

As an example, consider a list of names. The character set defines which characters can appear in each name. The collation defines the order in which the names would be sorted, and whether two names would be equal:

  • In a case-insensitive collation, "Mueller" and "mueller" would be considered the same name.

  • In a case-sensitive collation, "Mueller" and "mueller" would be considered different names.

  • In a German collation, "Mueller" and "Müller" would be considered the same name.

  • In an English collation, "Mueller" and "Müller" would be considered different names.

  • In a collation that ignores padding, "Mueller" and "Mueller " would be considered the same name.

  • In a no-pad collation, "Mueller" and "Mueller " would be considered different names.

Supported Collations

For the full list of supported collations, see Collations in Xpand.

SHOW COLLATION
   LIKE 'utf8mb4_%';
+--------------------+---------+------+---------+----------+---------+
| Collation          | Charset | Id   | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 |   45 | Yes     | Yes      |       1 |
| utf8mb4_bin        | utf8mb4 |   46 | No      | Yes      |       1 |
| utf8mb4_unicode_ci | utf8mb4 |  224 | No      | Yes      |       1 |
+--------------------+---------+------+---------+----------+---------+

Collation Suffixes

There are several different collation suffixes, which refer to collations with different types of rules:

Collation Suffix

Case-Sensitive

Case-Insensitive

Binary

Legacy

ci

No

Yes

No

No

cs

Yes

No

No

No

bin

No

No

Yes

No

ci_legacy

No

Yes

No

Yes

cs_legacy

Yes

No

No

Yes

Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table.

Topology

Default Value

Distributed Transactions

utf8_general_ci

Checking the Server Collation

SHOW SESSION VARIABLES
   LIKE 'collation_server';
+------------------+-----------------+
| Variable_name    | Value           |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+

Effect of the Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table. To demonstrate this effect:

Database's Default Collation

A database's default collation is used to determine the collation of new tables within the database, unless a different collation is specified for a given table.

Checking a Database's Default Collation

SELECT DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8_general_ci        |
+------------------------+

Setting a Database's Default Collation

CREATE DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The new database's default character set will be the character set associated with the specified collation.

Changing a Database's Default Collation

ALTER DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The database's default character set will be the character set associated with the specified collation.

Effect of a Database's Default Collation

A database's default collation is used to determine the default collation of new tables within the database, unless a different collation is specified for a given table. To demonstrate this effect:

Table's Default Collation

A table's default collation is used to determine the collation of new columns within the table, unless a different collation is specified for a given column.

Checking a Table's Default Collation

SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab';
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+

Setting a Table's Default Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
) DEFAULT COLLATE 'utf8mb4_bin';

The new table's default character set will be the character set associated with the specified collation.

Changing a Table's Default Collation

ALTER TABLE db1.tab
   DEFAULT COLLATE 'utf8mb4_bin';

The table's default character set will be the character set associated with the specified collation.

Effect of a Table's Default Collation

A table's default collation is used to determine the default collation of new columns within the table, unless a different collation is specified for a given column. To demonstrate this effect:

Column's Collation

A column's collation is used by the column to sort and compare its character data.

Checking a Column's collation

SELECT COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab'
   AND COLUMN_NAME='str';
+-----------------+
| COLLATION_NAME  |
+-----------------+
| utf8_general_ci |
+-----------------+

Setting a Column's Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) COLLATE 'utf8mb4_bin'
);

The column's character set will be the character set associated with the specified collation.

Changing a Column's Collation

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   COLLATE 'utf8mb4_bin';

The column's character set will be the character set associated with the specified collation.

Binary Column Collations

For binary column collations, the collation compares the column's character data by comparing the binary values of each character.

For example, to see how the utf8mb4_bin collation would handle sorting strings that use the utf8mb4 character set:

Case-Insensitive Column Collations

For case-insensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as equal.

For example, to see how the utf8mb4_unicode_ci collation would handle sorting strings that use the utf8mb4 character set:

Case-Sensitive Column Collations

For case-sensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as different.

For example, to see how the latin1_general_cs collation would handle sorting strings that use the latin1 character set:

Caveats

collation_database System Variable

The collation_database system variable cannot be modified.

Collation of Control Codes

Control codes (for example, white space characters and empty strings), collate differently in Xpand than MySQL. Both MySQL and Xpand trim spaces at the end of strings, but in Xpand, it is assumed that shorter strings always collate before longer strings. MySQL, however assumes a shorter string MAY collate after a longer string if the characters of the longer string contain pre-space characters.