Collations in MariaDB Xpand

Overview

MariaDB Xpand in SkySQL supports many different collations. A collation is a set of rules for sorting and comparing character sets.

Compatibility

The detail provided here applies to:

  • MariaDB Xpand 5.3

  • MariaDB Xpand 6.0

  • MariaDB Xpand 6.1

  • Distributed Transactions

Example of Collation

As an example, consider a list of names. The character set defines which characters can appear in each name. The collation defines the order in which the names would be sorted, and whether two names would be equal:

  • In a case-insensitive collation, "Mueller" and "mueller" would be considered the same name.

  • In a case-sensitive collation, "Mueller" and "mueller" would be considered different names.

  • In a German collation, "Mueller" and "Müller" would be considered the same name.

  • In an English collation, "Mueller" and "Müller" would be considered different names.

  • In a collation that ignores padding, "Mueller" and "Mueller " would be considered the same name.

  • In a no-pad collation, "Mueller" and "Mueller " would be considered different names.

Supported Collations

For the full list of supported collations, see Collations in Xpand.

SHOW COLLATION
   LIKE 'utf8mb4_%';
+--------------------+---------+------+---------+----------+---------+
| Collation          | Charset | Id   | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 |   45 | Yes     | Yes      |       1 |
| utf8mb4_bin        | utf8mb4 |   46 | No      | Yes      |       1 |
| utf8mb4_unicode_ci | utf8mb4 |  224 | No      | Yes      |       1 |
+--------------------+---------+------+---------+----------+---------+

Collation Suffixes

There are several different collation suffixes, which refer to collations with different types of rules:

Collation Suffix

Case-Sensitive

Case-Insensitive

Binary

Legacy

ci

No

Yes

No

No

cs

Yes

No

No

No

bin

No

No

Yes

No

ci_legacy

No

Yes

No

Yes

cs_legacy

Yes

No

No

Yes

Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table.

Topology

Default Value

Distributed Transactions

utf8_general_ci

Checking the Server Collation

SHOW SESSION VARIABLES
   LIKE 'collation_server';
+------------------+-----------------+
| Variable_name    | Value           |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+

Effect of the Server Collation

The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table. To demonstrate this effect:

Database's Default Collation

A database's default collation is used to determine the collation of new tables within the database, unless a different collation is specified for a given table.

Checking a Database's Default Collation

SELECT DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8_general_ci        |
+------------------------+

Setting a Database's Default Collation

CREATE DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The new database's default character set will be the character set associated with the specified collation.

Changing a Database's Default Collation

ALTER DATABASE db1
   DEFAULT COLLATE 'utf8mb4_bin';

The database's default character set will be the character set associated with the specified collation.

Effect of a Database's Default Collation

A database's default collation is used to determine the default collation of new tables within the database, unless a different collation is specified for a given table. To demonstrate this effect:

Table's Default Collation

A table's default collation is used to determine the collation of new columns within the table, unless a different collation is specified for a given column.

Checking a Table's Default Collation

SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab';
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+

Setting a Table's Default Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
) DEFAULT COLLATE 'utf8mb4_bin';

The new table's default character set will be the character set associated with the specified collation.

Changing a Table's Default Collation

ALTER TABLE db1.tab
   DEFAULT COLLATE 'utf8mb4_bin';

The table's default character set will be the character set associated with the specified collation.

Effect of a Table's Default Collation

A table's default collation is used to determine the default collation of new columns within the table, unless a different collation is specified for a given column. To demonstrate this effect:

Column's Collation

A column's collation is used by the column to sort and compare its character data.

Checking a Column's collation

SELECT COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab'
   AND COLUMN_NAME='str';
+-----------------+
| COLLATION_NAME  |
+-----------------+
| utf8_general_ci |
+-----------------+

Setting a Column's Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) COLLATE 'utf8mb4_bin'
);

The column's character set will be the character set associated with the specified collation.

Changing a Column's Collation

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   COLLATE 'utf8mb4_bin';

The column's character set will be the character set associated with the specified collation.

Binary Column Collations

For binary column collations, the collation compares the column's character data by comparing the binary values of each character.

For example, to see how the utf8mb4_bin collation would handle sorting strings that use the utf8mb4 character set:

Case-Insensitive Column Collations

For case-insensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as equal.

For example, to see how the utf8mb4_unicode_ci collation would handle sorting strings that use the utf8mb4 character set:

Case-Sensitive Column Collations

For case-sensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as different.

For example, to see how the latin1_general_cs collation would handle sorting strings that use the latin1 character set:

Precedence

The collation precedence is different in different contexts.

Executing a Query

When Xpand executes a query, the collation of string literals depends on whether the literal has a character set introducer.

If a string literal contains a character set introducer, Xpand uses default collation for the character set specified for the string literal:

SELECT _latin1'this is a latin1 string';
+-------------------------+
| this is a latin1 string |
+-------------------------+
| this is a latin1 string |
+-------------------------+

If a string literal does not contain a character set introducer, Xpand uses the default collation for the character set specified by the character_set_client system variable.

Xpand currently ignores the collation_connection system variable.

Xpand does not support the COLLATE clause to specify a column's collation in SELECT queries.

Creating an Object

When Xpand creates an object (such as a database, table, or column), the character set of the object is determined with the following precedence:

  1. Column default:

    • If the [DEFAULT] COLLATION column option is defined, the specified collation is used.

    • If the [DEFAULT] CHARACTER SET or [DEFAULT] CHARSET column option are defined, the default collation for the specified character set is used.

  2. Table default:

    • If the [DEFAULT] COLLATION table option is defined, the specified collation is used.

    • If the [DEFAULT] CHARACTER SET or [DEFAULT] CHARSET table option are defined, the default collation for the specified character set is used.

  3. Database default:

    • If the [DEFAULT] COLLATION database option is defined, the specified collation is used.

    • If the [DEFAULT] CHARACTER SET or [DEFAULT] CHARSET database option are defined, the default collation for the specified character set is used.

  4. Server-wide default (the default collation for the character set defined by the character_set_server system variable)

Caveats

collation_database System Variable

The collation_database system variable cannot be modified.

Collation of Control Codes

Control codes (for example, white space characters and empty strings), collate differently in Xpand than MySQL. Both MySQL and Xpand trim spaces at the end of strings, but in Xpand, it is assumed that shorter strings always collate before longer strings. MySQL, however assumes a shorter string MAY collate after a longer string if the characters of the longer string contain pre-space characters.