Collations in MariaDB Xpand
This page is part of MariaDB's Documentation.
The parent of this page is: Collations
Topics on this page:
Overview
MariaDB Xpand in SkySQL supports many different collations. A collation is a set of rules for sorting and comparing character sets.
Compatibility
The detail provided here applies to:
MariaDB Xpand 5.3
MariaDB Xpand 6.0
MariaDB Xpand 6.1
Distributed Transactions
Example of Collation
As an example, consider a list of names. The character set defines which characters can appear in each name. The collation defines the order in which the names would be sorted, and whether two names would be equal:
In a case-insensitive collation, "Mueller" and "mueller" would be considered the same name.
In a case-sensitive collation, "Mueller" and "mueller" would be considered different names.
In a German collation, "Mueller" and "Müller" would be considered the same name.
In an English collation, "Mueller" and "Müller" would be considered different names.
In a collation that ignores padding, "Mueller" and "Mueller " would be considered the same name.
In a no-pad collation, "Mueller" and "Mueller " would be considered different names.
Supported Collations
For the full list of supported collations, see Collations in Xpand.
The supported collations can also be queried with the SHOW COLLATION statement:
SHOW COLLATION
LIKE 'utf8mb4_%';
+--------------------+---------+------+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | No | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | No | Yes | 1 |
+--------------------+---------+------+---------+----------+---------+
Collation Suffixes
There are several different collation suffixes, which refer to collations with different types of rules:
Collation Suffix | Case-Sensitive | Case-Insensitive | Binary | Legacy |
---|---|---|---|---|
| No | Yes | No | No |
| Yes | No | No | No |
| No | No | Yes | No |
| No | Yes | No | Yes |
| Yes | No | No | Yes |
Server Collation
The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table.
The server collation is determined by the collation_server
system variable. The Distributed Transactions topology is the only SkySQL topology that currently uses MariaDB Xpand, which has the following default value:
Topology | Default Value |
---|---|
Distributed Transactions | utf8_ |
Checking the Server Collation
You can check the server collation by executing the SHOW SESSION VARIABLES statement:
SHOW SESSION VARIABLES
LIKE 'collation_server';
+------------------+-----------------+
| Variable_name | Value |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+
Effect of the Server Collation
The server collation is used to determine the default collation of new databases and tables, unless a different collation is specified for a given database or table. To demonstrate this effect:
Confirm the current server collationThe server collation is
utf8_general_ci
on the Distributed Transactions topology. You can confirm that with SHOW SESSION VARIABLES:SHOW SESSION VARIABLES LIKE 'collation_server';
+------------------+-----------------+ | Variable_name | Value | +------------------+-----------------+ | collation_server | utf8_general_ci | +------------------+-----------------+
Create a database:
CREATE DATABASE db1;
You can confirm this database's default collation by querying the
information_schema.SCHEMATA
table:SELECT DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+------------------------+ | DEFAULT_COLLATION_NAME | +------------------------+ | utf8_general_ci | +------------------------+
Also create a table in this database, which has at least one column which contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
You can confirm that the table's default collation is the same as the database's default collation by querying the
information_schema.TABLES
table:SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab';
+-----------------+ | TABLE_COLLATION | +-----------------+ | utf8_general_ci | +-----------------+
You can also confirm the
str
column's collation by querying theinformation_schema.COLUMNS
table:SELECT COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+-----------------+ | COLLATION_NAME | +-----------------+ | utf8_general_ci | +-----------------+
Database's Default Collation
A database's default collation is used to determine the collation of new tables within the database, unless a different collation is specified for a given table.
Checking a Database's Default Collation
You can check a database's default collation by querying the information_schema.SCHEMATA
table:
SELECT DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8_general_ci |
+------------------------+
Setting a Database's Default Collation
A database's default collation can be set at creation by executing the CREATE DATABASE statement and specifying the DEFAULT COLLATE
database option:
CREATE DATABASE db1
DEFAULT COLLATE 'utf8mb4_bin';
The new database's default character set will be the character set associated with the specified collation.
Changing a Database's Default Collation
A database's default collation can be set by executing the ALTER DATABASE statement and specifying the DEFAULT COLLATE
database option:
ALTER DATABASE db1
DEFAULT COLLATE 'utf8mb4_bin';
The database's default character set will be the character set associated with the specified collation.
Effect of a Database's Default Collation
A database's default collation is used to determine the default collation of new tables within the database, unless a different collation is specified for a given table. To demonstrate this effect:
Check the database's default collation by querying the
information_schema.SCHEMATA
table:SELECT DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+------------------------+ | DEFAULT_COLLATION_NAME | +------------------------+ | utf8_general_ci | +------------------------+
Create a table in this database, which has at least one column which contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
You can confirm that the table's default collation is the same as the database's default collation by querying the
information_schema.TABLES
table:SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab';
+-----------------+ | TABLE_COLLATION | +-----------------+ | utf8_general_ci | +-----------------+
You can confirm that the
str
column is using the database's default collation by querying theinformation_schema.COLUMNS
table:SELECT COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+-----------------+ | COLLATION_NAME | +-----------------+ | utf8_general_ci | +-----------------+
Table's Default Collation
A table's default collation is used to determine the collation of new columns within the table, unless a different collation is specified for a given column.
Checking a Table's Default Collation
You can check a table's default collation by querying the information_schema.TABLES
table:
SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA='db1'
AND TABLE_NAME='tab';
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+
Setting a Table's Default Collation
A table's default collation can be set at creation by executing the CREATE TABLE statement and specifying the DEFAULT COLLATE
table option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50)
) DEFAULT COLLATE 'utf8mb4_bin';
The new table's default character set will be the character set associated with the specified collation.
Changing a Table's Default Collation
A table's default collation can be set by executing the ALTER TABLE statement and specifying the DEFAULT COLLATE
table option:
ALTER TABLE db1.tab
DEFAULT COLLATE 'utf8mb4_bin';
The table's default character set will be the character set associated with the specified collation.
Effect of a Table's Default Collation
A table's default collation is used to determine the default collation of new columns within the table, unless a different collation is specified for a given column. To demonstrate this effect:
Check the table's default collation by querying the
information_schema.TABLES
table:SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab';
+-----------------+ | TABLE_COLLATION | +-----------------+ | utf8_general_ci | +-----------------+
Add a new column in this table by executing the ALTER TABLE ... ADD COLUMN statement. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
ALTER TABLE db1.tab ADD COLUMN new_str VARCHAR(50);
You can confirm this column is using the table's default collation by querying the
information_schema.COLUMNS
table:SELECT COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='new_str';
+-----------------+ | COLLATION_NAME | +-----------------+ | utf8_general_ci | +-----------------+
Column's Collation
A column's collation is used by the column to sort and compare its character data.
Checking a Column's collation
You can check a column's collation by querying the information_schema.COLUMNS
table:
SELECT COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
AND TABLE_NAME='tab'
AND COLUMN_NAME='str';
+-----------------+
| COLLATION_NAME |
+-----------------+
| utf8_general_ci |
+-----------------+
Setting a Column's Collation
A column's collation can be set at table creation by executing the CREATE TABLE statement and specifying the COLLATE
column option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50) COLLATE 'utf8mb4_bin'
);
The column's character set will be the character set associated with the specified collation.
Changing a Column's Collation
A column's collation can be changed by executing the ALTER TABLE statement and specifying the MODIFY
clause with the COLLATE
column option:
ALTER TABLE db1.tab
MODIFY str varchar(50)
COLLATE 'utf8mb4_bin';
The column's character set will be the character set associated with the specified collation.
Binary Column Collations
For binary column collations, the collation compares the column's character data by comparing the binary values of each character.
For example, to see how the utf8mb4_bin
collation would handle sorting strings that use the utf8mb4
character set:
Create a table with a column that uses the
utf8mb4_bin
collation by executing the CREATE TABLE statement and specifying theCOLLATE
column option:CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) COLLATE 'utf8mb4_bin' );
Insert some data into the table:
INSERT INTO db1.tab VALUES (1, 'SkySQL'), (2, 'skysql'), (3, 'SKYSQL'), (4, 'MariaDB'), (5, 'mariadb'), (6, 'MARIADB');
Execute a query that eliminates duplicates and sorts the data:
SELECT DISTINCT str FROM db1.tab ORDER BY str;
+---------+ | str | +---------+ | MARIADB | | MariaDB | | SKYSQL | | SkySQL | | mariadb | | skysql | +---------+
The results show that lower case characters and upper case characters are considered to be different, and upper case characters are sorted before lower case characters, because upper case characters have lower code points (numeric values) than lower case characters.
Case-Insensitive Column Collations
For case-insensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as equal.
For example, to see how the utf8mb4_unicode_ci
collation would handle sorting strings that use the utf8mb4
character set:
Create a table with a column that uses the
utf8mb4_unicode_ci
collation by executing the CREATE TABLE statement and specifying theCOLLATE
column option:CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) COLLATE 'utf8mb4_unicode_ci' );
Insert some data into the table:
INSERT INTO db1.tab VALUES (1, 'SkySQL'), (2, 'skysql'), (3, 'SKYSQL'), (4, 'MariaDB'), (5, 'mariadb'), (6, 'MARIADB');
Execute a query that eliminates duplicates and sorts the data:
SELECT DISTINCT str FROM db1.tab ORDER BY str;
+---------+ | str | +---------+ | mariadb | | skysql | +---------+
The results show that lower case characters and upper case characters are considered to be the same, so many of the strings were considered to be duplicates.
Case-Sensitive Column Collations
For case-sensitive column collations, the collation compares the column's character data by treating upper case and lower case characters of the same character as different.
For example, to see how the latin1_general_cs
collation would handle sorting strings that use the latin1
character set:
Create a table with a column that uses the
latin1_general_cs
collation by executing the CREATE TABLE statement and specifying theCOLLATE
column option:CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) COLLATE 'latin1_general_cs' );
Insert some data into the table:
INSERT INTO db1.tab VALUES (1, 'SkySQL'), (2, 'skysql'), (3, 'SKYSQL'), (4, 'MariaDB'), (5, 'mariadb'), (6, 'MARIADB');
Execute a query that eliminates duplicates and sorts the data:
SELECT DISTINCT str FROM db1.tab ORDER BY str;
+---------+ | str | +---------+ | MARIADB | | MariaDB | | mariadb | | SKYSQL | | SkySQL | | skysql | +---------
The results show that lower case characters and upper case characters are considered to be different, and strings are sorted in lexicographical order.
Precedence
The collation precedence is different in different contexts.
Executing a Query
When Xpand executes a query, the collation of string literals depends on whether the literal has a character set introducer.
If a string literal contains a character set introducer, Xpand uses default collation for the character set specified for the string literal:
SELECT _latin1'this is a latin1 string';
+-------------------------+
| this is a latin1 string |
+-------------------------+
| this is a latin1 string |
+-------------------------+
If a string literal does not contain a character set introducer, Xpand uses the default collation for the character set specified by the character_set_client
system variable.
Xpand currently ignores the collation_connection
system variable.
Xpand does not support the COLLATE
clause to specify a column's collation in SELECT
queries.
Creating an Object
When Xpand creates an object (such as a database, table, or column), the character set of the object is determined with the following precedence:
Column default:
If the
[DEFAULT] COLLATION
column option is defined, the specified collation is used.If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
column option are defined, the default collation for the specified character set is used.
Table default:
If the
[DEFAULT] COLLATION
table option is defined, the specified collation is used.If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
table option are defined, the default collation for the specified character set is used.
Database default:
If the
[DEFAULT] COLLATION
database option is defined, the specified collation is used.If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
database option are defined, the default collation for the specified character set is used.
Server-wide default (the default collation for the character set defined by the
character_set_server
system variable)
Caveats
collation_database
System Variable
The collation_database
system variable cannot be modified.
Collation of Control Codes
Control codes (for example, white space characters and empty strings), collate differently in Xpand than MySQL. Both MySQL and Xpand trim spaces at the end of strings, but in Xpand, it is assumed that shorter strings always collate before longer strings. MySQL, however assumes a shorter string MAY collate after a longer string if the characters of the longer string contain pre-space characters.