Character Sets in MariaDB Xpand
This page is part of MariaDB's Documentation.
The parent of this page is: Character Sets
Topics on this page:
Overview
MariaDB Xpand in SkySQL supports many different character sets. Character sets allow users to store characters from many different languages in the database. Some character sets also support special characters like currency symbols, mathematical symbols, musical symbols, and emoji.
Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.
Compatibility
The detail provided here applies to:
MariaDB Xpand 5.3
MariaDB Xpand 6.0
MariaDB Xpand 6.1
MariaDB SkySQL Distributed Transactions
Supported Character Sets
For the full list of supported character sets, see Character Sets in Xpand.
The supported character sets can also be queried with the SHOW CHARACTER SET statement:
SHOW CHARACTER SET LIKE 'latin%';
And the supported character sets can also be queried using the system.mysql_character_sets
system table:
SELECT * FROM system.mysql_character_sets;
+---------+-----------------------+--------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+-----------------------+--------------------+--------+
| binary | Binary pseudo charset | binary | 1 |
| latin1 | CP1252 West European | latin1_swedish_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
+---------+-----------------------+--------------------+--------+
Server Character Set
The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table.
The server character set is determined by the character_set_server
system variable. The Distributed Transactions topology is the only SkySQL topology that currently uses MariaDB Xpand, which has the following default value:
Topology | Default Value |
---|---|
Distributed Transactions | utf8 |
Checking the Server Character Set
You can check the server character set by executing the SHOW SESSION VARIABLES statement:
SHOW SESSION VARIABLES
LIKE 'character_set_server';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
Effect of the Server Character Set
The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table. To demonstrate this effect:
Confirm the current server character setThe server character set is
utf8mb4
on many SkySQL topologies. This can be confirmed with SHOW SESSION VARIABLES:SHOW SESSION VARIABLES LIKE 'character_set_server';
+----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | character_set_server | utf8 | +----------------------+-------+
Create a database:
CREATE DATABASE db1;
Confirm this database's default character set by querying the
information_schema.SCHEMATA
table:SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+----------------------------+ | DEFAULT_CHARACTER_SET_NAME | +----------------------------+ | utf8 | +----------------------------+
Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
Confirm the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the
CHARACTER SET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) not null, `str` varchar(50) CHARACTER SET utf8, PRIMARY KEY (`id`) /*$ DISTRIBUTE=1 */ ) CHARACTER SET utf8 /*$ SLICES=3 */
Confirm the
str
column's character set by querying theinformation_schema.COLUMNS
table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8 | +--------------------+
Database's Default Character Set
A database's default character set is used to determine the character set of new tables within the database, unless a different character set is specified for a given table.
Checking a Database's Default Character Set
You can check a database's default character set by querying the information_schema.SCHEMATA
table:
SELECT DEFAULT_CHARACTER_SET_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+----------------------------+
| DEFAULT_CHARACTER_SET_NAME |
+----------------------------+
| utf8 |
+----------------------------+
Setting a Database's Default Character Set and Collation
A database's default character set can be set at creation by executing the CREATE DATABASE statement and specifying the DEFAULT CHARACTER SET
database option:
CREATE DATABASE db1
DEFAULT CHARACTER SET 'utf8';
The new database's default collation will be the default collation for the specified character set.
To set the database's default collation along with the default character set, specify the optional DEFAULT COLLATE
database option:
CREATE DATABASE db1
DEFAULT CHARACTER SET 'utf8'
DEFAULT COLLATE 'utf8_bin';
Changing a Database's Default Character Set and Collation
A database's default character set can be changed by executing the ALTER DATABASE statement and specifying the DEFAULT CHARACTER SET
database option:
ALTER DATABASE db1
DEFAULT CHARACTER SET 'utf8';
The database's default collation will be the default collation for the specified character set.
To change the database's default collation along with the default character set, specify the optional DEFAULT COLLATE
database option:
ALTER DATABASE db1
DEFAULT CHARACTER SET 'utf8'
DEFAULT COLLATE 'utf8_bin';
Effect of a Database's Default Character Set
A database's default character set is used to determine the default character set of new tables within the database, unless a different character set is specified for a given table. To demonstrate this effect:
Check the database's default character set by querying the
information_schema.SCHEMATA
table:SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+----------------------------+ | DEFAULT_CHARACTER_SET_NAME | +----------------------------+ | utf8 | +----------------------------+
Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
Confirm that the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the
CHARACTER SET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) not null, `str` varchar(50) CHARACTER SET utf8, PRIMARY KEY (`id`) /*$ DISTRIBUTE=1 */ ) CHARACTER SET utf8 /*$ SLICES=3 */
Confirm that the
str
column is using the database's default character set by querying theinformation_schema.COLUMNS
table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8 | +--------------------+
Table's Default Character Set
A table's default character set is used to determine the character set of new columns within the table, unless a different character set is specified for a given column.
Checking a Table's Default Character Set
To check a table's default character set, execute the SHOW CREATE TABLE statement and looking for the CHARACTER SET
table option:
SHOW CREATE TABLE db1.tab\G
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) not null,
`str` varchar(50) CHARACTER SET utf8,
PRIMARY KEY (`id`) /*$ DISTRIBUTE=1 */
) CHARACTER SET utf8 /*$ SLICES=3 */
Setting a Table's Default Character Set
A table's default character set can be set at creation by executing the CREATE TABLE statement and specifying the DEFAULT CHARACTER SET
table option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50)
)
DEFAULT CHARACTER SET='utf8';
The table's default collation will be the default collation for the specified character set.
To set the table's default collation along with the default character set, specify the optional DEFAULT COLLATE
table option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50)
)
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';
Changing a Table's Default Character Set
A table's default character set can be changed by executing the ALTER TABLE statement and specifying the DEFAULT CHARACTER SET
table option:
ALTER TABLE db1.tab
DEFAULT CHARACTER SET='utf8';
The table's default collation will be the default collation for the specified character set.
To change the table's default collation along with the default character set, specify the optional DEFAULT COLLATE
table option:
ALTER TABLE db1.tab
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';
Effect of a Table's Default Character Set
A table's default character set is used to determine the default character set of new columns within the table, unless a different character set is specified for a given column. To demonstrate this effect:
Check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the
DEFAULT CHARSET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) not null, `str` varchar(50) CHARACTER SET utf8, PRIMARY KEY (`id`) /*$ DISTRIBUTE=1 */ ) CHARACTER SET utf8 /*$ SLICES=3 */
Add a new column in this table by executing the ALTER TABLE ... ADD COLUMN statement. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
ALTER TABLE db1.tab ADD COLUMN new_str VARCHAR(50);
Confirm the column is using the table's default character set by querying the
information_schema.COLUMNS
table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='new_str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8 | +--------------------+
Column's Character Set
A column's character set is used to determine the character set used by the column to store its character data.
Checking a Column's Character Set
You can check a column's character set by querying the information_schema.COLUMNS
table:
SELECT CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
AND TABLE_NAME='tab'
AND COLUMN_NAME='str';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8 |
+--------------------+
Setting a Column's Character Set and Collation
A column's character set can be set when executing the CREATE TABLE statement and specifying the CHARACTER SET
column option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50) CHARACTER SET 'utf8'
);
The column's collation will be the default collation for the specified character set.
To set the column's collation along with the default character set, specify the optional COLLATE
column option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50) CHARACTER SET 'utf8' COLLATE 'utf8_bin'
);
Changing a Column's Character Set and Collation
A column's character set can be changed by executing the ALTER TABLE statement and specifying the MODIFY
clause with the CHARACTER SET
column option:
ALTER TABLE db1.tab
MODIFY str varchar(50)
CHARACTER SET 'utf8';
The column's collation will be the default collation for the specified character set.
To set the column's collation along with the default character set, specify the optional COLLATE
column option:
ALTER TABLE db1.tab
MODIFY str varchar(50)
CHARACTER SET 'utf8'
COLLATE 'utf8_bin';
Client Character Sets
Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.
Clients have multiple character sets that should typically use the same character set:
The character set the client uses to send statements to the server is shown by the
character_set_client
system variable.The character set the server uses to interpret literal values from the client is shown by the
character_set_connection
system variable.The character set the server uses to send results to the client is shown by the
character_set_results
system variable.
Users should not typically change these system variables manually. Instead, there are methods to change the client character set that ensure that all the relevant client character sets are set to the same value together.
Setting the Client Character Set for an Existing Connection
The SET NAMES statement can be used to set the client character set for an existing connection. When this statement is executed, it sets the character_set_client
, character_set_connection
, and character_set_results
system variables to the same value together. If the statement specifies a collation with the optional COLLATE
clause, the collation_connection
system variable is also changed at the same time. To see how this works:
Confirm the current client character set with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_results | utf8 | +--------------------------+-------+
Change the client character set with the SET NAMES statement:
SET NAMES 'utf8mb4';
Confirm the current client character set has changed with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+---------+ | Variable_name | Value | +--------------------------+---------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_results | utf8mb4 | +--------------------------+---------+
Add the optional
COLLATE
clause to the the SET NAMES statement:SET NAMES 'latin1' COLLATE 'latin1_general_cs';
Confirm the
collation_connection
system variable was also changed:SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
+--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_results | latin1 | | collation_connection | latin1_general_cs | +--------------------------+-------------------+
Setting the Character Set for MariaDB CLI Tools
For MariaDB CLI tools, such as MariaDB Client and MariaDB Import, the default character set depends on the operating system.
Note
When you connect to the Distributed Transactions topology with MariaDB Client, you should set the default character set to utf8
.
On Linux, the default character set is determined from the LC_CTYPE
of the console environment. This is influenced by the LC_CTYPE
, LC_ALL
, and LANG
environment variables. Note that Linux's utf8
and utf-8
character sets are currently mapped to the utf8
character set, which is an incomplete 3-byte version of the UTF-8 standard. To get access to the full UTF-8 standard, users should manually set the client's character set to utf8mb4
character set.
On Windows, the default character set is determined by the console's code page.
If the console environment's character set cannot be determined, it will default to latin1
.
MariaDB Client and other MariaDB CLI tools also provide the --default-character-set option, which can be used to manually change the client character set.
For example, to specify the --default-character-set option with the value
utf8
:$ mariadb --host example.skysql.net --port 5001 \ --user db_user --password \ --ssl-ca ~/Downloads/skysql_chain.pem \ --default-character-set utf8
Confirm the client character set has been set with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_results | utf8 | +--------------------------+-------+
Precedence
The character set precedence is different in different contexts.
Receiving a Query
When Xpand receives a query, the character set of the statement is determined by the character_set_client
system variable.
Executing a Query
When Xpand executes a query, the character set of string literals depends on whether the literal has a character set introducer.
If a string literal contains a character set introducer, Xpand uses the character set specified for the string literal:
SELECT _latin1'this is a latin1 string';
+-------------------------+
| this is a latin1 string |
+-------------------------+
| this is a latin1 string |
+-------------------------+
If a string literal does not contain a character set introducer, Xpand uses the character set specified by the character_set_client
system variable.
Xpand currently ignores the character_set_connection
system variable.
Sending Query Results
When Xpand sends query results, the character set is determined by the character_set_results
system variable.
Creating an Object
When Xpand creates an object (such as a database, table, or column), the character set of the object is determined with the following precedence:
Column default:
If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
column option are defined, the specified character set is used.If the
[DEFAULT] COLLATION
column option is defined, the default character set for the specified collation is used.
Table default:
If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
table option are defined, the specified character set is used.If the
[DEFAULT] COLLATION
table option is defined, the default character set for the specified collation is used.
Database default:
If the
[DEFAULT] CHARACTER SET
or[DEFAULT] CHARSET
database option are defined, the specified character set is used.If the
[DEFAULT] COLLATION
database option is defined, the default character set for the specified collation is used.
Server-wide default (defined by the
character_set_server
system variable)
Caveats
Inbound Replication and Character Sets
If you are using replication, you should ensure that your tables in the slave cluster use the same character set as the tables in the master. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.
If you are using inbound replication into a SkySQL service, you should ensure that your tables in the SkySQL service use the same character set as the tables in the external primary server. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.
For example, you can use the following process to migrate tables to SkySQL prior to setting inbound replication:
On the external primary server, check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the
DEFAULT CHARSET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) NOT NULL, `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Migrate the table to SkySQL with the proper character set by executing the same CREATE TABLE statement provided from the external primary server, after making adjustments for MariaDB Xpand:
CREATE TABLE db1.`tab` ( `id` int(11) NOT NULL, `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
UTF-8 Code Points
UTF8 code points in Xpand are not same as that of MySQL due to the fact that the internal implementation of UTF8 code point validity in Xpand varies from that of MySQL.
SHOW CREATE TABLE
Xpand will display column-level character set information as part of SHOW CREATE TABLE
output, even if it does not differ from the character set defined for the table.