Character Sets in MariaDB Enterprise Server
This page is part of MariaDB's Documentation.
The parent of this page is: Character Sets
Topics on this page:
Overview
MariaDB Enterprise Server
supports many different character sets. Character sets allow users to store characters from many different languages in the database. Some character sets also support special characters like currency symbols, mathematical symbols, musical symbols, and emoji.Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.
Compatibility
The detail provided here applies to:
MariaDB Enterprise Server 10.2
MariaDB Enterprise Server 10.3
MariaDB Enterprise Server 10.4
MariaDB Enterprise Server 10.5
MariaDB Enterprise Server 10.6
Popular Character Sets
Character Set | Supported Languages | Standard | Bytes Per Character |
---|---|---|---|
utf8mb4 | Almost all languages | UTF-8 | 1-4 |
utf8 | Almost all languages | UTF-8
(Without supplementary characters)
| 1-3 |
latin1 | Western European languages
(Including English)
| CP-1252 | 1 |
Supported Character Sets
For the full list of supported character sets, see Character Sets in Enterprise Server 10.6, Character Sets in Enterprise Server 10.5, Character Sets in Enterprise Server 10.4, Character Sets in Enterprise Server 10.3, and Character Sets in Enterprise Server 10.2.
The supported character sets can also be queried with the SHOW CHARACTER SET statement:
SHOW CHARACTER SET
LIKE 'latin%';
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
+----------+-----------------------------+---------------------+--------+
Server Character Set
The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table.
The server character set is determined by the character_
Checking the Server Character Set
You can check the server character set by executing the SHOW SESSION VARIABLES statement:
SHOW SESSION VARIABLES
LIKE 'character_set_server';
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
Effect of the Server Character Set
The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table. To demonstrate this effect:
Confirm the current server character set with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES LIKE 'character_set_server';
+----------------------+---------+ | Variable_name | Value | +----------------------+---------+ | character_set_server | utf8mb4 | +----------------------+---------+
Create a database:
CREATE DATABASE db1;
Confirm this database's default character set by querying the information_
schema.SCHEMATA table:SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+----------------------------+ | DEFAULT_CHARACTER_SET_NAME | +----------------------------+ | utf8mb4 | +----------------------------+
Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
Confirm the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the
DEFAULT CHARSET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) NOT NULL, `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Confirm the
str
column's character set by querying the information_schema.COLUMNS table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8mb4 | +--------------------+
Database's Default Character Set
A database's default character set is used to determine the character set of new tables within the database, unless a different character set is specified for a given table.
Checking a Database's Default Character Set
You can check a database's default character set by querying the information_
SELECT DEFAULT_CHARACTER_SET_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+----------------------------+
| DEFAULT_CHARACTER_SET_NAME |
+----------------------------+
| utf8mb4 |
+----------------------------+
Setting a Database's Default Character Set and Collation
A database's default character set can be set at creation by executing the CREATE DATABASE statement and specifying the DEFAULT CHARACTER SET
database option:
CREATE DATABASE db1
DEFAULT CHARACTER SET 'utf8';
The new database's default collation will be the default collation for the specified character set.
To set the database's default collation along with the default character set, specify the optional DEFAULT COLLATE
database option:
CREATE DATABASE db1
DEFAULT CHARACTER SET 'utf8'
DEFAULT COLLATE 'utf8_bin';
Changing a Database's Default Character Set and Collation
A database's default character set can be changed by executing the ALTER DATABASE statement and specifying the DEFAULT CHARACTER SET
database option:
ALTER DATABASE db1
DEFAULT CHARACTER SET 'utf8';
The database's default collation will be the default collation for the specified character set.
To change the database's default collation along with the default character set, specify the optional DEFAULT COLLATE
database option:
ALTER DATABASE db1
DEFAULT CHARACTER SET 'utf8'
DEFAULT COLLATE 'utf8_bin';
Effect of a Database's Default Character Set
A database's default character set is used to determine the default character set of new tables within the database, unless a different character set is specified for a given table. To demonstrate this effect:
Check the database's default character set by querying the information_
schema.SCHEMATA table:SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME='db1';
+----------------------------+ | DEFAULT_CHARACTER_SET_NAME | +----------------------------+ | utf8mb4 | +----------------------------+
Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
CREATE TABLE db1.tab ( id int PRIMARY KEY, str varchar(50) );
Confirm that the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the
DEFAULT CHARSET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) NOT NULL, `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Confirm that the
str
column is using the database's default character set by querying the information_schema.COLUMNS table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8mb4 | +--------------------+
Table's Default Character Set
A table's default character set is used to determine the character set of new columns within the table, unless a different character set is specified for a given column.
Checking a Table's Default Character Set
To check a table's default character set, execute the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET
table option:
SHOW CREATE TABLE db1.tab;
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) NOT NULL,
`str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Setting a Table's Default Character Set
A table's default character set can be set at creation by executing the CREATE TABLE statement and specifying the DEFAULT CHARACTER SET
table option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50)
)
DEFAULT CHARACTER SET='utf8';
The table's default collation will be the default collation for the specified character set.
To set the table's default collation along with the default character set, specify the optional DEFAULT COLLATE
table option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50)
)
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';
Changing a Table's Default Character Set
A table's default character set can be changed by executing the ALTER TABLE statement and specifying the DEFAULT CHARACTER SET
table option:
ALTER TABLE db1.tab
DEFAULT CHARACTER SET='utf8';
The table's default collation will be the default collation for the specified character set.
To change the table's default collation along with the default character set, specify the optional DEFAULT COLLATE
table option:
ALTER TABLE db1.tab
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';
Effect of a Table's Default Character Set
A table's default character set is used to determine the default character set of new columns within the table, unless a different character set is specified for a given column. To demonstrate this effect:
Check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the
DEFAULT CHARSET
table option:SHOW CREATE TABLE db1.tab;
*************************** 1. row *************************** Table: tab Create Table: CREATE TABLE `tab` ( `id` int(11) NOT NULL, `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Add a new column in this table by executing the ALTER TABLE ... ADD COLUMN statement. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:
ALTER TABLE db1.tab ADD COLUMN new_str VARCHAR(50);
Confirm the column is using the table's default character set by querying the information_
schema.COLUMNS table:SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='db1' AND TABLE_NAME='tab' AND COLUMN_NAME='new_str';
+--------------------+ | CHARACTER_SET_NAME | +--------------------+ | utf8mb4 | +--------------------+
Column's Character Set
A column's character set is used to determine the character set used by the column to store its character data.
Checking a Column's Character Set
You can check a column's character set by querying the information_
SELECT CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
AND TABLE_NAME='tab'
AND COLUMN_NAME='str';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8mb4 |
+--------------------+
Setting a Column's Character Set and Collation
A column's character set can be set when executing the CREATE TABLE statement and specifying the CHARACTER SET
column option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50) CHARACTER SET 'utf8'
);
The column's collation will be the default collation for the specified character set.
To set the column's collation along with the default character set, specify the optional COLLATE
column option:
CREATE TABLE db1.tab (
id int PRIMARY KEY,
str varchar(50) CHARACTER SET 'utf8' COLLATE 'utf8_bin'
);
Changing a Column's Character Set and Collation
A column's character set can be changed by executing the ALTER TABLE statement and specifying the MODIFY
clause with the CHARACTER SET
column option:
ALTER TABLE db1.tab
MODIFY str varchar(50)
CHARACTER SET 'utf8';
The column's collation will be the default collation for the specified character set.
To set the column's collation along with the default character set, specify the optional COLLATE
column option:
ALTER TABLE db1.tab
MODIFY str varchar(50)
CHARACTER SET 'utf8'
COLLATE 'utf8_bin';
Client Character Sets
Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.
Clients have multiple character sets that should typically use the same character set:
The character set the client uses to send statements to the server is shown by the character_
set_ system variable.client The character set the server uses to interpret literal values from the client is shown by the character_
set_ system variable.connection The character set the server uses to send results to the client is shown by the character_
set_ system variable.results
Users should not typically change these system variables manually. Instead, there are methods to change the client character set that ensure that all the relevant client character sets are set to the same value together.
Setting the Client Character Set for an Existing Connection
The SET NAMES statement can be used to set the client character set for an existing connection. When this statement is executed, it sets the character_COLLATE
clause, the collation_
Confirm the current client character set with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_results | utf8 | +--------------------------+-------+
Change the client character set with the SET NAMES statement:
SET NAMES 'utf8mb4';
Confirm the current client character set has changed with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+---------+ | Variable_name | Value | +--------------------------+---------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_results | utf8mb4 | +--------------------------+---------+
Add the optional
COLLATE
clause to the the SET NAMES statement:SET NAMES 'latin1' COLLATE 'latin1_general_cs';
Confirm the collation_
connection system variable was also changed:SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
+--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_results | latin1 | | collation_connection | latin1_general_cs | +--------------------------+-------------------+
Setting the Character Set for MariaDB CLI Tools
For MariaDB CLI tools, such as MariaDB Client and MariaDB Import, the default character set depends on the operating system.
On Linux, the default character set is determined from the LC_CTYPE
of the console environment. This is influenced by the LC_CTYPE
, LC_ALL
, and LANG
environment variables. Note that Linux's utf8
and utf-8
character sets are currently mapped to Enterprise Server's utf8
character set, which is an incomplete 3-byte version of the UTF-8 standard. To get access to the full UTF-8 standard, users should manually set the client's character set to utf8mb4
character set.
On Windows, the default character set is determined by the console's code page.
If the console environment's character set cannot be determined, it will default to latin1
.
MariaDB Client and other MariaDB CLI tools also provide the --default-character-set option, which can be used to manually change the client character set.
For example, to specify the --default-character-set option with the value
utf8mb4
:$ mariadb --host example.skysql.net --port 5001 \ --user db_user --password \ --ssl-ca ~/Downloads/skysql_chain.pem \ --default-character-set utf8mb4
Confirm the client character set has been set with SHOW SESSION VARIABLES:
SHOW SESSION VARIABLES WHERE Variable_name IN('character_set_client', 'character_set_connection', 'character_set_results');
+--------------------------+---------+ | Variable_name | Value | +--------------------------+---------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_results | utf8mb4 | +--------------------------+---------+
Inbound Replication and Character Sets
If you are using replication, you should ensure that your tables on the replica server use the same character set as the tables in the primary server. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.