Character Sets in MariaDB Enterprise Server

MariaDB Enterprise Server in SkySQL supports many different character sets. Character sets allow users to store characters from many different languages in the database. Some character sets also support special characters like currency symbols, mathematical symbols, musical symbols, and emoji.

Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.

The detail provided here applies to:

  • MariaDB Platform for Transactions

  • MariaDB Platform for Smart Transactions

Supported Character Sets

For the full list of supported character sets, see Character Sets in Enterprise Server 10.5 and Character Sets in Enterprise Server 10.4.

SHOW CHARACTER SET
   LIKE 'latin%';
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
+----------+-----------------------------+---------------------+--------+

Server Character Set

The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table.

Topology

Default Value

Transactional Standalone

utf8mb4

HA (Primary/Replica)

utf8mb4

Galera

utf8mb4

Analytical

utf8

HTAP

utf8

Checking the Server Character Set

SHOW SESSION VARIABLES
   LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+

Effect of the Server Character Set

The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table. To demonstrate this effect:

Database's Default Character Set

A database's default character set is used to determine the character set of new tables within the database, unless a different character set is specified for a given table.

Checking a Database's Default Character Set

SELECT DEFAULT_CHARACTER_SET_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+----------------------------+
| DEFAULT_CHARACTER_SET_NAME |
+----------------------------+
| utf8mb4                    |
+----------------------------+

Setting a Database's Default Character Set and Collation

CREATE DATABASE db1
   DEFAULT CHARACTER SET 'utf8';

The new database's default collation will be the default collation for the specified character set.

To set the database's default collation along with the default character set, specify the optional DEFAULT COLLATE database option:

CREATE DATABASE db1
   DEFAULT CHARACTER SET 'utf8'
   DEFAULT COLLATE 'utf8_bin';

Changing a Database's Default Character Set and Collation

ALTER DATABASE db1
   DEFAULT CHARACTER SET 'utf8';

The database's default collation will be the default collation for the specified character set.

To change the database's default collation along with the default character set, specify the optional DEFAULT COLLATE database option:

ALTER DATABASE db1
   DEFAULT CHARACTER SET 'utf8'
   DEFAULT COLLATE 'utf8_bin';

Effect of a Database's Default Character Set

A database's default character set is used to determine the default character set of new tables within the database, unless a different character set is specified for a given table. To demonstrate this effect:

Table's Default Character Set

A table's default character set is used to determine the character set of new columns within the table, unless a different character set is specified for a given column.

Checking a Table's Default Character Set

SHOW CREATE TABLE db1.tab;
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `id` int(11) NOT NULL,
  `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Setting a Table's Default Character Set

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
)
DEFAULT CHARACTER SET='utf8';

The table's default collation will be the default collation for the specified character set.

To set the table's default collation along with the default character set, specify the optional DEFAULT COLLATE table option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
)
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';

Changing a Table's Default Character Set

ALTER TABLE db1.tab
   DEFAULT CHARACTER SET='utf8';

The table's default collation will be the default collation for the specified character set.

To change the table's default collation along with the default character set, specify the optional DEFAULT COLLATE table option:

ALTER TABLE db1.tab
   DEFAULT CHARACTER SET='utf8'
   DEFAULT COLLATE='utf8_bin';

Effect of a Table's Default Character Set

A table's default character set is used to determine the default character set of new columns within the table, unless a different character set is specified for a given column. To demonstrate this effect:

Column's Character Set

A column's character set is used to determine the character set used by the column to store its character data.

Checking a Column's Character Set

SELECT CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab'
   AND COLUMN_NAME='str';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8mb4            |
+--------------------+

Setting a Column's Character Set and Collation

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) CHARACTER SET 'utf8'
);

The column's collation will be the default collation for the specified character set.

To set the column's collation along with the default character set, specify the optional COLLATE column option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) CHARACTER SET 'utf8' COLLATE 'utf8_bin'
);

Changing a Column's Character Set and Collation

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   CHARACTER SET 'utf8';

The column's collation will be the default collation for the specified character set.

To set the column's collation along with the default character set, specify the optional COLLATE column option:

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   CHARACTER SET 'utf8'
   COLLATE 'utf8_bin';

Client Character Sets

Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.

Clients have multiple character sets that should typically use the same character set:

Users should not typically change these system variables manually. Instead, there are methods to change the client character set that ensure that all the relevant client character sets are set to the same value together.

Setting the Client Character Set for an Existing Connection

Setting the Character Set for MariaDB CLI Tools

For MariaDB CLI tools, such as MariaDB Client and MariaDB Import, the default character set depends on the operating system.

On Linux, the default character set is determined from the LC_CTYPE of the console environment. This is influenced by the LC_CTYPE, LC_ALL, and LANG environment variables. Note that Linux's utf8 and utf-8 character sets are currently mapped to SkySQL's utf8 character set, which is an incomplete 3-byte version of the UTF-8 standard. To get access to the full UTF-8 standard, users should manually set the client's character set to utf8mb4 character set.

If the console environment's character set cannot be determined, it will default to latin1.

MariaDB Client and other MariaDB CLI tools also provide the --default-character-set option, which can be used to manually change the client character set.

Inbound Replication and Character Sets

If you are using inbound replication into a SkySQL service, you should ensure that your tables in the SkySQL service use the same character set as the tables in the external primary server. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.

For example, you can use the following process to migrate tables to SkySQL prior to setting inbound replication:

  1. On the external primary server, check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

    SHOW CREATE TABLE db1.tab;
    
    *************************** 1. row ***************************
           Table: tab
    Create Table: CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
    
  2. Migrate the table to SkySQL with the proper character set by executing the exact CREATE TABLE statement provided from the external primary server:

    USE db1;
    CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci