Character Sets in MariaDB Enterprise Server

Overview

MariaDB Enterprise Server in SkySQL supports many different character sets. Character sets allow users to store characters from many different languages in the database. Some character sets also support special characters like currency symbols, mathematical symbols, musical symbols, and emoji.

Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.

Compatibility

The detail provided here applies to:

  • MariaDB Enterprise Server 10.2

  • MariaDB Enterprise Server 10.3

  • MariaDB Enterprise Server 10.4

  • MariaDB Enterprise Server 10.5

  • MariaDB Enterprise Server 10.6

  • Multi-Node Analytics

  • Replicated Transactions

  • Singe Node Analytics

  • Single Node Transactions

Supported Character Sets

The supported character sets can also be queried with the SHOW CHARACTER SET statement:

SHOW CHARACTER SET
   LIKE 'latin%';
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
+----------+-----------------------------+---------------------+--------+

Server Character Set

The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table.

The server character set is determined by the character_set_server system variable. This system variable has different default values in different SkySQL topologies:

Topology

Default Value

Multi-Node Analytics

utf8

Replicated Transactions (Primary/Replica)

utf8mb4

Single Node Analytics

utf8

Single Node Transactions

utf8mb4

Checking the Server Character Set

You can check the server character set by executing the SHOW SESSION VARIABLES statement:

SHOW SESSION VARIABLES
   LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+

Effect of the Server Character Set

The server character set is used to determine the default character set of new databases and tables, unless a different character set is specified for a given database or table. To demonstrate this effect:

  1. Confirm the current server character setThe server character set is utf8mb4 on many SkySQL topologies. This can be confirmed with SHOW SESSION VARIABLES:

    SHOW SESSION VARIABLES
       LIKE 'character_set_server';
    
    +----------------------+---------+
    | Variable_name        | Value   |
    +----------------------+---------+
    | character_set_server | utf8mb4 |
    +----------------------+---------+
    
  2. Create a database:

    CREATE DATABASE db1;
    
  3. Confirm this database's default character set by querying the information_schema.SCHEMATA table:

    SELECT DEFAULT_CHARACTER_SET_NAME
    FROM information_schema.SCHEMATA
    WHERE SCHEMA_NAME='db1';
    
    +----------------------------+
    | DEFAULT_CHARACTER_SET_NAME |
    +----------------------------+
    | utf8mb4                    |
    +----------------------------+
    
  4. Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:

    CREATE TABLE db1.tab (
       id int PRIMARY KEY,
       str varchar(50)
    );
    
  5. Confirm the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

    SHOW CREATE TABLE db1.tab;
    
    *************************** 1. row ***************************
           Table: tab
    Create Table: CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
    
  6. Confirm the str column's character set by querying the information_schema.COLUMNS table:

    SELECT CHARACTER_SET_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA='db1'
       AND TABLE_NAME='tab'
       AND COLUMN_NAME='str';
    
    +--------------------+
    | CHARACTER_SET_NAME |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    

Database's Default Character Set

A database's default character set is used to determine the character set of new tables within the database, unless a different character set is specified for a given table.

Checking a Database's Default Character Set

You can check a database's default character set by querying the information_schema.SCHEMATA table:

SELECT DEFAULT_CHARACTER_SET_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME='db1';
+----------------------------+
| DEFAULT_CHARACTER_SET_NAME |
+----------------------------+
| utf8mb4                    |
+----------------------------+

Setting a Database's Default Character Set and Collation

A database's default character set can be set at creation by executing the CREATE DATABASE statement and specifying the DEFAULT CHARACTER SET database option:

CREATE DATABASE db1
   DEFAULT CHARACTER SET 'utf8';

The new database's default collation will be the default collation for the specified character set.

To set the database's default collation along with the default character set, specify the optional DEFAULT COLLATE database option:

CREATE DATABASE db1
   DEFAULT CHARACTER SET 'utf8'
   DEFAULT COLLATE 'utf8_bin';

Changing a Database's Default Character Set and Collation

A database's default character set can be changed by executing the ALTER DATABASE statement and specifying the DEFAULT CHARACTER SET database option:

ALTER DATABASE db1
   DEFAULT CHARACTER SET 'utf8';

The database's default collation will be the default collation for the specified character set.

To change the database's default collation along with the default character set, specify the optional DEFAULT COLLATE database option:

ALTER DATABASE db1
   DEFAULT CHARACTER SET 'utf8'
   DEFAULT COLLATE 'utf8_bin';

Effect of a Database's Default Character Set

A database's default character set is used to determine the default character set of new tables within the database, unless a different character set is specified for a given table. To demonstrate this effect:

  1. Check the database's default character set by querying the information_schema.SCHEMATA table:

    SELECT DEFAULT_CHARACTER_SET_NAME
    FROM information_schema.SCHEMATA
    WHERE SCHEMA_NAME='db1';
    
    +----------------------------+
    | DEFAULT_CHARACTER_SET_NAME |
    +----------------------------+
    | utf8mb4                    |
    +----------------------------+
    
  2. Create a table in this database. The table should have at least one column that contains characters. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:

    CREATE TABLE db1.tab (
       id int PRIMARY KEY,
       str varchar(50)
    );
    
  3. Confirm that the table's default character set is the same as the database's default character set by executing the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

    SHOW CREATE TABLE db1.tab;
    
    *************************** 1. row ***************************
           Table: tab
    Create Table: CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
    
  4. Confirm that the str column is using the database's default character set by querying the information_schema.COLUMNS table:

    SELECT CHARACTER_SET_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA='db1'
       AND TABLE_NAME='tab'
       AND COLUMN_NAME='str';
    
    +--------------------+
    | CHARACTER_SET_NAME |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    

Table's Default Character Set

A table's default character set is used to determine the character set of new columns within the table, unless a different character set is specified for a given column.

Checking a Table's Default Character Set

To check a table's default character set, execute the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

SHOW CREATE TABLE db1.tab;
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `id` int(11) NOT NULL,
  `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Setting a Table's Default Character Set

A table's default character set can be set at creation by executing the CREATE TABLE statement and specifying the DEFAULT CHARACTER SET table option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
)
DEFAULT CHARACTER SET='utf8';

The table's default collation will be the default collation for the specified character set.

To set the table's default collation along with the default character set, specify the optional DEFAULT COLLATE table option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50)
)
DEFAULT CHARACTER SET='utf8'
DEFAULT COLLATE='utf8_bin';

Changing a Table's Default Character Set

A table's default character set can be changed by executing the ALTER TABLE statement and specifying the DEFAULT CHARACTER SET table option:

ALTER TABLE db1.tab
   DEFAULT CHARACTER SET='utf8';

The table's default collation will be the default collation for the specified character set.

To change the table's default collation along with the default character set, specify the optional DEFAULT COLLATE table option:

ALTER TABLE db1.tab
   DEFAULT CHARACTER SET='utf8'
   DEFAULT COLLATE='utf8_bin';

Effect of a Table's Default Character Set

A table's default character set is used to determine the default character set of new columns within the table, unless a different character set is specified for a given column. To demonstrate this effect:

  1. Check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

    SHOW CREATE TABLE db1.tab;
    
    *************************** 1. row ***************************
           Table: tab
    Create Table: CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
    
  2. Add a new column in this table by executing the ALTER TABLE ... ADD COLUMN statement. This column could be a data type such as VARCHAR, CHAR, TEXT, or any of the aliases for those types:

    ALTER TABLE db1.tab
       ADD COLUMN new_str VARCHAR(50);
    
  3. Confirm the column is using the table's default character set by querying the information_schema.COLUMNS table:

    SELECT CHARACTER_SET_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA='db1'
       AND TABLE_NAME='tab'
       AND COLUMN_NAME='new_str';
    
    +--------------------+
    | CHARACTER_SET_NAME |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    

Column's Character Set

A column's character set is used to determine the character set used by the column to store its character data.

Checking a Column's Character Set

You can check a column's character set by querying the information_schema.COLUMNS table:

SELECT CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA='db1'
   AND TABLE_NAME='tab'
   AND COLUMN_NAME='str';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8mb4            |
+--------------------+

Setting a Column's Character Set and Collation

A column's character set can be set when executing the CREATE TABLE statement and specifying the CHARACTER SET column option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) CHARACTER SET 'utf8'
);

The column's collation will be the default collation for the specified character set.

To set the column's collation along with the default character set, specify the optional COLLATE column option:

CREATE TABLE db1.tab (
   id int PRIMARY KEY,
   str varchar(50) CHARACTER SET 'utf8' COLLATE 'utf8_bin'
);

Changing a Column's Character Set and Collation

A column's character set can be changed by executing the ALTER TABLE statement and specifying the MODIFY clause with the CHARACTER SET column option:

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   CHARACTER SET 'utf8';

The column's collation will be the default collation for the specified character set.

To set the column's collation along with the default character set, specify the optional COLLATE column option:

ALTER TABLE db1.tab
   MODIFY str varchar(50)
   CHARACTER SET 'utf8'
   COLLATE 'utf8_bin';

Client Character Sets

Character sets are configured independently on the server and the client. The server and the client should typically use the same character set, otherwise there can be corruption or loss of data.

Clients have multiple character sets that should typically use the same character set:

  • The character set the client uses to send statements to the server is shown by the character_set_client system variable.

  • The character set the server uses to interpret literal values from the client is shown by the character_set_connection system variable.

  • The character set the server uses to send results to the client is shown by the character_set_results system variable.

Users should not typically change these system variables manually. Instead, there are methods to change the client character set that ensure that all the relevant client character sets are set to the same value together.

Setting the Client Character Set for an Existing Connection

The SET NAMES statement can be used to set the client character set for an existing connection. When this statement is executed, it sets the character_set_client, character_set_connection, and character_set_results system variables to the same value together. If the statement specifies a collation with the optional COLLATE clause, the collation_connection system variable is also changed at the same time. To see how this works:

  1. Confirm the current client character set with SHOW SESSION VARIABLES:

    SHOW SESSION VARIABLES
    WHERE Variable_name
       IN('character_set_client',
       'character_set_connection',
       'character_set_results');
    
    +--------------------------+-------+
    | Variable_name            | Value |
    +--------------------------+-------+
    | character_set_client     | utf8  |
    | character_set_connection | utf8  |
    | character_set_results    | utf8  |
    +--------------------------+-------+
    
  2. Change the client character set with the SET NAMES statement:

    SET NAMES 'utf8mb4';
    
  3. Confirm the current client character set has changed with SHOW SESSION VARIABLES:

    SHOW SESSION VARIABLES
    WHERE Variable_name
       IN('character_set_client',
       'character_set_connection',
       'character_set_results');
    
    +--------------------------+---------+
    | Variable_name            | Value   |
    +--------------------------+---------+
    | character_set_client     | utf8mb4 |
    | character_set_connection | utf8mb4 |
    | character_set_results    | utf8mb4 |
    +--------------------------+---------+
    
  4. Add the optional COLLATE clause to the the SET NAMES statement:

    SET NAMES 'latin1' COLLATE 'latin1_general_cs';
    
  5. Confirm the collation_connection system variable was also changed:

    SHOW SESSION VARIABLES
    WHERE Variable_name
       IN('character_set_client',
       'character_set_connection',
       'character_set_results',
       'collation_connection');
    
    +--------------------------+-------------------+
    | Variable_name            | Value             |
    +--------------------------+-------------------+
    | character_set_client     | latin1            |
    | character_set_connection | latin1            |
    | character_set_results    | latin1            |
    | collation_connection     | latin1_general_cs |
    +--------------------------+-------------------+
    

Setting the Character Set for MariaDB CLI Tools

For MariaDB CLI tools, such as MariaDB Client and MariaDB Import, the default character set depends on the operating system.

On Linux, the default character set is determined from the LC_CTYPE of the console environment. This is influenced by the LC_CTYPE, LC_ALL, and LANG environment variables. Note that Linux's utf8 and utf-8 character sets are currently mapped to Enterprise Server's SkySQL's utf8 character set, which is an incomplete 3-byte version of the UTF-8 standard. To get access to the full UTF-8 standard, users should manually set the client's character set to utf8mb4 character set.

If the console environment's character set cannot be determined, it will default to latin1.

MariaDB Client and other MariaDB CLI tools also provide the --default-character-set option, which can be used to manually change the client character set.

  1. For example, to specify the --default-character-set option with the value utf8mb4:

    $ mariadb --host example.skysql.net --port 5001 \
          --user db_user --password \
          --ssl-ca ~/Downloads/skysql_chain.pem \
          --default-character-set utf8mb4
    
  2. Confirm the client character set has been set with SHOW SESSION VARIABLES:

    SHOW SESSION VARIABLES
    WHERE Variable_name
       IN('character_set_client',
       'character_set_connection',
       'character_set_results');
    
    +--------------------------+---------+
    | Variable_name            | Value   |
    +--------------------------+---------+
    | character_set_client     | utf8mb4 |
    | character_set_connection | utf8mb4 |
    | character_set_results    | utf8mb4 |
    +--------------------------+---------+
    

Inbound Replication and Character Sets

If you are using replication, you should ensure that your tables on the replica server use the same character set as the tables in the primary server. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.

If you are using inbound replication into a SkySQL service, you should ensure that your tables in the SkySQL service use the same character set as the tables in the external primary server. The best way to ensure that the tables use the right character set is to explicitly define the character set at table creation.

For example, you can use the following process to migrate tables to SkySQL prior to setting inbound replication:

  1. On the external primary server, check the table's default character set by executing the SHOW CREATE TABLE statement and looking for the DEFAULT CHARSET table option:

    SHOW CREATE TABLE db1.tab;
    
    *************************** 1. row ***************************
           Table: tab
    Create Table: CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
    
  2. Migrate the table to SkySQL with the proper character set by executing the exact CREATE TABLE statement provided from the external primary server:

    USE db1;
    CREATE TABLE `tab` (
      `id` int(11) NOT NULL,
      `str` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci