Unicode

Explore MariaDB's support for Unicode, covering the differences between the utf8mb3 and utf8mb4 character sets for multi-byte storage.

Unicode is a standard for encoding text across multiple writing systems. MariaDB supports a number of character sets for storing Unicode data:

| Character Set | Description |
| --- | --- |
| ucs2 | UCS-2: each character is represented by a 2-byte code with the most significant byte first. Fixed-length 16-bit encoding. |
| utf8 | By default an alias for utf8mb3, but this can be changed to utf8mb4 by changing the old_mode system variable from its default value. |
| utf8mb3 | UTF-8 encoding using one to three bytes per character. Basic Latin letters, numbers, and punctuation use one byte. Most European and Middle Eastern letters fit into two bytes. Korean, Chinese, and Japanese ideographs use three bytes. Supplementary characters cannot be stored. Until MariaDB 10.5, this was an alias for utf8. From MariaDB 10.6, utf8 is by default an alias for utf8mb3, but this can be changed to utf8mb4 by changing the old_mode system variable from its default value. |
| utf8mb4 | UTF-8 encoding, the same as utf8mb3 except that it also stores supplementary characters, using four bytes for each. |
| utf16 | UTF-16: the same as ucs2, except that supplementary characters are stored in 32 bits, so each character takes 16 or 32 bits. |
| utf32 | UTF-32: fixed-length 32-bit encoding. |
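
The byte widths described above can be checked outside the database. The following is a minimal Python sketch, not MariaDB code: it uses Python's built-in `utf-8`, `utf-16-be`, and `utf-32-be` codecs as stand-ins for the corresponding encodings, and the names `encoded_sizes` and `fits_utf8mb3` are hypothetical helpers introduced only for illustration. It assumes that a string fits in utf8mb3 exactly when it contains no supplementary characters, that is, no code points above U+FFFF.

```python
# Illustrative only: Python's standard codecs stand in for the encodings
# behind MariaDB's Unicode character sets. Helper names are hypothetical.

SAMPLES = {
    "A": "Basic Latin letter",
    "\u00e9": "European letter (e with acute accent)",
    "\u4e2d": "CJK ideograph",
    "\U0001F600": "supplementary character (emoji, above U+FFFF)",
}


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes needed to encode ch in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # Big-endian variants are used so that no byte-order mark is added.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


def fits_utf8mb3(text: str) -> bool:
    """True if no code point is above U+FFFF (at most 3 UTF-8 bytes each)."""
    return all(ord(c) <= 0xFFFF for c in text)


if __name__ == "__main__":
    for ch, label in SAMPLES.items():
        print(f"U+{ord(ch):04X} {label}: {encoded_sizes(ch)}")
    print("fits utf8mb3:", fits_utf8mb3("caf\u00e9"), fits_utf8mb3("\U0001F600"))
```

A check like `fits_utf8mb3` could be run in application code before writing to a utf8mb3 column, since text containing supplementary characters such as emoji needs utf8mb4 instead.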

Support for the UCA-14.0.0 collations was added in MariaDB 10.10 (MDEV-27009).

Support for the MySQL 8.0 UCA-9-0-0 (utf8mb4_0900_...) collations will be added to MariaDB 11.4.5.

This page is licensed: CC BY-SA / Gnu FDL
