Adding collation to utf8mb4 charset
Hello experts,
I wonder how to create a custom collation for the utf8mb4 charset.
If you want to add a custom collation in MySQL/MariaDB for the utf8 charset, you can modify .../charsets/Index.xml and extend the charset using LDML syntax:
<charset name="utf8">
  ...
  <collation name="utf8_myown_ci" id="1234">
    <rules>
      <reset>\u0000</reset>
      <i>\u0020</i>
      ...
    </rules>
  </collation>
  ...
</charset>
But there is no charset tag with name "utf8mb4" in that file. So I created one with name="utf8mb4" and added the base collation tags and my own collation:
<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 MB4 Unicode</description>
  <collation name="utf8mb4_general_ci" id="45">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_bin" id="46">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_myown_ci" id="213">
  </collation>
</charset>
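For reference, here is a minimal Python sketch that lists the collation IDs declared in an Index.xml-style document and flags IDs declared more than once. The function names `collation_ids` and `duplicate_ids` are made up for illustration, and the `<charsets>` wrapper element in the sample is an assumption about the file's root. It only sees what is written in the file, so it cannot detect a clash with a collation that exists on the server but is not listed there.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the utf8mb4 block above. The <charsets> wrapper is an
# assumption; the code does not depend on the root element's name.
SAMPLE = """
<charsets>
  <charset name="utf8mb4">
    <family>Unicode</family>
    <description>UTF-8 MB4 Unicode</description>
    <collation name="utf8mb4_general_ci" id="45">
      <flag>primary</flag>
      <flag>compiled</flag>
    </collation>
    <collation name="utf8mb4_bin" id="46">
      <flag>binary</flag>
      <flag>compiled</flag>
    </collation>
    <collation name="utf8mb4_myown_ci" id="213">
    </collation>
  </charset>
</charsets>
"""


def collation_ids(xml_text):
    """Map each collation id (int) to the (charset, collation) pairs declaring it."""
    root = ET.fromstring(xml_text.strip())
    ids = {}
    for charset in root.iter("charset"):
        for coll in charset.iter("collation"):
            pair = (charset.get("name"), coll.get("name"))
            ids.setdefault(int(coll.get("id")), []).append(pair)
    return ids


def duplicate_ids(ids):
    """Keep only the ids that more than one collation declares."""
    return {i: pairs for i, pairs in ids.items() if len(pairs) > 1}


if __name__ == "__main__":
    for coll_id, pairs in sorted(collation_ids(SAMPLE).items()):
        print(coll_id, pairs)
```

To run it against a real file, read the file's contents and pass them to `collation_ids` in place of `SAMPLE`.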
In phpMyAdmin I could select the newly created collation, but I couldn't insert four-byte characters; I get the error
"#1366 - Incorrect string value: '\xF0\x9F\x8D\xB5\xF0\x9F...' for field ..."
(With a built-in utf8mb4 collation like utf8mb4_general_ci it works.)
To be more precise: I have one column (a) with the built-in collation utf8mb4_general_ci and one column (b) with my own collation utf8mb4_myown_ci (defined in Index.xml). I insert the same data into both columns; column a accepts it without error, while column b gives the error described above.
An empty collation tag doesn't seem to be the problem, because I created an empty utf8_myown_ci inside the utf8 charset and that works.
In the column with utf8mb4_myown_ci I can still insert 3-byte characters, so it seems to be interpreted as a utf8 collation.
I searched Google and this site several times, but couldn't find any hints on how to add collations to charsets that aren't present in Index.xml.
Any ideas how to do it? Thank you for any hints!
Answer
It turns out I used a collation ID that was already occupied. If I use e.g. 501 instead of 213, it works.
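To avoid this, it may help to check a candidate ID against the IDs the server already uses before editing Index.xml. The following is only a sketch: the query reads the `ID` and `COLLATION_NAME` columns of `information_schema.COLLATIONS`, while the helper names and the example rows are made up for illustration (in particular, `some_builtin_collation` is a placeholder, not the real name behind ID 213). Running the query through a database client is left out so the block stays self-contained.

```python
# Query listing every collation ID the server currently knows about.
QUERY = (
    "SELECT ID, COLLATION_NAME "
    "FROM information_schema.COLLATIONS ORDER BY ID"
)


def occupied(rows):
    """Build {id: collation_name} from (ID, COLLATION_NAME) rows, skipping NULL ids."""
    return {int(coll_id): name for coll_id, name in rows if coll_id is not None}


def check_candidate(rows, candidate_id):
    """Return the name of the collation already using candidate_id, or None if it is free."""
    return occupied(rows).get(candidate_id)


# Illustrative rows only; on a real server, use the result of QUERY instead.
EXAMPLE_ROWS = [
    (45, "utf8mb4_general_ci"),
    (46, "utf8mb4_bin"),
    (213, "some_builtin_collation"),  # hypothetical placeholder name
]

if __name__ == "__main__":
    for candidate in (213, 501):
        owner = check_candidate(EXAMPLE_ROWS, candidate)
        status = f"taken by {owner}" if owner else "free in this sample"
        print(candidate, status)
```

With real query results in place of `EXAMPLE_ROWS`, any candidate for which `check_candidate` returns a name should be replaced by an unused ID.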