Charset Narrowing optimization
Charset Narrowing optimization handles equality comparisons like:
utf8mb3_key_column=utf8mb4_expression
It enables the optimizer to construct ref
access to utf8mb3_key_column
based on this equality. The optimization supports comparisons of columns that use utf8mb3_general_ci
to expressions that use utf8mb4_general_ci
.
The optimization was introduced in MariaDB 10.6.16, MariaDB 10.10.7, MariaDB 10.11.6, MariaDB 11.0.4, MariaDB 11.1.3, MariaDB 11.2.2 where it is OFF by default, and in MariaDB 11.3.1, where it is ON by default.
Details
MariaDB supports both utf8mb3 and utf8mb4 character sets. It is possible to construct join queries that compare columns that use different collations.
Suppose, we have table users that uses utf8mb4:
create table users ( user_name_mb4 varchar(100) collate utf8mb4_general_ci, ... );
and table orders that uses utf8mb3:
create table orders ( user_name_mb3 varchar(100) collate utf8mb3_general_ci, ..., INDEX idx(user_name_mb3) );
One can join users to orders on user_name:
select * from orders, users where orders.user_name_mb3=users.user_name_mb4;
internally the optimizer will handle the equality by converting the UTF8MB3 value into UTF8MB4 and then doing the comparison. In EXPLAIN FORMAT=JSON or Optimizer Trace one can see that the WHERE clause becomes:
convert(orders.user_name_mb3 using utf8mb4) = users.user_name_mb4
this produces the expected result but the query optimizer is not able to use the index over orders.user_name_mb3
to find matches for values of users.user_name_mb4
.
The EXPLAIN of the above query looks like this:
MariaDB > explain select * from orders, users where orders.user_name_mb3=users.user_name_mb4; +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+ | 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 10330 | Using where; Using join buffer (flat, BNL join) | +------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
Charset Narrowing optimization enables the optimizer to perform the comparison between MB3 and MB4 by "narrowing" the UTF8MB4 value to UTF8MB3.
MariaDB > set optimizer_switch='cset_narrowing=on'; Query OK, 0 rows affected (0.000 sec) MariaDB > explain select * from orders, users where orders.user_name_mb3=users.user_name_mb4; +------+-------------+--------+------+---------------+---------------+---------+---------------------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+------+---------------+---------------+---------+---------------------------+------+-----------------------+ | 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | orders | ref | user_name_mb3 | user_name_mb3 | 303 | test2.users.user_name_mb4 | 1 | Using index condition | +------+-------------+--------+------+---------------+---------------+---------+---------------------------+------+-----------------------+ 2 rows in set (0.001 sec)
Controlling the optimization
The optimization is controlled by an optimizer_switch
flag. Specify
set optimizer_switch='cset_narrowing=on';
to enable the optimization.
References
- MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref access
- Blog post: Making “tbl.utf8mb3_key_column=utf8mb4_expr” sargable