Comments - Connecting

7 months, 4 weeks ago Daniël van Eeden

I believe the "client's default character set and collation" in the "Handshake Response Packet" is an "int<2>" instead of an "int<1>". So 2 bytes instead of one.

 
7 months, 4 weeks ago Vladislav Vaintroub

I think it is one byte. MySQL's documentation describes it as "int<1> character_set client charset a_protocol_character_set, only the lower 8-bits":

https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase_packets_protocol_handshake_response.html#sect_protocol_connection_phase_packets_protocol_handshake_response41

 
7 months, 3 weeks ago Daniël van Eeden

That info is not correct. I've created this PR for that: https://github.com/mysql/mysql-server/pull/541

I think this originally (way back when) might have been 1 byte with the byte before it being reserved and always 0x0. But that never changed while connectors etc. started to use 2 bytes.

 
7 months, 3 weeks ago Vladislav Vaintroub

Actually, my reading of the MySQL (8.0.37) code is that neither the client nor the server uses 2-byte charsets; they both use 1 byte.

I did not debug, but I doubt debugging will change what's in there:

https://github.com/mysql/mysql-server/blob/mysql-8.4.0/sql/auth/sql_authentication.cc#L2975

charset_code = (uint)(uchar) * (end + 8);

https://github.com/mysql/mysql-server/blob/mysql-8.4.0/sql/auth/sql_authentication.cc#L3051

ssl_charset_code = (uint)(uchar) * ((char *)protocol->get_net()->read_pos + 8);

In both the SSL and the non-SSL case the value is 1 byte, initialized from an unsigned char read from position 8.
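
To make "read from position 8" concrete, here is a minimal Python sketch. It assumes the HandshakeResponse41 layout from the MySQL documentation linked above: 4 bytes of capability flags, 4 bytes of max packet size, then the character set byte, then 23 bytes of filler. The helper names and sample values are made up for illustration and are not taken from any server or connector.

```python
import struct

def build_response_prefix(capabilities: int, max_packet: int, charset: int) -> bytes:
    """Build the fixed 32-byte prefix of a HandshakeResponse41 payload (illustrative helper)."""
    # "<" = little-endian without padding, I = 4-byte capability flags,
    # I = 4-byte max packet size, B = 1-byte character set, 23x = 23 zero filler bytes.
    return struct.pack("<IIB23x", capabilities, max_packet, charset)

def read_charset_one_byte(payload: bytes) -> int:
    """Same idea as the server lines quoted above: one unsigned byte at offset 8."""
    return payload[8]

def read_charset_two_bytes(payload: bytes) -> int:
    """What a hypothetical 2-byte reader would see: offsets 8 and 9, little-endian."""
    return struct.unpack_from("<H", payload, 8)[0]

# Sample values: 0x200 is CLIENT_PROTOCOL_41, 45 is utf8mb4_general_ci.
prefix = build_response_prefix(0x00000200, 1 << 24, 45)
print(len(prefix))                     # 32
print(read_charset_one_byte(prefix))   # 45
print(read_charset_two_bytes(prefix))  # 45
```

As long as the first filler byte is zero, a 1-byte reader and a 2-byte reader agree, which may be why the difference goes unnoticed for collation IDs below 256.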

 
7 months, 3 weeks ago Vladislav Vaintroub

I do not know which connectors started to use 2 bytes.

So far, all documentation has said 1 byte, and Wireshark in your PR also says "1 byte". Apparently, such connectors would not work correctly with MariaDB server, which does not read 2 bytes from that packet.

Usually, protocol changes are not made this quietly; they need to be announced and documented before release. Otherwise, no third-party connectors or protocol-compatible servers would be prepared to handle them.

 
7 months, 3 weeks ago Daniël van Eeden

Thanks for looking into this. It indeed looks like you are right and the server only reads one byte. The problem is that for collations with IDs >255 the server can't pick the correct collation. Clients might work around this by sending SET NAMES (which Connector/Python, at least in some cases, seems to do twice...). However, that makes this field useless and costs at least one extra roundtrip. Avoiding the roundtrip would mean updating the protocol. And while I agree that this should be done in an official way, what Connector/Python does would be an obvious solution. Another option is to put the second byte elsewhere in the protocol.
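
As a rough illustration of the truncation (a sketch only, not taken from any connector), this is what happens when a client writes a 2-byte little-endian collation ID and the server reads a single byte. The ID 309 is an assumed example value; any ID above 255 behaves the same way.

```python
import struct

collation_id = 309  # assumed example ID above 255

# A client that writes 2 little-endian bytes at offsets 8 and 9:
wire = struct.pack("<H", collation_id)

# A server that reads only the byte at offset 8 sees the low 8 bits:
seen_by_server = wire[0]

print(wire.hex())                              # 3501
print(seen_by_server)                          # 53
print(seen_by_server == collation_id & 0xFF)   # True
```

The server would then look up collation 53 instead of 309, so the session starts with a collation the client did not ask for unless SET NAMES follows.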

I think this issue isn't affecting many people as collations >255 aren't very common and SET NAMES also hides this issue.

I've updated the bug report with MySQL. Let's see what they have to say about this.

 
7 months, 3 weeks ago Vladislav Vaintroub

I think that an extension like the one you describe might make sense, but there needs to be a consensus, and the server should signal to the client that it understands whatever the client sends, once it does understand it. Usually, capability bits are used for this kind of signal, but MySQL is running out of these bits and needs some extension.
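
A minimal sketch of how such a signal could work, in Python. The flag name and its value are hypothetical placeholders, not a real or proposed capability bit, and the function is an illustration rather than any connector's code: the client sends 2 bytes only if both sides advertise the capability, and otherwise keeps the 1-byte layout.

```python
# Hypothetical capability flag; the name and the bit value are placeholders only.
HYPOTHETICAL_WIDE_COLLATION = 1 << 40

def encode_collation(server_caps: int, client_caps: int, collation_id: int) -> bytes:
    """Return the charset byte plus the byte that follows it in the handshake response."""
    negotiated = server_caps & client_caps
    if negotiated & HYPOTHETICAL_WIDE_COLLATION:
        # Both sides opted in: 2 little-endian bytes.
        return collation_id.to_bytes(2, "little")
    if collation_id > 0xFF:
        # The server never said it reads 2 bytes, so the ID cannot be sent here.
        raise ValueError("collation ID needs 2 bytes; server did not advertise support")
    # Legacy layout: 1 byte, with the following filler byte left as zero.
    return bytes([collation_id, 0])

print(encode_collation(HYPOTHETICAL_WIDE_COLLATION, HYPOTHETICAL_WIDE_COLLATION, 309).hex())  # 3501
print(encode_collation(0, HYPOTHETICAL_WIDE_COLLATION, 45).hex())                             # 2d00
```

In the ValueError case a client would presumably fall back to SET NAMES after connecting, so older servers keep working unchanged.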

Otherwise, I believe any form of utf8mb4, e.g. utf8mb4_bin, is good enough for the majority of clients and their use cases. The connection's collation only affects the comparison of literal strings, which is not something everyone needs from a database, and for the rare cases, SET NAMES could be appropriate.
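
For reference, the extra roundtrip is just one COM_QUERY packet. A minimal Python sketch of how a client could frame it, assuming the standard packet header (3-byte little-endian payload length plus 1-byte sequence ID) and assuming CLIENT_QUERY_ATTRIBUTES was not negotiated; the function name is made up for illustration.

```python
def com_query_packet(sql: str, sequence_id: int = 0) -> bytes:
    """Frame a SQL statement as a COM_QUERY packet (illustrative helper)."""
    payload = b"\x03" + sql.encode("utf-8")  # 0x03 is the COM_QUERY command byte
    header = len(payload).to_bytes(3, "little") + bytes([sequence_id])
    return header + payload

pkt = com_query_packet("SET NAMES utf8mb4 COLLATE utf8mb4_bin")
print(int.from_bytes(pkt[:3], "little") == len(pkt) - 4)  # True
print(pkt[4])                                             # 3
```

The client then has to wait for the server's OK packet before it sends real queries, which is the cost of working around the 1-byte field.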

 