Comments - Connecting

7 months ago Daniël van Eeden

Thanks for looking into this. It does look like you are right and the server only reads one byte. The problem is that for collations with IDs above 255 the server can't pick the correct collation. Clients can work around this by sending SET NAMES (which Connector/Python seems to do twice, at least in some cases...). However, that makes this field useless and costs at least one extra roundtrip. Avoiding the roundtrip would require a protocol update. And while I agree that this should be done in an official way, what Connector/Python does would be an obvious solution. Another option is to put the second byte elsewhere in the protocol.
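
To make the truncation concrete, here is a minimal Python sketch of the fixed-size start of a handshake response (4-byte capability flags, 4-byte max packet size, 1-byte character set, 23 filler bytes). The function names are made up for illustration, and the two-byte variant is only one hypothetical way to carry the high byte (in the first filler byte); it is not a description of what any server or connector actually implements.

```python
import struct


def pack_prefix_one_byte(capabilities: int, max_packet: int, collation_id: int) -> bytes:
    """Pack the fixed prefix with the standard 1-byte character set field."""
    # "<IIB23x": two little-endian uint32, one uint8, then 23 zero filler bytes.
    # Only the low byte of the collation ID fits, so anything above 255 is truncated.
    return struct.pack("<IIB23x", capabilities, max_packet, collation_id & 0xFF)


def read_collation_one_byte(prefix: bytes) -> int:
    """What a server reading a single byte sees."""
    return prefix[8]


def pack_prefix_two_bytes(capabilities: int, max_packet: int, collation_id: int) -> bytes:
    """Hypothetical extension: put the high byte into the first filler byte."""
    low, high = collation_id & 0xFF, (collation_id >> 8) & 0xFF
    return struct.pack("<IIBB22x", capabilities, max_packet, low, high)


def read_collation_two_bytes(prefix: bytes) -> int:
    """A server that understood the hypothetical extension would combine both bytes."""
    return prefix[8] | (prefix[9] << 8)


requested = 309  # any collation ID above 255 shows the problem
narrow = pack_prefix_one_byte(0, 1 << 24, requested)
wide = pack_prefix_two_bytes(0, 1 << 24, requested)

print(read_collation_one_byte(narrow))  # low byte only: 309 & 0xFF
print(read_collation_one_byte(wide))    # an old server still sees only the low byte
print(read_collation_two_bytes(wide))   # an updated server would recover the full ID
```

Note that a server which only reads one byte gets the same (wrong) ID from both packets, which is why a client that writes two bytes still needs some way to know whether the server understood them.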

I think this issue isn't affecting many people as collations >255 aren't very common and SET NAMES also hides this issue.

I've updated the bug report with MySQL. Let's see what they have to say about this.

 
7 months ago Vladislav Vaintroub

I think that an extension like the one you describe might make sense, but there needs to be a consensus, and the server should signal to the client that it understands what the client sends, once it does. Usually capability bits are used for this kind of signal, but MySQL is running out of these bits and needs some extension of its own.
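
As a rough sketch of how such a signal could work, the Python below models capability negotiation with a made-up flag. `CAP_WIDE_COLLATION` and its value are placeholders invented for this example, not a real or proposed bit, and the function names are hypothetical too.

```python
# Placeholder bit, chosen arbitrarily for illustration; not a real capability flag.
CAP_WIDE_COLLATION = 0x1


def negotiate(server_caps: int, client_caps: int) -> int:
    """Both sides may only rely on capabilities that each of them advertises."""
    return server_caps & client_caps


def choose_collation_path(server_caps: int, client_caps: int, collation_id: int) -> str:
    """Decide how the client would communicate its collation."""
    if negotiate(server_caps, client_caps) & CAP_WIDE_COLLATION:
        # Server has signalled that it understands the wider field.
        return "two_byte_field"
    if collation_id <= 255:
        # The existing one-byte field is enough.
        return "one_byte_field"
    # Fall back to an extra roundtrip with SET NAMES.
    return "set_names"


print(choose_collation_path(CAP_WIDE_COLLATION, CAP_WIDE_COLLATION, 309))
print(choose_collation_path(0, CAP_WIDE_COLLATION, 309))
print(choose_collation_path(0, CAP_WIDE_COLLATION, 45))
```

The point of the sketch is only that the client never sends the wider form unless the server has advertised support first, so old servers keep seeing packets they already understand.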

Otherwise, I believe any form of utf8mb4, e.g. utf8mb4_bin, is good enough for the majority of clients and their use cases. The connection's collation only affects comparison of literal strings, which is not something everyone needs from the database, and for the rare cases where it matters, SET NAMES could be appropriate.

 