Replication slave loses connection and does not recover

A recurring problem on a replicated MariaDB 10.3.8 instance (running on Ubuntu Bionic):

2018-08-10  8:00:06 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:00:06 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 416926994
2018-08-10  8:03:38 112025 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000092, 65535596)
2018-08-10  8:03:38 51 [Warning] Aborted connection 51 to db: 'unconnected' user: '***' host: '***' (A slave with the same server_uuid/server_id as this slave has co)
2018-08-10  8:25:52 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:25:52 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 442288020
2018-08-10  8:26:11 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:26:11 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 444689776
2018-08-10  8:28:22 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:28:22 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 449388134
2018-08-10  8:29:13 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:29:13 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 449972531
2018-08-10  8:29:23 84104 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-08-10  8:29:23 84104 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.029418' at position 457783200
2018-08-10  8:29:23 84104 [ERROR] Slave IO thread did not receive an expected Rows-log end-of-statement for event starting at log 'mysql-bin.029418' position 451205827 whose last block was seen at log 'mysql-bin.029418' position 457775009. The end-of-statement should have been delivered before the current one at log 'mysql-bin.029418' position 457783200
2018-08-10  8:29:23 84104 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
2018-08-10  8:29:23 84104 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.029418', position 457783200

Note that despite the message about "A slave with the same server_uuid/server_id as this slave...", all nodes /do/ have their own unique server_id.

Manually restarting the replication (with STOP SLAVE; START SLAVE) is enough to get replication running again.

Any ideas what might cause this issue?

Answer Answered by Ian Light in this comment.

Symptom: Starting the Slave manually causes replication to initially succeed (for a minute or two), but the Slave subsequently stops after reporting a failure to connect to the Master, similar to the error below:

2018-09-17 16:03:12 13 [Warning] Slave I/O: Notifying master by SET @master_binlog_checksum= @@global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Internal MariaDB error code: 1193
2018-09-17 16:04:12 13 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)

Double Checks: Confirm that both Master and Slave are using the same character sets. This may not be obvious if you are replicating between a much older version of MySql (let's say 5.1) and a current MariaDB 10.3.X instance. The older instance of MySql may not be aware of the utf8mb4 versus utf8 issue. utf8mb4 may be the default in newer MySql and MariaDB instances, whereas utf8 may be the default in older versions.

Remediation Example: If replicating between a Master with a much older version of MySql, and a Slave using a newer MariaDB (or MySql) instance, in your my.ini (or equivalent), make sure that you comment out the utf8mb4, and explicitly set the utf8 options on the Slave.

i.e.comment out the use of utf8mb4 on the slave IF the server is using utf8
# character-set-server=utf8mb4
# collation-server=utf8mb4_general_ci
character-set-server=utf8
collation-server=utf8_general_ci

If replicating between newer servers, please just confirm that the character sets in use are the same on Master and Slave (in this instance, it's preferable for both servers to use utf8mb4.

The key is to make sure that the Slave is configured to use the same character set that the Master is using.

Post-Mortem Considerations:Consider using utf8mb4 on both Master and Slave.
If this is not possible due to version constraints, then start the process of designing and scheduling a testing program to facilitate an upgrade of the Master to a newer version (preferably MariaDB).

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.