Comments - Introduction to State Snapshot Transfers (SSTs)

4 years, 3 months ago Testing User A Test

I have a 2-node cluster running MariaDB 10.4.12 and Galera 4 (v26.4.3). The nodes had been running smoothly as a cluster. After rebooting both servers, I started node 1 with the /usr/bin/galera_new_cluster command and then started node 2 with 'systemctl start mariadb'. However, node 2 could not join the cluster, and the following error messages appeared in /var/log/messages:

May 20 15:54:18 uodbdb2 mysqld: 2020-05-20 15:54:18 1 [Note] WSREP: GCache history reset: old(8a371dc2-6762-11ea-b859-274830d9bb21:1261 -> 8a371dc2-6762-11ea-b859-274830d9bb21:1324
May 20 15:54:18 uodbdb2 mysqld: 2020-05-20 15:54:18 1 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
May 20 15:54:19 uodbdb2 mysqld: 2020-05-20 15:54:19 0 [Warning] WSREP: 1.0 (JOAL_node1): State transfer to 0.0 (JOAL_node2) failed: -255 (Unknown error 255)
May 20 15:54:19 uodbdb2 mysqld: 2020-05-20 15:54:19 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1178: Will never receive state. Need to abort.
May 20 15:54:19 uodbdb2 mysqld: 2020-05-20 15:54:19 0 [Note] WSREP: gcomm: terminating thread
May 20 15:54:19 uodbdb2 mysqld: 2020-05-20 15:54:19 0 [Note] WSREP: gcomm: joining thread
May 20 15:54:19 uodbdb2 mysqld: 2020-05-20 15:54:19 0 [Note] WSREP: gcomm: closing backend
May 20 15:54:20 uodbdb2 mysqld: 2020-05-20 15:54:20 0 [Note] WSREP: view(view_id(NON_PRIM,1b90d2ca,62) memb {
May 20 15:54:20 uodbdb2 mysqld: 1b90d2ca,0
May 20 15:54:20 uodbdb2 mysqld: } joined {
May 20 15:54:20 uodbdb2 mysqld: } left {
May 20 15:54:20 uodbdb2 mysqld: } partitioned {
May 20 15:54:20 uodbdb2 mysqld: 834dc803,0
May 20 15:54:20 uodbdb2 mysqld: })
May 20 15:54:20 uodbdb2 mysqld: 2020-05-20 15:54:20 0 [Note] WSREP: PC protocol downgrade 1 -> 0
May 20 15:54:20 uodbdb2 mysqld: 2020-05-20 15:54:20 0 [Note] WSREP: view((empty))
May 20 15:54:20 uodbdb2 mysqld: 2020-05-20 15:54:20 0 [Note] WSREP: gcomm: closed
May 20 15:54:20 uodbdb2 mysqld: 2020-05-20 15:54:20 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
May 20 15:54:20 uodbdb2 systemd: mariadb.service: main process exited, code=killed, status=6/ABRT
May 20 15:54:20 uodbdb2 mysqld: Terminated
May 20 15:54:20 uodbdb2 mysqld: WSREP_SST: [INFO] Joiner cleanup. rsync PID: 10392 (20200520 15:54:20.647)
May 20 15:54:21 uodbdb2 rsyncd[10392]: sent 0 bytes received 0 bytes total size 0
May 20 15:54:21 uodbdb2 mysqld: WSREP_SST: [INFO] Joiner cleanup done. (20200520 15:54:21.155)

What's wrong?
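A log like the one above mixes many kinds of messages. The following is a minimal sketch for pulling out the clues that matter for a failed join. The function name wsrep_join_clues and its output labels are hypothetical, and the message patterns are assumptions based only on the lines in this thread; the wording can differ between Galera and systemd versions.

```shell
# Hypothetical helper (not part of MariaDB or Galera): scan log text on
# stdin and print one label per clue found. Patterns are assumptions
# taken from the log in this thread.
wsrep_join_clues() {
  log=$(cat)

  # The donor reports that the state transfer (SST) failed.
  if printf '%s\n' "$log" | grep -q 'State transfer to .* failed'; then
    echo "sst_failed"
  fi

  # systemd gave up waiting for the unit (capitalization varies by version).
  if printf '%s\n' "$log" | grep -qi 'operation timed out'; then
    echo "systemd_timeout"
  fi

  # Signal that ended mysqld, e.g. "code=killed, status=6/ABRT" -> "ABRT".
  sig=$(printf '%s\n' "$log" \
    | grep -o 'code=killed, status=[0-9]*/[A-Z]*' \
    | tail -n 1 | cut -d/ -f2)
  if [ -n "$sig" ]; then
    echo "killed_by_signal=$sig"
  fi
}
```

Usage: sudo journalctl -u mariadb | wsrep_join_clues (or pipe in the relevant part of /var/log/messages, including the systemd lines). On the log in this question it prints sst_failed followed by killed_by_signal=ABRT.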

 
4 years, 3 months ago Geoff Montee

Hi,

This is an important clue:

2020-05-20 15:54:20 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
May 20 15:54:20 uodbdb2 systemd: mariadb.service: main process exited, code=killed, status=6/ABRT 

It appears that the mysqld process was killed by systemd. To check for sure, execute this:

sudo journalctl -u mariadb

This usually happens because the SST took longer than the systemd service timeout. You most likely need to increase or disable the systemd timeout, e.g.:

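# Create the drop-in directory first; tee fails if it does not exist
sudo mkdir -p /etc/systemd/system/mariadb.service.d
# TimeoutStartSec=0 and TimeoutStopSec=0 disable the start and stop timeouts
sudo tee /etc/systemd/system/mariadb.service.d/timeoutsec.conf <<EOF
[Service]
TimeoutStartSec=0
TimeoutStopSec=0
EOF
sudo systemctl daemon-reload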

See here: https://mariadb.com/kb/en/systemd/#configuring-the-systemd-service-timeout
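After the daemon-reload, one way to confirm that systemd picked up the drop-in is to inspect the unit's effective properties with systemctl show. This minimal sketch wraps that check in a hypothetical helper, timeouts_disabled. It assumes that a disabled timeout is displayed as infinity (newer systemd) or 0 (possibly on older releases); verify that on your own system.

```shell
# Hypothetical check (not part of MariaDB or systemd): read the output of
#   systemctl show -p TimeoutStartUSec -p TimeoutStopUSec mariadb
# on stdin and succeed only if both timeouts look disabled.
# Assumption: a disabled timeout is shown as "infinity" or "0".
timeouts_disabled() {
  ok=0
  while IFS='=' read -r key value; do
    case "$key" in
      TimeoutStartUSec|TimeoutStopUSec)
        case "$value" in
          infinity|0) ok=$((ok + 1)) ;;
        esac ;;
    esac
  done
  # Both properties must be present and disabled.
  [ "$ok" -eq 2 ]
}
```

Usage: systemctl show -p TimeoutStartUSec -p TimeoutStopUSec mariadb | timeouts_disabled && echo "systemd timeouts disabled"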

 
4 years, 3 months ago Testing User A Test

Thank you for your advice. I didn't change any cluster configuration or create timeoutsec.conf. Instead, I found some SELinux issues and fixed them, after which node 2 successfully joined the cluster.
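The thread does not say which SELinux changes were made. As a starting point for diagnosing a similar failure, here is a minimal sketch that summarizes AVC denials for the mysqld_t domain from audit log text. The helper name mysqld_avc_denials is hypothetical, and the assumption that SST helpers such as rsync run in mysqld_t depends on your SELinux policy.

```shell
# Hypothetical helper (not part of MariaDB or the audit tools): read audit
# log text on stdin and print one "command permission target-type class"
# line per distinct AVC denial whose source context is mysqld_t.
# Assumption: SST helpers started by mysqld (e.g. rsync) run in mysqld_t.
mysqld_avc_denials() {
  grep 'avc: *denied' \
    | grep 'scontext=[^ ]*:mysqld_t:' \
    | sed -n 's/.*denied *{ \([^}]*\) }.*comm="\([^"]*\)".*tcontext=[^:]*:[^:]*:\([^:]*\):.* tclass=\([^ ]*\).*/\2 \1 \3 \4/p' \
    | sort -u
}
```

Usage, assuming auditd is running: sudo ausearch -m avc -ts recent | mysqld_avc_denials. Each output line names the command, the denied permission, and the target type, which points at what would need relabeling (for example with semanage) or a local policy module; the right change depends on the specific denial.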

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.