Comments - Manual SST of Galera Cluster Node With Mariabackup
I prefer to use plain scp; it is faster and more straightforward, with no need to play with grastate.dat. Something like this:
Here we assume that one node (node1) is running, started with galera_new_cluster. All other nodes are stopped, and the MaxScale instances are stopped as well.
1. Perform this on all stopped nodes (2-5): wipe the datadirs on those stopped nodes, NOT on the running node (node1)! Do the same on the slave, if one exists.
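A minimal sketch of the wipe, assuming the default datadir /var/lib/mysql and the mariadb service name (both assumptions; adjust to your setup):

    # Run on nodes 2-5 (and the slave) only, never on node1!
    sudo systemctl stop mariadb      # make sure the node really is stopped
    sudo rm -rf /var/lib/mysql/*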
2. Perform this on all stopped nodes (2-5): give your own account permissions on /var/lib/mysql. This is needed because root cannot scp files between the nodes. Use id <loginname> to see your group. Do the same on the slave, if one exists.
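For example, a sketch assuming a login named admin in group admin (a hypothetical account; substitute your own):

    id admin                                  # confirm the user and group names
    sudo chown -R admin:admin /var/lib/mysql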
3. Stop the running node (node1).
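On a systemd-based installation this would be something like (service name assumed):

    sudo systemctl stop mariadb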
4. As root, start screen sessions for the scp transfers on the node (node1) that was running Galera. You may want to use four terminals, one per target node. And in every screen, run the transfer for one target node:
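A sketch of one such transfer, assuming the datadir /var/lib/mysql, the account admin from step 2, and a target host named node2 (all assumptions; repeat with node3, node4, and node5 in the other screens):

    # Copy the whole datadir from node1 to one joiner, as the account
    # that was given permissions in step 2:
    scp -rp /var/lib/mysql/* admin@node2:/var/lib/mysql/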
5. Wait until the scp transfers are finished (e.g. watch -n 30 -d ls -l /tmp/).
6. Clear the logs on nodes 2-5 if the server writes its logs to /var/lib/mysql. Do the same on the slave, if one exists.
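For instance, a sketch assuming the error log sits in the datadir under the default *.err name (an assumption):

    # On nodes 2-5 (and the slave): remove logs that were copied over from node1
    sudo rm -f /var/lib/mysql/*.err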
7. Set the permissions back to the mysql user on nodes 2-5, and on the slave, if one exists.
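A sketch, assuming the standard mysql user and group:

    sudo chown -R mysql:mysql /var/lib/mysql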
8. Bootstrap the Galera Cluster from node1.
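On node1, with the same bootstrap wrapper mentioned at the start:

    sudo galera_new_cluster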
9. Start the other nodes (2-5) one by one. Wait until the node has joined the cluster with IST before starting the next one.
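A sketch of the per-node start and check, assuming the mariadb service name and a mysql client that can log in locally:

    sudo systemctl start mariadb
    # Confirm the node is synced before moving on to the next one:
    mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"   # expect 'Synced'
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"          # should grow by one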
10. Start the MaxScale instances.
11. Start replication on the slave, if one exists.
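For example (the maxscale service name is an assumption; starting replication is plain SQL on the slave):

    sudo systemctl start maxscale
    # On the slave:
    mysql -e "START SLAVE;"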
SST is a big headache. Although I've been able to follow the entire procedure (after some modifications, because you assume that people use standard folders, standard hosts, and standard ports), the joiners kept requesting data from the donor, erasing the data folder. I probably had strange problems with the network. In the end I went back to the rsync SST, restarting the servers one by one. One node is retrieving data now (I hope it won't stop in the middle of the phase).
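For reference, switching back to the rsync SST method is a single setting in the server's Galera configuration (the file path below is an assumption):

    # e.g. in /etc/my.cnf.d/galera.cnf, in the [galera] or [mysqld] section:
    wsrep_sst_method=rsync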
I'm writing down the problem I had because I'm pretty sure someone else could have the same problem, and this may be useful: when you have a big database, invoking a service restart on joiner nodes from a remote SSH shell can internally cause a timeout during the sync phase. I solved it by using a remote desktop service.
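A sketch of an alternative workaround for the same SSH problem (not what the comment above used): run the restart inside a detached screen session so a dropped SSH connection cannot interrupt it:

    # Start the restart detached; it keeps running if the SSH session dies:
    screen -dmS sst_restart sudo systemctl restart mariadb
    # Reattach to watch it (the session ends once the restart returns):
    screen -r sst_restart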