Node Joining Failure When wsrep_sst_method=mariabackup

Hello,

I set up the simplest possible test environment for evaluating MariaDB clustering, as follows (steps to reproduce):

Two fresh installs of CentOS 7 minimal, updated to all latest packages. Remove all firewall rules (non-internet facing test servers).

Imported MariaDB yum repo details.

yum install MariaDB-server MariaDB-backup perl-DBD-MySQL socat

As of this writing, version 10.3.8 of MariaDB-server and MariaDB-backup gets installed, all to default locations.

/etc/my.cnf.d/server.cnf on both machines includes:

wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2
wsrep_sst_method=mariabackup
wsrep_sst_auth=******:******

Bring up first MariaDB server with: galera_new_cluster

Create SST user on first MariaDB server using the same credentials specified above for wsrep_sst_auth:

GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT, SUPER ON *.* TO ******@'localhost' identified by '******';

Bring up second MariaDB server with: systemctl start mariadb

File innobackup.backup.log on first MariaDB (the donor) includes lines such as:

Streaming ./mysql/event.MYD to STDOUT
...done

Last line in innobackup.backup.log on that machine is: completed OK!

Second server still fails to join cluster. Log file on second MariaDB server ends with:

WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql/bin as well (20180723 22:16:40.849)
removed ‘/var/log/mysql/bin/mariadb-master-bin.000001’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20180723 22:16:40.855)
2018-07-23 22:16:41 0 [Note] WSREP: (11d8586e, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-23 22:16:52 0 [Note] WSREP: 1.0 (mariadb1.local): State transfer to 0.0 (mariadb2.local) complete.
2018-07-23 22:16:52 0 [Note] WSREP: Member 1.0 (mariadb1.local) synced with group.
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20180723 22:16:52.240)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex   --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20180723 22:16:52.244)
rm: cannot remove ‘/var/lib/mysql//innobackup.prepare.log’: No such file or directory
rm: cannot remove ‘/var/lib/mysql//innobackup.move.log’: No such file or directory
WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20180723 22:16:52.644)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex       --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20180723 22:16:52.647)
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20180723 22:16:52.662)
2018-07-23 22:16:52 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '10.0.0.2' --datadir '/var/lib/mysql/'   --parent '5115' --binlog '/var/log/mysql/bin/mariadb-master-bin' : 1 (Operation not permitted)
2018-07-23 22:16:52 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2018-07-23 22:16:52 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2018-07-23 22:16:52 0 [ERROR] Aborting

No [ERROR] lines on the first MariaDB server at all.

I have been running MariaDB clusters with the rsync SST method for years without problems, and the example above works perfectly when switched to the rsync SST method. However, I cannot get even this simple test scenario to work with the mariabackup method. I would be grateful for any suggestions.

Answer (by Geoff Montee):

"First, setting the SST user's GRANT line to include the IP addresses instead of localhost. Previously, I was using 'sst'@'localhost' as is the normal/basic setup. However, MariaBackup seems to require an IP address for each node (e.g. Node1 needs a grant for 'sst'@'node2', 'sst'@'node3', etc). As the current setup has a private RFC1918 /24 subnet available, I used 'sst'@'10.20.30.%' for the GRANT."

This is not correct. The 'sst'@'localhost' user account on the donor node is sufficient. If this were a privilege issue, then your SST would have failed on the donor node--not the joiner node. See here:

https://mariadb.com/kb/en/library/mariabackup-sst-method/#authentication-and-privileges

"Finally, the 'datadir' entry under [mysqld] MUST exist or MariaBackup will make a mess of itself."

This is most likely what caused your problems. However, keep in mind that this bug has been fixed in MariaDB 10.3.10 and later. See here:

https://mariadb.com/kb/en/library/mariabackup-overview/#no-default-datadir
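
For anyone checking their own setup on an affected version, the datadir advice above can be turned into a quick check. This is only a rough sketch: it assumes the relevant settings live in a single file (for example /etc/my.cnf.d/server.cnf), has_mysqld_datadir is a made-up helper name and not part of MariaDB, and the check does not follow !include directives or look at other option groups.

```shell
#!/bin/sh
# Sketch only: has_mysqld_datadir is a hypothetical helper, not a MariaDB tool.
# It succeeds (exit 0) if the given file sets datadir inside its [mysqld]
# section and fails (non-zero) otherwise, including when the file is missing.
has_mysqld_datadir() {
    awk '
        # Any section header switches the flag on or off.
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[mysqld\]/) }
        # Remember a datadir assignment seen while inside [mysqld].
        in_section && /^[ \t]*datadir[ \t]*=/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example call (commented out so the sketch has no side effects):
# has_mysqld_datadir /etc/my.cnf.d/server.cnf || echo "datadir is not set under [mysqld]"
```

If the file contained only the wsrep settings listed in the question, this check would report that datadir is missing. Adding a line such as datadir=/var/lib/mysql under [mysqld] (the default location seen in the joiner log) would make it pass.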

"No [ERROR] lines on the first MariaDB server at all."

In the future, remember to check the SST logs on the donor node and the joiner node! See here:

https://mariadb.com/kb/en/library/mariabackup-sst-method/#logs
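
As a rough illustration of that advice, the sketch below prints the tail of each SST log named in the question (innobackup.backup.log, innobackup.prepare.log and innobackup.move.log). It assumes all three are written to the data directory, which the joiner output above shows for the prepare and move logs but which is only an assumption for the donor's backup log. show_sst_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: show_sst_logs is a hypothetical helper, not a MariaDB tool.
# Prints the last 20 lines of each SST log found in the given data directory
# (default /var/lib/mysql) and notes which logs are absent.
show_sst_logs() {
    datadir=${1:-/var/lib/mysql}
    for name in innobackup.backup.log innobackup.prepare.log innobackup.move.log; do
        log="$datadir/$name"
        if [ -f "$log" ]; then
            echo "== $log =="
            tail -n 20 "$log"
        else
            echo "== $log (not present) =="
        fi
    done
}

# Example call (commented out; run it on both the donor and the joiner):
# show_sst_logs /var/lib/mysql
```

In the failure described in the question, the joiner exits right after the move-back step, so innobackup.move.log on the second server would be the first file to read.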
