Comments - Problem with the Galera wsrep_sst_method xtrabackup-v2

8 years, 9 months ago Nirbhay Choubey

Do you also have "password" under the client section in your donor's my.cnf? If so, that could be the problem.

https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1505184

This issue was fixed in MariaDB 10.1.12.

 
8 years, 9 months ago Andy Ferretti

I upgraded the donor node to 10.1.12 and tested xtrabackup 2.4.1 with a full backup to disk. I did a clean install of 10.1.12 on the second node and tried to start it. I'm still getting errors. I'll paste the logs below. I then decided to try switching from wsrep_sst_method=xtrabackup-v2 to wsrep_sst_method=rsync. This worked OK and synched the second node up with the donor node. For now, I'm planning to add the third node and do some benchmarking. I plan to come back to testing with wsrep_sst_method=xtrabackup-v2. Below are the mysqld logs from the failed attempt to sync with xtrabackup:

Donor node:
2016-03-05 15:59:31 140018316211968 [Note] WSREP: (15d87d90, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-03-05 15:59:31 140018316211968 [Note] WSREP: declaring 3f4f9a4f at tcp://130.9.221.48:4567 stable
2016-03-05 15:59:31 140018316211968 [Note] WSREP: Node 15d87d90 state prim
2016-03-05 15:59:31 140018316211968 [Note] WSREP: view(view_id(PRIM,15d87d90,2) memb {
2016-03-05 15:59:31 140018316211968 [Note] WSREP: save pc into disk
2016-03-05 15:59:31 140018305722112 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2016-03-05 15:59:31 140018305722112 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270
2016-03-05 15:59:31 140018305722112 [Note] WSREP: STATE EXCHANGE: sent state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270
2016-03-05 15:59:31 140018305722112 [Note] WSREP: STATE EXCHANGE: got state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270 from 0 (zlx70605.vci.att.com)
2016-03-05 15:59:32 140018305722112 [Note] WSREP: STATE EXCHANGE: got state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270 from 1 (zlx70604.vci.att.com)
2016-03-05 15:59:32 140018305722112 [Note] WSREP: Quorum results:
2016-03-05 15:59:32 140018305722112 [Note] WSREP: Flow-control interval: [23, 23]
2016-03-05 15:59:32 140018278459136 [Note] WSREP: New cluster view: global state: db762c6b-e246-11e5-90df-4adee5e6b24e:267, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
2016-03-05 15:59:32 140018278459136 [ERROR] WSREP: Process completed with error: /opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/support-files/wsrep_notify --status Synced --uuid db762c6b-e246-11e5-90df-4adee5e6b24e --primary yes --index 0 --members 15d87d90-e2ea-11e5-8755-6f8d951d7f3f/zlx70605.vci.att.com/130.9.221.46:3306,3f4f9a4f-e2eb-11e5-8059-523034986974/zlx70604.vci.att.com/130.9.221.48:3306: 1 (Operation not permitted)
2016-03-05 15:59:32 140018278459136 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/support-files/wsrep_notify --status Synced --uuid db762c6b-e246-11e5-90df-4adee5e6b24e --primary yes --index 0 --members 15d87d90-e2ea-11e5-8755-6f8d951d7f3f/zlx70605.vci.att.com/130.9.221.46:3306,3f4f9a4f-e2eb-11e5-8059-523034986974/zlx70604.vci.att.com/130.9.221.48:3306"
2016-03-05 15:59:32 140018278459136 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-03-05 15:59:32 140018371782400 [Note] WSREP: Service thread queue flushed.
2016-03-05 15:59:32 140018278459136 [Note] WSREP: Assign initial position for certification: 267, protocol version: 3
2016-03-05 15:59:32 140018371782400 [Note] WSREP: Service thread queue flushed.
2016-03-05 15:59:34 140018316211968 [Note] WSREP: (15d87d90, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-05 15:59:35 140018316211968 [Note] WSREP: forgetting 3f4f9a4f (tcp://130.9.221.48:4567)
2016-03-05 15:59:35 140018316211968 [Note] WSREP: Node 15d87d90 state prim
2016-03-05 15:59:35 140018316211968 [Note] WSREP: view(view_id(PRIM,15d87d90,3) memb {
2016-03-05 15:59:35 140018316211968 [Note] WSREP: save pc into disk
2016-03-05 15:59:35 140018316211968 [Note] WSREP: forgetting 3f4f9a4f (tcp://130.9.221.48:4567)
2016-03-05 15:59:35 140018305722112 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-05 15:59:35 140018305722112 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 41bd523f-e2eb-11e5-873d-a304ee11139f
2016-03-05 15:59:35 140018305722112 [Note] WSREP: STATE EXCHANGE: sent state msg: 41bd523f-e2eb-11e5-873d-a304ee11139f
2016-03-05 15:59:35 140018305722112 [Note] WSREP: STATE EXCHANGE: got state msg: 41bd523f-e2eb-11e5-873d-a304ee11139f from 0 (zlx70605.vci.att.com)
2016-03-05 15:59:35 140018305722112 [Note] WSREP: Quorum results:
2016-03-05 15:59:35 140018305722112 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-05 15:59:35 140018278459136 [Note] WSREP: New cluster view: global state: db762c6b-e246-11e5-90df-4adee5e6b24e:267, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 3
2016-03-05 15:59:35 140018278459136 [ERROR] WSREP: Process completed with error: /opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/support-files/wsrep_notify --status Synced --uuid db762c6b-e246-11e5-90df-4adee5e6b24e --primary yes --index 0 --members 15d87d90-e2ea-11e5-8755-6f8d951d7f3f/zlx70605.vci.att.com/130.9.221.46:3306: 1 (Operation not permitted)
2016-03-05 15:59:35 140018278459136 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/support-files/wsrep_notify --status Synced --uuid db762c6b-e246-11e5-90df-4adee5e6b24e --primary yes --index 0 --members 15d87d90-e2ea-11e5-8755-6f8d951d7f3f/zlx70605.vci.att.com/130.9.221.46:3306"
2016-03-05 15:59:35 140018278459136 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-03-05 15:59:35 140018371782400 [Note] WSREP: Service thread queue flushed.
2016-03-05 15:59:35 140018278459136 [Note] WSREP: Assign initial position for certification: 267, protocol version: 3
2016-03-05 15:59:35 140018371782400 [Note] WSREP: Service thread queue flushed.
2016-03-05 15:59:40 140018316211968 [Note] WSREP:  cleaning up 3f4f9a4f (tcp://130.9.221.48:4567)
Second Node:
160305 15:59:25 mysqld_safe Starting mysqld daemon with databases from //opt/app/mysql/data
160305 15:59:25 mysqld_safe WSREP: Running position recovery with --log_error='//opt/app/mysql/data/wsrep_recovery.Zvw0Zn' --pid-file='//opt/app/mysql/data/zlx70604-recover.pid'
2016-03-05 15:59:25 140289610856288 [Note] Using unique option prefix 'ignore_db_dir' is error-prone and can break in the future. Please use the full name 'ignore_db_dirs' instead.
2016-03-05 15:59:25 140289610856288 [Note] //opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/bin/mysqld (mysqld 10.1.12-MariaDB) starting as process 20792 ...
160305 15:59:30 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
2016-03-05 15:59:30 139663019247456 [Note] Using unique option prefix 'ignore_db_dir' is error-prone and can break in the future. Please use the full name 'ignore_db_dirs' instead.
2016-03-05 15:59:30 139663019247456 [Note] //opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/bin/mysqld (mysqld 10.1.12-MariaDB) starting as process 20841 ...
2016-03-05 15:59:30 139663019247456 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-03-05 15:59:30 139663019247456 [Note] WSREP: wsrep_load(): loading provider library '/opt/app/mysql/product/mariadb-10.1.12-linux-x86_64/lib/galera/libgalera_smm.so'
2016-03-05 15:59:30 139663019247456 [Note] WSREP: wsrep_load(): Galera 25.3.14(r3560) by Codership Oy <[email protected]> loaded successfully.
2016-03-05 15:59:30 139663019247456 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-03-05 15:59:30 139663019247456 [Warning] WSREP: Could not open state file for reading: '//opt/app/mysql/data//grastate.dat'
2016-03-05 15:59:30 139663019247456 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Passing config to GCS: base_dir = //opt/app/mysql/data/; base_host = 130.9.221.48; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = //opt/app/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = //opt/app/mysql/data//galera.cache; gcache.page_size = 1G; gcache.size = 2G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.igno
2016-03-05 15:59:31 139660818175744 [Note] WSREP: Service thread queue flushed.
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-03-05 15:59:31 139663019247456 [Note] WSREP: wsrep_sst_grab()
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Start replication
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-03-05 15:59:31 139663019247456 [Note] WSREP: protonet asio version 0
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Using CRC-32C for message checksums.
2016-03-05 15:59:31 139663019247456 [Note] WSREP: backend: asio
2016-03-05 15:59:31 139663019247456 [Warning] WSREP: access file(//opt/app/mysql/data//gvwstate.dat) failed(No such file or directory)
2016-03-05 15:59:31 139663019247456 [Note] WSREP: restore pc from disk failed
2016-03-05 15:59:31 139663019247456 [Note] WSREP: GMCast version 0
2016-03-05 15:59:31 139663019247456 [Note] WSREP: (3f4f9a4f, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-03-05 15:59:31 139663019247456 [Note] WSREP: (3f4f9a4f, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-03-05 15:59:31 139663019247456 [Note] WSREP: EVS version 0
2016-03-05 15:59:31 139663019247456 [Note] WSREP: gcomm: connecting to group 'DBAAS_ECLOUD_CLUST01', peer '130.9.221.46:,130.9.221.48:,130.9.220.190:'
2016-03-05 15:59:31 139663019247456 [Warning] WSREP: (3f4f9a4f, 'tcp://0.0.0.0:4567') address 'tcp://130.9.221.48:4567' points to own listening address, blacklisting
2016-03-05 15:59:31 139663019247456 [Note] WSREP: (3f4f9a4f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-03-05 15:59:31 139663019247456 [Note] WSREP: declaring 15d87d90 at tcp://130.9.221.46:4567 stable
2016-03-05 15:59:31 139663019247456 [Note] WSREP: Node 15d87d90 state prim
2016-03-05 15:59:31 139663019247456 [Note] WSREP: view(view_id(PRIM,15d87d90,2) memb {
2016-03-05 15:59:31 139663019247456 [Note] WSREP: save pc into disk
2016-03-05 15:59:31 139663019247456 [Note] WSREP: discarding pending addr without UUID: tcp://130.9.220.190:4567
2016-03-05 15:59:32 139663019247456 [Note] WSREP: gcomm: connected
2016-03-05 15:59:32 139663019247456 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-03-05 15:59:32 139663019247456 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-03-05 15:59:32 139663019247456 [Note] WSREP: Opened channel 'DBAAS_ECLOUD_CLUST01'
2016-03-05 15:59:32 139660745500416 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2016-03-05 15:59:32 139660745500416 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-03-05 15:59:32 139660745500416 [Note] WSREP: STATE EXCHANGE: sent state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270
2016-03-05 15:59:32 139660745500416 [Note] WSREP: STATE EXCHANGE: got state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270 from 0 (zlx70605.vci.att.com)
2016-03-05 15:59:32 139663019247456 [Note] WSREP: Waiting for SST to complete.
2016-03-05 15:59:32 139660745500416 [Note] WSREP: STATE EXCHANGE: got state msg: 3fa17dbd-e2eb-11e5-a1c4-23cb5b5ae270 from 1 (zlx70604.vci.att.com)
2016-03-05 15:59:32 139660745500416 [Note] WSREP: Quorum results:
2016-03-05 15:59:32 139660745500416 [Note] WSREP: Flow-control interval: [23, 23]
2016-03-05 15:59:32 139660745500416 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 267)
2016-03-05 15:59:32 139660721907456 [Note] WSREP: State transfer required:
2016-03-05 15:59:32 139660721907456 [Note] WSREP: New cluster view: global state: db762c6b-e246-11e5-90df-4adee5e6b24e:267, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2016-03-05 15:59:32 139660721907456 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-03-05 15:59:32 139660709852928 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '130.9.221.48' --datadir '//opt/app/mysql/data/'   --parent '20841' --binlog '//opt/app/mysql/binlogs/mysql-bin' '
2016-03-05 15:59:32 139660709852928 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '130.9.221.48' --datadir '//opt/app/mysql/data/'   --parent '20841' --binlog '//opt/app/mysql/binlogs/mysql-bin'
2016-03-05 15:59:32 139660709852928 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '130.9.221.48' --datadir '//opt/app/mysql/data/'   --parent '20841' --binlog '//opt/app/mysql/binlogs/mysql-bin' : 1 (Operation not permitted)
2016-03-05 15:59:32 139660721907456 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2016-03-05 15:59:32 139660721907456 [ERROR] Aborting
 
8 years, 8 months ago Ian Gilfillan

If you still didn't find the solution I suggest reporting as a bug - they can see if it's repeatable at least, and if not perhaps find help find out why it's failing in your case.

 
8 years, 8 months ago Andy Ferretti

Just want to let you know that we did solve the problem. I will be repeating the tests that I performed with rsync using xtrabackup now. To fix the xtrabackup problem, we had two issues:

  1. . socat install problem. We went back a level to 1.7.2.4 and it installed OK.
  2. . We have MariaDB and xtrabackup installed in non-default locations. We had to setup environmental variables in wsrep_sst_xtrabackup-v2. We added a couple lines
    1. $(dirname $0)/wsrep_sst_common << after this line we added 2 lines below:
## Execute environmental variables for MySQL
. ${HOME}/.local 

Here are the contents of our .local file:

$ cat .local
export LD_LIBRARY_PATH=${HOME}/local/lib/
export LD_LIBRARY_PATH=:${HOME}/local/lib/
export LD_LIB_PATH=${HOME}/local/lib/
export LD_LIB_PATH=:${HOME}/local/lib/
export MB=//opt/app/mysql/product/percona-xtrabackup-2.4.1-Linux-x86_64
export MD=//opt/app/mysql/data
export MH=//opt/app/mysql/product/mariadb-10.1.12-linux-x86_64
export ML=//opt/app/mysql/log
export MYSQL_BACKUP_HOME=//opt/app/mysql/product/percona-xtrabackup-2.4.1-Linux-x86_64
export MYSQL_HOME=//opt/app/mysql/product/mariadb-10.1.12-linux-x86_64
export MYSQL_LOCAL_HOME=//opt/app/mysql/local
export PATH=${HOME}/local/bin:$PATH
export PATH=${MYSQL_BACKUP_HOME}/bin:${PATH}
export PATH=${MYSQL_HOME}/bin:${PATH}
export PATH=${MYSQL_LOCAL_HOME}/bin:${PATH}
export PYTHONPATH=${HOME}/local/lib
 
8 years, 8 months ago Andy Ferretti

OK, as information, I've been doing performance testing with the Galera cluster that I created using rsync instead of xtrabackup. I do want to get back to the xtrabackup. I noticed that the socat utility that is required for xtrabackup was not correctly installed. I'm not sure if this might be my problem. In order to test xtrabackup, I have to tear down all nodes except primary and rebuild with xtrabackup. Since I'm in the middle of my performance testing, I'll have to wait before I can get back to the xtrabackup testing. When I get back to the xtrabackup testing and if I cannot fix the xtrabackup problem by fixing the socat install, then I'll submit a bug as you suggest.

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.