Hybrid replication - issue

Hi, I have been trying to implement a Hybrid replication. There is Async replication configured between two galera clusters. To begin with I have been trying this out on two datacenters dc1 and dc2.

dc1 has two node galera cluster -- node1_dc1 , node2_dc1

dc2 has one node - just one primary galera node . node1_dc2

I have configured async replication between node1_dc1 and node1_dc2. But I am facing strange problem here. I could see replication happening nicely on all the three nodes involved. But When i tried to restart node2_dc1 ,(the secondary node from dc1's galera cluster) - i am not able to login to this node's mysql with root user. Or any user for that matter. Upon investigation using skip grant, found that the mysql.user table is emptied. (You can refer to the attached logs of node2_dc1 where I have highlighted removal messages in log). My doubts/concerns - 1) I am not sure what could have triggered this mysql table's removal in first place. 2) Inspite of this - I could get this node join the cluster. (Atleast as per the wsrep_cluster_size attribute on primary node)

sharing my configurations on all the three nodes - 1) node1_dc1 (Primary Node) galera_cnf : [mysqld]

  1. mysql settings binlog_format=ROW default-storage-engine=innodb innodb_autoinc_lock_mode=2 query_cache_size=0 query_cache_type=0 bind-address=0.0.0.0 log_slave_updates=1
  1. galera settings wsrep_provider=/usr/lib/galera/libgalera_smm.so wsrep_cluster_name=london_galera_cluster wsrep_cluster_address=gcomm: wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="root:XXX" wsrep_restart_slave=1
  2. Galera Node Configuration wsrep_node_address="192.168.XXX.YYY" wsrep_node_name="192.168.XXX.YYY"
    • wan.cnf:

[mysqld] server-id=101 gtid-domain-id=1 binlog-format=ROW log-slave-updates=1 log-bin=binlog wsrep-restart-slave=1 slave-skip-errors=1396

2) node2_dc1 (Secondary node): galera.cnf:

[mysqld]

  1. mysql settings binlog_format=ROW default-storage-engine=innodb innodb_autoinc_lock_mode=2 query_cache_size=0 query_cache_type=0 bind-address=0.0.0.0 log_slave_updates=1
  1. galera settings wsrep_provider=/usr/lib/galera/libgalera_smm.so wsrep_cluster_name=london_galera_cluster wsrep_cluster_address=gcomm:192.168.XXX.YYY,192.168.YYY.ZZ wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="root:XXX" wsrep_restart_slave=1
  2. Galera Node Configuration wsrep_node_address="192.168.YYY.ZZ" wsrep_node_name="192.168.YYY.ZZ"

wan.cnf: [mysqld] server-id=102 gtid-domain-id=1 binlog-format=ROW log-slave-updates=1 log-bin=binlog wsrep-restart-slave=1 slave-skip-errors=1396

Node1_dc2 - Single node on datacenter-2 : galera.cnf:

[mysqld]

  1. mysql settings binlog_format=ROW default-storage-engine=innodb innodb_autoinc_lock_mode=2 query_cache_size=0 query_cache_type=0 bind-address=0.0.0.0 log_slave_updates=1
  1. galera settings wsrep_provider=/usr/lib/galera/libgalera_smm.so wsrep_cluster_name=calif_galera_cluster wsrep_cluster_address=gcomm: wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="root:XXX wsrep_restart_slave=1
  2. Galera Node Configuration wsrep_node_address="192.168.ZZZ.XXX" wsrep_node_name="192.168.ZZZ.XXX"

wan_repl.cnf: [mysqld] server-id=201 gtid-domain-id=1 binlog-format=ROW log-slave-updates=1 log-bin=binlog wsrep-restart-slave=1 slave-skip-errors=1396 ~

Note unique server-ids and common gtid-domain-id.

Attaching logs of both the nodes in galera cluster in datacenter 1.

lines to look for in node2.txt - file *** Sep 10 21:06:21 localhost mysqld: 150910 21:06:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0) Sep 10 21:06:21 localhost mysqld: 150910 21:06:21 [Note] WSREP: Requesting state transfer: success, donor: 0 Sep 10 21:06:22 localhost mysqld: WSREP_SST: [INFO] Proceeding with SST (20150910 21:06:22.394) Sep 10 21:06:22 localhost mysqld: WSREP_SST: [INFO] Cleaning the existing datadir (20150910 21:06:22.400) Sep 10 21:06:22 localhost mysqld: removed ?@X/var/lib/mysql/mysql_upgrade_info?@Y Sep 10 21:06:22 localhost mysqld: removed ?@X/var/lib/mysql/multi-master.info?@Y Sep 10 21:06:22 localhost mysqld: removed ?@X/var/lib/mysql/mysql/procs_priv.MYI?@Y Sep 10 21:06:22 localhost mysqld: removed ?@X/var/lib/mysql/mysql/user.MYI?@Y *** Sep 10 21:25:51 localhost /etc/mysql/debian-start[16787]: Triggering myisam-recover for all MyISAM tables Sep 10 21:25:53 localhost mysqld: 150910 21:25:53 [Note] WSREP: (81dce617, 'tcp:0.0.0.0:4567') turning message relay requesting off Sep 10 21:32:01 localhost mysqld: 150910 21:32:01 [ERROR] Slave SQL: Error 'Can't find any matching row in the user table' on query. Default database: 'mysql'. Query: 'SET PASSWORD FOR 'root'@'localhost'='*FD9D25929BE2AE3A18896D5D93AB9A7DD78D22E4'', Internal MariaDB error code: 1133 Sep 10 21:32:01 localhost mysqld: 150910 21:32:01 [Warning] WSREP: RBR event 1 Query apply warning: 1, 19504 ****

Answer Answered by Daniel Black in this comment.

Its not hybrid, it just sync and async are used.

a two node cluster is susceptible to split brain - don't do it.

a single node isn't a cluster so don't pretend it is.

The direction of your async replication isn't clear. If it goes in both you can't write to both at the same time.

You could just make a 3 node cluster across the WAN and you may not have any troubles at all. Don't make assumptions that it won't work.

How you initialised the slave isn't clear. Guessing a restore from mysqldump into a dc1 node that wasn't the replication master. However that occurred, taking a dump of nodeX_dc1 and loading into dc2 will probably clear your problem.

This more of an architectural mess/fail question that is better asked on a mailing list rather an asking a specific question on an existing article or functionality that would benefit users of this wiki.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.