Simple two-node cluster.
Hello community, I'm here to ask for your help. I have a simple use case with two nodes, on which I want to simulate a kind of failover using the script shown below.
----------------------------------------------------------------------------------------------------------------------------------
#!/bin/bash
# Only acts when called with the argument "start".
if [ "$1" = "start" ]; then
    # PID of a local mysqld that was started with a wsrep_cluster_address option
    PidMaria=$(ps -ef | grep wsrep_cluster | grep -v grep | awk '{print $2}')
    # State that nmap reports for port 3306 on node 2
    Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
    # Fragment cut out of the mysql process command line (node 2, then local);
    # empty if no mysql process is found
    TipoPid=$(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep | cut -d '=' -f 2 | cut -d '/' -f 3 | cut -d '-' -f 1 | cut -d '.' -f 1")
    TipoPidLocal=$(ps -ef | grep mysql | grep -v grep | cut -d "=" -f 2 | cut -d "/" -f 3 | cut -d "-" -f 1 | cut -d "." -f 1)

    # The -z tests are left unquoted, as in the original, so a value made
    # only of spaces still counts as empty.
    if [ -z $PidMaria ]; then
        # No local Galera node is running.
        #sleep 20
        Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
        if [ -z $Server2 ]; then
            if [ -z $TipoPid ]; then
                if [ -z $(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep" | awk '{print $2}') ]; then
                    # Nothing is running on node 2 either: start a new cluster.
                    /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                fi
            else
                # Join the cluster through node 2.
                /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
            fi
        else
            if [ -z $TipoPid ]; then
                # Join the cluster through node 2.
                /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
            fi
        fi
    else
        # A local Galera node is already running.
        if [ -z $Server2 ]; then
            if [ -z $TipoPidLocal ]; then
                echo "hola"
            else
                # Node 2 does not answer: kill the local node and restart it as a new cluster.
                kill -9 $PidMaria
                /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
            fi
        else
            # This branch can never run, because PidMaria is non-empty here.
            if [ -z $PidMaria ]; then
                /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
            fi
        fi
    fi
fi
--------------------------------------------------------------------------------------------------------------------------------
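To make the intent easier to follow, here is a simplified sketch of the decision I am trying to implement with the script above. It is not a line-for-line equivalent: the function name `decide_action` and its yes/no arguments are hypothetical, and it only prints which action would be taken instead of starting or killing anything.

```shell
# decide_action LOCAL_UP REMOTE_UP
#   LOCAL_UP  - "yes" if a local Galera mysqld is running
#   REMOTE_UP - "yes" if the other node answers on its MySQL port
# Prints one of: none, promote, join, bootstrap.
decide_action() {
    local_up=$1
    remote_up=$2
    if [ "$local_up" = yes ] && [ "$remote_up" = yes ]; then
        echo none        # both nodes are up: nothing to do
    elif [ "$local_up" = yes ]; then
        echo promote     # peer is gone: kill the local node and restart it with gcomm://
    elif [ "$remote_up" = yes ]; then
        echo join        # local node is down, peer is up: start with gcomm://<peer>
    else
        echo bootstrap   # nothing is running anywhere: start a new cluster with gcomm://
    fi
}
```

For example, `decide_action no yes` prints `join`, which is the case that fails with the error shown further down.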
What the script does is simple. It runs from cron every minute. If the master goes down, it kills the local (slave) process and restarts it as master. On node 1 it would do the same, except that when node 1 has been down and detects that the other server is running as master, it should start itself as a slave. Here is the error:
---------------------------------------------------------------------------------------------------------------------------
+ /usr/sbin/mysqld --wsrep_cluster_address=gcomm://IPMASTER --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
131211 20:04:35 [Note] WSREP: Read nil XID from storage engines, skipping position init
131211 20:04:35 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
131211 20:04:35 [Note] WSREP: wsrep_load(): Galera 23.2.7(r157) by Codership Oy <[email protected]> loaded succesfully.
131211 20:04:35 [Note] WSREP: Found saved state: 1503cc31-6281-11e3-abfc-5bf96ca010d8:-1
131211 20:04:35 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
131211 20:04:35 [Note] WSREP: Passing config to GCS: base_host = IPLOCALHOST; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131211 20:04:35 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
131211 20:04:35 [Note] WSREP: wsrep_sst_grab()
131211 20:04:35 [Note] WSREP: Start replication
131211 20:04:35 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
131211 20:04:35 [Note] WSREP: protonet asio version 0
131211 20:04:35 [Note] WSREP: backend: asio
131211 20:04:35 [Note] WSREP: GMCast version 0
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
131211 20:04:35 [Note] WSREP: EVS version 0
131211 20:04:35 [Note] WSREP: PC version 0
131211 20:04:35 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '162.243.62.104:'
131211 20:04:35 [Note] WSREP: declaring 15031b4b-6281-11e3-9ad7-8a59b1f0cba0 stable
131211 20:04:35 [Note] WSREP: view(view_id(NON_PRIM,15031b4b-6281-11e3-9ad7-8a59b1f0cba0,8) memb {
    15031b4b-6281-11e3-9ad7-8a59b1f0cba0,
    74e621f3-629f-11e3-a86f-0adb496b3ff6,
} joined {
} left {
} partitioned {
    18299990-6281-11e3-a268-975810126780,
    a6d47aea-6281-11e3-b007-16991d53e685,
    d52af178-6281-11e3-871d-03c1e15f1ac4,
})
131211 20:05:05 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():139
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1289: Failed to open channel 'my_wsrep_cluster' at 'gcomm://IPMASTER': -110 (Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs connect failed: Connection timed out
131211 20:05:05 [ERROR] WSREP: wsrep::connect() failed: 6
131211 20:05:05 [ERROR] Aborting
131211 20:05:05 [Note] WSREP: Service disconnected.
131211 20:05:06 [Note] WSREP: Some threads may fail to exit.
131211 20:05:06 [Note] /usr/sbin/mysqld: Shutdown complete
------------------------------------------------------------------------------------------------
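The line that stands out to me in the log above is the view, which is reported as NON_PRIM just before the "failed to reach primary view" timeout. A minimal sketch for pulling the view type out of a mysqld log, assuming view lines keep the `view(view_id(TYPE,...` shape seen here; the function name `view_states` is hypothetical:

```shell
# Reads a mysqld/Galera log on stdin and prints the component type
# (for example PRIM or NON_PRIM) of every "view(view_id(" line it finds.
view_states() {
    sed -n 's/.*view(view_id(\([A-Z_]*\),.*/\1/p'
}
```

Something like `view_states < /path/to/mysqld.log | tail -n1` (the log path is a placeholder) would then show the type of the last view the node saw.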
On the other hand, when the master is started by the same script it comes up in a bad state, which I suppose is why the slave cannot connect. The process is running, but when I log in to the database I can list the databases but cannot modify or write anything. It gives me this:
[root@xxx ]# mysql -uxxx -pxxxx
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 24
Server version: 5.5.33a-MariaDB

Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database TESTEOKILL2;
ERROR 1047 (08S01): Unknown command
MariaDB [(none)]>
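As far as I understand, a Galera node refuses writes while it is outside the primary component, which may be what ERROR 1047 reflects here. A minimal sketch of how that could be checked from the shell, assuming the standard Galera status variables `wsrep_cluster_status` and `wsrep_ready`. The function name `node_is_writable` is hypothetical, and it only parses tab-separated "name value" lines on stdin:

```shell
# Succeeds (exit status 0) only if the status lines on stdin report
# wsrep_cluster_status = Primary and wsrep_ready = ON.
# Expected input: one "name<TAB>value" pair per line, as printed by
#   mysql -N -B -e "SHOW STATUS LIKE 'wsrep_%'"
node_is_writable() {
    awk -F'\t' '
        $1 == "wsrep_cluster_status" { status = $2 }
        $1 == "wsrep_ready"          { ready = $2 }
        END { exit !(status == "Primary" && ready == "ON") }
    '
}
```

It could be used as `mysql -uxxx -pxxxx -N -B -e "SHOW STATUS LIKE 'wsrep_%'" | node_is_writable && echo writable`, with the same placeholder credentials as in the session above.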
Any idea what might be going on? Thanks!