Simple two-node cluster.

Hello community, I'm here to ask for your help. I have a simple use case with two nodes, on which I want to simulate a kind of failover using the script shown below.

----------------------------------------------------------------------------------------------------------------------------------

#!/bin/bash

if [ "$1" = "start" ]
then

# PID of the local mysqld that was started with a wsrep_cluster_address option
PidMaria=$(ps -ef | grep wsrep_cluster | grep -v grep | awk '{print $2}')
# Second field of the last nmap output line mentioning port 3306 on the other node
Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
# Value cut out of the mysqld command line, on the other node and locally;
# the script only tests whether it is empty
TipoPid=$(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep | cut -d '=' -f 2 | cut -d '/' -f 3 | cut -d '-' -f 1 | cut -d '.' -f 1")
TipoPidLocal=$(ps -ef | grep mysql | grep -v grep | cut -d "=" -f 2 | cut -d "/" -f 3 | cut -d "-" -f 1 | cut -d "." -f 1)

if [ -z $PidMaria ]
then
    #sleep 20
    Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
    if [ -z $Server2 ]
    then
        if [ -z $TipoPid ]
        then
            if [ -z $(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep" | awk '{print $2}') ]
            then
                /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
            fi
        else
            /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
        fi
    else
        if [ -z $TipoPid ]
        then
            /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
        fi
    fi
else
    if [ -z $Server2 ]
    then
        if [ -z $TipoPidLocal ]
        then
            echo "hola"
        else
            kill -9 $PidMaria
            /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
        fi
    else
        if [ -z $PidMaria ]
        then
            /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
        fi
    fi
fi

fi

----------------------------------------------------------------------------------------------------------------------------------
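As a compact restatement of the branching in the script, here is a rough sketch that reduces each check to a yes/no flag and prints the action that would be taken on one cron run. The function name `decide_action`, the flag names, and the action labels (`bootstrap`, `join`, `promote`, `none`) are made up for illustration and do not appear in the real script; `promote` stands for "kill the local process and restart it with an empty gcomm address".

```shell
# Hypothetical helper: maps the script's five checks (each "yes" or "no")
# to the action the script would take. Arguments, in order:
#   1 local_running    PidMaria is non-empty
#   2 peer_port_open   Server2 is non-empty
#   3 peer_is_joiner   TipoPid is non-empty
#   4 peer_has_mysqld  the extra ssh/ps check finds a mysql process
#   5 local_is_joiner  TipoPidLocal is non-empty
decide_action() {
    local_running=$1
    peer_port_open=$2
    peer_is_joiner=$3
    peer_has_mysqld=$4
    local_is_joiner=$5

    if [ "$local_running" = "no" ]; then
        if [ "$peer_port_open" = "no" ]; then
            if [ "$peer_is_joiner" = "yes" ]; then
                echo "join"        # start with gcomm://ipnodo2
            elif [ "$peer_has_mysqld" = "no" ]; then
                echo "bootstrap"   # start with an empty gcomm:// address
            else
                echo "none"
            fi
        elif [ "$peer_is_joiner" = "no" ]; then
            echo "join"
        else
            echo "none"
        fi
    elif [ "$peer_port_open" = "no" ] && [ "$local_is_joiner" = "yes" ]; then
        echo "promote"             # kill -9, then start with gcomm://
    else
        echo "none"
    fi
}

decide_action no no no no no      # prints "bootstrap"
decide_action yes no no no yes    # prints "promote"
```

Written this way, it is easier to see that nothing happens when the local process is running and port 3306 on the other node is reachable.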

What this script does is simple. It runs from cron every minute. If the master goes down, it kills the local (slave) process and brings it back up as master. On node1 it would do the same, except that when node1 has been down and detects that the other server is running as master, it should start itself as a slave. This is the error I get:

---------------------------------------------------------------------------------------------------------------------------
+ /usr/sbin/mysqld --wsrep_cluster_address=gcomm://IPMASTER --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
131211 20:04:35 [Note] WSREP: Read nil XID from storage engines, skipping position init
131211 20:04:35 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
131211 20:04:35 [Note] WSREP: wsrep_load(): Galera 23.2.7(r157) by Codership Oy <info@codership.com> loaded succesfully.
131211 20:04:35 [Note] WSREP: Found saved state: 1503cc31-6281-11e3-abfc-5bf96ca010d8:-1
131211 20:04:35 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
131211 20:04:35 [Note] WSREP: Passing config to GCS: base_host = IPLOCALHOST; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131211 20:04:35 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
131211 20:04:35 [Note] WSREP: wsrep_sst_grab()
131211 20:04:35 [Note] WSREP: Start replication
131211 20:04:35 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
131211 20:04:35 [Note] WSREP: protonet asio version 0
131211 20:04:35 [Note] WSREP: backend: asio
131211 20:04:35 [Note] WSREP: GMCast version 0
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
131211 20:04:35 [Note] WSREP: EVS version 0
131211 20:04:35 [Note] WSREP: PC version 0
131211 20:04:35 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '162.243.62.104:'
131211 20:04:35 [Note] WSREP: declaring 15031b4b-6281-11e3-9ad7-8a59b1f0cba0 stable
131211 20:04:35 [Note] WSREP: view(view_id(NON_PRIM,15031b4b-6281-11e3-9ad7-8a59b1f0cba0,8) memb {
        15031b4b-6281-11e3-9ad7-8a59b1f0cba0,
        74e621f3-629f-11e3-a86f-0adb496b3ff6,
} joined {
} left {
} partitioned {
        18299990-6281-11e3-a268-975810126780,
        a6d47aea-6281-11e3-b007-16991d53e685,
        d52af178-6281-11e3-871d-03c1e15f1ac4,
})
131211 20:05:05 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():139
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1289: Failed to open channel 'my_wsrep_cluster' at 'gcomm://IPMASTER': -110 (Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs connect failed: Connection timed out
131211 20:05:05 [ERROR] WSREP: wsrep::connect() failed: 6
131211 20:05:05 [ERROR] Aborting

131211 20:05:05 [Note] WSREP: Service disconnected.
131211 20:05:06 [Note] WSREP: Some threads may fail to exit.
131211 20:05:06 [Note] /usr/sbin/mysqld: Shutdown complete
---------------------------------------------------------------------------------------------------------------------------

On the other hand, the master itself comes up in a bad state when started with the same script, which I suppose is why the slave cannot connect. Although the process is up, when I log into the database I can list databases but cannot modify or write anything. This is what I get:

[root@xxx ]# mysql -uxxx -pxxxx
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 24
Server version: 5.5.33a-MariaDB

Copyright © 2000, 2013, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database TESTEOKILL2;
ERROR 1047 (08S01): Unknown command
MariaDB [(none)]>
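The NON_PRIM view in the log above and the ERROR 1047 here may both be related to the node not being part of a primary component. Here is a minimal sketch of a check the cron script could run before treating a node as a usable master. It assumes the Galera status variable `wsrep_cluster_status` is available on this server; the helper name `is_primary_component` and the sample value are made up for illustration.

```shell
# Hypothetical helper: succeeds only when the status value is exactly "Primary".
is_primary_component() {
    [ "$1" = "Primary" ]
}

# Assumed way to obtain the value (needs a running server and real credentials,
# so it is left commented out here):
# status=$(mysql -uxxx -pxxxx -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}')
status="non-Primary"   # sample value for illustration only

if is_primary_component "$status"; then
    echo "node is in the primary component"
else
    echo "node is not in the primary component"
fi
```

A process that is merely running would not pass this check, which is a stricter test than looking at the PID or at port 3306.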

Any idea what might be happening? Thanks!
