MariaDB MaxScale 2.2: Introducing Failover, Switchover and Automatic Rejoin

Failure tolerance and recoverability are essential for a high availability (HA) database setup. Although modern systems are quite reliable, hardware errors or software bugs (not necessarily in the database itself) can bring a system down. MariaDB HA setups use master-slave replication to copy the data to multiple servers, which may be located in different datacenters. Should the master server fail, the application can be directed to use one of the slave servers. This operation requires either manual intervention from a DBA or a custom automated script. Depending on the time of day and available personnel, manual operation may be slow. Custom scripts may lack testing and flexibility. Clearly, recovery should be automatic, thoroughly tested and preferably included in existing database scalability software.

To answer this demand, MariaDB MaxScale 2.2.2 adds the following master-slave replication cluster management features:

  • Failover: replace a failed master with the most up-to-date slave
  • Switchover: swap the running master with a designated slave
  • Rejoin: rejoin a standalone server to the cluster as a slave

MariaDB MaxScale is an advanced database proxy for MariaDB database servers. It sits between client applications and the database servers, routing client queries and server responses. MaxScale also monitors the servers, so it will quickly notice any changes in server status or replication topology. This makes MaxScale a natural choice for controlling failover and similar features.

Failover for the master-slave cluster can, and often should, be set to activate automatically. Switchover must be activated manually through MaxAdmin, MaxCtrl or the REST interface. Rejoin can be set to automatic or activated manually. These features are implemented in the mariadbmon module, which replaces the old mysqlmon module (MaxScale is still backwards compatible with the old name). All three operations require GTID-based replication and are intended for simple single-master replication topologies. Additionally, failover and switchover expect the topology to be one layer deep. The cluster master may be replicating from an external master, in which case a promoted master server is instructed to replicate from the external master.
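For example, a manual switchover can be requested through either command-line client. The monitor and server names below are the ones created later in this post, and the MaxCtrl form is assumed to mirror the MaxAdmin one:

#: switchover via MaxAdmin
maxadmin call command mariadbmon switchover cluster-monitor prod_mariadb02 prod_mariadb01

#: the same operation via MaxCtrl
maxctrl call command mariadbmon switchover cluster-monitor prod_mariadb02 prod_mariadb01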

In this blog post, we present an example setup and experiment with the new features. The database setup for this example is:

  • One VM for MariaDB MaxScale 2.2.2
  • One VM for the master MariaDB Server
  • One VM for the slave MariaDB Server

[Image: MaxScale cluster diagram]

[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
box02              | 192.168.50.12   |  3306 |           0 | Master, Running
box03              | 192.168.50.13   |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

Here is the Vagrantfile used for the examples in this blog:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  #: adding ansible stuff
  config.ssh.insert_key = false

  #: maxscale box
  config.vm.define "box01" do |box01|
    box01.vm.hostname = "box01"
    box01.vm.box = "centos7.2_x86_64"
    box01.vm.network "private_network", ip: "192.168.50.11", virtualbox__intnet: "XY"
  end

  ######: MASTER / SLAVE SERVERS :######
  #: master, async && semisync replication
  config.vm.define "box02" do |box02|
    box02.vm.hostname = "box02"
    box02.vm.box = "centos7.2_x86_64"
    box02.vm.network "private_network", ip: "192.168.50.12", virtualbox__intnet: "XY"
  end

  #: slave01, async && semisync replication
  config.vm.define "box03" do |box03|
    box03.vm.hostname = "box03"
    box03.vm.box = "centos7.2_x86_64"
    box03.vm.network "private_network", ip: "192.168.50.13", virtualbox__intnet: "XY"
  end
end

Setting up MariaDB MaxScale 2.2.2

In these examples we are running CentOS 7.2. If you are running a Debian-based Linux distribution, check here for the best MaxScale package for your system. After downloading, install the packages and configure MaxScale as presented below.

#: packages you want to download
[root@box01 ~]# ls -lhS
total 15M
-rw-r--r--  1 root root 7.8M Jan 10 20:44 maxscale-client-2.2.2-1.centos.7.x86_64.rpm
-rw-r--r--  1 root root 7.0M Jan 10 20:44 maxscale-2.2.2-1.centos.7.x86_64.rpm

#: install them
[root@box01 ~]# rpm -ivh *.rpm
Preparing...                          ################################# [100%]
Updating / installing...
  1:maxscale-client-2.2.2-1          ################################# [ 50%]
  2:maxscale-2.2.2-1                 ################################# [100%]

#: checking the version
[root@box01 ~]# maxscale --version-full
MaxScale 2.2.2 - eda82881619388a3512d6cfcbcf9ad83ea930339
[...snip...]

#: basic configuration - /etc/maxscale.cnf
[maxscale]
threads=auto
log_info=true

[rwsplit-service]
type=service
router=readwritesplit
user=maxuser
passwd=D96E8B61A569EDD7C625D822203932FA

[CLI]
type=service
router=cli

[CLI Unix Listener]
type=listener
service=CLI
protocol=maxscaled
socket=default

[CLI Inet Listener]
type=listener
service=CLI
protocol=maxscaled
address=localhost
port=6603

Above, the password for the service user is encrypted. An encrypted password can be generated with the maxkeys and maxpasswd utilities. For more information, check maxkeys/maxpasswd.
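As a quick sketch, the key file is created once with maxkeys, after which maxpasswd encrypts a plaintext password for use in the configuration file (the plaintext password below is just a placeholder):

#: one-time: create the encryption key file (default location /var/lib/maxscale/.secrets)
[root@box01 ~]# maxkeys

#: encrypt a plaintext password; paste the printed hex string into maxscale.cnf
[root@box01 ~]# maxpasswd /var/lib/maxscale/ mysecretpassword

Once configuration is complete, start MaxScale: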

[root@box01 ~]# systemctl enable maxscale.service
Created symlink from /etc/systemd/system/multi-user.target.wants/maxscale.service to /usr/lib/systemd/system/maxscale.service.
[root@box01 ~]# systemctl is-enabled maxscale.service
enabled
[root@box01 ~]# systemctl start maxscale.service
[root@box01 ~]# systemctl status maxscale.service
● maxscale.service - MariaDB MaxScale Database Proxy
  Loaded: loaded (/usr/lib/systemd/system/maxscale.service; enabled; vendor preset: disabled)
  Active: active (running) since Fri 2018-01-12 00:24:21 GMT; 5s ago
 Process: 4035 ExecStart=/usr/bin/maxscale (code=exited, status=0/SUCCESS)
 Process: 4032 ExecStartPre=/usr/bin/install -d /var/run/maxscale -o maxscale -g maxscale (code=exited, status=0/SUCCESS)
Main PID: 4038 (maxscale)
  CGroup: /system.slice/maxscale.service
          └─4038 /usr/bin/maxscale

The following script demonstrates MaxScale’s runtime configuration management. These items could have been added to the configuration file instead. The commands create a cluster monitor within MaxScale and set it up for automatic cluster management. The individual parameters set here are presented in the next section.

#!/bin/bash

#: creating the monitor
maxadmin create monitor cluster-monitor mariadbmon

#: adding more features for the MariaDBMon monitor
maxadmin alter monitor cluster-monitor user=maxuser password=ACEEF153D52F8391E3218F9F2B259EAD monitor_interval=1000 replication_user=mariadb replication_password=ACEEF153D52F8391E3218F9F2B259EAD failcount=5 auto_failover=true auto_rejoin=true

#: restarting the monitor
maxadmin restart monitor cluster-monitor

#: creating the service listener
maxadmin create listener rwsplit-service rwsplit-listener 0.0.0.0 53310

#: creating and adding the servers
maxadmin create server prod_mariadb01 192.168.50.12 3306
maxadmin create server prod_mariadb02 192.168.50.13 3306
maxadmin add server prod_mariadb01 cluster-monitor rwsplit-service
maxadmin add server prod_mariadb02 cluster-monitor rwsplit-service

Before executing the script above, create the users maxuser and mariadb (or whichever usernames you choose) on the backend servers. Again, the encrypted passwords used in the script should be generated with maxpasswd.
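Below is a minimal sketch of that user creation. The host pattern, plaintext passwords and privilege list are assumptions for this example setup; check the MaxScale documentation for the exact privileges the service and monitor users need.

#: run on the current master (box02); replication applies the users to the slave
[root@box02 ~]# mysql -u root -p <<'SQL'
-- user for the rwsplit service and the monitor
CREATE USER 'maxuser'@'192.168.50.%' IDENTIFIED BY 'maxpwd';
GRANT SELECT ON mysql.user TO 'maxuser'@'192.168.50.%';
GRANT SELECT ON mysql.db TO 'maxuser'@'192.168.50.%';
GRANT SELECT ON mysql.tables_priv TO 'maxuser'@'192.168.50.%';
GRANT SHOW DATABASES ON *.* TO 'maxuser'@'192.168.50.%';
GRANT REPLICATION CLIENT, SUPER, RELOAD ON *.* TO 'maxuser'@'192.168.50.%';
-- user written into CHANGE MASTER TO during failover, switchover and rejoin
CREATE USER 'mariadb'@'192.168.50.%' IDENTIFIED BY 'replpwd';
GRANT REPLICATION SLAVE ON *.* TO 'mariadb'@'192.168.50.%';
SQL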

#: script execution output

[root@box01 ~]# ./mxs222_configs_rwsplit.sh
Created monitor 'cluster-monitor'
Listener 'rwsplit-listener' created
Created server 'prod_mariadb01'
Created server 'prod_mariadb02'
Added server 'prod_mariadb01' to 'cluster-monitor'
Added server 'prod_mariadb01' to 'rwsplit-service'
Added server 'prod_mariadb02' to 'cluster-monitor'
Added server 'prod_mariadb02' to 'rwsplit-service'

The monitor is now running. To check the status of MaxScale, execute the following:

#: listing servers after creating the configurations

[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Master, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

#: listing monitors

[root@box01 ~]# maxadmin list monitors
---------------------+---------------------
Monitor              | Status
---------------------+---------------------
cluster-monitor      | Running
---------------------+---------------------

Finally, you may check the listener and the open port:

[root@box01 ~]# maxadmin list listeners | grep rwsplit-listener
rwsplit-listener | rwsplit-service | MariaDBClient | 0.0.0.0 | 53310 | Running

[root@box01 ~]# netstat -l | grep 53310
tcp        0      0 0.0.0.0:53310           0.0.0.0:*               LISTEN
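At this point, client connections through the listener are routed by the rwsplit service. As a quick sanity check (the application user here is an assumption), a read sent through MaxScale should land on one of the backends:

#: connect through MaxScale and see which backend answers
mysql -h 192.168.50.11 -P 53310 -u appuser -p -e "SELECT @@hostname, @@read_only;"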

Monitor configuration parameters

The following parameters enable and control the cluster management features.

  • replication_user and replication_password: The username and password used by MariaDB Monitor when generating a CHANGE MASTER TO command.

  • auto_failover: Enables automatic failover. Failover can also be activated manually, regardless of this setting.

  • failover_timeout: Time limit (in seconds) for executing a failover, measured from the moment failover (automatic or manual) is activated. If time runs out, an event is logged, and automatic failover is disabled. Typically, the timeout is only reached if the selected new master server cannot consume its relay log quickly enough.

  • auto_rejoin: Enables automatic rejoin. When enabled, two types of servers are set to replicate from the current cluster master:

    • Standalone servers (no slave thread)
    • Any server replicating from (or attempting to replicate from) a server which is not the cluster master

  • failcount: The number of times (on consecutive monitoring passes) a server must fail to respond to the status query before it is declared down, after which automatic failover may be triggered if enabled.

  • verify_master_failure: Enables an additional criterion for triggering automatic failover. The monitor looks at the master binlog file positions of the slave servers; if they have advanced within the configured timeout, failover is not activated even though the monitor cannot connect to the master, since at least one slave is evidently still receiving events from it.

  • master_failure_timeout: The timeout for verify_master_failure.

  • switchover_timeout: Similar to failover_timeout, but for switchover.

An example configuration file section for a monitor with these settings is below.

[wb@maxscale maxscale.cnf.d]$ cat /var/lib/maxscale/maxscale.cnf.d/cluster-monitor.cnf

[cluster-monitor]
type=monitor
module=mariadbmon
servers=prod_mariadb01,prod_mariadb02
user=maxuser
password=ACEEF153D52F8391E3218F9F2B259EAD
replication_user=mariadb
replication_password=ACEEF153D52F8391E3218F9F2B259EAD
monitor_interval=1000 #: it should be >= 5000 for production
auto_failover=1
failover_timeout=5    #: it should be >= 10 for production
auto_rejoin=true
failcount=5
master_failure_timeout=2
verify_master_failure=true
switchover_timeout=90

Switchover

If the current master is showing any issues, you may want to promote a slave to take its place. The switchover command takes three arguments: the monitor name, the slave to be promoted and the current master.

#: switchover process
#: listing servers and current status
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Master, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

#: command to make the current slave a new master
[root@box01 ~]# maxadmin call command mariadbmon switchover cluster-monitor prod_mariadb02 prod_mariadb01

#: what the MaxScale log says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:00:28   info   : (2) Started CLI client session [8] for 'root' from localhost
2018-01-12 20:00:28   info   : (8) [cli] MaxAdmin: call command "mariadbmon" "switchover" "cluster-monitor" "prod_mariadb02" "prod_mariadb01"
2018-01-12 20:00:29   notice : (8) [mariadbmon] Stopped the monitor cluster-monitor for the duration of switchover.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Demoting server 'prod_mariadb01'.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Promoting server 'prod_mariadb02' to master.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Old master 'prod_mariadb01' starting replication from 'prod_mariadb02'.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Redirecting slaves to new master.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Switchover prod_mariadb01 -> prod_mariadb02 performed.
2018-01-12 20:00:29   info   : Stopped CLI client session [8]

The warning messages suggest activating gtid_strict_mode on the servers, as this enables some additional checks when a server is starting replication.

#: listing servers again
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Slave, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Master, Running
-------------------+-----------------+-------+-------------+-------------------- 

Switchover uses the server global setting read_only to freeze the master server when preparing to switch. Users with the SUPER privilege bypass read_only, which allows them to modify data during a switchover. This often breaks replication, as different servers end up with different events. To prevent this, make sure that any users who regularly issue write queries do not have SUPER.
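One quick way to audit this on each backend is to list the accounts holding SUPER:

#: accounts with SUPER can bypass read_only during a switchover
mysql -u root -p -e "SELECT user, host FROM mysql.user WHERE Super_priv = 'Y';"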

Failover

Failover is activated when the master crashes or becomes unavailable. MariaDB Monitor detects that the master is out of reach, waits a while in case the master quickly comes back (the wait time is configurable), and finally begins failover to replace the failed master with a slave.

For example, if failcount is 5 and monitor_interval is 1000, failover requires five consecutive monitor passes without a master server connection, with one-second waits between passes, so roughly five seconds in total.
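Failover can also be triggered manually, even when auto_failover is disabled. The command takes only the monitor name; a sketch with the monitor from this setup:

#: manual failover; only sensible when the master really is down
maxadmin call command mariadbmon failover cluster-monitor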

Let’s demonstrate by shutting down the current master with systemctl.

#: failover, let’s kill the current master
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Slave, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------

[root@box03 mysql]# systemctl stop mariadb
[root@box03 mysql]# systemctl status mariadb
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
          └─migrated-from-my.cnf-settings.conf
Active: inactive (dead) since Fri 2018-01-12 20:19:39 GMT; 12s ago
Process: 4295 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 4259 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 4223 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 4221 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 4259 (code=exited, status=0/SUCCESS)
Status: "MariaDB server is down"

Below is an excerpt of the MaxScale log. The mariadbmon module detects that the current master has gone away, and after two monitor passes failover activates.

#: what the MaxScale log says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:19:39   error  : Monitor was unable to connect to server [192.168.50.13]:3306 : "Can't connect to MySQL server on '192.168.50.13' (115)"
2018-01-12 20:19:39   notice : [mariadbmon] Server [192.168.50.13]:3306 lost the master status.
2018-01-12 20:19:39   notice : Server changed state: prod_mariadb02[192.168.50.13:3306]: master_down. [Master, Running] -> [Down]
2018-01-12 20:19:39   warning: [mariadbmon] Master has failed. If master status does not change in 2 monitor passes, failover begins.
2018-01-12 20:19:39   error  : [mariadbmon] No Master can be determined. Last known was 192.168.50.13:3306
2018-01-12 20:19:41   notice : [mariadbmon] Performing automatic failover to replace failed master 'prod_mariadb02'.
2018-01-12 20:19:41   notice : [mariadbmon] Promoting server 'prod_mariadb01' to master.
2018-01-12 20:19:41   notice : [mariadbmon] Redirecting slaves to new master.
2018-01-12 20:19:42   warning: [mariadbmon] Setting standalone master, server 'prod_mariadb01' is now the master.
2018-01-12 20:19:42   notice : Server changed state: prod_mariadb01[192.168.50.12:3306]: new_master. [Slave, Running] -> [Master, Running]
#: checking the servers' status
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Master, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Down
-------------------+-----------------+-------+-------------+--------------------

Automatic rejoin

When auto_rejoin is enabled, the monitor will rejoin any standalone database servers, or any slaves replicating from a relay master, to the main cluster. The typical use case for this feature is rejoining the old master after a failover. Should the old master come back online after a slave has already been promoted in its place, it would not be replicating. Auto-rejoin detects this and redirects the old master to the current master. This is not certain to succeed, as the old master may have conflicting events; in that case the slave thread stops with an error.
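If automatic rejoin is not enabled, the same operation can be requested manually through the mariadbmon rejoin command; a sketch using the names from this setup:

#: manually rejoin a returned server to the cluster
maxadmin call command mariadbmon rejoin cluster-monitor prod_mariadb02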

Below is an example of a successful operation:

#: let’s test auto_rejoin now by bringing back up
#: the server we shut down in the failover exercise
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Master, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Down
-------------------+-----------------+-------+-------------+--------------------
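To bring the old master back, start MariaDB Server on box03 again:

#: restart the old master so auto_rejoin can pick it up
[root@box03 ~]# systemctl start mariadb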

#: what the MaxScale log says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:22:43   notice : Server changed state: prod_mariadb02[192.168.50.13:3306]: server_up. [Down] -> [Running]
2018-01-12 20:22:43   notice : [mariadbmon] Directing standalone server 'prod_mariadb02' to replicate from 'prod_mariadb01'.
2018-01-12 20:22:43   notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
2018-01-12 20:22:44   notice : Server changed state: prod_mariadb02[192.168.50.13:3306]: new_slave. [Running] -> [Slave, Running]

Above, the backend server prod_mariadb02 has returned and was rejoined to the cluster as a slave of the current master.

#: checking the servers' status
[root@box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
prod_mariadb01     | 192.168.50.12   |  3306 |           0 | Master, Running
prod_mariadb02     | 192.168.50.13   |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

Additional Comments

  • If you omit replication_user and replication_password from the monitor configuration, the username and password the monitor uses to check the current state of the backends are used instead. In this case the monitor user needs, in addition to its normal rights, the ability to connect from one backend to another, as the MariaDB Monitor user is usually restricted to connections from the MaxScale host only.

  • If you use an encrypted password for the monitor user, the replication_password should be encrypted as well. Otherwise, the CHANGE MASTER TO query will fail.

  • MariaDB Servers forming a cluster should be configured with gtid_strict_mode enabled to make sure the databases have the same binary log order across instances.
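As a sketch, the relevant server-side settings could look like the fragment below in each backend's my.cnf. log_slave_updates is also needed so that a promoted slave has a complete binary log, and server_id must be unique per server (the values here are illustrative):

[mysqld]
server_id         = 2            # unique per server
log_bin           = mariadb-bin
log_slave_updates = 1
gtid_strict_mode  = 1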