MaxScale HA setup using Keepalived and MaxCtrl

MariaDB MaxScale is a database proxy that performs load balancing and query routing between client applications and backend database servers. In a basic configuration, MaxScale is a single point of failure. In this blog post we show how to set up a more resilient MaxScale HA cluster using Keepalived and MaxCtrl.
Keepalived is routing software for load balancing and high availability. It has several applications, but for this tutorial the goal is to set up a simple IP failover between two machines running MaxScale. If the main server fails, the backup machine takes over and receives any new connections. The Keepalived settings used in this tutorial follow the example given in "Simple keepalived failover setup on Ubuntu 14.04".
The configuration examples in this blog are for a setup where two MaxScales are monitoring one database cluster. Two hosts and one client machine are used, all in the same LAN. Hosts run MaxScale and Keepalived. The backend servers may be running on one of the hosts, e.g. in docker containers, or on separate machines for a more realistic setup. Clients connect to the virtual IP (VIP), which is claimed by the current master host.
Once configured and running, the different Keepalived nodes continuously broadcast their status to the network and listen for each other. If a node does not receive a status message from another node with a higher priority than itself, it will claim the VIP, effectively becoming the master. Thus, a node can be put online or removed by starting and stopping the Keepalived service.
If the current master node is removed (e.g. by stopping the service or pulling the network cable) the remaining nodes will quickly elect a new master and future traffic to the VIP will be directed to that node. Any connections to the old master node will naturally break. If the old master comes back online, it will again claim the VIP, breaking any connections to the backup machine.
MaxScale has no knowledge of any of this happening. Both MaxScales are running normally, monitoring the backend servers and listening for client connections. Since clients connect through the VIP, only the machine holding the VIP receives incoming connections. The connections between MaxScale and the backends use real IPs and are unaffected by the VIP.
Configuration
MaxScale does not require any specific configuration to work with Keepalived in this simple setup; it just needs to be running on both hosts. The MaxScale configurations should be roughly similar on both hosts if you plan on synchronizing any changes between the MaxScale instances. Specifically, both instances should have the same services and listeners so they appear identical to client applications. Setting the service-level setting “version_string” to a different value on each MaxScale node is recommended, as the value is shown to connecting clients and so indicates which node they reached.
[Read-Write Service]
type=service
router=readwritesplit
version_string=PrimaryMaxScale
...
Keepalived requires specific setups on both machines. On the primary host, the /etc/keepalived/keepalived.conf-file should be as follows.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
        192.168.1.123
    }
}
The state must be MASTER on both hosts. virtual_router_id and auth_pass must be identical on all hosts. interface defines the network interface used. This depends on the system, but often the correct value is eth0, enp0s12f3 or similar. priority defines the voting strength of each Keepalived instance when the instances negotiate which one should be the master. The instances should have different priority values. In this example, the backup host(s) could have priority 149, 148 and so on. advert_int is the interval, in seconds, at which a host “advertises” its existence to the other Keepalived hosts. One second is a reasonable value.
virtual_ipaddress (VIP) is the IP the different Keepalived hosts try to claim and must be identical between the hosts. For IP negotiation to work, the VIP must be in the local network address space and unclaimed by any other machine in the LAN.
An example keepalived.conf-file for a backup host is listed below.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
        192.168.1.123
    }
}
Once the Keepalived service is running, recent log entries can be printed with the command service keepalived status.
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Received higher prio advert
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) removing protocol VIPs.
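Besides the log, it can be handy to check directly whether a host currently holds the VIP. The following is a minimal sketch, not part of the original setup: holds_vip is a hypothetical helper name, and it assumes the address listing comes from the iproute2 command ip -o -4 addr show, where each address is printed followed by a slash and its prefix length.

```shell
#!/bin/bash
# Hypothetical helper (not part of Keepalived or MaxScale).
# Reads "ip -o -4 addr show" style output on stdin and succeeds
# if the given address is listed on one of the interfaces.
holds_vip() {
    local vip="$1"
    # Match " <address>/" as a fixed string so that, for example,
    # 192.168.1.12 is not mistaken for 192.168.1.123.
    grep -qF " ${vip}/"
}
```

On a host, this could be used as ip -o -4 addr show dev eth0 | holds_vip 192.168.1.123 && echo "VIP is on this host". Stopping the Keepalived service on the current master and repeating the check on the backup host should then show the VIP moving over.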
MariaDB MaxScale Health Check
So far, none of this tutorial has been MaxScale-specific and the health of the MaxScale process has been ignored. To ensure that MaxScale is running on the current master host, a check script should be set. Keepalived runs the script regularly, and if the script returns an error value, the Keepalived node assumes that it has failed, stops broadcasting its state and relinquishes the VIP. This allows another node to take the master status and claim the VIP.
To define a check script, modify the configuration as follows. The example is for the primary node. See Keepalived Check and Notify Scripts for more information.
vrrp_script chk_myscript {
    script "/home/scripts/is_maxscale_running.sh"
    interval 2 # check every 2 seconds
    fall 2     # require 2 failures for KO
    rise 2     # require 2 successes for OK
}
vrrp_instance VI_1 {
    state MASTER
    interface wlp2s0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
        192.168.1.13
    }
    track_script {
        chk_myscript
    }
}
An example script, is_maxscale_running.sh, is listed below. The script uses MaxAdmin to try to contact the locally running MaxScale and request a server list, then checks that the list has at least some expected elements. The timeout command ensures the MaxAdmin call exits in reasonable time. The script detects if MaxScale has crashed, is stuck or is so overburdened that it no longer responds to connections. Simply checking that the MaxScale process is running would be a simpler and likely adequate alternative.
#!/bin/bash
# Health check for Keepalived: exits 0 if the local MaxScale answers
# "maxadmin list servers" in time and the output names the expected servers.
fileName="maxadmin_output.txt"
rm -f "$fileName"
# timeout returns a non-zero value if maxadmin fails or does not finish in 2 seconds
timeout 2s maxadmin list servers > "$fileName"
to_result=$?
if [ "$to_result" -ge 1 ]
then
    echo "Timed out or error, timeout returned $to_result"
    exit 3
else
    echo "MaxAdmin success, rval is $to_result"
    echo "Checking maxadmin output sanity"
    # server1 and server2 are the server names expected in the output
    grep1=$(grep server1 "$fileName")
    grep2=$(grep server2 "$fileName")
    if [ "$grep1" ] && [ "$grep2" ]
    then
        echo "All is fine"
        exit 0
    else
        echo "Something is wrong"
        exit 3
    fi
fi
If the check script fails, Keepalived logs entries such as the following.
Aug 11 10:51:56 maxscale2 Keepalived_vrrp[20257]: VRRP_Script(chk_myscript) failed
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Entering FAULT STATE
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) removing protocol VIPs.
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Now in FAULT state
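The simpler alternative mentioned earlier, only checking that the MaxScale process exists, could look roughly like the sketch below. It is an illustration under assumptions: process_running is a hypothetical helper name, the pgrep tool (from procps) is assumed to be installed, and the MaxScale process is assumed to be named maxscale. Unlike the MaxAdmin-based script, it cannot detect a MaxScale that is running but stuck.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a process with exactly this name exists.
# pgrep -x matches the whole process name instead of a substring.
process_running() {
    pgrep -x "$1" > /dev/null
}
```

In a check script, ending the file with the line process_running maxscale makes the script's exit status the result of the check, which is the value Keepalived acts on.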
MaxScale active/passive-setting
MariaDB MaxScale 2.2.2 introduced master/slave replication cluster management features (failover, switchover and rejoin). When running a setup with multiple MaxScales, only one MaxScale instance should be allowed to modify the master/slave replication cluster at any given time. This instance should be the one with MASTER Keepalived status. MaxScale does not know its Keepalived state, but MaxCtrl (a replacement for MaxAdmin) can set a MaxScale instance to passive mode. A passive MaxScale behaves like an active one, except that it won’t perform failover, switchover or rejoin. Even manual versions of these commands will end in error. The passive/active mode differences may be expanded in the future.
To have Keepalived modify the MaxScale operating mode, a notify script is needed. This script is run whenever Keepalived changes its state. The script file is defined in the Keepalived configuration file as notify.
...
    virtual_ipaddress {
        192.168.1.13
    }
    track_script {
        chk_myscript
    }
    notify /home/scripts/notify_script.sh
...
Keepalived calls the script with three parameters. In our case, only the third parameter, STATE, is relevant. An example script is below.
#!/bin/bash
# Keepalived passes three parameters; only STATE is used here.
TYPE=$1
NAME=$2
STATE=$3
OUTFILE=/home/user/state.txt
case "$STATE" in
    "MASTER")
        echo "Setting this MaxScale node to active mode" > "$OUTFILE"
        maxctrl alter maxscale passive false
        exit 0
        ;;
    "BACKUP")
        echo "Setting this MaxScale node to passive mode" > "$OUTFILE"
        maxctrl alter maxscale passive true
        exit 0
        ;;
    "FAULT")
        echo "MaxScale failed the status check." > "$OUTFILE"
        maxctrl alter maxscale passive true
        exit 0
        ;;
    *)
        echo "Unknown state" > "$OUTFILE"
        exit 1
        ;;
esac
The script logs the current state to a text file and sets the operating mode of MaxScale. The FAULT case also attempts to set MaxScale to passive mode, although the MaxCtrl command will likely fail.
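The state-to-mode mapping used by the notify script can also be kept in one small function, which makes it easy to try out without a running Keepalived or MaxScale. This is only a sketch of the same logic; passive_value_for_state is a hypothetical name introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints the value for MaxScale's "passive" setting
# that corresponds to a Keepalived state, or fails for an unknown state.
passive_value_for_state() {
    case "$1" in
        MASTER)       echo false ;;  # master node: MaxScale is active
        BACKUP|FAULT) echo true ;;   # any other known state: passive
        *)            return 1 ;;    # unknown state
    esac
}
```

A notify script could then run maxctrl alter maxscale passive "$(passive_value_for_state "$STATE")" after checking that the function succeeded.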
If all MaxScale/Keepalived instances have a similar notify script, only one MaxScale should ever be in active mode. The mode of a MaxScale instance can be checked with the command maxctrl show maxscale, shown below. This MaxScale is “active”. A later blog post will show MaxCtrl use in more detail.
[vagrant@maxscale1 ~]$ maxctrl show maxscale
┌──────────────┬────────────────────────────────────────────────────────┐
│ Version      │ 2.2.2                                                  │
├──────────────┼────────────────────────────────────────────────────────┤
. . .
├──────────────┼────────────────────────────────────────────────────────┤
│ Parameters   │ {                                                      │
│              │     "libdir": "/usr/lib64/maxscale",                   │
│              │     "datadir": "/var/lib/maxscale",                    │
. . .
│              │     "passive": false,                                  │
│              │     "query_classifier": ""                             │
│              │ }                                                      │
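For scripted checks, the passive flag can be picked out of this output with grep. The sketch below assumes the output format shown above, where the setting is printed as "passive": followed by true or false; is_passive is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads "maxctrl show maxscale" output on stdin and
# succeeds if the passive parameter is printed as true.
is_passive() {
    grep -Eq '"passive":[[:space:]]*true'
}
```

For example: maxctrl show maxscale | is_passive && echo "passive" || echo "active".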