MariaDB MaxScale High Availability: Active-Standby Cluster
High availability is one of the most important aspects of running a business. Whether it is a simple app for a small group of users or a large service used by millions, downtime costs a lot of money and ruins everyone’s day. No matter how hard we try, downtime is inevitable. The easiest way to handle downtime is to be prepared for it and design your infrastructure to be highly available from the start.
From an HA point of view, MariaDB MaxScale is quite easy to deploy in an Active-Standby setup. It can be treated as a simple resource (it either works or it doesn’t) thanks to the algorithmic monitoring employed by MaxScale. This method of monitoring guarantees that all MaxScale instances will act the same way and route traffic to the right server. Both traditional master-slave clusters and Galera clusters can be used with MaxScale in an HA setup, but for this blog, we’ll focus on a MariaDB 10.1 master-slave cluster.
In this blog post, we deploy MaxScale in an Active-Standby style setup with Corosync/Pacemaker. This setup was tested with CentOS 7 using MaxScale 2.0.1 and MariaDB 10.1.14.
We'll cover the topic of backend database HA in a follow-up blog post.
The high availability cluster we are creating will have one active MaxScale and one standby MaxScale. The active instance will be assigned a virtual IP address provided by Corosync. This VIP will act as our gateway into the database cluster.
Installing Corosync/Pacemaker and MaxScale
There is an excellent guide on how to install Corosync/Pacemaker for CentOS 7 on the ClusterLabs website. You can follow it to set up your cluster. For this blog, we installed the clustering software, added hostnames for two nodes (node1 and node2) and configured the firewall to allow communication between the two nodes.
We’ll use the RPM package of MaxScale for this. Just download it and install it.
sudo yum install ./maxscale-beta-2.0.0-1.centos.7.x86_64.rpm
We need to authenticate on all the nodes and start the cluster. The hacluster user is created when corosync is installed.
sudo pcs cluster auth -u hacluster node1 node2
sudo pcs cluster start --all
The first thing that we have to do is to disable the STONITH feature of Pacemaker. Simply put, it’s not good for testing two-node setups. Read the STONITH Chapter of the ClusterLabs guide for more details on how to configure it properly for a production environment.
sudo pcs property set stonith-enabled=false
The next step is to create a cluster setup. Node1 and node2 are the hostnames of the two nodes where we are deploying the MaxScale instances.
sudo pcs cluster setup --name maxscale_cluster node1 node2
Then we add MaxScale as a resource. We’ll tell pcs to use MaxScale via systemd and monitor the resource every second.
sudo pcs resource create maxscale systemd:maxscale op monitor interval=1s
Clone the maxscale resource so that all nodes will keep an instance of MaxScale started. This will create the maxscale-clone resource.
sudo pcs resource clone maxscale
The next thing to do is to create the virtual IP address.
sudo pcs resource create clusterip ocf:heartbeat:IPaddr2 ip=192.168.56.220 cidr_netmask=24 op monitor interval=20s
Configure the cluster so that resources stay on the node they were moved to after a failure.
sudo pcs resource meta clusterip migration-threshold=1 failure-timeout=60s resource-stickiness=100
sudo pcs resource meta maxscale-clone migration-threshold=1 failure-timeout=60s resource-stickiness=100
We’ll also add a colocation constraint for the VIP. This constraint tells the cluster that the clusterip resource requires a working maxscale-clone resource. This way we’ll always have the VIP and MaxScale running on the same node.
sudo pcs constraint colocation add clusterip with maxscale-clone INFINITY
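The resource steps above can be collected into one small script. This is only a sketch: the `run` helper, the `DRY_RUN` variable and the `configure_resources` function are hypothetical names introduced here, while the pcs commands themselves are the ones used in this post. Setting `DRY_RUN=1` prints the commands instead of executing them, which is handy for reviewing them before touching a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it with sudo.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# Hypothetical wrapper around the resource commands from this post.
# $1 is the virtual IP address to assign to the clusterip resource.
configure_resources() {
    vip=$1
    # MaxScale as a systemd resource, monitored every second, cloned to all nodes
    run pcs resource create maxscale systemd:maxscale op monitor interval=1s || return 1
    run pcs resource clone maxscale || return 1
    # The virtual IP address
    run pcs resource create clusterip ocf:heartbeat:IPaddr2 "ip=${vip}" cidr_netmask=24 op monitor interval=20s || return 1
    # Same failover behaviour for both resources
    for res in clusterip maxscale-clone; do
        run pcs resource meta "$res" migration-threshold=1 failure-timeout=60s resource-stickiness=100 || return 1
    done
    # Keep the VIP on a node with a working MaxScale
    run pcs constraint colocation add clusterip with maxscale-clone INFINITY
}
```

For example, `DRY_RUN=1; configure_resources 192.168.56.220` lists the six pcs commands without running them.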
Here’s a simple configuration file that we’ll use for testing the setup. Copy it into the /etc folder on both servers. Notice that the <INSERT NAME HERE> part should be replaced with the hostname of the node in question. For this test, the values for it are node1 and node2. It becomes part of the version string that the client sees when it connects, and we’ll use it to distinguish the nodes from each other.
[maxscale]
threads=2

[server1]
type=server
address=192.168.56.1
port=3000
protocol=MySQLBackend

[server2]
type=server
address=192.168.56.1
port=3001
protocol=MySQLBackend

[server3]
type=server
address=192.168.56.1
port=3002
protocol=MySQLBackend

[server4]
type=server
address=192.168.56.1
port=3003
protocol=MySQLBackend

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=server1,server2,server3,server4
user=maxuser
passwd=maxpwd
monitor_interval=1000

[Read-Write Service]
type=service
router=readwritesplit
servers=server1,server2,server3,server4
version_string=5.5.5-10.1.14 <INSERT NAME HERE>
user=maxuser
passwd=maxpwd

[MaxAdmin Service]
type=service
router=cli

[Read-Write Listener]
type=listener
service=Read-Write Service
protocol=MySQLClient
port=4006

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default
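Replacing the placeholder by hand on every node is easy to get wrong, so here is a minimal sketch of how it could be scripted. The `render_config` function and the template file name in the usage note are hypothetical, and /etc/maxscale.cnf is assumed to be the configuration path on your system.

```shell
#!/bin/sh
# Hypothetical helper: render_config TEMPLATE NODE_NAME
# Prints the template with the <INSERT NAME HERE> placeholder replaced by NODE_NAME.
# NODE_NAME is assumed to be a plain hostname (no "/" or "&" characters),
# because it is used directly in the sed replacement.
render_config() {
    template=$1
    node_name=$2
    sed "s/<INSERT NAME HERE>/${node_name}/" "$template"
}
```

On each node this could be used as `render_config maxscale.cnf.template "$(hostname -s)" | sudo tee /etc/maxscale.cnf`, assuming the template holds the configuration above.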
Testing the Cluster
We start off with a fully functional cluster.
[user@node2 ~]$ sudo pcs resource
 clusterip (ocf::heartbeat:IPaddr2): Started node1
 Clone Set: maxscale-clone [maxscale]
     Started: [ node1 node2 ]
     Stopped: [ localhost.localdomain ]
[user@node2 ~]$ sudo pcs resource show clusterip
 Resource: clusterip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.56.220
  Meta Attrs: migration-threshold=1 failure-timeout=60s resource-stickiness=100
  Operations: start interval=0s timeout=20s (clusterip-start-interval-0s)
              stop interval=0s timeout=20s (clusterip-stop-interval-0s)
              monitor interval=5s (clusterip-monitor-interval-5s)
[user@node2 ~]$ mysql -u maxuser -pmaxpwd -h 192.168.56.220 -P 4006
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.1.14 node1 mariadb.org binary distribution
We can see that the maxscale-clone resource is running on both nodes. The VIP is currently assigned to node1 and all queries to the clusterip resource address at 192.168.56.220 go through node1. Next, we’ll stop the cluster on node1.
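Since the node name is embedded in the version string, a script can tell which MaxScale answered by reading the "Server version:" line. The following is a small sketch that assumes the banner format shown in the client output above. The `banner_node` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads client output on stdin and prints the node name
# found in the "Server version:" line, e.g.
#   Server version: 10.1.14 node1 mariadb.org binary distribution
# Assumes the node name is the word right after the version number.
banner_node() {
    awk '/^Server version:/ { print $4; exit }'
}
```

Feeding it the banner from the session above would print node1.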
sudo pcs cluster stop node1
Now, when we execute the same commands that we executed before, we’ll see that the VIP has moved to the standby MaxScale on node2.
[user@node2 ~]$ sudo pcs resource
 clusterip (ocf::heartbeat:IPaddr2): Started node2
 Clone Set: MaxScale-clone [MaxScale]
     Started: [ node2 ]
     Stopped: [ localhost.localdomain node1 ]
[user@node2 ~]$ mysql -u maxuser -pmaxpwd -h 192.168.56.220 -P 4006
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.0.24 node2 mariadb.org binary distribution
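The same check can be done without reading the output by eye. Below is a minimal sketch that extracts the node currently holding the VIP from the output of pcs resource. It assumes the one-resource-per-line layout that pcs printed in our test. The `vip_node` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `pcs resource` output on stdin and prints the node
# on which the clusterip resource is started. Expects a line such as
#   clusterip (ocf::heartbeat:IPaddr2): Started node2
# and prints nothing if clusterip is not in the Started state.
vip_node() {
    awk '$1 == "clusterip" && $(NF-1) == "Started" { print $NF; exit }'
}
```

After stopping node1, `sudo pcs resource | vip_node` should print node2 once the failover has completed.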
And that’s it, we have a highly available active-standby MaxScale setup ready for testing.
There are ways to achieve high availability other than using cluster management software, although it is probably the most common one. Keepalived is one of the popular alternatives to Corosync/Pacemaker. It’s similar but it provides a somewhat simpler approach. There are plenty of good tutorials and articles on keepalived but personally I like the DigitalOcean one. It’s simple and clean and it could be applied to MaxScale with small changes.
In the future (and perhaps even now), this could be handled with containers and automatically scaling clusters. Containers work quite nicely with the MaxScale ideology of statelessness, and being able to scale up as many MaxScale instances as needed at a moment's notice is an idea I like.
Stay tuned for the second part of this blog post where we make the whole cluster highly available.