October 4, 2016

MariaDB MaxScale High Availability: Active-Standby Cluster

High availability is one of the most important aspects of running a business. Whether it is a simple app for a small group of users or a large service used by millions, downtime costs a lot of money and ruins everyone’s day. No matter how hard we try, downtime is inevitable. The easiest way to handle downtime is to be prepared for it and design your infrastructure to be highly available from the start.

From an HA point of view, MariaDB MaxScale is quite easy to deploy in an Active-Standby setup. It can be treated as a simple resource (it either works or it doesn't) due to the algorithmic monitoring employed by MaxScale. This method of monitoring guarantees that all MaxScale instances will act the same way and route traffic to the right server. Both traditional master-slave clusters and Galera clusters can be used with MaxScale in an HA setup, but for this blog we'll focus on a MariaDB 10.1 master-slave cluster.

In this blog post, we deploy MaxScale in an Active-Standby style setup with Corosync/Pacemaker. This setup was tested with CentOS 7 using MaxScale 2.0.1 and MariaDB 10.1.14.

We'll cover the topic of backend database HA in a follow-up blog post.

Cluster Layout

The high availability cluster we are creating will have one active MaxScale and one standby MaxScale. The active instance will be assigned a virtual IP address provided by Corosync. This VIP will act as our gateway into the database cluster.

Installing Corosync/Pacemaker and MaxScale

There is an excellent guide on the ClusterLabs website on how to install Corosync/Pacemaker on CentOS 7; you can follow it to set up your cluster. For this blog, we installed the clustering software, added hostnames for the two nodes (node1 and node2) and configured the firewall to allow communication between them.
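
In rough terms, the preparation looks something like the commands below. The node addresses are placeholders for this example, and the firewall rule uses the high-availability service shipped with firewalld on CentOS 7; adjust both to match your environment.

 # Install the clustering stack and the pcs management tool
 sudo yum install -y corosync pacemaker pcs
 sudo systemctl enable pcsd
 sudo systemctl start pcsd

 # Make the nodes resolvable by name (placeholder addresses)
 echo "192.168.56.101 node1" | sudo tee -a /etc/hosts
 echo "192.168.56.102 node2" | sudo tee -a /etc/hosts

 # Allow cluster traffic between the nodes
 sudo firewall-cmd --permanent --add-service=high-availability
 sudo firewall-cmd --reload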

We’ll use the RPM package of MaxScale for this. Just download it and install it.

 sudo yum install ./maxscale-beta-2.0.0-1.centos.7.x86_64.rpm 
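
Since Pacemaker will be in charge of starting and stopping MaxScale through its systemd unit, it's usually best to leave the unit disabled so that systemd doesn't start it outside of the cluster's control:

 sudo systemctl disable maxscale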

Configuring Corosync/Pacemaker

We need to authenticate on all the nodes and start the cluster. The hacluster user is created when corosync is installed.

 sudo pcs cluster auth -u hacluster node1 node2
 sudo pcs cluster start --all 

The first thing we have to do is disable STONITH, the cluster's fencing mechanism. Simply put, it is not well suited to a two-node test setup. Read the STONITH chapter of the ClusterLabs guide for more details on how to configure it properly for a production environment.

 sudo pcs property set stonith-enabled=false 
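
To confirm that the property took effect and that the configuration contains no errors, we can list the changed cluster properties and run a verification pass:

 sudo pcs property list
 sudo crm_verify -L -V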

The next step is to set up the cluster. Here node1 and node2 are the hostnames of the two nodes where we are deploying the MaxScale instances.

 sudo pcs cluster setup --name maxscale_cluster node1 node2 
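
Before adding any resources, it's worth checking that both nodes show up as online:

 sudo pcs status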

Then we add MaxScale as a resource. We’ll tell pcs to use MaxScale via systemd and monitor the resource every second.

 sudo pcs resource create maxscale systemd:maxscale op monitor interval=1s 

Clone the maxscale resource so that every node keeps an instance of MaxScale running. This will create the maxscale-clone resource.

 sudo pcs resource clone maxscale 

The next thing to do is to create the virtual IP address.

 sudo pcs resource create clusterip ocf:heartbeat:IPaddr2 ip=192.168.56.220 cidr_netmask=24 op monitor interval=20s 
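
Once the resource has started, the virtual IP should be visible on the active node's network interface:

 ip addr show | grep 192.168.56.220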

Configure the resources so that, after a failure, they stay on the node they were moved to instead of immediately failing back.

 sudo pcs resource meta clusterip migration-threshold=1 failure-timeout=60s resource-stickiness=100
sudo pcs resource meta maxscale-clone migration-threshold=1 failure-timeout=60s resource-stickiness=100 

We’ll also add a colocation constraint for the VIP. This constraint simply states that the clusterip resource requires a working maxscale-clone resource. This way we’ll always have the VIP and MaxScale running on the same node.

 sudo pcs constraint colocation add clusterip with maxscale-clone INFINITY 
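
We can list the constraints to verify that the colocation rule is in place. If we also want MaxScale to be started before the VIP is brought up, an ordering constraint (optional, and not part of the steps above) would look like this:

 sudo pcs constraint show
 sudo pcs constraint order start maxscale-clone then start clusterip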

Here’s a simple configuration file that we’ll use for testing the setup. Copy it to /etc/maxscale.cnf on both servers. Note that the <INSERT NAME HERE> part should be replaced with the hostname of the node in question; for this test, the values are node1 and node2. We’ll later see this value in the server version string returned to the client and use it to tell the nodes apart.

 [maxscale]
threads=2

[server1]
type=server
address=192.168.56.1
port=3000
protocol=MySQLBackend

[server2]
type=server
address=192.168.56.1
port=3001
protocol=MySQLBackend

[server3]
type=server
address=192.168.56.1
port=3002
protocol=MySQLBackend

[server4]
type=server
address=192.168.56.1
port=3003
protocol=MySQLBackend

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=server1,server2,server3,server4
user=maxuser
passwd=maxpwd
monitor_interval=1000

[Read-Write Service]
type=service
router=readwritesplit
servers=server1,server2,server3,server4
version_string=5.5.5-10.1.14 <INSERT NAME HERE>
user=maxuser
passwd=maxpwd

[MaxAdmin Service]
type=service
router=cli


[Read-Write Listener]
type=listener
service=Read-Write Service
protocol=MySQLClient
port=4006

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default 
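
Once MaxScale is running with this configuration, the MaxAdmin listener defined above gives us a quick way to check that each instance sees the backend servers. The backend addresses in the file come from our test setup, so the state reported will of course depend on your own servers:

 sudo maxadmin list servers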

Testing the Cluster

We start off with a fully functional cluster.

 [user@node2 ~]$ sudo pcs resource
clusterip (ocf::heartbeat:IPaddr2): Started node1
Clone Set: maxscale-clone [maxscale]
    Started: [ node1 node2 ]
    Stopped: [ localhost.localdomain ]
[user@node2 ~]$ sudo pcs resource show clusterip
Resource: clusterip (class=ocf provider=heartbeat type=IPaddr2)
 Attributes: ip=192.168.56.220
 Meta Attrs: migration-threshold=1 failure-timeout=60s resource-stickiness=100
 Operations: start interval=0s timeout=20s (clusterip-start-interval-0s)
             stop interval=0s timeout=20s (clusterip-stop-interval-0s)
             monitor interval=5s (clusterip-monitor-interval-5s)
[user@node2 ~]$ mysql -u maxuser -pmaxpwd -h 192.168.56.220 -P 4006
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.1.14 node1 mariadb.org binary distribution 

We can see that the maxscale-clone resource is running on both nodes. The VIP is currently assigned to node1, and all queries to the clusterip address at 192.168.56.220 go through node1. Next, we’ll stop the cluster on node1.

 sudo pcs cluster stop node1 

Now, when we execute the same commands that we executed before, we’ll see that the VIP has moved to the standby MaxScale on node2.

 [user@node2 ~]$ sudo pcs resource
clusterip (ocf::heartbeat:IPaddr2): Started node2
Clone Set: maxscale-clone [maxscale]
    Started: [ node2 ]
    Stopped: [ localhost.localdomain node1 ]
[user@node2 ~]$ mysql -u maxuser -pmaxpwd -h 192.168.56.220 -P 4006
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.0.24 node2 mariadb.org binary distribution 
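
To bring node1 back, we simply start the cluster on it again. After it rejoins, the maxscale-clone resource starts on node1 as well, but because of the resource-stickiness we set earlier the VIP stays on node2 instead of immediately failing back:

 sudo pcs cluster start node1
 sudo pcs resource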

And that’s it: we have a highly available active-standby MaxScale setup ready for testing.

Summary

There are ways other than cluster management software to achieve high availability, although it is probably the most common approach. Keepalived is one of the popular alternatives to Corosync/Pacemaker. It's similar, but it offers a somewhat simpler approach. There are plenty of good tutorials and articles on Keepalived; personally, I like the DigitalOcean one. It's simple and clean, and it could be applied to MaxScale with small changes.
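
For the curious, a minimal Keepalived sketch for the same virtual IP could look something like the configuration below. The interface name, priorities and the health check are placeholders, and a production setup would want a more thorough check than a simple pidof.

 # /etc/keepalived/keepalived.conf on the primary node (sketch, values are placeholders)
 vrrp_script check_maxscale {
     script "pidof maxscale"        # replace with a proper health check in production
     interval 2
 }

 vrrp_instance VI_MAXSCALE {
     state MASTER                   # BACKUP on the standby node
     interface eth0                 # adjust to your network interface
     virtual_router_id 51
     priority 150                   # use a lower priority, e.g. 100, on the standby
     advert_int 1
     virtual_ipaddress {
         192.168.56.220/24
     }
     track_script {
         check_maxscale
     }
 }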

In the future (and perhaps even now), this could be handled with containers and automatically scaling clusters. Containers fit nicely with MaxScale's stateless design, and being able to spin up as many MaxScale instances as needed at a moment's notice is an idea I like.

Stay tuned for the second part of this blog post where we make the whole cluster highly available.

About Markus Mäkelä

Markus Mäkelä is a Software Engineer working on MariaDB MaxScale. He graduated from Metropolia University of Applied Sciences in Helsinki, Finland.
