MariaDB MaxScale High Availability: Active-Standby Cluster

High availability is one of the most important aspects of running a business. Whether it is a simple app for a small group of users or a large service used by millions, downtime costs a lot of money and ruins everyone’s day. No matter how hard we try, downtime is inevitable. The easiest way to handle downtime is to be prepared for it and design your infrastructure to be highly available from the start.

From an HA point of view, MariaDB MaxScale is quite easy to deploy in an Active-Standby setup. Thanks to the algorithmic monitoring MaxScale employs, it can be treated as a simple resource: it either works or it doesn't. This method of monitoring guarantees that all MaxScale instances act the same way and route traffic to the right server. Both traditional master-slave clusters and Galera clusters can be used with MaxScale in an HA setup, but for this blog we’ll focus on a MariaDB 10.1 master-slave cluster.

In this blog post, we deploy MaxScale in an Active-Standby style setup with Corosync/Pacemaker. This setup was tested with CentOS 7 using MaxScale 2.0.1 and MariaDB 10.1.14.

We’ll cover the topic of backend database HA in a followup blog post.

Cluster Layout

The high availability cluster we are creating will have one active MaxScale and one standby MaxScale. The active instance will be assigned a virtual IP address provided by Corosync. This VIP will act as our gateway into the database cluster.

Installing Corosync/Pacemaker and MaxScale

There is an excellent guide on how to install Corosync/Pacemaker on CentOS 7 on the ClusterLabs website; follow it to set up your cluster. For this blog, we installed the clustering software, added hostnames for two nodes (node1 and node2) and configured the firewall to allow communication between the two nodes.
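On CentOS 7 the firewall step can be done with firewalld, which ships a built-in high-availability service definition covering the pcsd and Corosync ports. A sketch of the commands, to be run on both nodes:

```shell
# Allow cluster traffic through firewalld on both nodes; the predefined
# "high-availability" service covers pcsd (TCP 2224) and the Corosync/
# Pacemaker ports, so the individual ports don't need to be listed by hand
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload
```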

We’ll use the RPM package of MaxScale for this. Just download it and install it.

 sudo yum install ./maxscale-2.0.1-1.centos.7.x86_64.rpm
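Since Pacemaker will be the one starting and stopping MaxScale, the service should not start on its own at boot. On both nodes, disable the systemd unit and verify the installation:

```shell
# Pacemaker manages the MaxScale process, so keep it out of the boot sequence
sudo systemctl disable maxscale
# Confirm the installed version
maxscale --version
```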

Configuring Corosync/Pacemaker

We need to authenticate on all the nodes, create the cluster and start it. The hacluster user is created when the cluster software is installed; set its password on both nodes with sudo passwd hacluster before authenticating. Node1 and node2 are the hostnames of the two nodes where we are deploying the MaxScale instances.

 sudo pcs cluster auth -u hacluster node1 node2
 sudo pcs cluster setup --name maxscale_cluster node1 node2
 sudo pcs cluster start --all

Next, we disable the STONITH feature of Pacemaker. Simply put, it gets in the way when testing two-node setups. Read the STONITH chapter of the ClusterLabs guide for details on how to configure it properly for a production environment.

 sudo pcs property set stonith-enabled=false

Then we add MaxScale as a resource. We’ll tell pcs to use MaxScale via systemd and monitor the resource every second.

 sudo pcs resource create maxscale systemd:maxscale op monitor interval=1s

Clone the maxscale resource so that all nodes will keep an instance of MaxScale started. This will create the maxscale-clone resource.

 sudo pcs resource clone maxscale

The next thing to do is to create the virtual IP address. Replace <VIP> with an unused address from your network.

 sudo pcs resource create clusterip ocf:heartbeat:IPaddr2 ip=<VIP> cidr_netmask=24 op monitor interval=20s
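Once the resource is created, you can check on either node whether it currently holds the VIP. A minimal sketch, assuming 192.168.0.100 stands in for the address you used for ip=:

```shell
# Look for the VIP among this node's IPv4 addresses
# (192.168.0.100 is a placeholder for your actual VIP)
if ip -4 addr show | grep -q "192.168.0.100"; then
    echo "VIP is on this node"
else
    echo "VIP is elsewhere"
fi
```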

Configure the resources so that, after a failure, they stay on the node they were moved to.

 sudo pcs resource meta clusterip migration-threshold=1 failure-timeout=60s resource-stickiness=100
 sudo pcs resource meta maxscale-clone migration-threshold=1 failure-timeout=60s resource-stickiness=100

We’ll also add a colocation constraint for the VIP. This constraint tells Pacemaker that the clusterip resource requires a working maxscale-clone resource. This way the VIP and MaxScale will always run on the same node.

 sudo pcs constraint colocation add clusterip with maxscale-clone INFINITY
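At this point it's worth verifying that the constraint was registered the way we expect:

```shell
# List all configured constraints; the output should include a colocation
# rule along the lines of "clusterip with maxscale-clone (score:INFINITY)"
sudo pcs constraint
```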

Here’s a simple configuration file that we’ll use for testing the setup. Copy it to /etc/maxscale.cnf on both servers. Notice that the <INSERT NAME HERE> part should be replaced with the hostname of the node in question; for this test, the values are node1 and node2. We’ll later see it in the version string reported to the client and we’ll use it to distinguish the nodes from each other.

# Minimal test configuration; the backend server address and the
# maxuser/maxpwd credentials are placeholders, adjust them to your cluster.
[maxscale]
threads=4

[server1]
type=server
address=192.168.0.1
port=3306
protocol=MySQLBackend

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=server1
user=maxuser
passwd=maxpwd
monitor_interval=2000

[Read-Write Service]
type=service
router=readwritesplit
servers=server1
user=maxuser
passwd=maxpwd
version_string=5.5.5-10.1.14 <INSERT NAME HERE>

[MaxAdmin Service]
type=service
router=cli

[Read-Write Listener]
type=listener
service=Read-Write Service
protocol=MySQLClient
port=4006

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
port=6603
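After copying the file into place, restarting the cloned resource makes both nodes reload the configuration, and MaxAdmin gives a quick sanity check that the services came up (the exact MaxAdmin invocation depends on how its listener is configured):

```shell
# Restart the cloned resource so both nodes pick up the new configuration
sudo pcs resource restart maxscale-clone
# Ask the local MaxScale which services are running
sudo maxadmin list services
```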

Testing the Cluster

We start off with a fully functional cluster.

 [user@node2 ~]$ sudo pcs resource

clusterip (ocf::heartbeat:IPaddr2): Started node1

Clone Set: maxscale-clone [maxscale]

    Started: [ node1 node2 ]

    Stopped: [ localhost.localdomain ]

[user@node2 ~]$ sudo pcs resource show clusterip

Resource: clusterip (class=ocf provider=heartbeat type=IPaddr2)

 Attributes: ip=<VIP>

 Meta Attrs: migration-threshold=1 failure-timeout=60s resource-stickiness=100

 Operations: start interval=0s timeout=20s (clusterip-start-interval-0s)

             stop interval=0s timeout=20s (clusterip-stop-interval-0s)

             monitor interval=20s (clusterip-monitor-interval-20s)

 [user@node2 ~]$ mysql -u maxuser -pmaxpwd -h <VIP> -P 4006
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.1.14 node1 binary distribution
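Because the version_string embeds the node's hostname, a script can tell which MaxScale instance is currently behind the VIP without logging in interactively. A minimal sketch that parses the greeting line (the banner text is hard-coded here for illustration; in practice it would come from a mysql client call against the VIP):

```shell
# Extract the node name from the server greeting; the fourth whitespace-
# separated field is the hostname we put into version_string
banner="Server version: 10.1.14 node1 binary distribution"
node=$(echo "$banner" | awk '{print $4}')
echo "$node"   # prints "node1"
```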

We can see that the maxscale-clone resource is running on both nodes. The VIP is currently assigned to node1 and all queries to the VIP go through node1. Next, we’ll stop the cluster on node1.

 sudo pcs cluster stop node1

Now, when we execute the same commands that we executed before, we’ll see that the VIP has moved to the standby MaxScale on node2.

 [user@node2 ~]$ sudo pcs resource
clusterip (ocf::heartbeat:IPaddr2): Started node2
Clone Set: maxscale-clone [maxscale]
    Started: [ node2 ]
    Stopped: [ localhost.localdomain node1 ]
[user@node2 ~]$ mysql -u maxuser -pmaxpwd -h <VIP> -P 4006
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 29651
Server version: 10.1.14 node2 binary distribution

And that’s it, we have a highly available active-standby MaxScale setup ready for testing.
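Stopping the cluster cleanly is only one failure mode. You can also simulate a crash of the active MaxScale process and watch Pacemaker react; with the migration-threshold=1 setting above, a single failure is enough to move the resources. Run this on the node that currently holds the VIP:

```shell
# Simulate a crash of the active MaxScale process...
sudo pkill -9 maxscale
# ...then watch the cluster state; the VIP should move to the other node
sudo pcs status
```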


There are ways to achieve high availability other than cluster management software, although it is probably the most common approach. Keepalived is one of the popular alternatives to Corosync/Pacemaker. It’s similar but provides a somewhat simpler approach. There are plenty of good tutorials and articles on keepalived but personally I like the DigitalOcean one. It’s simple and clean and it could be applied to MaxScale with small changes.
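For comparison, here is a rough sketch of what the same idea looks like in a keepalived.conf; the interface name, virtual_router_id, VIP and priorities are placeholders you would adjust per node:

```
# Sketch of an equivalent active-standby setup with keepalived;
# all values below are placeholders
vrrp_script chk_maxscale {
    script "/usr/bin/pidof maxscale"   # succeeds while MaxScale is running
    interval 2
}

vrrp_instance maxscale_vip {
    state MASTER                       # BACKUP on the standby node
    interface eth0
    virtual_router_id 51
    priority 101                       # use a lower value on the standby node
    virtual_ipaddress {
        192.168.0.100/24
    }
    track_script {
        chk_maxscale
    }
}
```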

In the future (and perhaps even now), this could be handled with containers and automatically scaling clusters. Containers work quite nicely with the MaxScale ideology of statelessness and being able to scale up as many MaxScale instances as needed at a moment’s notice is an idea I like.

Stay tuned for the second part of this blog post where we make the whole cluster highly available.