High Availability on AWS with MariaDB Galera Cluster and MariaDB MaxScale


Running MariaDB on AWS? You have a number of ways to accomplish this, from setting it up yourself to using an out-of-the-box option offered by AWS, such as RDS for MariaDB, which relies on the community MariaDB version. But when you want something that is highly available, RDS is not very effective. For one thing, its high availability is based on disk-level replication: the replica is passive, has to recover at failover, and failover is therefore slow. Secondly, because the replica is passive it cannot be used for scale-out, so if you want to scale out your reads (which is possible) you need an additional read-only replica. Thirdly, even when RDS provides such a read replica, scaling out reads this way is neither automatic nor transparent; the application has to deal with it.

Deploying an Enterprise MariaDB service yourself on Amazon AWS EC2 instances overcomes all of those limitations: you get high availability and read scale-out from the same set of servers, and load balancing and failover are fully automated and transparent to the application. In this blog, we will take a detailed look at how MariaDB Enterprise Platform can be deployed for High Availability in Amazon AWS.

MariaDB Enterprise Platform is a comprehensive database solution delivering additional stability, availability, security and scalability for production workloads. MariaDB Enterprise Platform includes MariaDB Enterprise Server, MariaDB MaxScale, MariaDB Enterprise Cluster (powered by Galera) as well as tools and services that deliver peace of mind to organizations running mission critical applications.

Amazon Availability Zones and Regions

Looking at Amazon AWS specifically, there are two hierarchies to deal with, Regions and Availability Zones. A Region is a geographically separate location, and you may distribute your load across different Regions. Different Regions have different costs, so if you are managing a high load you might want to take cost into the equation. An Availability Zone (AZ) is a set of server resources, including power, cooling, networking and so on, and there are multiple Availability Zones in each Region.

You get higher availability by distributing your load across Regions, but that is also more costly and less performant: transferring data between Regions is both more expensive and slower than transferring data between Availability Zones. In this blog, we are going to look at setting up a MariaDB Galera Cluster with replication between Availability Zones in a single Region.

Overall architecture

MariaDB Galera Cluster needs at least three servers for split-brain detection to work. In addition, we need two servers for MariaDB MaxScale, to make sure that MariaDB MaxScale itself isn’t a Single Point Of Failure (SPOF). It will look something like this

Galera Cluster architecture with MariaDB MaxScale

 

As there are two instances of MariaDB MaxScale, we need to fail over between these as well, and this is handled by the MariaDB Connector. So we have two levels of failover here: MariaDB MaxScale fails over between MariaDB Enterprise Server (Galera) instances, and the MariaDB Connector fails over between MariaDB MaxScale instances.

Setting up the infrastructure in AWS

First I will set up the servers in AWS using the AWS console. I will not show all the details, only the steps that might differ from the usual. To create the first EC2 server, go to EC2 and click the Launch instance button

Set up infrastructure in AWS: Launch instance

 

The next step is to name the instance; I’ll call it galera1 as this server will host the first MariaDB Galera Cluster node, and then select Red Hat as the operating system. I choose the t3.micro instance size as this is a demo. You then have to either select an existing Key pair or click Create new key pair and follow the instructions.

The next step is Network settings. You may usually go with the defaults here, but this time you need to click Edit to select the Availability Zone. The first step is to select the subnet; this might sound odd, but it is here that you select the Availability Zone, as each subnet is associated with an Availability Zone. So click the down arrow to the right of No preference

Network settings - image 1

A list of subnets with their associated Availability Zones is now shown, and I select the subnet for the eu-north-1a AZ

Network settings: Image 2

I also take the opportunity to create a new Security group and name it galera-cluster. Among the Inbound security group rules I keep the Source type set to Anywhere as this is a demo; in a production environment this depends on where the application is located and whether it is on a fixed address or on AWS, so adjust this setting as appropriate. With that we are ready to create the instance, so click the Launch instance button. After a few dialogs, you will see your instance status

Instance status

Now wait a bit until the instance is up and running and the Status check changes from Initializing to something along the lines of 3/3 checks passed. Once that instance is running, the next step is to set up four more instances in the same way: maxscale1 and maxscale2 (in my case in AZs eu-north-1a and eu-north-1c), galera2 (in eu-north-1b) and galera3 (in eu-north-1c).
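
If you prefer to script the creation of the additional instances instead of clicking through the console, the AWS CLI can do the same thing. A minimal sketch for galera2 might look like the following, where the AMI ID, key pair name, subnet ID and security group ID are placeholders that you replace with your own values; the subnet ID is what places the instance in a specific Availability Zone, just as in the console

$ aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type t3.micro \
  --key-name my-key-pair \
  --subnet-id subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx \
  --tag-specifications \
  'ResourceType=instance,Tags=[{Key=Name,Value=galera2}]'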

There is one more thing to do before we continue, and that is to adjust the Security group we created above. As it stands now, it only allows SSH connections, but we need to allow MariaDB client connections at least, and in addition to that we have to allow Galera traffic between the three Galera servers. We will allow all of that for all servers now, as they use the same Security group. The MaxScale servers don’t need the Galera ports open, but that is not a big issue as we will open these ports only to our own servers anyway, not to traffic from the outside.

Go to the AWS console and in the menu to the left click on Security groups, then click the Security group ID that belongs to the galera-cluster Security group. We only need to worry about Inbound rules, so click the Edit inbound rules button, and in the new form click Add rule and fill out the rules until you have something like this

Edit inbound rules

Note in particular that the source for the rules you add is set to the same Security group as we are working on here. When this is in place, click the Save rules button and you are set.
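
For reference, assuming default port numbers, these are the inbound rules a setup like this typically needs within the Security group (with the Security group itself as the source)

3306 (TCP)          MariaDB client connections from the application and MariaDB MaxScale
4567 (TCP and UDP)  Galera group communication between the Galera nodes
4568 (TCP)          Galera Incremental State Transfer (IST)
4444 (TCP)          Galera State Snapshot Transfer (SST)
8989 (TCP)          MariaDB MaxScale REST API and GUI, if you want to reach them from other hosts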

Once the EC2 servers are up and running you can try connecting to them. For this, you first need the private key file that you downloaded in the Key pair section when creating the servers. Then you can connect using SSH from Linux, or with PuTTY if you are connecting from a Windows machine, in which case you need to convert the key file to a format that PuTTY understands using puttygen. The username to connect as is ec2-user, and you do not need a password, only the private key.

If you are using Linux, you must set the permissions of the private key file to 600 before using ssh to connect to a server, for example

$ chmod 600 myawkkey.ppk
$ ssh -i myawkkey.ppk \
  ec2-user@ec2-16-21-12-20.eu-north-1.compute.amazonaws.com

Preparing servers

For this example to work smoothly, I decided to set up a local hosts file on all the servers, defining the galera1, galera2, galera3, maxscale1 and maxscale2 servers. In my case, I added this to /etc/hosts

172.31.29.75    galera1
172.31.32.109   galera2
172.31.0.82     galera3
172.31.24.251   maxscale1
172.31.11.71    maxscale2

With this in place, we can set up the repository used to install MariaDB Enterprise Server and MariaDB MaxScale. For this you need to locate your MariaDB Enterprise download token; the instructions for this are here: https://mariadb.com/docs/server/deploy/token/

Once you have the token, follow the instructions further down on the same page, and do this on all the servers created in the steps above. For your reference, this is what I did on my galera1 server to set up the repository for MariaDB Enterprise Server 10.6 (although you could just as easily choose the recently released MariaDB Enterprise Server 11.4)

$ curl -LsSO \
  https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup
$ chmod +x mariadb_es_repo_setup
$ sudo ./mariadb_es_repo_setup \
  --token="xxxxxxxxxxxxx" --apply --skip-tools \
  --mariadb-server-version="10.6"

With this done on all of our servers we are now ready to install MariaDB.

Installing and configuring MariaDB Galera Cluster

Let’s start on the galera1 server. We need to install MariaDB Server and MariaDB Backup; Galera is included with MariaDB Server, so the appropriate command to run on galera1, galera2 and galera3 is

$ sudo yum install -y MariaDB-server MariaDB-backup

With this in place, we can configure MariaDB Enterprise Server, and here I will only configure what is necessary for our cluster to work. Let’s get started with the MariaDB Server configuration file, which is /etc/my.cnf.d/server.cnf on Red Hat / CentOS. There are two things to fix there. The first is to allow MariaDB MaxScale to be used as a proxy, which we set up in the [mysqld] section. The value is the IP addresses of the MariaDB MaxScale servers that will connect to the cluster, but one can also specify a netmask or an asterisk. We will use the latter, as we rely on AWS security to ensure that only the servers in the given Security group can access the cluster

[mysqld]
proxy-protocol-networks=*

Secondly, you also have to configure Galera, and this is the minimum you need to set up for that to work

[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_enterprise_smm.so
wsrep_cluster_address=gcomm://galera1,galera2,galera3
binlog_format=row
innodb_autoinc_lock_mode=2

I would also recommend that you set up error logging in the [mysqld] section, using the node name as the log file name, for example

log_error=galera1

With all this in place, we can start the cluster. To do this, we have to bootstrap the first node in the cluster using the galera_new_cluster command, installed with MariaDB Server. Let’s start galera1 first

$ sudo galera_new_cluster

Test that the server is running by connecting to it

$ sudo mysql -u root

And check that Galera is active

MariaDB> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
1 row in set (0.001 sec)

If you get 0 as a result, then Galera wasn’t configured correctly and you should check the configuration again. Otherwise, you can start MariaDB Enterprise Server on galera2 and galera3 in the usual way

$ sudo systemctl start mariadb

If you run the SHOW GLOBAL STATUS statement from above again, the result should now be 3. If there are issues starting MariaDB Server on galera2 or galera3, check your Security group settings one more time and check the MariaDB error log in /var/lib/mysql.
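
Another useful check, on any of the nodes, is the wsrep_local_state_comment status variable, which should report Synced once a node has fully joined the cluster

MariaDB> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
1 row in set (0.001 sec)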

Testing the cluster

With this in place, we are ready to test that the cluster works as expected, beginning with creating a test database and a test user. On the galera1 node run

$ sudo mariadb -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.6.19-15-MariaDB-enterprise MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> CREATE DATABASE appdb;
Query OK, 1 row affected (0.007 sec)

MariaDB> CREATE USER 'appuser'@'%' IDENTIFIED BY 'OGLom+3J';
Query OK, 0 rows affected (0.008 sec)

MariaDB> GRANT ALL ON appdb.* TO 'appuser'@'%';
Query OK, 0 rows affected (0.008 sec)

And with this let’s try connecting as the user we just created

$ mariadb -u appuser appdb -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 23
Server version: 10.6.19-15-MariaDB-enterprise MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> CREATE TABLE customer(cust_id INTEGER NOT NULL PRIMARY KEY,
    -> cust_name VARCHAR(100) NOT NULL);
Query OK, 0 rows affected (0.013 sec)

MariaDB> INSERT INTO customer VALUES(1, 'John Doe'),(2, 'Jane Doe');
Query OK, 2 rows affected (0.004 sec)
Records: 2  Duplicates: 0  Warnings: 0
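
To confirm that these changes were replicated, connect to one of the other nodes, for example galera2, and check that the database, the table and the rows are all there

$ sudo mariadb -u root appdb
MariaDB> SELECT * FROM customer;
+---------+-----------+
| cust_id | cust_name |
+---------+-----------+
|       1 | John Doe  |
|       2 | Jane Doe  |
+---------+-----------+
2 rows in set (0.001 sec)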

Installing and configuring MariaDB MaxScale

First, we need to install MariaDB MaxScale from the repo, using yum like this on maxscale1

$ sudo yum install -y maxscale

Next, it is time to configure MariaDB MaxScale for our cluster. MariaDB MaxScale can be configured and managed in a number of different ways, from just using the configuration files to online configuration using the command line, the REST API and a built-in GUI. Here we will mainly use the configuration file for the initial configuration, which is required anyway, and the CLI, combined with a few samples of the REST API.

I am assuming that you have the /etc/hosts file set up as described above. The MariaDB MaxScale configuration file is /etc/maxscale.cnf, so let’s have a look at that on maxscale1 now. The first thing to do is to remove all default configuration settings, except the [maxscale] section, and then we add the servers

[server1]
type=server
address=galera1
port=3306
priority=1
proxy_protocol=true

[server2]
type=server
address=galera2
port=3306
priority=2
proxy_protocol=true

[server3]
type=server
address=galera3
port=3306
priority=3
proxy_protocol=true

Most of this should be pretty clear: the priority determines which server is selected as the primary, and the node running with the lowest priority value is picked. Using priority isn’t strictly necessary, but it is often useful. After this we set up a monitor that checks the status of these servers; the monitor in this case is galeramon. For server priorities to be used, this has to be explicitly enabled.

[galera-cluster]
type=monitor
module=galeramon
servers=server1,server2,server3
use_priority=true
user=maxuser
password="6MT7|EmZ"
monitor_interval=2s

I will reference this monitor as the cluster further on, so I name it after the cluster itself. The next step is to configure the service, using the readwritesplit router module.

[galera-cluster-service]
type=service
router=readwritesplit
cluster=galera-cluster
user=maxuser
password="6MT7|EmZ"

Finally, we need to configure a listener that listens for connections, authenticates users, passes queries to the router and handles the results coming back

[galera-cluster-listener]
type=listener
service=galera-cluster-service
protocol=mariadbprotocol
port=3306

There is one more thing we have to do before we can start MariaDB MaxScale, and that is to create the MariaDB Enterprise Server user that MariaDB MaxScale needs. There is an open source tool that reads the MariaDB MaxScale configuration file and generates the grants for you (see https://sourceforge.net/projects/maxgrant/), but in this case you need the following (note that the same grants are needed for the maxscale2 node also).

CREATE USER IF NOT EXISTS 'maxuser'@'maxscale1' IDENTIFIED BY '6MT7|EmZ';
GRANT SELECT ON mysql.user TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.db TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.tables_priv TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.columns_priv TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.procs_priv TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.proxies_priv TO 'maxuser'@'maxscale1';
GRANT SELECT ON mysql.roles_mapping TO 'maxuser'@'maxscale1';
GRANT SHOW DATABASES ON *.* TO 'maxuser'@'maxscale1';
GRANT CONNECTION ADMIN ON *.* TO 'maxuser'@'maxscale1';

And finally, with this in place, we can start MariaDB MaxScale on maxscale1

$ sudo systemctl start maxscale

With MariaDB MaxScale running, check that the Cluster is running normally

Check that Cluster is running normally
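
One way to do this from the command line is with the maxctrl utility; maxctrl list servers should show all three Galera nodes in the Running state, with one of them selected as Master and the other two as Slaves (the exact output format depends on the MaxScale version)

$ maxctrl list servers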

Testing connecting to MariaDB MaxScale

With this in place, let us now try connecting using MariaDB MaxScale

$ mariadb -h maxscale1 -u appuser -p appdb
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.6.19-15-MariaDB-enterprise MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> SELECT * FROM customer;
+---------+-----------+
| cust_id | cust_name |
+---------+-----------+
|       1 | John Doe  |
|       2 | Jane Doe  |
+---------+-----------+
2 rows in set (0.002 sec)

So we seem to be up and running now, but we still have to make MariaDB MaxScale itself highly available.

Setting up the second MariaDB MaxScale instance

To make MariaDB MaxScale itself highly available we first have to install the second instance of MariaDB MaxScale on maxscale2 like this

$ sudo yum install -y maxscale

MariaDB MaxScale has the ability to synchronize configuration between instances, and we are going to use this here. The basic configuration will initially be the same on both instances of MariaDB MaxScale, but as we are using server priorities, these need to be kept in sync between the instances should we change them.

With MariaDB MaxScale installed on maxscale2, we could configure it, but we should prepare for synchronization first. Configuration synchronization works by storing the configuration in a table in the monitored cluster, so we need a user and a password for this. The table is always called maxscale_config and by default it resides in the mysql database. Both instances of MaxScale need access to this table, so on galera1 we do this

$ sudo mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 25
Server version: 10.6.19-15-MariaDB-enterprise MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> CREATE OR REPLACE USER 'maxsyncuser'@'maxscale1'
  IDENTIFIED BY '~5U0hN^G';
Query OK, 0 rows affected (0.008 sec)

MariaDB> CREATE OR REPLACE USER 'maxsyncuser'@'maxscale2'
  IDENTIFIED BY '~5U0hN^G';
Query OK, 0 rows affected (0.008 sec)

MariaDB> GRANT ALL ON mysql.maxscale_config TO 'maxsyncuser'@'maxscale1';
Query OK, 0 rows affected (0.008 sec)
MariaDB> GRANT ALL ON mysql.maxscale_config TO 'maxsyncuser'@'maxscale2';
Query OK, 0 rows affected (0.008 sec)

The next step is to configure MariaDB MaxScale on maxscale1 to use this, so open the file /etc/maxscale.cnf with an editor and edit the [maxscale] section to look like this

[maxscale]
threads=auto
config_sync_cluster=galera-cluster
config_sync_user=maxsyncuser
config_sync_password="~5U0hN^G"

After this restart MariaDB MaxScale on maxscale1

$ sudo systemctl restart maxscale

Now we can move back to maxscale2, copy the /etc/maxscale.cnf file from maxscale1 to /etc/maxscale.cnf on maxscale2, and then start MariaDB MaxScale on maxscale2

$ sudo systemctl restart maxscale

And then we can check that MariaDB MaxScale on maxscale2 has found the cluster

Check that MariaDB MaxScale has found the cluster
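
You can also ask MariaDB MaxScale itself about configuration synchronization; maxctrl show maxscale prints the global MaxScale configuration and status and, with config_sync_cluster set, includes information about the configuration sync state and version (the exact fields shown depend on the MaxScale version)

$ maxctrl show maxscale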

 

Failing over primary server by modifying server priority

MariaDB MaxScale allows configuration to be changed online, without a restart, either through the maxctrl utility, the REST API or the GUI. Here we are going to use the maxctrl utility, but first we will check the current value. As explained above, we are using server priority to determine which server is the primary, so let’s see what the priority is for the current primary, server1. There are a couple of ways of doing this; one is to use the maxctrl tool, so on maxscale1 run

$ maxctrl show server server1

This shows all the information for server1, and in the Parameters section you will find the priority value

priority value

If you want to extract just the priority value, you can use the REST API instead. As this returns JSON, the jq tool comes in handy, so we will use curl to get the data and then jq to parse it and extract the priority value

$ curl -s -u admin:mariadb http://localhost:8989/v1/servers \
 | jq '.data[] | select(.attributes.name == "server1")' \
 | jq '.attributes.parameters.priority'
1

Here we get the priority for server1, and it is 1 as expected. We can then set the priority to a higher value so that a different server becomes the primary; we use the maxctrl utility for this and then the API to check it

$ maxctrl alter server server1 priority=10
OK
$ curl -s -u admin:mariadb http://localhost:8989/v1/servers \
 | jq '.data[] | select(.attributes.name == "server1")' \
 | jq '.attributes.parameters.priority'
10

So now MariaDB MaxScale should have selected a different server as the primary. Let’s have a look

MaxScale should have selected a different server as the primary

Then there is just one more thing to check, and that is to ensure that synchronization between the two MariaDB MaxScale instances works as expected, so pop over to maxscale2 and check the server status there; it should look the same as on maxscale1.
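
Since we have now made a runtime configuration change, the synchronized configuration should also be stored in the cluster itself, in the mysql.maxscale_config table that was granted to maxsyncuser above. If you are curious, you can have a look at it on any of the Galera nodes (the exact columns depend on the MaxScale version)

$ sudo mariadb -u root
MariaDB> SELECT * FROM mysql.maxscale_config\G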

Failing over between MariaDB MaxScale instances

What remains now is to check how you can fail over between the MariaDB MaxScale instances themselves, and this is best handled by the connector in the application. The example we will look at here uses Java and MariaDB Connector/J

import java.sql.*;

public class TestAws {
   public static void main(String[] args) {
      try {
         Connection conn;
         Statement stmt;
         ResultSet rs;

         // The sequential failover mode tries the listed hosts in order:
         // maxscale1 first, then maxscale2 if maxscale1 is unreachable.
         conn = DriverManager.getConnection(
           "jdbc:mariadb:sequential://maxscale1,maxscale2/appdb",
           "appuser", "OGLom+3J");
         stmt = conn.createStatement();
         for(int i = 0; i < 1000; i++) {
            try {
               rs = stmt.executeQuery(
                 "SELECT cust_id, cust_name FROM customer");
               while(rs.next()) {
                  System.out.println("id: " + rs.getString("cust_id")
                    + " name: " + rs.getString("cust_name"));
               }
               rs.close();
            } catch(SQLException e) {
               // A query that fails during failover is just reported here;
               // the next loop iteration retries it.
               System.out.println("MariaDB Error: " + e.getMessage());
            }
            Thread.sleep(1000);
         }
      } catch(Exception e) {
         e.printStackTrace();
      }
   }
}

Here we rely on the JDBC driver to reconnect when one of the MariaDB MaxScale servers fails; all the application needs to do is retry the failed operation. For more information on how to set up MariaDB Connector/J, see https://mariadb.com/kb/en/about-mariadb-connector-j/
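
To run this test program, MariaDB Connector/J needs to be on the classpath. Assuming the Connector/J jar has been downloaded into the current directory (the jar file name below is just an example and depends on the version you download), compiling and running it looks something like this

$ javac TestAws.java
$ java -cp .:mariadb-java-client-3.4.1.jar TestAws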

Conclusion

You can get better high availability with MariaDB running on EC2 than with AWS RDS for MariaDB. In addition, it doesn’t have to be more expensive: although you need extra servers, you get read scale-out and high availability built into the same set of servers. Failover times will also be very fast, so all in all, this should be a good setup.

Note that what I have configured above is the bare minimum to achieve High Availability with MariaDB Galera Cluster and MariaDB MaxScale. You will need to tune the MariaDB Galera servers for performance, and MariaDB MaxScale might need some tuning in terms of failover behavior and performance.

Happy SQLing
/Karlsson