High Availability with Multi-Source Replication in MariaDB Server

Multi-source replication is a handy way to simplify your high availability database setup – especially as compared to the regular MySQL replication approach. Let’s look at how it works.

(Note: This post is a revision of one from 2014, with minor updates made.)

The Basics

For our example scenario, we’ll use a three-server topology, which is commonly employed for simple failover. (I used the same setup in my post on how to enable GTIDs in MariaDB Server.) A is the active master, B is a standby master set up to replicate from A, and C is a multi-purpose slave replicating from A. I also set up A to replicate from B. This way, if A fails, the system will start writing to B, and once A comes back up it will recover all the transactions from B through replication. Typically this setup is used in combination with a tool like Master High Availability Manager (MHA) to migrate the slave from one master to the next.

For this particular case, assuming that the application(s) write to the active master only, we’re going to set the domain ID for all servers to 1.

SET GLOBAL gtid_domain_id = 1;

We’ll do the same in the my.cnf files:

gtid-domain-id=1

Set Up the Slave to Replicate from Two Masters

With the basics taken care of, we now will set up server C to replicate from both server A (active master) and server B (standby master). Keep in mind that now each transaction that reaches server C will come from two different sources. For example, if you issue an INSERT on the active master A, it will replicate to C and B. Since C is also B’s slave, the same transaction will replicate from B to C. By default, C will apply both transactions, which is redundant and can lead to data inconsistencies and errors. To avoid this, it is necessary to set the gtid_ignore_duplicates variable to ON.

SET GLOBAL gtid_ignore_duplicates=ON;

And do the same in the my.cnf files:

gtid-ignore-duplicates=ON

This way, when C receives the same transaction through two different slave connections, it will verify the GTID and only apply it once.

The next step is to set the second slave connection. In my previous article, I set up the gtid_slave_pos global variable. Since server C is already replicating successfully from server A, this variable already has the proper value and there is no need to set it up manually. We only need to define the new connection using CHANGE MASTER TO:

CHANGE MASTER 's-102' TO  MASTER_HOST='192.168.56.102', 
 MASTER_USER='repl', MASTER_PASSWORD='repl',
 master_use_gtid=slave_pos;

Notice that we add the label ‘s-102’ after the MASTER reserved word. This string identifies the second replication connection. This identifier needs to be used for every replication command. So, to start replicating from server B, you would use START SLAVE like this:

START SLAVE 's-102';

In the same way, to check the status of this replication connection you would use:

SHOW SLAVE 's-102' STATUSG

You can also issue the commands to all replication connections by using the keyword ALL; for example:

SHOW ALL SLAVES STATUSG

STOP ALL SLAVES;

START ALL SLAVES;

Notice that when using the keyword ALL, you also have to use SLAVES (plural form). The STOP and START commands will show as many warnings as the number of replication connections:

MariaDB [(none)]> START ALL SLAVES;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

MariaDB [(none)]> SHOW WARNINGS;
+-------+------+-----------------------+
| Level | Code | Message               |
+-------+------+-----------------------+
| Note  | 1937 | SLAVE 's-102' started |
| Note  | 1937 | SLAVE '' started      |
+-------+------+-----------------------+
2 rows in set (0.00 sec)

After completing the setup, the resulting topology will look similar to this:

+-----+         +-----+
|  A  | <- - -> |  B  |
+-----+         +-----+
   |               |
                  /
                 /
     V           V
        +-----+
        |  C  |
        +-----+

Failures

If server A goes away or the connection between server A and server C fails, server C will still receive all the transactions through the alternate replication connection to server B (labeled ‘s-102’). Once server A comes back online again, and/or server C can reconnect to it, server C will compare all the GTIDs coming from server A and ignore those which have already been applied.

The advantage of using this approach over the traditional single-source replication is that now there is no need to re-source the slaves. When the active master fails and the applications start writing to the alternative masters, the slaves will continue to replicate from it, simplifying the management of failover scenarios.