Multi-Master Ring Replication
The terms master and slave have historically been used in replication, and MariaDB has begun the process of adding primary and replica synonyms. The old terms will continue to be used to maintain backward compatibility - see MDEV-18777 to follow progress on this effort.
Contents
What is Multi-Master Ring Replication
Multi-Master "Ring" Replication means that you have two or more masters where each master is replicating its data to another master asynchronously. This is not to be confused with MariaDB Galera Cluster which is a virtually synchronous multi-primary cluster for MariaDB.
The benefit of asynchronous replication compared to Galera Cluster, is that Ring Replication is resilient against bad network connections between the master servers. If a connection goes down, all masters will continue to serve its clients locally and data will automatically be synced when the link is available again.
The following picture shows one of the more advanced Multi-Master setups that is resilient against any master going down but can also handle 'human failures', like an accidental drop table, thanks to the addition of delayed slaves.
One should setup replication on each master like one does in standard MariaDB replication. The replication setup among the masters should be a ring. In other words, each master should replicate to one other master and each master should only have one other master as a slave.
Each master can of course have one or more normal slaves. A master could also be a slave of another master that is not in the ring setup.
All MariaDB servers support Multi-Master Ring Replication. In fact, when MySQL replication was originally designed around the year 2000, it was to be a Multi-Master Ring Replication solution for Yahoo to replicate from the East Coast to the West Coast.
Configuring the Masters
First, follow the instructions in setup replication. The main thing to remember is to use the master_use_gtid=current_pos option for CHANGE MASTER.
The main things that are different for Multi-Master Ring Replication are:
- Give every master and slave in the replication setup a unique server_id. This can be a number from 1 to 4294967295 or 1-255 if one is using uuid_short(). It is a good practice to ensure that you do not have any servers in your system with the same server_id!
- Use global transaction id (as described above)
- Give each master a unique gtid_domain_id. This will allow replication to apply transactions from a different master in parallel independent from other masters.
Add the following into your my.cnf file for all masters and restart the servers.
[mariadb] # Replace the following with a unique ID. All slaves of this master should have the same # gtid_domain_id to allow easy failover to a slave if needed gtid_domain_id=1 # # Let us assume there will never be more than 10 masters in a Multi-Master ring setup auto_increment_increment=10 # Set this to a different value 1-10 for each master. Could be the same as gtid_domain_id # This is to ensure that all masters generate different values for AUTO_INCREMENT keys. auto_increment_offset=1 # # The following is needed to ensure the ALTER TABLE on another master will not # break ring replication slave_type_conversions=ALL_NON_LOSSY,ALL_LOSSY # # We cannot use semi-sync in Ring Replication as the masters need to be resilient against # bad connections rpl_semi_sync_master_enabled=0 # # We have to log updates from other masters to the binary log. log_slave_updates
Limitations when using Ring Replication
- MariaDB does not yet support conflict resolution for conflicting changes. It is up to the application to ensure that there is never a conflicting insert/update/delete between the masters. The easiest setup is having each master server work on a different database or table. If not, one must:
- Ensure you have an id (master-unique-id) for each row that unequally identifies the master who is responsible for this row. This should preferably be short and part of the primary key in each table. A good value for this would be the
gtid_domain_id
as this is unique for each local cluster. - Never insert rows with
PRIMARY KEY
orUNIQUE KEY
values that can be same on another master. This can be avoided by- Have the master-unique-id part of all primary and unique keys.
- In case of AUTO_INCREMENT keys, have a different value for auto_increment_offset on each master.
- Use uuid_short() to generate unique values, like in
create table t1 (a bigint unsigned default(uuid_short()) primary key)
. Note that if one is using uuid_short() in Multi-Master ring replication, one can only useserver_id
in the range 1-255!
- Ensure that UPDATE and DELETE on each master only update rows generated by this master.
- Ensure you have an id (master-unique-id) for each row that unequally identifies the master who is responsible for this row. This should preferably be short and part of the primary key in each table. A good value for this would be the
- If several masters are constantly generating and updating rows for common tables, one has to be extra careful with
ALTER TABLE
to ensure that any change one does will not cause conflicts when theALTER TABLE
is replicated to other servers. In particular one has to ensure that all masters and their slaves are configured withslave_type_conversions=ALL_NON_LOSSY,ALL_LOSSY
. - The
server_id
should be unique for each server. One should not change theserver_id
of an active master, as the ID is used by the master to recognize its own events and stop them from replicating endlessly around the ring (seereplicate_same_server_id
).
How does Multi-Master Ring Replication work
- The main difference between Multi-Master Ring Replication and normal replication is that a change done by a master will eventually replicate back to it. When this happens, the master will see that the binary log event has the same server_id as the master has and will ignore the event. This is why it is critical to ensure that all server_id's are unique and that one does not change the server id.
How to resolve things if they go wrong in Multi-Master Ring Replication
When used correctly, Multi-Master Ring Replication is as resilient to errors as normal MariaDB master-slave replication. If the connection goes down, the replication will stop and will automatically continue when the connection resumes.
What to do when one of the masters dies and has to be replaced by a slave.
- Ensure that the slave is up to date (has digested all relay events).
- Check if there are any events on the old master that have not been sent to the slave. You can use mariadb-binlog to read the old master binary log files and apply them to the slave.
- You can now treat the slave as a new master and put it back in the replication ring. The new master will use its replication GTID position to continue replication from the other master in the ring.
If the slave is not up to date and one cannot access any information of the old master, then one can continue the following way:
- Enable the option
--gtid-ignore-duplicates
on the servers. - Add the slave to the replication ring.
- The two masters (one of which is the old slave now added to the ring) will each replicate the events they are missing from one another. The
--gtid-ignore-duplicates
option is needed to allow the two masters in the ring to start replicating from each other when each server is ahead of the other in one domain and behind in another.
Error applying events
As long as each master handles their own set of data, as described above, there should not be any conflicting data coming from the other master.
If there are conflicts, one should resolve them as one resolves issues with normal replication. The most common way to solve issues is to skip the conflicting log events with SET GLOBAL SQL_SLAVE_SKIP_COUNTER.
Handling duplicate key errors and other conflicts
If things are setup correctly as described earlier, one should never get duplicate key errors in Multi-Master ring replication. Any duplicate key error or data mismatch is usually an application error where it inserts/updates or deletes something it should not have the right to do.
To fix this:
- Use SET GLOBAL SQL_SLAVE_SKIP_COUNTER to skip the error.
- Use
mariadb-binlog --base64-output=decode-rows --verbose --start-position=# binlog-name
to see what is missing and apply the changes missing on the server (minus the conflict).
Multi-Master Ring Replication through slaves
An alternative setup to use for Multi-master ring replications is to replicate to the other masters through slaves. The following setup shows how this can be done.
Benefits of replication through slaves
- Slightly less load on the master
- The slave will never be out of sync compared to other master. This makes failover from master to slave much easier.
- One can use semisynchronous-replication between master and slave.
- Slave configuration is consolidated to one place. That is, if a master's immediate slave and replicating master should share the same configuration (e.g. rewrite rules, filters, etc), only the immediate slave needs them configured, as the replicating master will pull in the effects inherently.
Disadvantages of replication through slaves
- There will be a slightly longer delay for the data to hit the next master as it has to go trough the slave. This can be notable if there is a very large transaction executed on the master.
- If the master OR the slave dies, the replication to other masters will stop.
- A replicating master is subject to the configuration of a slave (e.g. transactions may be incorrectly filtered out).
Setting things up
- Setup is identical to Multi-Master Ring Replication, except that rpl_semi_sync_master_enabled=0 is not required.
What to do when one of the masters dies and has to be replaced by a slave.
- Promote the slave to be the new master (remember to disable replication from the promoting slave to the failed master). Note MaxScale can do this step automatically
- Move replication from the other master's slave to the new master.
- Add a new slave for the newly promoted master (note in some cases, the failed master can be re-used as the slave if recovered, e.g. for semi-sync setups, the old master can be re-used as the slave if re-started with
--init-rpl-role=SLAVE
during recovery; and for non-semi-synchronous setups, option CHANGE MASTER TO MASTER_DEMOTE_TO_SLAVE=1 can be used). - Move replication from the other master to the new slave.