Multi-Master Ring Replication

The terms master and slave have historically been used in replication, and MariaDB has begun the process of adding primary and replica synonyms. The old terms will continue to be used to maintain backward compatibility - see MDEV-18777 to follow progress on this effort.

What is Multi-Master Ring Replication

Multi-Master "Ring" Replication means that you have two or more masters where each master is replicating its data to another master asynchronously. This is not to be confused with MariaDB Galera Cluster which is a virtually synchronous multi-primary cluster for MariaDB.

The benefit of asynchronous replication compared to Galera Cluster, is that Ring Replication is resilient against bad network connections between the master servers. If a connection goes down, all masters will continue to serve its clients locally and data will automatically be synced when the link is available again.

The following picture shows one of the more advanced Multi-Master setups that is resilient against any master going down but can also handle 'human failures', like an accidental drop table, thanks to the addition of delayed slaves.

One should setup replication on each master like one does in standard MariaDB replication. The replication setup among the masters should be a ring. In other words, each master should replicate to one other master and each master should only have one other master as a slave.

Each master can of course have one or more normal slaves. A master could also be a slave of another master that is not in the ring setup.

All MariaDB servers support Multi-Master Ring Replication. In fact, when MySQL replication was originally designed around the year 2000, it was to be a Multi-Master Ring Replication solution for Yahoo to replicate from the East Coast to the West Coast.

Configuring the Masters

First, follow the instructions in setup replication. The main thing to remember is to use the master_use_gtid=current_pos option for CHANGE MASTER.

The main things that are different for Multi-Master Ring Replication are:

  • Give every master and slave in the replication setup a unique server_id. This can be a number from 1 to 4294967295 or 1-255 if one is using uuid_short(). It is a good practice to ensure that you do not have any servers in your system with the same server_id!
  • Use global transaction id (as described above)
  • Give each master a unique gtid_domain_id. This will allow replication to apply transactions from a different master in parallel independent from other masters.

Add the following into your my.cnf file for all masters and restart the servers.

[mariadb]
# Replace the following with a unique ID. All slaves of this master should have the same
# gtid_domain_id to allow easy failover to a slave if needed
gtid_domain_id=1
#
# Let us assume there will never be more than 10 masters in a Multi-Master ring setup
auto_increment_increment=10
# Set this to a different value 1-10 for each master. Could be the same as gtid_domain_id
# This is to ensure that all masters generate different values for AUTO_INCREMENT keys.
auto_increment_offset=1
#
# The following is needed to ensure the ALTER TABLE on another master will not
# break ring replication
slave_type_conversions=ALL_NON_LOSSY,ALL_LOSSY
#
# We cannot use semi-sync in Ring Replication as the masters need to be resilient against
# bad connections
rpl_semi_sync_master_enabled=0
#
# We have to log updates from other masters to the binary log.
log_slave_updates

Limitations when using Ring Replication

  • MariaDB does not yet support conflict resolution for conflicting changes. It is up to the application to ensure that there is never a conflicting insert/update/delete between the masters. The easiest setup is having each master server work on a different database or table. If not, one must:
    • Ensure you have an id (master-unique-id) for each row that unequally identifies the master who is responsible for this row. This should preferably be short and part of the primary key in each table. A good value for this would be the gtid_domain_id as this is unique for each local cluster.
    • Never insert rows with PRIMARY KEY or UNIQUE KEY values that can be same on another master. This can be avoided by
      • Have the master-unique-id part of all primary and unique keys.
      • In case of AUTO_INCREMENT keys, have a different value for auto_increment_increment on each master.
      • Use uuid_short() to generate unique values, like in create table t1 (a bigint unsigned default(uuid_short()) primary key). Note that if one is using uuid_short() in Multi-Master ring replication, one can only use server_id in the range 1-255!
    • Ensure that UPDATE and DELETE on each master only updates rows generated by this master.
  • If several masters are constantly generating and updating rows for common tables, one has to be extra careful with ALTER TABLE to ensure that any change one does will not cause conflicts when the ALTER TABLE is replicated to other servers. In particular one has to ensure that all masters and their slaves are configured with slave_type_conversions=ALL_NON_LOSSY,ALL_LOSSY.
  • One should not change server_id for the lifetime of the server. If a master is replaced by a slave, one should change the slave's server_id to the master's before reconnecting replication. This is needed to ensure that the new master can ignore events created by the slaves old master.

How does Multi-Master Ring Replication work

  • The main difference between Multi-Master Ring Replication and normal replication is that a change done by a master will eventually replicate back to it. When this happens, the master will see that the binary log event has the same server_id as the master has and will ignore the event. This is why it is critical to ensure that all server_id's are unique and that one does not change the server id.

How to resolve things if they go wrong in Multi-Master Ring Replication

When used correctly, Multi-Master Ring Replication is as resilient to errors as normal MariaDB master-slave replication. If the connection goes down, the replication will stop and will automatically continue when the connection resumes.

What to do when one of the masters dies and has to be replaced by a slave.

  • Ensure that the slave is up to date (has digested all relay events).
  • Check if there are any events on the old master that have not been sent to the slave. You can use mariadb-binlog to read the old master binary log files and apply them to the slave.
  • Update the slaves server_id to the old masters server_id. This will ensure that the slave will ignore any old events from its upcoming master sent by the old master.
  • You can now treat the slave as a new master and put it back in the replication ring. Any events from the old master will be ignored.
  • If the new master comes up, set its server_id to the old slaves before connecting it the the new master.

In the unfortunate case where the slave is not up to date and one cannot access any information of the old master, then one can continue the following way:

  • Do not change the server_id of the slave!
  • Add the slave to the replication ring and wait until all the events that the old master sent to the ring are replicated to the slave. One can use mariadb-binlog to monitor the relay logs for this.
  • Stop the slave.
  • Change server_id to the old masters server_id.
  • Start the slave (as a new master)
  • Now you can allow applications to connect to the new master and do updates.
    • This could be allowed earlier if one can be sure that nothing previously sent to the ring will conflict with new changes sent to the new master.

Missing event after failover

There is a possibility that an event that was sent to the other masters but never was sent or applied to the slave.

In this case, the new master will be missing some information that exists on the other slaves. This data will be sent to the new master from the ring, but it will ignored as the server_id's are the same.

In a failover event, is it good to monitor the binary log of the new masters master and see if there are any events newer than the new master's last event that needs to be applied

Error applying events

As long as each master handles their own set of data, as described above, there should not be any conflicting data coming from the other master.

If there are conflicts, one should resolve them as one resolves issues with normal replication. The most common way to solve issues is to skip the conflicting log events with SET GLOBAL SQL_SLAVE_SKIP_COUNTER.

Handling duplicate key errors and other conflicts

If things are setup correctly as described earlier, one should never get duplicate key errors in Multi-Master ring replication. Any duplicate key error or data mismatch is usually an application error where it inserts/updates or deletes something it should not have the right to do.

To fix this:

  • Use SET GLOBAL SQL_SLAVE_SKIP_COUNTER to skip the error.
  • Use mariadb-binlog --base64-output=decode-rows --verbose --start-position=# binlog-name to see what is missing and apply the changes missing on the server (minus the conflict).

See also

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.