Five things you must know about parallel replication in MariaDB 10.x

When MariaDB 10.0 went GA in 2014, it introduced a major feature: Parallel Replication. Parallel Replication is a fantastic addition to the long list of new MariaDB features, along with Global Transaction IDs. However, I don’t think many people understand this feature well or know how to use it to its full potential. This blog post explains a few of the gotchas you may run into while setting up Parallel Replication.

1. Single-thread replication can be slower in MariaDB 10.x than in MariaDB 5.5

The addition of new features in a new branch often comes at a cost. If you have a heavy replication workload, you may find that single-threaded replication in MariaDB 10.x is somewhat slower than its counterpart in MariaDB 5.5. Thankfully, that can be counterbalanced by setting up Parallel Replication, and the benefits will be much bigger than what you’d get by sticking with 5.5.

2. Parallel replication is not enabled by default

Newsflash. You have to enable it, for example in your my.cnf (or with SET GLOBAL, since the variable is dynamic).

slave_parallel_threads = 8

I’ve seen customers set it to unnecessarily large values, like 24 or 32. Don’t do that (unless you have 64 CPU cores). You might introduce more contention, and the potential for parallel replication is usually not that big, which leads us to our third bullet point.
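As a minimal sketch of the SET GLOBAL route mentioned above: although the variable is dynamic, the replication threads on the slave have to be stopped while it is changed. The value 8 is simply the one used in this post, not a recommendation for your hardware.

```sql
-- Run on the slave. The slave threads must be stopped
-- before slave_parallel_threads can be changed.
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 8;
START SLAVE;

-- Confirm the new value.
SHOW GLOBAL VARIABLES LIKE 'slave_parallel_threads';
```

A value set this way does not survive a server restart, so keep the my.cnf entry as well.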

3. There needs to be a potential for parallel replication

I won’t go into the specifics of parallel replication in depth (I’ll point to a couple of resources at the end of this blog post). Basically, you must have group commit happening on the master, in other words, write transactions that are committed to the binary log as a group. To check whether group commit is happening, look up the following two status variables on the master:

MariaDB(db-01)[(none)]> show global status like 'binlog_%commits';
+----------------------+------------+
| Variable_name        | Value      |
+----------------------+------------+
| Binlog_commits       | 3790021298 |
| Binlog_group_commits | 3000740090 |
+----------------------+------------+

The bigger the difference between these two values, the more group commits are happening. In the example above, there is one group commit for every 1.26 transactions on average (3790021298 / 3000740090): not fantastic, but still OK.
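If you prefer to get the ratio directly, the following sketch computes it from information_schema.GLOBAL_STATUS, which MariaDB 10.x exposes. The column alias is my own naming. With the sample values above it comes out at about 1.26.

```sql
-- Run on the master. Average number of transactions per group commit:
-- the closer to 1, the less potential for parallel replication.
SELECT
    MAX(IF(VARIABLE_NAME = 'BINLOG_COMMITS',
           CAST(VARIABLE_VALUE AS UNSIGNED), NULL))
  / MAX(IF(VARIABLE_NAME = 'BINLOG_GROUP_COMMITS',
           CAST(VARIABLE_VALUE AS UNSIGNED), NULL))
    AS transactions_per_group_commit
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('BINLOG_COMMITS', 'BINLOG_GROUP_COMMITS');
```

Keep in mind that both counters are cumulative since server start, so the result is a long-term average. To judge the current workload, sample the counters twice and compare the deltas.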

4. You can make parallel replication faster by making commits slower

To increase group commit potential, you might actually need to make commits slower on the master. Don’t close this blog yet by calling me crazy. We can actually introduce a very small amount of latency on write transactions by using two configuration variables: binlog_commit_wait_count and binlog_commit_wait_usec.

The first variable, binlog_commit_wait_count, delays a commit until the given number of transactions are ready to commit together. The second variable, binlog_commit_wait_usec, caps that delay: it is the maximum time, in microseconds, that those transactions will wait. With these two variables, you can fine-tune group commit quite precisely while taking a negligible performance hit on write commit time.

Of course, you have to start with sensible values and increase or decrease them as needed. Typical values for testing are binlog_commit_wait_count=10 and binlog_commit_wait_usec=5000. This means that a commit waits until 10 transactions are ready to commit together, or until 5 ms have passed, whichever comes first. As you can guess, a 5 ms delay is not an issue for most real-world applications.
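Both variables are dynamic, so you can experiment on a running master without a restart. A minimal sketch using the test values above (starting points, not tuned recommendations):

```sql
-- Run on the master: wait for up to 10 transactions,
-- but never longer than 5000 microseconds (5 ms).
SET GLOBAL binlog_commit_wait_count = 10;
SET GLOBAL binlog_commit_wait_usec  = 5000;

-- Then watch how the group commit counters evolve.
SHOW GLOBAL STATUS LIKE 'binlog_%commits';
```

Once you are happy with the values, add them to my.cnf so they persist across restarts.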

5. You can make things even better with GTID and Domain Identifiers

Parallel Replication works without GTID, but turning on GTID has major advantages. Among other nice features like crash-safe replication and easy topology management, you can use multiple replication streams, each identified by a different Domain ID. Imagine you have two totally unrelated applications on your server. You just have to implement the following in your code:

Application ACME -> SET gtid_domain_id=1
Application BETA -> SET gtid_domain_id=2

By using different Domain Identifiers, you tell the slaves that those two streams of transactions are independent, so they can be applied out of order relative to each other, making use of different replication threads on the slaves.
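In practice, each application sets the domain at the start of its connection, outside of any transaction. The sketch below assumes hypothetical acme.orders and beta.events tables, and that the application user has the privilege needed to change gtid_domain_id (SUPER in MariaDB 10.0).

```sql
-- Connection opened by application ACME
SET SESSION gtid_domain_id = 1;
INSERT INTO acme.orders (id, total) VALUES (1, 9.99);    -- hypothetical table

-- Connection opened by application BETA
SET SESSION gtid_domain_id = 2;
INSERT INTO beta.events (id, name) VALUES (1, 'signup'); -- hypothetical table

-- On the master, the binlog position now lists one GTID per domain.
SELECT @@GLOBAL.gtid_binlog_pos;
```

This is only safe when the two applications never touch the same data, because the slave no longer preserves commit order between the domains.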

Bibliography
