Goodbye Replication Lag!

One of my favorite new features of MariaDB 10 is parallel slave replication ( ). With earlier versions of MariaDB and MySQL, the slave runs in a single thread, and applies replication events serially. This was a performance bottleneck in highly concurrent environments, as transactions are able to utilize separate CPU cores on the master, while the slave could only use one CPU core to execute the replication events.

MySQL 5.6 tried to help with this with its version of multi-threaded slave, but it only worked for replication events that happened in different databases/schemas. You wouldn’t be able to benefit from MySQL 5.6‘s multi-threaded replication if you only used one database.

MariaDB 10‘s parallel replication utilizes group commit ( ) to know which replication events can be applied in parallel on the slave. Since they were able to commit together on the master, they were executed in parallel on the master and thus can execute in parallel on the slave. This applies to all events, not just the ones in different databases. You can benefit from multi-threaded slaves even if you’re using just one table.

Just as as simple demonstration, I set up a simple master-slave environment. One master with two identically configured slaves, all running MariaDB 10.0.12. The master is running an OLTP sysbench benchmark with 16 concurrent connections. One slave uses the default setting of just a single threaded slave, and we can quickly see seconds behind master increasing under this load. The other slave is configured to use four threads, and we only see few times it lags behind.

Going up!

Enabling this feature is easy. All you need to do is set slave_parallel_threads (… ). This is a dynamic variable, so you do not need to stop your database but you will need to stop the SQL slave thread first:

SET GLOBAL slave_parallel_threads = 4;
SELECT @@slave_parallel_threads;

You may not want to utilize all the CPU cores on the slave host for parallel replication. A common use case for replication is having a reporting server to run large SELECT queries, where the need for the most up-to-date data is not necessary (batch processing of the previous day, for example). So you will want to be aware of the resources available on the host and tune appropriately.

There are other ways to tune performance with parallel replication. For example, you can tune how long the master can wait (in microseconds) for other transactions before writing them to the binary log. We’ll cover the tuning parameters in a later post.

It’s time to say goodbye to replication lag 🙂