MariaDB starting with 10.0.5

Parallel Replication was introduced in MariaDB 10.0.5.

MariaDB 10.0 can execute some queries in parallel on the slave. This article will explain how it works and how you can tune it.

Parallel replication overview

MariaDB replication in general takes place in three parts:

  • Replication events are read from the master by the IO thread and queued in the relay log
  • Replication events are fetched one at a time by the SQL thread from the relay log
  • Each event is applied on the slave to replicate all changes done on the master.

Before MariaDB 10, the third step was also performed by the SQL thread; this meant that only one event could be executing at a time, and replication was essentially single-threaded. In MariaDB 10, the third step can optionally be performed by a pool of separate replication worker threads, and thereby potentially increase replication performance by applying multiple events in parallel.

How to enable parallel slave

To enable, specify slave-parallel-threads=# in your my.cnf file as an argument to mysql.

The value (#) specifies how many threads will be created in a pool of worker threads used to apply events in parallel for *all* your slaves (this includes multi-source replication). If the value is zero, then no worker threads are created, and old-style replication is used where events are applied inside the SQL thread. Usually the value, if non-zero, should be at least two times the number of multi-source master connections used. It makes little sense to use only a single worker thread for one connection; this will incur some overhead in inter-thread communication between the SQL thread and the worker thread, but with just a single worker thread events can not be applied in parallel anyway.

slave-parallel-threads=# is a dynamic variable that can be changed without restarting mysqld. All slaves connections must however be stopped when changing the value.

What can be run in parallel

Parallel replication can be in-order or out-of-order:

  • In-order executes transactions in parallel, but orders the commit step of the transactions to happen in the exact same order as on the master. Transactions are only executed in parallel to the extent that this can be automatically verified to be possible without any conflicts. This means that the use of parallelism is completely transparent to the application.
  • Out-of-order can execute and commit transactions in different order on the slave than originally on the master. This means that the application must be tolerant to seeing updates occur in different order. The application is also responsible for ensuring that there are no conflicts between transactions that are replicated out-of-order. Out-of-order is only used in GTID mode and only when explicitly enabled by the application, using the replication domain that is part of the GTID.

In-order parallel replication

In-order parallel replication uses the group commit on the master to discover potential for parallel apply of events on the slave. If two transactions commit together in a group commit on the master, they are written into the binlog with the same commit id (this can be seen in the GTID events in the output of mysqlbinlog). Such events are certain to not conflict with each other, and they can be scheduled by the parallel replication to run in different worker threads.

Two transactions that were committed separately on the master can potentially conflict (eg. modify the same row of a table). Thus, the worker that applies the second transaction will not start immediately, but wait until the first transaction begins the commit step; at this point it is safe to start the second transaction, as it can no longer disrupt the execution of the first one.

In either case, when the two transactions reach the point where the low-level commit happens and commit order is determined, the two commits are ordered to happen in the same order as on the master, so that operation is transparent to applications.

The opportunities for parallel replication on slaves can be highly increased if more transactions group-commit together on the master. This can be tuned using the binlog_commit_wait_count and binlog_commit_wait_usec variables. If for example the application can tolerate up to 50 milliseconds extra delay for transactions on the master, one can set binlog_commit_wait_usec=50000 and binlog_commit_wait_count=20 to get up to 20 transactions at a time available for replication in parallel. Care must however be taken to not set binlog_commit_wait_usec too high, as this could cause significant slowdown for applications that run a lot of small transactions serially one after the other.

Note that even if there is no parallelism available from the master group commit, there is still an opportunity for speedup from in-order parallel replication, since the actual commit steps of different transactions can run in parallel. This can be particularly effective on a slave with binlog enabled (log_slave_updates=1), and more so if slave is configured to be crash-safe (sync_binlog=1 and innodb_flush_logs_at_trx_commit=1), as this makes group commit possible on the slave.

Out-of-order parallel replication

Out-of-order parallel replication happens (only) when using GTID mode, when GTIDs with different replication domains are used. The replication domain is set by the DBA/application using the variable gtid_domain_id.

Two transactions having GTIDs with different domain_id are scheduled to different worker threads by parallel replication, and are allowed to execute completely independently from each other. It is the responsibility of the application to only set different domain_ids for transactions that are truly independent, and are guaranteed to not conflict with each other. The application must also be able to work correctly even though the transactions with different domain_id are seen as committing in different order between the slave and the master, and between different slaves.

Out-of-order parallel replication can potentially give more performance gain than in-order parallel replication, since the application can explicitly give more opportunities for running transactions in parallel than what the server can determine on its own automatically.

One simple but effective usage is to run long-running statements, such as ALTER TABLE, in a separate replication domain. This allows replication of other transactions to proceed uninterrupted:

SET SESSION gtid_domain_id=1 ALTER TABLE t ADD INDEX myidx(b) SET SESSION gtid_domain_id=0

Normally, a long-running ALTER TABLE or other query will stall all following transactions, causing the slave to become behind the master as least as long time as it takes to run the long-running query. By using out-of-order parallel replication by setting the replication domain id, this can be avoided. The DBA/application must ensure that no conflicting transactions will be replicated while the ALTER TABLE runs.

Another common opportunity for out-of-order parallel replication comes in connection with multi-source replication. Suppose we have two different masters M1 and M2, and we are using multi-source replication to have S1 as a slave of both M1 and M2. S1 will apply events received from M1 in parallel with events received from M2. If we now have a third-level slave S2 that replicates from S1 as master, we want S2 to also be able to apply events that originated on M1 in parallel with events that originated on M2. This can be achieved with out-of-order parallel replication, by setting gtid_domain_id different on M1 and M2.

Note that there are no special restrictions on what operations can be replicated in parallel using out-of-order; such operations can be on the same database/schema or even on the same table. The only restriction is that the operations must not conflict, that is they must be able to be applied in any order and still end up with the same result.

When using out-of-order parallel replication, the current slave position in the master's binlog becomes multi-dimensional - each replication domain can have reached a different point in the master binlog at any one time. The current position can be seen from the variable gtid_slave_pos. When the slave is stopped, restarted, or switched to replicate from a different master using CHANGE MASTER, MariaDB automatically handles restarting each replication domain at the appropriate point in the binlog.

Expected performance gain

Here is an article showing up to ten times improvement when using parallel replication:

Configuration variable --slave-parallel-max-queued

The variable slave_parallel_max_queued is only meaningful when parallel replication is used (slave_parallel_threads>0).

When parallel replication is used, the SQL threads will read ahead in the relay logs, queueing events in memory while looking for opportunities for executing events in parallel. The @@slave_parallel_max_queued variable sets a limit for how much memory the SQL threads will use for read-ahead in the relay logs looking for such opportunities.

Note that @@slave_parallel_max_queued is not a hard limit, since the binlog events that are currently executing always need to be held in-memory.

@@slave_parallel_max_queued is mainly needed when using GTID with different replication domain ids. If the binary log contains first transactions in domain 1 followed by some transactions in domain 2, then parallel replication can execute the domain 2 transactions in parallel with the domain 1 transactions. However, this requires that the SQL thread is able to read ahead of the domain 1 transactions while they are executing so that it can queue the domain 2 transactions for parallel execution. Thus, the total size of the domain 1 transactions must be less than @@slave_parallel_max_queued, or parallel execution will not be possible. On the other hand, a too large @@slave_parallel_max_queued value on a slave that is much behind the master could cause the SQL thread to queue up an excessive amount of events in-memory vainly looking for opportunities for parallelism, which could lead to too high memory consumption.

For parallel replication of transactions that group-committed together on the master, only one (or two when moving to the next group commit) transactions can be queued for one worker at a time. In this case, it is sufficient that @@slave_parallel_max_queued is larger than the event size of two normal transactions.

Configuration variable slave_domain_parallel_threads

The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order.

If one master connection or replication domain is currently processing a long-running query, it is possible that it will allocate all the worker threads in the pool, only to have them wait for the long-running query to complete, stalling any other master connection or replication domain, which will have to wait for a worker thread to become free.

This can be avoided by setting slave_domain_parallel_threads to a number that is lower than slave_parallel_threads. When set different from zero, each replication domain in one master connection can reserve at most that many worker threads at any one time, leaving the rest (up to the value of slave_parallel_threads) free for other master connections or replication domains to use in parallel.

The slave_domain_parallel_threads variable is dynamic and can be changed without restarting the server; all slaves must be stopped while changing it, though.

Implementation details

The implementation is described in MDEV-4506.


Comments loading...