Optimizing Conservative In-order Parallel Replication with MariaDB 10.0

Conservative in-order parallel replication is a great feature in MariaDB 10.0 that improves replication performance by using knowledge of group commit on the master to commit transactions in parallel on a slave. If
slave_parallel_threads
is greater than 0, then the SQL thread will instruct multiple worker threads to concurrently apply transactions that were committed in the same group commit on the master.
Conservative in-order parallel replication is a good alternative to out-of-order parallel replication for use cases where explicitly setting domain IDs for independent transactions is impractical or impossible.
Although setting slave_parallel_threads
is enough to enable conservative in-order parallel replication, you may have to tweak binlog_commit_wait_usec
and binlog_commit_wait_count
in order to increase your group commit ratio on the master, which is necessary to enable parallel applying on the slave. In this blog post, I’ll show an example where this is the case.
Note: MariaDB 10.1 also adds the slave_parallel_mode
configuration variable to enable other modes for in-order parallel replication.
Configure the master and slave
For our master, let’s configure the following settings:
[mysqld] log_bin binlog_format=ROW server_id=1
For our slave, let’s configure the following:
[mysqld] server_id=2 slave_parallel_threads=2
Set up replication on master
Now, let’s set up the master for replication:
MariaDB [(none)]> CREATE USER 'repl'@'172.31.31.73' IDENTIFIED BY 'password'; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'172.31.31.73'; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> RESET MASTER; Query OK, 0 rows affected (0.22 sec) MariaDB [(none)]> SHOW MASTER STATUSG *************************** 1. row *************************** File: master-bin.000001 Position: 322 Binlog_Do_DB: Binlog_Ignore_DB: 1 row in set (0.00 sec) MariaDB [(none)]> SELECT BINLOG_GTID_POS('master-bin.000001', 322); +-------------------------------------------+ | BINLOG_GTID_POS('master-bin.000001', 322) | +-------------------------------------------+ | | +-------------------------------------------+ 1 row in set (0.00 sec)
If you’ve set up GTID replication with MariaDB 10.0 before, you’ve probably used BINLOG_GTID_POS
to convert a binary log position to its corresponding GTID position. On newly installed systems like my example above, this GTID position might be blank.
Now, let’s set up replication on the slave:
MariaDB [(none)]> SET GLOBAL gtid_slave_pos =''; Query OK, 0 rows affected (0.09 sec) MariaDB [(none)]> CHANGE MASTER TO master_host='172.31.31.72', master_user='repl', master_password='password', master_use_gtid=slave_pos; Query OK, 0 rows affected (0.04 sec) MariaDB [(none)]> START SLAVE; Query OK, 0 rows affected (0.01 sec) MariaDB [(none)]> SHOW SLAVE STATUSG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.31.31.72 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 322 Relay_Log_File: slave-relay-bin.000002 Relay_Log_Pos: 601 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 322 Relay_Log_Space: 898 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos Gtid_IO_Pos: 1 row in set (0.00 sec)
Create a test table on master
Let’s set up a test table on the master using mysqlslap. The table will automatically be replicated to the slave:
mysqlslap -u root --create-schema=db1 --no-drop --create="CREATE TABLE test_table (id INT AUTO_INCREMENT PRIMARY KEY,file BLOB);"
Generate some data on master
Now, in a Linux shell on the master, let’s create a random 1 KB file:
[gmontee@master ~]$ dd if=/dev/urandom of=/tmp/file.out bs=1KB count=1 1+0 records in 1+0 records out 1000 bytes (1.0 kB) copied, 0.000218694 s, 4.6 MB/s [gmontee@master ~]$ chmod 0644 /tmp/file.out
Get group commit status on master (before first test)
Before we insert our data on the master, let’s get the starting values of Binlog_commits
and Binlog_group_commits
.
MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits'); +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Binlog_commits | 20 | | Binlog_group_commits | 20 | +----------------------+-------+ 2 rows in set (0.00 sec)
Insert some data on master
Now, let’s use mysqlslap to insert our random file into the table a bunch of times:
mysqlslap -u root --create-schema=db1 --concurrency=5 --iterations=20 --no-drop --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"
Get group commit status on master (after first test)
After inserting our data, let’s get the values of Binlog_commits
and Binlog_group_commits
again.
MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits'); +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Binlog_commits | 120 | | Binlog_group_commits | 120 | +----------------------+-------+ 2 rows in set (0.00 sec)
To get the group commit ratio for our batch job, we would subtract the pre-job Binlog_commits and Binlog_group_commits values from the post-job values:
transactions/group commit = (Binlog_commits
(after) – Binlog_commits
(before))/(Binlog_group_commits
(after) – Binlog_group_commits
(before))
So here, we have:
transactions/group commit = (120 – 20) / (120 – 20) = 100 / 100 = 1 transactions/group commit
At 1 transactions/group commit, there isn’t any potential for the slave to apply transactions in parallel.
Insert some more data on master
Now let’s change binlog_commit_wait_count
and binlog_commit_wait_usec
:
MariaDB [(none)]> SET GLOBAL binlog_commit_wait_count=2; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> SET GLOBAL binlog_commit_wait_usec=10000; Query OK, 0 rows affected (0.00 sec)
And then let’s insert some more data:
mysqlslap -u root --create-schema=db1 --concurrency=5 --iterations=20 --no-drop --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"
Get group commit status on master (after second test)
After changing the values for binlog_commit_wait_count
and binlog_commit_wait_usec
and inserting more data, let’s get the values of Binlog_commits
and Binlog_group_commits
again.
MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits'); +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Binlog_commits | 220 | | Binlog_group_commits | 145 | +----------------------+-------+ 2 rows in set (0.00 sec)
So here, we have:
transactions/group commit = (220 – 120) / (145 – 120) = 100 / 25 = 4 transactions/group commit
At 4 transactions/group commit, there is much more potential for the slave to apply transactions in parallel.
Check slave concurrency
Now that the values of binlog_commit_wait_count
and binlog_commit_wait_usec
have been tweaked on the master to allow for parallel applying on the slave, let’s execute a bigger job on the master and then see if the slave actually does apply its transactions in parallel.
First, let’s run this on the master:
mysqlslap -u root --create-schema=db1 --concurrency=100 --iterations=100 --no-drop --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"
And then, let’s execute SHOW FULL PROCESSLIST
on the slave:
MariaDB [(none)]> SHOW FULL PROCESSLIST; +----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+ | 2 | system user | | NULL | Connect | 2266 | Waiting for master to send event | NULL | 0.000 | | 3 | system user | | NULL | Connect | 7 | closing tables | NULL | 0.000 | | 4 | system user | | NULL | Connect | 7 | closing tables | NULL | 0.000 | | 5 | system user | | NULL | Connect | 2266 | Waiting for room in worker thread event queue | NULL | 0.000 | | 8 | root | localhost | NULL | Query | 0 | init | SHOW FULL PROCESSLIST | 0.000 | +----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+ 5 rows in set (0.00 sec)
Here, both of our worker threads have the state “closing tables”, so both of them are applying transactions in parallel.
Conclusion
If you want to use conservative in-order parallel replication to improve slave performance, but you find that your slave isn’t applying transactions in parallel, you may want to adjust binlog_commit_wait_count
and binlog_commit_wait_usec
.