Optimizing Out-of-order Parallel Replication with MariaDB 10.0

Out-of-order parallel replication is a great feature in MariaDB 10.0 that improves replication performance by committing independent transactions in parallel on a slave. If slave_parallel_threads is greater than 0, then the SQL thread will instruct multiple worker threads to concurrently apply transactions with different domain IDs.

If an application is setting the domain ID, and if parallel replication is enabled in MariaDB, then out-of-order parallel replication should mostly work automatically. However, depending on an application’s transaction size and the slave’s lag behind the master, slave_parallel_max_queued may have to be adjusted. In this blog post, I’ll show an example where this is the case.

Configure the master and slave

For our master, let’s configure the following settings:

[mysqld]
max_allowed_packet=1073741824
log_bin
binlog_format=ROW
sync_binlog=1
server_id=1

For our slave, let’s configure the following:

[mysqld]
server_id=2
slave_parallel_threads=2
slave_domain_parallel_threads=1
slave_parallel_max_queued=1KB

In our test, we plan to use two different domain IDs, so slave_parallel_threads is set to 2. Also, notice how small slave_parallel_max_queued is here: it is only set to 1 KB. With such a small value, it will be easier to see the behavior I want to demonstrate.

Set up replication on master

Now, let’s set up the master for replication:

MariaDB [(none)]> CREATE USER 'repl'@'192.168.1.46' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.46';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.22 sec)

MariaDB [(none)]> SHOW MASTER STATUSG
*************************** 1. row ***************************
            File: master-bin.000001
        Position: 313
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT BINLOG_GTID_POS('master-bin.000001', 313);
+-------------------------------------------+
| BINLOG_GTID_POS('master-bin.000001', 313) |
+-------------------------------------------+
|                                           |
+-------------------------------------------+
1 row in set (0.00 sec)

If you’ve set up GTID replication with MariaDB 10.0 before, you’ve probably used BINLOG_GTID_POS to convert a binary log position to its corresponding GTID position. On newly installed systems like my example above, this GTID position might be blank.

Set up replication on slave

Now, let’s set up replication on the slave:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='';
Query OK, 0 rows affected (0.09 sec)

MariaDB [(none)]> CHANGE MASTER TO master_host='192.168.1.45', master_user='repl', master_password='password', master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.04 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.45
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 313
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 601
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 313
              Relay_Log_Space: 898
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)

Create some test tables on master

Let’s set up some test tables on the master. These will automatically be replicated to the slave. We want to test parallel replication with two domains, so we will set up two separate, but identical tables, in two different databases:

MariaDB [(none)]> CREATE DATABASE db1;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> CREATE TABLE db1.test_table (
    -> id INT AUTO_INCREMENT PRIMARY KEY,
    -> file BLOB
    -> );
Query OK, 0 rows affected (0.12 sec)

MariaDB [(none)]> CREATE DATABASE db2;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> CREATE TABLE db2.test_table (
    -> id INT AUTO_INCREMENT PRIMARY KEY,
    -> file BLOB
    -> );
Query OK, 0 rows affected (0.06 sec)

Stop SQL thread on slave

For the test, we want the slave to fall behind the master, and we want its relay log to grow. To make this happen, let’s stop the SQL thread on the slave:

MariaDB [(none)]> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.02 sec)

Insert some data on master

Now, in a Linux shell on the master, let’s create a random 1 MB file:

[gmontee@master ~]$ dd if=/dev/urandom of=/tmp/file.out bs=1MB count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB) copied, 0.144972 s, 6.9 MB/s
[gmontee@master ~]$ chmod 0644 /tmp/file.out

Now, let’s create a script to insert the contents of the file into both of our tables in db1 and db2 with different values of gtid_domain_id:

tee /tmp/domain_test.sql <<EOF
SET SESSION gtid_domain_id=1;
BEGIN;
INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));
COMMIT;
SET SESSION gtid_domain_id=2;
BEGIN;
INSERT INTO db2.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));
COMMIT;
EOF

After that, let’s run the script a bunch of times. We can do this with a bash loop:

[gmontee@master ~]$ { for ((i=0;i<1000;i++)); do cat /tmp/domain_test.sql; done; } | mysql --max_allowed_packet=1073741824 --user=root

Restart SQL thread on slave

Now the relay log on the slave should have grown quite a bit. Let’s restart the SQL thread and watch the transactions get applied. To do this, let’s open up two shells on the slave.

On the first shell on the slave, connect to MariaDB and restart the SQL thread:

MariaDB [(none)]> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

On the second shell, let’s look at SHOW PROCESSLIST output in a loop:

[gmontee@slave ~]$ for i in {1..1000}; do mysql --user=root --execute="SHOW PROCESSLIST;"; sleep 1s; done;

Take a look at the State column for the slave’s SQL thread:

+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time   | State                                         | Info             | Progress |
+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+
|  3 | system user |           | NULL | Connect |    139 | closing tables                                | NULL             |    0.000 |
|  4 | system user |           | NULL | Connect |    139 | Waiting for work from SQL thread              | NULL             |    0.000 |
|  6 | system user |           | NULL | Connect | 264274 | Waiting for master to send event              | NULL             |    0.000 |
| 10 | root        | localhost | NULL | Sleep   |     43 |                                               | NULL             |    0.000 |
| 21 | system user |           | NULL | Connect |     45 | Waiting for room in worker thread event queue | NULL             |    0.000 |
| 54 | root        | localhost | NULL | Query   |      0 | init                                          | SHOW PROCESSLIST |    0.000 |
+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+

With such a low slave_parallel_max_queued value, it will probably say "Waiting for room in worker thread event queue." most of the time. The SQL thread doesn’t have enough memory allocated to read-ahead more of the relay log. This can prevent the SQL thread from providing enough work for all of the worker threads. The worker threads will probably show a State value of "Waiting for work from SQL thread" more often.

Conclusion

If you expect to be able to benefit from parallel slave threads, but you find that the State column in SHOW PROCESSLIST often shows "Waiting for room in worker thread event queue" for your SQL thread, you should try increasing slave_parallel_max_queued to see if that helps. The default slave_parallel_max_queued value of 132 KB will probably be acceptable for most workloads. However, if you have large transactions or if your slave falls behind the master often, and you hope to use out-of-order parallel replication, you may have to adjust this setting. Of course, most users probably want to avoid large transactions and slave lag for other reasons as well.

Has anyone run into this problem before? Were you able to figure out a solution on your own?