Group Commit for the Binary Log
Contents
MariaDB starting with 5.3
Group commit for the binary log was first introduced in MariaDB 5.3
Overview
In MariaDB 5.3 and above, the server supports group commit. This is an important optimization that helps MariaDB reduce the number of expensive disk operations that are performed.
Durability
In ACID terminology, the "D" stands for durability. In order to ensure durability with group commit, innodb_flush_log_at_trx_commit=1
and/or sync_binlog=1
should be set. These settings are needed to ensure that if the server crashes, then any transaction that was committed prior to the time of crash will still be present in the database after crash recovery.
Setting both innodb_flush_log_at_trx_commit=1
and sync_binlog=1
provides the best guarantee of consistency after a crash.
If sync_binlog=1
is set, but innodb_flush_log_at_trx_commit=1
is not set, then it is possible after a crash to end up in a state where a transaction present in the binary log is missing from the InnoDB redo log. This will make the master different from the slaves, and will thus break replication.
Similarly, if innodb_flush_log_at_trx_commit=1
is set, but sync_binlog=1
is not set, then it is possible after a crash to end up in a state where a transaction present in the InnoDB redo log is missing from binary log. This again will cause master and slaves to differ and replication to break. In MariaDB 10.0 and above, this configuration can also cause the transaction to be missing from InnoDB due to some optimizations made to the innodb_flush_log_at_trx_commit=1
option in those versions. In that case, it is recommended to always set sync_binlog=1
. See Non-durable Binary Log Settings for more information.
Amortizing Disk Flush Costs
After every transaction COMMIT
, the server normally has to flush any changes the transaction made to the InnoDB redo log and the binary log to disk (i.e. by calling system calls such as fsync()
or fdatasync()
or similar). This helps ensure that the data changes made by the transaction are stored durably on the disk. Disk flushing is a time-consuming operation, and can easily impose limits on throughput in terms of the number of transactions-per-second (TPS) which can be committed.
The idea with group commit is to amortize the costs of each flush to disk over multiple commits from multiple parallel transactions. For example, if there are 10 transactions trying to commit in parallel, then we can force all of them to be flushed disk at once with a single system call, rather than do one system call for each commit. This can greatly reduce the need for flush operations, and can consequently greatly improve the throughput of transactions-per-second (TPS).
However, to see the positive effects of group commit, the workload must have sufficient parallelism. A good rule of thumb is that at least three parallel transactions are needed for group commit to be effective. For example, while the first transaction is waiting for its flush operation to complete, the other two transactions will queue up waiting for their turn to flush their changes to disk. When the first transaction is done, a single system call can be used to flush the two queued-up transactions, saving in this case one of the three system calls.
In addition to sufficient parallelism, it is also necessary to have enough transactions per second wanting to commit that the flush operations are a bottleneck. If no such bottleneck exists (i.e. transactions never or rarely need to wait for the flush of another to complete), then group commit will provide little to no improvement.
Changing Group Commit Frequency
MariaDB starting with 10.0
The frequency of group commits can be changed by configuring the binlog_commit_wait_usec
and binlog_commit_wait_count
system variables, which were introduced in MariaDB 10.0.
Measuring Group Commit Ratio
Two status variables are available for checking how
effective group commit is at reducing flush overhead. These are the Binlog_commits
and Binlog_group_commits
status variables. We can obtain those values with the following query:
SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits');
Binlog_commits
is the total number of transactions committed to the binary log.
Binlog_group_commits
is the total number of groups committed to the binary log. As explained in the previous sections of this page, a group commit is when a group of transactions is flushed to the binary log together by sharing a single flush system call. When sync_binlog=1
is set, then this is also the total number of flush system calls executed in the process of flushing commits to the binary log.
Thus the extent to which group commit is effective at reducing the number of flush system calls on the binary log can be determined by the ratio between these two status variables. Binlog_commits
will always be as equal to or greater than Binlog_group_commits
. The greater the difference is between these status variables, the more effective group commit was at reducing flush overhead.
To calculate the group commit ratio, we actually need the values of these status variables from two snapshots. Then we can calculate the ratio with the following formula:
transactions/group commit
= (Binlog_commits
(snapshot2) - Binlog_commits
(snapshot1))/(Binlog_group_commits
(snapshot2) - Binlog_group_commits
(snapshot1))
For example, if we had the following first snapshot:
SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits'); +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Binlog_commits | 120 | | Binlog_group_commits | 120 | +----------------------+-------+ 2 rows in set (0.00 sec)
And the following second snapshot:
SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits'); +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Binlog_commits | 220 | | Binlog_group_commits | 145 | +----------------------+-------+ 2 rows in set (0.00 sec)
Then we would have:
transactions/group commit
= (220 - 120) / (145 - 120) = 100 / 25 = 4 transactions/group commit
If your group commit ratio is too close to 1, then it may help to change your group commit frequency.
Use of Group Commit with Parallel Replication
In MariaDB 10.0 and above, group commit is also used to enable conservative mode of in-order parallel replication.
Effects of Group Commit on InnoDB Performance
In MariaDB 10.0 and above, when both innodb_flush_log_at_trx_commit=1
(the default) is set and the binary log is enabled, there is now one less sync to disk inside InnoDB during commit (2 syncs shared between a group of transactions instead of 3). See Binary Log Group Commit and InnoDB Flushing Performance for more information.
MariaDB starting with 10.0
The InnoDB flushing performance improvements related to group commit were first introduced in MariaDB 10.0.
Status Variables
Binlog_commits
is the total number of transactions committed to the binary log.
Binlog_group_commits
is the total number of groups committed to the binary log.
Binlog_group_commit_trigger_count
is the total number of group commits triggered because of the number of binary log commits in the group reached the limit set by the system variable binlog_commit_wait_count
.
Binlog_group_commit_trigger_lock_wait
is the total number of group commits triggered because a binary log commit was being delayed because of a lock wait where the lock was held by a prior binary log commit. When this happens the later binary log commit is placed in the next group commit.
Binlog_group_commit_trigger_timeout
is the total number of group commits triggered because of the time since the first binary log commit reached the limit set by the system variable binlog_commit_wait_usec
.
MariaDB starting with 10.0
The Binlog_group_commit_trigger_count
, Binlog_group_commit_trigger_lock_wait
, and Binlog_group_commit_trigger_timeout
status variables were first introduced in MariaDB 10.0.18 and MariaDB 10.1.5.
To query these variables, use a statement such as:
SHOW GLOBAL STATUS LIKE 'Binlog_%commit%';