group-commit
Group commit for the binary log
Note: This page describes features in an unreleased version of MariaDB.
Unreleased means there are no official packages or binaries available for download which contain the features. If you want to try out any of the new features described here you will need to get and compile the code yourself.
Group commit for the binary log is scheduled for MariaDB 5.3, and is also available in the MariaDB 5.2-rpl feature preview.
Group commit is an important optimisation when the server is run with
innodb_flush_logs_at_trx_commit=1
and/or sync_binlog=1
. These settings are
needed to ensure that if a server crashes, any transaction that was reported
as committed prior to the time of crash will still be present in the database
after crash recovery ("durability", the "D" in ACID). Both settings are also
necessary to be able to recover into a consistent state at all after a crash;
if both are not set, it is possible after a crash to end up in a state where a
transaction in the binary log is missing in InnoDB/XtraDB, or vice versa,
which breaks replication for slaves.
When these settings are in effect, the server does an fsync()
(or fdatasync()
or similar) call on the XtraDB transaction log file / binary log file during
COMMIT
in order to ensure that data is stored durably on the disk. This is a
time-consuming operation, and can easily limit the throughput in terms of
number of commits per second that can be sustained.
The idea with group commit is to amortise the costs of each fsync()
over
multiple commits from multiple parallel transactions. If there are say 10
transactions in parallel trying to commit, we can force all of them to disk at
once with a single fsync()
, rather than do one fsync()
for each. This can
greatly reduce the need for fsync()
calls, and consequently greatly improve
the commits-per-second throughput.
Group commit works automatically; there are no options needed to enable it, and there are no negative consequences (performance or reliability or otherwise) from it.
But to see any positive effect of group commit, the workload must have
sufficient parallelism. Normally, at least three parallel transactions are
needed for group commit to be effective: While the first transaction is
waiting for fsync()
to complete, the other two transactions will queue up
waiting for their turn to call fsync()
. When the first transaction is done,
a single fsync()
can be used for the two queued-up transactions, saving in
this case one out of three fsync()s
.
(To see any benefit, it is also necessary to have sufficiently many
transaction per second wanting to commit that the fsync()
calls become a
bottleneck. If no transaction needs to wait for the fsync()
of another to
complete, group commit will show little or no improvement).
For the binary log, two new status variables are available for checking how effective group commit is at reducing the number of fsync() calls needed:
- Variable Name:
binlog_commits
- Scope: Session
- Data Type:
Number
This is the total number of transactions committed to the binary log.
- Variable Name:
binlog_group_commits
- Scope: Session
- Data Type:
Number
This is the total number of group commits done to the binary log (a group
commit is when a group of transactions are written into the binary log
together, sharing a single fsync()
call). When sync_binlog=1
, then this is the
number of fsync()'s
done for transaction commits to the binary log
Thus the extend to which group commit is effective at reducing the number of
fsync()
calls on the binary log can be determined by the ratio between these
two status variables. binlog_commits
will always be as high or higher than
binlog_group_commits
; the bigger the difference, the more effective group
commit was at reducing fsync()
overhead.
To query these variables, use for example a statement like this:
SHOW STATUS LIKE 'binlog_%commits';
(Information about the effectiveness of group commit inside XtraDB can be
obtained from similar status variables like innodb_data_fsyncs
; however, the
data may be slightly harder to interpret as XtraDB/InnoDB also uses fsync()
sometimes for other purposes than (group) commits).
For more information about the design and implementation of group commit, see MWL#116.