Group commit per il log binario
Le Group commit sono un'importante ottimizzazione di MariaDB 5.3, utile se il server è eseguito con innodb_flush_logs_at_trx_commit=1
e/o sync_binlog=1
. Queste impostazioni sono necessarie per accertarsi che, se il server va in crash, tutte le transazioni che sono state segnate come concluse prima dell'arresto continueranno a essere presenti nel database dopo il crash recovery ("durability", la "D" in ACID). Inoltre, entrambe le impostazioni sono necessarie per consentire il recupero di uno stato coerente dopo il crash; se non sono entrambe impostate, è possibile che il recupero porti a uno stato in cui le transazioni nel log binario manchino nelle tabelle InnoDB/XtraDB, o viceversa, perciò gli slave replicheranno dati errati.
When these settings are in effect, the server does an fsync()
(or
fdatasync()
or similar) call on the XtraDB transaction log file / binary
log file during COMMIT
in order to ensure that data is stored durably on
the disk. An fsync()
is a time-consuming operation, and can easily limit
throughput in terms of the number of commits per second which can be
sustained.
The idea with group commit is to amortise the costs of each fsync()
over
multiple commits from multiple parallel transactions. If there are say 10
transactions in parallel trying to commit, we can force all of them to disk at
once with a single fsync()
, rather than do one fsync()
for each. This
can greatly reduce the need for fsync()
calls, and consequently greatly
improve the commits-per-second throughput.
Group commit works automatically; there are no options needed to enable it, and there are no negative consequences (performance or reliability or otherwise) from it.
However, to see the positive effects of group commit, the workload must have
sufficient parallelism. Normally, at least three parallel transactions are
needed for group commit to be effective: While the first transaction is
waiting for fsync()
to complete, the other two transactions will queue up
waiting for their turn to call fsync()
. When the first transaction is done,
a single fsync()
can be used for the two queued-up transactions, saving in
this case one of the three fsync()s
.
In addition to sufficient parallelism, it is also necessary to have enough
transactions per second wanting to commit that the fsync()
calls are a
bottleneck. If no such bottleneck exists (i.e. transactions never or rarely
need to wait for the fsync()
of another to complete), group commit will
provide little to no improvement.
New Status Variables
For the binary log, two new status variables are available for checking how effective group commit is at reducing the number of fsync() calls needed:
binlog_commits
- Variable Name:
binlog_commits
- Scope: Session
- Data Type:
Number
This is the total number of transactions committed to the binary log.
binlog_group_commits
- Variable Name:
binlog_group_commits
- Scope: Session
- Data Type:
Number
This is the total number of group commits done to the binary log (a group
commit is when a group of transactions are written into the binary log
together, sharing a single fsync()
call). When sync_binlog=1
, then this
is the number of fsync()'s
done for transaction commits to the binary log
Thus the extent to which group commit is effective at reducing the number of
fsync()
calls on the binary log can be determined by the ratio between
these two status variables. binlog_commits
will always be as high or higher
than binlog_group_commits
; the bigger the difference, the more effective
group commit was at reducing fsync()
overhead.
To query these variables, use for example a statement like this:
SHOW STATUS LIKE 'binlog_%commits';
(Information about the effectiveness of group commit inside XtraDB can be
obtained from similar status variables like innodb_data_fsyncs
; however,
the data may be slightly harder to interpret as XtraDB/InnoDB also uses
fsync()
sometimes for other purposes than (group) commits).
For more information about the design and implementation of group commit, see MWL#116.