Rollback Prepared Transactions Asynchronously During Binlog Crash Recovery

This is a guest post by Libing Song from Alibaba Cloud.

If a transaction takes 30 minutes to execute, it will probably take 30 minutes to roll back. If such a transaction has to be rolled back at startup, the server is unavailable for that whole period.

This blog post introduces a new feature that speeds up the rollback of big transactions at server startup. It significantly improves the recovery time objective (RTO) for a server that has no replica or whose replica is far behind. The feature debuts in the recent MariaDB Server 11.7 release series and is a contribution from the AliSQL Team.

Background

The binary log (binlog) contains “events” that describe changes to the database, both to data and to structure. It is used for incremental backup and replication, and it consists of a set of binary log files and an index. The binary events of a transaction are written into the binlog file and persisted when the transaction is committed. The binary log must stay consistent with the data stored in the storage engines; otherwise, a replica set up through replication cannot obtain exactly the same data as the source.

MariaDB Server has a mechanism that guarantees consistency between the binary log and the data stored in all transactional engines (e.g. InnoDB) when an abnormal shutdown happens. The mechanism applies a two-phase commit (2PC) to normal transactions and is called internal XA.

Transaction state diagram

With internal XA, a transaction is committed in three steps:

  1. Prepare the transaction in the engine. Its state is changed from ACTIVE to PREPARED and persisted together with an xid.
  2. Write its binlog events, followed by an xid event, to the binlog file and persist the binlog file.
  3. Commit the transaction.
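
The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not MariaDB code: `ToyEngine`, `commit_with_internal_xa`, the list used as a binlog and the tuple-shaped events are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrxState(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"


@dataclass
class ToyEngine:
    """Hypothetical stand-in for a transactional engine such as InnoDB."""
    states: dict = field(default_factory=dict)  # xid -> TrxState

    def begin(self, xid):
        self.states[xid] = TrxState.ACTIVE

    def prepare(self, xid):
        # Step 1: ACTIVE -> PREPARED (a real engine persists this with the xid).
        if self.states[xid] is not TrxState.ACTIVE:
            raise ValueError("only an ACTIVE transaction can be prepared")
        self.states[xid] = TrxState.PREPARED

    def commit(self, xid):
        # Step 3: PREPARED -> COMMITTED.
        if self.states[xid] is not TrxState.PREPARED:
            raise ValueError("only a PREPARED transaction can be committed")
        self.states[xid] = TrxState.COMMITTED


def commit_with_internal_xa(engine, binlog, xid, events):
    """Run the three steps in order; `binlog` is a list standing in for the file."""
    engine.prepare(xid)          # 1. prepare in the engine
    binlog.extend(events)        # 2. write the transaction's events...
    binlog.append(("XID", xid))  #    ...then the xid event (and persist the file)
    engine.commit(xid)           # 3. commit in the engine


engine, binlog = ToyEngine(), []
engine.begin(101)
commit_with_internal_xa(engine, binlog, 101, [("ROW", "insert into t1")])
print(engine.states[101].value, binlog[-1])
```

A crash between any two of these calls leaves the transaction in one of the states discussed next, which is why the order of the steps matters.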

When a crash happens, a transaction can be in one of the following states:

  • Active: The internal XA process described above guarantees that it was never binlogged.
  • Prepared but not binlogged (or only partially binlogged): The transaction is prepared, but its xid doesn’t appear in the binlog file.
  • Prepared and binlogged: The transaction is prepared, and its xid appears in the binlog file.
  • Committed: It is binlogged and committed.

For committed transactions, there is nothing to do at server startup. Active transactions are rolled back automatically by the background rollback thread. Prepared transactions need to be handled according to the state of the binlog file: if the transaction’s xid is found in the binlog file, the transaction is committed; otherwise, it is rolled back.
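
The rule for prepared transactions amounts to a set-membership check. The snippet below is a minimal sketch of that decision only; `plan_binlog_recovery` and its arguments are hypothetical names, and scanning the binlog file for xid events is assumed to have happened already.

```python
def plan_binlog_recovery(prepared_xids, binlog_xids):
    """Split prepared transactions into those to commit and those to roll back.

    prepared_xids: xids the engine reports as PREPARED after the crash.
    binlog_xids:   xids found in xid events while scanning the binlog file.
    """
    binlog_xids = set(binlog_xids)
    to_commit = [xid for xid in prepared_xids if xid in binlog_xids]
    to_rollback = [xid for xid in prepared_xids if xid not in binlog_xids]
    return to_commit, to_rollback


# Transactions 7 and 9 reached the binlog; transaction 8 did not.
to_commit, to_rollback = plan_binlog_recovery([7, 8, 9], {3, 7, 9})
print("commit:", to_commit, "roll back:", to_rollback)
```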

The process that handles the prepared transactions is called binlog recovery, and it has to finish before the server can provide service to users. A commit is usually fast, but a rollback often takes as long as the transaction’s original execution. If a transaction takes 30 minutes to execute, it will probably take 30 minutes to roll back, and the server is unavailable for that whole period.

Diagram showing server startup, binlog recovery and rollback

A feature (MDEV#33853) released with MariaDB Server 11.7 eliminates the impact of large transactions at server startup. This blog post explains how the feature works.

Solution

Rolling back a prepared transaction first reverts its state to ACTIVE and then undoes all of its operations sequentially. This feature splits that process into two parts during binlog recovery. Binlog recovery performs only the first part: it sets the transaction’s state to ACTIVE and persists that state. The background rollback thread performs the second part and undoes all of the operations. Because the first part is always fast, the server can provide service to users as soon as it finishes.
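
The split can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the server’s implementation: `RecoveredTrx`, `revert_to_active`, `undo_all`, the dictionary used as a table and the undo log of `(key, old_value)` pairs applied newest-first are all invented for the example.

```python
import threading


class RecoveredTrx:
    """Hypothetical prepared transaction found at startup."""

    def __init__(self, xid, undo_log):
        self.xid = xid
        self.state = "PREPARED"
        # (key, old_value) pairs, oldest first; None means the key did not exist.
        self.undo_log = list(undo_log)


def revert_to_active(trx):
    """Part 1, done during binlog recovery: a cheap state change only."""
    trx.state = "ACTIVE"  # a real engine would also persist this


def undo_all(trx, table):
    """Part 2, done by the background rollback thread: undo every operation."""
    while trx.undo_log:
        key, old_value = trx.undo_log.pop()  # newest change first
        if old_value is None:
            table.pop(key, None)             # the transaction inserted this key
        else:
            table[key] = old_value           # restore the previous value
    trx.state = "ROLLED_BACK"


table = {"a": 2, "b": 5}                      # contents left behind by the crash
trx = RecoveredTrx(8, [("a", 1), ("b", None)])

revert_to_active(trx)                         # binlog recovery ends here
server_ready = True                           # users could be served from now on

worker = threading.Thread(target=undo_all, args=(trx, table))
worker.start()                                # the slow part runs in the background
worker.join()                                 # joined only so the demo can print
print(table, trx.state)
```

In the sketch, the time before `server_ready` no longer depends on the length of the undo log, which is the point of the split.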

Diagram showing server startup, binlog recovery and rollback thread.

The new recovery method is also re-entrant; that is, it can be retried multiple times until it is fully completed.
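
One way to picture re-entrancy is a recovery pass that only ever touches transactions still marked PREPARED, so a repeated run simply picks up where the previous one stopped. The following is a hedged sketch of that idea, not the actual algorithm; `recover`, the `crash_after` parameter and the dictionary of persisted states are hypothetical.

```python
def recover(persisted_states, to_rollback, crash_after=None):
    """Revert each PREPARED transaction in `to_rollback` to ACTIVE.

    `crash_after` simulates a crash after that many state changes.
    """
    changed = 0
    for xid in to_rollback:
        if persisted_states[xid] != "PREPARED":
            continue  # already handled by an earlier attempt
        if crash_after is not None and changed == crash_after:
            raise RuntimeError("simulated crash during recovery")
        persisted_states[xid] = "ACTIVE"
        changed += 1


states = {1: "PREPARED", 2: "PREPARED", 3: "PREPARED"}

try:
    recover(states, [1, 2, 3], crash_after=1)  # first attempt dies part-way
except RuntimeError:
    pass
after_first_attempt = dict(states)

recover(states, [1, 2, 3])                     # retry completes the work
recover(states, [1, 2, 3])                     # a further retry changes nothing
print(after_first_attempt, states)
```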

Conclusion

To guarantee a good recovery time objective (RTO), a server must keep its recovery time short even in the worst cases. Rolling back big transactions at startup is one of those worst cases. With this feature, the recovery time can be reduced to only seconds where it previously would have taken hours.

Download MariaDB Community Server 11.7 to try this new feature. Tell us what you think!