Enhancements for START TRANSACTION WITH CONSISTENT SNAPSHOT

You are viewing an old version of this article. View the current version here.

Note: The feature described in this page is also available in the MariaDB 5.2 replication feature preview release

NOTE: The PBXT part of this feature is still experimental. It has not been finally scheduled for a release due to not yet being reviewed by the PBXT maintainers. It is included in the MariaDB-5.2-rpl feature preview, but may or may not make it into MariaDB 5.3.

With the introduction of group commit, MariaDB also introduces an enhanced storage engine API for COMMIT that allows engines to coordinate commit ordering and visibility with each other and with the binary log. With this, the START TRANSACTION WITH CONSISTENT SNAPSHOT statement has been enhanced to ensure consistency also in-between storage engines that support the new API. At the time of writing, there supporting engines are XtraDB and PBXT. In addition, the binary log, while not a storage engine as such, also supports the new API and can provide a binlog position consistent with storage engine transaction snapshots.

This means that with transaction isolation level at least REPEATABLE READ, the START TRANSACTION WITH CONSISTENT SNAPSHOT statement can be used to ensure that queries will see a transaction-consistent view of the database also between storage engines. It is then not possible for a query to see the changes from some transaction T in XtraDB tables without also seeing the changes T makes to PBXT tables. (within a single transactiona storage engine like XtraDB or PBXT, consistency is always guaranteed even without using START TRANSACTION WITH CONSISTENT SNAPSHOT).

For example, suppose the following two transactions run in parallel:

Transaction T1:

    BEGIN;
    SET @t = NOW();
    UPDATE xtradb_table SET a= @t WHERE id = 5;
    UPDATE pbxt_table SET b= @t WHERE id = 5;
    COMMIT;

Transaction T2:

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT t1.a, t2.b
      FROM xtradb_table t1 INNER JOIN pbxt_table t2 ON t1.id=t2.id
     WHERE t1.id = 5;

Then transaction T2 will always see the same value for xtradb_table.a and pbxt_table.b.

(In MariaDB 5.2 and earlier, and MySQL at least upto 5.5, START TRANSACTION WITH CONSISTENT SNAPSHOT does not give any guarantees of consistency between different storage engines. So it is possible, even with a "consistent" snapshot, to see the changes in a trasaction only to InnoDB/XtraDB tables, not PBXT tables, for example.)

Another use for this is to obtain a binary log position that is consistent with a particular transactional state of the storage engine(s) in the database. This is done with two new status variables for the binary log:

  • Variable Name: binlog_snapshot_file
  • Scope: Session
  • Data Type: String
  • Variable Name: binlog_snapshot_position
  • Scope: Session
  • Data Type: Number

These variables give the binary log file and position like SHOW MASTER STATUS does. But they can be queried in a transactionally consistent way. After starting a transaction using START TRANSACTION WITH CONSISTENT SNAPSHOT, the two variables will give the binlog position corresponding to the state of the database of the consistent snapshot so taken, irrespectively of which other transactions have been committed since the snapshot was taken (SHOW MASTER STATUS always shows the position of the last committed transaction). This works for MVCC storage engines that implement the commit ordering part of the storage engine API, which at the time of writing is XtraDB and PBXT.

This is useful to obtain a logical dump/backup with a matching binlog position that can be used to provision a new slave with the original server as the master. First START TRANSACTION WITH CONSISTENT SNAPSHOT is executed. Then a consistent state of the database is obtained with queries, and the matching binlog position is obtained with SHOW STATUS LIKE 'binlog_snapshot%'. When this is loaded on a new slave server, the binlog position is used in a CHANGE MASTER TO statement to set the slave replicating from the correct position.

With the variables binlog_snapshot_file and binlog_snapshot_position, such provisioning can be done fully non-blocking on the master. Without them, is is necessary to get the binlog position under FLUSH TABLES WITH READ LOCK; this can stall the server for a potentially long time, as it blocks new queries until all already started updates have finished.

The mysqldump program has been extended to use these new status variables. This means that a backup suitable for provisioning a slave can be obtained as normal like this:

    mysqldump --single-transaction --master-data ...

The dump will be fully non-blocking if both the mysqldump program and the queried server include the necessary feature (eg. both are from MariaDB 5.2-rpl, 5.3, or higher). In other cases, it will fall back to the old blocking method using FLUSH TABLES WITH READ LOCK.

For more information on the design and implementation of this feature, see MWL#136.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.