Enhancements for START TRANSACTION WITH CONSISTENT SNAPSHOT
Note: The feature described in this page is also available in the MariaDB 5.2 replication feature preview release
NOTE: The PBXT part of this feature is still experimental. It has not been finally scheduled for a release due to not yet being reviewed by the PBXT maintainers. It is included in the MariaDB-5.2-rpl feature preview, but may or may not make it into MariaDB 5.3.
With the introduction of group commit, MariaDB also introduces an
enhanced storage engine API for COMMIT that allows engines to coordinate
commit ordering and visibility with each other and with the binary log. With
this, the START TRANSACTION WITH CONSISTENT SNAPSHOT
statement has been
enhanced to ensure consistency also in-between storage engines that support the new
API. At the time of writing, there supporting engines are XtraDB and PBXT. In
addition, the binary log, while not a storage engine as such, also supports
the new API and can provide a binlog position consistent with storage engine
transaction snapshots.
This means that with transaction isolation level at least REPEATABLE READ, the
START TRANSACTION WITH CONSISTENT SNAPSHOT
statement can be used to ensure
that queries will see a transaction-consistent view of the database also
between storage engines. It is then not possible for a query to see the
changes from some transaction T in XtraDB tables without also seeing the
changes T makes to PBXT tables. (within a single transactiona storage engine
like XtraDB or PBXT, consistency is always guaranteed even without using START
TRANSACTION WITH CONSISTENT SNAPSHOT
).
For example, suppose the following two transactions run in parallel:
Transaction T1:
BEGIN; SET @t = NOW(); UPDATE xtradb_table SET a= @t WHERE id = 5; UPDATE pbxt_table SET b= @t WHERE id = 5; COMMIT;
Transaction T2:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; START TRANSACTION WITH CONSISTENT SNAPSHOT; SELECT t1.a, t2.b FROM xtradb_table t1 INNER JOIN pbxt_table t2 ON t1.id=t2.id WHERE t1.id = 5;
Then transaction T2 will always see the same value for xtradb_table.a
and
pbxt_table.b
.
(In MariaDB 5.2 and earlier, and MySQL at least upto 5.5, START TRANSACTION
WITH CONSISTENT SNAPSHOT
does not give any guarantees of consistency between
different storage engines. So it is possible, even with a "consistent"
snapshot, to see the changes in a trasaction only to InnoDB/XtraDB tables, not
PBXT tables, for example.)
Another use for this is to obtain a binary log position that is consistent with a particular transactional state of the storage engine(s) in the database. This is done with two new status variables for the binary log:
- Variable Name:
binlog_snapshot_file
- Scope: Session
- Data Type:
String
- Variable Name:
binlog_snapshot_position
- Scope: Session
- Data Type:
Number
These variables give the binary log file and position like SHOW MASTER STATUS
does. But they can be queried in a transactionally consistent way. After
starting a transaction using START TRANSACTION WITH CONSISTENT SNAPSHOT
, the
two variables will give the binlog position corresponding to the state of the
database of the consistent snapshot so taken, irrespectively of which other
transactions have been committed since the snapshot was taken (SHOW MASTER
STATUS
always shows the position of the last committed transaction). This works for
MVCC storage engines that implement the commit ordering part of the storage
engine API, which at the time of writing is XtraDB and PBXT.
This is useful to obtain a logical dump/backup with a matching binlog position
that can be used to provision a new slave with the original server as the
master. First START TRANSACTION WITH CONSISTENT SNAPSHOT
is executed. Then a
consistent state of the database is obtained with queries, and the matching
binlog position is obtained with SHOW STATUS LIKE 'binlog_snapshot%'
. When
this is loaded on a new slave server, the binlog position is used in a CHANGE
MASTER TO
statement to set the slave replicating from the correct position.
With the variables binlog_snapshot_file and binlog_snapshot_position, such
provisioning can be done fully non-blocking on the master. Without them, is is
necessary to get the binlog position under FLUSH TABLES WITH READ LOCK
;
this can stall the server for a potentially long time, as it blocks new
queries until all already started updates have finished.
The mysqldump program has been extended to use these new status variables. This means that a backup suitable for provisioning a slave can be obtained as normal like this:
mysqldump --single-transaction --master-data ...
The dump will be fully non-blocking if both the mysqldump program and the
queried server include the necessary feature (eg. both are from MariaDB
5.2-rpl, 5.3, or higher). In other cases, it will fall back to the old
blocking method using FLUSH TABLES WITH READ LOCK
.
For more information on the design and implementation of this feature, see MWL#136.