Atomic DDL's
In MariaDB 10.6.1 we have improved reability for DDL's (Data definition language operations) to make most of them operations Atomic and the rest crash safe, even if the server would crash in the middle of the operation.
The design of Atomic/Crash-safe DDL (MDEV-17567) allows it to work with all storage engines.
Definitions
- Atomic means that either the operations succeeds (and is logged to the binary log or is complete reversed.
- Crash safe means that in case of a crash, after the server has restarted, all tables are consistent, there is no temporary files or tables on disk and the binary log matches the status of the server.
- DDL Data definition langue.
- DML Data manipulation language.
- 'DDL recovery log' or 'DDL log' for short, is the new log file,
ddl-recovery.log
by default, that stores all DDL operations in progress. This is used to recover the state of the server in case of sudden crash.
Background why Atomic DDL's are needed
Before 10.6, in case of a crash, there was a small possibility that one of the following things could happen:
- There could be left some temporary tables starting with
#sql-alter
or#sql-shadow
or temporary files ending with ''. - The table in the storage engine and the table's .frm file could be out of sync.
- During a multi-table rename, only some of the tables where renamed.
Which DDL operations are now Atomic
- CREATE TABLE, except when used with CREATE OR REPLACE, which is only crash safe.
- RENAME TABLE] and [[rename-table|RENAME TABLE's.
- CREATE VIEW
- CREATE SEQUENCE
- CREATE TRIGGER
- DROP TRIGGER
- DROP TABLE and DROP VIEW. Dropping multiple tables is only crash safe.
- ALTER TABLE
- ALTER SEQUENCE is not listed above as it is internally implemented as a DML.
Which DDL operations are now crash safe
DROP TABLE of multiple tables.
DROP TABLE over multiple tables is treated as if every DROP is a separate, Atomic operations. This means that after a crash, all fully, or partly, dropped tables will be dropped and logged to the binary log. The not dropped tables will be left untouched.
CREATE OR REPLACE TABLE
CREATE OR REPLACE TABLE foo is implemented as:
DROP TABLE IF EXISTS foo; CREATE TABLE foo ...
This means that if there is a crash during CREATE TABLE
then the original table 'foo' will be dropped even if the new table was not created.
If the table was not re-created, the binary log will contain the
DROP TABLE
.
DROP DATABASE
DROP DATABASE is implemented as:
loop over all tables DROP TABLE table
Each DROP TABLE is atomic, but in case of a crash, things will work the same way as DROP TABLE with multiple tables.
Atomic with different storage engines
Atomic/Crash-safe DDL's work with all storage engines that has either
atomic DDL's internally or are able to re-execute DROP
or RENAME
in
case of failure.
This should be true for most storage engines. The ones that still need some work are:
- The S3 storage engine.
- The partitioning engine. Partitioning should be atomic for most cases, but there are still some known issues that needs to be tested and fixed.
The DDL log recovery file
The new startup option --log-ddl-recovery=path (ddl-recovery.log
by default) can be used to specify the place for
the DDL log file. This is mainly useful in the case when one has a
file system on persistent memory as there is a lot of sync on this
file during DDL operations.
This file contains all DDL operations that are in progress.
At MariaDB server startup, the DDL log file is copied to a file with the same base name but with a -backup.log
suffix. This is mainly done to be able to find out what went wrong if recovery would fail.
If the server would crash during recovery (unlikely but possible), the recovery will continue where was before. The recovery will retry each entry up to 3 times before giving up and proceed with the next entry.
Conclusions
- We believe that a clean separation of layers leads to an easier-to-maintain solution. The Atomic DDL implementation in MariaDB 10.6 introduced minimal changes t o the storage engine API, mainly for native ALTER TABLE.
- In our InnoDB implementation, no file format changes were needed on top of the RENAME undo log that was introduced in MariaDB Server 10.2.19 for a backup-safe TRUNCATE reimplementation. Correct use of sound design principles (write-ahead logging and transactions; also file creation now follows the ARIES protocol) is sufficient. We removed the hacks (at most one CREATE or DROP per transaction) and implemented correct
<<code>rollback<</code>> and <<code>>purge<</code>>
triggers for the InnoDB SYS_INDEXES table. - Numerous DDL recovery bugs in InnoDB were found and fixed quickly thanks to https://rr-project.org. We are still working on one: data files must not be deleted before the DDL transaction is committed.
Thanks to Atomic/Crash-safe DDL's, the MariaDB server is now much more stable and reliable in unstable environments. There is still ongoing work to fix the few remaining issues mentioned above to make all DDL operations Atomic. The target for these are MariaDB 10.7.
See also
- MDEV-17567 Atomic DDL. This MDEV entry links to all other entries related to Atomic operations that contains a lot of information how things are implemented.