FusionIO/DirectFS atomic write support

Starting with MariaDB 5.5.31 / 10.0.2, the included XtraDB and InnoDB storage engines support atomic writes without using the doublewrite buffer.

h1. Partial write operations

When InnoDB writes to the filesystem, there is in general no guarantee that a write operation will complete in full (rather than partially) if a power-off event occurs, or if the operating system crashes, at the exact moment the write is performed.

Without detection or prevention of partial writes, the integrity of the database can be compromised after recovery. For this reason InnoDB has, since its inception, had a mechanism to detect and ignore partial writes: the doublewrite buffer, controlled by the innodb_doublewrite parameter (innodb_checksums can also be used to detect a partial write).
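To illustrate how a checksum can reveal a partial write, here is a minimal sketch in C. It is not InnoDB's actual page layout or checksum algorithm: the 16 KiB page size, the placement of the checksum in the first four bytes, the use of FNV-1a, and all `demo_*` names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE  16384u /* assumption: 16 KiB pages */
#define DEMO_CSUM_BYTES 4u     /* hypothetical layout: checksum in first 4 bytes */

/* FNV-1a over the page payload (everything after the checksum field).
 * Illustrative only; this is NOT the checksum InnoDB uses. */
uint32_t demo_page_checksum(const unsigned char *page)
{
    assert(page != NULL);
    uint32_t h = 2166136261u;           /* FNV-1a offset basis */
    for (size_t i = DEMO_CSUM_BYTES; i < DEMO_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;                 /* FNV-1a prime */
    }
    return h;
}

/* Store the checksum in the page header before the page is written out. */
void demo_page_seal(unsigned char *page)
{
    uint32_t h = demo_page_checksum(page);
    memcpy(page, &h, DEMO_CSUM_BYTES);
}

/* After reading a page back, a mismatch suggests that only part of the
 * page reached the disk (or that it was corrupted in some other way). */
int demo_page_is_intact(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, DEMO_CSUM_BYTES);
    return stored == demo_page_checksum(page);
}
```

A checksum like this only detects the damaged page; it cannot restore it. Recovering the page contents requires an intact copy elsewhere, which is the role the doublewrite buffer plays.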

A new boolean configuration option, _innodb_use_atomic_writes_, has been added to enable atomic writes. By default, innodb_use_atomic_writes is set to 0 (OFF). It should be set to 1 (ON) if the underlying filesystem and device support atomic writes, which at the time of writing means a FusionIO device with the directFS filesystem.

If innodb_use_atomic_writes is ON, then upon initialization InnoDB/XtraDB switches off doublewrite, sets the file flush method to O_DIRECT (unless it is already set to O_DIRECT or ALL_O_DIRECT, or O_DIRECT_NO_FSYNC in 5.6), and turns on posix_fallocate. InnoDB/XtraDB also issues an ioctl (DFS_IOCTL_ATOMIC_WRITE_SET) to the underlying filesystem for every data file that is opened. This new ioctl hints to the underlying filesystem that it should transparently convert normal writes (made via pwrite(), write(), or io_submit() calls) into atomic writes, which are guaranteed by the underlying device. If the ioctl() fails, opening the data file fails, an error is returned to the calling code, and a message is written to the log file. If the data file is ibdata1, mysqld does not start. This approach can be extended to other filesystems and devices that choose to provide this functionality.
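The per-file step described above can be sketched roughly as follows. This is a hedged illustration for Linux, not the InnoDB/XtraDB source: the function name `demo_open_data_file_atomic` is hypothetical, passing a pointer to an integer set to 1 as the ioctl argument is an assumption, and the sketch omits O_DIRECT and posix_fallocate for brevity. The ioctl number follows the define given in this article, with `unsigned int` written out in place of `uint`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* From the article; `uint` is spelled out as `unsigned int` here. */
#define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, unsigned int)

/* Hypothetical helper: open a data file and ask the filesystem to make
 * subsequent writes on it atomic. Returns the file descriptor on success
 * and -1 on failure, in which case the file is not left open. */
int demo_open_data_file_atomic(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(errno));
        return -1;
    }

    /* Assumption: the ioctl takes a pointer to a nonzero flag to enable
     * atomic writes for this file descriptor. */
    unsigned int enable = 1;
    if (ioctl(fd, DFS_IOCTL_ATOMIC_WRITE_SET, &enable) == -1) {
        /* On filesystems without this ioctl the call fails; the article
         * states that opening the data file then fails as a whole. */
        fprintf(stderr, "atomic write ioctl on %s failed: %s\n",
                path, strerror(errno));
        close(fd);
        return -1;
    }
    return fd;
}
```

Once the call succeeds, ordinary write(), pwrite(), or io_submit() calls on the returned descriptor need no further changes, since the conversion to atomic writes happens in the filesystem.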

This approach transparently takes advantage of the capabilities of the underlying filesystem and device, and achieves atomicity in a manner that provides improved performance, better latency, and superior flash endurance. It also reduces write amplification on flash devices.

To support this, the following define has been added in InnoDB/XtraDB:

• #define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, uint)

Refer to the following flow charts for the detailed call sequence.
