FusionIO/DirectFS atomic write support


Starting with 5.5.31 / 10.0.2, the XtraDB and InnoDB storage engines included with MariaDB support atomic writes without using the doublewrite buffer.

Partial write operations

When InnoDB writes to the filesystem, there is in general no guarantee that a write operation will be complete (not partial) if power is lost or the operating system crashes at the exact moment the write is done. Without detection or prevention of partial writes, the integrity of the database can be compromised after recovery. For this reason, InnoDB has had, since its inception, a mechanism to detect and ignore partial writes, controlled by the innodb_doublewrite parameter (innodb_checksums can also be used to detect a partial write).
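
As a rough illustration of how a checksum exposes a partial write, here is a minimal Python sketch. It is a toy model, not InnoDB's actual page format or checksum algorithm: the page layout (payload followed by a trailing CRC32), the helper names, and the 4 KiB tear point are all assumptions made for the example.

```python
import zlib

PAGE_SIZE = 16 * 1024   # illustrative page size for this toy model
CHECKSUM_LEN = 4        # the toy page stores a CRC32 in its last 4 bytes


def make_page(payload: bytes) -> bytes:
    """Build a toy page: zero-padded payload followed by its CRC32."""
    body_len = PAGE_SIZE - CHECKSUM_LEN
    if len(payload) > body_len:
        raise ValueError("payload too large for one page")
    body = payload.ljust(body_len, b"\x00")
    return body + zlib.crc32(body).to_bytes(CHECKSUM_LEN, "big")


def is_page_intact(page: bytes) -> bool:
    """Return True if the stored checksum matches the page body."""
    if len(page) != PAGE_SIZE:
        return False
    body, stored = page[:-CHECKSUM_LEN], page[-CHECKSUM_LEN:]
    return zlib.crc32(body).to_bytes(CHECKSUM_LEN, "big") == stored


def simulate_torn_write(old_page: bytes, new_page: bytes, bytes_written: int) -> bytes:
    """Model a crash after only the first `bytes_written` bytes reached disk."""
    return new_page[:bytes_written] + old_page[bytes_written:]


old_page = make_page(b"row v1")
new_page = make_page(b"row v2")
torn_page = simulate_torn_write(old_page, new_page, 4096)

print(is_page_intact(old_page), is_page_intact(new_page), is_page_intact(torn_page))
```

In this model the torn page carries the new payload but the old checksum, so the mismatch is detected. Detection alone does not restore the page, which is why a second intact copy (the doublewrite buffer) or an all-or-nothing write is needed as well.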

Atomic write - a faster alternative to innodb_doublewrite

The innodb_doublewrite parameter does not come without its own set of problems. Especially on SSDs, writing each page twice can have detrimental effects, because the extra writes add to flash wear. A better solution is to ask the filesystem directly to provide an atomic (all or nothing) write guarantee. At the time of writing, only the directFS filesystem on FusionIO devices provides atomic write functionality. This functionality is now also supported by MariaDB's XtraDB and InnoDB storage engines. To switch to atomic writes instead of the doublewrite buffer, add

innodb_use_atomic_writes = 1

to the my.cnf config file.
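
To double-check that the option is really present in a config file, something like the following Python sketch could be used. It assumes the option lives in the `[mysqld]` section and that the file is simple enough for `configparser` to read (no `!include` directives); the function name and the sample text are made up for the example.

```python
import configparser

# Hypothetical sample config; the [mysqld] section placement is an assumption.
SAMPLE_MY_CNF = """
[mysqld]
innodb_use_atomic_writes = 1
"""


def atomic_writes_enabled(cnf_text: str) -> bool:
    """Return True if innodb_use_atomic_writes is switched on under [mysqld]."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows options without a value
        strict=False,          # tolerate repeated options
        interpolation=None,    # do not treat '%' in values specially
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    if not parser.has_option("mysqld", "innodb_use_atomic_writes"):
        return False
    value = parser.get("mysqld", "innodb_use_atomic_writes")
    if value is None:
        # Assumption: a bare boolean option counts as enabled.
        return True
    return value.strip().lower() in {"1", "on", "true"}


print(atomic_writes_enabled(SAMPLE_MY_CNF))
```

This only inspects the text of one file; it does not account for options set on the command line or in other config files.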

The following happens when innodb_use_atomic_writes is switched ON:

  • if /innodb_flush_method/ is not one of O_DIRECT, ALL_O_DIRECT, or O_DIRECT_NO_FSYNC, it is switched to O_DIRECT
  • /innodb_use_fallocate/ is switched ON (files are extended using posix_fallocate rather than by writing zeros past the end of the file)
  • Whenever an InnoDB datafile is opened, a special /ioctl()/ is issued to switch on atomic writes. If the call fails, an error is logged and returned to the caller. This means that if the system tablespace is not located on an atomic-write-capable device/filesystem, InnoDB will refuse to start.
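
The three effects listed above can be summarised in a small Python model. This is only a sketch of the described behaviour, not the server's code: the `InnodbSettings` class, the function names, and the `enable_atomic` callable (a stand-in for the special ioctl) are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Flush methods that are left unchanged when atomic writes are on.
DIRECT_FLUSH_METHODS = {"O_DIRECT", "ALL_O_DIRECT", "O_DIRECT_NO_FSYNC"}


@dataclass(frozen=True)
class InnodbSettings:
    """Hypothetical container for the three settings discussed here."""
    use_atomic_writes: bool
    flush_method: str
    use_fallocate: bool


def effective_settings(configured: InnodbSettings) -> InnodbSettings:
    """Apply the adjustments implied by innodb_use_atomic_writes = 1."""
    if not configured.use_atomic_writes:
        return configured
    flush = configured.flush_method
    if flush not in DIRECT_FLUSH_METHODS:
        flush = "O_DIRECT"
    return replace(configured, flush_method=flush, use_fallocate=True)


def open_datafile(path: str, settings: InnodbSettings,
                  enable_atomic: Callable[[str], bool]) -> str:
    """Model opening a datafile; `enable_atomic` stands in for the ioctl."""
    if settings.use_atomic_writes and not enable_atomic(path):
        # In the server this is where the error is logged and returned.
        raise OSError(f"cannot enable atomic writes on {path}")
    return path


settings = effective_settings(InnodbSettings(True, "fsync", False))
print(settings)
```

Passing a callable that returns False for the system tablespace path makes `open_datafile` raise, which mirrors the refusal to start when the device or filesystem cannot provide atomic writes.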
