Faster, Better, Stronger: InnoDB in MariaDB Server 10.5

InnoDB is the default storage engine for MariaDB Enterprise Server and MariaDB Community Server. While originally based on the MySQL implementation of InnoDB, MariaDB's InnoDB has been diverging from the original for years in order to give MariaDB users a better experience. Examples include persistent AUTO_INCREMENT in 10.2, trx_sys.mutex scalability improvements and instant ADD COLUMN in 10.3, and instant DROP COLUMN and other ALTER TABLE improvements in 10.4. MariaDB Server 10.5 continues on this path.

MariaDB Server 10.5 includes significant improvements to the InnoDB storage engine. By having InnoDB make better use of hardware resources, we’ve improved performance and scalability and have made backup and recovery faster and easier.

Generally speaking, the improvements fall into three categories: changes to configuration parameters, changes to background tasks, and changes to the redo log and recovery.

Changes to Configuration Parameters

We have changed how some configuration parameters behave. We have deprecated some and hardwired others.

  • innodb_buffer_pool_instances=1. Hardwired at 1. In our tests, a single buffer pool provides the best performance in almost every case.
  • innodb_page_cleaners=1. Hardwired at 1. There is only one buffer pool instance, so only one page cleaner is needed.
  • innodb_log_files_in_group=1 (only ib_logfile0). Hardwired at 1. In our benchmarks, a single log file performed better. In 10.2 you could set it to 1, but the default was 2. In 10.5 you can only have a single file.
  • innodb_undo_logs=128 (the “rollback segment” component of DB_ROLL_PTR). Hardwired at 128, which has been the default for a long time. It can no longer be changed.
  • Dynamic innodb_log_file_size and innodb_purge_threads. In MariaDB Enterprise Server only, these two parameters are now dynamic. They can be changed with SET GLOBAL without restarting the server.
  • Cleanup of InnoDB Data Scrubbing code. The background operations for scrubbing freed data have been removed, and related configuration parameters have been deprecated. When scrubbing is enabled or page_compressed tables are being used, the contents of freed pages will be zeroed out or freed to the file system by the normal page flushing mechanism.
  • SHOW GLOBAL STATUS improvement. SHOW GLOBAL STATUS now shows several variables from SHOW ENGINE INNODB STATUS. This simplifies monitoring of these parameters.

Changes to Background Tasks

We addressed some maintenance debt by making changes to a number of background tasks.

  • Eliminated background merges of the change buffer. The change buffer is intended to speed up modifications to secondary indexes when the affected page is not in the buffer pool. It was created some 20 years ago for reasons that made sense at the time, but a lot has changed since then. Asynchronous merges of the change buffer have been recognized as a cause of problematic I/O spikes and random crashes. We now merge buffered changes only on demand, when the affected secondary index page must be read.
  • Single buffer pool. In extensive testing, a single buffer pool performs best in almost every case.
    • With a single buffer pool, it is easier to employ std::atomic data members.
    • A single buffer pool requires only a single page cleaner and a single flush list of dirty pages.
    • In a write-heavy workload on a tiny buffer pool, we observed some increased contention on the buffer pool mutex, but that was resolved by making the buffer pool more scalable. A refactored, cache-aware hash table implementation with simple std::atomic based read-write locks addressed this scalability problem.
  • Background tasks use a thread pool. Background tasks now use a thread pool rather than separate internal threads. The internal thread pool dynamically creates or destroys threads based on the number of active tasks. In particular, the I/O threads have been cleaned up.

    This simplifies management because it is easier to configure the maximum number of tasks of a kind than to configure the number of threads. Also, in MariaDB Enterprise Server you can set innodb_purge_threads dynamically with SET GLOBAL.

    Some tasks still use dedicated threads, such as background rollback of incomplete transactions and crash recovery. Other tasks have yet to be moved to the thread pool, for example the encryption key rotation thread.

  • InnoDB-wide r/w lock for blocking concurrent DDL replaced by metadata locks (MDL) on the table name. We now use metadata locks when executing some background operations. We acquire the lock to prevent users from executing something like DROP TABLE while a purge of history is running on that table. This used to be handled by dict_operation_lock, a single lock covering every InnoDB table. Acquiring only an MDL on the table name improves scalability.
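
To make the "simple std::atomic based read-write locks" mentioned under the single buffer pool item more concrete, here is a minimal C++ sketch of that general technique. It is not the InnoDB implementation: the class name AtomicRwLock, the bit layout of the lock word, and the yield-based waiting are all assumptions made for illustration. A single atomic word holds the reader count in its low bits and a writer flag in its top bit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sketch of a read-write lock built on one std::atomic word.
// NOT the InnoDB code; name, layout and waiting strategy are assumptions.
class AtomicRwLock {
    // Top bit marks an exclusive writer; the low 31 bits count readers.
    static constexpr std::uint32_t WRITER = 1u << 31;
    std::atomic<std::uint32_t> word_{0};

public:
    // Succeeds unless a writer currently holds the lock.
    bool try_read_lock() {
        std::uint32_t cur = word_.load(std::memory_order_relaxed);
        while (!(cur & WRITER)) {
            // On failure, cur is refreshed with the current value and we retry.
            if (word_.compare_exchange_weak(cur, cur + 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed))
                return true;
        }
        return false;
    }

    void read_lock() {
        while (!try_read_lock()) std::this_thread::yield();
    }

    void read_unlock() {
        std::uint32_t prev = word_.fetch_sub(1, std::memory_order_release);
        assert((prev & ~WRITER) > 0);  // caller must hold a read lock
        (void)prev;
    }

    // Succeeds only when there are no readers and no writer.
    bool try_write_lock() {
        std::uint32_t expected = 0;
        return word_.compare_exchange_strong(expected, WRITER,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    void write_lock() {
        while (!try_write_lock()) std::this_thread::yield();
    }

    void write_unlock() {
        assert(word_.load(std::memory_order_relaxed) == WRITER);
        word_.store(0, std::memory_order_release);
    }
};
```

Because the whole lock state fits in one 32-bit word, many such locks can sit next to the data they protect, which is one plausible reading of "cache-aware" here. This sketch favors readers and can starve writers under heavy read load; a production lock would need a fairness policy and a real wait mechanism instead of yielding.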

Changes to Redo Log and Recovery

We have improved the Redo Log record format to make it more compact and extensible, which also makes backup and recovery faster and more reliable. The changes improve the Redo Log and Recovery while still guaranteeing atomicity, consistency, isolation, and durability (ACID), and full crash safety of user transactions.

  • Freed pages log record. InnoDB now writes log records that indicate when data pages have been freed, for example when you drop an index or delete a large number of records from a data file. This accomplishes a couple of things:
    • We now avoid writes of freed pages after DROP (or rebuild). The new log records indicate that the pages will be freed so the contents will no longer be written to the data files.
    • Recovery is faster since recovery can skip reading those pages when it sees that the page was freed.
  • New log record format. The new log record format explicitly encodes lengths, which makes the physical format easy to parse. It also minimizes copying, because a record can never exceed innodb_page_size. This makes for stronger and faster backup and recovery.
  • Improved validation of logical records for UNDO_APPEND, INSERT, DELETE. The special logical records improve performance during normal operations and backup and recovery due to fewer writes and reads of data pages. Improved validation detects corrupted data more reliably.
  • Improved group commit. Reduces contention and improves scalability and write speed.
    • group_commit_lock introduced for more efficient synchronization of redo log writing and flushing. It reduces CPU consumption in log_write_up_to(), reduces spurious wakeups, and improves throughput in write-intensive benchmarks.
  • Optional libpmem interface added for improved performance on Intel® Optane™ DC Persistent Memory Module. Requires compiling from source.
  • Optimized memory management on recovery (or mariabackup --prepare)
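
The idea behind the group commit item above can be sketched in a few lines of C++. This is a simplified illustration, not MariaDB's group_commit_lock: the class GroupCommitSketch, its acquire()/release() methods, and the Result enum are hypothetical names, and the waking strategy is deliberately naive. The point it shows is that a thread asking for the log to be durable up to some LSN either becomes the leader that performs the write, or finds that another thread's write has already covered its LSN and returns without doing any I/O.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stand-in for a group-commit lock.
// NOT MariaDB's group_commit_lock; names and behavior are assumptions.
class GroupCommitSketch {
public:
    enum class Result { Acquired, AlreadyDone };

    // Ask for the log to be durable up to `lsn`.
    // AlreadyDone: another thread's write already covered `lsn`.
    // Acquired: the caller is now the leader and must call release().
    Result acquire(std::uint64_t lsn) {
        std::unique_lock<std::mutex> guard(mutex_);
        // Wait only while a leader is busy and our LSN is not yet covered.
        cond_.wait(guard, [&] { return done_lsn_ >= lsn || !leader_active_; });
        if (done_lsn_ >= lsn) return Result::AlreadyDone;
        leader_active_ = true;
        return Result::Acquired;
    }

    // Called by the leader after writing/flushing up to `written_lsn`.
    void release(std::uint64_t written_lsn) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            assert(leader_active_);  // only the current leader may release
            if (written_lsn > done_lsn_) done_lsn_ = written_lsn;
            leader_active_ = false;
        }
        // Naive: wakes every waiter; each one rechecks its own LSN.
        cond_.notify_all();
    }

    std::uint64_t done_lsn() {
        std::lock_guard<std::mutex> guard(mutex_);
        return done_lsn_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::uint64_t done_lsn_ = 0;  // highest LSN known to be durable
    bool leader_active_ = false;  // true while some thread is writing
};
```

In a function in the style of log_write_up_to(), a caller would write and flush the log only when acquire() returns Acquired, then pass the LSN it actually reached to release(); since a leader typically writes everything buffered so far, later callers with smaller LSNs return immediately. Unlike the improvement described above, this sketch wakes all waiters on every release, so it does not reproduce the reduction in spurious wakeups.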

Conclusion

The changes we’ve made to the InnoDB storage engine in MariaDB Server 10.5 provide significant improvements in performance and scalability and make management easier, while still supporting in-place upgrades from older versions. They will provide a greatly improved experience to our users now and going forward.

Download MariaDB Community Server 10.5 and give it a try yourself to see the difference.

More Information