Recent InnoDB Enhancements that Boost Performance in MariaDB Server

spacer

The performance of MariaDB Server has been improved over the years thanks to many enhancements to the InnoDB storage engine and feedback from challenging performance tests. In this blog, I will explain the different approaches we took with InnoDB to boost MariaDB Server performance.

“MariaDB is faster than MySQL”

The prominent database performance engineer Mark Callaghan has recently covered MariaDB in his blog titled “MariaDB is faster than MySQL.” His diversion from regular coverage of MySQL has been noticed. One of the key takeaways from Mark’s work is that a development snapshot of MariaDB Server 11.4.3 shows 7 – 36% higher queries per second (QPS) performance when compared to MySQL 8.0.38.

workloadminimummaximumaveragemedian
point-11.051.271.091.08
point-21.061.091.071.07
range-11.031.351.121.11
range-21.101.371.191.11
writes0.832.061.361.37

MariaDB was faster than MySQL on all microbenchmarks except for update-inlist.

Importance of Long Test Runs

Long test runs may be needed for the system to reach a steady state where meaningful conclusions can be drawn. This is especially true about the InnoDB buffer pool and the purge subsystem that keeps cleaning up the history of committed transactions after MVCC reads no longer need to access old versions of data.

Not only steady states may be interesting, but also transient states, such as in the recently introduced and fixed performance regressions MDEV-33508 and MDEV-34458, which affect page replacement in the buffer pool. Unfortunately we missed these in code review as well as in our regular performance testing. Mark did not find these ones either.

The Challenge in Running Performance Tests

Performance testing is hard. The state space explosion is a challenge in any testing of a complex system. In performance testing, the combinatorial explosion of configuration parameters is multiplied by many combinations of workload and hardware parameters.

Yes, it is more straightforward to test an in-memory workload, because the throughput and latency will be more predictable and consistent than something that might involve many parameters of several underlying subsystems, such as the operating system’s file system cache, the flash erase cycles on a SSD, or other factors that might be difficult to control. However, it would be useful to monitor some trends, such as the wall-clock or CPU time or IOPs consumed during the warm-up phase of benchmarks. While these numbers could fluctuate a lot between individual runs, monitoring the trends would allow to find systematic changes that would affect real-world workloads.

Use of Microbenchmarks

We are more or less bound to rather simple microbenchmarks. The good news is that most parts of the InnoDB storage engine can be exercised rather well by them. The MariaDB version of InnoDB has diverged quite a bit from MySQL. This includes some basic improvements, such as reducing the use of heap memory allocations, simplifying some logic, as well as some more ambitious refactoring related to concurrency (MDEV-18746, MDEV-13935, MDEV-33757, MDEV-14638, MDEV-20612, MDEV-21534, MDEV-24341 to name some examples).

Changes to File Formats

Some performance improvements have involved changes to file formats. Years ago, MariaDB Community Server 10.3 removed an old bottleneck, accessing the transaction system page to update the maximum transaction identifier; we can recover it from the undo log headers that we would traverse on startup anyway. Another bottleneck was that log writes (including zero-filling, computing checksums and encrypting data) were covered by a single mutex, which was a huge bottleneck. A central design goal of the new write-ahead log was that each writer can compute checksums for their own data (after optionally encrypting it), independent of other threads. Even the copying of the log snippets is concurrent (MDEV-27774). Related to this, in MDEV-33515 we found that there is no ‘one size fits all’ solution: for high concurrency workloads we reluctantly had to introduce a parameter SET GLOBAL innodb_log_spin_wait_delay so that certain high-end CPUs can benefit from a userspace spinlock, while older CPU generations will not suffer from CPU and memory bus thrashing.

Avoiding Changes to File Formats When Possible

We try to avoid changes to file formats when possible. MariaDB Server 10.5 and 10.6 use the same InnoDB data and log file format, but 10.6 supports crash-safe DDL with strict write-ahead logging (MDEV-24626, MDEV-25180, MDEV-25506) and with a simpler, faster logic of creating data files in DDL operations. There are no complex upgrade procedures. It is even possible (but not recommended and not officially supported) to kill a 10.6 server during a DML-only workload and start up 10.5.

Another example is the InnoDB doublewrite buffer, the 64+64 pages residing in the InnoDB system tablespace. Its purpose is to allow recovery in case the MariaDB server process was killed in the middle of a data page. Recovery will examine the doublewrite buffer for any valid pages and if needed, recover a corrupted data page based on a good copy in the doublewrite buffer. The doublewrite buffer used to be written using synchronous I/O. Senior Software Engineer Vladislav Vaintroub pointed out that such writes often conflict with another system call that is making writes durable, forming a serious performance bottleneck. In MariaDB Server 10.5, we refactored the doublewrite buffer to use asynchronous writes. On doublewrite completion, the data pages will be written to their final location. We also doubled the capacity of the buffer, to further leverage the power of asynchronous data page writes. We maintain 128+128 pages in main memory and actually write 128 pages at a time to the doublewrite buffer while filling the other 128-page buffer in RAM. Before writing the other set of 128 pages into the doublewrite buffer we will wait for the previous batch to complete, including the writes to the final location. There are some further optimizations around this, such as skipping the doublewrite buffer for pages that can be recovered based on log records (MDEV-15528, MDEV-19738).

Improvements in Usability

Slightly related to performance, we have also improved usability, for example by allowing many parameters to be changed while the server is running, and by allowing the InnoDB system tablespace to be shrunk (MDEV-14795, MDEV-19229). Changing the number of undo tablespaces used to require a rebuild of the entire database instance from a logical dump. For ease of maintenance, MariaDB Enterprise Server 11.4 will switch to multiple InnoDB undo tablespaces by default. Enabling multiple undo tablespaces instead of storing the undo log in the InnoDB system tablespace will also make it faster to allocate undo log pages for transactions, because not all threads are forced to compete for the same allocation bitmap page in the system tablespace.

Keep your suggestions coming!

Improving performance, stability and usability is an ongoing effort. Feedback, enhancement ideas and bug reports are always welcome.