InnoDB Quality Improvements in MariaDB Server


InnoDB is the default storage engine in MariaDB and is especially well suited to mixed read and write workloads. Over the last few years, we’ve delivered a number of quality improvements in the InnoDB storage engine. For me, quality means not only correctness, but also portability, code clarity, performance and user-friendliness. Sometimes these are tightly related. For example, the way InnoDB used atomic memory operations was not compatible with some hardware platforms. The fixes we implemented two years ago in MariaDB Server 10.2 not only solved the portability problems, but also clarified the code and removed unnecessary code, which slightly improves performance on all platforms.

Performance improvements

Last year, with MariaDB Server 10.3, we improved the performance of the InnoDB transaction subsystem by introducing lock-free data structures and removing writes to the TRX_SYS page. While working on this, we fixed some bugs in the transaction recovery code that we had inherited via the MySQL code base.
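To illustrate the basic idea behind lock-free data structures, here is a minimal C++ sketch that hands out transaction identifiers with a single atomic fetch_add instead of acquiring a mutex. The class name TrxIdAllocator is hypothetical, and this is not MariaDB’s actual transaction subsystem code; it only shows the technique under simplified assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: assigning transaction IDs without a mutex.
// fetch_add is one atomic read-modify-write, so concurrent callers
// always receive distinct IDs and never block each other.
class TrxIdAllocator {
public:
  explicit TrxIdAllocator(std::uint64_t first) : next_(first) {}

  // Returns a unique, monotonically increasing identifier.
  std::uint64_t assign() {
    std::uint64_t id = next_.fetch_add(1, std::memory_order_relaxed);
    assert(id != UINT64_MAX);  // debug assertion: 64-bit IDs should never wrap
    return id;
  }

  // The ID that the next call to assign() would return.
  std::uint64_t peek() const { return next_.load(std::memory_order_relaxed); }

private:
  // Relaxed ordering is enough for uniqueness; code that publishes other
  // data together with the ID would need acquire/release ordering.
  std::atomic<std::uint64_t> next_;
};
```

Whether std::atomic<std::uint64_t> is truly lock-free depends on the platform; std::atomic<std::uint64_t>::is_always_lock_free can confirm it at compile time. Using the standard C++ atomics rather than hand-written assembly is also what keeps such code portable across hardware platforms.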

Usability improvements
If you’re using a recent version of MariaDB Server 10.2 or later, you should notice that it starts up faster with clean progress reporting for recovery, purges transaction history faster, and shuts down more quickly while playing nice with systemd.

In MariaDB Server 10.3, the effort to reduce fil_system hash table lookups slightly improved performance. It also eased diagnostics and debugging by making tablespace and file name information directly available in many data structures.

InnoDB and Mariabackup Stability Improvements

With InnoDB, you get a persistent database, which guarantees Atomicity, Consistency, Isolation and Durability (ACID) of concurrently executing transactions. My 2018 presentation at MariaDB’s user conference, “Deep Dive: InnoDB Transactions and Write Paths”, sheds some light on this.

Internally, a single row operation in InnoDB is not atomic: InnoDB first writes an undo log record and then modifies each index of the table separately. We have had a few public MariaDB bug reports where the indexes of an InnoDB table are inconsistent with each other, and I remember a few private bugs from my Oracle days. MDEV-18272 recently fixed one source of such corruption, which had been introduced back in MySQL 5.5.7 and was very difficult to find.
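The following toy model, written in C++ with hypothetical names, shows why the write order matters. It keeps two in-memory “indexes”, writes an undo record before touching either of them, and can simulate an interruption between the clustered index update and the secondary index update. Rolling back from the undo log restores consistency. It deliberately ignores redo logging, mini-transactions and locking, so it is only a sketch of the idea, not how InnoDB implements it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (not InnoDB code): a clustered index keyed by id and a
// secondary index keyed by name.
struct UndoRecord {
  int id;
  std::string old_name;
};

struct ToyTable {
  std::map<int, std::string> clustered;  // id -> name
  std::map<std::string, int> secondary;  // name -> id
  std::vector<UndoRecord> undo_log;

  // Undo record first, then each index separately. stop_after_clustered
  // simulates an interruption between the two index modifications.
  bool update_name(int id, const std::string& new_name, bool stop_after_clustered) {
    auto it = clustered.find(id);
    if (it == clustered.end()) return false;
    undo_log.push_back({id, it->second});    // 1. write the undo log record
    std::string old_name = it->second;
    it->second = new_name;                   // 2. modify the clustered index
    if (stop_after_clustered) return false;  //    simulated interruption
    secondary.erase(old_name);               // 3. modify the secondary index
    secondary[new_name] = id;
    return true;
  }

  void commit() { undo_log.clear(); }

  // Apply undo records in reverse order, repairing both indexes.
  void rollback() {
    while (!undo_log.empty()) {
      UndoRecord u = undo_log.back();
      undo_log.pop_back();
      std::string current = clustered[u.id];
      clustered[u.id] = u.old_name;
      auto s = secondary.find(current);
      if (s != secondary.end() && s->second == u.id) secondary.erase(s);
      secondary[u.old_name] = u.id;
    }
  }

  // Every clustered entry must have exactly one matching secondary entry.
  bool consistent() const {
    if (clustered.size() != secondary.size()) return false;
    for (const auto& kv : clustered) {
      auto s = secondary.find(kv.second);
      if (s == secondary.end() || s->second != kv.first) return false;
    }
    return true;
  }
};
```

A bug that skips or mis-applies step 3, or that loses the undo record, leaves exactly the kind of inconsistency between indexes that the bug reports above describe.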

It used to be common knowledge that you cannot back up InnoDB while DDL operations such as ALTER, RENAME or TRUNCATE TABLE are being executed. Fixing this in Mariabackup 10.2 involved extending the InnoDB undo and redo log formats so that file operations inside InnoDB are crash-safe and covered by a single redo log.
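A minimal sketch of the write-ahead principle behind crash-safe file operations, assuming a simplified in-memory model: the rename is logged durably before it is performed, and recovery replays any logged rename that did not complete. Because the file operation is in the log, anything that replays the log (crash recovery, or a backup tool applying the copied log) can reproduce it. ToyStorage and FileRenameLog are hypothetical names, and InnoDB’s real log record formats differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical log record for a file rename.
struct FileRenameLog {
  std::string from, to;
};

struct ToyStorage {
  std::set<std::string> files;             // names of existing files
  std::vector<FileRenameLog> durable_log;  // records that reached stable storage

  // Write-ahead rule: make the log record durable first, then act.
  // crash_before_rename simulates a crash between the two steps.
  void rename_file(const std::string& from, const std::string& to, bool crash_before_rename) {
    assert(files.count(from) == 1);  // precondition: the source file exists
    durable_log.push_back({from, to});
    if (crash_before_rename) return;
    files.erase(from);
    files.insert(to);
  }

  // Replay logged renames. Replay is idempotent: a rename that was already
  // applied (source gone, target present) is skipped.
  void recover() {
    for (const auto& rec : durable_log) {
      if (files.count(rec.from) && !files.count(rec.to)) {
        files.erase(rec.from);
        files.insert(rec.to);
      }
    }
  }
};
```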

Mariabackup is basically an extreme test of InnoDB crash recovery code, because InnoDB could write a large amount of write-ahead log (redo log) while the physical backup is running. Also, incomplete data pages could be copied because of sloppy checksum validation; the new file format innodb_checksum_algorithm=full_crc32 in MariaDB Server 10.4 should be a significant improvement here.
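As a rough illustration of why a single whole-page checksum helps detect incomplete pages, the following C++ sketch computes CRC-32C (Castagnoli) over a page and stores it in the page’s last 4 bytes. The layout (checksum over all bytes except the last 4, stored big-endian in those 4 bytes) is an assumption made for this sketch; consult the MariaDB source for the exact full_crc32 format. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C with the reflected polynomial 0x82F63B78.
// Slow but portable; production code uses hardware CRC instructions or tables.
std::uint32_t crc32c(const unsigned char* data, std::size_t len) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

// Assumed layout: checksum over all preceding bytes, stored big-endian
// in the last 4 bytes of the page.
void store_page_checksum(std::vector<unsigned char>& page) {
  assert(page.size() > 4);
  std::size_t n = page.size() - 4;
  std::uint32_t c = crc32c(page.data(), n);
  page[n] = static_cast<unsigned char>(c >> 24);
  page[n + 1] = static_cast<unsigned char>(c >> 16);
  page[n + 2] = static_cast<unsigned char>(c >> 8);
  page[n + 3] = static_cast<unsigned char>(c);
}

// A page copied while it was only partly written will almost certainly fail this.
bool page_checksum_ok(const std::vector<unsigned char>& page) {
  if (page.size() <= 4) return false;
  std::size_t n = page.size() - 4;
  std::uint32_t stored = (std::uint32_t(page[n]) << 24) | (std::uint32_t(page[n + 1]) << 16) |
                         (std::uint32_t(page[n + 2]) << 8) | std::uint32_t(page[n + 3]);
  return crc32c(page.data(), n) == stored;
}
```

With one checksum covering every other byte of the page, a backup tool has a single, strict validation rule: if the check fails, the page must be read again.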

Starting with MariaDB Server 10.2.24 and 10.3.15, InnoDB recovery is faster and more reliable, because it avoids reading data pages that can be reconstructed solely from redo log records. Those changes made some pre-existing crash recovery bugs easier to reproduce. Thanks to the Random Query Generator (RQG) grammar simplifier, we were able to narrow down and fix those bugs, among them missing (as well as unnecessary) logging of ALTER TABLE operations, something that MariaDB inherited from MySQL 5.7.
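The following C++ sketch shows the general idea of skipping page reads during recovery, under simplified assumptions: if the log contains a record that initializes a page from scratch, the page can be rebuilt in memory without reading its old contents from disk. The record types INIT_PAGE and WRITE_BYTES and the function apply_redo are hypothetical; InnoDB’s actual redo log formats and recovery logic are considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Page = std::vector<unsigned char>;

enum class RecType { INIT_PAGE, WRITE_BYTES };  // hypothetical record types

struct RedoRec {
  std::uint32_t page_no;
  RecType type;
  std::size_t offset;               // used by WRITE_BYTES
  std::vector<unsigned char> data;  // used by WRITE_BYTES
};

// Apply records in log order. A page that is (re)initialized by the log is
// created in memory; only pages modified without such a record are read.
std::map<std::uint32_t, Page> apply_redo(const std::vector<RedoRec>& log, std::size_t page_size,
                                         const std::function<Page(std::uint32_t)>& read_page) {
  std::map<std::uint32_t, Page> pages;
  for (const RedoRec& r : log) {
    if (r.type == RecType::INIT_PAGE) {
      pages[r.page_no] = Page(page_size, 0);  // no disk read needed
      continue;
    }
    auto it = pages.find(r.page_no);
    if (it == pages.end())  // old contents are needed: read the page
      it = pages.emplace(r.page_no, read_page(r.page_no)).first;
    assert(r.offset + r.data.size() <= it->second.size());
    std::copy(r.data.begin(), r.data.end(),
              it->second.begin() + static_cast<std::ptrdiff_t>(r.offset));
  }
  return pages;
}
```

Skipping the read is faster, and it also means a page that was corrupted on disk but is fully rewritten by the log no longer matters for recovery.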

How We Improve Stability

It helps to have an open culture where collaboration between teams and with the community is encouraged. But in the end, I think that the systematic use of tools and methodology can make a big difference when it comes to the correct operation of software, especially on parallel and distributed systems.

AddressSanitizer instrumentation
Most C or C++ programmers have probably heard about Valgrind, which is a binary-to-binary translator that instruments executable code to catch bugs related to memory allocation and initialization. Because it runs the program in a kind of single-threaded emulator, its ability to catch errors related to parallel execution in multi-threaded programs is limited.

The various Sanitizers that are available in the Clang and GCC compilers are a welcome alternative. More information is available when the instrumentation is applied by the compiler. For example, Valgrind would be unable to report stack-use-after-scope.

AddressSanitizer can almost replace Valgrind. The main gap, use of uninitialized memory, would be caught by MemorySanitizer, which we plan to support next.

For custom memory allocators in MariaDB, we already had some Valgrind instrumentation, and recently we added AddressSanitizer instrumentation. One case of accessing freed memory was reported over 100 times faster by AddressSanitizer than by Valgrind. It’s these types of tools that enable us to deliver more stability faster.
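A custom allocator hides memory from AddressSanitizer: a chunk returned to a pool is still valid heap memory as far as ASan knows, so a use-after-free goes unnoticed. ASan’s manual poisoning interface (ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION from <sanitizer/asan_interface.h>) closes that gap. The C++ sketch below applies it to a hypothetical fixed-size pool; ChunkPool and the SKETCH_* macros are illustrative names, and MariaDB’s own allocators and instrumentation macros differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Detect ASan in both Clang (__has_feature) and GCC (__SANITIZE_ADDRESS__).
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAVE_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_HAVE_ASAN 1
#endif

#ifdef SKETCH_HAVE_ASAN
#  include <sanitizer/asan_interface.h>
#  define SKETCH_POISON(p, n) ASAN_POISON_MEMORY_REGION(p, n)
#  define SKETCH_UNPOISON(p, n) ASAN_UNPOISON_MEMORY_REGION(p, n)
#else  // no-ops in uninstrumented builds
#  define SKETCH_POISON(p, n) ((void) (p), (void) (n))
#  define SKETCH_UNPOISON(p, n) ((void) (p), (void) (n))
#endif

// Hypothetical fixed-size pool. chunk_size is assumed to be a multiple of 8
// (ASan's shadow granularity) so that each chunk is poisoned exactly.
class ChunkPool {
public:
  ChunkPool(std::size_t chunk_size, std::size_t count)
      : chunk_size_(chunk_size), storage_(chunk_size * count) {
    assert(chunk_size % 8 == 0);
    for (std::size_t i = 0; i < count; ++i) {
      char* c = storage_.data() + i * chunk_size_;
      free_list_.push_back(c);
      SKETCH_POISON(c, chunk_size_);  // not handed out yet: any access is a bug
    }
  }
  ~ChunkPool() { SKETCH_UNPOISON(storage_.data(), storage_.size()); }
  ChunkPool(const ChunkPool&) = delete;
  ChunkPool& operator=(const ChunkPool&) = delete;

  void* allocate() {
    if (free_list_.empty()) return nullptr;
    char* c = free_list_.back();
    free_list_.pop_back();
    SKETCH_UNPOISON(c, chunk_size_);
    return c;
  }

  // p must have been returned by allocate() on this pool.
  void deallocate(void* p) {
    char* c = static_cast<char*>(p);
    SKETCH_POISON(c, chunk_size_);  // later reads or writes via c are reported
    free_list_.push_back(c);
  }

  std::size_t available() const { return free_list_.size(); }

private:
  std::size_t chunk_size_;
  std::vector<char> storage_;
  std::vector<char*> free_list_;
};
```

In an ASan build, touching a chunk after deallocate() produces a use-after-poison report at the faulting instruction instead of silent corruption discovered much later.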

Random Query Generator and grammar simplifier
The RQG submits randomly generated SQL statements to the database. Executing random statements concurrently from multiple connections can expose edge cases and glitches. In InnoDB, we have numerous debug assertions that document and enforce our assumptions and invariants. The assertions, along with AddressSanitizer instrumentation, are our ‘lottery tickets’ in testing.
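To give a feel for grammar-driven query generation, here is a small C++ sketch that expands a grammar into random SQL text. Each rule maps to alternatives; a token that names another rule is expanded recursively, and any other token is emitted literally. RQG itself is a Perl tool with a much richer grammar language and runs many connections concurrently; the names below (RuleSet, expand, generate) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <sstream>
#include <string>
#include <vector>

using Alternative = std::vector<std::string>;  // a sequence of tokens
using RuleSet = std::map<std::string, std::vector<Alternative>>;

// Recursively expand symbol by picking a random alternative.
// Assumes the grammar terminates (no unbounded recursion).
void expand(const RuleSet& rules, const std::string& symbol, std::mt19937& rng,
            std::vector<std::string>& out) {
  auto it = rules.find(symbol);
  if (it == rules.end()) {  // not a rule name: a literal token
    out.push_back(symbol);
    return;
  }
  const auto& alts = it->second;
  assert(!alts.empty());
  std::uniform_int_distribution<std::size_t> pick(0, alts.size() - 1);
  for (const std::string& tok : alts[pick(rng)]) expand(rules, tok, rng, out);
}

// Produce one statement starting from the given rule.
std::string generate(const RuleSet& rules, const std::string& start, std::mt19937& rng) {
  std::vector<std::string> tokens;
  expand(rules, start, rng, tokens);
  std::ostringstream sql;
  for (std::size_t i = 0; i < tokens.size(); ++i) sql << (i ? " " : "") << tokens[i];
  return sql.str();
}
```

Running such a generator in several threads against one server, each with its own random seed, is what turns a handful of grammar rules into a stream of unusual interleavings.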

Often it is very hard to debug a crash that was produced by a large RQG grammar involving lots of irrelevant client connections. That is where the grammar simplifier comes into play: it eliminates grammar rules while preserving the crash that is being investigated.
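The core of such a simplifier can be sketched as a greedy reduction loop: try removing one rule, rerun the test, and keep the removal only if the crash still reproduces; repeat until no single rule can be removed. The C++ sketch below models a grammar as a plain list of rule names and the expensive test run as a caller-supplied predicate. This is an illustration of the reduction technique under those assumptions, not the RQG simplifier’s actual algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Grammar = std::vector<std::string>;  // hypothetical: one entry per grammar rule

// Greedy one-rule-at-a-time reduction. still_crashes stands in for
// "run the test with this grammar and check for the same crash".
Grammar simplify(Grammar g, const std::function<bool(const Grammar&)>& still_crashes) {
  assert(still_crashes(g));  // must start from a grammar that reproduces the crash
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < g.size();) {
      Grammar candidate = g;
      candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
      if (still_crashes(candidate)) {
        g = std::move(candidate);  // rule i was irrelevant to the crash
        changed = true;
      } else {
        ++i;  // rule i is needed to reproduce the crash
      }
    }
  }
  return g;
}
```

Each predicate call is a full test run, so in practice the number of attempts dominates the cost, and a nondeterministic crash may need several runs per candidate.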

I first learned to appreciate the RQG grammar simplifier back in my Oracle days, when I was implementing online ALTER TABLE. During QA, my dedicated tester Matthias Leich would serve me tricky test cases that required code patches. I had only fond memories of that intensive QA process until recently, when Elena Stepanova tested the instant ADD COLUMN in MariaDB 10.3, finding many bugs that we were able to fix immediately. Last year, we were able to strengthen the InnoDB team even more by bringing Matthias and two developers on board at MariaDB.

RQG has also proved useful in bug hunting. We have been able to reproduce and fix many elusive bugs, often before receiving any external reports.

Fault injection tests in the regression test suite
The regression tests that run on our continuous integration platforms include a few tests for InnoDB crash recovery. Starting with MariaDB Server 10.2, InnoDB does not leak memory after refusing to start up due to errors. This was a necessary improvement for successfully running the entire regression test suite under AddressSanitizer. Some tests carefully exercise the recovery and handling of incomplete transactions. Recently, after I realized that log checkpoints are harmful when testing recovery, we were able to reproduce a cause of MySQL 5.5 Bug #61104 (corruption of secondary indexes via the change buffer), which we hope to fix soon.
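Fault injection and leak-free error handling go together: a test forces a failure at a chosen point, and AddressSanitizer’s leak checker then verifies that everything acquired so far was released. MariaDB debug builds offer DBUG_EXECUTE_IF for this kind of fault injection; the C++ sketch below uses a hypothetical registry of named faults instead, and a hypothetical engine_startup() whose resources are owned by std::unique_ptr so that every early error return frees them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical fault-injection registry: a test enables named failure points.
std::set<std::string>& active_faults() {
  static std::set<std::string> faults;
  return faults;
}
bool fault_injected(const std::string& name) { return active_faults().count(name) != 0; }

// Live-object counter, so a test can check for leaks without a sanitizer.
int& live_buffers() {
  static int n = 0;
  return n;
}

struct Buffer {
  std::vector<char> bytes;
  explicit Buffer(std::size_t n) : bytes(n) { ++live_buffers(); }
  ~Buffer() { --live_buffers(); }
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
};

struct Engine {
  std::unique_ptr<Buffer> pool;
  std::unique_ptr<Buffer> log;
};

// Each early return destroys e, which releases whatever was allocated so far.
std::unique_ptr<Engine> engine_startup() {
  auto e = std::make_unique<Engine>();
  e->pool = std::make_unique<Buffer>(1 << 20);
  if (fault_injected("startup_fail_after_pool")) return nullptr;
  e->log = std::make_unique<Buffer>(1 << 16);
  if (fault_injected("startup_fail_log_corrupt")) return nullptr;
  assert(e->pool && e->log);
  return e;
}
```

Running the whole failure matrix under AddressSanitizer, which includes LeakSanitizer on Linux, turns any forgotten cleanup path into a test failure.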

Contributors welcomed

While we share the same heritage with MySQL, MariaDB is a very different database today. We are no longer merging new MySQL features into MariaDB, so moving forward we won’t inherit MySQL bugs (a lot of which were mentioned above). We are developing features you’ll never get in MySQL. For example, MariaDB’s extensible architecture means you can use different purpose-built storage engines to fit your workload requirements, whether it’s InnoDB for mixed reads and writes, MyRocks for write-intensive workloads, Spider for extreme scale or ColumnStore for analytics. Watch this recent webinar about our storage engine architecture to learn more about these important storage engines. Finally, our development process is open and transparent, as opposed to Oracle MySQL’s closed development. It can be much more rewarding to work with MariaDB, which balances customer and community needs and works with community members to get their contributions ready for merging.

InnoDB quality has been a major focus for MariaDB over the years and we’re excited to roll out new improvements with the next release of MariaDB Server soon. Thanks to my incredible colleagues, such as Thirunarayanan Balathandayuthapani, Valerii Kravchuk, Eugene Kosov, Matthias Leich, Elena Stepanova and Sergey Vojtovich, for making this a great product.