Write Optimizations for Qualcomm Centriq 2400 in MariaDB 10.3.5 Release Candidate
MariaDB has been collaborating with Qualcomm Datacenter Technologies to push the performance envelope by combining innovative ARM-based hardware architecture with MariaDB’s unique database architecture. As part of the Qualcomm Centriq™ 2400 product launch in November 2017, we demonstrated the strong read scalability of MariaDB on this chip. Since then, MariaDB and Qualcomm engineering have been working to improve the scalability of write operations, and we would like to share the results with the developer community today.
We are pleased to announce a number of performance improvements that are available in the recently shipped 10.3 release candidate, 10.3.5. Using the highly parallel 48-core Qualcomm Centriq 2400 processor running at 2.6GHz, with 6 memory channels in a fully coherent ring architecture, we set out to optimize write performance for a single-row write use case in a highly threaded application.
MariaDB uses the sysbench benchmark software to measure performance. In this blog, we'll examine the following two benchmarks using sysbench 1.0:
- oltp_update_index: This simulates updating a single row by primary key where the updated column is covered by a secondary index, so that index must be updated as well.
- oltp_update_nonindex: This simulates updating a single row by primary key where the updated column has no secondary index. This requires less work than oltp_update_index.
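To make the difference between the two workloads concrete, here is a minimal sketch of the statement shape each one issues. It assumes the standard sysbench 1.0 `sbtest` table layout (primary key `id`, indexed column `k`, unindexed column `c`) and uses an in-memory SQLite database purely as a stand-in so the example runs anywhere. It is not the benchmark itself and says nothing about MariaDB performance. The helper names are hypothetical.

```python
import sqlite3

# Statement shapes used by the two sysbench 1.0 scripts
# (table name fixed to sbtest1 for this illustration).
UPDATE_INDEX = "UPDATE sbtest1 SET k = k + 1 WHERE id = ?"   # k has a secondary index
UPDATE_NONINDEX = "UPDATE sbtest1 SET c = ? WHERE id = ?"    # c has no index


def make_table(rows=100):
    """Create a small sbtest-style table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sbtest1 ("
        " id INTEGER PRIMARY KEY,"
        " k INTEGER NOT NULL DEFAULT 0,"
        " c TEXT NOT NULL DEFAULT '',"
        " pad TEXT NOT NULL DEFAULT '')"
    )
    # Secondary index on k: any change to k must also touch this index.
    conn.execute("CREATE INDEX k_1 ON sbtest1 (k)")
    conn.executemany(
        "INSERT INTO sbtest1 (id, k, c, pad) VALUES (?, ?, ?, ?)",
        [(i, i % 10, "c-%d" % i, "pad") for i in range(1, rows + 1)],
    )
    conn.commit()
    return conn


def update_index(conn, row_id):
    """Single-row update by primary key that changes an indexed column."""
    conn.execute(UPDATE_INDEX, (row_id,))
    conn.commit()


def update_nonindex(conn, row_id, value):
    """Single-row update by primary key that changes an unindexed column."""
    conn.execute(UPDATE_NONINDEX, (value, row_id))
    conn.commit()


conn = make_table()
update_index(conn, 42)
update_nonindex(conn, 42, "new value")
row = conn.execute("SELECT k, c FROM sbtest1 WHERE id = 42").fetchone()
print(row)
```

Both statements locate the row through the primary key. Only the first forces the engine to maintain a secondary index entry, which is why it is the heavier of the two.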
What we see is that as the number of concurrent threads increases, performance is up to 48% faster in 10.3 than in 10.2 on the Centriq™ 2400.
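A sweep over thread counts like this could be scripted roughly as follows. This is only a sketch: the thread counts, run time, table count, table size, and the `sbtest` user and database names are placeholder assumptions rather than the settings used for the results in this post, and the helper names are hypothetical. It only builds sysbench 1.0 command lines and does not run them.

```python
def sysbench_command(test, threads, seconds=60, tables=8, table_size=1000000):
    """Build one sysbench 1.0 'run' command line for the given test and thread count."""
    args = [
        "sysbench",
        test,
        "--db-driver=mysql",
        "--mysql-user=sbtest",   # assumed user name
        "--mysql-db=sbtest",     # assumed schema name
        "--tables=%d" % tables,
        "--table-size=%d" % table_size,
        "--threads=%d" % threads,
        "--time=%d" % seconds,
        "run",
    ]
    return " ".join(args)


def sweep(tests, thread_counts):
    """Return one command per (test, thread count) pair, in order."""
    return [sysbench_command(t, n) for t in tests for n in thread_counts]


# Hypothetical thread counts chosen only to illustrate increasing concurrency.
commands = sweep(["oltp_update_index", "oltp_update_nonindex"], [1, 8, 48])
for line in commands:
    print(line)
```

Running the same sweep against a 10.2 and a 10.3 server, with identical settings, is what allows a per-thread-count comparison between the two releases.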
The improvements made remove points of contention and optimize for the ARM64 chipset, specifically:
- MDEV-15090 : Reduce the overhead of writing undo log records
- MDEV-15132 : Avoid accessing the TRX_SYS page
- MDEV-15019 : InnoDB: store ReadView on trx
- MDEV-14756 : Remove trx_sys_t::rw_trx_list
- MDEV-14482 : Cache line contention on ut_rnd_ulint_counter()
- MDEV-15158 : On commit, do not write to the TRX_SYS page
- MDEV-15104 : Remove trx_sys_t::rw_trx_ids and trx_sys_t::serialisation_list
- MDEV-14638 : Replace trx_sys_t::rw_trx_set with LF_HASH
- MDEV-14529 : InnoDB rw-locks: optimize memory barriers
- MDEV-14374 : UT_DELAY code : Removing hardware barrier for arm64 bit platform
- MDEV-14505 : Threads_running becomes scalability bottleneck
In summary, this means that MariaDB will perform significantly better under high levels of concurrent updates, improving response times in your applications at peak load.
The improvements will also benefit other chip architectures, but a much greater gain is seen on the Centriq™ 2400 because its design supports a much higher thread count. Because it uses physical cores rather than hyper-threading across a smaller number of cores, the Centriq™ 2400 demonstrates an additional 13% gain over a comparable reference Broadwell platform.
As Centriq™ 2400 systems come to market this year, we are excited to see customers take advantage of this scalability, combined with lower power consumption, to run high-scale database workloads.