Improving the Performance of MariaDB MaxScale

spacer

Performance has been a central goal of MariaDB MaxScale’s architecture from the start. Core elements in the drive for performance is the use of asynchronous I/O and Linux’ epoll. With a fixed set of worker threads, whose amount is selected to correspond to the number of available cores, each thread should either be working or be idle, but never be waiting for I/O. This should provide good performance that scales well with the number of available CPUs. However, benchmarking revealed that was not entirely the case. The performance of MaxScale did not continuously increase as more worker threads were added, but after a certain point the performance would actually start to decrease.

When we started working on MariaDB MaxScale 2.1 we decided to investigate what the actual cause for that behaviour was and to make modifications in order to improve the situation. As will be shown below, the modifications that were made have indeed produced good results.

Benchmark

The results below were produced with sysbench OLTP in read-only mode. The variant used performs 1000 point selects per transaction and the focus is on measuring the throughput that the proxy can deliver.

In the benchmark two computers were used, both of which have 16 cores / 32 hyperthreads, 128GB RAM and an SSD drive. The computers are connected with a GBE LAN. On one of the computers there were 4 MariaDB-10.1.13 backends and on the other MariaDB MaxScale and sysbench.

MariaDB MaxScale 2.0

The following figure shows the result for MariaDB MaxScale 2.0. As the basic architecture of MaxScale has been the same from 1.0, the result is applicable to all versions up to 2.0.

2.0-1000-selects.png

In the figure above, on the vertical axis there is the number of queries performed by second and on the horizontal axis there is the number of threads used by sysbench. That number corresponds to the number of clients MariaDB MaxScale will see. In each cluster, the blue bar shows the result when the backends are connected to directly, and the red, yellow and green bars when the connection goes via MaxScale running with 4, 8 and 16 threads respectively.

As long as the number of clients is small, the performance of MariaDB MaxScale is roughly the same irrespective of the number of worker threads. However, already at 32 clients it can be seen that 16 worker threads perform worse than 8 and at 64 clients that is evident. The overall performance of MaxScale is clearly below that of a direct connection.

MariaDB MaxScale 2.1

For MariaDB MaxScale 2.1 we did some rather significant changes regarding how the worker threads are used. Up until version 2.0 all threads were used in a completely symmetric way with respect to all connections, both from client to MaxScale and from MaxScale to the backends, which implied a fair amount of locking inside MaxScale. In version 2.1 a session and all its related connections are pinned to a particular worker thread. That means that there is a need for significantly less locking and that the possibility for races has been reduced.

The following figure shows the result of the same benchmark when run using MariaDB MaxScale 2.1.

2.1.0-1000-selects.png

Compared with the result of MaxScale 2.0, the performance follows and slightly exceeds the baseline of a direct connection. At 64 clients, 16 worker threads still provide the best performance but at 128 clients 8 worker threads overtake. However, as these figures were obtained with a version very close to the 2.1.0 Beta we are fairly confident that by the time 2.1 GA is released we will be able to address that as well. Based on this benchmark, the performance of 2.1 is roughly 45% better than that of 2.0.

MariaDB MaxScale and ProxySQL

ProxySQL is a MySQL proxy that in some respects is similar to MariaDB MaxScale. Out of curiosity we ran the same benchmark with ProxySQL 1.3.0, using exactly the same environment as when MaxScale was benchmarked. In the following we will show in different figures how MaxScale 2.0, MaxScale 2.1 and ProxySQL 1.3.0 compares, when 4, 8 and 16 threads are used.

2.0-2.1-ProxySQL-1000-selects-4thr.png

With 4 threads, ProxySQL 1.3.0 performs slightly better than MaxScale 2.0 and MaxScale 2.1 performs significantly better than both. The performance of both MaxScale 2.1 and ProxySQL 1.3.0 start to drop at 256 client connections.

2.0-2.1-ProxySQL-1000-selects-8thr.png

With 8 threads, up until 128 client connections the order remains the same, with MaxScale 2.1 at the top. When the number of client connections grows larger than that, MaxScale 2.1 and ProxySQL 1.3.0 are roughly on par.

2.0-2.1-ProxySQL-1000-selects-16thr.png

With 16 threads, up until 64 client connections the order is still the same. When the number of client connections grows larger than that, MaxScale 2.1 and ProxySQL 1.3.0 are roughly on par. The performance of both drop slightly compared with the 8 thread situation.

Too far reaching conclusions should not be drawn from these results alone; it was just one benchmark, MaxScale 2.1 will still evolve before it is released as GA and the version of ProxySQL was not the very latest one.

Summary

In MariaDB MaxScale 2.1 we have made significant changes to the internal architecture and benchmarking shows that the changes were beneficial for the overall performance. MaxScale 2.1 consistently performs better than MaxScale 2.0 and with 8 threads, which in this benchmark is the sweetspot for MaxScale 2.0, the performance has improved by 45%.

If you want to give MaxScale 2.1.0 Beta a go, it can be downloaded at https://downloads.mariadb.com/MaxScale/.