MariaDB MaxScale Top Filter

In this blog, I am going to discuss using the MaxScale Top filter for query analysis and, in particular, how to use it to easily find the top slow queries for a particular user without using the slow log. The Top filter is a great tool for query analysis. It can be enabled without downtime, and the logs don’t require storage space on the database server itself.

MaxScale is an advanced intelligent database proxy from MariaDB. It provides load balancing and high availability functionality and includes easy-to-use filtering capabilities. The MaxScale functionalities are implemented as modules. So it is easy to integrate the modules needed for your particular use case.

What is Top Filter?

MaxScale’s Filter Modules process requests after MaxScale receives the request from the client but before the service routes the request to a backend database server. Filters can log, alter, or reject the request, depending on the specific filter being used. The Top filter can be placed in that pipeline to monitor and measure the execution time of every statement within a connection and log the details of the longest running statements in a file.

Architecture

MaxScale Top Filter Architecture

How Does it Work?

MaxScale handles the incoming application connections, and depending on the match and exclude parameters, the Top filter keeps track of the statements that are executed in the database and monitors the response time of each query. If the response time is longer than for the other statements that have been recorded, that query and the response time information are added to the ordered list within the filter up to the maximum number “N” to be stored (set with the count parameter).

If you have set  the count option in the MaxScale config file, the Top filter will record only that number of statements. For example, if we set count=10 the Top filter report will include the top 10 longest running statements, plus summary data for that session. Once the session ends, a report will be written to a session-specific log file that is constructed based on the session ID and the filebase parameter.

Top filter Query logging will contain:

  • The top N queries executed by the connection and the execution time for each.
  • The time the connection was opened.
  • Host connection details.
  • The username used in the connection.
  • Duration of the connection.
  • Total number of statements executed.
  • Average execution time for a statement.
Filter Options

filebase – This is a mandatory parameter to add to maxscale.cnf. It creates a basename of the output file for each session.

count – configures the total number “N” of SQL statements to store and report on. If more than “N” statements are executed, only the top “N” with the longest execution times will be kept. The default value is 10, so the output file will contain the top 10 slow queries by default.

match, exclude and options – These parameters can be used to include or exclude queries from logging using regular expressions.The “options” parameter is used to configure how the regex is processed, such as the case sensitivity and the whether extended regex is used.

user – enables Top filter logging for sessions used by the given username. If the parameter is not set, sessions for all users will be logged, except for the sessions excluded by the “source” parameter.

source – Only sessions that originate from this address will be logged.

Example Setup

As an example, I have configured a primary/replica setup with MaxScale as a load balancer. For more details on installation and configuration, see  MariaDB documentation.

Server 1 - 192.168.56.113
Server 2 - 192.168.56.114
MaxScale - 192.168.56.115
 maxctrl > list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID            │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────────┤
│ server1 │ 192.168.56.113 │ 3306 │ 0           │ Master, Running │ 0-2-42,1-1-4344 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────────┤
│ server2 │ 192.168.56.114 │ 3306 │ 0           │ Slave, Running  │ 0-2-42,1-1-4344 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────────┘
Top Filter Configuration

Use MaxCtrl to create a new filter or edit the MaxScale config file. To edit the config file, in this Top filter configuration, after the [monitor] module, we need to add module=topfilter under the [LogFilter] tag in the MaxScale config file.

[LogFilter]
type=filter
module=topfilter
filebase=/tmp/slowquery
count=20

Then we need to define the service under the [Service] tag.

[Service]
type=service
router=readconnroute
servers=server1,server2
user=max_mon
password=XXXXX
filters=LogFilter

Once the above parameters have been added to the  MaxScale config file, restart the MaxScale service.

Production Use Cases

Let us take the following use case as an example.

Suppose every day we are running reporting queries at 2 AM. During this time our production resource usage is very high. We have recently added new queries to our reporting module. In the last couple of days, reports have not been generating properly, so we want to collect the top slow queries for the report_user.

MaxScale makes that easy  –  just configure the Top filter and add which user needs to be monitored in the MaxScale config.

Of course there are other (more labor intensive) ways you can get the information. You can use:

  • Slow log – You can find the problem queries using the slow log, but it will take some time and effort. If you have enabled master_accept_reads=true parameter in your MaxScale config, MaxScale routes some reads to the primary server. So you would need to analyze the slow log for both primary and replica servers.

But why would you use either of those methods when the Top filter can do the work for you?

In our case, we choose to use the Top filter in MaxScale. We have configured the Top filter and added the report_user to the MaxScale config file.

Example
[LogFilter]
type=filter
module=topfilter
filebase=/tmp/slowquery
count=20
user=report_user

Then we started the sysbench to load 3M records. Once the data was loaded, we started the run test.

Example
[root@centos13 sysbench]# sysbench select_random_points.lua --tables=1 --table_size=3000000 --mysql-host=192.168.56.115 --db-driver=mysql --mysql-user=report_user --mysql-password=xxxx --mysql-port=4006 --mysql-db=sbtest  --time=600 --report-interval=10 --threads=2 run
 
Initializing worker threads...
Threads started!
[ 10s ] thds: 2 tps: 6690.11 qps: 6690.11 (r/w/o: 6690.11/0.00/0.00) lat (ms,95%): 0.39 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 2 tps: 6528.01 qps: 6528.01 (r/w/o: 6528.01/0.00/0.00) lat (ms,95%): 0.40 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 2 tps: 6495.30 qps: 6495.30 (r/w/o: 6495.30/0.00/0.00) lat (ms,95%): 0.41 err/s: 0.00 reconn/s: 0.00
.
.
.
[ 590s ] thds: 2 tps: 6601.24 qps: 6601.24 (r/w/o: 6601.24/0.00/0.00) lat (ms,95%): 0.40 err/s: 0.00 reconn/s: 0.00
[ 600s ] thds: 2 tps: 6453.91 qps: 6453.91 (r/w/o: 6453.91/0.00/0.00) lat (ms,95%): 0.41 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            3903414
        write:                           0
        other:                           0
        total:                           3903414
    transactions:                        3903414 (6505.65 per sec.)
    queries:                             3903414 (6505.65 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          600.0019s
    total number of events:              3903414
 
Latency (ms):
         min:                                    0.20
         avg:                                    0.31
         max:                                   30.67
         95th percentile:                        0.42
         sum:                              1196337.76
 
Threads fairness:
    events (avg/stddev):           1951707.0000/17520.00
    execution time (avg/stddev):   598.1689/0.01

After completing the run test, we can go to /tmp directory and check the slowquery.0 file. It has a list of slow queries which are run on report_user.

Sample Output
[root@centos13 sysbench]# cat /tmp/slowquery.0 
Top 20 longest running queries in session.
==========================================
 
Time (sec) | Query
-----------+-----------------------------------------------------------------
  300.064  |          SELECT id, k, c, pad
          FROM sbtest1
          WHERE k IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
  1.034    |         SELECT id, k, c, pad
          FROM sbtest1
          WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        
-----------+-----------------------------------------------------------------
 
 
Session started Sat Jun 19 09:31:57 2021
Connection from ::ffff:192.168.33.13
Username        report_user
 
Total of 2 statements executed.
Total statement execution time     390.5 seconds
Average statement execution time   195.002 seconds
Total connection time              390.9 seconds

From the above results, total executed statements are two (2) and the total execution time for those statements is 390.5 sec. The average statement execution time is  195.002 sec. The total duration for that particular thread is 390.9 sec. Most importantly, we can see by the “Top 20” query report that one query  varies in execution time by a factor of 300 depending on the values used in the query. While this is not enough detail to end troubleshooting here, it provides enough detail to narrow down the troubleshooting to a specific query.

Conclusion

The Top filter in MariaDB MaxScale can be a useful tool for query analysis. It is easy to use. The available parameters provide flexibility for constraining which queries should be filtered. It can be enabled without downtime, and the logs don’t use storage space on the database server itself.

More Information

Find out more about MariaDB MaxScale