Which SQL queries take all the time? Using MaxScale to answer that age old question.

Here we take a look at how one of the example filters supplied with the MaxScale 1.0 beta can answer that simplest of profiling questions – “Which of my database queries run within the MySQL server for the longest time?”.

Why A MaxScale Filter?

MaxScale filters, although they can be thought of as simply a way to modify the SQL requests that pass through MaxScale, are also a good place to insert logging or monitoring. They have the advantage that that receive not just the SQL statements that are sent to the server, but they may also receive the returned results. This means that MaxScale can not just log the query information, but it can also record timestamps related to when it observes different parts of the request or response. As a way to demonstrate this a very simple example filter was constructed for the 1.0 Beta release of MaxScale – the top filter.

What Does Topfilter Do?

The topfilter filter observes both requests being sent to the server and responses begin sent back. It sends both the requests and the responses onwards unmodified, however it takes note of the request contents, provided it is a SQL request, along with a timestamp when it saw the request. When it sees the first response come back from the server it also takes a timestamp and uses the difference as a measure of the execution time of the statement. The longest running queries are retained, along with the execution time data. the number of retained queries can be configured within the MaxScale configuration of the filter. When the client connection is closed a report will be written to file with the longest running queries that were observed for that connection.

How Do I Setup My System

The first step in creating a system to allow you to measure your queries is to configure a MaxScale service that is suitable for your application. That may be any of the supported routers; e.g. the readconnroute or Read/Write Splitter routers. Here we will assume that we only have one server, we are not using MaxScale for the scale-out abilities of the tool, merely as a simple performance tuning aid. Therefore we will use the readconnroute router with the option to connect to any server, the only server we have. The basic configuration for the service is as follows:

[Top10] type=service router=readconnroute servers=dbnode1 user=maxadmin passwd=6628C50E07CCE1F0392EDEEB9D1203F3

The service also requires a listener, to define the port on which MaxScale will listen to requests to connect to the service and to link the service with the protocol module. [Top10 Listener] type=listener service=Top10 protocol=MySQLClient port=7706

We have used port 7706 for MaxScale to listen to incoming requests, this will allow the MaxScale to be located on the same machine as the MySQL/MariaDB server, any port may be chosen by the user provided it is not already in use. If no MySQL instance is running on the same machine the default MySQL port of 3306 may be used.

In this example we will assume we have a single database node; which we will call dbnode1.

The database node should be configured as follows; [dbnode1] type=server address=… port=3306 protocol=MySQLBackend

The address= entry should contain the IP address or a hostname for the server. The assumption here is that the MySQL server is using the default MySQL port of 3306 to listen for incoming requests.

For completeness we should also add a monitor module that can determine if the node is in the correct state to communicate. This is more important if the backend is not a single database server, but a cluster of servers. It is good practice to always include a monitor even when only a single node is configured as it allows for ease of extension to a cluster and provides more information as to the state of the system. [MySQLMonitor] type=monitor module=mysqlmon servers=dbnode1 user=maxmonitor passwd= 6628C50E37CCE1F0392EDEEB9D1403F3

We now have a configuration that basically allows MaxScale to be used as a pure proxy for accesses to the database. There is no intelligence within MaxScale begin used, as we have not options to route to more than a single database. Although the principles here are still true if multiple backend databases are given and the more usual scale-out properties of MaxScale are used, in order to reduce the complexity and illustrate the technique only a single backend database has been used here.

At this point the application should work if connected to port 7706 on the MaxScale host, but no monitoring data will be collected. We most add the filter into the MaxScale configuration.

To add the filter we must first create a filter definition that links our filter module to the parameters we want for this particular case. Assuming we want to capture the 10 longest running queries and write them into a log file in the directory /var/log/myapp with filenames of the form session.$$, a filter section as follows can be used. [top10] type=filter module=topfilter count=10 filebase=/var/log/myapp/session.

We then need to add this filter to the server by adding an extra line to the definition to the filter [Top10] type=service router=readconnroute router_options=running servers=dbnode1 user=maxadmin passwd=6628C50E07CCE1F0392EDEEB9D1203F3 filters=top10

Now if we connect to the port 7706 with our application we will get a number of files created in /var/log/myapp, these files will contain reports of the statements executed. An example of such a report is shown below.

Session started Thu Jun 19 17:06:34 2014 
Connection from 127.0.0.1 
Username mark 
Total of 24 statements executed. 
Total statement execution time 35.653 seconds 
Average statement execution time 1.486 seconds 
Total connection time 80.443 seconds 
Top 10 longest running queries in session. 
========================================== 
Time (sec) | Query 
-----------+----------------------------------------------------------------- 
    22.856 | select sum(salary), year(from_date) from ... 
     5.301 | select d.dept_name as "Department", y.y1 ... 
     2.898 | select year(now()) - year(birth_date) as age, ... 
     2.221 | select dept_name as "Department", sum(salary) / 12 ... 
     0.839 | select dept_name as "Department", avg(year(now()) - ... 
     0.665 | select year(hire_date) as "Hired", d.dept_name, ... 
     0.337 | select moves.n_depts As "No. of Departments",... 
     0.245 | select year(now()) - year(birth_date) as age, ... 
     0.175 | select year(hire_date) as "Hired", count(*) as ... 
     0.160 | select year(hire_date) - year(birth_date) as ... 
-----------+----------------------------------------------------------------- 

Please note, the queries have been truncated in this blog entry, an actual run would include the complete query text.

Refining The Selection Criteria

Other options may be given to the top filter to reduce the subset of queries it will consider; regular expression patterns may be used to include or exclude particular statements, the client address may be used to limit the sessions to only those originating from that address or the client user name may be used to limit the functionality to only that user.

I am only interested in select performance

In this case you can use the regular expression capabilities to limit the report to simply show select statements, other statements will not be included in the ranking. To do this modify the filter section with the extra parameter match= [top10] type=filter module=topfilter count=10 match=select filebase=/var/log/myapp/session.

I want to see what data modification statements take a long time

In this case, rather than build a complex match expression the exclude= option can be used to exclude any select statements. [top10] type=filter module=topfilter count=10 exclude=select filebase=/var/log/myapp/session.

I only want select statements from a particular database user

The match= option can be used to restrict the logging to just select statements and the additional user= parameter can be used to further limit the statements that are including in the ranking. [top10] type=filter module=topfilter count=10 match=select user=reporting filebase=/var/log/myapp/session.

Conclusion

MaxScale was not designed as a tool that would be used to analyse the performance of your SQL application, and it is not as good or as extensive in this role as some of the purpose written tools, but the filters and the general plugin architecture of MaxScale mean that it can be used to perform tasks it was not originally conceived to do. The filter presented here is only really a demonstration of what can be achieved, it was written very quickly and did not require very much code to implement. However it hopefully can be useful in its own right and also to stimulate more complex and useful filters to be created.