Pipes, Filters and MaxScale

It was back in the 1980’s when I was first introduced to Unix, the concept of scripting and in particular pipes. At that time I remember being really impressed by the concept of small utilities that could be easily strung together to quickly create tools. It was a great way to implement those one off jobs or things that would not warrant the development of a specific tool, or so it seemed. I soon realised that actually, for some purposes, there was no need to go on a develop a bespoke tool to do the job, scripts were good enough. What does this have to do with MySQL and MaxScale?

The answer lies in a new feature that has been introduced in version 0.7 of MaxScale, filters. Filters are simple MaxScale plugins that can be inserted into the processing stream of a MaxScale service. One or more filters can be placed between the client application protocol module and the router module within MaxScale. The request data received from the client is passed to the first filter in the chain, that filter may examine and optimally modify the data before passing it on to the next filter in the chain. This is termed downstream filtering in MaxScale and allows for the implementation of such filters as query modifiers, logging and filtering. In the next release of MaxScale support will also be added such that a filter may also receive the upstream data, the result set returned by the database. Upstream filtering will be optional as not all filters require access to the data.

The concept is to bring some of the power and flexibility of Unix pipes to the SQL world. Because the filter API is relatively simple, and MaxScale provides lots of utility functions to make life easier, the task of writing a filter is relatively easy. Also there are some filters already available and the hope is that more will be provided, both by us and via community contribution.

The advantage of this approach is that, because the filters are in MaxScale, the way an application interacts with the database can be altered without the need to modify either the application or the database. As the catalogue of filters grows it is hoped that filters may be combined in order to allow MaxScale to not only be a useful proxy and scaling tool, but also to add to the utility of database programming.

In the 0.7 release one of the most interesting filters included is the regex filter, this can be thought of sed for MySQL. It effectively allows SQL statements to be modified as they pass through MaxScale, matching a pattern and substituting a replacement string. Although a simple example of a filter, it does have some practical use. Imagine you have an application that uses syntax that is no longer supported, such as the MySQL create table syntax from the days of 5.1 which used the keyword TYPE and has now been superceeded with ENGINE. Rather than modify your application you can merely put a filter in place that will match the TYPE keyword and replace it with ENGINE.

Another filter included in the 0.7 release is a logging filter, it simply writes a copy of every SQL command sent through MaxScale into a log file. A new log file is created for each client connection through MaxScale. This is a simple way to see what an individual connection is doing without modifying your MySQL logging levels or your application. In the next version another filter will be available, that uses the upstream filtering to monitor when results are returned. This filter keeps only the top N queries by execution duration and writes a log file when the connection is closed. This file list the queries that took longest to execute within the session. The number of queries it reports can be configured via a filter parameter.

I wanted to bring some of the flexibility and power of Unix scripting and pipes to the MySQL world and hope that the filtering mechanism in MaxScale achieves at least some of this. There is still a lot of work to be done, but release 0.7 at least previews some of this functionality. Others features yet to come are the ability to have branches in the filter chain, essentially the Unix tee command for MySQL connections. This will open up a number of new areas, either diverting or duplicating statements into a second database server or, due to the way MaxScale allows for protocol plugins, to other non-MySQL data stores.

A repost from Mark’s MaxScale Blog.