January 23, 2015

MaxScale Firewall Filter

MaxScale’s filter system is very flexible and enables a new way of interacting with queries. The upcoming firewall filter shows just one of the many ways that you can control and manage the flow of queries through MaxScale.

The firewall filter is meant to offer finer and more varied control over queries and their execution. It is also meant to serve as a base to build and improve upon, and to show how that can be done with MaxScale.

The filter provides a variety of ways to control which kinds of queries get executed. The simplest rules block queries that arrive during a certain time range or perform a certain operation. The more complex ones can match queries using a regular expression, check for the existence of a WHERE clause in the query, or deny the query on the basis of the current session’s query speed. These rules can be applied to specific users, to network ranges, or to a combination of the two. This makes it easier to integrate the filter’s rules into an existing environment which already has rules in the database.

Rules and Users

To define a rule for the filter, first give the rule a name. The name is used later when the rules are applied to users. After the name, the keyword ‘deny’ marks the beginning of the rule’s content. The content must contain one of the wildcard, columns, regex, limit_queries or no_where_clause keywords. The wildcard keyword takes no parameters and only checks whether the query uses the wildcard character. The columns keyword requires a list of columns and checks whether the query targets one of them. The regex keyword expects a regular expression enclosed in single or double quotes. All quotes inside the regular expression need to be escaped with the backslash character. The limit_queries keyword requires three parameters: the number of queries, the time period in seconds over which the queries are counted, and the time in seconds for which further queries are denied once the number of queries exceeds the first parameter within that time period. The no_where_clause keyword checks whether the query has a WHERE clause and denies the query if it does not.

A rule can optionally be limited further. The at_times keyword makes the rule active only during the given time ranges, and the on_queries keyword applies the rule only to the listed types of queries.

Syntax for defining the rules

rule NAME deny [wildcard | columns VALUE ... | regex REGEX |
limit_queries COUNT TIMEPERIOD HOLDOFF | no_where_clause]
[at_times VALUE...] [on_queries [select|update|insert|delete]]
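
As a rough illustration of the syntax above, here is one rule per keyword. The rule names, column names, numbers and the regular expression are made up for this example, and the regular expression is only a guess at what a simple pattern could look like.

rule no_wildcard deny wildcard
rule private_data deny columns salary phone
rule orders_scan deny regex '.*select.*from.*orders.*'
rule throttle deny limit_queries 100 10.0 60
rule safe_delete deny no_where_clause on_queries delete

Read against the description above, the throttle rule would deny further queries for 60 seconds once a session sends more than 100 queries within 10 seconds, and the safe_delete rule would deny only DELETE statements that have no WHERE clause.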

Syntax for applying rules to users

users NAME ... match [any|all] rules RULE ...
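
A small sketch of how this might look in a rule file. The user names, the address pattern and the rule names are hypothetical. The assumption here is that ‘any’ denies a query as soon as one of the listed rules matches, while ‘all’ denies it only when every listed rule matches.

rule no_wildcard deny wildcard
rule no_where deny no_where_clause on_queries select
users app@10.0.% match any rules no_wildcard no_where
users report@% match all rules no_wildcard no_where

Under that assumption, the user ‘app’ is denied any query that uses the wildcard as well as any SELECT without a WHERE clause, while the user ‘report’ is denied only SELECT queries that both use the wildcard and lack a WHERE clause.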

Configuration

To allow easy modification of rules without having to touch MaxScale’s configuration file, this filter uses an external text file to store the rules. All the rules and the users to whom they are applied need to be in this file. The file is provided to the filter by entering the path of the rule file into the configuration file. Here is an example configuration of the firewall filter, with the ‘rules’ variable being the one that points to the file that contains the rules.

[Firewall]
type=filter
module=fwfilter
rules=/home/user/rule_file

The contents of the rule file could be as follows.

rule peak_hour1 deny limit_queries 1000 1.0 5 at_times 17:00:00-17:30:00
rule peak_hour2 deny wildcard at_times 17:00:00-17:30:00
rule personal deny columns salary phone on_queries update|delete
users maxuser@192.168.% match any rules personal
users %@% match any rules peak_hour1 peak_hour2

This defines three rules and applies one of them to the user ‘maxuser’ from the address 192.168.% and the other two to all users from any network. The first rule states that if the speed of the incoming queries exceeds 1000 queries per second then further queries are blocked for the next five seconds. The second one denies the usage of the wildcard. These two rules are only active between the times 17:00 and 17:30. The third rule denies updates and deletes that target the columns ‘salary’ or ‘phone’.

What does this allow me to do?

Let’s say we are facing a problem with a couple of large and complex read queries that take too much time and slow down a large database. Now let’s say we can’t route them to a secondary slave during peak hours because resources are limited, and we still do not want to allow these queries to slow the database down. If we knew exactly who was running these queries, we could just block those users and be done with it. But then every query from those users would be blocked, not just the heavy ones.

This is where the firewall filter comes in. If we know what the heavy queries look like (let’s say we used MaxScale’s topfilter to find out), we could come up with a regular expression that matches one, some or all of them. We could also make the rule active only during peak hours, when the performance of the database is critical. By matching only these queries, we block them during peak hours while still allowing normal queries from the same users, and the heavy queries remain allowed outside of peak hours.
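
A rule file for this scenario could look something like the following sketch. The rule name, the user name, the table name in the regular expression and the peak hour range are all invented for the example, and the pattern itself would have to be adapted to the real queries.

rule heavy_report deny regex '.*select.*from.*orders.*join.*' at_times 17:00:00-17:30:00 on_queries select
users reporter@% match any rules heavy_report

With a rule like this, only SELECT queries from ‘reporter’ that match the pattern would be denied, and only between 17:00 and 17:30.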

This is just one example of a tailored solution to a specific problem and of how the modularity of MaxScale enables you to expand your horizons. The firewall filter could be seen as a surgeon’s blade that operates only on things that need to be operated on and is only brought out when needed.

About Markus Mäkelä

Markus Mäkelä is a Software Engineer working on MariaDB MaxScale. He graduated from Metropolia University of Applied Sciences in Helsinki, Finland.
