MaxScale Rewrite Filter

Overview

The rewritefilter is a filter that can be configured to manipulate the SQL statements that are being run before they get to the backing database. The changes can be made using a template file that can be in rewrite-file format (rf) or JSON file format.

For details on how to configure the filter module itself, see the rewritefilter module.

rf vs JSON File Format

The rewrite-file format (rf) template file specifies each rule beginning with a %% line.

  • Any per-rule options are added prior to the match & replace text sections and are specified as one option: value per line

  • The match text is preceded by a % line

  • The replace text is preceded by a second % line

  • Comment lines are specified starting with a # character

  • Empty lines are ignored

  • The match text and replace text can span multiple lines, though it usually requires the whitespace-ignoring option to be specified

  • Backslash characters are handled literally, requiring no extra escaping

The syntax of the rf file therefore looks like this:

%%
# This is a comment line.
# Add options in the form "name: value" here
regex_grammar: Native
ignore_whitespace: true
%
match text
%
replace text

That rule format can be repeated in the file to define as many rules as you need.

The JSON format is specified as a JSON object with the key templates specifying a list of rules. Each rule in the list is a JSON object that specifies options as key & value pairs as well as the match_template key & value and the replace_template key & value. Normal JSON quoting occurs, which means that double quotes and backslashes get an extra backslash escape. True/false values are specified as JSON boolean values, not as quoted strings. No comments are possible.

The JSON format looks like this (with optional added whitespace):

{
  "templates": [
     {
       "regex_grammar": "Native",
       "ignore_whitespace": true,
       "match_template": "match text",
       "replace_template": "replace text"
     }
   ]
}

The templates list can be extended with additional rule objects to define as many rules as you need.

The rf format is easier for humans to create and edit, while the JSON format is easier for programmatic generation. For example, the MaxScale GUI creates the template file in the JSON format.

Per-rule Options

The following are the options that can be set for each rule in the template file. Some of the options default to the value specified in the configuration file's filter section (when specified there).

  • regex_grammar - override the regex grammar and matching style from the default. The choices are:

    • Native (the default when unconfigured and unspecified)

      This mode has the side-effect of enabling placeholder syntax and forcing the rule to always match the whole line. It can thus only result in a single change on the line. All other settings use only their regular expression matching syntax and can result in one or more replacements being made on the line.

    • ECMAScript

    • Posix

    • EPosix

    • Awk

    • Grep

    • EGrep

  • case_sensitive - a boolean value that controls if uppercase & lowercase are distinct when matching. The unconfigured and unspecified default is true.

  • ignore_whitespace - a boolean value that treats whitespace variances as insignificant. Defaults to true.

    The use of this option turns all literal whitespace in the match text into a syntax that can match any amount of whitespace. For example, a match of "x = 5" would match "x   =    5" and "x=5". However, a match of "x=5" would not match "x = 5".

    If you want to specify mandatory whitespace with this option enabled, you will need to use a regex such as @{1:\s+} or similar.

  • log_replacement - a boolean value that controls if replacements are logged at the NOTICE level or not logged. The unconfigured and unspecified default is false (no logging).

  • continue_if_matched - a boolean value that controls if more replacements should be applied. Defaults to false.

  • what_if - a boolean value that disables the actual replacement and instead logs what would have been replaced at the NOTICE level. Defaults to false (normal functioning).

Native Rules

The Native regex_grammar mode makes use of numbered placeholders that specify both match groups and references in the replace text. A basic placeholder matches a non-greedy amount of text while a regex placeholder matches a set of text that you specify.

  • @{NUM} - a basic placeholder where NUM is a positive integer of your choosing

  • @{NUM:REGEX} - regex placeholder where NUM is a positive integer of your choosing and REGEX is a valid ECMAScript regular expression

Note that the supplied regex value is used literally with no munging done. For instance, a space in a regex placeholder is unaffected by ignore_whitespace.

Using the same placeholder number multiple times in a match text ensures that the same text appears in each place when matching. This is referred to as a back-reference in regular-expression terms. When adding a back-reference for a regex placeholder, specify a basic placeholder (This is @{1:['"]}quoted@{1} text).

The replace text can contain references to placeholders in the match text using the basic placeholder syntax (@{NUM}). It is legal to refer to a placeholder number multiple times if you want to duplicate that placeholder value.

Note that the use of regular-expression anchors can make matching more efficient, especially anchoring the start of the match. For example, using @{1:^} at the start of the match text ensures that the SQL statement is matched at its beginning.

Any use of a } character within a regex rule requires that it be backslash escaped or it will be interpreted as the end of the placeholder.

In ECMAScript it is possible to match digits using either \d or [[:digit:]] and a mandatory whitespace character using \s or [[::space:]] as you prefer.

Special care must be taken when using ( in a regex placeholder. If it is to be matched literally, it needs to be backslash escaped. If it is being used for specifying repeating elements or multiple-choices, you must make the group non-matching by using (?: instead of a ( on its own. Any capturing groups will interfere with the placeholder handling.

A simple Native example:

%%
regex_grammar: Native
case_sensitive: false
%
# The first "%" begins the match text section
@{1:^}select @{2} from my_table where id = @{3}
%
# The second "%" begins the replace text section
select @{2} from my_table where my_id = @{3}

The above template rule would effect a hot-fix for a client that was using the old column name of id to use the new column name of my_id as long as it is a select whose syntax matches the rule.

Automatic Placeholders

When using Native regex_grammar, a match text is always modified to begin and end with a placeholder if the rule omits one or both, and a corresponding implied placeholder is added to the replace text. This means that the match text always matches the whole statement. Any implied placeholders cannot be referenced manually. This also means that a Native rule can only result in a single replacement on the line.

For example, the following 2 rules are the same. Each substitutes the first instance of the string "bad" to "good":

%%
regex_grammar: Native
%
bad
%
good

%%
regex_grammar: Native
%
@{1}bad@{2}
%
@{1}good@{2}

Typically a Native rule will not be this simple, and will usually begin with a @{1:^} to anchor the match to the start of the statement. An unanchored match can be replaced anywhere on the line, so the above rules could affect text inside or outside of quoted strings.

Keep in mind that the auto-suppression of a starting or ending placeholder can bite you if the supplied placeholder does not match the starting or ending text of the surrounding statement. For instance, this rule mangles the statement into just the word bar:

%%
regex_grammar: Native
%
@{1:\b}foo@{2:\b}
%
bar

Whereas both these rules properly substitute the first instance of the word-boundary restricted word foo to bar:

%%
regex_grammar: Native
%
@{1}@{2:\b}foo@{3:\b}@{4}
%
@{1}bar@{4}

%%
regex_grammar: Native
%
@{1.*?\b}foo@{2:\b.*}
%
@{1}bar@{2}

Non-Native Rules

When using a non-Native regex_grammar, the match text is in normal regular expression syntax and can match & replace one or more times. This is present for the sole purpose of allowing simple match-and-replace rules.

For example, the following would replace a table name anywhere on the line as many times as it appeared (and uses the word-boundary regex to ensure that it doesn't affect a substring of a longer table name):

%%
regex_grammar: ECMAScript
case_sensitive: false
%
\bwrong_table_name\b
%
correct_table_name

Be cautious with such rules as they can even affect the contents inside of quoted strings.

Given the simpler role of non-Native rules, you cannot make use of capture groups in the replacement text. You can, however, make use of a group in a look-behind clause:

%%
regex_grammar: ECMAScript
case_sensitive: false
%
(['"])string\1
%
'newstring'

The ignore_whitespace setting does affect spaces specified in non-Native rules, so if you are doing anything where you are specifying a space as a part of a regex sequence (such as foo +bar[ ]text), be sure to turn off ignore_whitespace for that rule.