MaxScale Rewrite Filter
This page is part of MariaDB's Documentation.
The parent of this page is: Query Rewriting
Topics on this page:
Overview
The rewritefilter
is a filter that can be configured to manipulate the SQL statements that are being run before they get to the backing database. The changes can be made using a template file that can be in rewrite-file format (rf
) or JSON file format.
For details on how to configure the filter module itself, see the rewritefilter
module.
rf
vs JSON File Format
The rewrite-file format (rf
) template file specifies each rule beginning with a %%
line.
Any per-rule options are added prior to the match & replace text sections and are specified as one
option: value
per lineThe match text is preceded by a
%
lineThe replace text is preceded by a second
%
lineComment lines are specified starting with a
#
characterEmpty lines are ignored
The match text and replace text can span multiple lines, though it usually requires the whitespace-ignoring option to be specified
Backslash characters are handled literally, requiring no extra escaping
The syntax of the rf
file therefore looks like this:
%%
# This is a comment line.
# Add options in the form "name: value" here
regex_grammar: Native
ignore_whitespace: true
%
match text
%
replace text
That rule format can be repeated in the file to define as many rules as you need.
The JSON format is specified as a JSON object with the key templates
specifying a list of rules. Each rule in the list is a JSON object that specifies options as key & value pairs as well as the match_template
key & value and the replace_template
key & value. Normal JSON quoting occurs, which means that double quotes and backslashes get an extra backslash escape. True/false values are specified as JSON boolean values, not as quoted strings. No comments are possible.
The JSON format looks like this (with optional added whitespace):
{
"templates": [
{
"regex_grammar": "Native",
"ignore_whitespace": true,
"match_template": "match text",
"replace_template": "replace text"
}
]
}
The templates
list can be extended with additional rule objects to define as many rules as you need.
The rf
format is easier for humans to create and edit, while the JSON format is easier for programmatic generation. For example, the MaxScale GUI creates the template file in the JSON format.
Per-rule Options
The following are the options that can be set for each rule in the template file. Some of the options default to the value specified in the configuration file's filter section (when specified there).
regex_grammar
- override the regex grammar and matching style from the default. The choices are:Native
(the default when unconfigured and unspecified)This mode has the side-effect of enabling placeholder syntax and forcing the rule to always match the whole line. It can thus only result in a single change on the line. All other settings use only their regular expression matching syntax and can result in one or more replacements being made on the line.
ECMAScript
Posix
EPosix
Awk
Grep
EGrep
case_sensitive
- a boolean value that controls if uppercase & lowercase are distinct when matching. The unconfigured and unspecified default is true.ignore_whitespace
- a boolean value that treats whitespace variances as insignificant. Defaults to true.The use of this option turns all literal whitespace in the match text into a syntax that can match any amount of whitespace. For example, a match of "
x = 5
" would match "x = 5
" and "x=5
". However, a match of "x=5
" would not match "x = 5
".If you want to specify mandatory whitespace with this option enabled, you will need to use a regex such as
@{1:\s+}
or similar.log_replacement
- a boolean value that controls if replacements are logged at the NOTICE level or not logged. The unconfigured and unspecified default is false (no logging).continue_if_matched
- a boolean value that controls if more replacements should be applied. Defaults to false.what_if
- a boolean value that disables the actual replacement and instead logs what would have been replaced at the NOTICE level. Defaults to false (normal functioning).
Native
Rules
The Native
regex_grammar
mode makes use of numbered placeholders that specify both match groups and references in the replace text. A basic placeholder matches a non-greedy amount of text while a regex placeholder matches a set of text that you specify.
@{NUM}
- a basic placeholder whereNUM
is a positive integer of your choosing@{NUM:REGEX}
- regex placeholder whereNUM
is a positive integer of your choosing andREGEX
is a validECMAScript
regular expression
Note that the supplied regex value is used literally with no munging done. For instance, a space in a regex placeholder is unaffected by ignore_whitespace
.
Using the same placeholder number multiple times in a match text ensures that the same text appears in each place when matching. This is referred to as a back-reference in regular-expression terms. When adding a back-reference for a regex placeholder, specify a basic placeholder (This is @{1:['"]}quoted@{1} text
).
The replace text can contain references to placeholders in the match text using the basic placeholder syntax (@{NUM}
). It is legal to refer to a placeholder number multiple times if you want to duplicate that placeholder value.
Note that the use of regular-expression anchors can make matching more efficient, especially anchoring the start of the match. For example, using @{1:^}
at the start of the match text ensures that the SQL statement is matched at its beginning.
Any use of a }
character within a regex rule requires that it be backslash escaped or it will be interpreted as the end of the placeholder.
In ECMAScript
it is possible to match digits using either \d
or [[:digit:]]
and a mandatory whitespace character using \s
or [[::space:]]
as you prefer.
Special care must be taken when using (
in a regex placeholder. If it is to be matched literally, it needs to be backslash escaped. If it is being used for specifying repeating elements or multiple-choices, you must make the group non-matching by using (?:
instead of a (
on its own. Any capturing groups will interfere with the placeholder handling.
A simple Native
example:
%%
regex_grammar: Native
case_sensitive: false
%
# The first "%" begins the match text section
@{1:^}select @{2} from my_table where id = @{3}
%
# The second "%" begins the replace text section
select @{2} from my_table where my_id = @{3}
The above template rule would effect a hot-fix for a client that was using the old column name of id
to use the new column name of my_id
as long as it is a select whose syntax matches the rule.
Automatic Placeholders
When using Native
regex_grammar
, a match text is always modified to begin and end with a placeholder if the rule omits one or both, and a corresponding implied placeholder is added to the replace text. This means that the match text always matches the whole statement. Any implied placeholders cannot be referenced manually. This also means that a Native
rule can only result in a single replacement on the line.
For example, the following 2 rules are the same. Each substitutes the first instance of the string "bad" to "good":
%%
regex_grammar: Native
%
bad
%
good
%%
regex_grammar: Native
%
@{1}bad@{2}
%
@{1}good@{2}
Typically a Native
rule will not be this simple, and will usually begin with a @{1:^}
to anchor the match to the start of the statement. An unanchored match can be replaced anywhere on the line, so the above rules could affect text inside or outside of quoted strings.
Keep in mind that the auto-suppression of a starting or ending placeholder can bite you if the supplied placeholder does not match the starting or ending text of the surrounding statement. For instance, this rule mangles the statement into just the word bar
:
%%
regex_grammar: Native
%
@{1:\b}foo@{2:\b}
%
bar
Whereas both these rules properly substitute the first instance of the word-boundary restricted word foo
to bar
:
%%
regex_grammar: Native
%
@{1}@{2:\b}foo@{3:\b}@{4}
%
@{1}bar@{4}
%%
regex_grammar: Native
%
@{1.*?\b}foo@{2:\b.*}
%
@{1}bar@{2}
Non-Native
Rules
When using a non-Native
regex_grammar
, the match text is in normal regular expression syntax and can match & replace one or more times. This is present for the sole purpose of allowing simple match-and-replace rules.
For example, the following would replace a table name anywhere on the line as many times as it appeared (and uses the word-boundary regex to ensure that it doesn't affect a substring of a longer table name):
%%
regex_grammar: ECMAScript
case_sensitive: false
%
\bwrong_table_name\b
%
correct_table_name
Be cautious with such rules as they can even affect the contents inside of quoted strings.
Given the simpler role of non-Native
rules, you cannot make use of capture groups in the replacement text. You can, however, make use of a group in a look-behind clause:
%%
regex_grammar: ECMAScript
case_sensitive: false
%
(['"])string\1
%
'newstring'
The ignore_whitespace
setting does affect spaces specified in non-Native
rules, so if you are doing anything where you are specifying a space as a part of a regex sequence (such as foo +bar[ ]text
), be sure to turn off ignore_whitespace
for that rule.