March 13, 2018

MariaDB MaxScale: Masking and Firewall

To properly prevent some columns from being accessed, the masking filter that was introduced in MariaDB MaxScale 2.1 must be used together with the database firewall filter. In version 2.1, it was somewhat cumbersome to setup the database firewall filter for use together with the masking filter. With the recently released MariaDB MaxScale 2.2 (GA), we have introduced new features that makes the combined use of the masking filter and the database firewall filter much easier.

Before moving to the new functionality introduced in version 2.2, let us recap what the masking filter actually can do. With the masking filter it is possible to mask the returned values, so that even though a column can be SELECTed, the actual value will not be revealed. This can be used for restricting access to some information without making the actual column inaccessible.

For instance, suppose there is a table person that, among other columns, contains the column ssn where the social security number of a person is stored. With the masking filter it is possible to specify that when the ssn field is queried, a masked value is returned, unless the user making the query is a specific one.

That is, when making the query

> SELECT name, ssn FROM person;

instead of getting the real result, as in

+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Bob   | 435-22-3267 |
| Alice | 721-07-4426 |
...

the ssn would be masked, as in

+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Bob   | XXX-XX-XXXX |
| Alice | XXX-XX-XXXX |
...

Taking masking into use is quite straightforward; in the service section it is referred to just like any other filter

[TheService]
type=service
...
filters=Masking

and at minimum the filter configuration looks as follows:

[Masking]
type=filter
module=masking
rules=path-to-masking-rules-file

The value of the rules key is the path of the file containing the rules. The full documentation for the masking filter can be found here, so we will not cover all details but only the basics and what is new in MaxScale 2.2. The rules file used in the previous example looks as follows:

{
   "rules": [
       {
           "replace": {
               "column": "ssn"
           },
           "with": {
               "value": "XXX-XX-XXXX"
           }
       }
   ]
}

The content must be JSON object with a single key rules whose value is an array of rule objects. Each rule object can contain various keys, the most notable of which are replace and with. The value of replace tells what should be replaced and the value of with what it should be replaced with. So, above we specify that the value of a column called ssn should be replaced with the string “XXX-XX-XXXX”.

The string specified with value is used to replace the actual value only if the length of the real value is exactly the same as the length of the string. If that is not the case, then the value will by default be replaced with as many “X”s as necessary. This is different from MaxScale 2.1, where the catch all string had to be specified using a fill key. The fill key can still be specified if some other string but “X” should be used.

Note that with the above rules, the masking will be applied to all users, but with the key applies_to it can be specified what users are affected and with the key exempted what users are exempted, so it is straightforward to apply the masking to only a specific set of users.

More Functionality

Replacing the entire value with something else was the only option there was in MaxScale 2.1. In MaxScale 2.2 we have provided additional alternatives for how the value is replaced.

Partial Replacement

In MaxScale 2.2, it is no longer necessary to replace the entire value, but it is possible to replace only some part of it. Consider, for instance, the previous example where we masked the social security number entirely. If we would like to only mask the last 4 digits of the number it could be accomplished with the following rule.

{
   "rules": [
       {
           "replace": {
               "column": "ssn",
               "match": "....$"
           },
           "with": {
               "value": "YYYY"
           }
       }
   ]
}

With the match key a PCRE2 regular expression can be defined. If some part of the value matches that regular expression, then that and only that part is replaced. The string above matches the four last characters - whatever they are - so the last four characters will be masked.

With partial matching, the string specified with value is used, provided the length of that string is exactly the same as the string matched by the regular expression. With these rules, the result looks as follows:

+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Bob   | 435-22-YYYY |
| Alice | 721-07-YYYY |
...

Obfuscation

Another feature introduced in MaxScale 2.2 is obfuscation. That is, instead of simply replacing a value with a fixed string, the value is replaced with a new value that is created from the original value, but in such a manner that the original value cannot easily be obtained from the generated one. Using obfuscation instead of replacement is easy; simply replace the replace keyword with obfuscate and remove the with object entirely.

{
   "rules": [
       {
           "obfuscate": {
               "column": "ssn"
           }
       }
   ]
}

And the result is as follows:

+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Bob   | +|bv%~6d{y  |
| Alice | ~H;Oj#Q~~(\ |

Currently partial matching is not supported in conjunction with obfuscation but is always done to the entire value. Partial matching will be supported in the future.

The obfuscation algorithm is basically a hashing function that in principle makes it impossible to obtain the original value from the obfuscated one. Note, however, that if the range of values is limited, it is straightforward to figure out the possible original values by running the full range of values through the obfuscation algorithm, thus obtaining a mapping from values to obfuscated values, and then comparing the result of the query with the values in that mapping.

Note on function blocking and masking:

The masking filter works only on the result set, which means it is easy to bypass it. Consider the following:

> select name, ssn from person;
+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Bob   | XXX-XX-XXXX |
| Alice | XXX-XX-XXXX |
+-------+-------------+


> select name, concat(ssn) from person;
+-------+-------------+
| name  | concat(ssn) |
+-------+-------------+
| Bob   | 435-22-3267 |
| Alice | 721-07-4426 |
+-------+-------------+

Simply by not using the column name as such, but “hiding” it in a function, means that the masking can be bypassed.

Combining the Masking and Firewall Filters

Preventing the bypassing of the masking filter using the firewall filter was possible already in MaxScale 2.1, but was quite awkward as the required firewall rules became quite extensive. Basically it was necessary to explicitly prevent the use of any function using which the masking could be bypassed. In MaxScale 2.2 we have provided additional firewall functionality using which the task becomes much easier.

Starting with the service in the configuration file, the requests must first be passed through the firewall before they are provided to the masking filter.

[TheService]
type=service
...
filters=Firewall|Masking

And the firewall needs a section of its own.

[Firewall]
type=filter
module=dbfwfilter
rules=path-to-firewall-rules-file

Then comes the firewall rules. What we want to prevent is the use of functions in conjunction with the column we want to mask. That can be accomplished using the following firewall rules.

rule no_functions_with_ssn match uses_function ssn

users %@% match any rules no_functions_with_ssn

The rule above matches if the column ssn is used in conjunction with any function. If we now try to bypass the masking, the result will be as follows:

> select name, concat(ssn) from person;

ERROR 1141 (HY000): Access denied for user 'cecil'@'127.0.0.1' to database 'testdb': Permission denied to column 'ssn' with function.

Every function cannot be used for bypassing the masking, so it is also possible to whitelist certain functions. For instance, if we want to allow the use of the function LENGTH, that can be accomplished as follows:

rule only_length_with_ssn match not_function length columns ssn

users %@% match any rules only_length_with_ssn

This rule matches if the column ssn is used in conjunction with any other function but LENGTH. So, the following works

> select name, length(ssn) from person;
+-------+-------------+
| name  | length(ssn) |
+-------+-------------+
| Bob   |          11 |
| Alice |          11 |
+-------+-------------+

but the use of any other function fails

> select name, concat(ssn) from person;
ERROR 1141 (HY000): Access denied for user 'cecil'@'127.0.0.1' to database 'testdb': Permission denied to column 'ssn' with function 'concat'.

Conclusion

The masking and firewall filter can together be used for ensuring that unauthorized users cannot access the actual value of some column. In MariaDB MaxScale 2.2, the functionality of the firewall has been extended so that it is easy to prevent actions that otherwise could be used for bypassing the masking filter. Download MariaDB MaxScale now to get started.

About Johan Wikman

Johan Wikman is a senior developer working on MaxScale at MariaDB Corporation. 

Read all posts by Johan Wikman