MaxScale 25.01 Binlog Filter

Binlog Filter

This filter was introduced in MariaDB MaxScale 2.3.0.

Overview

The binlogfilter can be combined with a binlogrouter service to selectively replicate the binary log events to replica servers.

The filter uses two settings, match and exclude, to determine which events are replicated. If a binlog event does not match or is excluded, the event is replaced with an empty data event. The empty event is always 35 bytes, which translates to a space reduction in most cases.

When statement-based replication is used, any query events that are filtered out are replaced with a SQL comment. This causes the query event to do nothing and thus the event will not modify the contents of the database. The GTID position of the replicating database will still advance which means that downstream servers replicating from it keep functioning correctly.

The filter works with both row-based and statement-based replication, but we recommend using row-based replication with the binlogfilter. This guarantees that there are no ambiguities in the event filtering.

Settings

match

  • Type: regex

  • Mandatory: No

  • Dynamic: Yes

  • Default: None

Include queries that match the regex. See next entry, exclude, for more information.

exclude

  • Type: regex

  • Mandatory: No

  • Dynamic: Yes

  • Default: None

Exclude queries that match the regex.

If neither match nor exclude are defined, the filter does nothing and all events are replicated. This filter does not accept regular expression options as a separate setting; such settings must be defined in the patterns themselves. See the PCRE2 API documentation for more information.

The two settings are matched against the database and table name concatenated with a period. For example, the string the patterns are matched against for the database test and table t1 is test.t1.

For statement based replication, the pattern is matched against all the tables in the statements. If any of the tables matches the match pattern, the event is replicated. If any of the tables matches the exclude pattern, the event is not replicated.

rewrite_src

  • Type: regex

  • Mandatory: No

  • Dynamic: Yes

  • Default: None

See the next entry, rewrite_dest, for more information.

rewrite_dest

  • Type: regex

  • Mandatory: No

  • Dynamic: Yes

  • Default: None

rewrite_src and rewrite_dest control the statement rewriting of the binlogfilter. The rewrite_src setting is a PCRE2 regular expression that is matched against the default database and the SQL of statement based replication events (query events). rewrite_dest is the replacement string, which supports the normal PCRE2 backreferences (e.g. the first capture group is $1, the second is $2, etc.).

Both rewrite_src and rewrite_dest must be defined to enable statement rewriting.

When statement rewriting is enabled, GTID-based replication must be used. The filter will disallow replication for all replicas that attempt to replicate with traditional file-and-position based replication.

The replacement is done both on the default database as well as the SQL statement in the query event. This means that great care must be taken when defining the rewriting rules. To prevent accidental modification of the SQL into a form that is no longer valid, use database and table names that never occur in the inserted data and are never used as constant values.
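As an illustrative sketch only (the names old_db and new_db are hypothetical), a rule pair like the following would rewrite references to old_db into new_db:

rewrite_src=old_db
rewrite_dest=new_db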

Example Configuration

With the following configuration, only events belonging to database customers are replicated. In addition to this, events for the table orders are excluded and thus are not replicated.
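[BinlogFilter]
type=filter
module=binlogfilter
match=/customers[.]/
exclude=/[.]orders/

[BinlogServer]
type=service
router=binlogrouter
server_id=33
filters=BinlogFilter

[BinlogListener]
type=listener
service=BinlogServer
port=4000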

For more information about the binlogrouter and how to use it, refer to the binlogrouter documentation.

This page is licensed: CC BY-SA / Gnu FDL

MaxScale 25.01 Comment Filter

Comment Filter


Overview

With the comment filter it is possible to define comments that are injected before the actual statements. These appear as SQL comments when they are received by the server.

Settings

The Comment filter requires one mandatory parameter to be defined.

inject

  • Type: string

  • Mandatory: Yes

  • Dynamic: Yes

The comment that is injected before the statements. The value may contain the variable $IP, which is replaced with the IP address of the client in the injected comment. Variables must be written in all caps.
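For example:

[MyComment]
type=filter
module=comment
inject="Comment to be injected"

[MyService]
type=service
router=readwritesplit
servers=server1
user=myuser
password=mypasswd
filters=MyComment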

Examples

Example 1 - Inject IP address of the connected client into statements as a comment

The following configuration adds the IP address of the client to the comment.
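[IPComment]
type=filter
module=comment
inject="IP=$IP"

[MyService]
type=service
router=readwritesplit
servers=server1
user=myuser
password=mypasswd
filters=IPComment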

In this example, when MaxScale receives a statement like:
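SELECT user FROM people;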

it would look like this when received by the server:
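/* IP=::ffff:127.0.0.1 */SELECT user FROM people;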

This page is licensed: CC BY-SA / Gnu FDL


MaxScale 25.01 Optimistic Transaction Execution Filter

Optimistic Transaction Execution Filter

The optimistictrx filter implements optimistic transaction execution. The filter is designed for a use-case where most of the transactions are read-only and writes happen rarely, but each set of read-only statements is still grouped into a read-write transaction (i.e. START TRANSACTION, BEGIN or SET autocommit=0).

This filter will replace the BEGIN and START TRANSACTION SQL commands with START TRANSACTION READ ONLY. If the transaction is fully read-only, the transaction completes normally. However, if a write happens in the middle of a transaction, the filter issues a ROLLBACK command and then replays the read-only part of the transaction, including the original BEGIN statement. If the results of the replayed read-only part of the transaction are identical to the ones that were returned to the client, the transaction proceeds normally. If the result checksum does not match, the connection is closed to prevent a write with the wrong transaction state from happening.

Configuration

To add the filter to a service, define an instance of the filter and then add it to a service's filters list:
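[OptimisticTrx]
type=filter
module=optimistictrx

[MyService]
...
filters=OptimisticTrx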

This can also be done at runtime with:
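maxctrl create filter OptimisticTrx optimistictrx
maxctrl alter service-filter MyService OptimisticTrx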

This page is licensed: CC BY-SA / Gnu FDL

MaxScale 25.01 Maxrows

Maxrows

This filter was introduced in MariaDB MaxScale 2.1.

Overview

    The Maxrows filter is capable of restricting the number of rows that a SELECT, a prepared statement or a stored procedure can return to the client application.

    If a resultset from a backend server has more rows than the configured limit or the resultset size exceeds the configured size, an empty result will be sent to the client.

    Configuration

    The Maxrows filter is easy to configure and to add to any existing service.
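For example:

[MaxRows]
type=filter
module=maxrows

[MaxRows-Routing-Service]
type=service
...
filters=MaxRows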

    Settings

    The Maxrows filter has no mandatory parameters. Optional parameters are:

    max_resultset_rows

    • Type: number

    • Mandatory: No

    • Dynamic: Yes

    • Default: (no limit)

    Specifies the maximum number of rows a resultset can have in order to be returned to the user.

    If a resultset is larger than this, an empty result will be sent instead.
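For example:

max_resultset_rows=1000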

    max_resultset_size

    • Type: size

    • Mandatory: No

    • Dynamic: Yes

    • Default: 64Ki

    Specifies the maximum size a resultset can have in order to be sent to the client. A resultset larger than this will not be sent; an empty resultset is sent instead.
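For example:

max_resultset_size=128Ki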

    max_resultset_return

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: empty, error, ok

    • Default: empty

    Specifies what the filter sends to the client when the rows or size limit is hit. The possible values are:

    • empty: an empty result set

    • error: an error packet with the input SQL

    • ok: an OK packet

    Example output with ERR packet:
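MariaDB [(test)]> select * from test.t4;
ERROR 1415 (0A000): Row limit/size exceeded for query: select * from test.t4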

    debug

    • Type: number

    • Mandatory: No

    • Dynamic: Yes

    • Default: 0

    An integer value that controls the level of debug logging performed by the Maxrows filter. The value is a bitfield, with different bits enabling different log messages.

    • 0 (0b00000) No logging is made.

    • 1 (0b00001) A decision to handle data from the server is logged.

    • 2 (0b00010) Reaching max_resultset_rows or max_resultset_size is logged.

    To log everything, give debug a value of 3.
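For example, to log only when a limit has been reached:

debug=2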

    Example Configuration

    Here is an example of a filter configuration where the maximum number of returned rows is 10000 and the maximum allowed resultset size is 256KB:
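[MaxRows]
type=filter
module=maxrows
max_resultset_rows=10000
max_resultset_size=256000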

    This page is licensed: CC BY-SA / Gnu FDL


    MaxScale 25.01 LDI Filter

    LDI Filter

    The ldi (LOAD DATA INFILE) filter was introduced in MaxScale 23.08.0 and it extends the MariaDB LOAD DATA INFILE syntax to support loading data from any object storage that supports the S3 API. This includes cloud offerings like AWS S3 and Google Cloud Storage as well as locally run services like Minio.

    If the filename starts with either S3:// or gs://, the path is interpreted as an S3 object. The prefix is case-insensitive. For example, the following command would load the file my-data.csv from the bucket my-bucket into the table t1:
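LOAD DATA INFILE 'S3://my-bucket/my-data.csv' INTO TABLE t1
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';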

    How to Upload Data

    Here is a minimal configuration for the filter that can be used to load data from AWS S3:
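[LDI-Filter]
type=filter
module=ldi
host=s3.amazonaws.com
region=us-east-1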

    The first step is to move the file to be loaded into the same region that MaxScale and the MariaDB servers are in. One factor in upload speed is network latency; minimizing it by moving the source and the destination closer together improves the data loading speed.

    The next step is to connect to MaxScale and prepare the session for an upload by providing the service account access and secret keys:
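SET @maxscale.ldi.s3_key='<my-access-key>', @maxscale.ldi.s3_secret='<my-secret-key>';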

    Once the credentials are configured, the data loading can be started:
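LOAD DATA INFILE 'S3://my-bucket/my-data.csv' INTO TABLE t1;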

    Data Uploads with MariaDB Xpand

    This feature has been removed in MaxScale 24.02.

    Common Problems With Data Loading

    Missing Files

    If you are using self-hosted object storage programs like Minio, a common problem is that they do not necessarily support the newer virtual-hosted-style requests that are used by AWS. This usually manifests as an error about either a missing file or a missing bucket.

    If the host parameter is set to a hostname, it's assumed that the object storage supports the newer virtual-hosted-style requests. If this is not the case, the filter must be configured with protocol_version=1.

    Conversely, if the host parameter is set to a plain IP address, it is assumed that it does not support the newer virtual-hosted-style requests. If the host does support them, the filter must be configured with protocol_version=2.

    Settings

    key

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    The S3 access key used to perform all requests to the object storage.

    This must be either configured in the MaxScale configuration file or set with SET @maxscale.ldi.s3_key='<key>' before starting the data load.

    secret

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    The S3 secret key used to perform all requests to the object storage.

    This must be either configured in the MaxScale configuration file or set with SET @maxscale.ldi.s3_secret='<secret>' before starting the data load.

    region

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: us-east-1

    The S3 region where the data is located.

    The value can be overridden with SET @maxscale.ldi.s3_region='<region>' before starting the data load.

    host

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: s3.amazonaws.com

    The location of the S3 object storage. By default the original AWS S3 host is used. The corresponding value for Google Cloud Storage is storage.googleapis.com.

    The value can be overridden with SET @maxscale.ldi.s3_host='<host>' before starting the data load.

    port

    • Type: integer

    • Mandatory: No

    • Dynamic: Yes

    • Default: 0

    The port on which the S3 object storage is listening. If unset or set to the value of 0, the default S3 port is used.

    The value can be overridden with SET @maxscale.ldi.s3_port=<port> before starting the data load. Note that unlike the other values, the value for this variable must be an SQL integer and not an SQL string.

    no_verify

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    If set to true, TLS certificate verification for the object storage is skipped.

    use_http

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    If set to true, communication with the object storage is done unencrypted using HTTP instead of HTTPS.

    protocol_version

    • Type: integer

    • Mandatory: No

    • Dynamic: Yes

    • Values: 0, 1, 2

    • Default: 0

    Which protocol version to use. By default the protocol version is derived from the value of host but this automatic protocol version deduction will not always produce the correct result. For the legacy path-style requests used by older S3 storage buckets, the value must be set to 1. All new buckets use the protocol version 2.

    For object storage programs like Minio, the value must be set to 1 as the bucket name cannot be resolved via the subdomain like it is done for object stores in the cloud.

    import_user

    This parameter has been removed in MaxScale 24.02.

    import_password

    This parameter has been removed in MaxScale 24.02.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Psreuse

    Psreuse

    The psreuse filter reuses identical prepared statements inside the same client connection. This filter only works with binary protocol prepared statements and not with text protocol prepared statements executed with the PREPARE SQL command.

    When this filter is enabled and the connection prepares an identical prepared statement multiple times, instead of preparing it on the server the existing prepared statement handle is reused. This also means that whenever prepared statements are closed by the client, they will be left open by readwritesplit.

    Enabling this feature will increase memory usage of a session. The amount of memory stored per prepared statement is proportional to the length of the prepared SQL statement and the number of parameters the statement has.

    Configuration

    To add the filter to a service, define an instance of the filter and then add it to a service's filters list:
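[PsReuse]
type=filter
module=psreuse

[MyService]
...
filters=PsReuse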

    Limitations

    • If the SQL in the prepared statement is larger than 16777215 bytes, the prepared statement will not be cached.

    • If the same SQL is prepared more than once at the same time, only one of them will succeed. This happens because the prepared statement reuse compares SQL strings to detect whether a statement is already prepared.

    This page is licensed: CC BY-SA / Gnu FDL



    MaxScale 25.01 Hintfilter

    Hintfilter

    This filter adds routing hints to a service. The filter has no parameters.


    Hint Syntax

    Note: If a query has more than one comment, only the first comment is processed. Always place any MaxScale related comments first, before any other comments that might appear in the query.

    Comments and comment types

    The client connection will need to have comments enabled. For example, the mariadb and mysql command line clients have comments disabled by default and they need to be enabled by passing the --comments or -c option to them. Most, if not all, connectors keep all comments intact in executed queries.
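# The --comments flag is needed for the command line client
mariadb --comments -u my-user -psecret -e "SELECT @@hostname -- maxscale route to server db1"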

    For comment types, use either -- (notice the whitespace after the double hyphen) or # after the semicolon or /* ... */ before the semicolon.

    Inline comment blocks, i.e. /* .. */, do not require a whitespace character after the start tag or before the end tag but adding the whitespace is advised.

    Hint body

    All hints must start with the maxscale tag.
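-- maxscale <hint body>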

    The hints have two types: ones that define a server type and others that contain name-value pairs.

    Routing destination hints

    These hints will instruct the router to route a query to a certain type of a server.
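-- maxscale route to [master | slave | server <server name>]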

    Route to primary
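-- maxscale route to master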

    A master value in a routing hint will route the query to a primary server. This can be used to direct read queries to a primary server for an up-to-date result with no replication lag.

    Route to replica
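-- maxscale route to slave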

    A slave value will route the query to a replica server. Please note that the hints will override any decisions taken by the routers which means that it is possible to force writes to a replica server.

    Route to named server
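-- maxscale route to server <server name>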

    A server value will route the query to a named server. The value of <server name> needs to be the same as the server section name in maxscale.cnf. If the server is not used by the service, the hint is ignored.

    Route to last used server
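-- maxscale route to last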

    A last value will route the query to the server that processed the last query. This hint can be used to force certain queries to be grouped to the same server.

    Name-value hints
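-- maxscale <param>=<value>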

    These control the behavior and affect the routing decisions made by the router. Currently the only accepted parameter is the readwritesplit parameter max_slave_replication_lag. This will route the query to a server with a lower replication lag than this parameter's value.

    Hint stack

    Hints can be either single-use hints, which makes them affect only one query, or named hints, which can be pushed on and off a stack of active hints.

    Defining named hints:
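-- maxscale <hint name> prepare <hint content>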

    Pushing a hint onto the stack:
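-- maxscale <hint name> begin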

    Popping the topmost hint off the stack:
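-- maxscale end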

    You can define and activate a hint in a single command using the following:
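-- maxscale <hint name> begin <hint content>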

    You can also push anonymous hints onto the stack which are only used as long as they are on the stack:
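-- maxscale begin <hint content>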

    Prepared Statements

    The hintfilter supports routing hints in prepared statements for both thePREPARE and EXECUTE SQL commands as well as the binary protocol prepared statements.

    Binary Protocol

    With binary protocol prepared statements, a routing hint in the prepared statement is applied to the execution of the statement but not the preparation of it. The preparation of the statement is routed normally and is sent to all servers.

    For example, when the following prepared statement is prepared with the MariaDB Connector-C function mariadb_stmt_prepare and then executed with mariadb_stmt_execute, the result is always returned from the primary:
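SELECT user FROM accounts WHERE id = ? -- maxscale route to master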

    Support for binary protocol prepared statements was added in MaxScale 6.0 (MXS-2838).

    The protocol commands that the routing hints are applied to are:

    • COM_STMT_EXECUTE

    • COM_STMT_BULK_EXECUTE

    • COM_STMT_SEND_LONG_DATA

    • COM_STMT_FETCH

    • COM_STMT_RESET

    Support for direct execution of prepared statements was added in MaxScale 6.2.0. For example, the MariaDB Connector-C uses direct execution when mariadb_stmt_execute_direct is used.

    Text Protocol

    Text protocol prepared statements (i.e. the PREPARE and EXECUTE SQL commands) behave differently. If a PREPARE command has a routing hint, it will be routed according to the routing hint. Any subsequent EXECUTE command will not be affected by the routing hint in the PREPARE statement. This means they must have their own routing hints.

    The following example is the recommended method of executing text protocol prepared statements with hints:
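PREPARE my_ps FROM 'SELECT user FROM accounts WHERE id = ?';
EXECUTE my_ps USING 123; -- maxscale route to master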

    The PREPARE is routed normally and will be routed to all servers. The EXECUTE will be routed to the primary as a result of it having the route to master hint.

    Examples

    Routing SELECT queries to primary

    In this example, MariaDB MaxScale is configured with the readwritesplit router and the hint filter.
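[ReadWriteService]
type=service
router=readwritesplit
servers=server1,server2
user=maxuser
password=maxpwd
filters=Hint

[Hint]
type=filter
module=hintfilter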

    Behind MariaDB MaxScale is a primary server and a replica server. If there is replication lag between the primary and the replica, read queries sent to the replica might return old data. To guarantee up-to-date data, we can add a routing hint to the query.
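INSERT INTO table1 VALUES ("John","Doe",1);
SELECT * FROM table1; -- maxscale route to master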

    The first INSERT query will be routed to the primary. The following SELECT query would normally be routed to the replica but with the added routing hint it will be routed to the primary. This way we can do an INSERT and a SELECT right after it and still get up-to-date data.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Lua Filter

    Lua Filter

    The luafilter is a filter that calls a set of functions in a Lua script.

    Read the Lua language documentation for information on how to write Lua scripts.

    Note: This module is experimental and must be built from source. The module is deprecated in MaxScale 23.08 and might be removed in a future release.

    Settings

    The luafilter has two parameters. They control which scripts will be called by the filter. Both parameters are optional but at least one should be defined. If both global_script and session_script are defined, the entry points in both scripts will be called.

    global_script

    The global Lua script. The parameter value is a path to a readable Lua script which will be executed.

    This script will always be called with the same global Lua state and it can be used to build a global view of the whole service.

    session_script

    The session level Lua script. The parameter value is a path to a readable Lua script which will be executed once for each session.

    Each session will have its own Lua state meaning that each session can have a unique Lua environment. Use this script to do session specific tasks.

    Lua Script Calling Convention

    The entry points for the Lua script expect the following signatures:

    • nil createInstance(name) - global script only, called when the script is first loaded

      • When the global script is loaded, it first executes on a global level before the luafilter calls the createInstance function in the Lua script with the filter's name as its argument.

    • nil newSession(string, string) - new session is created
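      • After the session script is loaded, the newSession function in the Lua scripts is called. The first parameter is the username of the client and the second parameter is the client's network address.

  • nil closeSession() - session is closed

      • The closeSession function in the Lua scripts will be called.

  • (nil | bool | string) routeQuery() - query is being routed

      • The Luafilter calls the routeQuery functions of both the session and the global script. The query is passed as a string parameter to the routeQuery Lua function and the return values of the session specific function, if any were returned, are interpreted. If the first value is a bool, it is interpreted as a decision whether to route the query or to send an error packet to the client. If it is a string, the current query is replaced with the return value and the query will be routed. If nil is returned, the query is routed normally.

  • nil clientReply() - reply to a query is being routed

      • This function is called with the name of the server that returned the response.

  • string diagnostic() - global script only, print diagnostic information

      • If the Lua function returns a string that is valid JSON, it will be decoded as JSON and displayed as such in the REST API. If the object does not decode into JSON, it will be stored as a JSON string.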

    These functions, if found in the script, will be called whenever a call to the matching entry point is made.

    Script Template

    Here is a script template that can be used to try out the luafilter. Copy it into a file and add global_script=<path to script> into the filter configuration. Make sure the file is readable by the maxscale user.
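function createInstance(name)

end

function newSession(user, host)

end

function closeSession()

end

function routeQuery()

end

function clientReply()

end

function diagnostic()

end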

    Functions Exposed by the Luafilter

    The luafilter exposes the following functions that can be called inside the Lua script API endpoints. The callback function in which they can be called is documented after the function signature. If the functions are called outside of the correct callback function, they raise a Lua error.

    • string mxs_get_sql() (use: routeQuery)

    • Returns the SQL of the query being executed. This returns an empty string for any query that is not a text protocol query (COM_QUERY). Support for prepared statements is not yet implemented.

    • string mxs_get_type_mask() (use: routeQuery)
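  • Returns the type of the current query being executed as a string. The values are the string versions of the query types defined in query_classifier.h, separated by vertical bars (|). This function can only be called from the routeQuery entry point.

  • string mxs_get_operation() (use: routeQuery)

  • Returns the current operation type as a string. The values are defined in query_classifier.h. This function can only be called from the routeQuery entry point.

  • string mxs_get_canonical() (use: routeQuery)

  • Returns the canonical version of a query by replacing all user-defined constant values with question marks. This function can only be called from the routeQuery entry point.

  • number mxs_get_session_id() (use: newSession, routeQuery, clientReply, closeSession)

  • This function returns the session ID of the current session. Inside the createInstance and diagnostic endpoints this function will always return the value 0.

  • string mxs_get_db() (use: newSession, routeQuery, clientReply, closeSession)

  • Returns the current default database used by the connection.

  • string mxs_get_user() (use: newSession, routeQuery, clientReply, closeSession)

  • Returns the username of the client connection.

  • string mxs_get_host() (use: newSession, routeQuery, clientReply, closeSession)

  • Returns the address of the client connection.

  • string mxs_get_replier() (use: clientReply)

  • Returns the target that returned the result to the latest query.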

    Example Configuration and Script

    Here is a minimal configuration entry for a luafilter definition.
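[MyLuaFilter]
type=filter
module=luafilter
global_script=/path/to/script.lua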

    And here is a script that opens a file in /tmp/ and logs output to it.
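f = io.open("/tmp/test.log", "a+")

function createInstance(name)
    f:write("createInstance for " .. name .. "\n")
end

function newSession(user, host)
    f:write("newSession for: " .. user .. "@" .. host .. "\n")
end

function closeSession()
    f:write("closeSession\n")
end

function routeQuery()
    -- mxs_get_type_mask() and mxs_get_operation() are the names documented above
    f:write("routeQuery: " .. mxs_get_sql() .. " -- type: " .. mxs_get_type_mask() .. " operation: " .. mxs_get_operation() .. "\n")
end

function clientReply()
    f:write("clientReply: " .. mxs_get_replier() .. "\n")
end

function diagnostic()
    f:write("diagnostics\n")
    return "Hello from Lua!"
end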

    Limitations

    • mxs_get_sql() and mxs_get_canonical() do not work with queries done with the binary protocol.

    • The Lua code is not restricted in any way which means excessively slow execution of it can cause the MaxScale process to become slower or to be aborted due to a SystemD watchdog timeout.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Consistent Critical Read Filter

    Consistent Critical Read Filter

    This filter was introduced in MariaDB MaxScale 2.1.

    Overview

    The Consistent Critical Read (CCR) filter allows consistent critical reads to be done through MaxScale while still allowing scaleout of non-critical reads.

    When the filter detects a statement that would modify the database, it attaches a routing hint to all following statements done by that connection. This routing hint guides the routing module to route the statement to the primary server where data is guaranteed to be in an up-to-date state. Writes from one session do not, by default, propagate to other sessions.

    Note: This filter does not work with prepared statements. Only text protocol queries are handled by this filter.

    Controlling the Filter with SQL Comments

    The triggering of the filter can be limited further by adding MaxScale supported comments to queries and/or by using regular expressions. The query comments take precedence: if a comment is found it is obeyed even if a regular expression parameter might give a different result. Even a comment cannot cause a SELECT-query to trigger the filter. Such a comment is considered an error and ignored.

    The comments must follow the MaxScale hint syntax and the HintFilter needs to be in the filter chain before the CCR filter. If a query has a MaxScale supported comment line which defines the parameter ccr, that comment is caught by the CCR filter. Parameter values match and ignore are supported, causing the filter to trigger (match) or not trigger (ignore) on receiving the write query. For example, the query
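INSERT INTO departments VALUES ('d1234', 'NewDepartment'); -- maxscale ccr=ignore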

    would normally cause the filter to trigger, but does not because of the comment. The match-comment typically has no effect, since write queries by default trigger the filter anyway. It can be used to override an ignore-type regular expression that would otherwise prevent triggering.

    Settings

    The CCR filter has no mandatory parameters.

    time

  • Type: duration

    • Mandatory: No

    • Dynamic: Yes

    • Default: 60s

    The time window during which queries are routed to the primary. The duration can be specified as documented in the MaxScale configuration guide, but the value will always be rounded to the nearest second. If no explicit unit has been specified, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. The default value for this parameter is 60 seconds.

    When a data modifying SQL statement is processed, a timer is set to the value of time. Once the timer has elapsed, all statements are routed normally. If a new data modifying SQL statement is processed within the time window, the timer is reset to the value of time.

    Enabling this parameter in combination with the count parameter causes both the time window and number of queries to be inspected. If either of the two conditions are met, the query is re-routed to the primary.

    count

    • Type: count

    • Mandatory: No

    • Dynamic: Yes

    • Default: 0

    The number of SQL statements to route to primary after detecting a data modifying SQL statement. This feature is disabled by default.

    After processing a data modifying SQL statement, a counter is set to the value of count and all statements are routed to the primary. Each statement executed after a data modifying SQL statement causes the counter to be decremented. Once the counter reaches zero, the statements are routed normally. If a new data modifying SQL statement is processed, the counter is reset to the value of count.

    match

  • Type: regex

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    These control which statements trigger statement re-routing. Only non-SELECT statements are inspected. For CCRFilter, the exclude-parameter is instead named ignore, yet works similarly.
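For example:

match=.*INSERT.*
ignore=.*UPDATE.*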

    ignore

  • Type: regex

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    See the documentation for match.

    options

  • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: ignorecase, case, extended

    • Default: ignorecase

    Regular expression options for match and ignore.
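For example:

options=case,extended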

    global

  • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    global is a boolean parameter that when enabled causes writes from one connection to propagate to all other connections. This can be used to work around cases where one connection writes data and another reads it, expecting the write done by the other connection to be visible.

    This parameter only works with the time parameter. The use of global and count at the same time is not allowed and will be treated as an error.

    Example Configuration

    Here is a minimal filter configuration for the CCRFilter which should solve most problems with critical reads after writes.
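[CCRFilter]
type=filter
module=ccrfilter
time=5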

    With this configuration, whenever a connection does a write, all subsequent reads done by that connection will be forced to the primary for 5 seconds.

    This prevents read scaling until the modifications have been replicated to the replicas. For best performance, the value of time should be slightly greater than the actual replication lag between the primary and its replicas. If the number of critical read statements is known, the count parameter could be used to control the number of reads that are sent to the primary.

    This page is licensed: CC BY-SA / Gnu FDL
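
MariaDB MaxScale 25.01 Filters

Filters in MariaDB MaxScale intercept and modify database traffic. Use them to transform, block, or log queries, enabling fine-grained control over your database workload and security.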

    MaxScale 25.01 Throttle

    Throttle

    This filter was added in MariaDB MaxScale 2.3.


Overview

    The throttle filter is used to limit the maximum query frequency (QPS - queries per second) of a database session to a configurable value. The main use cases are to prevent a rogue session (client side error) and a DoS attack from overloading the system.

    The throttling is dynamic. The query frequency is not limited to an absolute value. Depending on the configuration, the throttle will allow some amount of high-frequency querying, and short bursts in particular can pass with no frequency limitation.

    Configuration

    Basic Configuration
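[Throttle]
type = filter
module = throttlefilter
max_qps = 500
throttling_duration = 60000
...

[Routing-Service]
type = service
filters = Throttle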

    This configuration states that the query frequency will be throttled to around 500 qps, and that the time limit a query is allowed to stay at the maximum frequency is 60 seconds. All values involving time are configured in milliseconds. With the basic configuration the throttling will be nearly immediate, i.e. a session will only be allowed very short bursts of high frequency querying.

    When a session has been continuously throttled for throttling_duration milliseconds, or 60 seconds in this example, MaxScale will disconnect the session.

    Allowing high frequency bursts

    The two parameters max_qps and sampling_duration together define how a session is throttled.

    Suppose max qps is 400 qps and sampling duration is 10 seconds. Since QPS is not an instantaneous measure, but one could say it has a granularity of 10 seconds, we see that over the 10 seconds 10*400 = 4000 queries are allowed before throttling kicks in.

    With these values, a fresh session can start off with a speed of 2000 qps, and maintain that speed for 2 seconds before throttling starts.

    If the client continues to query at high speed and throttling_duration is set to 10 seconds, MaxScale will disconnect the session 12 seconds after it started.

    Settings

    max_qps

    • Type: number

    • Mandatory: Yes

    • Dynamic: Yes

    Maximum queries per second.

    This is the frequency to which a session will be limited over a given time period. QPS is not measured as an instantaneous value but over a configurable sampling duration (see sampling_duration).

    throttling_duration

    • Type: duration

    • Mandatory: Yes

    • Dynamic: Yes

    This defines how long a session is allowed to be throttled before MaxScale disconnects the session.

    sampling_duration

    • Type: duration

    • Mandatory: No

    • Dynamic: Yes

    • Default: 250ms

    Sampling duration defines the window of time over which QPS is measured. This parameter directly affects the amount of time that high frequency queries are allowed before throttling kicks in.

    The lower this value is, the more strict throttling becomes. Conversely, the longer this time is, the longer bursts of high frequency querying is allowed.

    continuous_duration

    • Type: duration

    • Mandatory: No

    • Dynamic: Yes

    • Default: 2s

    This value defines what continuous throttling means. Continuous throttling starts as soon as the filter throttles the frequency. Continuous throttling ends when no throttling has been performed in the past continuous_duration time.

    This page is licensed: CC BY-SA / Gnu FDL


    MaxScale 25.01 Top Filter

    Top Filter


    Overview

    The top filter is a filter module for MariaDB MaxScale that monitors every SQL statement that passes through the filter. It measures the duration of each statement, i.e. the time between the statement being sent and the first result being returned. The top N times are kept, along with the SQL text itself, and a list sorted on the execution times of the queries is written to a file when the client session is closed.

    Configuration

    Example minimal configuration:
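[MyLogFilter]
type=filter
module=topfilter

[Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter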

    Settings

    The top filter has one mandatory parameter, filebase, and a number of optional parameters.

    filebase

    • Type: string

    • Mandatory: Yes

    • Dynamic: Yes

    The basename of the output file created for each session. The session ID is added to the filename for each file written. This is a mandatory parameter.

    The filebase may also be set as a filter option; the mechanism to set the filebase via the filter option is superseded by the parameter. If both are set, the parameter setting will be used and the filter option ignored.
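For example:

filebase=/tmp/SqlQueryLog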

    count

    • Type: number

    • Mandatory: No

    • Dynamic: Yes

    • Default: 10

    The number of SQL statements to store and report upon.
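For example:

count=30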

    match

    • Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Limits the queries logged by the filter.
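For example:

match=select.*from.*customer.*where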

    exclude

    • Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Limits the queries logged by the filter.
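For example:

exclude=where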

    options

    • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: ignorecase, case, extended

    • Default: case

    Regular expression options for match and exclude.
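For example:

options=case,extended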

    source

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will be logged.
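For example:

source=127.0.0.1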

    user

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Defines a username that is used to match against the user from which the client connection to MariaDB MaxScale originates. Only sessions that are connected using this username will result in results being generated.
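For example:

user=john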

    Examples

    Example 1 - Heavily Contended Table

    You have an order system and believe the updates of the PRODUCTS table are causing some performance issues for the rest of your application. You would like to know which of the many updates in your application is causing the issue.

    Add a filter with the following definition:
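[ProductsUpdateTop20]
type=filter
module=topfilter
count=20
match=UPDATE.*PRODUCTS.*WHERE
exclude=UPDATE.*PRODUCTS_STOCK.*WHERE
filebase=/var/logs/top/ProductsUpdate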

    Note the exclude entry, this is to prevent updates to the PRODUCTS_STOCK table from being included in the report.

    Example 2 - One Application Server is Slow

    One of your application servers is slower than the rest; you believe it is related to database access, but you are not sure what is taking the time.

    Add a filter with the following definition:
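[SlowAppServer]
type=filter
module=topfilter
count=20
source=192.168.0.32
filebase=/var/logs/top/SlowAppServer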

    In order to produce a comparison with an unaffected application server you can also add a second filter as a control.
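[ControlAppServer]
type=filter
module=topfilter
count=20
source=192.168.0.42
filebase=/var/logs/top/ControlAppServer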

    In the service definition, add both filters:
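[App-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=SlowAppServer | ControlAppServer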

    You will then have two sets of logs files written, one which profiles the top 20 queries of the slow application server and another that gives you the top 20 queries of your control application server. These two sets of files can then be compared to determine what if anything is different between the two.

    Output Report

    The following is an example report for a number of fictitious queries executed against the employees example database available for MySQL.
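-bash-4.1$ cat /var/logs/top/Employees-top-10.137

Top 10 longest running queries in session.

==========================================

Time (sec) | Query

-----------+-----------------------------------------------------------------

    22.985 |  select sum(salary), year(from_date) from salaries s, (select distinct year(from_date) as y1 from salaries) y where (makedate(y.y1, 1) between s.from_date and s.to_date) group by y.y1

     5.304 |  select d.dept_name as "Department", y.y1 as "Year", count(*) as "Count" from departments d, dept_emp de, (select distinct year(from_date) as y1 from dept_emp order by 1) y where d.dept_no = de.dept_no and (makedate(y.y1, 1) between de.from_date and de.to_date) group by y.y1, d.dept_name order by 1, 2

     2.896 |  select year(now()) - year(birth_date) as age, gender, avg(salary) as "Average Salary" from employees e, salaries s where e.emp_no = s.emp_no and ("1988-08-01"  between from_date AND to_date) group by year(now()) - year(birth_date), gender order by 1,2

     2.160 |  select dept_name as "Department", sum(salary) / 12 as "Salary Bill" from employees e, departments d, dept_emp de, salaries s where e.emp_no = de.emp_no and de.dept_no = d.dept_no and ("1988-08-01"  between de.from_date AND de.to_date) and ("1988-08-01"  between s.from_date AND s.to_date) and s.emp_no = e.emp_no group by dept_name order by 1

     0.845 |  select dept_name as "Department", avg(year(now()) - year(birth_date)) as "Average Age", gender from employees e, departments d, dept_emp de where e.emp_no = de.emp_no and de.dept_no = d.dept_no and ("1988-08-01"  between from_date AND to_date) group by dept_name, gender

     0.668 |  select year(hire_date) as "Hired", d.dept_name, count(*) as "Count" from employees e, departments d, dept_emp de where de.emp_no = e.emp_no and de.dept_no = d.dept_no group by d.dept_name, year(hire_date)

     0.249 |  select moves.n_depts As "No. of Departments", count(moves.emp_no) as "No. of Employees" from (select de1.emp_no as emp_no, count(de1.emp_no) as n_depts from dept_emp de1 group by de1.emp_no) as moves group by moves.n_depts order by 1

     0.245 |  select year(now()) - year(birth_date) as age, gender, count(*) as "Count" from employees group by year(now()) - year(birth_date), gender order by 1,2

     0.179 |  select year(hire_date) as "Hired", count(*) as "Count" from employees group by year(hire_date)

     0.160 |  select year(hire_date) - year(birth_date) as "Age", count(*) as Count from employees group by year(hire_date) - year(birth_date) order by 1

-----------+-----------------------------------------------------------------

Session started Wed Jun 18 18:41:03 2014

Connection from 127.0.0.1

Username        massi

Total of 24 statements executed.

Total statement execution time      35.701 seconds

Average statement execution time     1.488 seconds

Total connection time               46.500 seconds

-bash-4.1$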

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Named Server Filter

    Named Server Filter


    Overview

    The namedserverfilter is a MariaDB MaxScale filter module able to route queries to servers based on regular expression (regex) matches. Since it is a filter instead of a router, the NamedServerFilter only sets routing suggestions. It requires a compatible router to be effective. Currently, both readwritesplit and hintrouter take advantage of routing hints in the data packets. This filter uses the PCRE2 library for regular expression matching.

    Configuration

    The filter accepts settings in two modes: legacy and indexed. Only one of the modes may be used for a given filter instance. The legacy mode is meant for backwards compatibility and allows only one regular expression and one server name in the configuration. In indexed mode, up to 25 regex-server pairs are allowed in the form match01 - target01, match02 - target02 and so on. Also, in indexed mode, the server names (targets) may contain a list of names or special tags ->master or ->slave.

    All parameters except the deprecated match and target parameters can be modified at runtime. Any modifications to the filter configuration will only affect sessions created after the change has completed.

    Below is a configuration example for the filter in indexed mode. The legacy mode is not recommended and may be removed in a future release. In the example, a SELECT on TableOne (match01) results in routing hints to two named servers, while a SELECT on TableTwo is suggested to be routed to the primary server of the service. Whether a list of server names is interpreted as a route-to-any or route-to-all is up to the attached router. The HintRouter sees a list as a suggestion to route-to-any. For additional information on hints and how they can also be embedded into SQL queries, see the hint syntax documentation.
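[NamedServerFilter]
type=filter
module=namedserverfilter
match01=^Select.*TableOne$
target01=server2,server3
match22=^SELECT.*TableTwo$
target22=->master

[MyService]
type=service
router=readwritesplit
servers=server1,server2,server3
user=myuser
password=mypasswd
filters=NamedServerFilter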

    Settings

    NamedServerFilter requires at least one matchXY - targetXY pair.

    matchXY

    • Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    matchXY defines a PCRE2 regular expression against which the incoming SQL query is matched. XY must be a number in the range 01 - 25. Each match-setting pairs with a similarly indexed target-setting. If one is defined, the other must be defined as well. If a query matches the pattern, the filter attaches a routing hint defined by the target-setting to the query. The options parameter affects how the patterns are compiled.
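For example:

match01=^SELECT
options=case,extended
target01=MyServer2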

    options

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: ignorecase, case, extended

    • Default: ignorecase

    Regular expression options for matchXY.

    targetXY

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The hint which is attached to the queries matching the regular expression defined by matchXY. If a compatible router is used in the service, the query will be routed accordingly. The target can be one of the following:

    • a server or service name (adds a HINT_ROUTE_TO_NAMED_SERVER hint)

    • a list of server names, comma-separated (adds several HINT_ROUTE_TO_NAMED_SERVER hints)

    • ->master (adds a HINT_ROUTE_TO_MASTER hint)

    • ->slave (adds a HINT_ROUTE_TO_SLAVE hint)

    • ->all (adds a HINT_ROUTE_TO_ALL hint)

    The support for service names was added in MaxScale 6.3.2. Older versions of MaxScale did not accept service names in the target parameters.

    source

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    This optional parameter defines an IP address or mask which a connecting client's IP address is matched against. Only sessions whose address matches this setting will have this filter active and performing the regex matching. Traffic from other client IPs is simply left as is and routed straight through.
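For example:

source=127.0.0.1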

    Since MaxScale 2.1 it's also possible to use % wildcards:
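source=192.%.%.%
source=192.168.%.%
source=192.168.10.%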

    Note that using source=% to match any IP is not allowed.

    Since MaxScale 2.3 it's also possible to specify multiple addresses separated by comma. Incoming client connections are subsequently checked against each.
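For example:

source=192.168.21.3,192.168.10.%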

    user

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    This optional parameter defines a username the connecting client username is matched against. Only sessions that are connected using this username will have the match and routing hints applied to them. Traffic from other users is simply left as is and routed straight through.
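For example:

user=john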

    Additional remarks

    The maximum number of accepted match - target pairs is 25.

    In the configuration file, the indexed match and target settings may be in any order and may skip numbers. During SQL-query matching, however, the regexes are tested in ascending order: match01, match02, match03 and so on. As soon as a match is found for a given query, the routing hints are written and the packet is forwarded to the next filter or router. Any remaining match regexes are ignored. This means the match - target pairs should be indexed in priority order, or, if priority is not a factor, in order of decreasing match probability.

    Binary-mode prepared statements (COM_STMT_PREPARE) are handled by matching the prepared SQL against the match parameters. If a match is found, the routing hints are attached to any execution of that prepared statement. Text-mode prepared statements are not supported in this way. To divert them, use regular expressions which match the specific "EXECUTE" query.

    Examples

    Example 1 - Route queries targeting a specific table to a server
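[NamedServerFilter]
type=filter
module=namedserverfilter
match02= *from *users
target02=server2

[MyService]
type=service
router=readwritesplit
servers=server1,server2
user=myuser
password=mypasswd
filters=NamedServerFilter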

    This will route all queries matching the regular expression *from *users to the server named server2. The filter will ignore character case in queries.

    A query like SELECT * FROM users would be routed to server2 where as a query like SELECT * FROM accounts would be routed according to the normal rules of the router.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Tee Filter

    Tee Filter


Overview

    The tee filter is a "plumbing" fitting in the MariaDB MaxScale filter toolkit. It can be used in a filter pipeline of a service to make copies of requests from the client and send the copies to another service within MariaDB MaxScale.

    Please Note: Starting with MaxScale 2.2.0, any client that connects to a service which uses a tee filter will require a grant for the loopback address, i.e. 127.0.0.1.

    Configuration

The configuration block for the tee filter requires the minimal filter parameters in its section within the MaxScale configuration file, and the target to send the duplicates to must be defined. For example:
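[DataMartFilter]
type=filter
module=tee
target=DataMart

[Data-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=DataMartFilter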

    Settings

    The tee filter requires a mandatory parameter to define the service to replicate statements to and accepts a number of optional parameters.

    target

    • Type: target

    • Mandatory: No

    • Dynamic: Yes

    • Default: none

    The target where the filter will duplicate all queries. The target can be either a service or a server. The duplicate connection that is created to this target will be referred to as the "branch target" in this document.

    service

    • Type: service

    • Mandatory: No

    • Dynamic: Yes

    • Default: none

The service where the filter will duplicate all queries. This parameter is deprecated in favor of the target parameter and will be removed in a future release. Only one of target and service may be defined.

    match

    • Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

Include queries that match the regex.
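For example:

match=/insert.*into.*order*/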

    exclude

    • Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

Exclude queries that match the regex.
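For example:

exclude=/select.*from.*t1/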

    options

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: ignorecase, case, extended

    • Default: ignorecase

    How regular expressions should be interpreted.
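For example:

options=case,extended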

    source

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The optional source parameter defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will be replicated.
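For example:

source=127.0.0.1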

    user

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

The optional user parameter defines a username that is matched against the username of the client connection to MariaDB MaxScale. Only sessions that are connected using this username are replicated.
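For example:

user=john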

    sync

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    Enable synchronous routing mode. When configured with sync=true, the filter will queue new queries until the response from both the main and the branch target has been received. This means that for n executed queries, n - 1 queries are guaranteed to be synchronized. Adding one extra statement (e.g. SELECT 1) to a batch of statements guarantees that all previous SQL statements have been successfully executed on both targets.

    In the synchronous routing mode, a failure of the branch target will cause the client session to be closed.
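For example, a minimal sketch of a tee filter with synchronous mode enabled (the filter name is a placeholder; the DataMart target is reused from the example below):

[ReplicateOrdersSync]
type=filter
module=tee
target=DataMart
sync=true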

    Limitations

    • All statements that are executed on the branch target are done in an asynchronous manner. This means that when the client receives the response there is no guarantee that the statement has completed on the branch target. The sync feature provides some synchronization guarantees that can be used to verify successful execution on both targets.

    • Any errors on the branch target will cause the connection to it to be closed. If target is a service, it is up to the router to decide whether the connection is closed. For direct connections to servers, any network errors cause the connection to be closed. When the connection is closed, no new queries will be routed to the branch target.

    With sync=true, a failure of the branch target will cause the whole session to be closed.

    Module commands

    Read Module Commands documentation for details about module commands.

    The tee filter supports the following module commands.

    tee disable [FILTER]

    This command disables a tee filter instance. A disabled tee filter will not send any queries to the target service.

    tee enable [FILTER]

    Enable a disabled tee filter. This resumes the sending of queries to the target service.

    Examples

    Example 1 - Replicate all inserts into the orders table

    Assume an order processing system that has a table called orders. You also have another database server, the datamart server, that requires all inserts into orders to be replicated to it. Deletes and updates are not, however, required.

Set up a service in MariaDB MaxScale, called Orders, to communicate with the order processing system, with the tee filter applied to it. Also set up a DataMart service to talk to the datamart server. The tee filter would have the DataMart service as its target; adding a match parameter of "insert into orders" then results in all requests being sent to the order processing system, while insert statements on the orders table are additionally sent to the datamart server:
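[Orders]
type=service
router=readconnroute
servers=server1, server2, server3, server4
user=massi
password=6628C50E07CCE1F0392EDEEB9D1203F3
filters=ReplicateOrders

[ReplicateOrders]
type=filter
module=tee
target=DataMart
match=insert[   ]*into[     ]*orders

[DataMart]
type=service
router=readconnroute
servers=datamartserver
user=massi
password=6628C50E07CCE1F0392EDEEB9D1203F3
filters=QLA-DataMart

[QLA-DataMart]
type=filter
module=qlafilter
filebase=/var/log/DataMart/InsertsLog

[Orders-Listener]
type=listener
target=Orders
port=4011

[DataMart-Listener]
type=listener
target=DataMart
port=4012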

    This page is licensed: CC BY-SA / Gnu FDL


    MaxScale 25.01 Query Log All Filter

    Query Log All Filter

    • Query Log All Filter

      • Overview

    Overview

    The Query Log All (QLA) filter logs query content. Logs are written to a file in CSV format. Log elements are configurable and include the time submitted and the SQL statement text, among others.

    Configuration

    A minimal configuration is below.
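[MyLogFilter]
type=filter
module=qlafilter
filebase=/tmp/SqlQueryLog

[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter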

    Log Rotation

    The qlafilter logs can be rotated by executing the maxctrl rotate logs command. This will cause the log files to be reopened when the next message is written to the file. This applies to both unified and session type logging.
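For example:

maxctrl rotate logs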

    Settings

    The QLA filter has one mandatory parameter, filebase, and a number of optional parameters. These were introduced in the 1.0 release of MariaDB MaxScale.

    filebase

    • Type: string

    • Mandatory: Yes

    • Dynamic: No

The basename of the output file created for each session. A session index is added to the filename for each written session file. For unified log files, .unified is appended.
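For example:

filebase=/tmp/SqlQueryLog

With this value, a session log file might be named /tmp/SqlQueryLog.1 and the unified log file /tmp/SqlQueryLog.unified (the session filename is illustrative).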

    match

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Include queries that match the regex.

    exclude

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    Exclude queries that match the regex.

    options

• Type: enum_mask

• Mandatory: No

• Dynamic: Yes

• Values: case, ignorecase, extended

• Default: case

    The extended option enables PCRE2 extended regular expressions.

    user

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: ""

    Limit logging to sessions with this user.

    source

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: ""

    Limit logging to sessions with this client source address.

    user_match

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    Only log queries from users that match this pattern. If the user parameter is used, the value of user_match is ignored.

    Here is an example pattern that matches the users alice and bob:
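user_match=/(^alice$)|(^bob$)/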

    user_exclude

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    Exclude all queries from users that match this pattern. If the user parameter is used, the value of user_exclude is ignored.

    Here is an example pattern that excludes the users alice and bob:
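user_exclude=/(^alice$)|(^bob$)/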

    source_match

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    Only log queries from hosts that match this pattern. If the source parameter is used, the value of source_match is ignored.

    Here is an example pattern that matches the loopback interface as well as the address 192.168.0.109:
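source_match=/(^127[.]0[.]0[.]1)|(^192[.]168[.]0[.]109)/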

    source_exclude

• Type: regex

    • Mandatory: No

    • Dynamic: Yes

    Exclude all queries from hosts that match this pattern. If the source parameter is used, the value of source_exclude is ignored.

    Here is an example pattern that excludes the loopback interface as well as the address 192.168.0.109:
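source_exclude=/(^127[.]0[.]0[.]1)|(^192[.]168[.]0[.]109)/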

    log_type

• Type: enum_mask

• Mandatory: No

• Dynamic: Yes

• Values: session, unified, stdout

• Default: session

    The type of log file to use.

• session: Write to session-specific files

• unified: Use one file for all sessions

• stdout: Same as unified, but to stdout

    log_data

• Type: enum_mask

• Mandatory: No

• Dynamic: Yes

• Values: service, session, date, user, reply_time, total_reply_time, query, default_db, num_rows, reply_size, transaction, transaction_time, num_warnings, error_msg

• Default: date, user, query

    Type of data to log in the log files.

• service: Service name

• session: Unique session id (ignored for session files)

• date: Timestamp

• user: User and hostname of client

• reply_time: Duration from client query to first server reply

• total_reply_time: Duration from client query to last server reply (v6.2)

• query: The SQL of the query if it contains it

• default_db: The default (current) database

• num_rows: Number of rows in the result set (v6.2)

• reply_size: Number of bytes received from the server (v6.2)

• transaction: BEGIN, COMMIT and ROLLBACK (v6.2)

• transaction_time: The duration of a transaction (v6.2)

• num_warnings: Number of warnings in the server reply (v6.2)

• error_msg: Error message from the server (if any) (v6.2)

• server: The server where the query was routed (if any) (v22.08)

• command: The protocol command that was executed (v24.02)

    The durations reply_time and total_reply_time are by default in milliseconds, but can be specified to another unit using duration_unit.

    The log entry is written when the last reply from the server is received. Prior to version 6.2 the entry was written when the query was received from the client, or if reply_time was specified, on first reply from the server.

    NOTE The error_msg is the raw message from the server. Even if use_canonical_form is set the error message may contain user defined constants. For example:
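MariaDB [test]> select secret from T where x password="clear text pwd";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MariaDB server version for the right syntax to
use near 'password="clear text pwd"' at line 1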

Starting with MaxScale 24.02, the query parameter now correctly logs the execution of binary protocol commands as SQL (MXS-4959). The execution of batched statements (COM_STMT_BULK_EXECUTE) used by some connectors is not logged.

    duration_unit

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: milliseconds

    The unit for logging a duration. The unit can be milliseconds or microseconds. The abbreviations ms for milliseconds and us for microseconds are also valid. This option is available as of MaxScale version 6.2.

    use_canonical_form

• Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    When this option is true the canonical form of the query is logged. In the canonical form all user defined constants are replaced with question marks. This option is available as of MaxScale version 6.2.
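For example (an illustrative query, assuming the constants 123 and 'bob'):

SELECT * FROM t1 WHERE id = 123 AND name = 'bob'

becomes

SELECT * FROM t1 WHERE id = ? AND name = ?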

    flush

• Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    Flush log files after every write.

    append

• Type: bool

• Mandatory: No

• Dynamic: Yes

• Default: true

Append new entries to the end of an existing log file instead of overwriting it.

    separator

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: ","

    Defines the separator string between elements of log entries. The value should be enclosed in quotes.

    newline_replacement

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: " "

SQL queries may include line breaks which, if printed directly to the log, may break automatic parsing. This parameter defines what is written in place of a newline sequence (\r, \n or \r\n). If set to the empty string, newlines are not replaced and are printed as-is to the output. The value should be enclosed in quotes. For example:
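newline_replacement=" NL "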

    Limitations

    • Trailing parts of SQL queries that are larger than 16MiB are not logged. This means that the log output might contain truncated SQL.

• Batched execution using COM_STMT_BULK_EXECUTE is not converted into its textual form. This is done due to the large volumes of data that are usually involved with batched execution.

    Examples

    Example 1 - Query without primary key

    Imagine you have observed an issue with a particular table and you want to determine if there are queries that are accessing that table but not using the primary key of the table. Let's assume the table name is PRODUCTS and the primary key is called PRODUCT_ID. Add a filter with the following definition:
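[ProductsSelectLogger]
type=filter
module=qlafilter
match=SELECT.*from.*PRODUCTS .*
exclude=WHERE.*PRODUCT_ID.*
filebase=/var/logs/qla/SelectProducts

[Product-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=ProductsSelectLogger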

    The result of using this filter with the service used by the application would be a log file of all select queries querying PRODUCTS without using the PRODUCT_ID primary key in the predicates of the query. Executing SELECT * FROM PRODUCTS would log the following into /var/logs/qla/SelectProducts:
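07:12:56.324 7/01/2016, SELECT * FROM PRODUCTS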

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Rewrite Filter

    Rewrite Filter

    Overview

    The rewrite filter allows modification of sql queries on the fly. Reasons for modifying queries can be to rewrite a query for performance, or to change a specific query when the client query is incorrect and cannot be changed in a timely manner.

The examples use the rewrite filter's rf file format, described in Rewrite file format below.

    Syntax

    Native syntax

    Rewriter native syntax uses placeholders to grab and replace parts of text.

    Placeholders

    The syntax for a plain placeholder is @{N} where N is a positive integer.

    The syntax for a placeholder regex is @{N:regex}. It allows more control when needed.

    The below is a valid entry in rf format. For demonstration, all options are set. This entry is a do-nothing entry, but illustrates placeholders.
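%%
# options
regex_grammar: Native
case_sensitive: true
what_if: false
continue_if_matched: false
ignore_whitespace: true
%
# match template
@{1:^}select @{2} from my_table where id = @{3}
%
# replace template
select @{2} from my_table where id = @{3}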

    If the input sql is select id, name from my_table where id = 42 then @{2} = "id, name" and @{3} = "42". Since the replace template is identical to the match template the end result is that the output sql will be the same as the input sql.

Placeholders can be used as forward references: @{1:^}select @{2}, count(*) from @{3} group by @{2}. For a match, the two @{2} text grabs must be equal.
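For example:

%%
# use default options by leaving this blank
%
@{1:^}select count(distinct @{2}) from @{3}
%
select count(*) from (select distinct @{2} from @{3}) as t123

Input: select count(distinct author) from books where entity != "AI"

Rewritten: select count(*) from (select distinct author from books where entity != "AI") as t123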

    Match template

    The match template is used to match against the sql to be rewritten.

The match template can be partial, e.g. from mytable. But the actual underlying regex match is always for the whole sql. If the match template does not start or end with a placeholder, placeholders are automatically added so that the above becomes @{1}from mytable@{2}. The automatically added placeholders cannot be used in the replace template.

Matching the whole input also means that Native syntax does not support (and is not intended to support) scan and replace. Only the first occurrence of the above from mytable can be modified in the replace template. However, one can selectively choose to modify e.g. the first through third occurrence of from mytable by writing from mytable @{1} from mytable @{2} from mytable @{3}.

    For scan and replace use a different regex_grammar (see below).

    Replace template

    The replace template uses the placeholders from the match template to rewrite sql.

An important option for smooth matching is ignore_whitespace, which is on (true) by default. It creates the match regex in such a way that the amount and kind of whitespace does not affect matching. However, to make ignore_whitespace always work, it is important to add whitespace where allowed. If "id=42" is in the match template then only the exact "id=42" can match. But if "id = 42" is used, and ignore_whitespace is on, both "id=42" and "id = 42" will match.

    Another example, and what not to do:
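%%
%
from mytable
%
from mytable force index (myindex)

Input: select name from mytable where id=42

Rewritten: select name from mytable force index (myindex) where id=42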

That works, but because the match lacks specific detail about the expected sql, things are likely to break. In this case show indexes from my_table would no longer work.

    The minimum detail in this case could be:
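%%
%
@{1:^}select @{2} from mytable
%
select @{2} from mytable force index (myindex)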

    but if more detail is known, like something specific in the where clause, that too should be added.

    Placeholder Regex

    Syntax: @{N:regex}

In a placeholder regex the character } must be escaped to \} (for literal matching). Plain parenthesis "()" indicate capturing groups, which are internally used by the Native grammar. Thus plain parentheses in a placeholder regex will break matching. However, non-capturing groups can be used: e.g. @{1:(?:Jane|Joe)}. To match a literal parenthesis use an escape, e.g. \(.

    Suppose an application is misbehaving after an upgrade and a quick fix is needed. This query select zip from address_book where str_id = "AZ-124" is correct, but if the id is an integer the where clause should be id = 1234.
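For example:

%%
%
@{1:^}select zip_code from address_book where str_id = @{2:["]}@{3:[[:digit:]]+}@{4:["]}
%
select zip_code from address_book where id = @{3}

Input: select zip_code from address_book where str_id = "1234"

Rewritten: select zip_code from address_book where id = 1234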

    Using plain regular expressions

For scan and replace the regex_grammar must be set to something other than Native. An example will illustrate the usage.

    Replace all occurrences of "wrong_table_name" with "correct_table_name". Further, if the replacement was made then replace all occurrences of wrong_column_name with correct_column_name.
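%%
regex_grammar: EPosix
continue_if_matched: true
%
wrong_table_name
%
correct_table_name

%%
regex_grammar: EPosix
%
wrong_column_name
%
correct_column_name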

    Configuration

    Adding a rewrite filter.
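[Rewrite]
type = filter
module = rewritefilter
template_file = /path/to/template_file.rf
...

[Router]
type=service
...
filters=Rewrite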

    Settings

    template_file

    • Type: string

    • Mandatory: Yes

    • Dynamic: Yes

    • Default: No default value

    Path to the template file.

    regex_grammar

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: Native

    Default regex_grammar for templates

    case_sensitive

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    Default case sensitivity for templates

    log_replacement

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: false

    Log replacements at NOTICE level.

    Settings per template in the template file

    regex_grammar

    • Type: string

    • Values: Native, ECMAScript, Posix, EPosix, Awk, Grep, EGrep

    Overrides the global regex_grammar of a template.

    case_sensitive

    • Type: boolean

    • Default: From maxscale.cnf

    Overrides the global case sensitivity of a template.

    ignore_whitespace

    • Type: boolean

    • Default: true

    Ignore whitespace differences in the match template and input sql.

    continue_if_matched

    • Type: boolean

    • Default: false

    If a template matches and the replacement is done, continue to the next template and apply it to the result of the previous rewrite.

    what_if

    • Type: boolean

    • Default: false

    Do not make the replacement, only log what would have been replaced (NOTICE level).

    Rewrite file format

    The rf format for an entry is:
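%%
options
%
match template
%
replace template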

    The character # starts a single line comment when it is the first character on a line.

    Empty lines are ignored.

    The rf format does not need any additional escaping to what the basic format requires (see Placeholder Regex).

    Options are specified as follows:
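case_sensitive: true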

The colon must immediately follow the option name.

    The separators % and %% must be the exact content of their respective separator lines.

The templates can span multiple lines. Whitespace does not matter as long as ignore_whitespace = true. Always use space where space is allowed to maximize the utility of ignore_whitespace.

    Example
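%%
case_sensitive: false
%
@{1:^}select @{2}
from mytable
where user = @{3}
%
select @{2} from mytable where user = @{3}
and @{3} in (select user from approved_users)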

    Json file format

    The json file format is harder to read and edit manually. It will be needed if support for editing of rewrite templates is added to the GUI.

All double quotes and escape characters have to be escaped in json, i.e. '"' becomes '\"' and '\' becomes '\\'.

    The same example as above is:
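{ "templates" :
    [
        {
            "case_sensitive" : false,
            "match_template" : "@{1:^}select @{2} from mytable where user = @{3}",
            "replace_template" : "select @{2} from mytable where user = @{3} and @{3} in (select user from approved_users)"
        }
    ]
}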

    Reload template file

The template file is re-read whenever a dynamic parameter of the filter is updated, even if the value does not change. For example, the following triggers a reload:
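maxctrl alter filter Rewrite log_replacement=false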

    Reference

    • ECMAScript

    • Posix

    • EPosix

• Awk

• Grep: Same as Posix, with the addition of newline '\n' as an alternation separator.

• EGrep: Same as EPosix, with the addition of newline '\n' as an alternation separator in addition to '|'.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Masking

    Masking

    This filter was introduced in MariaDB MaxScale 2.1.

    Overview
  • Security

  • Limitations

  • Configuration

  • Settings

    • rules

    • warn_type_mismatch

    • large_payload

  • Rules

    • replace

    • obfuscate

    • with

  • Module commands

    • reload

  • Example

    • Configuration

    • masking_rules.json

  • Overview

    With the masking filter it is possible to obfuscate the returned value of a particular column.

    For instance, suppose there is a table person that, among other columns, contains the column ssn where the social security number of a person is stored.

    With the masking filter it is possible to specify that when the ssn field is queried, a masked value is returned unless the user making the query is a specific one. That is, when making the query
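SELECT name, ssn FROM person;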

    instead of getting the real result, as in
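+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Alice | 721-07-4426 |
| Bob   | 435-22-3267 |
...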

    the ssn would be masked, as in
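+-------+-------------+
| name  | ssn         |
+-------+-------------+
| Alice | XXX-XX-XXXX |
| Bob   | XXX-XX-XXXX |
...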

    Note that the masking filter should be viewed as a best-effort solution intended for protecting against accidental misuse rather than malicious attacks.

    Security

    From MaxScale 2.3 onwards, the masking filter will reject statements that use functions in conjunction with columns that should be masked. Allowing function usage provides a way for circumventing the masking, unless a firewall filter is separately configured and installed.

    Please see the configuration parameter prevent_function_usage for how to change the default behaviour.

    From MaxScale 2.3.5 onwards, the masking filter will check the definition of user variables and reject statements that define a user variable using a statement that refers to columns that should be masked.

    Please see the configuration parameter check_user_variables for how to change the default behaviour.

    From MaxScale 2.3.5 onwards, the masking filter will examine unions and if the second or subsequent SELECT refer to columns that should be masked, the statement will be rejected.

    Please see the configuration parameter check_unions for how to change the default behaviour.

    From MaxScale 2.3.5 onwards, the masking filter will examine subqueries and if a subquery refers to columns that should be masked, the statement will be rejected.

    Please see the configuration parameter check_subqueries for how to change the default behaviour.

    Note that in order to ensure that it is not possible to get access to masked data, the privileges of the users should be minimized. For instance, if a user can create tables and perform inserts, he or she can execute something like
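CREATE TABLE cheat (revealed_ssn TEXT);
INSERT INTO cheat SELECT ssn FROM users;
SELECT revealed_ssn FROM cheat;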

    to get access to the cleartext version of a masked field ssn.

From MaxScale 2.3.5 onwards, the masking filter will, if any of the prevent_function_usage, check_user_variables, check_unions or check_subqueries parameters is set to true, block statements that cannot be fully parsed.

    Please see the configuration parameter require_fully_parsed for how to change the default behaviour.

    From MaxScale 2.3.7 onwards, the masking filter will treat any strings passed to functions as if they were fields. The reason is that as the MaxScale query classifier is not aware of whether ANSI_QUOTES is enabled or not, it is possible to bypass the masking by turning that option on.
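mysql> set @@sql_mode = 'ANSI_QUOTES';
mysql> select concat("ssn") from managers;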

    Before this change, the content of the field ssn would have been returned in clear text even if the column should have been masked.

    Note that this change will mean that there may be false positives if ANSI_QUOTES is not enabled and a string argument happens to be the same as the name of a field to be masked.

Please see the configuration parameter treat_string_arg_as_field for how to change the default behaviour.

    Limitations

The masking filter can only be used for masking columns of the following types: BINARY, VARBINARY, CHAR, VARCHAR, BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT, ENUM and SET. If the type of the column is something else, then no masking will be performed.

Currently, the masking filter can only work on packets whose payload is less than 16MB. If the masking filter encounters a packet whose payload is exactly that size, thus indicating a situation where the payload is delivered in multiple packets, the value of the parameter large_payload specifies how the masking filter should handle the situation.

    Configuration

    The masking filter is taken into use with the following kind of configuration setup.
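[Mask-SSN]
type=filter
module=masking
rules=...

[SomeService]
type=service
...
filters=Mask-SSN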

    Settings

    The masking filter has one mandatory parameter - rules.

    rules

    • Type: path

    • Mandatory: Yes

    • Dynamic: Yes

Specifies the path of the file where the masking rules are stored. A relative path is interpreted relative to the module configuration directory of MariaDB MaxScale. The default module configuration directory is /etc/maxscale.modules.d.
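For example:

rules=/path/to/rules-file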

    warn_type_mismatch

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: never, always

    • Default: never

    With this optional parameter the masking filter can be instructed to log a warning if a masking rule matches a column that is not of one of the allowed types.
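For example:

warn_type_mismatch=always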

    large_payload

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: ignore, abort

    • Default: abort

    This optional parameter specifies how the masking filter should treat payloads larger than 16MB, that is, payloads that are delivered in multiple MySQL protocol packets.

    The values that can be used are ignore, which means that columns in such payloads are not masked, and abort, which means that if such payloads are encountered, the client connection is closed. The default is abort.

    Note that the aborting behaviour is applied only to resultsets that contain columns that should be masked. There are no limitations on resultsets that do not contain such columns.
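For example:

large_payload=ignore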

    prevent_function_usage

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    This optional parameter specifies how the masking filter should behave if a column that should be masked, is used in conjunction with some function. As the masking filter works only on the basis of the information in the returned result-set, if the name of a column is not present in the result-set, then the masking filter cannot mask a value. This means that the masking filter basically can be bypassed with a query like:
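SELECT CONCAT(masked_column) FROM tbl;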

If the value of prevent_function_usage is true, then all statements that contain functions referring to masked columns will be rejected. As that means that also queries using potentially harmless functions, such as LENGTH(masked_column), are rejected as well, this feature can be turned off. In that case, the firewall filter should be set up to allow or reject the use of certain functions.
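The feature is turned off with:

prevent_function_usage=false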

    require_fully_parsed

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

This optional parameter specifies how the masking filter should behave in case any of prevent_function_usage, check_user_variables, check_unions or check_subqueries is true and it encounters a statement that cannot be fully parsed.

    If true, then statements that cannot be fully parsed (due to a parser limitation) will be blocked.

Note that if this parameter is set to false, then prevent_function_usage, check_user_variables, check_unions and check_subqueries are rendered less effective, as with a statement that cannot be fully parsed it may be possible to bypass the protection that they are intended to provide.
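The requirement is disabled with:

require_fully_parsed=false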

    treat_string_arg_as_field

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    This optional parameter specifies how the masking filter should treat strings used as arguments to functions. If true, they will be handled as fields, which will cause fields to be masked even if ANSI_QUOTES has been enabled and " is used instead of backtick.
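The behaviour is disabled with:

treat_string_arg_as_field=false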

    check_user_variables

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    This optional parameter specifies how the masking filter should behave with respect to user variables. If true, then a statement like
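set @a = (select ssn from customer where id = 1);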

    will be rejected if ssn is a column that should be masked.
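The check is disabled with:

check_user_variables=false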

    check_unions

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    This optional parameter specifies how the masking filter should behave with respect to UNIONs. If true, then a statement like
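SELECT a FROM t1 UNION SELECT b FROM t2;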

    will be rejected if b is a column that should be masked.
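The check is disabled with:

check_unions=false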

    check_subqueries

    • Type: bool

    • Mandatory: No

    • Dynamic: Yes

    • Default: true

    This optional parameter specifies how the masking filter should behave with respect to subqueries. If true, then a statement like
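SELECT * FROM (SELECT a AS b FROM t1) AS t2;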

    will be rejected if a is a column that should be masked.
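The check is disabled with:

check_subqueries=false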

    Rules

    The masking rules are expressed as a JSON object.

    The top-level object is expected to contain a key rules whose value is an array of rule objects.
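{
    "rules": [ ... ]
}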

Each rule in the rules array is a JSON object, expected to contain the keys replace, with, applies_to and exempted. The first two are obligatory, the latter two optional.
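{
    "rules": [
        {
            "replace": { ... },
            "with": { ... },
            "applies_to": [ ... ],
            "exempted": [ ... ]
        }
    ]
}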

    replace

The value of this key is an object that specifies the column whose values should be masked. The object must contain the key column and may contain the keys table and database. The value of these keys must be a string.

If only column is specified, then a column with that name matches irrespective of the table and database. If table is specified, then the column matches only if it is in a table with the specified name, and if database is specified, then the column matches only if it is in a database with the specified name. For example:
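{
    "rules": [
        {
            "replace": {
                "database": "db1",
                "table": "person",
                "column": "ssn"
            },
            "with": { ... },
            "applies_to": [ ... ],
            "exempted": [ ... ]
        }
    ]
}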

NOTE If a rule specifies a table/database but the resultset does not contain table/database information, the rule will always be considered a match if the column matches. For instance, given the rule above, if there is a table person2, also containing an ssn field, then a query like
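SELECT ssn FROM person2;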

    will not return masked values, but a query like
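SELECT ssn FROM person UNION SELECT ssn FROM person2;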

will only return masked values, even if the ssn values from person2 in principle should not be masked. The same effect is observed even with a nonsensical query like
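SELECT ssn FROM person2 UNION SELECT ssn FROM person2;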

    even if nothing from person2 should be masked. The reason is that as the resultset contains no table information, the values must be masked if the column name matches, as otherwise the masking could easily be circumvented with a query like
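SELECT ssn FROM person UNION SELECT ssn FROM person;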

    The optional key match makes partial replacement of the original value possible: only the matched part would be replaced with the fill character. The match value must be a valid pcre2 regular expression.
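For example:

"replace": {
    "column": "ssn",
    "match": "(123)"
},
"with": {
    "fill": "X#"
}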

    obfuscate

The obfuscate rule allows the obfuscation of the value by passing it through an obfuscation algorithm. The current implementation uses a non-reversible obfuscation approach.

    However, note that although it is in principle impossible to obtain the original value from the obfuscated one, if the range of possible original values is limited, it is straightforward to figure out the possible original values by running all possible values through the obfuscation algorithm and then comparing the results.

    The minimal configuration is:
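"obfuscate": {
    "column": "name"
}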

Example output for a name column whose value is 'remo':
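SELECT name from db1.tbl1;

+------+
| name |
+------+
| $-~) |
+------+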

    with

    The value of this key is an object that specifies what the value of the matched column should be replaced with for the replace rule. Currently, the object is expected to contain either the key value or the key fill. The value of both must be a string with length greater than zero. If both keys are specified, value takes precedence. If fill is not specified, the default X is used as its value.

If value is specified, then its value is used to replace the actual value verbatim and the length of the specified value must match the actual returned value (from the server) exactly. If the lengths do not match, the value of fill is used to mask the actual value.

When the value of fill (fill-value) is used for masking the returned value, the fill-value is used as many times as necessary to match the length of the return value. If required, only a part of the fill-value may be used at the end of the mask value to get the lengths to match.
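For example:

{
    "rules": [
        {
            "replace": {
                "column": "ssn"
            },
            "with": {
                "value": "XXX-XX-XXXX"
            },
            "applies_to": [ ... ],
            "exempted": [ ... ]
        },
        {
            "replace": {
                "column": "age"
            },
            "with": {
                "fill": "*"
            },
            "applies_to": [ ... ],
            "exempted": [ ... ]
        },
        {
            "replace": {
                "column": "creditcard"
            },
            "with": {
                "value": "1234123412341234",
                "fill": "0"
            },
            "applies_to": [ ... ],
            "exempted": [ ... ]
        }
    ]
}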

    applies_to

    With this optional key, whose value must be an array of strings, it can be specified what users the rule is applied to. Each string should be a MariaDB account string, that is, % is a wildcard.

    If this key is not specified, then the masking is performed for all users, except the ones exempted using the key exempted.
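For example:

{
    "rules": [
        {
            "replace": { ... },
            "with": { ... },
            "applies_to": [ "'alice'@'host'", "'bob'@'%'" ],
            "exempted": [ ... ]
        }
    ]
}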

    exempted

    With this optional key, whose value must be an array of strings, it can be specified what users the rule is not applied to. Each string should be a MariaDB account string, that is, % is a wildcard.
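For example:

{
    "rules": [
        {
            "replace": { ... },
            "with": { ... },
            "applies_to": [ ... ],
            "exempted": [ "'admin'" ]
        }
    ]
}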

    Module commands

    Read Module Commands documentation for details about module commands.

    The masking filter supports the following module commands.

    reload

    Reload the rules from the rules file. The new rules are taken into use only if the loading succeeds without any errors.
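For example:

MaxScale> call command masking reload MyMaskingFilter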

    MyMaskingFilter refers to a particular filter section in the MariaDB MaxScale configuration file.

    Example

    In the following we configure a masking filter MyMasking that should always log a warning if a masking rule matches a column that is of a type that cannot be masked, and that should abort the client connection if a resultset package is larger than 16MB. The rules for the masking filter are in the file masking_rules.json.

    Configuration
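[MyMasking]
type=filter
module=masking
warn_type_mismatch=always
large_payload=abort
rules=masking_rules.json

[MyService]
type=service
...
filters=MyMasking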

    masking_rules.json
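{
    "rules": [
        {
            "replace": {
                "column": "ssn"
            },
            "with": {
                "value": "012345-ABCD",
                "fill": "X"
            }
        }
    ]
}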

    The rules specify that the data of a column whose name is ssn, should be replaced with the string 012345-ABCD. If the length of the data is not exactly the same as the length of the replacement value, then the data should be replaced with as many X characters as needed.

    This page is licensed: CC BY-SA / Gnu FDL

    Masking

    MaxScale 25.01 Transaction Performance Monitoring Filter

    Transaction Performance Monitoring Filter

    Note: This module is experimental and must be built from source. The module is deprecated in MaxScale 23.08 and might be removed in a future release.

    • Transaction Performance Monitoring Filter

    Overview

    The Transaction Performance Monitoring (TPM) filter is a filter module for MaxScale that monitors every SQL statement that passes through the filter. The filter groups a series of SQL statements into a transaction by detecting 'commit' or 'rollback' statements. It logs all committed transactions with necessary information, such as timestamp, client, SQL statements, latency, etc., which can be used later for transaction performance analysis.

    Configuration

The configuration block for the TPM filter requires the minimal filter options in its section within the maxscale.cnf file, stored in /etc/maxscale.cnf. For example:
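[MyLogFilter]
type=filter
module=tpmfilter

[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter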

    Filter Options

    The TPM filter does not support any filter options currently.

    Settings

    The TPM filter accepts a number of optional parameters.

    Filename

    The name of the output file created for performance logging. The default filename is tpm.log.
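For example:

filename=/tmp/SqlQueryLog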

    Source

    The optional source parameter defines an address that is used to match against the address from which the client connection to MaxScale originates. Only sessions that originate from this address will be logged.
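For example:

source=127.0.0.1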

    User

    The optional user parameter defines a user name that is used to match against the user from which the client connection to MaxScale originates. Only sessions that are connected using this username are logged.
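For example:

user=john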

    Delimiter

    The optional delimiter parameter defines a delimiter that is used to distinguish columns in the log. The default delimiter is :::.
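For example:

delimiter=:::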

    Query_delimiter

    The optional query_delimiter defines a delimiter that is used to distinguish different SQL statements in a transaction. The default query delimiter is @@@.
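For example:

query_delimiter=@@@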

    Named_pipe

named_pipe is the path to a named pipe, which the TPM filter uses to communicate with 3rd-party applications (e.g., DBSeer). Logging is enabled when the filter receives the character '1' and logging is disabled when the filter receives the character '0' from this named pipe. The default named pipe is /tmp/tpmfilter and logging is disabled by default.
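For example:

named_pipe=/tmp/tpmfilter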

    For example, the following command enables the logging:
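$ echo '1' > /tmp/tpmfilter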

    Similarly, the following command disables the logging:
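$ echo '0' > /tmp/tpmfilter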

    Log Output Format

    For each transaction, the TPM filter prints its log in the following format:

    <timestamp> | <server_name> | <user_name> | <latency of the transaction> | <latencies of individual statements in the transaction> (delimited by 'query_delimiter') | <actual SQL statements>

    Examples

    Example 1 - Log Transactions for Performance Analysis

    You want to log every transaction with its SQL statements and latency for future transaction performance analysis.

    Add a filter with the following definition:
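[PerformanceLogger]
type=filter
module=tpmfilter
delimiter=:::
query_delimiter=@@@
filename=/var/logs/tpm/perf.log
named_pipe=/tmp/tpmfilter

[Product-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=PerformanceLogger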

    After the filter reads the character '1' from its named pipe, the following is an example log that is generated from the above TPM filter with the above configuration:



    1484086477::::server1::::root::::3::::0.165@@@@0.108@@@@0.102@@@@0.092@@@@0.121@@@@0.122@@@@0.110@@@@2.081::::UPDATE WAREHOUSE SET W_YTD = W_YTD + 3630.48  WHERE W_ID = 2 @@@@SELECT W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP, W_NAME FROM WAREHOUSE WHERE W_ID = 2@@@@UPDATE DISTRICT SET D_YTD = D_YTD + 3630.48 WHERE D_W_ID = 2 AND D_ID = 9@@@@SELECT D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP, D_NAME FROM DISTRICT WHERE D_W_ID = 2 AND D_ID = 9@@@@SELECT C_FIRST, C_MIDDLE, C_LAST, C_STREET_1, C_STREET_2, C_CITY, C_STATE, C_ZIP, C_PHONE, C_CREDIT, C_CREDIT_LIM, C_DISCOUNT, C_BALANCE, C_YTD_PAYMENT, C_PAYMENT_CNT, C_SINCE FROM CUSTOMER WHERE C_W_ID = 2 AND C_D_ID = 9 AND C_ID = 1025@@@@UPDATE CUSTOMER SET C_BALANCE = 1007749.25, C_YTD_PAYMENT = 465215.47, C_PAYMENT_CNT = 203 WHERE C_W_ID = 2 AND C_D_ID = 9 AND C_ID = 1025@@@@INSERT INTO HISTORY (H_C_D_ID, H_C_W_ID, H_C_ID, H_D_ID, H_W_ID, H_DATE, H_AMOUNT, H_DATA)  VALUES (9,2,1025,9,2,'2017-01-10 17:14:37',3630.48,'locfljbe    xtnfqn')
    1484086477::::server1::::root::::6::::0.123@@@@0.087@@@@0.091@@@@0.098@@@@0.078@@@@0.106@@@@0.094@@@@0.074@@@@0.089@@@@0.073@@@@0.098@@@@0.073@@@@0.088@@@@0.072@@@@0.087@@@@0.071@@@@0.085@@@@0.078@@@@0.088@@@@0.098@@@@0.081@@@@0.076@@@@0.082@@@@0.073@@@@0.077@@@@0.070@@@@0.105@@@@0.093@@@@0.088@@@@0.089@@@@0.087@@@@0.087@@@@0.086@@@@1.883::::SELECT C_DISCOUNT, C_LAST, C_CREDIT, W_TAX  FROM CUSTOMER, WAREHOUSE WHERE W_ID = 2 AND C_W_ID = 2 AND C_D_ID = 10 AND C_ID = 1267@@@@SELECT D_NEXT_O_ID, D_TAX FROM DISTRICT WHERE D_W_ID = 2 AND D_ID = 10 FOR UPDATE@@@@UPDATE DISTRICT SET D_NEXT_O_ID = D_NEXT_O_ID + 1 WHERE D_W_ID = 2 AND D_ID = 10@@@@INSERT INTO OORDER (O_ID, O_D_ID, O_W_ID, O_C_ID, O_ENTRY_D, O_OL_CNT, O_ALL_LOCAL) VALUES (286871, 10, 2, 1267, '2017-01-10 17:14:37', 7, 1)@@@@INSERT INTO NEW_ORDER (NO_O_ID, NO_D_ID, NO_W_ID) VALUES ( 286871, 10, 2)@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 24167@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 24167 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 96982@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 96982 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 40679@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 40679 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 31459@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 31459 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 6143@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 6143 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 12001@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 12001 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 40407@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05,        S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 40407 AND S_W_ID = 2 FOR UPDATE@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,1,24167,2,7,348.31998,'btdyjesowlpzjwnmxdcsion')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,2,96982,2,1,4.46,'kudpnktydxbrbxibbsyvdiw')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,3,40679,2,7,528.43,'nhcixumgmosxlwgabvsrcnu')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,4,31459,2,9,341.82,'qbglbdleljyfzdpfbyziiea')@@@@INSERT INTO ORDER_LINE 
(OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,5,6143,2,3,152.67,'tmtnuupaviimdmnvmetmcrc')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,6,12001,2,5,304.3,'ufytqwvkqxtmalhenrssfon')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID,  OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,7,40407,2,1,30.32,'hvclpfnblxchbyluumetcqn')@@@@UPDATE STOCK SET S_QUANTITY = 65 , S_YTD = S_YTD + 7, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 24167 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 97 , S_YTD = S_YTD + 1, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 96982 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 58 , S_YTD = S_YTD + 7, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 40679 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 28 , S_YTD = S_YTD + 9, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 31459 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 86 , S_YTD = S_YTD + 3, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 6143 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 13 , S_YTD = S_YTD + 5, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 12001 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 44 , S_YTD = S_YTD + 1, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0  WHERE S_I_ID = 40407 AND S_W_ID = 2
    ...
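Note that 3 and 6 are latencies of each transaction in milliseconds, while 0.165 and 0.123 are latencies of the first statement of each transaction in milliseconds.

This page is licensed: CC BY-SA / Gnu FDL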

    MaxScale 25.01 Workload Capture and Replay

The WCAR (Workload Capture and Replay) module is a sophisticated feature designed to capture and archive client traffic passing through a MaxScale instance. This allows system administrators and developers to process and store vast volumes of data related to client-server interactions in a reliable manner. By harnessing this captured data, users gain the flexibility to replay and simulate the varied client activity typically seen in a production environment.

    One of the module's primary advantages is that it removes the necessity of creating explicit traffic generators, which can be resource-intensive and complex to maintain. Instead, the WCAR module provides a seamless method for mirroring realistic client interactions and behavior patterns, which can be critical for testing, debugging, and optimizing system performance.

    Additionally, by facilitating traffic replay, the WCAR module aids in identifying potential system vulnerabilities and performance bottlenecks, allowing for preemptive optimization. This proactive approach ensures that systems are well-prepared for live production scenarios, enhancing overall efficiency and uptime.

    In essence, the WCAR module not only preserves detailed and valuable traffic data but also empowers users with the tools to analyze and refine their systems through accurate simulation, paving the way for robust and resilient system architectures.

    Workload Capture and Replay

    Overview

    The WCAR filter (module wcar) captures client traffic and stores it in a replayable format.

    WCAR is designed to capture traffic on a production MaxScale instance. The captured data can then be used as a reproducible way of generating client traffic without having to write application-specific traffic generators.

The captured workloads can be used to:

• verify that upgrades of MariaDB behave as expected

• repeatedly measure the effects of configuration changes, which is useful for database tuning

• investigate why certain scenarios take longer than expected, as a kind of SQL debugging tool

    Prerequisites

    • Both the capture MaxScale and replay MaxScale servers must use the same Linux distribution and CPU architecture. For example, if the capture was taken on an x86_64 RHEL 8 instance, the replay should also happen on an x86_64 RHEL 8 instance. Captured workloads are, however, usually compatible across different Linux distributions that use the same CPU architecture.

    • The capture MariaDB instance must have binlogging enabled (log-bin=1).

    Capture

    Quick start

    Workload capture can be used without definitions in static configuration files and without a MaxScale restart.

    If you have an existing routing service named, e.g., RWS-Router in your configuration, you can attach a capture filter to it dynamically:

    maxctrl create filter CAPTURE_FLTR wcar
    maxctrl link service RWS-Router CAPTURE_FLTR

    You can then start a capture with:

    maxctrl call command wcar start CAPTURE_FLTR <options>

    If limiting options were given the capture will stop automatically when one of the limits is triggered. You can also stop the capture at any time with:

    maxctrl call command wcar stop CAPTURE_FLTR

    See the Replay section below for how the captured files are used.

    When capture is no longer needed you can remove it with:

    maxctrl unlink service RWS-Router CAPTURE_FLTR
    maxctrl destroy filter CAPTURE_FLTR

    File based configuration

    Define a capture filter by adding the following configuration object, and add it to each service whose traffic is to be captured. The traffic from all services that use the filter will be combined, so only use the filter in services that point to the same database cluster.

    [CAPTURE_FLTR]
    type=filter
    module=wcar
    capture_duration=1h # Limit capture duration to one hour
    capture_size=1Gi    # Limit capture size to 1GiB
    start_capture=true  # Start capturing immediately after starting MaxScale

    Example configuration file

    Here is an example configuration for capturing from a single MariaDB server, where capture starts when MaxScale starts and stops when MaxScale is stopped (start_capture=true). MaxScale listens on port 4006 and connects to MariaDB on port 3306.

    [server1]
    type=server
    address=127.0.0.1
    port=3306

    [MariaDB-Monitor]
    type=monitor
    module=mariadbmon
    servers=server1
    user=maxuser
    password=maxpwd

    [CAPTURE_FLTR]
    type=filter
    module=wcar
    capture_duration=1h     # Limit capture duration to one hour
    capture_size=1Gi        # Limit capture size to 1GiB
    start_capture=true      # Start capturing immediately after starting MaxScale

    [RWS-Router]
    type=service
    router=readwritesplit
    cluster=MariaDB-Monitor
    user=maxuser
    password=maxpwd
    filters=CAPTURE_FLTR

    [RWS-Listener]
    type=listener
    service=RWS-Router
    protocol=MariaDBClient
    port=4006

    Capturing Traffic

    This section explains how capture is done with configuration value start_capture=true.

    Two things are needed to replay a workload: the client traffic that's captured by MaxScale and a backup of the database that is used to initialize the replay server. The backup should be taken from the point in time where the capture starts and the simplest way to achieve this is to take a logical backup by doing the following.

    • Stop MaxScale

    • Take a backup of the database with mariadb-dump --all-databases --system=all

    • Start MaxScale

    Once MaxScale has been started, the captured traffic will be written to files in /var/lib/maxscale/wcar/<name> where <name> is the name of the filter (CAPTURE_FLTR in the examples).

    Each capture will generate a number of files named NAME_YYYY-MM-DD_HHMMSS.SUFFIX, where NAME is the capture name (defaults to capture), YYYY-MM-DD is the date, HHMMSS is the time, and SUFFIX is one of .cx, .ex or .tx. For example, a capture started on the 18th of April 2024 at 10:26:11 would generate a file named capture_2024-04-18_102611.cx.

    Stopping the Capture

    To stop the capture, simply stop MaxScale, or issue the command:

    maxctrl call command wcar stop CAPTURE_FLTR

    where "CAPTURE_FLTR" is the name given to the filter as in the example configuration above.

    To disable capturing altogether, remove the capture filter from the configuration and remove it from all services that it was added to. Restart MaxScale.

    If the replay is to take place on another server, the results can be collected easily from /var/lib/maxscale/wcar/ with the following command.

    tar -caf captures.tar.gz -C /var/lib/maxscale wcar

    Once the capture tarball has been generated, copy it to the replay server. You might then want to delete the capture directories under /var/lib/maxscale/wcar/ on the capture server, both to save space and to avoid copying them again later.

    Commands

    Each of the commands can be called with the following syntax.

    maxctrl call command wcar <command> <filter> [options]

    The <filter> is the name of the filter instance. In the example configuration, the value is CAPTURE_FLTR. The [options] is a list of optional arguments that the command might expect.

    start <filter> [options]

    Starts a new capture. Issuing a start command will stop any ongoing capture.

    The start command supports optional key-value pairs. If a value is also defined in the configuration file, the command line option takes priority. The supported keys are:

    • prefix The prefix added to capture files. The default value is capture.

    • duration Limit capture to this duration. See also the configuration file setting capture_duration.

    • size Limit capture to approximately this many bytes in the file system. See also the configuration file setting capture_size.

    The start command options are not persistent; they apply only to the capture started by that command.

    For example, starting a capture with the following command would create a capture file named Scenario1_2024-04-18_102605.cx and limit the file system usage to approximately 10GiB. If capture_duration was defined in the configuration file, it would also be used.

    maxctrl call command wcar start CAPTURE_FLTR prefix=Scenario1 size=10G

    If both duration and size are specified, whichever limit is reached first stops the capture.

    If the same command were run again without size=10G, the capture_size used would be the one defined in the configuration file, or there would be no limit if no such definition exists.

    stop <filter>

    Stops the currently active capture if one is in progress.

    maxctrl call command wcar stop CAPTURE_FLTR

    Replay

    Installation

    Install the required packages on the MaxScale server where the replay is to be done. An additional dependency that must be installed manually is Python, version 3.7 or newer. On most Linux distributions a new enough version is available as the default Python interpreter.

    The replay consists of restoring the database to the point in time where the capture was started. Start by restoring the replay database to this state. Once the database has been restored from the backup, copy the capture files over to the replay MaxScale server.

    Preparing the Replay MariaDB Database

    Full Restore

    Start by restoring the database from the backup to put it at the point in time where the capture was started. The GTID position of the first commit within the capture can be seen in the output of the summary command:

    maxplayer summary /path/to/capture.cx

    If the captured data has not been transformed to replay format yet, the command will perform the transformation before displaying the summary.

    Run maxplayer --help to see the command line options. The help output is also shown in the maxplayer command line options section at the end of this page.

    The replay also requires a user account with which the captured traffic is replayed. This user must have access to all the tables in question. In practice, the simplest way to do this for testing is to create the user as follows:

    CREATE USER 'maxreplay'@'%' IDENTIFIED BY 'replay-pw';
    GRANT ALL ON *.* TO 'maxreplay'@'%';

    Restore for read-only Replay

    For captures that are intended for read-only replay, it may not be as important that the servers to be tested are at the exact GTID position the capture server was at when the capture started. In fact, it may be advantageous for the servers to be in the state they reached after the capture finished.

    On the other hand, replay also supports a write-only mode. Following the Full Restore procedure above and then running a write-only replay prepares the replay server(s) for running read-only replays multiple times with little effort. Running read-only replays this way can, for example, be used when fine-tuning server settings.

    Replaying the Capture

    When a replay is first run, the capture files are transformed in place into replay format. The transformation can also be run separately. Depending on the size and structure of the capture, the transformation can use up to twice the space of the capture's .ex file. The files with the .ex extension contain most of the captured data (events).

    Start by copying the capture tarball created earlier (captures.tar.gz) to the replay MaxScale server, into a directory of your choice (here called /path/to/capture-dir). Then extract the files.

    cd /path/to/capture-dir
    tar -xaf captures.tar.gz

    After this, replay the workload against the baseline MariaDB setup:

    maxplayer replay --user maxreplay --password replay-pw --host <host:port> --output baseline-result.csv /path/to/capture.cx

    Once the baseline replay results have been generated, run the replay again, but this time against the new MariaDB setup to which the baseline is compared:

    maxplayer replay --user maxreplay --password replay-pw --host <host:port> --output comparison-result.csv /path/to/capture.cx

    After both replays have been completed, the results can be post-processed and visualized.

    Visualizing

    The replay results must first be post-processed into summaries that the visualization will then use. First, generate the canonicals.csv file that is needed in the post-processing:

    maxplayer canonicals /path/to/capture.cx > canonicals.csv

    After that, the baseline and comparison replay results can be post-processed into summaries using the maxpostprocess command:

    maxpostprocess canonicals.csv baseline-result.csv -o baseline-summary.json
    maxpostprocess canonicals.csv comparison-result.csv -o comparison-summary.json

    The visualization itself is done with the maxvisualize program, which opens a browser window to show the visualization. If no browser opens, the visualization URL is also printed on the command line; by default it is http://localhost:8866/.

    maxvisualize baseline-summary.json comparison-summary.json

    Settings

    capture_dir

    • Type: path

    • Default: /var/lib/maxscale/wcar/

    • Mandatory: No

    • Dynamic: No

    Directory under which capture directories are stored. Each capture directory has the name of the filter. In the examples above the name "CAPTURE_FLTR" was used.

    start_capture

    • Type: boolean

    • Default: false

    • Mandatory: No

    • Dynamic: No

    Start the capture when MaxScale starts.

    capture_duration

    • Type: duration

    • Default: 0s

    • Maximum: Unlimited in MaxScale, 5min in MaxScale Lite.

    • Mandatory: No

    • Dynamic: No

    Limit capture to this duration. If set to zero there is no limit.

    capture_size

    • Type: size

    • Default: 0

    • Maximum: Unlimited in MaxScale, 10MB in MaxScale Lite.

    • Mandatory: No

    • Dynamic: No

    Limit capture to approximately this many bytes in the file system. If set to zero there is no limit.

    maxplayer command line options

    maxplayer -u user -p pwd --speed 1.5 -i 5s -o baseline.csv capture_2024-09-06_090002.cx --help
    Usage: maxplayer [OPTION]... [COMMAND] FILE

    Commands: (default: replay)
    summary    Show a summary of the capture.
    replay     Replay the capture.
    convert    Converts the input file (either .cx or .rx) to a replay file (.rx or .csv).
    canonicals List the canonical forms of the captured SQL as CSV.
    dump-data  Dump capture data as SQL.
    show       Show the SQL of one or more events.

    Options:
    --user          User name for login to the replay server.
    -u              This version does not support using the actual user names
                    that were used during capture.

    --password      Only clear text passwords are supported as of yet.
    -p

    --host          The address of the replay server in <IP>:<port> format.
    -H              E.g. 127.0.0.1:4006

    --output        The name of the output file: e.g. baseline.csv.
    -o

    --report        Periodically report statistics of ongoing operations.
    -r              The option takes a duration, such as 10s.

    --report-file   The --report option by default writes to stdout.
    -R              Provide the name of the file to write to. The file will
                    be truncated every time it is written to, allowing for a
                    simple status window by running 'watch cat <path-to-file>'
                    in a terminal.

    --speed         The value is a multiplier. 2.5 is 2.5x speed and 0.5 is half speed.
    -s              A value of zero means no limit, or replay as fast as possible.
                    A multiplier of 2.5 might not have any effect as the actual time spent
                    depends on many factors, such as the captured volume and replay server.

    --idle-wait     Relates to playback speed, and can be used together with --speed.
    -i              During capture there can be long delays where there is no traffic.
                    One hour of no capture traffic would mean replay waits for one hour.
                    idle-wait allows the simulation time to be moved forwards when such gaps
                    occur. A 'gap' starts when all prior queries have fully executed.
                    --idle-wait takes a duration value. A negative value turns the feature off,
                                i.e. the one hour wait would happen.
                    --idle-wait 0s means time moves to the event start-time immediately
                                when a gap is detected, i.e., all gaps are skipped over.
                    --idle-wait 10s means time moves to the event start-time 10 seconds
                                (wall time) after the gap was detected. Shorter
                                gaps than 10 seconds will thus be fully waited for.
                    --idle-wait has a default value of 1 second.
                    Examples: 1h, 60m, 3600s, 3600000ms, which all define the same duration.

    --query-filter  Options: none, write-only, read-only. Default: none.
    -f              Replay can optionally apply only writes or only reads. This option is useful
                    once the databases to be tested have been prepared (see full documentation)
                    and optionally either a write-only run, or a full replay has been run.
                    Now multiple read-only runs against the server(s) are simple as no further
                    data synchronization is needed.
                    Note that this mode has its limitations as the query results may
                    be very different than what they were during capture.

    --analyze       Enabling this option will track the server Rows_read statistic for each query.
    -A              This will slow down the overall replay time. The query time measurements
                    are still valid, but currently this option should only be used when
                    it is of real value to know how many rows the server read for each query.

    --verbose       Verbose output. The option can be repeated for more verbosity: -vvv
    -v

    --version       Display the version number and copyrights.
    -V

    input file:       capture_2024-09-06_090002.cx
    -h --help         true
    -u --user         user
    -p --password     pwd
    -H --host         127.1.1.0:3306
    -o --output       baseline.csv
    -r --report       0ns
    -R --report-file
    -s --speed        1.5
    -i --idle-wait    5s
    -f --query-filter none
    -A --analyze      false
    -v --verbose      0
    -V --version      0.2

    Limitations

    • KILL commands do not work correctly during replay and may kill the wrong session (MXS-5056)

    • COM_STMT_BULK_EXECUTE is not captured (MXS-5057)

    • COM_STMT_EXECUTE that uses a cursor is replayed without a cursor (MXS-5059). For MyISAM and Aria tables, this will cause the table level lock to be held for a shorter time.

    • Execution of a COM_STMT_SEND_LONG_DATA will not work (MXS-5060)

    • The capture files are not necessarily compatible with different Linux distributions and CPU architectures than the original capture server has. Different combinations will require further testing, and once done, this document will be updated.

    This page is licensed: CC BY-SA / Gnu FDL

    MaxScale 25.01 Regex Filter

    Regex Filter


    Overview

    The Regex filter is a filter module for MariaDB MaxScale that is able to rewrite query content using regular expression matches and text substitution. The regular expressions use the PCRE2 syntax.

    The PCRE2 library uses a different syntax than POSIX to refer to capture groups in the replacement string. The main difference is the use of the dollar character instead of the backslash character for references, e.g. $1 instead of \1. For more details about the replacement string differences, please read the Creating a new string with substitutions chapter in the PCRE2 manual.

    Configuration

    The following demonstrates a minimal configuration.

    [MyRegexFilter]
    type=filter
    module=regexfilter
    match=some string
    replace=replacement string

    [MyService]
    type=service
    router=readconnroute
    servers=server1
    user=myuser
    password=mypasswd
    filters=MyRegexFilter

    Settings

    The Regex filter has two mandatory parameters: match and replace.

    match

    • Type: regex

    • Mandatory: Yes

    • Dynamic: Yes

    Defines the text in the SQL statements that is replaced.

    match=TYPE[ ]*=

    options

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: ignorecase, case, extended

    • Default: ignorecase

    The options parameter affects how the match pattern is compiled: ignorecase makes the matching case-insensitive (the default), case makes it case-sensitive, and extended causes whitespace in the pattern to be ignored.

    options=case

    replace

    • Type: string

    • Mandatory: Yes

    • Dynamic: Yes

    This is the text that should replace the part of the SQL query matching the pattern defined in match.

    replace=ENGINE =

    source

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The optional source parameter defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will have the match and replacement applied to them.

    source=127.0.0.1

    user

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The optional user parameter defines a username that is used to match against the user from which the client connection to MariaDB MaxScale originates. Only sessions that are connected using this username will have the match and replacement applied to them.

    user=john

    log_file

    • Type: string

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The optional log_file parameter defines a log file in which the filter writes all queries that are not matched and matching queries with their replacement queries. All sessions will log to this file so this should only be used for diagnostic purposes.

    log_file=/tmp/regexfilter.log

    log_trace

    • Type: boolean

    • Mandatory: No

    • Dynamic: Yes

    • Default: None

    The optional log_trace parameter toggles the logging of non-matching and matching queries with their replacements into the log file on the info level. This is the preferred method of diagnosing the matching of queries since the log level can be changed at runtime. For more details about logging levels and session specific logging, please read the Configuration Guide.

    log_trace=true

    Examples

    Example 1 - Replace MySQL 5.1 create table syntax with that for later versions

    MySQL 5.1 used the parameter TYPE = to set the storage engine that should be used for a table. In later versions this changed to be ENGINE =. Imagine you have an application that you cannot change for some reason, but you wish to migrate to a newer version of MySQL. The regexfilter can be used to transform the create table statements into the form that could be used by MySQL 5.5:

    [CreateTableFilter]
    type=filter
    module=regexfilter
    options=ignorecase
    match=TYPE\s*=
    replace=ENGINE=

    [MyService]
    type=service
    router=readconnroute
    servers=server1
    user=myuser
    password=mypasswd
    filters=CreateTableFilter

    This page is licensed: CC BY-SA / Gnu FDL


    MaxScale 25.01 Cache

    Cache

    This filter was introduced in MariaDB MaxScale 2.1.

    • Cache

    Overview

    From MaxScale version 2.2.11 onwards, the cache filter is no longer considered experimental. The following changes to the default behaviour have also been made:

    • The default value of cached_data is now thread_specific (used to be shared).

    • The default value of selects is now assume_cacheable (used to be verify_cacheable).

    The cache filter is a simple cache that is capable of caching the result of SELECTs, so that subsequent identical SELECTs are served directly by MaxScale, without the queries being routed to any server.

    By default the cache will be used and populated in the following circumstances:

    • There is no explicit transaction active, that is, autocommit is used,

    • there is an explicitly read-only transaction (that is, START TRANSACTION READ ONLY) active, or

    • there is a transaction active and no statement that modifies the database has been performed.

    In practice, the last bullet point means that if a transaction has been started with BEGIN, START TRANSACTION or START TRANSACTION READ WRITE, then the cache will be used and populated until the first UPDATE, INSERT or DELETE statement is encountered.

    That is, in the default mode the cache effectively causes the system to behave as if the isolation level were READ COMMITTED, irrespective of what the isolation level of the backends actually is.

    The default behaviour can be altered using the configuration parameter cache_in_transactions.

    By default it is assumed that all SELECT statements are cacheable, which means that also statements like SELECT LOCALTIME are cached. Please check the selects parameter for how to change the default behaviour.

    Limitations

    All of these limitations may be addressed in forthcoming releases.

    Prepared Statements

    Resultsets of prepared statements are not cached.

    Multi-statements

    Multi-statements are always sent to the backend and their result is not cached.

    Security

    The cache is not aware of grants.

    The implication is that unless the cache has been explicitly configured as to whom the caching should apply, the presence of the cache may provide a user with access to data he should not have access to.

    Please read the Security section below for more detailed information.

    However, from 2.5 onwards it is possible to configure the cache to cache the data of each user separately, which effectively means that there can be no unintended sharing. Please see the users parameter for how to change the default behaviour.

    information_schema

    When invalidation is enabled, SELECTs targeting tables in information_schema are not cached. The reason is that as the content of those tables changes as a side-effect of something else, the cache would not know when to invalidate the cache entries.

    Invalidation

    Since MaxScale 2.5, the cache is capable of invalidating entries in the cache when a modification (UPDATE, INSERT or DELETE) that may affect those entries is made.

    The cache invalidation works on the table-level, that is, a modification made to a particular table will cause all cache entries that refer to that table to be invalidated, irrespective of whether the modification actually has an impact on the cache entries or not. For instance, suppose the result of the following SELECT has been cached
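
    SELECT * FROM tbl WHERE a = 5;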

    An insert like
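
    INSERT INTO tbl (a) VALUES (42);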

    will cause the cache entry containing the result of that SELECT to be invalidated even if the INSERT actually does not affect it. Please see the invalidate parameter for how to enable the invalidation.

    When invalidation has been enabled MaxScale must be able to completely parse a SELECT statement for its results to be stored in the cache. The reason is that in order to be able to invalidate cache entries, MaxScale must know what tables a SELECT statement depends upon. Consequently, if (and only if) invalidation has been enabled and MaxScale fails to parse a statement, the result of that particular statement will not be cached.

    When invalidation has been enabled, MaxScale will also parse all UPDATE, INSERT and DELETE statements, in order to find out what tables are modified. If that parsing fails, MaxScale will by default clear the entire cache. The reason is that unless MaxScale can completely parse the statement it cannot know what tables are modified and hence not what cache entries should be invalidated. Consequently, to prevent stale data from being returned, the entire cache is cleared. The default behaviour can be changed using the configuration parameter clear_cache_on_parse_errors.

    Note that the threading approach used has a big impact on the invalidation. Please see Threads, Users and Invalidation below for how the threading approach affects the invalidation.

    Note also that since the invalidation may not, depending on how the cache has been configured, be visible to all sessions of all users, it is still important to configure reasonable soft (soft_ttl) and hard (hard_ttl) TTLs.

    Best Efforts

    The invalidation offered by the MaxScale cache can be said to be of best efforts quality. The reason is that ensuring the cache in all circumstances reflects the state of the actual database would require the operations involving the cache and the MariaDB server to be synchronized, which would cause an unacceptable overhead.

    What best efforts means in this context is best illustrated using an example.

    Suppose a client executes the statement SELECT * FROM tbl and that the result is cached. Next time that or any other client executes the same statement, the result is returned from the cache and the MariaDB server will not be accessed at all.

    If a client now executes the statement INSERT INTO tbl VALUES (...), the cached value for the SELECT statement above and all other statements that are dependent upon tbl will be invalidated. That is, the next time someone executes the statement SELECT * FROM tbl the result will again be fetched from the MariaDB server and stored to the cache.

    However, suppose some client executes the statement SELECT COUNT(*) FROM tbl at the same time someone else executes the INSERT ... statement. A possible chain of events is as follows:
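
    • The SELECT COUNT(*) FROM tbl statement reaches the database server and is executed.

    • The INSERT statement reaches the database server and is executed.

    • The INSERT passes through the cache, which invalidates all entries dependent on tbl.

    • The result of the SELECT reaches the cache and is stored there.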

    That is, the SELECT is performed in the database server before the INSERT. However, since the timelines are proceeding independently of each other, the events may be re-ordered as far as the cache is concerned.

    That is, the cached value for SELECT COUNT(*) FROM tbl will reflect the situation before the insert and will thus not be correct.

    The stale result will be returned until the value has reached its time-to-live or its invalidation is caused by some update operation.

    Configuration

    The cache is simple to add to any existing service. However, some experimentation may be required in order to find the configuration settings that provide the maximum benefit.

    Each configured cache filter uses a storage of its own. That is, if there are two services, each configured with a specific cache filter, then, even if queries target the very same servers the cached data will not be shared.

    Two services can use the same cache filter, but then either the services should use the very same servers or a completely different set of servers, where the used table names are different. Otherwise there can be unintended sharing.

    Settings

    The cache filter has no mandatory parameters but a range of optional ones. Note that it is advisable to specify max_size to prevent the cache from using up all memory there is, in case there is very little overlap among the queries.

    storage

    • Type: string

    • Mandatory: No

    • Dynamic: No

    • Default: storage_inmemory

    The name of the module that provides the storage for the cache. That module will be loaded and provided with the value of storage_options as argument. For instance:
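
    storage=storage_redis
    storage_options="server=192.168.1.31"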

    See the Storage section below for what storage modules are available.

    storage_options

    • Type: string

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    NOTE Deprecated in 23.02.

    A string that is provided verbatim to the storage module specified in storage, when the module is loaded. Note that the needed arguments and their format depend upon the specific module.

    From 23.02 onwards, the storage module configuration should be provided using nested parameters.

    hard_ttl

    • Type: duration

    • Mandatory: No

    • Dynamic: No

    • Default: 0s (no limit)

    Hard time to live; the maximum amount of time the cached result is used before it is discarded and the result is fetched from the backend (and cached). See also soft_ttl.

    soft_ttl

    • Type: duration

    • Mandatory: No

    • Dynamic: No

    • Default: 0s (no limit)

    Soft time to live; the amount of time - in seconds - the cached result is used before it is refreshed from the server. When soft_ttl has passed, the result will be refreshed when the first client requests the value.

    However, as long as hard_ttl has not passed, all other clients requesting the same value will use the result from the cache while it is being fetched from the backend. That is, as long as soft_ttl but not hard_ttl has passed, even if several clients request the same value at the same time, there will be just one request to the backend.

    If the value of soft_ttl is larger than hard_ttl it will be adjusted down to the same value.

    max_resultset_rows

    • Type: count

    • Mandatory: No

    • Dynamic: No

    • Default: 0 (no limit)

    Specifies the maximum number of rows a resultset can have in order to be stored in the cache. A resultset larger than this will not be stored.

    max_resultset_size

    • Type: size

    • Mandatory: No

    • Dynamic: No

    • Default: 0 (no limit)

    Specifies the maximum size of a resultset for it to be stored in the cache. A resultset larger than this will not be stored. The size can be specified with the usual size suffixes (e.g., 1Mi).

    Note that the value of max_resultset_size should not be larger than the value of max_size.

    max_count

    • Type: count

    • Mandatory: No

    • Dynamic: No

    • Default: 0 (no limit)

    The maximum number of items the cache may contain. If the limit has been reached and a new item should be stored, then an older item will be evicted.

    Note that if cached_data is thread_specific then this limit will be applied to each cache separately. That is, if a thread specific cache is used, then the total number of cached items is #threads * the value of max_count.

    max_size

    • Type: size

    • Mandatory: No

    • Dynamic: No

    • Default: 0 (no limit)

    The maximum size the cache may occupy. If the limit has been reached and a new item should be stored, then some older item(s) will be evicted to make space.

    Note that if cached_data is thread_specific then this limit will be applied to each cache separately. That is, if a thread specific cache is used, then the total size is #threads * the value of max_size.

    rules

    • Type: path

    • Mandatory: No

    • Dynamic: Yes

    • Default: "" (no rules)

    Specifies the path of the file where the caching rules are stored. A relative path is interpreted relative to the data directory of MariaDB MaxScale.

    Note that the rules will be reloaded, and applied if different, every time a dynamic configuration change is made. Thus, to cause a reloading of the rules, alter the rules parameter to the same value it has.

    cached_data

    • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: shared, thread_specific

    An enumeration option specifying how data is shared between threads. The allowed values are:

    • shared: The cached data is shared between threads. On the one hand it implies that there will be synchronization between threads, on the other hand that all threads will use data fetched by any thread.

    • thread_specific: The cached data is specific to a thread. On the one hand it implies that no synchronization is needed between threads, on the other hand that the very same data may be fetched and stored multiple times.

    Default is thread_specific. See max_count and max_size for the implications of changing this setting to shared.

    selects

    • Type: enum

    • Mandatory: No

    • Dynamic: Yes

    • Values: assume_cacheable, verify_cacheable

    An enumeration option specifying what approach the cache should take with respect to SELECT statements. The allowed values are:

    • assume_cacheable: The cache can assume that all SELECT statements, without exceptions, are cacheable.

    • verify_cacheable: The cache can not assume that all SELECT statements are cacheable, but must verify that.

    Default is assume_cacheable. In this case, all SELECT statements are assumed to be cacheable and will be parsed only if some specific rule requires that.

    If verify_cacheable is specified, then all SELECT statements will be parsed and only those that are safe for caching - e.g. do not call any non-cacheable functions or access any non-cacheable variables - will be subject to caching.

    If verify_cacheable has been specified, the cache will not be used in the following circumstances:

    • The SELECT uses any of the following functions: BENCHMARK, CONNECTION_ID, CONVERT_TZ, CURDATE, CURRENT_DATE, CURRENT_TIMESTAMP, CURTIME, DATABASE, ENCRYPT, FOUND_ROWS, GET_LOCK

    Note that parsing all SELECT statements carries a performance cost.

    cache_in_transactions

    • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: never, read_only_transactions, all_transactions

    An enumeration option specifying how the cache should behave when there are active transactions:

    • never: When there is an active transaction, no data will be returned from the cache, but all requests will always be sent to the backend. The cache will be populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be populated until the first non-SELECT statement.

    • read_only_transactions: The cache will be used and populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be populated, but not used until the first non-SELECT statement.

    • all_transactions: The cache will be used and populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be used and populated until the first non-SELECT statement.

    Default is all_transactions.

    The values read_only_transactions and all_transactions have roughly the same effect as changing the isolation level of the backend to READ COMMITTED.

    debug

    • Type: number

    • Mandatory: No

    • Dynamic: Yes

    • Default: 0

    An integer value, using which the level of debug logging made by the cache can be controlled. The value is actually a bitfield with different bits denoting different logging.

    • 0 (0b00000) No logging is made.

    • 1 (0b00001) A matching rule is logged.

    • 2 (0b00010) A non-matching rule is logged.

    • 4 (0b00100) A decision to use data from the cache is logged.

    • 8 (0b01000) A decision not to use data from the cache is logged.

    • 16 (0b10000) Higher level decisions are logged.

    Default is 0. To log everything, give debug a value of 31.

    enabled

    • Type: boolean

    • Mandatory: No

    • Dynamic: No

    • Default: true

    Specifies whether the cache is initially enabled or disabled.

    The value affects the initial state of the MaxScale user variables using which the behaviour of the cache can be modified at runtime. Please see Runtime Configuration below for details.

    invalidate

    • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: never, current

    • Default: never

    An enumeration option specifying how the cache should invalidate cache entries.

    • never: No invalidation is performed.

    • current: When a modification is made, the entries in the cache used by the current session are invalidated.

    The effect of current depends upon the value of cached_data. If the value is shared, that is, all threads share the same cache, then the effect of an invalidation is immediately visible to all sessions, as there is just one cache. However, if the value is thread_specific, then an invalidation will affect only the cache that the session happens to be using.

    If it is important and sufficient that an application immediately sees a change that it itself has caused, then a combination of invalidate=current and cached_data=thread_specific can be used.

    If it is important that an application immediately sees all changes, irrespective of who has caused them, then a combination of invalidate=current and cached_data=shared must be used.

    clear_cache_on_parse_errors

    • Type: boolean

    • Mandatory: No

    • Dynamic: No

    • Default: true

    This boolean option specifies how the cache should behave in case of parsing errors when invalidation has been enabled.

    • true: If the cache fails to parse an UPDATE/INSERT/DELETE statement then all cached data will be cleared.

    • false: A failure to parse an UPDATE/INSERT/DELETE statement is ignored and no invalidation will take place due that statement.

    The default value is true.

    Changing the value to false may mean that stale data is returned from the cache, if an UPDATE/INSERT/DELETE cannot be parsed and the statement affects entries in the cache.

    users

    • Type: enum

    • Mandatory: No

    • Dynamic: No

    • Values: mixed, isolated

    An enumeration option specifying how the cache should cache data for different users.

    • mixed: The data of all users is stored in the same cache, and data fetched by one user may be returned to another.

    • isolated: The data of each user is cached separately.

    Default is mixed.

    Note that if isolated has been specified, then each user will conceptually have a cache of their own, which is populated independently of the others. That is, if two users make the same query, the data will be fetched twice and also stored twice. An isolated cache will therefore in general use more memory and cause more traffic to the backend compared to a mixed cache.

    timeout

    • Type: duration

    • Mandatory: No

    • Dynamic: No

    • Default: 5s

    The timeout used when performing operations against distributed storages such as Redis or memcached.

    Runtime Configuration

    The cache filter can be configured at runtime by executing SQL commands. If there is more than one cache filter in a service, only the first cache filter will be able to process the variables. The remaining filters will not see them and thus configuring them at runtime is not possible.

    @maxscale.cache.populate

    Using the variable @maxscale.cache.populate it is possible to specify at runtime whether the cache should be populated or not. Its initial value is the value of the configuration parameter enabled. That is, by default the value is true.

    The purpose of this variable is to make it possible for an application to decide statement by statement whether the cache should be populated.
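
    For instance (with an illustrative table tbl):

    SET @maxscale.cache.populate=TRUE;
    SELECT a, b FROM tbl;
    SET @maxscale.cache.populate=FALSE;
    SELECT a, b FROM tbl;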

    In the example above, the first SELECT will always be sent to the server and the result will be cached, provided the actual cache rules specifies that it should be. The second SELECT may be served from the cache, depending on the value of @maxscale.cache.use (and the cache rules).

    The value of @maxscale.cache.populate can be queried
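
    SELECT @maxscale.cache.populate;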

    but only after it has been explicitly set once.

    @maxscale.cache.use

    Using the variable @maxscale.cache.use it is possible to specify at runtime whether the cache should be used or not. Its initial value is the value of the configuration parameter enabled. That is, by default the value is true.

    The purpose of this variable is to make it possible for an application to decide statement by statement whether the cache should be used.
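
    For instance (again with an illustrative table):

    SET @maxscale.cache.use=TRUE;
    SELECT a, b FROM tbl;
    SET @maxscale.cache.use=FALSE;
    SELECT a, b FROM tbl;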

    The first SELECT will be served from the cache, provided the rules specify that the statement should be cached, the cache indeed contains the result and the data is not stale (as specified by the TTL).

    If the data is stale, the SELECT will be sent to the server and the cache entry will be updated, irrespective of the value of @maxscale.cache.populate.

    If @maxscale.cache.use is true but the result is not found in the cache, and the result is subsequently fetched from the server, the result will not be added to the cache, unless @maxscale.cache.populate is also true.

    The value of @maxscale.cache.use can be queried
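
    SELECT @maxscale.cache.use;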

    but only after it has explicitly been set once.

    @maxscale.cache.soft_ttl

    Using the variable @maxscale.cache.soft_ttl it is possible at runtime to specify in seconds what soft ttl should be applied. Its initial value is the value of the configuration parameter soft_ttl. That is, by default the value is 0.

    The purpose of this variable is to make it possible for an application to decide statement by statement what soft ttl should be applied.
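
    For instance (unimportant and important being illustrative tables):

    SET @maxscale.cache.soft_ttl=600;
    SELECT * FROM unimportant;
    SET @maxscale.cache.soft_ttl=60;
    SELECT * FROM important;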

    When data is SELECTed from the unimportant table unimportant, the data will be returned from the cache provided it is no older than 10 minutes, but when data is SELECTed from the important table important, the data will be returned from the cache provided it is no older than 1 minute.

    Note that @maxscale.cache.hard_ttl overrules @maxscale.cache.soft_ttl in the sense that if the former is less than the latter, then soft ttl will, when used, be adjusted down to the value of hard ttl.

    The value of @maxscale.cache.soft_ttl can be queried
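
    SELECT @maxscale.cache.soft_ttl;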

    but only after it has explicitly been set once.

    @maxscale.cache.hard_ttl

    Using the variable @maxscale.cache.hard_ttl it is possible at runtime to specify in seconds what hard ttl should be applied. Its initial value is the value of the configuration parameter hard_ttl. That is, by default the value is 0.

    The purpose of this variable is to make it possible for an application to decide statement by statement what hard ttl should be applied.

    Note that as @maxscale.cache.hard_ttl overrules @maxscale.cache.soft_ttl, it is important to ensure that the former is at least as large as the latter, and for best overall performance that it is larger.

    The value of @maxscale.cache.hard_ttl can be queried
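
    SELECT @maxscale.cache.hard_ttl;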

    but only after it has explicitly been set once.

    Client Driven Caching

    With @maxscale.cache.populate and @maxscale.cache.use it is possible to make the caching completely client driven.

    Provide no rules file, which means that all SELECT statements are subject to caching and that all users receive data from the cache. Set the startup mode of the cache to disabled.

    Now, in order to mark statements that should be cached, set @maxscale.cache.populate to true, and perform those SELECTs.

    Note that those SELECTs must return something in order for the statement to be marked for caching.

    After this, the value of @maxscale.cache.use will decide whether or not the cache is considered.

    With @maxscale.cache.use being true, the cache is considered and the result returned from there, if not stale. If it is stale, the result is fetched from the server and the cached entry is updated.

    By setting a very long TTL it is possible to prevent the cache from ever considering an entry to be stale and instead manually cause the cache to be updated when needed.

    Threads, Users and Invalidation

    What caching approach is used and how different users are treated has a significant impact on the behaviour of the cache. In the following the implication of different combinations is explained.

    The relevant dimensions are cached_data (thread_specific or shared) and users (mixed or isolated).

    Invalidation

    Invalidation takes place only in the current cache, so how visible the invalidation is depends upon the configuration value of cached_data.

    cached_data=thread_specific

    The invalidation is visible only to the sessions that are handled by the same worker thread where the invalidation occurred. Sessions of the same or other users that are handled by different worker threads will not see the new value before the TTL causes the value to be refreshed.

    cached_data=shared

    The invalidation is immediately visible to all sessions of all users.

    Rules

    The caching rules are expressed as a JSON object or as an array of JSON objects.

    There are two decisions to be made regarding the caching; in what circumstances should data be stored to the cache and in what circumstances should the data in the cache be used.

    Expressed in JSON this looks as follows
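
    {
        "store": [ ... ],
        "use": [ ... ]
    }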

    or, in case an array is used, as
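
    [
        {
            "store": [ ... ],
            "use": [ ... ]
        },
        {
            "store": [ ... ],
            "use": [ ... ]
        }
    ]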

    The store field specifies in what circumstances data should be stored to the cache and the use field specifies in what circumstances the data in the cache should be used. In both cases, the value is a JSON array containing objects.

    If an array of rule objects is specified, then, when looking for a rule that matches, the store field of each object are evaluated in sequential order until a match is found. Then, the use field of that object is used when deciding whether data in the cache should be used.

    When to Store

    By default, if no rules file has been provided or if the store field is missing from the object, the results of all queries will be stored to the cache, subject to the max_resultset_rows and max_resultset_size cache filter parameters.

    By providing a store field in the JSON object, the decision whether to store the result of a particular query to the cache can be controlled in a more detailed manner. The decision to cache the results of a query can depend upon

    • the database,

    • the table,

    • the column, or

    • the query itself.

    Each entry in the store array is an object containing three fields,
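
    {
        "attribute": <string>,
        "op": <string>,
        "value": <string>
    }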

    where,

    • the attribute can be database, table, column or query,

    • the op can be =, !=, like or unlike, and

    • the value a string.

    If op is = or != then value is used as a string; if it is like or unlike, then value is interpreted as a pcre2 regular expression. Note though that if attribute is database, table or column, then the string is interpreted as a name, where a dot . denotes qualification or scoping.

    The objects in the store array are processed in order. If the result of a comparison is true, no further processing will be made and the result of the query in question will be stored to the cache.

    If the result of the comparison is false, then the next object is processed. The process continues until the array is exhausted. If there is no match, then the result of the query is not stored to the cache.

    Note that as the query itself is used as the key, although the following queries
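
    SELECT * FROM tbl;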

    and
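
    select * from tbl;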

    target the same table and produce the same results, they will be cached separately. The same holds for queries like
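
    SELECT * FROM tbl WHERE a = 2 AND b = 3;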

    and
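
    SELECT * FROM tbl WHERE b = 3 AND a = 2;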

    as well. Although they conceptually are identical, there will be two cache entries.

    Note that if a column has been specified in a rule, then a statement will match irrespective of where that particular column appears. For instance, if a rule specifies that the result of statements referring to the column a should be cached, then the following statement will match
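
    SELECT a FROM tbl1;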

    and so will
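
    SELECT b FROM tbl2 WHERE a > 5;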

    Qualified Names

    When using = or != in the rule object in conjunction with database,table and column, the provided string is interpreted as a name, that is, dot (.) denotes qualification or scope.

    In practice that means that if attribute is database then value may not contain a dot, if attribute is table then value may contain one dot, used for separating the database and table names respectively, and if attribute is column then value may contain one or two dots, used for separating table and column names, or database, table and column names.

    Note that if a qualified name is used as a value, then all parts of the name must be available for a match. Currently MariaDB MaxScale may not always be capable of deducing in what table a particular column is. If that is the case, then a value like tbl.field may not necessarily be a match even if the field is field and the table actually is tbl.

    Implication of the default database

    If a rule concerns the database, then the default database is considered only if the statement does not refer to any specific database.

    Regexp Matching

    The string used for matching the regular expression contains as much information as there is available. For instance, in a situation like
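
    USE somedb;
    SELECT fld FROM tbl;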

    the string matched against the regular expression will be somedb.tbl.fld.

    Examples

    Cache all queries targeting a particular database.
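
    A rule along the following lines accomplishes this (db1 being a placeholder name):

    {
        "store": [
            {
                "attribute": "database",
                "op": "=",
                "value": "db1"
            }
        ]
    }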

    Cache all queries not targeting a particular table
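
    {
        "store": [
            {
                "attribute": "table",
                "op": "!=",
                "value": "tbl1"
            }
        ]
    }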

    That will exclude queries targeting table tbl1 irrespective of which database it is in. To exclude a table in a particular database, specify the table name using a qualified name.

    Cache all queries containing a WHERE clause
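
    For instance, using a regular expression on the query itself:

    {
        "store": [
            {
                "attribute": "query",
                "op": "like",
                "value": ".*WHERE.*"
            }
        ]
    }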

    Note that this will actually cause all queries that contain WHERE anywhere, to be cached.

    When to Use

    By default, if no rules file has been provided or if the use field is missing from the object, all users may be returned data from the cache.

    By providing a use field in the JSON object, the decision whether to use data from the cache can be controlled in a more detailed manner. The decision to use data from the cache can depend upon

    • the user.

    Each entry in the use array is an object containing three fields,
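
    {
        "attribute": <string>,
        "op": <string>,
        "value": <string>
    }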

    where,

    • the attribute can be user,

    • the op can be =, !=, like or unlike, and

    • the value a string.

    If op is = or != then value is interpreted as a MariaDB account string, that is, % indicates a wildcard, but if op is like or unlike it is simply assumed value is a pcre2 regular expression.

    For instance, the following are equivalent:
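
    { "attribute": "user", "op": "=", "value": "'bob'@'%'" }

    { "attribute": "user", "op": "like", "value": "bob@.*" }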

    Note that if op is = or != then the usual assumptions apply, that is, a value of bob is equivalent with 'bob'@'%'. If like or unlike is used, then no assumptions apply, but the string is used verbatim as a regular expression.

    The objects in the use array are processed in order. If the result of a comparison is true, no further processing will be made and the data in the cache will be used, subject to the value of ttl.

    If the result of the comparison is false, then the next object is processed. The process continues until the array is exhausted. If there is no match, then data in the cache will not be used.

    Note that use is relevant only if the query is subject to caching, that is, if all queries are cached or if a query matches a particular rule in the store array.

    Examples

    Use data from the cache for all users except admin (actually 'admin'@'%'), regardless of what host the admin user comes from.
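
    {
        "use": [
            {
                "attribute": "user",
                "op": "!=",
                "value": "admin"
            }
        ]
    }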

    Security

    As the cache is not aware of grants, unless the cache has been explicitly configured as to whom the caching should apply, the presence of the cache may provide a user with access to data he should not have access to. Note that the following applies only if users=mixed has been configured. If users=isolated has been configured, then there can never be any unintended sharing between users.

    Suppose there is a table access that the user alice has access to, but the user bob does not. If bob tries to access the table, he will get an error as reply:
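
    SELECT * FROM access;
    ERROR 1142 (42000): SELECT command denied to user 'bob'@'localhost' for table 'access'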

    If we now setup caching for the table, using the simplest possible rules file, bob will get access to data from the table, provided he executes a select identical with one alice has executed.

    For instance, suppose the rules look as follows:
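
    {
        "store": [
            {
                "attribute": "table",
                "op": "=",
                "value": "access"
            }
        ]
    }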

    If alice now queries the table, she will get the result, which also will be cached:
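
    SELECT * FROM access;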

    If bob now executes the very same query, and the result is still in the cache, it will be returned to him.

    That can be prevented by explicitly declaring in the rules that the caching should be applied to alice only.
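
    {
        "store": [
            {
                "attribute": "table",
                "op": "=",
                "value": "access"
            }
        ],
        "use": [
            {
                "attribute": "user",
                "op": "=",
                "value": "alice"
            }
        ]
    }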

    With these rules in place, bob is again denied access, since queries targeting the table access will in his case not be served from the cache.

    Storage

    There are two types of storages that can be used; local and shared.

    The only local storage implementation is storage_inmemory, which simply stores the cache values in memory. The storage is not persistent and is destroyed when MaxScale terminates. Since the storage exists in the MaxScale process, it is very fast and almost always provides a performance benefit.

    Currently there are two shared storages, storage_memcached and storage_redis, implemented using memcached and Redis respectively.

    The shared storages are accessed across the network and consequently it is not self-evident that their use will provide any performance benefit. Namely, irrespective of whether the data is fetched from the cache or from the server there will be a network hop, and often that network hop is, as far as the performance goes, what costs the most.

    The presence of a shared cache may provide a performance benefit:

    • if the network between MaxScale and the storage server (memcached or Redis) is faster than the network between MaxScale and the database server,

    • if the used SELECT statements are heavy (that is, take a significant amount of time) to process for the database server, or

    • if the presence of the cache reduces the overall load of an otherwise overloaded database server.

    As a general rule a shared storage should not be used without first assessing its value using a realistic workload.

    storage_inmemory

    This simple storage module uses the standard memory allocator for storing the cached data.

    This storage module takes no arguments.

    storage_memcached

    This storage module uses memcached for storing the cached data.

    Multiple MaxScale instances can share the same memcached server and items cached by one MaxScale instance will be used by the other. Note that all MaxScale instances should have exactly the same configuration, as otherwise there can be unintended sharing.

    storage_memcached has the following parameters:

    server

    • Type: The Memcached server address specified as host[:port]

    • Mandatory: Yes

    • Dynamic: No

    If no port is provided, then the default port 11211 will be used.

    max_value_size

    • Type: size

    • Mandatory: No

    • Dynamic: No

    • Default: 1Mi

    By default, the maximum size of a value stored to memcached is 1MiB, but that can be configured to something else, in which case this parameter should be set accordingly.

    The value of max_value_size will be used for capping max_resultset_size, that is, if memcached has been configured to allow larger values than 1MiB but max_value_size has not been set accordingly, only resultsets up to 1MiB in size will be cached.

    Example

    From MaxScale 23.02 onwards, the storage configuration should be provided as nested parameters.
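
    For instance (with an illustrative server address):

    storage=storage_memcached
    storage_memcached.server=192.168.1.31
    storage_memcached.max_value_size=10Mi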

    Although deprecated in 23.02, the configuration can also be provided using storage_options:
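
    storage=storage_memcached
    storage_options="server=192.168.1.31,max_value_size=10Mi"

    This is equivalent to the nested-parameter example above.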

    Limitations

    • Invalidation is not supported.

    • Configuration values given to max_size and max_count are ignored.

    Security

    Neither the data in the memcached server nor the traffic between MaxScale and the memcached server is encrypted. Consequently, anybody with access to the memcached server or to the network have access to the cached data.

    storage_redis

    This storage module uses Redis for storing the cached data.

    Note that Redis should be configured with no idle timeout or with a timeout that is very large. Otherwise MaxScale may have to repeatedly connect to Redis, which will hurt both the functionality and the performance.

    Multiple MaxScale instances can share the same redis server and items cached by one MaxScale instance will be used by the other. Note that all MaxScale instances should have exactly the same configuration, as otherwise there can be unintended sharing.

    If storage_redis cannot connect to the Redis server, caching will silently be disabled and a new connection attempt will be made after a timeout interval.

    If a timeout error occurs during an operation, reconnecting will be attempted after a delay, which will be an increasing multiple of timeout. For example, if timeout is the default 5 seconds, then reconnection attempts will first be made after 10 seconds, then after 15 seconds, then 20 and so on. However, once 60 seconds have been reached, the delay will no longer be increased but the delay will stay at one minute. Note that each time a reconnection attempt is made, unless the reason for the timeout has disappeared, the client will be stalled for timeout seconds.
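    The base delay comes from the cache filter's timeout setting, which can be raised in the filter section if the stalls are too frequent, for example:

    timeout=7000ms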

    storage_redis has the following parameters:

    server

    • Type: The Redis server address specified as host[:port]

    • Mandatory: Yes

    • Dynamic: No

    If no port is provided, then the default port 6379 will be used.

    username

    • Type: string

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    Please see Authentication for more information.

    password

    • Type: string

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    Please see Authentication for more information.

    ssl

    • Type: boolean

    • Mandatory: No

    • Dynamic: No

    • Default: false

    Please see SSL for more information.

    ssl_cert

    • Type: Path to existing readable file.

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    The SSL client certificate that MaxScale should use with the Redis server. The certificate must match the key defined in ssl_key.

    Please see SSL for more information.

    ssl_key

    • Type: Path to existing readable file.

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    The SSL client private key MaxScale should use with the Redis server.

    Please see SSL for more information.

    ssl_ca

    • Type: Path to existing readable file.

    • Mandatory: No

    • Dynamic: No

    • Default: ""

    The Certificate Authority (CA) certificate for the CA that signed the certificate specified with ssl_cert.

    Please see SSL for more information.

    Authentication

    If password is provided, MaxScale will authenticate against Redis when a connection has been created. The authentication is performed using the AUTH command, with only the password as argument if no username was provided in the configuration, or with username and password as arguments if both were.

    Note that if the password is specified in the Redis configuration file using requirepass, then only the password should be provided. If the Redis server version is 6 or higher and the Redis ACL system is used, then both username and password must be provided.

    SSL

    If ssl is set to true and ssl_key, ssl_cert and ssl_ca are provided, then SSL/TLS will be used in the communication with the Redis server.

    Note that SSL/TLS support is only available in Redis from version 6 onwards, and that the support is not built into Redis by default but has to be specifically enabled at compile time.

    Example

    From MaxScale 23.02 onwards, the storage configuration should be provided as nested parameters.
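    [Cache-Filter]
    type=filter
    module=cache
    storage=storage_redis
    storage_redis.server=192.168.1.31
    storage_redis.username=hello
    storage_redis.password=world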

    Although deprecated in 23.02, the configuration can also be provided using storage_options:
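    storage_options="server=192.168.1.31,username=hello,password=world"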

    Limitations

    • There is no distinction between soft and hard ttl, but only hard ttl is used.

    • Configuration values given to max_size and max_count are ignored.

    Invalidation

    storage_redis supports invalidation, but the caveats documented earlier are of greater significance, since the communication between the cache and the cache storage is also asynchronous and takes place over the network.

    NOTE If invalidation is turned on after caching has been used (in non-invalidation mode), Redis must be flushed, as otherwise there will be entries in the cache that will not be affected by the invalidation:
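    $ redis-cli flushall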

    Security

    The data in the redis server is not encrypted. Consequently, anybody with access to the redis server has access to the cached data.

    Unless SSL has been enabled, anybody with access to the network has access to the cached data.

    Example

    In the following we define a cache MyCache that uses the cache storage module storage_inmemory and whose soft ttl is 30 seconds and whose hard ttl is 45 seconds. The cached data is shared between all threads and the maximum size of the cached data is 50 mebibytes. The rules for the cache are in the file cache_rules.json.

    Configuration
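    [MyCache]
    type=filter
    module=cache
    storage=storage_inmemory
    soft_ttl=30
    hard_ttl=45
    cached_data=shared
    max_size=50Mi
    rules=cache_rules.json

    [MyService]
    type=service
    ...
    filters=MyCache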

    cache_rules.json
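    {
        "store": [
            {
                "attribute": "table",
                "op": "=",
                "value": "sbtest"
            }
        ]
    }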

    The rules specify that the data of the table sbtest should be cached.

    Performance

    When the cache filter was introduced, the most significant factor affecting the performance of the cache was whether the statements needed to be parsed. Initially, all statements were parsed in order to exclude SELECT statements that use non-cacheable functions, access non-cacheable variables or refer to system or user variables. Later, the default value of the selects parameter was changed to assume_cacheable, to maximize the default performance.

    With the default configuration, the cache itself will not cause the statements to be parsed. However, even with assume_cacheable configured, a rule referring specifically to a database, table or column will still cause the statement to be parsed.

    For instance, a simple rule like
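    {
        "store": [
            {
                "attribute": "database",
                "op": "=",
                "value": "db1"
            }
        ]
    }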

    cannot be fulfilled without parsing the statement.

    If the rule is instead expressed using a regular expression
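    {
        "store": [
            {
                "attribute": "query",
                "op": "like",
                "value": "FROM db1\\..*"
            }
        ]
    }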

    then the statement will not be parsed.

    However, when the query classifier cache was introduced, the parsing cost was significantly reduced, and currently the cost of parsing and regular expression matching is roughly the same.

    The following table gives a rough picture of the relative cost of the different approaches.

    In the table, regexp match means that the cacheable statements were picked out using a rule like
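    {
        "attribute": "query",
        "op": "unlike",
        "value": "FROM nomatch"
    }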

    while exact match means that the cacheable statements were picked out using a rule like
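    {
        "attribute": "database",
        "op": "!=",
        "value": "nomatch"
    }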

    The exact match rule requires all statements to be parsed.

    As the purpose of the test is to illustrate the overhead of different approaches, the rules were formulated so that all SELECT statements would match.

    Note that these figures were obtained by running sysbench, MaxScale and the server in the same computer, so they are only indicative.

    | selects          | Rule         | qps |
    |------------------|--------------|-----|
    | assume_cacheable | none         | 100 |
    | assume_cacheable | regexp match | 83  |
    | assume_cacheable | exact match  | 83  |
    | verify_cacheable | none         | 80  |
    | verify_cacheable | regexp match | 80  |
    | verify_cacheable | exact match  | 80  |

    For comparison, without caching, the qps is 33.

    As can be seen, due to the query classifier cache there is no difference between exact and regex based matching.

    Summary

    For maximum performance:

    • Arrange the situation so that the default selects=assume_cacheable can be used, and use no rules.

    Otherwise it is mostly a personal preference whether exact or regex based rules are used. However, one should always test with real data and real queries before choosing one over the other.

    This page is licensed: CC BY-SA / Gnu FDL


