Ensure high availability with MaxScale Monitors. This reference details monitoring modules for MariaDB replication and Galera clusters, covering failover, role detection, and health checks.
Configure standard settings for all MaxScale monitors. Learn about essential parameters like monitor intervals, backend timeouts, and user credentials for connecting to database servers.
This document describes the settings supported by all monitors. These settings should be defined in the monitor section of the configuration file. For example:
servers=MyServer1,MyServer2
monitor_interval=2s
backend_timeout=3s
backend_connect_timeout=3s
backend_read_timeout=3s
backend_connect_attempts=1
module
Type: string
Mandatory: Yes
Dynamic: No
The monitor module this monitor should use. Typically mariadbmon or galeramon.
user
Type: string
Mandatory: Yes
Dynamic: Yes
Username used by the monitor to connect to the backend servers. If a server defines the monitoruser parameter, that value will be used instead.
password
Type: string
Mandatory: Yes
Dynamic: Yes
Password for the user defined with the user parameter. If a server defines the monitorpw parameter, that value will be used instead.
Note: In older versions of MaxScale this parameter was called passwd. The use of passwd was deprecated in MaxScale 2.3.0.
role
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The role the monitor should activate right after connecting to a server. If empty, no role is set. This setting may be useful if the same username is used for both monitors and services. As monitors and services require different privileges, these privileges can be granted to the monitor and the service roles separately instead of granting them all to the same user. MariaDB Monitor and Galera Monitor currently use this setting. If the monitor is configured to use a role, the role is taken into use even if the server uses a .
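As a sketch of this separation (all names below are illustrative, not taken from this page), the shared user could be granted a dedicated role that holds only the monitor privileges:
CREATE ROLE monitor_role;
GRANT REPLICATION CLIENT ON *.* TO monitor_role;
GRANT monitor_role TO 'shared_user'@'%';
The monitor would then activate it with role=monitor_role in its configuration section, while the services use a separate role granted to the same user.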
servers
Type: string
Mandatory: Yes
Dynamic: Yes
A comma-separated list of servers the monitor should monitor.
monitor_interval
Type: duration
Mandatory: No
Dynamic: Yes
Default: 2s
Defines how often the monitor updates the status of the servers. Choose a lower value if servers should be queried more often. The smallest possible value is 100 milliseconds. If querying the servers takes longer than monitor_interval, the effective update rate is reduced.
If no explicit unit is provided, the value is interpreted as milliseconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected.
backend_timeout
Type: duration
Mandatory: No
Dynamic: Yes
Default: 3s
Controls the timeout for communicating with a monitored server. The parameter sets the timeout for connecting, writing and reading from a server.
The timeout is specified as documented. A value without a unit is rejected. The minimum value is 1 second.
backend_connect_timeout
Type: duration
Mandatory: No
Dynamic: Yes
Default: 3s
This parameter controls the timeout for connecting to a monitored server. The timeout is specified as documented. A value without a unit is rejected. The minimum value is 1 second.
This parameter has been deprecated since MaxScale 25.10.0 and is an alias of backend_timeout.
backend_write_timeout
Type: duration
Mandatory: No
Dynamic: Yes
Default: 3s
Deprecated and ignored since MaxScale 25.10.0.
backend_read_timeout
Type: duration
Mandatory: No
Dynamic: Yes
Default: 3s
Deprecated and ignored since MaxScale 25.08.0.
This parameter controls the timeout for reading a query result from a monitored server. The timeout is specified as documented. A value without a unit is rejected, as are values specified in milliseconds. The minimum value is 1 second.
backend_connect_attempts
Type: number
Mandatory: No
Dynamic: Yes
Default: 1
This parameter defines the maximum number of times a backend connection is attempted during each monitoring loop. Every attempt may take up to backend_connect_timeout seconds to perform. If none of the attempts are successful, the backend is considered to be unreachable and down.
disk_space_threshold
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This parameter duplicates the disk_space_threshold server parameter. If the parameter has not been specified for a server, then the one specified for the monitor is applied.
NOTE: Since MariaDB 10.4.7, MariaDB 10.3.17 and MariaDB 10.2.26, the information will be available only if the monitor user has the FILE privilege.
That is, if the disk configuration is the same on all servers monitored by the monitor, it is sufficient (and more convenient) to specify the disk space threshold in the monitor section, but if the disk configuration is different on all or some servers, then the disk space threshold can be specified individually for each server.
For example, suppose server1, server2 and server3 are identical in all respects. In that case we can specify disk_space_threshold in the monitor.
[server1]
type=server
...
[server2]
type=server
...
[server3]
type=server
...
[monitor]
type=monitor
servers=server1,server2,server3
disk_space_threshold=/data:80
...
However, if the servers are heterogeneous with the disk used for the data directory mounted on different paths, then the disk space threshold must be specified separately for each server.
[server1]
type=server
disk_space_threshold=/data:80
...
[server2]
type=server
disk_space_threshold=/Data:80
...
[server3]
type=server
disk_space_threshold=/DBData:80
...
[monitor]
type=monitor
servers=server1,server2,server3
...
If most of the servers have the data directory disk mounted on the same path, then the disk space threshold can be specified on the monitor and separately on the server with a different setup.
[server1]
type=server
disk_space_threshold=/DbData:80
...
[server2]
type=server
...
[server3]
type=server
...
[monitor]
type=monitor
servers=server1,server2,server3
disk_space_threshold=/data:80
...
Above, server1 has the disk used for the data directory mounted at /DbData while both server2 and server3 have it mounted on /data and thus the setting in the monitor covers them both.
disk_space_check_interval
Type: duration
Mandatory: No
Dynamic: Yes
Default: 0s
This parameter specifies the minimum amount of time between disk space checks. If no explicit unit is provided, the value is interpreted as milliseconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. The default value is 0, which means that by default the disk space will not be checked.
Note that as the checking is made as part of the regular monitor interval cycle, the disk space check interval is affected by the value of monitor_interval. In particular, even if the value of disk_space_check_interval is smaller than that of monitor_interval, the checking will still take place at monitor_interval intervals.
script
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This command will be executed on a server state change. The parameter should be an absolute path to a command or the command should be in the executable path. The user running MaxScale should have execution rights to the file itself and the directory it resides in. The script may have placeholders which MaxScale will substitute with useful information when launching the script.
The placeholders and their substitution results are:
$INITIATOR -> IP and port of the server which initiated the event
$EVENT -> event description, e.g. "server_up"
$LIST -> list of IPs and ports of all servers
$NODELIST -> list of IPs and ports of all running servers
$SLAVELIST -> list of IPs and ports of all replica servers
$MASTERLIST -> list of IPs and ports of all primary servers
$SYNCEDLIST -> list of IPs and ports of all synced Galera nodes
$PARENT -> IP and port of the parent of the server which initiated the event. For primary-replica setups, this will be the primary if the initiating server is a replica.
$CHILDREN -> list of IPs and ports of the child nodes of the server which initiated the event. For primary-replica setups, this will be a list of replica servers if the initiating server is a primary.
The expanded variable value can be an empty string if no servers match the variable's requirements. For example, if no primaries are available $MASTERLIST will expand into an empty string. The list-type substitutions will only contain servers monitored by the current monitor.
script=/home/user/myscript.sh initiator=$INITIATOR event=$EVENT live_nodes=$NODELIST
The above script could be executed as:
/home/user/myscript.sh initiator=[192.168.0.10]:3306 event=master_down live_nodes=[192.168.0.201]:3306,[192.168.0.121]:3306
See section below for an example script.
Any output by the executed script will be logged into the MaxScale log. Each outputted line will be logged as a separate log message.
The log level on which the messages are logged depends on the format of the messages. If the first word in the output line is one of alert:, error:, warning:, notice:, info: or debug:, the message will be logged on the corresponding level. If the message is not prefixed with one of the keywords, the message will be logged on the notice level. Whitespace before, after or between the keyword and the colon is ignored and the matching is case-insensitive.
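For instance, a script could prefix its output so that MaxScale logs each line at the intended level (a minimal illustration):
echo "error: failed to send alert email"   # logged on the error level
echo "all servers notified"                # no prefix, logged on the notice level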
Currently, the script must not execute any of the following MaxCtrl calls as they cause a deadlock:
alter monitor to the monitor executing the script
stop monitor to the monitor executing the script
call command to a MariaDB-Monitor that is executing the script
script_timeout
Type: duration
Mandatory: No
Dynamic: Yes
Default: 90s
The timeout for the executed script. If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the timeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.
If the script execution exceeds the configured timeout, it is stopped by sending a SIGTERM signal to it. If the process does not stop, a SIGKILL signal will be sent to it once the execution time is greater than twice the configured timeout.
events
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: master_down, master_up, slave_down, slave_up, server_down, server_up, lost_master, lost_slave, new_master, new_slave
Default: All events
A list of event names which cause the script to be executed. If this option is not defined, all events cause the script to be executed. The list must contain a comma-separated list of event names.
events=master_down,slave_down
The following table contains all the possible event types and their descriptions.
Event | Description
master_down | A Primary server has gone down
master_up | A Primary server has come up
slave_down | A Replica server has gone down
slave_up | A Replica server has come up
server_down | A server with no assigned role has gone down
server_up | A server with no assigned role has come up
lost_master | A server lost Primary status
lost_slave | A server lost Replica status
new_master | A new Primary was detected
new_slave | A new Replica was detected
journal_max_age
Type: duration
Mandatory: No
Dynamic: Yes
Default: 28800s
The maximum journal file age. If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the max age is seconds, a max age specified in milliseconds will be rejected, even if the duration is longer than a second.
When the monitor starts, it reads any stored journal files. If the journal file is older than the value of journal_max_age, it will be removed and the monitor starts with no prior knowledge of the servers.
primary_state_sql
Type: string
Mandatory: No
Dynamic: Yes
Default: None
Defines custom SQL that is run on a primary server, i.e. a server with Master-status. The SQL is run on the primary when:
Monitor starts
Server gains Master-status
Server gained Master-status on a previous monitor tick but running the SQL failed due to a network failure
Use this setting to e.g. change global MariaDB Server settings depending on the server role. Multiple SQL queries can be set by combining them to a multiquery. If the query string spans multiple lines, each line after the first must start with empty space.
primary_state_sql="set global binlog_commit_wait_count=10;
 set global binlog_commit_wait_usec=1000;"
When testing this feature to see exactly when the queries are run, look for a log message like Monitor MyMonitor ran the SQL defined in primary_state_sql on MyServer1.
replica_state_sql
Type: string
Mandatory: No
Dynamic: Yes
Default: None
Similar to primary_state_sql, but for replica servers, i.e. servers with Slave-status.
replica_state_sql="set global binlog_commit_wait_count=0;
 set global binlog_commit_wait_usec=0;"
Starting with MaxScale 2.2.0, the monitor modules keep an on-disk journal of the latest server states. This change makes the monitors crash-safe when options that introduce states are used. It also allows the monitors to retain stateful information when MaxScale is restarted.
For MySQL monitor, options that introduce states into the monitoring process are the detect_stale_master and detect_stale_slave options, both of which are enabled by default. Galeramon has the disable_master_failback parameter which introduces a state.
The default location for the server state journal is in /var/lib/maxscale/<monitor name>/monitor.dat where <monitor name> is the name of the monitor section in the configuration file. If MaxScale crashes or is shut down in an uncontrolled fashion, the journal will be read when MaxScale is started. To skip the recovery process, manually delete the journal file before starting MaxScale.
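For example, assuming a monitor section named MyMonitor and the default data directory, the journal could be removed with:
rm /var/lib/maxscale/MyMonitor/monitor.dat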
Below is an example monitor configuration which launches a script with all supported substitutions. The example script reads the results and prints it to file and sends it as email.
[MyMonitor]
type=monitor
module=mariadbmon
servers=C1N1,C1N2,C1N3
user=maxscale
password=password
monitor_interval=10s
script=/path/to/maxscale_monitor_alert_script.sh --initiator=$INITIATOR --parent=$PARENT --children=$CHILDREN --event=$EVENT --node_list=$NODELIST --list=$LIST --master_list=$MASTERLIST --slave_list=$SLAVELIST --synced_list=$SYNCEDLIST
File "maxscale_monitor_alert_script.sh":
initiator="" parent="" children="" event="" node_list="" list="" master_list="" slave_list="" synced_list=""
process_arguments() { while [ "$1" != "" ]; do if [[ "$1" =~ ^--initiator=.* ]]; then initiator=${1#'--initiator='} elif [[ "$1" =~ ^--parent.* ]]; then parent=${1#'--parent='} elif [[ "$1" =~ ^--children.* ]]; then children=${1#'--children='} elif [[ "$1" =~ ^--event.* ]]; then event=${1#'--event='} elif [[ "$1" =~ ^--node_list.* ]]; then node_list=${1#'--node_list='} elif [[ "$1" =~ ^--list.* ]]; then list=${1#'--list='} elif [[ "$1" =~ ^--master_list.* ]]; then master_list=${1#'--master_list='} elif [[ "$1" =~ ^--slave_list.* ]]; then slave_list=${1#'--slave_list='} elif [[ "$1" =~ ^--synced_list.* ]]; then synced_list=${1#'--synced_list='} fi shift done }
process_arguments $@ read -r -d '' MESSAGE << EOM A server has changed state. The following information was provided:
Initiator: $initiator Parent: $parent Children: $children Event: $event Node list: $node_list List: $list Primary list: $master_list Replica list: $slave_list Synced list: $synced_list EOM
echo "$MESSAGE" > /path/to/script_output.txt
echo "$MESSAGE" | mail -s "MaxScale received $event event for initiator $initiator." mariadb_admin@domain.com |
This page is licensed: CC BY-SA / Gnu FDL
Monitor Galera Clusters with the galeramon module. This guide explains how to detect cluster state, assign read/write roles, and manage node priorities for effective load balancing.
The Galera Monitor monitors a Galera cluster. It detects whether nodes are part of the cluster and if they are in sync with the rest of the cluster. The monitor also assigns read and write roles to servers, allowing Galera clusters to be used with modules designed for traditional primary-replica clusters.
By default, the Galera Monitor will choose the node with the lowest wsrep_local_index value as the primary. This will mean that two MaxScales running on different servers will choose the same server as the primary.
The following WSREP variables are inspected by Galera Monitor to see whether a node is usable. If the node is not usable, it loses the Read and Write labels and will be in the Running state.
If wsrep_ready=0, the WSREP system is not yet ready and the Galera node cannot accept queries.
If wsrep_desync=1 is set, the node is desynced and is not participating in the Galera replication.
If wsrep_reject_queries=[ALL|ALL_KILL] is set, queries are refused and the node is unusable.
With wsrep_sst_donor_rejects_queries=1, donor nodes reject queries. Galera Monitor treats this the same as if wsrep_reject_queries=ALL was set.
If wsrep_local_state is not 4 (or 2 with available_when_donor=true), the node is not in the correct state and is not used.
MaxScale 2.4.0 added support for replicas replicating off of Galera nodes. If a non-Galera server monitored by Galera Monitor is replicating from a Galera node also monitored by the same monitor, it will be assigned the Read, Running status as long as the replication works. This allows read-scaleout with Galera servers without increasing the size of the Galera cluster.
The Galera Monitor requires the REPLICA MONITOR grant to work:
CREATE USER 'maxscale'@'maxscalehost' IDENTIFIED BY 'maxscale-password';
GRANT REPLICA MONITOR ON *.* TO 'maxscale-user'@'maxscalehost';
With MariaDB Server 10.4 and earlier, REPLICATION CLIENT is required instead:
GRANT REPLICATION CLIENT ON *.* TO 'maxscale-user'@'maxscalehost';
If set_donor_nodes is configured, the SUPER grant is required:
GRANT SUPER ON *.* TO 'maxscale'@'maxscalehost';
A minimal configuration for a Galera Monitor requires a set of servers and a username and a password to connect to these servers. The user must have grants as described in the required grants section.
[Galera-Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=myuser
password=mypwd
For a list of optional parameters that all monitors support, read the Common Monitor Parameters document.
disable_master_failback
Type: boolean
Default: false
Dynamic: Yes
If a node marked as primary inside MaxScale happens to fail and the primary status is assigned to another node, MaxScale will normally return the primary status to the original node after it comes back up. With this option enabled, if the primary status is assigned to a new node it will not be reassigned to the original node for as long as the new primary node is running. In this case the Master Stickiness status bit is set, which will be visible in the maxctrl list servers output.
available_when_donor
Type: boolean
Default: false
Dynamic: Yes
This option allows Galera nodes to be used normally when they are donors in an SST operation when the SST method is non-blocking (e.g. wsrep_sst_method=mariadb-backup).
Normally when an SST is performed, both participating nodes lose their Synced, Write or Read statuses. When this option is enabled, the donor is treated as if it was a normal member of the cluster (i.e. wsrep_local_state = 4). This is especially useful if the cluster drops down to one node and an SST is required to increase the cluster size.
The current list of non-blocking SST methods are xtrabackup, xtrabackup-v2 and mariadb-backup. Read the wsrep_sst_method documentation for more details.
disable_master_role_setting
Type: boolean
Default: false
Dynamic: Yes
This disables the assignment of primary and replica roles to the Galera cluster nodes. If this option is enabled, Synced is the only status assigned by this monitor.
use_priority
Type: boolean
Default: false
Dynamic: Yes
Enable interaction with server priorities. This will allow the monitor to deterministically pick the write node for the monitored Galera cluster and will allow for controlled node replacement.
root_node_as_master
Type: boolean
Default: false
Dynamic: Yes
This option controls whether the write primary Galera node requires wsrep_local_index value of 0. This option was introduced in MaxScale 2.1.0 and it is disabled by default in versions 2.1.5 and newer. In versions 2.1.4 and older, the option was enabled by default.
A Galera cluster will always have a node which has a wsrep_local_index value of 0. Based on this information, multiple MaxScale instances can always pick the same node for writes.
If the root_node_as_master option is disabled, the node with the lowest index will always be chosen as the primary. If it is enabled, only the node with a wsrep_local_index value of 0 can be chosen as the primary.
This parameter can work with disable_master_failback but using them together is not advisable: the intention of root_node_as_master is to make sure that all MaxScale instances that are configured to use the same Galera cluster will send writes to the same node. If disable_master_failback is enabled, this is no longer true: if the Galera cluster reorganizes itself in a way that a different node gets the node index 0, writes would still be going to the old node that previously had the node index 0. A restart of one of the MaxScales or a new MaxScale joining the cluster would then cause writes to be sent to the wrong node, increasing the rate of deadlock errors and causing sub-optimal performance.
set_donor_nodes
Type: boolean
Default: false
Dynamic: Yes
This option controls whether the global variable wsrep_sst_donor should be set in each cluster node with Read status. The variable contains a list of replica servers, automatically sorted, with possible primary candidates at its end.
The sorting is based either on wsrep_local_index or node server priority depending on the value of use_priority option. If no server has priority defined the sorting switches to wsrep_local_index. Node names are collected by fetching the result of the variable wsrep_node_name.
Example of variable being set in all replica nodes, assuming three nodes:
SET GLOBAL wsrep_sst_donor = "galera001,galera000"
The monitor user requires the SUPER privilege to set the global variable.
If the use_priority option is set and a server is configured with the priority=<int> parameter, the monitor will use that as the basis on which the primary node is chosen. This requires the disable_master_role_setting to be undefined or disabled. The server with the lowest positive value of priority will be chosen as the primary node when a replacement Galera node is promoted to a primary server inside MaxScale. If all candidate servers have the same priority, the order of the servers in the servers parameter dictates which is chosen as the primary.
Nodes with a negative value (priority < 0) will never be chosen as the primary. This allows you to mark some servers as permanent replicas by assigning a negative value into priority. Nodes with the default priority of 0 are only selected if no nodes with a higher priority are present and the normal node selection rules apply to them (i.e. selection is based on wsrep_local_index).
Here is an example:
[node-1]
type=server
address=192.168.122.101
port=3306
priority=1
[node-2]
type=server
address=192.168.122.102
port=3306
priority=3
[node-3]
type=server
address=192.168.122.103
port=3306
priority=2
[node-4]
type=server
address=192.168.122.104
port=3306
priority=-1
In this example node-1 is always used as the primary if available. If node-1 is not available, then the next node with the highest priority rank is used. In this case it would be node-3. If both node-1 and node-3 were down, then node-2 would be used. Because node-4 has a value of -1 in priority, it will never be the primary. Nodes without the priority parameter are considered as having a priority of 0 and will be used only if all nodes with a positive priority value are not available.
With priority ranks you can control the order in which MaxScale chooses the primary node. This will allow for a controlled failure and replacement of nodes.
Priorities can be used to force a runtime change of the primary server in a Galera Cluster. For example, if server1 has a priority of 1 and server2 a priority of 2 (with server1 being primary), the roles can be reversed with MaxCtrl:
maxctrl alter server server1 priority=2
maxctrl alter server server2 priority=1
This does not affect the Galera Cluster itself, just the roles MaxScale assigns to the servers. If multiple MaxScales monitor the same Galera Cluster without configuration synchronization, the commands should be run on all MaxScales.
Bootstrap (added in MaxScale 25.08.0) bootstraps an empty monitor (no servers), adding servers from an existing Galera cluster. Bootstrap requires the address of a server in the cluster to start from. The monitor connects to the address and reads the wsrep_incoming_addresses variable. The monitor adds the addresses listed as servers to both MaxScale and the monitor. Server names are auto-generated as in <monitor_name>-server, e.g. MyGaleraMonitor-server1.
Bootstrap accepts the following key-value arguments:
Argument | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
template | server | empty | Server settings template
address | string | none (mandatory) | Scan starting address
port | number | 3306 | Scan starting port
Any discovered servers are added to MaxScale as if created via runtime maxctrl create server .... The servers are thus similar to any other runtime configured server and are visible in the GUI and maxctrl list servers.
The address and port-settings of the discovered servers are set to the values listed in wsrep_incoming_addresses. Other settings are copied from the server given in the template-setting, so that the discovered servers inherit e.g. TLS settings. If no server template is given, discovered servers will use server default settings. The server template must be a valid, existing server in MaxScale configuration. It need not be monitored by any monitor and its address and port-settings can point to a non-existing (but theoretically valid) network address. It can be configured in the config file or created runtime:
maxctrl create server MyServerTemplate address=123.123.123.123 port=1111 ssl=true ssl_ca=/certs/ca.crt
An example bootstrap call using the template:
maxctrl call command galeramon bootstrap monitor=MyGaleraMonitor template=MyServerTemplate address=192.168.0.4
Bootstrap is incompatible with configuration synchronization and will refuse to run if it is enabled.
Discover-replicas (added in MaxScale 25.08.0) is a manual command which adds any missing Galera servers to MaxScale and the monitor. The servers are discovered by fetching the value of the wsrep_incoming_addresses status variable (which contains the addresses of all Galera nodes) from the primary server and comparing it to the values of wsrep_node_incoming_address (which contains just the address of the current node) of already monitored servers. Any addresses listed in wsrep_incoming_addresses but not found in any existing server are then assumed to be missing servers. The missing servers are then added to MaxScale configuration and the monitor. The command can also optionally remove servers that are shut down or not part of the Galera cluster.
Discover-replicas accepts the following key-value arguments:
Argument | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
remove | boolean | false | Remove shut down or non-replicating servers
Any discovered servers are added to MaxScale as if created via runtime maxctrl create server .... The servers are thus similar to any other runtime configured server and are visible in the GUI and maxctrl list servers. The address and port-settings of the discovered servers are set to the values listed in wsrep_incoming_addresses. Other settings are copied from the current primary server, so that the discovered servers inherit e.g. TLS settings. Discovered servers are named <monitor_name>-server, e.g. MyGaleraMonitor-server3.
A server can only be removed if it is not explicitly used by any other module, e.g. a service. Thus, this command is best used when services are configured with the cluster-setting as the services will then automatically match any changes in the set of monitored servers. For example:
maxctrl call command galeramon discover-replicas monitor=MyGaleraMonitor remove=true
This page is licensed: CC BY-SA / Gnu FDL
Manage primary-replica clusters with the mariadbmon module. Learn to configure automatic failover, perform switchovers, and monitor replication lag to maintain database availability.
MariaDB Monitor monitors a Primary-Replica replication cluster. It probes the state of the backends and assigns server roles such as primary and replica, which are used by the routers when deciding where to route a query. It can also modify the replication cluster by performing failover, switchover and rejoin.
The monitor user requires the following grant:
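As a sketch (user and host names are illustrative), basic monitoring needs the privilege to view replication status: REPLICA MONITOR on recent MariaDB versions, REPLICATION CLIENT on older ones:
CREATE USER 'maxscale'@'maxscalehost' IDENTIFIED BY 'maxscale-password';
GRANT REPLICA MONITOR ON *.* TO 'maxscale'@'maxscalehost';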
If the monitor needs to query server disk space (for instance, disk_space_threshold is set), it needs the FILE privilege:
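A sketch of the corresponding grant, using the same illustrative user:
GRANT FILE ON *.* TO 'maxscale'@'maxscalehost';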
The CONNECTION ADMIN privilege is recommended since it allows the monitor to log in even if server connection limit has been reached.
, and require the following privilege:
If cluster manipulation operations such as failover and switchover are used, the monitor requires several additional privileges. These privileges allow the monitor to set the read-only flag, modify replication connections and kill connections from clients that could interfere with an ongoing operation.
If handle_events is enabled, the monitor requires the EVENT privilege. SHOW DATABASES is also recommended to ensure the monitor can see events for all databases.
If a separate replication user is defined (with replication_user and replication_password), it requires the following grant:
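A sketch of such a grant, assuming the replication user is named 'replication':
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';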
Only one backend can be primary at any given time. A primary must be running (successfully connected to by the monitor) and its read_only-setting must be off. A primary may not be replicating from another server in the monitored cluster unless the primary is part of a multiprimary group. Primary selection prefers to select the server with the most replicas, possibly in multiple replication layers. Only replicas reachable by a chain of running relays or directly connected to the primary count. When multiple servers are tied for primary status, the server which appears earlier in the servers-setting of the monitor is selected.
Servers in a cyclical replication topology (multiprimary group) are interpreted as having all the servers in the group as replicas. Even from a multiprimary group only one server is selected as the overall primary.
After a primary has been selected, the monitor prefers to stick with the choice even if other potential primaries with more replica servers are available. Only if the current primary is clearly unsuitable does the monitor try to select another primary. An existing primary turns invalid if:
It is unwritable (read_only is on).
It has been down for more than failcount monitor passes and has no running replicas. Running replicas behind a downed relay count. A replica in this context is any server with at least a partially running replication connection (either io or sql thread is running). The replicas must also be down for more than failcount monitor passes to allow new master selection.
It did not previously replicate from another server in the cluster but it is now replicating.
It was previously part of a multiprimary group but is no longer, or the multiprimary group is replicating from a server not in the group.
Cases 1 and 2 cover the situations in which the DBA, an external script or even another MaxScale has modified the cluster such that the old primary can no longer act as primary. Cases 3 and 4 are less severe. In these cases the topology has changed significantly and the primary should be re-selected, although the old primary may still be the best choice.
The primary change described above is different from the failover and switchover operations described below. A primary change only modifies the server roles inside MaxScale but does not modify the cluster other than changing the targets of read and write queries. Failover and switchover perform a primary change on their own.
As a general rule, it's best to avoid situations where the cluster has multiple standalone servers, separate primary-replica pairs or separate multiprimary groups. Due to primary invalidation rule 2, a standalone primary can easily lose the primary status to another valid primary if it goes down. The new primary probably does not have the same data as the previous one. Non-standalone primaries are less vulnerable, as a single running replica or multiprimary group member will keep the primary valid even when down.
A minimal configuration for a monitor requires a set of servers for monitoring and a username and a password to connect to these servers.
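A minimal sketch of such a monitor section (server names and credentials are illustrative):
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxscale
password=maxscale-password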
From MaxScale 2.2.1 onwards, the module name is mariadbmon instead of mysqlmon. The old name can still be used.
The grants required by user depend on which monitor features are used. A full list of the grants can be found in the section.
For a list of optional parameters that all monitors support, read the Common Monitor Parameters document.
These are optional parameters specific to the MariaDB Monitor. Failover, switchover and rejoin-specific parameters are listed in their own sections, as are rebuild-related and ColumnStore parameters.
assume_unique_hostnames
Type: boolean
Mandatory: No
Dynamic: Yes
Default: true
When active, the monitor assumes that server hostnames and ports are consistent between the server definitions in the MaxScale configuration file and the "SHOW ALL SLAVES STATUS" outputs of the servers themselves. Specifically, the monitor assumes that if server A is replicating from server B, then A must have a replica connection with Master_Host and Master_Port equal to B's address and port in the configuration file. If this is not the case, e.g. an IP is used in the server while a hostname is given in the file, the monitor may misinterpret the topology. The monitor attempts name resolution on the addresses if a simple string comparison does not find a match. Using exact matching addresses is, however, more reliable. In MaxScale 24.02.0, an alternative IP or hostname for a server can be given in private_address.
This setting must be ON to use any cluster operation features such as failover or switchover, because MaxScale uses the addresses and ports in the configuration file when issuing "CHANGE MASTER TO"-commands.
If the network configuration is such that the addresses MaxScale uses to connect to backends are different from the ones the servers use to connect to each other and private_address is not used, assume_unique_hostnames should be set to OFF. In this mode, MaxScale uses server id:s it queries from the servers and the Master_Server_Id fields of the replica connections to deduce which server is replicating from which. This is not perfect though, since MaxScale doesn't know the id:s of servers it has never connected to (e.g. server has been down since MaxScale was started). Also, the Master_Server_Id-field may have an incorrect value if the replica connection has not been established. MaxScale will only trust the value if the monitor has seen the replica connection IO thread connected at least once. If this is not the case, the replica connection is ignored.
private_address
Type: string
This is an optional server setting, yet documented here since it's only used by MariaDB Monitor. If not set, the normal server address setting is used.
Defines an alternative IP-address or hostname for the server for use with replication. Whenever MaxScale modifies replication (e.g. during switchover), the private address is given as Master_Host to "CHANGE MASTER TO"-commands. Also, when detecting replication, any Master_Host-values from "SHOW SLAVE STATUS"-queries are compared to the private addresses of configured servers if the normal address doesn't match.
This setting is useful if replication and application traffic are separated to different network interfaces.
master_conditions
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: none, connecting_slave, connected_slave, running_slave, primary_monitor_master, disk_space_ok
Designate additional conditions for master-status, i.e. qualified for read and write queries.
Normally, if a suitable primary candidate server is found as described above, MaxScale designates it Master. master_conditions sets additional conditions for a primary server. This setting is an enum_mask, allowing multiple conditions to be set simultaneously. Conditions 2, 3 and 4 refer to replica servers. A single replica must fulfill all of the given conditions for the primary to be viable.
If the primary candidate fails master_conditions but fulfills slave_conditions, it may be designated Slave instead.
The available conditions are:
none : No additional conditions
connecting_slave : At least one immediate replica (not behind relay) is attempting to replicate or is replicating from the primary (Slave_IO_Running is 'Yes' or 'Connecting', Slave_SQL_Running is 'Yes'). A replica with incorrect replication credentials does not count. If the replica is currently down, results from the last successful monitor tick are used.
connected_slave : Same as above, with the difference that the replication connection must be up (Slave_IO_Running is 'Yes'). If the replica is currently down, results from the last successful monitor tick are used.
running_slave : Same as connecting_slave, with the addition that the replica must also be Running.
The default value of this setting is master_conditions=primary_monitor_master,disk_space_ok to ensure that both monitors use the same primary server when cooperating and that the primary is not out of disk space.
For example, to require that the primary must have a replica which is both connected and running, set master_conditions=connected_slave,running_slave.
slave_conditions
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: none, linked_master, running_master, writable_master
Designate additional conditions for Slave-status, i.e. qualified for read queries.
Normally, a server is Slave if it is at least attempting to replicate from the primary candidate or a relay (Slave_IO_Running is 'Yes' or 'Connecting', Slave_SQL_Running is 'Yes', valid replication credentials). The primary candidate does not necessarily need to be writable, e.g. if it fails its master_conditions. slave_conditions sets additional conditions for a replica server. This setting is an enum_mask, allowing multiple conditions to be set simultaneously.
The available conditions are:
none : No additional conditions. This is the default value.
linked_master : The replica must be connected to the primary (Slave_IO_Running and Slave_SQL_Running are 'Yes') and the primary must be Running. The same applies to any relays between the replica and the primary.
running_master : The primary must be running. Relays may be down.
writable_master : The primary must be writable, i.e. labeled Master.
For example, to require that the primary server of the cluster must be running and writable for any servers to have Slave-status, set slave_conditions=running_master,writable_master.
failcount
Type: number
Mandatory: No
Dynamic: Yes
Default: 5
Number of consecutive monitor passes a primary server must be down before it is considered failed. If automatic failover is enabled (auto_failover=true), it may be performed at this time. A value of 0 or 1 enables immediate failover.
If automatic failover is not possible, the monitor will try to search for another server to fulfill the primary role. See section for more details. Changing the primary may break replication as queries could be routed to a server without previous events. To prevent this, avoid having multiple valid primary servers in the cluster.
The worst-case delay between the primary failure and the start of the failover can be estimated by summing up the timeout values and monitor_interval and multiplying that by failcount:
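As a rough sketch, assuming backend_timeout covers the per-server communication timeouts:
(monitor_interval + backend_timeout) * failcount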
enforce_writable_master
Type: boolean
Mandatory: No
Dynamic: Yes
Default: false
If set to ON, the monitor attempts to disable the read_only-flag on the primary when seen. The flag is checked every monitor tick. The monitor user requires the SUPER-privilege for this feature to work.
Typically, the primary server should never be in read-only-mode. Such a situation may arise due to misconfiguration or accident, or perhaps if MaxScale crashed during switchover.
When this feature is enabled, setting the primary manually to read_only will no longer cause the monitor to search for another primary. The primary will instead for a moment lose its [Master]-status (no writes), until the monitor again enables writes on the primary. When starting from scratch, the monitor still prefers to select a writable server as primary if possible.
enforce_read_only_slaves
Type: boolean
Mandatory: No
Dynamic: Yes
Default: false
If set to ON, the monitor attempts to enable the read_only-flag on any writable replica server. The flag is checked every monitor tick. The monitor user requires the SUPER-privilege (or READ_ONLY ADMIN) for this feature to work. While the read_only-flag is ON, only users with the SUPER-privilege (or READ_ONLY ADMIN) can write to the backend server. If temporary write access is required, this feature should be disabled before attempting to disable read_only manually. Otherwise, the monitor will quickly re-enable it.
read_only won't be enabled on the master server, even if it has lost [Master]-status due to failing master_conditions and is marked [Slave].
enforce_read_only_servers
Type: boolean
Mandatory: No
Dynamic: Yes
Default: false
Works similarly to enforce_read_only_slaves, except it will set read_only on any writable server that is not the primary and not in maintenance (a superset of the servers altered by enforce_read_only_slaves).
The monitor user requires the SUPER-privilege (or READ_ONLY ADMIN) for this feature to work. If the cluster has no valid primary or primary candidate, read_only is not set on any server as it is unclear which servers should be altered.
maintenance_on_low_disk_space
Type: boolean
Mandatory: No
Dynamic: Yes
Default: true
If a running server that is not the primary or a relay primary is out of disk space, the server is set to maintenance mode. Such servers are not used for router sessions and are ignored when performing a failover or other cluster modification operation. See the general monitor parameters disk_space_threshold and disk_space_check_interval on how to enable disk space monitoring.
Once a server has been put to maintenance mode, the disk space situation of that server is no longer updated. The server will not be taken out of maintenance mode even if more disk space becomes available. The maintenance flag must be removed manually:
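For example, with MaxCtrl (the server name is illustrative):
maxctrl clear server server1 maintenance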
cooperative_monitoring_locks
Type: enum
Mandatory: No
Dynamic: Yes
Values: none, majority_of_all, majority_of_running
Using this setting is recommended when multiple MaxScales are monitoring the same backend cluster. When enabled, the monitor attempts to acquire exclusive locks on the backend servers. The monitor considers itself the primary monitor if it has a majority of locks. The majority can be either over all configured servers or just over running servers. See for more details on how this feature works and which value to use.
Allowed values:
none Default value, no locking.
majority_of_all Primary monitor requires a majority of locks, even counting servers which are [Down].
majority_of_running Primary monitor requires a majority of locks over [Running] servers.
This setting is separate from the global MaxScale setting passive. If passive is set to true, cluster operations are disabled even if monitor has acquired the locks. Generally, it's best not to mix cooperative monitoring with passive. Either set passive=false or do not set it at all.
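A minimal sketch of the setting in each cooperating MaxScale's monitor section (other settings omitted):
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
cooperative_monitoring_locks=majority_of_running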
servers_no_cooperative_monitoring_locks
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This is a comma-separated list of server names that will not take part in cooperative monitoring locking. MaxScale will not acquire locks on these servers and the servers will not add to the number of locks required for majority. If all servers are added to the list, MaxScale cannot claim lock majority at all. This setting does not affect primary server selection (i.e. Master-role), so a server in this list may be a valid target for write queries. See for limiting primary server selection.
If a server in this list is selected as the primary server and cooperative monitoring is enabled, the monitor will (if it is the primary monitor) still acquire the maxscale_mariadbmonitor_master lock on the server to ensure other MaxScales select the same primary server.
script_max_replication_lag
Type: number
Mandatory: No
Dynamic: Yes
Default: -1
Defines a replication lag limit in seconds for launching the monitor script configured in the script-parameter. If the replication lag of a server goes above this limit, the script is run with the $EVENT-placeholder replaced by "rlag_above". If the lag goes back below the limit, the script is run again with replacement "rlag_below".
Negative values disable this feature. For more information on monitor scripts, see .
MariaDB Monitor can perform several operations that modify the replication topology. The supported operations are:
Failover, which replaces a failed primary with a replica
Failover-safe, which replaces a failed primary with a replica only if no data is clearly lost
Switchover, which swaps a running primary with a replica
Switchover-force, which swaps a running primary with a replica, ignoring most errors. Can break replication.
See for more information on the implementation of the commands.
The cluster operations require that the monitor user (user) has the following privileges:
SUPER, to modify replica connections, set globals such as read_only and kill connections from other super-users
REPLICATION CLIENT (REPLICATION SLAVE ADMIN in MariaDB Server 10.5), to list replica connections
RELOAD, to flush binary logs
PROCESS, to check if the event_scheduler process is running
A list of the grants can be found in the section.
The privilege system was changed in MariaDB Server 10.5. The effects of this on the MaxScale monitor user are minor, as the SUPER-privilege contains many of the required privileges and is still required to kill connections from other super-users.
In MariaDB Server 11.0.1 and later, SUPER no longer contains all the required grants. The monitor requires:
READ_ONLY ADMIN, to set read_only
REPLICA MONITOR and REPLICATION SLAVE ADMIN, to view and manage replication connections
RELOAD, to flush binary logs
PROCESS, to check if the event_scheduler process is running
In addition, the monitor needs to know which username and password a replica should use when starting replication. These are given in replication_user and replication_password.
The user can define files with SQL statements which are executed on any server being demoted or promoted by cluster manipulation commands. See the sections on promotion_sql_file and demotion_sql_file for more information.
The monitor can manipulate scheduled server events when promoting or demoting a server. See the section on handle_events for more information.
All cluster operations can be activated manually through MaxCtrl. See section for more details.
See for information on possible issues with failover and switchover.
Failover replaces a failed primary with a running replica. It does the following:
Select the most up-to-date replica of the old primary to be the new primary. The selection criteria are as follows, in descending order of priority:
gtid_IO_pos (latest event in relay log)
gtid_current_pos (most processed events)
Failover is considered successful if steps 1 to 3 succeed, as the cluster then has at least a valid primary server.
Failover-safe performs the same steps as a normal failover but refuses to start if it's clear that data would be lost. Dataloss occurs if the primary had data which was not replicated to any replica before the primary went down. MaxScale detects this by looking at the GTIDs of the servers. Because the monitor queries the GTIDs only every monitor interval, this check is inaccurate. If the primary performs a write just before crashing and before MaxScale queries the GTID, data could be lost even with "safe" failover. Thus, this feature mainly protects against situations where the replicas are constantly lagging.
Switchover swaps a running primary with a running replica. It does the following:
Prepare the old primary for demotion:
If backend_read_timeout is short, extend it and reconnect.
Stop any external replication.
Enable the read_only-flag to stop writes from normal users.
Similar to failover, switchover is considered successful if the new primary was successfully promoted.
Switchover-force performs the same steps as a normal switchover but ignores any errors on the old primary. Switchover-force also does not expect the new primary to reach the gtid-position of the old, as the old primary could be receiving more events constantly. Thus, switchover-force may lose events and replication can break on multiple (or even all) replicas. This is an unsafe command and should only be used as a last resort.
Rejoin joins a standalone server to the cluster or redirects a replica replicating from a server other than the primary. A standalone server is joined by:
Run the commands in demotion_sql_file.
Enable the read_only-flag.
Disable scheduled server events (if event handling is on).
Start replication: CHANGE MASTER TO and START SLAVE.
A server which is replicating from the wrong primary is redirected simply with STOP SLAVE, RESET SLAVE, CHANGE MASTER TO and START SLAVE commands.
Redirect redirects a replica to replicate from another server in the cluster. It runs the STOP REPLICA, CHANGE MASTER TO and START REPLICA commands. This command is ineffective if auto_rejoin is on, as the monitor would quickly undo any changes.
Redirect accepts the following key-value arguments. conn_name is only required when dealing with multi-source replication.
Reset-replication (added in MaxScale 2.3.0) deletes binary logs and resets gtid:s. This destructive command is meant for situations where the gtid:s in the cluster are out of sync while the actual data is known to be in sync. The operation proceeds as follows:
Reset gtid:s and delete binary logs on all servers:
Stop (STOP SLAVE) and delete (RESET SLAVE ALL) all replica connections.
Enable the read_only-flag.
Disable scheduled server events (if event handling is on).
Scan-topology (added in MaxScale 25.08.0) scans the replication topology and outputs the results in json format. Topology scan begins by running SHOW ALL REPLICAS STATUS and SHOW REPLICA HOSTS on any existing monitored servers. These queries show connected primary and replica servers. The monitor then expands the search, performing the same queries on the discovered servers until no new servers can be found. This command can be useful in determining if all servers in the replication topology are configured in MaxScale and monitored.
Scan-topology accepts the following key-value arguments:
The resulting json-object contains an array with an element for each scanned server. For each server, the host, port, server id, primary server ids and replica server ids are listed. If the server is already configured in MaxScale, its name and monitor is listed.
Discover-replicas (added in MaxScale 25.08.0) scans the replication topology (as in scan-topology) and adds any new discovered servers to MaxScale and the monitor. Only servers directly replicating from the current primary server are added, i.e. any external primaries or replicas behind relays are ignored. The command can also optionally remove servers that are shut down or non-replicating.
Discover-replicas accepts the following key-value arguments:
Any discovered servers are added to MaxScale as if created via runtime maxctrl create server .... The servers are thus similar to any other runtime configured server and are visible in the GUI and maxctrl list servers. The address and port-settings of the discovered servers are set to the values returned by SHOW REPLICA HOSTS. Other settings are copied from the current primary server, so that the discovered servers inherit e.g. TLS settings. The generated servers are named <monitor_name>-server, e.g. MyMonitor-server3.
A server can only be removed if it is not explicitly used by any other module, e.g. a service. Thus, this command is best used when services are configured with the cluster-setting as the services will then automatically match any changes in the set of monitored servers.
Discover-replicas is incompatible with configuration synchronization and will refuse to run if it is enabled.
Bootstrap (added in MaxScale 25.08.0) bootstraps an empty monitor (no servers), adding servers to it. Bootstrap requires the address of a server in the cluster to start from. The monitor connects to the address given and scans the replication topology as in scan-topology. Any server successfully connected to is added to the monitor and monitored normally. Server names are auto-generated as in <monitor_name>-server, e.g. MyMonitor-server3.
Bootstrap accepts the following key-value arguments:
Any discovered servers are added to MaxScale as if created via runtime maxctrl create server .... The servers are thus similar to any other runtime configured server and are visible in the GUI and maxctrl list servers.
The address and port-settings of the discovered servers are set to the values returned by SHOW REPLICA HOSTS or SHOW REPLICA STATUS. Other settings are copied from the server given in the template-setting, so that the discovered servers inherit e.g. TLS settings. If no server template is given, discovered servers will use server default settings. The server template must be a valid, existing server in MaxScale configuration. It need not be monitored by any monitor and its address and port-settings can point to a non-existing (but theoretically valid) network address. It can be configured in the config file or created runtime:
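As with the Galera Monitor's bootstrap command, a template server could be created at runtime along these lines (address, port and TLS options are illustrative):
maxctrl create server MyServerTemplate address=123.123.123.123 port=1111 ssl=true ssl_ca=/certs/ca.crt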
Only simple topologies (i.e. one primary and zero or more replicas) are supported. MaxScale refuses to attempt this command on a more complicated replication setup. This policy helps keep the effects of this command predictable, as more complicated setups may include e.g. external primaries that should not be added to the monitor.
Bootstrap is incompatible with configuration synchronization and will refuse to run if it is enabled.
Cluster operations can be activated manually through the REST API or MaxCtrl. The commands are only performed when MaxScale is in active mode. The commands generally match their automatic versions. The exception is rejoin, in which the manual command allows rejoining even when the joining server has empty gtid:s. This rule allows the user to force a rejoin on a server without binary logs.
All commands require the monitor instance name as the first parameter. Failover selects the new primary server automatically and does not require additional parameters. Rejoin requires the name of the joining server as second parameter. Replication reset accepts the name of the new primary server as second parameter. If not given, the current primary is selected.
Switchover takes one to three parameters. If only the monitor name is given, switchover will autoselect both the replica to promote and the current primary as the server to be demoted. If two parameters are given, the second parameter is interpreted as the replica to promote. If three parameters are given, the third parameter is interpreted as the current primary. The user-given current primary is compared to the primary server currently deduced by the monitor and if the two are unequal, an error is given.
Example commands are below:
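A sketch of the calls, using the module command syntax and the server names referenced below:
maxctrl call command mariadbmon failover MyMonitor
maxctrl call command mariadbmon switchover MyMonitor NewPrimaryServ OldPrimaryServ
maxctrl call command mariadbmon rejoin MyMonitor OldPrimaryServ
maxctrl call command mariadbmon reset-replication MyMonitor NewPrimaryServ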
The commands follow the standard module command syntax. All require the monitor configuration name (MyMonitor) as the first parameter. For switchover, the last two parameters define the server to promote (NewPrimaryServ) and the server to demote (OldPrimaryServ). For rejoin, the server to join (OldPrimaryServ) is required. Replication reset requires the server to promote (NewPrimaryServ).
It is safe to perform manual operations even with automatic failover, switchover or rejoin enabled since automatic operations cannot happen simultaneously with manual ones.
When a cluster modification is initiated via the REST-API, the URL path is of the form:
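A sketch of the path, assuming the standard module-command endpoint:
/v1/maxscale/modules/mariadbmon/<operation>?<monitor-name>&<server-name1>&<server-name2>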
<operation> is the name of the command e.g. failover, switchover, rejoin or reset-replication.
<monitor-name> is the monitor name from the MaxScale configuration file.
<server-name1> and <server-name2> are server names as described above for MaxCtrl. Only switchover accepts both, failover doesn't need any and both rejoin and reset-replication accept one.
Given a MaxScale configuration file like
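For illustration, assume a configuration along these lines (the monitor name is hypothetical):
[Cluster1]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
...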
with the assumption that server2 is the current primary, then the URL path for making server4 the new primary would be:
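With the hypothetical monitor name above, the path would be roughly:
/v1/maxscale/modules/mariadbmon/switchover?Cluster1&server4&server2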
Example REST-API paths for other commands are listed below.
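Following the same pattern (monitor and server names as above), the other operations would look roughly like:
/v1/maxscale/modules/mariadbmon/failover?Cluster1
/v1/maxscale/modules/mariadbmon/rejoin?Cluster1&server4
/v1/maxscale/modules/mariadbmon/reset-replication?Cluster1&server4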
Most cluster modification commands wait until the operation either succeeds or fails. async-switchover is an exception, as it returns immediately. Otherwise async-switchover works identical to a normal switchover command. Use the module command fetch-cmd-result to view the result of the queued command. fetch-cmd-result returns the status or result of the latest manual command, whether queued or not.
As of MaxScale 24.08.0, switchover can be launched using an alternate command syntax which passes arguments as key-value pairs. This allows for greater flexibility as a variable number of arguments can be easily defined in the call. This alternative form of switchover accepts the following arguments:
The key-value syntax thus supports the same features as the old switchover, async-switchover and switchover-force commands.
In addition, key-value argument passing supports old_primary_maint. This feature leaves the old primary server in maintenance mode without replication. This is useful when performing rolling MariaDB Server version upgrades. After all replicas have been upgraded, switch out the old primary with old_primary_maint=1 to promote one of the replicas while leaving the old primary as standalone.
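A hedged example of the key-value form, assuming a monitor named MyMonitor and a replica server4 to promote:
maxctrl call command mariadbmon async-switchover monitor=MyMonitor new_primary=server4 old_primary_maint=1
After the command returns, maxctrl call command mariadbmon fetch-cmd-result MyMonitor shows whether the queued switchover succeeded.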
Failover can activate automatically if auto_failover is on. The activation begins when the primary has been down at least failcount monitor iterations. Before modifying the cluster, the monitor checks that all prerequisites for the failover are fulfilled. If the cluster does not seem ready, an error is printed and the cluster is rechecked during the next monitor iteration.
Switchover can also activate automatically with the switchover_on_low_disk_space setting. The operation begins if the primary server is low on disk space but otherwise the operating logic is quite similar to automatic failover.
Rejoin stands for starting replication on a standalone server or redirecting a replica replicating from the wrong primary (any server that is not the cluster primary). The rejoined servers are directed to replicate from the current cluster primary server, forcing the replication topology to a 1-primary-N-replicas configuration.
A server is categorized as standalone if the server has no replica connections, not even stopped ones. A server is replicating from the wrong primary if the replica IO thread is connected but the primary server id seen by the replica does not match the cluster primary id. Alternatively, the IO thread may be stopped or connecting but the primary server host or port information differs from the cluster primary info. These criteria mean that a STOP SLAVE does not yet set a replica as standalone.
With auto_rejoin active, the monitor will try to rejoin any servers matching the above requirements. Rejoin does not obey failcount and will attempt to rejoin any valid servers immediately. When activating rejoin manually, the user-designated server must fulfill the same requirements.
Switchover and failover are meant for simple topologies (one primary and several replicas). Using these commands with complicated topologies (multiple primaries, relays, circular replication) may give unpredictable results and should be tested before use on a production system.
The server cluster is assumed to be well-behaving with no significant replication lag (within failover_timeout/switchover_timeout) and all commands that modify the cluster (such as "STOP SLAVE", "CHANGE MASTER", "START SLAVE") complete in a few seconds (faster than backend_read_timeout and backend_write_timeout).
The backends must all use GTID-based replication, and the domain id should not change during a switchover or failover. Replicas should not have extra local events so that GTIDs are compatible across the cluster.
Failover cannot be performed if MaxScale was started only after the primary server went down. This is because MaxScale needs reliable information on the gtid domain of the cluster and the replication topology in general to properly select the new primary. enforce_simple_topology=1 relaxes this requirement.
Failover may lose events. If a primary goes down before sending new events to at least one replica, those events are lost when a new primary is chosen. If the old primary comes back online, the other servers have likely moved on with a diverging history and the old primary can no longer join the replication cluster.
To reduce the chance of losing data, use semisynchronous replication. In semisynchronous mode, the primary waits for a replica to receive an event before returning an acknowledgement to the client. This does not yet guarantee a clean failover. If the primary fails after preparing a transaction but before receiving replica acknowledgement, it will still commit the prepared transaction as part of its crash recovery. If the replicas never saw this transaction, the old primary has diverged from the cluster. See the semisynchronous replication documentation for more information. This situation is much less likely in MariaDB Server 10.6.2 and later, as the improved crash recovery logic will delete such transactions.
Even a controlled shutdown of the primary may lose events. The server does not by default wait for all data to be replicated to the replicas when shutting down and instead simply closes all connections. Before shutting down the primary with the intention of having a replica promoted, run switchover first to ensure that all data is replicated. For more information on server shutdown, see the MariaDB Server documentation.
Switchover requires that the cluster is "frozen" for the duration of the operation. This means that no data modifying statements such as INSERT or UPDATE are executed and the GTID position of the primary server is stable. When switchover begins, the monitor sets the global read_only flag on the old primary backend to stop any updates. read_only does not affect users with the SUPER-privilege so any such user can issue writes during a switchover. These writes have a high chance of breaking replication, because the write may not be replicated to all replicas before they switch to the new primary. To prevent this, any users who commonly do updates should NOT have the SUPER-privilege. For even more security, the only SUPER-user session during a switchover should be the MaxScale monitor user. This also applies to users running scheduled server events. Although the monitor by default disables events on the master, an event may already be executing. If the event definer has SUPER-privilege, the event can write to the database even through read_only.
When mixing rejoin with failover/switchover, the backends should have log_slave_updates on. The rejoining server is likely lagging behind the rest of the cluster. If the current cluster primary does not have binary logs from the moment the rejoining server lost connection, the rejoining server cannot continue replication. This is an issue if the primary has changed and the new primary does not have log_slave_updates on.
If an automatic cluster operation such as auto-failover or auto-rejoin fails, all cluster modifying operations are disabled for failcount monitor iterations, after which the operation may be retried. Similar logic applies if the cluster is unsuitable for such operations, e.g. replication is not using GTID.
The monitor detects if a server in the cluster is replicating from an external primary (a server that is not monitored by the monitor). If the replicating server is the cluster primary server, then the cluster itself is considered to have an external primary.
If a failover/switchover happens, the new primary server is set to replicate from the cluster external primary server. The username and password for the replication are defined in replication_user and replication_password. The address and port used are the ones shown by SHOW ALL SLAVES STATUS on the old cluster primary server. In the case of switchover, the old primary also stops replicating from the external server to preserve the topology.
After failover the new primary is replicating from the external primary. If the failed old primary comes back online, it is also replicating from the external server. To normalize the situation, either have auto_rejoin on or manually execute a rejoin. This will redirect the old primary to the current cluster primary.
auto_failoverType:
Mandatory: No
Dynamic: Yes
Values: true, on, yes, 1, false, off, no, 0, safe
Default: false
Enable automatic primary failover. true, on, yes and 1 enable normal failover. false, off, no and 0 disable the feature. safe enables failover in safe mode.
When automatic failover is enabled, MaxScale will elect a new primary server for the cluster if the old primary goes down. A server is assumed Down if it cannot be connected to, even if this is caused by incorrect credentials. Failover triggers once the primary has stayed down for failcount monitor intervals. Failover will not take place if MaxScale is in passive mode.
As failover alters replication, it requires more privileges than normal monitoring. See the grants listed later in this document.
Failover is designed to be used with simple primary-replica topologies. More complicated topologies, such as multilayered or circular replication, are not guaranteed to always work correctly. Test before using failover with such setups.
auto_rejoinType:
Mandatory: No
Dynamic: Yes
Default: false
Enable automatic joining of servers to the cluster. When enabled, MaxScale will attempt to direct servers to replicate from the current cluster primary if they are not currently doing so. Replication will be started on any standalone servers. Servers that are replicating from another server will be redirected. This effectively enforces a 1-primary-N-replicas topology. The current primary itself is not redirected, so it can continue to replicate from an external primary. Rejoin is also not performed on any server that is replicating from multiple sources, as this indicates a complicated topology (this rule is overridden by enforce_simple_topology).
This feature is often paired with auto_failover to redirect the former primary when it comes back online. Sometimes this kind of rejoin will fail as the old primary may have transactions that were never replicated to the current one. See the failover limitations above for more information.
As an example, consider the following series of events:
Replica A goes down
Primary goes down and a failover is performed, promoting Replica B
Replica A comes back
Old primary comes back
Replica A is still trying to replicate from the downed primary, since it wasn't online during failover. If auto_rejoin is on, Replica A will quickly be redirected to Replica B, the current primary. The old primary will also rejoin the cluster if possible.
auto_failback_switchoverType:
Mandatory: No
Dynamic: Yes
Default: false
When enabled, the monitor will automatically switch back to the original primary (the server that was failed over from) once it rejoins the cluster. Both auto_failover and auto_rejoin should be enabled for this feature to work properly. If these features are not enabled, failback will only activate once failover and rejoin have been performed manually.
The monitor keeps track of the failback primary separately from the current primary. The failback primary is updated if an external topology change, a switchover or a replication reset causes a primary server change. Failover does not change the failback primary. The current failback primary is listed in monitor diagnostics. Run maxctrl show monitors and look for failback_primary.
Once the failback primary rejoins the cluster as a replica, a counter starts. The failback primary must stay online and replicate without interruption for a number of monitor ticks. It must also catch up with the current primary server, at least to the gtid the current primary had when the failback primary rejoined. Replication delay must also be low, typically at most five seconds. Once these conditions are met, the monitor runs switchover to restore the failback primary to the primary role.
The following series of events demonstrates failback switchover:
Cluster includes primary P, replicas R1 and R2.
P goes down and stays down long enough for failover to trigger. R1 is new primary.
R1 also goes down, failover triggers again. R2 is now primary and the only server left running.
R1 comes back up. Monitor rejoins it to the cluster, so that R1 replicates from R2.
switchover_on_low_disk_spaceType:
Mandatory: No
Dynamic: Yes
Default: false
If enabled, the monitor will attempt to switchover a primary server low on disk space with a replica. The switch is only done if a replica without disk space issues is found. If maintenance_on_low_disk_space is also enabled, the old primary (now a replica) will be put to maintenance during the next monitor iteration.
For this parameter to have any effect, disk_space_threshold must be specified for the server or the monitor. Also, disk_space_check_interval must be defined for the monitor.
enforce_simple_topologyType:
Mandatory: No
Dynamic: Yes
Default: false
This setting tells the monitor to assume that the servers should be arranged in a 1-primary-N-replicas topology and the monitor should try to keep it that way. If enforce_simple_topology is enabled, the settings assume_unique_hostnames, auto_failover and auto_rejoin are also activated regardless of their individual settings.
By default, mariadbmon will not rejoin servers with more than one replication stream configured into the cluster. Starting with MaxScale 6.2.0, when enforce_simple_topology is enabled, all servers will be rejoined into the cluster and any extra replication sources will be removed. This is done to make automated failover with multi-source external replication possible.
This setting also allows the monitor to perform a failover to a cluster where the primary server has not been seen [Running]. This is usually the case when the primary goes down before MaxScale is started. When using this feature, the monitor will guess the GTID domain id of the primary from the replicas. For reliable results, the GTID:s of the cluster should be simple.
replication_userType: string
Mandatory: No
Dynamic: Yes
Default: None
This and replication_password specify the username and password of the replication user. These are given as the values for MASTER_USER and MASTER_PASSWORD whenever a CHANGE MASTER TO command is executed.
Both the replication_user and replication_password parameters must be defined if a custom replication user is used. If neither of the parameters is defined, the CHANGE MASTER TO command will use the monitor credentials for the replication user.
The credentials used for replication must have the REPLICATION SLAVE privilege.
replication_password uses the same encryption scheme as other password parameters. If password encryption is in use, replication_password must be encrypted with the same key to avoid erroneous decryption.
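A minimal configuration sketch, assuming a replication user named repl (the username and password are placeholders):
replication_user=repl
replication_password=repl-pw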
replication_passwordType: string
Mandatory: No
Dynamic: Yes
Default: None
See replication_user.
replication_master_sslType:
Mandatory: No
Dynamic: Yes
Default: false
If set to ON, any CHANGE MASTER TO-command generated will set MASTER_SSL=1 to enable encryption for the replication stream. This setting should only be enabled if the backend servers are configured for ssl. This typically means setting ssl_ca, ssl_cert and ssl_key in the server configuration file. Additionally, credentials for the replication user should require an encrypted connection (e.g. ALTER USER repl@'%' REQUIRE SSL;).
If the setting is left OFF, MASTER_SSL is not set at all, which will preserve existing settings when redirecting a replica connection.
replication_custom_optionsType: string
A custom string added to "CHANGE MASTER TO"-commands sent by the monitor whenever setting up replication (e.g. during switchover). Useful for defining ssl certificates or other specialized replication options. MaxScale does not check the contents of the string, so care should be taken to ensure that only valid options are set and that the contents do not interfere with the options MaxScale sets on its own (e.g. MASTER_HOST). This setting can also be configured for an individual server. If configured for both the monitor and a server, the server setting takes priority.
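As an illustration only (the option values below are placeholders, not defaults), the setting could carry SSL options for the replication stream:
replication_custom_options=MASTER_SSL_CA='/certs/ca.pem', MASTER_SSL_VERIFY_SERVER_CERT=1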
failover_timeoutType:
Mandatory: No
Dynamic: Yes
Default: 90s
Time limit for failover operation. Note that since the granularity of the timeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.
If no successful failover takes place within the configured time period, a message is logged and automatic failover is disabled. This prevents further automatic modifications to the misbehaving cluster.
switchover_timeoutType:
Mandatory: No
Dynamic: Yes
Default: 90s
Time limit for switchover operations. The timeout is also used as the time limit for a rejoin operation. Rejoin should rarely time out, since it is a faster operation than switchover. Note that since the granularity of the timeouts is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.
verify_master_failureType:
Mandatory: No
Dynamic: Yes
Default: true
Enable additional primary failure verification for automatic failover. verify_master_failure enables this feature and master_failure_timeout defines the timeout.
Note that since the granularity of the timeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.
Failure verification is performed by checking whether the replica servers are still connected to the primary and receiving events. An event is either a change in the Gtid_IO_Pos field of the SHOW SLAVE STATUS output or a heartbeat event. Effectively, if a replica has received an event within master_failure_timeout, the primary is not considered down when deciding whether to failover, even if MaxScale cannot connect to the primary. master_failure_timeout should be longer than the Slave_heartbeat_period of the replica connection to be effective.
If every replica loses its connection to the primary (Slave_IO_Running is not "Yes"), primary failure is considered verified regardless of timeout. This allows faster failover when the primary properly disconnects.
For automatic failover to activate, the failcount requirement must also be met.
master_failure_timeoutType:
Mandatory: No
Dynamic: Yes
Default: 10s
master_failure_timeout is specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the timeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.
servers_no_promotionType: string
Mandatory: No
Dynamic: Yes
Default: None
This is a comma-separated list of server names that will not be chosen for primary promotion during a failover or autoselected for switchover. This does not affect switchover if the user selects the server to promote. Using this setting can disrupt new primary selection for failover such that a non-optimal server is chosen. At worst, this will cause replication to break. Alternatively, failover may fail if all valid promotion candidates are in the exclusion list.
As of MaxScale 24.02.4 and 24.08.1, this setting also affects primary server selection during MaxScale startup or due to replication topology changes. A server listed in servers_no_promotion will thus not be selected as primary unless manually designated in a switchover-command.
promotion_sql_fileType: string
Mandatory: No
Dynamic: Yes
Default: None
This and demotion_sql_file are paths to text files with SQL statements in them. During promotion or demotion, the contents are read line-by-line and executed on the backend. Use these settings to execute custom statements on the servers to complement the built-in operations.
Empty lines or lines starting with '#' are ignored. Any results returned by the statements are ignored. All statements must succeed for the failover, switchover or rejoin to continue. The monitor user may require additional privileges and grants for the custom commands to succeed.
When promoting a replica to primary during switchover or failover, the promotion_sql_file is read and executed on the new primary server after its read-only flag is disabled. The commands are run before starting replication from an external primary, if one exists.
demotion_sql_file is run on an old primary during demotion to replica, before the old primary starts replicating from the new primary. The file is also run before rejoining a standalone server to the cluster, as the standalone server is typically a former primary server. When redirecting a replica replicating from a wrong primary, the sql-file is not executed.
Since the queries in the files are run during operations which modify replication topology, care is required. If promotion_sql_file contains data modification (DML) queries, the new primary server may not be able to successfully replicate from an external primary. demotion_sql_file should never contain DML queries, as these may not replicate to the replica servers before replica threads are stopped, breaking replication.
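A sketch of how the files could be configured and written, with placeholder paths and a harmless non-DML statement:
promotion_sql_file=/home/root/scripts/promotion.sql
demotion_sql_file=/home/root/scripts/demotion.sql
# Example promotion.sql contents: lines starting with '#' are ignored, one statement per line.
FLUSH LOGS;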
demotion_sql_fileType: string
Mandatory: No
Dynamic: Yes
Default: None
See promotion_sql_file.
handle_eventsType:
Mandatory: No
Dynamic: Yes
Default: true
If enabled, the monitor continuously queries the servers for enabled scheduled events and uses this information when performing cluster operations, enabling and disabling events as appropriate.
When a server is being demoted, any events with "ENABLED" status are set to "SLAVESIDE_DISABLED". When a server is being promoted to primary, events that are either "SLAVESIDE_DISABLED" or "DISABLED" are set to "ENABLED" if the same event was also enabled on the old primary server last time it was successfully queried. Events are considered identical if they have the same schema and name. When a standalone server is rejoined to the cluster, its events are also disabled since it is now a replica.
The monitor does not check whether the same events were disabled and enabled during a switchover or failover/rejoin. All events that meet the criteria above are altered.
The monitor does not enable or disable the event scheduler itself. For the events to run on the new primary server, the scheduler should be enabled by the admin. Enabling it in the server configuration file is recommended.
Events running at high frequency may cause replication to break in a failover scenario. If an old primary which was failed over restarts, its event scheduler will be on if set in the server configuration file. Its events will also remember their "ENABLED"-status and run when scheduled. This may happen before the monitor rejoins the server and disables the events. This should only be an issue for events running more often than the monitor interval or events that run immediately after the server has restarted.
check_repl_on_stop_slave_timeoutType:
Mandatory: No
Dynamic: Yes
Default: false
Enables additional checks when a STOP SLAVE command times out during a cluster manipulation operation such as failover or switchover. Normally, if STOP SLAVE times out, the monitor just tries again until time runs out. With this setting enabled, the monitor additionally checks replication connection status with SHOW ALL SLAVES STATUS. If replication has properly ended, the monitor assumes STOP SLAVE completed successfully and continues with the operation. If replication is still ongoing, the monitor prints the slave thread running states and retries STOP SLAVE.
As of MaxScale 2.5, MariaDB-Monitor supports cooperative monitoring. This means that multiple monitors (typically in different MaxScale instances) can monitor the same backend server cluster and only one will be the primary monitor. Only the primary monitor may perform switchover, failover or rejoin operations. The primary also decides which server is the primary. Cooperative monitoring is enabled with the cooperative_monitoring_locks setting. Even with this setting, only one monitor per server per MaxScale is allowed. This limitation can be circumvented by defining multiple copies of a server in the configuration file.
Cooperative monitoring uses server locks for coordinating between monitors. When cooperating, the monitor regularly checks the status of a lock named maxscale_mariadbmonitor on every server and acquires it if free. If the monitor acquires a majority of locks, it is the primary. If a monitor cannot claim majority locks, it is a secondary monitor.
The primary monitor of a cluster also acquires the lock maxscale_mariadbmonitor_master on the primary server. Secondary monitors check which server this lock is taken on and only accept that server as the primary. This arrangement is required so that multiple monitors can agree on which server is the primary regardless of replication topology. If a secondary monitor does not see the primary-lock taken, then it won't mark any server as [Master], causing writes to fail.
The cooperative_monitoring_locks setting defines how many locks are required for primary status. Setting cooperative_monitoring_locks=majority_of_all means that the primary monitor needs n_servers/2 + 1 (rounded down) locks. For example, a cluster of three servers needs two locks for majority, a cluster of four needs three, and a cluster of five needs three. This scheme is resistant against split-brain situations in the sense that multiple monitors cannot be primary simultaneously. However, a split may cause both monitors to consider themselves secondary, in which case a primary server won't be detected.
Even without a network split, cooperative_monitoring_locks=majority_of_all will lead to neither monitor claiming lock majority once too many servers go down. This scenario is depicted in the image below. Only two out of four servers are running when three are needed for majority. Although both MaxScales see both running servers, neither is certain they have majority and the cluster stays in read-only mode. If the primary server is down, no failover is performed either.
Setting cooperative_monitoring_locks=majority_of_running changes the way n_servers is calculated. Instead of using the total number of servers, only servers currently [Running] are considered. This scheme adapts to multiple servers going down, ensuring that claiming lock majority is always possible. However, it can lead to multiple monitors claiming primary status in a split-brain situation. As an example, consider a cluster with servers 1 to 4 with MaxScales A and B, as in the image below. MaxScale A can connect to servers 1 and 2 (and claim their locks) but not to servers 3 and 4 due to a network split. MaxScale A thus assumes servers 3 and 4 are down. MaxScale B does the opposite, claiming servers 3 and 4 and assuming 1 and 2 are down. Both MaxScales claim two locks out of two available and assume that they have lock majority. Both MaxScales may then promote their own primaries and route writes to different servers.
The recommended strategy depends on which failure scenario is more likely and/or more destructive. If it's unlikely that multiple servers are ever down simultaneously, then majority_of_all is likely the safer choice. On the other hand, if split-brain is unlikely but multiple servers may be down simultaneously, then majority_of_running would keep the cluster operational.
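A configuration sketch for two cooperating MaxScales, both monitoring the same servers (the value shown is illustrative):
cooperative_monitoring_locks=majority_of_all
Using majority_of_running instead would favor availability over split-brain protection, as described above.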
To check if a monitor is primary, fetch monitor diagnostics with maxctrl show monitors or the REST API. The boolean field primary indicates whether the monitor has lock majority on the cluster. If cooperative monitoring is disabled, the field value is null. Lock information for individual servers is listed in the server-specific field lock_held. Again, null indicates that locks are not in use or the lock status is unknown.
If a MaxScale instance tries to acquire the locks but fails to get majority (perhaps another MaxScale was acquiring locks simultaneously) it will release any acquired locks and try again after a random number of monitor ticks. This prevents multiple MaxScales from fighting over the locks continuously as one MaxScale will eventually wait less time than the others. Conflict probability can be further decreased by configuring each monitor with a different monitor_interval.
The flowchart below illustrates the lock handling logic.
Monitor cooperation depends on the server locks. The locks are connection-specific. The owning connection can manually release a lock, allowing another connection to claim it. Also, if the owning connection closes, the MariaDB Server process releases the lock. How quickly a lost connection is detected affects how quickly the primary monitor status moves from one monitor and MaxScale to another.
If the primary MaxScale or its monitor is stopped normally, the monitor connections are properly closed, releasing the locks. This allows the secondary MaxScale to quickly claim the locks. However, if the primary simply vanishes (broken network), the connection may just look idle. In this case, the MariaDB Server may take a long time before it considers the monitor connection lost. This time ultimately depends on TCP keepalive settings on the machines running MariaDB Server.
On MariaDB Server 10.3.3 and later, the TCP keepalive settings can be configured for just the server process. See the MariaDB Server documentation for information on the settings tcp_keepalive_interval, tcp_keepalive_probes and tcp_keepalive_time. These settings can also be set on the operating system level.
As of MaxScale 6.4.16, 22.08.13, 23.02.10, 23.08.6 and 24.02.2, configuring TCP keepalive is no longer necessary as monitor sets the session wait_timeout variable when acquiring a lock. This causes the MariaDB Server to close the monitor connection if the connection appears idle for too long. The value of wait_timeout used depends on the monitor interval and connection timeout settings, and is logged at MaxScale startup.
A monitor can also be ordered to manually release its locks via the module command release-locks. This is useful for manually changing the primary monitor. After running the release-command, the monitor will not attempt to reacquire the locks for one minute, even if it wasn't the primary monitor to begin with. This command can cause the cluster to become temporarily unusable by MaxScale. Only use it when there is another monitor ready to claim the locks.
Some backend failures are not observable just by connecting to the server and running standard monitor queries. A server may be connectable and respond to queries sent by the monitor, but its disk could be full or malfunctioning or the storage engine could be locked in some way. Normally, MariaDB Monitor would consider such a server to be in good health, even if in reality the server could not perform any writes.
To detect such errors, MariaDB Monitor can be configured to perform a regular write test if the gtid_binlog_pos of the primary server is not advancing otherwise. Testing that writes are going through and are being saved to the binary log increases the chance of detecting storage failures. The monitor can also be configured to perform a failover if the primary server fails the write test. Even this test may miss storage issues, as the monitor write test performs a small insert that may go through even when a large write done by a real application does not.
See the following configuration parameters for more information on how to configure this feature.
write_test_intervalType:
Dynamic: Yes
Default: 0s
If enabled (value > 0s), the monitor will perform a write test on the primary server if its gtid_binlog_pos has not changed within the configured interval. This test inserts one row into the table configured in write_test_table. If the insert fails or does not complete in time, the server fails the write test. What happens after that depends on write_test_fail_action.
write_test_tableType: string
Dynamic: Yes
Default: mxs.maxscale_write_test
The write test target table. The table name should be fully qualified i.e. include the database name. If the table does not exist or does not contain expected columns, the monitor (re)creates it. The table is created with a query like
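The exact statement is internal to the monitor and may change between versions; a purely hypothetical sketch of such a statement could be:
CREATE TABLE IF NOT EXISTS mxs.maxscale_write_test (id BIGINT AUTO_INCREMENT PRIMARY KEY, updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP) ENGINE=InnoDB;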
The database must be created manually. The monitor user requires privileges to create, drop, read and manipulate the table:
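Assuming the default table mxs.maxscale_write_test and the monitor user from the grant examples later in this document, a grant along these lines should cover the listed privileges:
GRANT CREATE, DROP, SELECT, INSERT, UPDATE, DELETE ON mxs.* TO 'mariadbmon'@'maxscalehost';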
write_test_fail_actionType:
Default: log
Values: log, failover
Which action to take if the primary server fails the write test. log means that MaxScale will simply log the failure but perform no other action. This is mainly useful for testing the feature.
If set to failover, the monitor will perform a failover if the primary server fails the write test failcount consecutive times. That is, the first write test is performed after write_test_interval has passed without writes. If the test fails, the monitor will repeat the test during the next monitor tick. After failcount monitor ticks with failed write tests, failover begins. After failover, the former primary server is set into maintenance mode. Manual intervention is required to take the server into use again.
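A configuration sketch enabling the feature (the interval and action values are illustrative):
write_test_interval=60s
write_test_fail_action=failover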
Backup operations manipulate the contents of a MariaDB Server, saving it or overwriting it. MariaDB-Monitor supports three backup operations:
rebuild-server: Replace the contents of a database server with the contents of another.
create-backup: Copy the contents of a database server to a storage location.
restore-from-backup: Overwrite the contents of a database server with a backup.
These operations do not modify server config files, only files in the data directory (typically /var/lib/mysql) are affected.
All of these operations are monitor commands and best launched with MaxCtrl. The operations are asynchronous, which means MaxCtrl won't wait for the operation to complete and instead immediately returns "OK". To see the current status of an operation, either check MaxScale log or use the fetch-cmd-result-command (e.g. maxctrl call command mariadbmon fetch-cmd-result MyMonitor).
To perform backup operations, MaxScale requires ssh-access on all affected machines. The ssh_user and ssh_keyfile settings define the SSH credentials MaxScale uses to access the servers. MaxScale must be able to run commands with sudo on both the source and target servers. See the SSH settings and the sudoers discussion below for more information.
The following tools need to be installed on the backends:
mariadb-backup. Backs up and restores MariaDB Server contents. Installed e.g. with yum install MariaDB-backup. See for more information.
pigz. Compresses and decompresses the backup stream. Installed e.g. with yum install pigz.
socat. Streams data from one machine to another. Is likely already installed. If not, can be installed e.g. with yum install socat.
mariadb-backup needs server credentials to log in and authenticate to the MariaDB Server being copied from. For this, MaxScale uses the monitor user. The monitor user may thus require additional privileges. See for more details.
The rebuild server-operation replaces the contents of a database server with the contents of another server. The source server is effectively cloned and all data on the target server is lost. This is useful when a replica server has diverged from the primary server, or when adding a new server to the cluster. MaxScale performs this operation by running mariadb-backup on both the source and target servers.
When launched, the rebuild operation proceeds as below. If any step fails, the operation is stopped and the target server will be left in an unspecified state.
Log in to both servers with ssh and check that the tools listed above are present (e.g. mariadb-backup -v should succeed).
Check that the port used for transferring the backup is free on the source server. If not, kill the process holding it. This requires running lsof and kill.
Test the connection by streaming a short message from the source host to the target.
Launch mariadb-backup on the source machine, compress the stream and listen for an incoming connection. This is performed with a command that pipes the mariadb-backup output through pigz to socat.
The rebuild-operation is a monitor module command and takes four arguments:
Monitor name, e.g. MyMonitor.
Target server name, e.g. MyTargetServer.
Source server name, e.g. MySourceServer. This parameter is optional. If not specified, the monitor prefers to autoselect an up-to-date replica server to avoid increasing load on the primary server. Due to the --safe-slave-backup option, the replica will stop replicating until the backup data has been transferred.
Data directory on target server. This parameter is optional. If not specified, the monitor will ask the target server. If target server is not running, monitor will assume /var/lib/mysql. Thus, this only needs to be defined with non-standard directory setups.
The following example rebuilds MyTargetServer with contents of MySourceServer.
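Assuming the asynchronous form of the module command (async-rebuild-server), the call could look like:
maxctrl call command mariadbmon async-rebuild-server MyMonitor MyTargetServer MySourceServer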
The following example uses a custom data directory on the target.
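Under the same assumption, with a hypothetical data directory path:
maxctrl call command mariadbmon async-rebuild-server MyMonitor MyTargetServer MySourceServer /custom/mariadb/datadir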
The operation does not launch if the target server is already replicating or if the source server is not a primary or replica.
Steps 6 and 8 can take a long time depending on the size of the database and if writes are ongoing. During these steps, the monitor will continue monitoring the cluster normally. After each monitor tick the monitor checks if the rebuild-operation can proceed. No other monitor operations, either manual or automatic, can run until the rebuild completes.
In addition to traditional argument passing, rebuild-server supports key-value arguments. The supported arguments are:
The dry_run-argument causes the monitor to only check if preconditions for rebuild are met on the source and target servers. It checks that SSH connections can be established and that required tools are present on the servers. No permanent changes are done.
The following example checks if MyServer1 can be rebuilt from the contents of MyServer2.
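Assuming the key-value form of async-rebuild-server, the check could be run as:
maxctrl call command mariadbmon async-rebuild-server monitor=MyMonitor target=MyServer1 source=MyServer2 dry_run=true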
The create backup-operation copies the contents of a database server to the backup storage. The source server is not modified but may slow down during backup creation. MaxScale performs this operation by running mariadb-backup on both the source and storage servers. The storage location is defined by the backup_storage_address and backup_storage_path settings. Normal ssh-settings are used to access the storage server. The backup storage machine does not need to have a MariaDB Server installed.
Backup creation runs somewhat similar to rebuild-server. The main difference is that the backup data is simply saved to a directory and not prepared or used to start a MariaDB Server. If any step fails, the operation is stopped and the backup storage directory will be left in an unspecified state.
Init. See rebuild-server.
Check listen port on backup storage machine. See rebuild-server.
Check that the backup storage main directory exists. Check that it does not contain a backup with the same name as the one being created. Create the final backup directory.
Test the connection by streaming a short message from the source host to the backup storage.
Backup creation is a monitor module command and takes three arguments: the monitor name, source server name and backup name. Backup name defines the subdirectory where the backup is saved and should be a valid directory name. The command
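(sketched below assuming the asynchronous async-create-backup form and a monitor named MyMonitor)
maxctrl call command mariadbmon async-create-backup MyMonitor MySourceServer wednesday_161122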
would save the backup of MySourceServer to <backup_storage_path>/wednesday_161122 on the host defined in backup_storage_address. ssh_user needs to have read and write access to the main storage directory. The source server must be a primary or replica.
Similar to rebuild-server, the monitor will continue monitoring the servers while the backup is transferred.
In addition to traditional argument passing, create-backup supports key-value arguments. The supported arguments are:
The dry_run-argument causes the monitor to only check if preconditions for backup creation are met on the source server and backup storage. It checks that SSH-connections can be established and that required tools are present. The backup storage must also not yet have a backup with the given name. No permanent changes are done.
The following example checks that MyServer1 can be backed up.
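A hedged sketch, mirroring the key-value form shown later in this document:
maxctrl call command mariadbmon async-create-backup monitor=MyMonitor source=MyServer1 dry_run=true bu_name=test_backup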
The restore-operation is the reverse of create-backup. It overwrites the contents of an existing MariaDB Server with a backup from the backup storage. The backup is not removed and can be used again. MaxScale performs this operation by transferring the backup contents as a tar archive and overwriting the target server data directory. The backup storage is defined in monitor settings similar to create-backup.
The restore-operation runs somewhat similar to rebuild-server. The main difference is that the backup data is copied with tar instead of mariadb-backup. If any step fails, the operation is stopped and the target server will be left in an unspecified state.
Init. See rebuild-server.
Check listen port on target machine. See rebuild-server.
Check that the backup storage main directory exists and that it contains a backup with the name requested.
Test the connection by streaming a short message from the backup storage to the target machine.
Server restoration is a monitor module command and takes four arguments.
Monitor name, e.g. MyMonitor.
Target server name, e.g. MyNewServer.
Backup name. This parameter defines the subdirectory where the backup is read from and should be an existing directory on the backup storage host.
Data directory on target server. This parameter is optional. If not specified, the monitor will ask the target server. If target server is not running, monitor will assume /var/lib/mysql. Thus, this only needs to be defined with non-standard directory setups.
The command
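(sketched below assuming the asynchronous async-restore-from-backup form and a monitor named MyMonitor)
maxctrl call command mariadbmon async-restore-from-backup MyMonitor MyTargetServer wednesday_161122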
would erase the contents of MyTargetServer and replace them with the backup contained in <backup_storage_path>/wednesday_161122 on the host defined in backup_storage_address. ssh_user needs to have read access to the main storage directory and the backup. The target server must not be a primary or replica.
The following example uses a custom data directory on the target.
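Under the same assumptions, with a hypothetical data directory path:
maxctrl call command mariadbmon async-restore-from-backup MyMonitor MyTargetServer wednesday_161122 /custom/mariadb/datadir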
Similar to rebuild-server, the monitor will continue monitoring the servers while the backup is transferred and prepared.
In addition to traditional argument passing, restore-from-backup supports key-value arguments. The supported arguments are:
The dry_run-argument causes the monitor to only check if preconditions for restore are met on the target server and backup storage. It checks that SSH-connections can be established and that required tools are present. The backup storage must also have a backup with the given name. No permanent changes are done.
The following example checks if MyServer1 can be restored.
call command mariadbmon async-create-backup monitor=MariaDB-Monitor source=server1 dry_run=true bu_name=bu1
The list-backups command lists currently available backups in the backup storage.
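Assuming the command takes only the monitor name as parameter, a call could look like:
maxctrl call command mariadbmon list-backups MyMonitor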
ssh_userType: string
Mandatory: No
Dynamic: Yes
Default: None
SSH username. Used when logging in to backend servers to run commands.
ssh_keyfileType: path
Mandatory: No
Dynamic: Yes
Default: None
Path to file with an ssh private key. Used when logging in to backend servers to run commands.
ssh_check_host_keyType:
Mandatory: No
Dynamic: Yes
Default: true
When logging in to backends, require that the server is already listed in the known_hosts file of the user running MaxScale.
ssh_timeoutType:
Mandatory: No
Dynamic: Yes
Default: 10s
The rebuild operation consists of multiple ssh commands. Most of the commands are assumed to complete quickly. If these commands take more than ssh_timeout to complete, the operation fails. Adjust this setting if rebuild fails due to ssh commands timing out. This setting does not affect steps 5 and 6, as these are assumed to take significant time.
ssh_portType: number
Mandatory: No
Dynamic: Yes
Default: 22
SSH port. Used for running remote commands on servers.
rebuild_portType: number
Mandatory: No
Dynamic: Yes
Default: 4444
The port which the source server listens on for a connection. The port must not be blocked by a firewall or listened on by any other program. If another process is listening on the port when rebuild is starting, MaxScale will attempt to kill the process.
mariadb_backup_use_memoryType: string
Mandatory: No
Dynamic: Yes
Default: 1G
Given as is to mariadb-backup --prepare --use-memory=<mariadb_backup_use_memory>. If set to empty, no --use-memory option is set and mariadb-backup will use its internal default. See the mariadb-backup documentation for more information.
Starting with MaxScale 24.02.7, the old name mariabackup_use_memory has been deprecated and replaced with mariadb_backup_use_memory. The old name is valid and will continue working as an alias.
mariadb_backup_parallelType: number
Mandatory: No
Dynamic: Yes
Default: 1
Given as is to mariadb-backup --backup --parallel=<val>. Defines the number of threads used for parallel data file transfer. See the mariadb-backup documentation for more information.
Starting with MaxScale 24.02.7, the old name mariabackup_parallel has been deprecated and replaced with mariadb_backup_parallel. The old name is valid and will continue working as an alias.
backup_storage_addressType: string
Mandatory: No
Dynamic: Yes
Default: None
Address of the backup storage. Does not need to have MariaDB Server running or be monitored by the monitor. Connected to with ssh. Must have enough disk space to store all backups.
backup_storage_pathType: path
Mandatory: No
Dynamic: Yes
Default: None
Path to main backup storage directory on backup storage host. ssh_user needs to have full access to this directory to save and read backups.
If giving MaxScale general sudo access is out of the question, MaxScale must be allowed to run the specific commands required by the backup operations. This can be achieved by creating a file with the commands in the /etc/sudoers.d directory. In the example below, the user johnny is given the power to run commands as root. The contents of the file may need to be tweaked due to changes in install locations.
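A hypothetical /etc/sudoers.d/maxscale sketch; the exact command list and install paths depend on the distribution and on which backup operations are used, so treat this only as a starting point:
johnny ALL=(root) NOPASSWD: /usr/bin/systemctl stop mariadb
johnny ALL=(root) NOPASSWD: /usr/bin/systemctl start mariadb
johnny ALL=(root) NOPASSWD: /usr/bin/lsof
johnny ALL=(root) NOPASSWD: /usr/bin/kill
johnny ALL=(root) NOPASSWD: /usr/bin/mariadb-backup
johnny ALL=(root) NOPASSWD: /usr/bin/mbstream
johnny ALL=(root) NOPASSWD: /usr/bin/chown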
Since MaxScale version 22.08, MariaDB Monitor can run ColumnStore administrative commands against a ColumnStore cluster. The commands interact with the ColumnStore REST-API present in recent ColumnStore versions and have been tested with MariaDB-Server 10.6 running the ColumnStore plugin version 6.2. None of the commands affect monitor configuration or replication topology. MariaDB Monitor simply relays the commands to the backend cluster.
MariaDB Monitor can fetch cluster status, add and remove nodes, start and stop the cluster, and set cluster read-only or readwrite. MaxScale only communicates with the first server in the servers-list.
Most of the commands are asynchronous, i.e. they do not wait for the operation to complete on the ColumnStore backend before returning to the command prompt. MariaDB Monitor itself, however, runs the command in the background and does not perform normal monitoring until the operation completes or fails. After an operation has started the user should use fetch-cmd-result to check its status. The examples below show how to run the commands using MaxCtrl. If a command takes a timeout-parameter, the timeout can be given in seconds (s), minutes (m) or hours (h).
ColumnStore command settings are listed below. At least cs_admin_api_key must be set.
Fetch cluster status. Returns the result as is. Status fetching has an automatic timeout of ten seconds.
Examples:
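Assuming the status command is named cs-get-status, fetching the status of the cluster monitored by MyMonitor could look like:
maxctrl call command mariadbmon cs-get-status MyMonitor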
Add or remove a node to/from the ColumnStore cluster.
<node-host> is the hostname or IP of the node being added or removed.
Examples:
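Assuming the commands are named cs-add-node and cs-remove-node and take the node host plus a timeout, the calls could look like:
maxctrl call command mariadbmon cs-add-node MyMonitor mcs-node-2.example.com 1m
maxctrl call command mariadbmon cs-remove-node MyMonitor mcs-node-2.example.com 1m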
Examples:
Examples:
cs_admin_port: Numeric, default: 8640. The REST-API port on the ColumnStore nodes. All nodes are assumed to listen on the same port.
cs_admin_api_key: String. The API-key MaxScale sends to the ColumnStore nodes when making a REST-API request. Should match the value configured on the ColumnStore nodes.
cs_admin_base_path: String, default: /cmapi/0.4.0. Base path sent with the REST-API request.
fetch-cmd-result: Fetches the result of the last manual command. Requires the monitor name as parameter. Most commands only return a generic success message or an error description. ColumnStore commands may return more data. Scheduling another command clears a stored result.
fetch-cmd-status: Fetches the detailed status of the currently running manual command. If no command is running, fetches the status of the last completed manual command. The results are returned as a json array, with each array element containing information on one phase of a manual command. Only backup operations currently support this detailed status information.
Fetch-cmd-status is meant for internal use and the data it returns is subject to change. It accepts the following key-value argument:
cancel-cmd: Cancels the latest operation, whether manual or automatic, if possible. Requires the monitor name as parameter. A scheduled manual command is simply canceled before it can run. If a command is already running, it stops as soon as possible. The cancel-cmd itself does not wait for a running operation to stop. Use fetch-cmd-result or check the log to see if the operation has truly completed. Canceling is most useful for stopping a stalled rebuild operation.
Before performing failover or switchover, the monitor checks that prerequisites are fulfilled, printing any errors and warnings found. This should catch and explain most issues with failover or switchover not working. If the operations are attempted and still fail, then most likely one of the commands the monitor issued to a server failed or timed out. The log should explain which query failed.
A typical failure reason is that a command such as STOP SLAVE takes longer than the backend_read_timeout of the monitor, causing the connection to break. As of 2.3, the monitor will retry most such queries if the failure was caused by a timeout. The retrying continues until the total time for a failover or switchover has been spent. If the log shows warnings or errors about commands timing out, increasing the backend timeout settings of the monitor should help. Other settings to look at are query_retries and query_retry_timeout. These are general MaxScale settings described in the MaxScale configuration guide. Setting query_retries to 2 is a reasonable first try.
If switchover causes the old primary (now replica) to fail replication, then most likely a user or perhaps a scheduled event performed a write while monitor had set read_only=1. This is possible if the user performing the write has "SUPER" or "READ_ONLY ADMIN" privileges. The switchover-operation tries to kick out SUPER-users but this is not certain to succeed. Remove these privileges from any users that regularly do writes to prevent them from interfering with switchover.
The server configuration files should have log-slave-updates=1 to ensure that a newly promoted primary has binary logs of previous events. This allows the new primary to replicate past events to any lagging replicas.
To print out all queries sent to the servers, start MaxScale with --debug=enable-statement-logging. This setting prints all queries sent to the backends by monitors and authenticators. The printed queries may include usernames and passwords.
If a replica is shown in maxctrl as "Slave of External Server" instead of "Slave", the reason is likely that the "Master_Host"-setting of the replication connection does not match the MaxScale server definition. As of 2.3.2, the MariaDB Monitor by default assumes that the replica connections (as shown by SHOW ALL SLAVES STATUS) use the exact same "Master_Host" as used in the MaxScale configuration file server definitions. This is controlled by the setting assume_unique_hostnames.
Since MaxScale 2.2 it's possible to detect a replication setup which includes Binlog Server: the required action is to add the binlog server to the list of servers only if master_id identity is set.
This page is licensed: CC BY-SA / Gnu FDL
master_conditions
Values include: running_slave, primary_monitor_master, disk_space_ok
Default: primary_monitor_master, disk_space_ok
primary_monitor_master: If this MaxScale is cooperating with another MaxScale and this is the secondary MaxScale, require that the candidate primary is selected also by the primary MaxScale.
disk_space_ok: The candidate primary must not be low on disk space. This option only takes effect if the disk space check is enabled. Added in MaxScale 23.08.5.
slave_conditions
Values include: writable_master, primary_monitor_master, disk_space_ok
Default: none
primary_monitor_master: If this MaxScale is cooperating with another MaxScale and this is the secondary MaxScale, require that the candidate primary is selected also by the primary MaxScale.
disk_space_ok: The replica must not be low on disk space. This option only takes effect if the disk space check is enabled. Added in MaxScale 23.08.5.
async-switchover, which schedules a switchover and returns immediately
rejoin, which directs servers to replicate from the primary
reset-replication (added in MaxScale 2.3.0), which deletes binary logs and resets gtid:s
SHOW DATABASES and EVENT, to list and modify server events
SELECT on mysql.user, to see which users have SUPER
SELECT on mysql.global_priv, to see which users have READ_ONLY ADMIN
SHOW DATABASES, EVENT and SET USER, to list and modify server events
BINLOG ADMIN, to delete binary logs (during reset-replication)
CONNECTION ADMIN, to kill connections
SELECT on mysql.user, to see which users have SUPER
SELECT on mysql.global_priv, to see which users have READ_ONLY ADMIN
log_slave_updates is on
disk space is not low
If the new primary has unprocessed relay log items, cancel and try again later.
Prepare the new primary:
Remove the replica connection the new primary used to replicate from the old primary.
Disable the read_only-flag.
Enable scheduled server events (if event handling is on). Only events that were enabled on the old primary are enabled.
Run the commands in promotion_sql_file.
Start replication from external primary if one existed.
Redirect all other replicas to replicate from the new primary:
STOP SLAVE
CHANGE MASTER TO
START SLAVE
Check that all replicas are replicating.
Kill connections from super and read-only admin users since read_only does not affect them. During this step, all writes are blocked with "FLUSH TABLES WITH READ LOCK".
Disable scheduled server events (if event handling is on).
Run the commands in demotion_sql_file.
Flush the binary log ("flush logs") so that all events are on disk.
Wait a moment to check that gtid is stable.
Wait for the new primary to catch up with the old primary.
Promote new primary and redirect replicas as in failover steps 3 and 4. Also redirect the demoted old primary.
Check that all replicas are replicating.
new_primary (server, mandatory): New primary for the replica
conn_name (string, default: empty): Replica connection name
Set the sequence number of gtid_slave_pos to zero. This also affects gtid_current_pos.
Prepare new primary:
Disable the read_only-flag.
Enable scheduled server events (if event handling is on). Events are only enabled if the cluster had a primary server when starting the reset-replication operation. Only events that were enabled on the previous primary are enabled on the new.
Direct other servers to replicate from the new primary as in the other operations.
address (string, mandatory): Scan starting address
port (number, default: 3306): Scan starting port
async (boolean, default: false): Run command asynchronously
force (boolean, default: false): Ignore most errors
old_primary_maint (boolean, default: false): Leave old primary to maintenance
Failback switchover does not trigger, as P is still the failback primary and it's down.
Some time later, P comes back online. Monitor rejoins it to the cluster.
If P successfully replicates from R2 (no diverged histories) and catches up, monitor runs switchover to restore P as primary.
mariadb-backup --backup --safe-slave-backup --stream=xbstream --parallel=1 | pigz -c | socat - TCP-LISTEN:<port>
Ask the target server what its data directory is (select @@datadir;). Stop MariaDB Server on the target machine and delete all contents of the data directory.
On the target machine, connect to the source machine, read the backup stream, decompress it and write to the data directory. This is performed with a command like socat -u TCP:<host>:<port> STDOUT | pigz -dc | mbstream -x. This step can take a long time if there is much data to transfer.
Check that the data directory on the target machine is not empty, i.e. that the transfer at least appears to have succeeded.
Prepare the backup on the target server with a command like mariadb-backup --use-memory=1G --prepare. This step can also take some time if the source server performed writes during data transfer.
On the target server, change ownership of datadir contents to the mysql-user and start MariaDB-server.
Read gtid from the data directory. Have the target server start replicating from the primary if it is not one already.
datadir (string, default: empty (autodetect)): Data directory on target server
dry_run (boolean, default: false): If true, only check preconditions
Serve backup on source. Similar to rebuild-server step 4.
Transfer backup directly to the final storage directory. Similar to rebuild-server step 5.
Check that the copied backup data looks ok.
dry_run (boolean, default: false): If true, only check preconditions
On the backup storage machine, compress the backup with tar and serve it with socat, listening for an incoming connection. This is performed with a command like tar -zc -C <backup_dir> . | socat - TCP-LISTEN:<port>.
Ask the target server what its data directory is (select @@datadir;). Stop MariaDB Server on the target machine and delete all contents of the data directory.
On the target machine, connect to the source machine, read the backup stream, decompress it and write to the data directory. This is performed with a command like socat -u TCP:<host>:<port> STDOUT | sudo tar -xz -C /var/lib/mysql/. This step can take a long time if there is much data to transfer.
From here on, the operation proceeds as from rebuild-server step 7.
datadir (string, default: empty (autodetect)): Data directory on target server
dry_run (boolean, default: false): If true, only check preconditions
Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
async | boolean | false | Run command asynchronously
replica | server | none (mandatory) | Server to redirect

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
async | boolean | false | Run command asynchronously

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
async | boolean | false | Run command asynchronously
remove | boolean | false | Remove shut down or non-replicating servers

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
async | boolean | false | Run command asynchronously
template | server | empty | Server settings template

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
new_primary | server | empty (autoselect) | Which server to promote
old_primary | server | empty (autoselect) | Which server to demote

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
target | server | none (mandatory) | Which server to rebuild
source | server | empty (autoselect) | Which server to copy data from

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
source | server | none (mandatory) | Which server to copy data from
bu_name | string | none (mandatory) | Backup name

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name
target | server | none (mandatory) | Which server to rebuild
bu_name | string | none (mandatory) | Backup name

Parameter | Type | Default | Description
monitor | monitor | none (mandatory) | Monitor name


CREATE USER 'mariadbmon'@'maxscalehost' IDENTIFIED BY 'mariadbmon-password';
GRANT REPLICA MONITOR ON *.* TO 'mariadbmon'@'maxscalehost';

CREATE USER 'mariadbmon'@'maxscalehost' IDENTIFIED BY 'mariadbmon-password';
GRANT REPLICATION CLIENT ON *.* TO 'mariadbmon'@'maxscalehost';

GRANT FILE ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT CONNECTION ADMIN ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT REPLICATION MASTER ADMIN ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT REPLICATION SLAVE ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT READ_ONLY ADMIN, REPLICATION SLAVE ADMIN ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT BINLOG ADMIN, CONNECTION ADMIN, PROCESS, RELOAD, SET USER ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT SELECT ON mysql.user TO 'mariadbmon'@'maxscalehost';
GRANT SELECT ON mysql.global_priv TO 'mariadbmon'@'maxscalehost';

GRANT SUPER ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT PROCESS, RELOAD ON *.* TO 'mariadbmon'@'maxscalehost';
GRANT SELECT ON mysql.user TO 'mariadbmon'@'maxscalehost';
GRANT SELECT ON mysql.global_priv TO 'mariadbmon'@'maxscalehost';

GRANT EVENT, SHOW DATABASES ON *.* TO 'mariadbmon'@'maxscalehost';

CREATE USER 'replication'@'replicationhost' IDENTIFIED BY 'replication-password';
GRANT REPLICATION REPLICA ON *.* TO 'replication'@'replicationhost';

CREATE USER 'replication'@'replicationhost' IDENTIFIED BY 'replication-password';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'replicationhost';

[MyMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=myuser
password=mypwd

master_conditions=connected_slave,running_slave
slave_conditions=running_master,writable_master

(monitor_interval + backend_connect_timeout) * failcount

maxctrl clear server server2 Maint

servers_no_cooperative_monitoring_locks=backup_dc_server1,backup_dc_server2

call command mariadbmon failover MONITOR
call command mariadbmon failover-safe MONITOR
call command mariadbmon switchover MONITOR [NEW_PRIMARY] [OLD_PRIMARY]
call command mariadbmon switchover-force MONITOR [NEW_PRIMARY] [OLD_PRIMARY]
call command mariadbmon rejoin MONITOR OLD_PRIMARY
call command mariadbmon redirect monitor=MyMonitor replica=MyServer2 new_primary=MyServer3
maxctrl call command mariadbmon reset-replication MONITOR [NEW_PRIMARY]

maxctrl call command mariadbmon scan-topology monitor=MyMonitor
{
"results": [
{
"host": "192.168.0.1",
"id": 100,
"monitor_name": "MyMonitor",
"port": 3306,
"primaries": [],
"replicas": [
300,
200
],
"server_name": "MyServer1"
},
{
"host": "192.168.0.2",
"id": 200,
"monitor_name": "MyMonitor",
"port": 3306,
"primaries": [
100
],
"replicas": [],
"server_name": "MyServer2"
}
]
}

maxctrl call command mariadbmon discover-replicas monitor=MyMonitor remove=true

maxctrl create server MyServerTemplate address=123.123.123.123 port=1111 ssl=true ssl_ca=/certs/ca.crt
maxctrl call command mariadbmon bootstrap monitor=MyMonitor template=MyServerTemplate address=192.168.0.4

maxctrl call command mariadbmon failover MyMonitor
maxctrl call command mariadbmon failover-safe MyMonitor
maxctrl call command mariadbmon rejoin MyMonitor OldPrimaryServ
maxctrl call command mariadbmon reset-replication MyMonitor
maxctrl call command mariadbmon reset-replication MyMonitor NewPrimaryServ
maxctrl call command mariadbmon switchover MyMonitor
maxctrl call command mariadbmon switchover MyMonitor NewPrimaryServ
maxctrl call command mariadbmon switchover MyMonitor NewPrimaryServ OldPrimaryServ
maxctrl call command mariadbmon switchover-force MyMonitor NewPrimaryServ

/v1/maxscale/modules/mariadbmon/<operation>?<monitor-name>&<server-name1>&<server-name2>

[Cluster1]
type=monitor
module=mariadbmon
servers=server1, server2, server3, server4
.../v1/maxscale/modules/mariadbmon/switchover?Cluster1&server4&server2

/v1/maxscale/modules/mariadbmon/failover?Cluster1
/v1/maxscale/modules/mariadbmon/rejoin?Cluster1&server3
/v1/maxscale/modules/mariadbmon/reset-replication?Cluster1&server3

maxctrl call command mariadbmon async-switchover Cluster1
OK
maxctrl call command mariadbmon fetch-cmd-result Cluster1
{
"links": {
"self": "http://localhost:8989/v1/maxscale/modules/mariadbmon/fetch-cmd-result"
},
"meta": "switchover completed successfully."
}

maxctrl call command mariadbmon switchover monitor=MyMonitor new_primary=MyServer2 async=1 force=1

maxctrl call command mariadbmon switchover monitor=MyMonitor old_primary_maint=1

switchover_on_low_disk_space=true

enforce_simple_topology=true

replication_custom_options=MASTER_SSL_CERT = '/tmp/certs/client-cert.pem',
MASTER_SSL_KEY = '/tmp/certs/client-key.pem',
MASTER_SSL_CA = '/tmp/certs/ca.pem',
MASTER_SSL_VERIFY_SERVER_CERT=0

servers_no_promotion=backup_dc_server1,backup_dc_server2

promotion_sql_file=/home/root/scripts/promotion.sql
demotion_sql_file=/home/root/scripts/demotion.sql

maxctrl call command mariadbmon release-locks MyMonitor1

write_test_interval=20s

CREATE OR REPLACE TABLE mxs.maxscale_write_test
(id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY NOT NULL,
`date` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`gtid` TEXT NULL);

GRANT SELECT, INSERT, DELETE, CREATE, DROP ON `mxs`.* TO 'maxscale'@'maxscalehost';

write_test_table=mxs.my_write_test_table

maxctrl call command mariadbmon async-rebuild-server MyMonitor MyTargetServer MySourceServer

maxctrl call command mariadbmon async-rebuild-server MyMonitor MyTargetServer MySourceServer /my_datadir

$ maxctrl call command mariadbmon async-rebuild-server monitor=MyMonitor target=MyServer1 source=MyServer2 dry_run=true
OK
$ maxctrl call command mariadbmon fetch-cmd-result MyMonitor
"rebuild-server completed successfully."maxctrl call command mariadbmon async-create-backup MyMonitor MySourceServer wednesday_161122$ maxctrl call command mariadbmon async-create-backup monitor=MyMonitor source=MyServer1 bu_name=mybackup2 dry_run=true
OK
$ maxctrl call command mariadbmon fetch-cmd-result MyMonitor
"create-backup completed successfully."maxctrl call command mariadbmon async-restore-from-backup MyMonitor MyTargetServer wednesday_161122maxctrl call command mariadbmon async-restore-from-backup MyMonitor MyTargetServer wednesday_161122 /my_datadir$ maxctrl call command mariadbmon async-restore-from-backup monitor=MyMonitor target=MyServer2 bu_name=mybackup2 dry_run=true
OK
$ maxctrl call command mariadbmon fetch-cmd-result MyMonitor
"restore-from-backup completed successfully."$ maxctrl call command mariadbmon list-backups monitor=MariaDB-Monitor
{
"backups": [
"bu1",
"bu2",
"bu3"
]
}

mariadb_backup_use_memory=2G

mariadb_backup_parallel=2

backup_storage_address=192.168.1.11

backup_storage_path=/home/maxscale_ssh_user/backup_storage

johnny ALL= NOPASSWD: /bin/systemctl stop mariadb
johnny ALL= NOPASSWD: /bin/systemctl start mariadb
johnny ALL= NOPASSWD: /usr/sbin/lsof
johnny ALL= NOPASSWD: /bin/kill
johnny ALL= NOPASSWD: /bin/du
johnny ALL= NOPASSWD: /usr/bin/mariabackup
johnny ALL= NOPASSWD: /usr/bin/mariadb-backup
johnny ALL= NOPASSWD: /bin/mbstream
johnny ALL= NOPASSWD: /bin/rm -rf /var/lib/mysql/*
johnny ALL= NOPASSWD: /bin/chown -R mysql\:mysql /var/lib/mysql
johnny ALL= NOPASSWD: /bin/cat /var/lib/mysql/xtrabackup_binlog_info
johnny ALL= NOPASSWD: /bin/cat /var/lib/mysql/mariadb_backup_binlog_info
johnny ALL= NOPASSWD: /bin/tar -xz -C /var/lib/mysql/

maxctrl call command mariadbmon cs-get-status <monitor-name>
maxctrl call command mariadbmon async-cs-get-status <monitor-name>

maxctrl call command mariadbmon cs-get-status MyMonitor
{
"mcs1": {
"cluster_mode": "readwrite",
"dbrm_mode": "master",
<snip>
maxctrl call command mariadbmon async-cs-get-status MyMonitor
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"mcs1": {
"cluster_mode": "readwrite",
"dbrm_mode": "master",
<snip>

maxctrl call command mariadbmon async-cs-add-node <monitor-name> <node-host> <timeout>
maxctrl call command mariadbmon async-cs-remove-node <monitor-name> <node-host> <timeout>

maxctrl call command mariadbmon async-cs-add-node MyMonitor mcs3 1m
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"node_id": "mcs3",
"timestamp": "2022-05-05 08:07:51.518268"
}
maxctrl call command mariadbmon async-cs-remove-node MyMonitor mcs3 1m
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"node_id": "mcs3",
"timestamp": "2022-05-05 10:46:46.506947"
}

maxctrl call command mariadbmon async-cs-start-cluster <monitor-name> <timeout>
maxctrl call command mariadbmon async-cs-stop-cluster <monitor-name> <timeout>

maxctrl call command mariadbmon async-cs-start-cluster MyMonitor 1m
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"timestamp": "2022-05-05 09:41:57.140732"
}
maxctrl call command mariadbmon async-cs-stop-cluster MyMonitor 1m
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"mcs1": {
"timestamp": "2022-05-05 09:45:33.779837"
},
<snip>

maxctrl call command mariadbmon async-cs-set-readonly <monitor-name> <timeout>
maxctrl call command mariadbmon async-cs-set-readwrite <monitor-name> <timeout>

maxctrl call command mariadbmon async-cs-set-readonly MyMonitor 30s
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"cluster-mode": "readonly",
"timestamp": "2022-05-05 09:49:18.365444"
}
maxctrl call command mariadbmon async-cs-set-readwrite MyMonitor 30s
OK
maxctrl call command mariadbmon fetch-cmd-result MyMonitor
{
"cluster-mode": "readwrite",
"timestamp": "2022-05-05 09:50:30.718972"
}

cs_admin_port=8641

cs_admin_api_key=somekey123

maxctrl call command mariadbmon fetch-cmd-result MariaDB-Monitor
"switchover completed successfully."

maxctrl call command mariadbmon fetch-cmd-status monitor=MariaDB-Monitor
[
{
"name": "rebuild-server",
"run_status": "scheduled",
"timestamp": "2025-03-07 07:35:47",
"unix_timestamp": 1741332947
},
{
"name": "rebuild-server",
"run_status": "running",
"state": "initializing",
"timestamp": "2025-03-07 07:35:47",
"unix_timestamp": 1741332947
},

maxctrl call command mariadbmon cancel-cmd MariaDB-Monitor
OK