This module monitors the status of Aurora cluster replicas. These replicas do
not use the standard MySQL protocol replication but rely on a mechanism provided
by AWS to replicate changes.
How Aurora Is Monitored
Each node in an Aurora cluster has the variable @@aurora_server_id which is
the unique identifier for that node. An Aurora replica stores information
relevant to replication in the information_schema.replica_host_status
table. The table contains information about the status of all replicas in the
cluster. The server_id column in this table holds the values of the
@@aurora_server_id variables from all nodes. The session_id column contains
a unique string for each read-only replica. For the master node, this value
will be MASTER_SESSION_ID. By executing the following query, we can
retrieve the @@aurora_server_id of the master node along with the
@@aurora_server_id of the current node.
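The query itself is not shown above; a sketch consistent with the description
might look like the following:

```sql
-- On the master node both columns contain the same value;
-- on a read-only replica they differ.
SELECT @@aurora_server_id, server_id
FROM information_schema.replica_host_status
WHERE session_id = 'MASTER_SESSION_ID';
```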
The node which returns a row with two identical fields is the master. All other
nodes are read-only replicas and will be labeled as slave servers.
In addition to replica status information, the information_schema.replica_host_status table contains information about
replication lag between the master and the read-only nodes. This value is stored
in the replica_lag_in_milliseconds column. This can be used to detect read
replicas that are lagging behind the master node. This information can then be
used by the routing modules to route reads to up-to-date nodes.
Configuring the Aurora Monitor
The Aurora monitor should connect directly to the unique endpoints of the Aurora
replicas. The cluster endpoint should not be included in the set of monitored
servers. Refer to the Amazon Aurora documentation for more information about how
to retrieve the unique endpoints of your cluster.
The Aurora monitor requires no parameters apart from the standard monitor
parameters. It supports the monitor script functionality described in the
common monitor documentation.
Here is an example Aurora monitor configuration.
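The example configuration is not shown above; a sketch matching the description
below might look like this (the section name is illustrative):

```
[Aurora-Monitor]
type=monitor
module=auroramon
servers=cluster-1,cluster-2,cluster-3
user=aurora
password=borealis
monitor_interval=2500ms
```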
The servers cluster-1, cluster-2 and cluster-3 are the unique Aurora
endpoints configured as MaxScale servers. The monitor uses the aurora:borealis
credentials to connect to each endpoint. The status of the nodes is inspected
every 2500 milliseconds.
MaxScale 21.06 ColumnStore Monitor
The ColumnStore monitor, csmon, is a monitor module for MariaDB ColumnStore
servers. The monitor supports ColumnStore version 1.5.
Required Grants
The credentials defined with the user and password parameters must have all
grants on the infinidb_vtable database.
For example, to create a user for this monitor with the required grants execute
the following SQL.
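The SQL itself is not shown above; a sketch with placeholder user name, host
and password might look like this:

```sql
-- User name, host and password are placeholders; adjust to your environment.
CREATE USER 'csmon'@'maxscalehost' IDENTIFIED BY 'csmon-pw';
GRANT ALL ON infinidb_vtable.* TO 'csmon'@'maxscalehost';
```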
Configuration
Read the common monitor parameters document for a list of supported
common monitor parameters.
version
This deprecated optional parameter specifies the ColumnStore version in use.
The only allowed value is 1.5.
admin_port
This optional parameter specifies the port of the ColumnStore administrative
daemon. The default value is 8640. Note that the daemons of all nodes must
be listening on the same port.
admin_base_path
This optional parameter specifies the base path of the ColumnStore
administrative daemon. The default value is /cmapi/0.4.0.
api_key
This optional parameter specifies the API key to be used in the
communication with the ColumnStore administrative daemon. If no
key is specified, then a key will be generated and stored to the
file api_key.txt in a directory with the same name as the
monitor, under the MaxScale data directory. Typically that will
be /var/lib/maxscale/<monitor-section>/api_key.txt.
Note that ColumnStore will store the first key provided and
thereafter require it, so changing the key requires the
resetting of the key on the ColumnStore nodes as well.
local_address
This parameter specifies the IP address that MaxScale should tell the
ColumnStore nodes it resides at. Either it or local_address at the
global level in the MaxScale configuration file must be specified. If
both have been specified, the one specified for the monitor takes
precedence.
dynamic_node_detection
This optional boolean parameter specifies whether the monitor should
autonomously figure out the ColumnStore cluster configuration or whether
it should solely rely upon the monitor configuration in the configuration
file. Please see the Dynamic Node Detection section below for a
thorough discussion on the meaning of the parameter. The default value
is false.
cluster_monitor_interval
This optional parameter, meaningful only if dynamic_node_detection is true, specifies how often the monitor should probe the ColumnStore
cluster and adapt to any changes that have occurred in the number of
nodes of the cluster. The default value is 10s, that is, the
cluster configuration is probed every 10 seconds.
Note that as the probing is performed at the regular monitor round,
the value should be some multiple of monitor_interval.
Dynamic Node Detection
NOTE If dynamic node detection is used, the network setup must
be such that the hostname/IP-address of a ColumnStore node is the
same when viewed both from MaxScale and from another node.
By default, the ColumnStore monitor behaves like the regular MariaDB
monitor. That is, it only monitors the servers it has been configured
with.
If dynamic_node_detection has been enabled, the behaviour of the monitor
changes significantly. Instead of being explicitly told which servers it
should monitor, the monitor is only told how to get into contact with the
cluster whereafter it autonomously figures out the cluster configuration
and creates dynamic server entries accordingly.
When dynamic node detection is enabled, the servers the monitor has been
configured with are only used for "bootstrapping" the monitor, because
at the initial startup the monitor does not otherwise know how to get
into contact with the cluster.
The following shows a configuration using dynamic node detection.
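The configuration itself is not shown above; a sketch, assuming nodes mcs1 and
mcs2 and illustrative credentials, might look like this:

```
[CsBootstrap1]
type=server
address=mcs1
port=3306

[CsBootstrap2]
type=server
address=mcs2
port=3306

[CsMonitor]
type=monitor
module=csmon
servers=CsBootstrap1,CsBootstrap2
user=csmon
password=csmon-pw
dynamic_node_detection=true
```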
As can be seen, the server entries look just like any other server entries,
but to make them stand out and to indicate what they are used for, they have
the word bootstrap in their name.
In principle, a single entry is sufficient, but to cater for the
case that a node happens to be down, it is advisable to have more than one.
Once the monitor has been able to connect to a node, it will fetch the
configuration and store information about the nodes locally. On subsequent
startups, the monitor will use the bootstrap information only if it cannot
connect using the persisted information. Also, if there has been any change
in the bootstrap servers, the persisted information is not used.
Based on the information obtained from the cluster itself, the monitor
will create dynamic server instances that are named as @@ followed by
the monitor name, followed by a :, followed by the hostname.
If the cluster in fact consists of three nodes, then the output of maxctrl list servers may look like
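The output itself is not shown above; assuming a monitor named CsMonitor and
nodes mcs1, mcs2 and mcs3, the server names in a simplified listing would be:

```
@@CsMonitor:mcs1
@@CsMonitor:mcs2
@@CsMonitor:mcs3
CsBootstrap1
CsBootstrap2
```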
Note that there will be dynamic server entries also for the nodes for
which there is a bootstrap entry.
When the service is defined, it is imperative that it does not explicitly
refer to either the bootstrap or the dynamic entries. Instead, it should
refer to the monitor using the cluster parameter.
With this configuration the RWS service will automatically adapt to any
changes made to the ColumnStore cluster.
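A sketch of such a service definition (service name and credentials are
illustrative):

```
[RWS]
type=service
router=readwritesplit
cluster=CsMonitor
user=service-user
password=service-pw
```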
Commands
The ColumnStore monitor provides module commands using which the ColumnStore
cluster can be managed. The commands can be invoked using the REST-API with
a client such as curl or using maxctrl.
All commands require the monitor instance name as the first parameter.
Additional parameters must be provided depending on the command.
Note that as maxctrl itself has a timeout of 10 seconds, if a
timeout larger than that is provided to any command, the timeout of
maxctrl must also be increased. For instance:
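A sketch of such an invocation, assuming a monitor named CsMonitor:

```
maxctrl --timeout 30s call command csmon shutdown CsMonitor 20s
```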
Here a 30 second timeout is specified for maxctrl to ensure
that it does not expire before the 20 second timeout provided for
the shutdown command.
The output is always a JSON object.
In the following, assume a configuration like this:
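The configuration itself is not shown above; a sketch with two nodes (names,
addresses and credentials illustrative) might look like this:

```
[CsNode1]
type=server
address=mcs1
port=3306

[CsNode2]
type=server
address=mcs2
port=3306

[CsMonitor]
type=monitor
module=csmon
servers=CsNode1,CsNode2
user=csmon
password=csmon-pw
```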
start
Starts the ColumnStore cluster.
Example
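The example itself is not shown above; a sketch, assuming the command takes the
monitor name and a timeout:

```
maxctrl call command csmon start CsMonitor 20s
```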
shutdown
Shuts down the ColumnStore cluster.
Example
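A sketch of the invocation, assuming the command takes the monitor name and a
timeout:

```
maxctrl call command csmon shutdown CsMonitor 20s
```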
status
Get the status of the ColumnStore cluster.
Returns the status of the cluster or the status of a specific server.
Example
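A sketch of the invocation, with and without an optional server name:

```
maxctrl call command csmon status CsMonitor
maxctrl call command csmon status CsMonitor CsNode1
```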
mode-set
Sets the mode of the cluster.
Example
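A sketch of the invocation; the mode value (here readonly) and trailing timeout
are assumptions:

```
maxctrl call command csmon mode-set CsMonitor readonly 20s
```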
config-get
Returns the cluster configuration.
If no server is specified, the configuration is fetched from
the first server in the monitor configuration, otherwise from
the specified server.
Note that if everything is in order, the returned configuration
should be identical regardless of the server it is fetched from.
Example
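A sketch of the invocation with an explicit server:

```
maxctrl call command csmon config-get CsMonitor CsNode1
```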
add-node
Adds a new node located on the server at the hostname or IP host
to the ColumnStore cluster.
Example
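A sketch of the invocation, assuming the command takes the monitor name, the
host of the new node and a timeout:

```
maxctrl call command csmon add-node CsMonitor mcs3 20s
```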
For a more complete example, please refer to the Adding a Node section below.
remove-node
Remove the node located on the server at the hostname or IP host
from the ColumnStore cluster.
Example
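A sketch of the invocation, assuming the command takes the monitor name, the
host of the node to remove and a timeout:

```
maxctrl call command csmon remove-node CsMonitor mcs3 20s
```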
For a more complete example, please refer to the Removing a Node section below.
Example
The following is an example of a csmon configuration.
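The configuration itself is not shown above; a sketch (all names, addresses and
credentials illustrative) might look like this:

```
[CsNode1]
type=server
address=mcs1
port=3306

[CsNode2]
type=server
address=mcs2
port=3306

[CsMonitor]
type=monitor
module=csmon
servers=CsNode1,CsNode2
user=csmon
password=csmon-pw
monitor_interval=5s
```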
Adding a Node
Note that in the following dynamic_node_detection is not used; the
monitor is configured in the traditional way. The impact of
dynamic_node_detection is described below.
Adding a new node to a ColumnStore cluster can be performed dynamically
at runtime, but it must be done in two steps. First, the node is added
to ColumnStore and then, the corresponding server object (that possibly
has to be created) in the MaxScale configuration is added to the
ColumnStore monitor.
In the following, assume a two node ColumnStore cluster and an initial
MaxScale configuration like the following.
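The configuration itself is not shown above; a sketch (the addresses are
assumptions) might look like this:

```
[CsNode1]
type=server
address=10.10.10.10
port=3306

[CsNode2]
type=server
address=10.10.10.11
port=3306

[CsMonitor]
type=monitor
module=csmon
servers=CsNode1,CsNode2
user=csmon
password=csmon-pw
```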
Invoking maxctrl list servers will now show:
If we now want to add a new ColumnStore node, located at mcs3/10.10.10.12
to the cluster, the steps are as follows.
First the node is added
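A sketch of the command, using the host mcs3 from the scenario above:

```
maxctrl call command csmon add-node CsMonitor mcs3 20s
```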
After a while the following is output:
At this point, the ColumnStore cluster consists of three nodes. However,
the ColumnStore monitor is not yet aware of the new node.
First we need to create the corresponding server object.
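A sketch of the command, using the address from the scenario above:

```
maxctrl create server CsNode3 10.10.10.12 3306
```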
Invoking maxctrl list servers will now show:
The server CsNode3 has been created, but its state is Down since
it is not yet being monitored.
It must now be added to the monitor.
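A sketch of the command:

```
maxctrl link monitor CsMonitor CsNode3
```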
Now the server is monitored and maxctrl list monitors shows:
The state of the new node is now also set correctly, as shown by maxctrl list servers.
Note that the MaxScale server object can be created at any point, but
it must not be added to the monitor before the node has been added to
the ColumnStore cluster using the csmon add-node command.
Impact of dynamic_node_detection
If dynamic_node_detection is enabled, there is no need to create any
explicit server entries. All that needs to be done, is to add the node
and the monitor will adapt automatically. Note that it does not matter
whether the node is added indirectly via MaxScale or directly using the
REST-API of ColumnStore. The only difference is that in the former case,
MaxScale may detect the new situation slightly faster.
Removing a Node
Note that in the following dynamic_node_detection is not used; the
monitor is configured in the traditional way. The impact of
dynamic_node_detection is described below.
Removing a node should be performed in the reverse order of how a
node was added. First, the MaxScale server should be removed from the
monitor. Then, the node should be removed from the ColumnStore cluster.
Suppose we want to remove the ColumnStore node at mcs2/10.10.10.12
and the current situation is as follows.
First, the server is removed from the monitor.
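A sketch of the command:

```
maxctrl unlink monitor CsMonitor CsNode2
```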
Checking with maxctrl list monitors we see that the server has
indeed been removed.
Now the node can be removed from the cluster itself.
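A sketch of the command, using the host from the scenario above:

```
maxctrl call command csmon remove-node CsMonitor mcs2 20s
```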
Impact of dynamic_node_detection
If dynamic_node_detection is enabled, there is in general no need
to explicitly remove a static server entry (as there never was one in
the first place). The only exception is if the removed node happened
to be a bootstrap server. In that case, the server entry should be
removed from the monitor's list of servers (used as bootstrap nodes).
If that is not done, then the monitor will log a warning at each startup.
This page is licensed: CC BY-SA / Gnu FDL
Understanding MaxScale's Aurora Monitor
MaxScale's Aurora Monitor monitors the status of Aurora cluster replicas.
What Does the Aurora Monitor Support?
The Aurora Monitor supports:
Monitoring replicas in Amazon Aurora deployments
Designing for MaxScale's MariaDB Monitor
MaxScale's MariaDB Monitor monitors MariaDB replication deployments.
This page contains topics that need to be considered when designing applications that use the MariaDB Monitor.
This page contains topics that need to be considered when designing applications that use the Galera Monitor.
Understanding MaxScale's Galera Monitor
MaxScale's Galera Monitor monitors Galera cluster deployments.
What Does the Galera Monitor Support?
The Galera Monitor (galeramon) supports:
Monitoring Galera cluster deployments
Query-based load balancing with the Read/Write Split Router (readwritesplit)
Connection-based load balancing with the Read Connection Router (readconnroute)
Deploying Galera Monitor
Deploy MaxScale with Galera Monitor and Read/Write Split Router
Deploy MaxScale with Galera Monitor and Read Connection Router
Using SST Donors for Queries with MaxScale's Galera Monitor
MaxScale's Galera Monitor monitors Galera clusters.
By default, when a node is chosen as a donor for a State Snapshot Transfer (SST), Galera Monitor does not route any queries to it. However, some SST methods are non-blocking on the donor, so this default behavior is not always desired.
Non-Blocking SST Methods
A cluster's SST method is defined by the wsrep_sst_method system variable. When this system variable is set to mariadb-backup, the cluster uses MariaDB Backup to perform the SST. MariaDB Backup is a non-blocking backup method, so Galera Cluster allows the node to execute queries while acting as the SST donor.
Configuring Availability of SST Donors
Configure the availability of SST donors by configuring the available_when_donor parameter for the Galera Monitor in maxscale.cnf.
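A sketch of such a configuration (server names and credentials are
illustrative):

```
[Galera-Monitor]
type=monitor
module=galeramon
servers=node1,node2,node3
user=maxmon
password=maxmon-pw
available_when_donor=true
```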
monitor_interval
Defines how often the monitor updates the status of the servers. Choose a lower
value if servers should be queried more often. The smallest possible value is
100 milliseconds. If querying the servers takes longer than monitor_interval,
the effective update rate is reduced.
The interval is specified as documented here. If no explicit unit
is provided, the value is interpreted as milliseconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected.
backend_connect_timeout
This parameter controls the timeout for connecting to a monitored server.
The interval is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second. The minimum value is 1 second.
backend_write_timeout
This parameter controls the timeout for writing to a monitored server.
The timeout is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second. The minimum value is 1 second.
backend_read_timeout
This parameter controls the timeout for reading from a monitored server.
The timeout is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second. The minimum value is 1 second.
backend_connect_attempts
Type: number
Mandatory: No
Dynamic: Yes
Default: 1
This parameter defines the maximum number of times a backend connection is attempted every
monitoring loop. Every attempt may take up to backend_connect_timeout seconds
to perform. If none of the attempts are successful, the backend is considered to
be unreachable and down.
disk_space_threshold
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This parameter duplicates the disk_space_threshold server parameter.
If the parameter has not been specified for a server, then the one specified
for the monitor is applied.
NOTE: Since MariaDB 10.4.7, MariaDB 10.3.17 and MariaDB 10.2.26, the
information will be available only if the monitor user has the FILE
privilege.
That is, if the disk configuration is the same on all servers monitored by
the monitor, it is sufficient (and more convenient) to specify the disk
space threshold in the monitor section, but if the disk configuration is
different on all or some servers, then the disk space threshold can be
specified individually for each server.
For example, suppose server1, server2 and server3 are identical
in all respects. In that case we can specify disk_space_threshold
in the monitor.
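A sketch of such a monitor-level setting (the mount point /data and the 80%
limit are assumptions):

```
[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxmon
password=maxmon-pw
disk_space_threshold=/data:80
```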
However, if the servers are heterogeneous with the disk used for the
data directory mounted on different paths, then the disk space threshold
must be specified separately for each server.
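A sketch of per-server settings (addresses, mount points and limits are
assumptions):

```
[server1]
type=server
address=10.10.10.10
port=3306
disk_space_threshold=/data1:80

[server2]
type=server
address=10.10.10.11
port=3306
disk_space_threshold=/data2:80

[server3]
type=server
address=10.10.10.12
port=3306
disk_space_threshold=/data3:80
```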
If most of the servers have the data directory disk mounted on
the same path, then the disk space threshold can be specified on
the monitor and separately on the server with a different setup.
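The example itself is not shown above; a sketch consistent with the description
below (addresses and limits are assumptions) might look like this:

```
[server1]
type=server
address=10.10.10.10
port=3306
disk_space_threshold=/DbData:80

[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxmon
password=maxmon-pw
disk_space_threshold=/data:80
```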
Above, server1 has the disk used for the data directory mounted
at /DbData while both server2 and server3 have it mounted on /data, and thus the setting in the monitor covers them both.
disk_space_check_interval
This parameter specifies the minimum amount of time
between disk space checks. The interval is specified as documented here. If no explicit unit
is provided, the value is interpreted as milliseconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected.
The default value is 0, which means that by default the disk space
will not be checked.
Note that as the checking is made as part of the regular monitor interval
cycle, the disk space check interval is affected by the value of
monitor_interval. In particular, even if the value of
disk_space_check_interval is smaller than that of monitor_interval,
the checking will still take place at monitor_interval intervals.
script
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This command will be executed on a server state change. The parameter should
be an absolute path to a command or the command should be in the executable
path. The user running MaxScale should have execution rights to the file itself
and the directory it resides in. The script may have placeholders which
MaxScale will substitute with useful information when launching the script.
The placeholders and their substitution results are:
$INITIATOR -> IP and port of the server which initiated the event
$EVENT -> event description, e.g. "server_up"
$LIST -> list of IPs and ports of all servers
$NODELIST -> list of IPs and ports of all running servers
$SLAVELIST -> list of IPs and ports of all slave servers
$MASTERLIST -> list of IPs and ports of all master servers
$SYNCEDLIST -> list of IPs and ports of all synced Galera nodes
$PARENT -> IP and port of the parent of the server which initiated the event.
For master-slave setups, this will be the master if the initiating server is a
slave.
$CHILDREN -> list of IPs and ports of the child nodes of the server who
initiated the event. For master-slave setups, this will be a list of slave
servers if the initiating server is a master.
The expanded variable value can be an empty string if no servers match the
variable's requirements. For example, if no masters are available $MASTERLIST
will expand into an empty string. The list-type substitutions will only contain
servers monitored by the current monitor.
Any output by the executed script will be logged into the MaxScale log. Each
outputted line will be logged as a separate log message.
The log level on which the messages are logged depends on the format of the
messages. If the first word in the output line is one of alert:, error:, warning:, notice:, info: or debug:, the message will be logged on the
corresponding level. If the message is not prefixed with one of the keywords,
the message will be logged on the notice level. Whitespace before, after or
between the keyword and the colon is ignored and the matching is
case-insensitive.
Currently, the script must not execute any of the following MaxCtrl
calls as they cause a deadlock:
alter monitor to the monitor executing the script
stop monitor to the monitor executing the script
call command to a MariaDB-Monitor that is executing the script
script_timeout
The timeout for the executed script. The interval is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second.
If the script execution exceeds the configured timeout, it is stopped by sending
a SIGTERM signal to it. If the process does not stop, a SIGKILL signal will be
sent to it once the execution time is greater than twice the configured timeout.
events
A list of event names which cause the script to be executed. If this option is
not defined, all events cause the script to be executed. The list must contain a
comma separated list of event names.
The following table contains all the possible event types and their
descriptions.
journal_max_age
The maximum journal file age. The interval is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the max age is seconds, a max age specified in milliseconds will be rejected,
even if the duration is longer than a second.
When the monitor starts, it reads any stored journal files. If the journal file
is older than the value of journal_max_age, it will be removed and the monitor
starts with no prior knowledge of the servers.
Monitor Crash Safety
Starting with MaxScale 2.2.0, the monitor modules keep an on-disk journal of the
latest server states. This change makes the monitors crash-safe when options
that introduce states are used. It also allows the monitors to retain stateful
information when MaxScale is restarted.
For MySQL monitor, options that introduce states into the monitoring process are
the detect_stale_master and detect_stale_slave options, both of which are
enabled by default. Galeramon has the disable_master_failback parameter which
introduces a state.
The default location for the server state journal is
/var/lib/maxscale/<monitor name>/monitor.dat where <monitor name> is the
name of the monitor section in the configuration file. If MaxScale crashes or is
shut down in an uncontrolled fashion, the journal will be read when MaxScale is
started. To skip the recovery process, manually delete the journal file before
starting MaxScale.
Script example
Below is an example monitor configuration which launches a script with all
supported substitutions. The example script reads the results, prints them
to a file, and sends them as email.
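The configuration itself is not shown above; a sketch (the script path,
argument names, server names and credentials are assumptions) might look like
this:

```
[MyMonitor]
type=monitor
module=mariadbmon
servers=server1,server2
user=maxmon
password=maxmon-pw
script=/path/to/myscript.sh initiator=$INITIATOR event=$EVENT live_nodes=$NODELIST
```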
The Galera Monitor is a monitoring module for MaxScale that monitors a Galera
cluster. It detects whether nodes are a part of the cluster and if they are in
sync with the rest of the cluster. It can also assign master and slave roles
inside MaxScale, allowing Galera clusters to be used with modules designed for
traditional master-slave clusters.
By default, the Galera Monitor will choose the node with the lowest
wsrep_local_index value as the master. This means that two MaxScale instances
running on different servers will choose the same server as the master.
WSREP Variables and Their Effects
The following WSREP variables are inspected by galeramon to see whether a node is
usable. If the node is not usable, it loses the Master and Slave labels and
will be in the Running state.
If wsrep_ready=0, the WSREP system is not yet ready and the Galera node
cannot accept queries.
If wsrep_desync=1 is set, the node is desynced and is not participating in
the Galera replication.
If wsrep_reject_queries=[ALL|ALL_KILL] is set, queries are refused and the
node is unusable.
With wsrep_sst_donor_rejects_queries=1, donor nodes reject
queries. Galeramon treats this the same as if wsrep_reject_queries=ALL was
set.
If wsrep_local_state is not 4 (or 2 with available_when_donor=true), the
node is not in the correct state and is not used.
Galera clusters and slaves replicating from it
MaxScale 2.4.0 added support for slaves replicating off of Galera nodes. If a
non-Galera server monitored by galeramon is replicating from a Galera node also
monitored by galeramon, it will be assigned the Slave, Running status as long
as the replication works. This allows read-scaleout with Galera servers without
increasing the size of the Galera cluster.
Required Grants
The Galera Monitor requires the REPLICA MONITOR grant to work:
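A sketch of the grant statements (user name, host and password are
placeholders):

```sql
CREATE USER 'maxmon'@'maxscalehost' IDENTIFIED BY 'maxmon-pw';
GRANT REPLICA MONITOR ON *.* TO 'maxmon'@'maxscalehost';
```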
With MariaDB Server 10.4 and earlier, REPLICATION CLIENT is required instead.
If set_donor_nodes is configured, the SUPER grant is required:
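A sketch of the additional grant, using the same placeholder user:

```sql
GRANT SUPER ON *.* TO 'maxmon'@'maxscalehost';
```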
Configuration
A minimal configuration for a monitor requires a set of servers for monitoring
and a username and a password to connect to these servers. The user requires the
REPLICATION CLIENT privilege to successfully monitor the state of the servers.
Common Monitor Parameters
For a list of optional parameters that all monitors support, read the Monitor Common document.
Galera Monitor optional parameters
These are optional parameters specific to the Galera Monitor.
disable_master_failback
Type: boolean
Default: false
Dynamic: Yes
If a node marked as master inside MaxScale happens to fail and the master status
is assigned to another node MaxScale will normally return the master status to
the original node after it comes back up. With this option enabled, if the
master status is assigned to a new node it will not be reassigned to the
original node for as long as the new master node is running. In this case the Master Stickiness status bit is set, which is visible in the maxctrl list servers output.
available_when_donor
Type: boolean
Default: false
Dynamic: Yes
This option allows Galera nodes to be used normally when they are donors in an
SST operation when the SST method is non-blocking
(e.g. wsrep_sst_method=mariadb-backup).
Normally when an SST is performed, both participating nodes lose their Synced,
Master or Slave statuses. When this option is enabled, the donor is treated as
if it was a normal member of the cluster (i.e. wsrep_local_state = 4). This is
especially useful if the cluster drops down to one node and an SST is required
to increase the cluster size.
The current non-blocking SST methods are xtrabackup, xtrabackup-v2 and
mariadb-backup. Read the Galera Cluster documentation for more details.
disable_master_role_setting
Type: boolean
Default: false
Dynamic: Yes
This disables the assignment of master and slave roles to the Galera cluster
nodes. If this option is enabled, Synced is the only status assigned by this
monitor.
use_priority
Type: boolean
Default: false
Dynamic: Yes
Enable interaction with server priorities. This will allow the monitor to
deterministically pick the write node for the monitored Galera cluster and will
allow for controlled node replacement.
root_node_as_master
Type: boolean
Default: false
Dynamic: Yes
This option controls whether the write master Galera node requires a wsrep_local_index value of 0. This option was introduced in MaxScale 2.1.0 and
it is disabled by default in versions 2.1.5 and newer. In versions 2.1.4 and
older, the option was enabled by default.
A Galera cluster will always have a node which has a wsrep_local_index value
of 0. Based on this information, multiple MaxScale instances can always pick the
same node for writes.
If the root_node_as_master option is disabled for galeramon, the node with the
lowest index will always be chosen as the master. If it is enabled, only the
node with a wsrep_local_index value of 0 can be chosen as the master.
This parameter can work with disable_master_failback but using them together
is not advisable: the intention of root_node_as_master is to make sure that
all MaxScale instances that are configured to use the same Galera cluster will
send writes to the same node. If disable_master_failback is enabled, this is
no longer true: if the Galera cluster reorganizes itself so that a different
node gets node index 0, writes will still be going to the old node that
previously had node index 0. A restart of one of the MaxScale instances, or a
new MaxScale joining the cluster, will cause writes to be sent to the wrong
node, resulting in an increased rate of deadlock errors and sub-optimal
performance.
set_donor_nodes
Type: boolean
Default: false
Dynamic: Yes
This option controls whether the global variable wsrep_sst_donor should be set
on each cluster node with the 'slave' status.
The variable contains a list of slave servers, automatically sorted, with
possible master candidates at its end.
The sorting is based either on wsrep_local_index or node server priority
depending on the value of use_priority option.
If no server has priority defined the sorting switches to wsrep_local_index.
Node names are collected by fetching the result of the variable wsrep_node_name.
Example of variable being set in all slave nodes, assuming three nodes:
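The example itself is not shown above; a sketch of what the monitor would
execute on a slave node (node names are illustrative, sorted as described
above):

```sql
SET GLOBAL wsrep_sst_donor = "node2,node3,node1";
```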
Note:
In order to set the global variable wsrep_sst_donor, proper privileges are
required for the monitor user that connects to cluster nodes.
This option is disabled by default and was introduced in MaxScale 2.1.0.
Interaction with Server Priorities
If the use_priority option is set and a server is configured with the priority=<int> parameter, galeramon will use that as the basis on which the
master node is chosen. This requires the disable_master_role_setting to be
undefined or disabled. The server with the lowest positive value of priority
will be chosen as the master node when a replacement Galera node is promoted to
a master server inside MaxScale. If all candidate servers have the same
priority, the order of the servers in the servers parameter dictates which is
chosen as the master.
Nodes with a negative value (priority < 0) will never be chosen as the
master. This allows you to mark some servers as permanent slaves by
assigning a negative value to priority. Nodes with the default priority of 0 are
only selected if no nodes with higher priority are present and the normal node
selection rules apply to them (i.e. selection is based on wsrep_local_index).
Here is an example.
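The example itself is not shown above; a sketch consistent with the
description below (addresses are assumptions):

```
[node-1]
type=server
address=192.168.0.11
port=3306
priority=1

[node-2]
type=server
address=192.168.0.12
port=3306
priority=3

[node-3]
type=server
address=192.168.0.13
port=3306
priority=2

[node-4]
type=server
address=192.168.0.14
port=3306
priority=-1
```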
In this example node-1 is always used as the master if available. If node-1
is not available, then the next node with the highest priority rank is used. In
this case it would be node-3. If both node-1 and node-3 were down, then
node-2 would be used. Because node-4 has a value of -1 in priority, it
will never be the master. Nodes without the priority parameter are considered as
having a priority of 0 and will be used only if all nodes with a positive
priority value are not available.
With priority ranks you can control the order in which MaxScale chooses the
master node. This will allow for a controlled failure and replacement of nodes.
If the promotion_sql_file parameter is set, then the script referred to by the parameter is executed.
If there is an external master, then it configures that replication by executing CHANGE MASTER TO and START REPLICA.
enforce_simple_topology
• When this parameter is enabled, the monitor assumes that the topology of the cluster only consists of a single primary server, which has multiple replica servers.
• When this parameter is disabled, the monitor does not make assumptions about the topology of the cluster.
• This parameter implicitly sets the assume_unique_hostnames, auto_failover, and auto_rejoin parameters.
replication_user
• This parameter is used by the monitor to set the MASTER_USER option when executing the CHANGE MASTER TO statement.
• If this parameter is not set, then the monitor uses the monitor user.
replication_password
• This parameter is used by the monitor to set the MASTER_PASSWORD option when executing the CHANGE MASTER TO statement.
• If this parameter is not set, then the monitor uses the monitor user's password.
replication_master_ssl
• This parameter is used by the monitor to set the MASTER_SSL option when executing the CHANGE MASTER TO statement.
• If this parameter is not set, then the monitor does not enable TLS.
failover_timeout
• This parameter defines the maximum amount of time allowed to perform a failover.
• If a failover times out, then a message is logged to the MaxScale log, and automatic failover is disabled.
switchover_timeout
• This parameter defines the maximum amount of time allowed to perform a switchover.
• If a switchover times out, then a message is logged to the MaxScale log, and automatic failover is disabled.
verify_master_failure
• When this parameter is enabled, if the monitor detects that the primary server has failed, it will check the replica servers to verify that they have also detected the failure.
• If a replica has received an event within the master_failure_timeout duration, the primary is not considered down when deciding whether to fail over, even if the monitor cannot connect to the primary.
master_failure_timeout
• This parameter defines the timeout for verify_master_failure.
• The default value is 10 seconds.
servers_no_promotion
• This parameter defines a comma-separated list of servers that should not be chosen to be the primary server.
promotion_sql_file
• This parameter defines an SQL script that should be executed on the new primary server during failover or switchover.
demotion_sql_file
• This parameter defines an SQL script that should be executed on the old primary server during failover or switchover when it is demoted to be a replica server.
• The script is also executed when a server is automatically added to the cluster due to the auto_rejoin parameter.
handle_events
• When this parameter is enabled, the monitor enables events on the new primary server that were previously enabled on the old primary server.
• The monitor also disables the events on the old primary server.
failcount
• This parameter defines the number of consecutive monitoring checks in which the primary server must appear to be down before it is considered failed.
• The default value is 5.
• The total wait time can be calculated as: (monitor_interval + backend_connect_timeout) * failcount
auto_failover
• When this parameter is enabled, the monitor will automatically fail over to a new primary server if the primary server fails.
• When this parameter is disabled, the monitor will not automatically fail over to a new primary server if the primary server fails, so failover must be performed manually.
• This parameter is disabled by default.
auto_rejoin
• When this parameter is enabled, the monitor will attempt to automatically configure new replica servers to replicate from the primary server when they come online.
• When this parameter is disabled, the monitor will not attempt to automatically configure new replica servers to replicate from the primary server when they come online, so they must be configured manually.
• This parameter is disabled by default.
switchover_on_low_disk_space
• When this parameter is enabled, the monitor will automatically perform a switchover to a new primary server if the primary server is low on disk space.
• When this parameter is disabled, the monitor will not automatically perform a switchover when the primary server is low on disk space, so switchover must be performed manually.
• This parameter requires the disk_space_threshold parameter to be set for the server or the monitor.
• This parameter requires the disk_space_check_interval parameter to be set for the monitor.
• This parameter is disabled by default.
When multiple MaxScale instances are used in a highly available deployment, MariaDB Monitor needs to ensure that only one MaxScale instance performs automatic failover operations at a given time. It does this by using cooperative locks on the back-end servers.
How MariaDB Monitor uses Cooperative Locks
When cooperative locking is enabled for MariaDB Monitor, it tries to acquire locks on the back-end servers with the GET_LOCK() function. If a specific MaxScale instance is able to acquire the lock on a majority of servers, then it is considered the primary MaxScale instance, which means that it can perform cluster operations such as automatic failover.
Configuring Cooperative Locking
Configure cooperative locking by configuring the cooperative_monitoring_locks parameter for the MariaDB Monitor in maxscale.cnf. It has several possible values.
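For example, a monitor section in maxscale.cnf might look as follows; the monitor name, servers, and credentials are placeholders:

```ini
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxmon
password=maxpwd
cooperative_monitoring_locks=majority_of_all
```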
Learn to use the MariaDB Monitor to automate cluster management. This guide covers how to configure server monitoring, automatic failover, switchover, and other HA features.
MariaDB Monitor monitors a Master-Slave replication cluster. It probes the
state of the backends and assigns server roles such as master and slave, which
are used by the routers when deciding where to route a query. It can also modify
the replication cluster by performing failover, switchover and rejoin. Backend
server versions older than MariaDB/MySQL 5.5 are not supported. Failover and
other similar operations require MariaDB 10.0.2 or later.
Up until MariaDB MaxScale 2.2.0, this monitor was called MySQL Monitor.
Required Grants
The monitor user requires the following grant:
In MariaDB Server versions 10.5.0 to 10.5.8, the monitor user instead requires
REPLICATION SLAVE ADMIN:
In MariaDB Server 10.5.9 and later, REPLICA MONITOR is required:
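The three grants above can be sketched as follows; the account name 'maxmon'@'%' is a placeholder:

```sql
-- MariaDB Server before 10.5:
GRANT REPLICATION CLIENT ON *.* TO 'maxmon'@'%';
-- MariaDB Server 10.5.0 to 10.5.8:
GRANT REPLICATION SLAVE ADMIN ON *.* TO 'maxmon'@'%';
-- MariaDB Server 10.5.9 and later:
GRANT REPLICA MONITOR ON *.* TO 'maxmon'@'%';
```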
If the monitor needs to query server disk space (i.e. disk_space_threshold is
set), then the FILE-grant is required with MariaDB Server versions 10.4.7,
10.3.17, 10.2.26 and 10.1.41 and later.
MariaDB Server 10.5.2 introduces CONNECTION ADMIN. This is recommended since it
allows the monitor to log in even if server connection limit has been reached.
As of MariaDB Server 11.0.1, the SUPER-privilege no longer contains several of
its former sub-privileges. These must be given separately.
If a separate replication user is defined (with replication_user and replication_password), it requires the following grant:
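A sketch of that grant; the account name is a placeholder:

```sql
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
```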
Master selection
Only one backend can be master at any given time. A master must be running
(successfully connected to by the monitor) and its read_only-setting must be
off. A master may not be replicating from another server in the monitored
cluster unless the master is part of a multimaster group. Master selection
prefers to select the server with the most slaves, possibly in multiple
replication layers. Only slaves reachable by a chain of running relays or
directly connected to the master count. When multiple servers are tied for
master status, the server which appears earlier in the servers-setting of the
monitor is selected.
Servers in a cyclical replication topology (multimaster group) are interpreted
as having all the servers in the group as slaves. Even from a multimaster group
only one server is selected as the overall master.
After a master has been selected, the monitor prefers to stick with the choice
even if other potential masters with more slave servers are available. Only if
the current master is clearly unsuitable does the monitor try to select another
master. An existing master turns invalid if:
1. It is unwritable (read_only is on).
2. It has been down for more than failcount monitor passes and has no running slaves. Running slaves behind a downed relay count. A slave in this context is any server with at least a partially running replication connection (either the io or sql thread is running). The slave servers must also be down for more than failcount monitor passes to allow new master selection.
3. It did not previously replicate from another server in the cluster but it is now replicating.
4. It was previously part of a multimaster group but is no longer, or the multimaster group is replicating from a server not in the group.
Cases 1 and 2 cover the situations in which the DBA, an external script or even
another MaxScale has modified the cluster such that the old master can no longer
act as master. Cases 3 and 4 are less severe. In these cases the topology has
changed significantly and the master should be re-selected, although the old
master may still be the best choice.
The master change described above is different from failover and switchover
described in section Failover, switchover and auto-rejoin.
A master change only modifies the server roles inside MaxScale but does not
modify the cluster other than changing the targets of read and write queries.
Failover and switchover perform a master change on their own.
As a general rule, it's best to avoid situations where the cluster has multiple
standalone servers, separate master-slave pairs or separate multimaster groups.
Due to master invalidation rule 2, a standalone master can easily lose the
master status to another valid master if it goes down. The new master probably
does not have the same data as the previous one. Non-standalone masters are less
vulnerable, as a single running slave or multimaster group member will keep the
master valid even when down.
Configuration
A minimal configuration for a monitor requires a set of servers for monitoring
and a username and a password to connect to these servers.
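A minimal sketch of such a monitor section in maxscale.cnf; the section name, servers, and credentials are placeholders:

```ini
[MyMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxmon
password=maxpwd
```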
From MaxScale 2.2.1 onwards, the module name is mariadbmon instead of mysqlmon. The old name can still be used.
The grants required by user depend on which monitor features are used. A full
list of the grants can be found in the Required Grants
section.
Common Monitor Parameters
For a list of optional parameters that all monitors support, read the Monitor Common document.
MariaDB Monitor optional parameters
These are optional parameters specific to the MariaDB Monitor. Failover,
switchover and rejoin-specific parameters are listed in their own section.
When active, the monitor assumes that server hostnames and
ports are consistent between the server definitions in the MaxScale
configuration file and the "SHOW ALL SLAVES STATUS" outputs of the servers
themselves. Specifically, the monitor assumes that if server A is replicating
from server B, then A must have a slave connection with Master_Host and Master_Port equal to B's address and port in the configuration file. If this
is not the case, e.g. an IP is used in the server while a hostname is given in
the file, the monitor may misinterpret the topology. In MaxScale 2.4.1, the
monitor attempts name resolution on the addresses if a simple string comparison
does not find a match. Using exact matching addresses is, however, more
reliable.
This setting must be ON to use any cluster operation features such as failover
or switchover, because MaxScale uses the addresses and ports in the
configuration file when issuing "CHANGE MASTER TO"-commands.
If the network configuration is such that the addresses MaxScale uses to connect
to backends are different from the ones the servers use to connect to each
other, assume_unique_hostnames should be set to OFF. In this mode, MaxScale
uses server id:s it queries from the servers and the Master_Server_Id fields
of the slave connections to deduce which server is replicating from which. This
is not perfect though, since MaxScale doesn't know the id:s of servers it has
never connected to (e.g. server has been down since MaxScale was started). Also,
the Master_Server_Id-field may have an incorrect value if the slave connection
has not been established. MaxScale will only trust the value if the monitor has
seen the slave connection IO thread connected at least once. If this is not the
case, the slave connection is ignored.
Designate additional conditions for Master-status, i.e. qualified for read and write queries.
Normally, if a suitable master candidate server is found as described in Master selection, MaxScale designates it Master. master_conditions sets additional conditions for a master server. This setting is an enum_mask, allowing multiple conditions to be set simultaneously. Conditions 2, 3 and 4 refer to slave servers. If combined, a single slave must fulfill all of the given conditions for the master to be viable.
If the master candidate fails master_conditions but fulfills slave_conditions, it may be designated Slave instead.
The available conditions are:
none : No additional conditions
connecting_slave : At least one immediate slave (not behind relay) is
attempting to replicate or is replicating from the master (Slave_IO_Running is
'Yes' or 'Connecting', Slave_SQL_Running is 'Yes'). A slave with incorrect
replication credentials does not count. If the slave is currently down, results
from the last successful monitor tick are used.
connected_slave : Same as above, with the difference that the replication
connection must be up (Slave_IO_Running is 'Yes'). If the slave is currently
down, results from the last successful monitor tick are used.
running_slave : Same as connecting_slave, with the addition that the
slave must also be Running.
primary_monitor_master : If this MaxScale is cooperating with another MaxScale
and this is the secondary MaxScale, require that the candidate master is
selected also by the primary MaxScale.
The default value of this setting is master_conditions=primary_monitor_master
to ensure that both monitors use the same master server when cooperating.
For example, to require that the master must have a slave which is both
connected and running, set
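Using the condition names listed above, such a requirement could be sketched in the monitor section as:

```ini
master_conditions=connected_slave,running_slave
```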
Designate additional conditions for Slave-status,
i.e qualified for read queries.
Normally, a server is Slave if it is at least attempting to replicate from the
master candidate or a relay (Slave_IO_Running is 'Yes' or 'Connecting',
Slave_SQL_Running is 'Yes', valid replication credentials). The master candidate
does not necessarily need to be writable, e.g. if it fails its master_conditions. slave_conditions sets additional conditions for a slave
server. This setting is an enum_mask, allowing multiple conditions to be set
simultaneously.
The available conditions are:
none : No additional conditions. This is the default value.
linked_master : The slave must be connected to the master (Slave_IO_Running
and Slave_SQL_Running are 'Yes') and the master must be Running. The same
applies to any relays between the slave and the master.
running_master : The master must be running. Relays may be down.
writable_master : The master must be writable, i.e. labeled Master.
primary_monitor_master : If this MaxScale is cooperating with another MaxScale
and this is the secondary MaxScale, require that the candidate master is
selected also by the primary MaxScale.
For example, to require that the master server of the cluster must be running
and writable for any servers to have Slave-status, set
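Using the condition names listed above, a sketch of that setting:

```ini
slave_conditions=running_master,writable_master
```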
ignore_external_masters
Deprecated. Ignore any servers that are not monitored by this monitor but are a
part of the replication topology.
An external server is a server not monitored by this monitor. If a server is
replicating from an external server, it typically gains the Slave of External
Server status. If this setting is enabled, the status is not set.
failcount
Type: number
Mandatory: No
Dynamic: Yes
Default: 5
Number of consecutive monitor passes a master server must be down before it is
considered failed. If automatic failover is enabled (auto_failover=true), it
may be performed at this time. A value of 0 or 1 enables immediate failover.
If automatic failover is not possible, the monitor will try to
search for another server to fulfill the master role. See section Master selection
for more details. Changing the master may break replication as queries could be
routed to a server without previous events. To prevent this, avoid having
multiple valid master servers in the cluster.
The worst-case delay between the master failure and the start of the failover
can be estimated by summing up the timeout values and monitor_interval and
multiplying that by failcount:
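The estimate above can be computed as follows; the helper function and the example values for monitor_interval and backend_connect_timeout are purely illustrative:

```python
# Worst-case delay from master failure to the start of failover, per the
# rule above: sum the connection timeout and the monitor interval, then
# multiply by failcount. All times are in seconds.
def worst_case_failover_delay(monitor_interval, backend_connect_timeout,
                              failcount):
    """Return the worst-case delay, in seconds, before failover starts."""
    return (monitor_interval + backend_connect_timeout) * failcount

# Example: 2 s monitor interval, 3 s connect timeout, failcount of 5
print(worst_case_failover_delay(2, 3, 5))  # 25 seconds
```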
enforce_writable_master
If set to ON, the monitor attempts to disable the read_only-flag on the master
when seen. The flag is checked every monitor tick. The monitor user requires
the SUPER-privilege for this feature to work.
Typically, the master server should never be in read-only-mode. Such a situation
may arise due to misconfiguration or accident, or perhaps if MaxScale crashed
during switchover.
When this feature is enabled, setting the master manually to read_only will no
longer cause the monitor to search for another master. The master will instead
for a moment lose its [Master]-status (no writes), until the monitor again
enables writes on the master. When starting from scratch, the monitor still
prefers to select a writable server as master if possible.
enforce_read_only_slaves
If set to ON, the monitor attempts to enable the read_only-flag on any
writable slave server. The flag is checked every monitor tick. The monitor
user requires the SUPER-privilege (or READ_ONLY ADMIN) for this
feature to work. While the read_only-flag is ON, only users with the
SUPER-privilege (or READ_ONLY ADMIN) can write to the backend server. If
temporary write access is required, this feature should be disabled before
attempting to disable read_only manually. Otherwise, the monitor will quickly
re-enable it.
read_only won't be enabled on the master server, even if it has
lost [Master]-status due to master_conditions and is
marked [Slave].
enforce_read_only_servers
Works similarly to enforce_read_only_slaves except it will set read_only on
any writable server that is not the primary and not in maintenance (a superset
of the servers altered by enforce_read_only_slaves).
The monitor user requires the SUPER-privilege
(or READ_ONLY ADMIN) for this feature to work. If the cluster has no valid
primary or primary candidate, read_only is not set on any server as it is
unclear which servers should be altered.
maintenance_on_low_disk_space
If a running server that is not the master or a relay master is out of disk
space, the server is set to maintenance mode.
Such servers are not used for router sessions and are ignored when performing a
failover or other cluster modification operation. See the general monitor
parameters disk_space_threshold and disk_space_check_interval
on how to enable disk space monitoring.
Once a server has been put to maintenance mode, the disk space situation
of that server is no longer updated. The server will not be taken out of
maintenance mode even if more disk space becomes available. The maintenance
flag must be removed manually:
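For example, with MaxCtrl; the server name is illustrative:

```
maxctrl clear server server2 maintenance
```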
cooperative_monitoring_locks
Using this setting is recommended when multiple MaxScales are monitoring the
same backend cluster. When enabled, the monitor attempts to acquire exclusive
locks on the backend servers. The monitor considers itself the primary monitor
if it has a majority of locks. The majority can be either over all configured
servers or just over running servers. See Cooperative monitoring
for more details on how this feature works and which value to use.
Allowed values:
none Default value, no locking.
majority_of_all Primary monitor requires a majority of locks, even counting
servers which are [Down].
majority_of_running Primary monitor requires a majority of locks over
[Running] servers.
This setting is separate from the global MaxScale setting passive. If
passive is set to true, cluster operations are disabled even if the monitor has
acquired the locks. Generally, it's best not to mix cooperative monitoring with
passive. Either set passive=false or do not set it at all.
script_max_replication_lag
Type: number
Mandatory: No
Dynamic: Yes
Default: -1
Defines a replication lag limit in seconds for
launching the monitor script configured in the script-parameter. If the
replication lag of a server goes above this limit, the script is run with the
$EVENT-placeholder replaced by "rlag_above". If the lag goes back below the
limit, the script is run again with the replacement "rlag_below".
Starting with MaxScale 2.2.1, MariaDB Monitor supports replication cluster
modification. The operations implemented are:
failover, which replaces a failed master with a slave
switchover, which swaps a running master with a slave
async-switchover, which schedules a switchover and returns
rejoin, which directs servers to replicate from the master
reset-replication (added in MaxScale 2.3.0), which deletes binary logs and
resets gtid:s
See operation details for more information on the
implementation of the commands.
The cluster operations require that the monitor user (user) has the following
privileges:
SUPER, to modify slave connections, set globals such as read_only and kill
connections from other super-users
REPLICATION CLIENT (REPLICATION SLAVE ADMIN in MariaDB Server 10.5), to list
slave connections
RELOAD, to flush binary logs
PROCESS, to check if the event_scheduler process is running
SHOW DATABASES and EVENT, to list and modify server events
SELECT on mysql.user, to see which users have SUPER
A list of the grants can be found in the Required Grants
section.
The privilege system was changed in MariaDB Server 10.5. The effects of this on
the MaxScale monitor user are minor, as the SUPER-privilege contains many of the
required privileges and is still required to kill connections from other
super-users.
In MariaDB Server 11.0.1 and later, SUPER no longer contains all the required
grants. The monitor requires:
READ_ONLY ADMIN, to set read_only
REPLICA MONITOR and REPLICATION SLAVE ADMIN, to view and manage replication
connections
RELOAD, to flush binary logs
PROCESS, to check if the event_scheduler process is running
SHOW DATABASES, EVENT and SET USER, to list and modify server events
BINLOG ADMIN, to delete binary logs (during reset-replication)
CONNECTION ADMIN, to kill connections
SELECT on mysql.user, to see which users have SUPER
In addition, the monitor needs to know which username and password a
slave should use when starting replication. These are given in
replication_user and replication_password.
The user can define files with SQL statements which are executed on any server
being demoted or promoted by cluster manipulation commands. See the sections on
promotion_sql_file and demotion_sql_file for more information.
The monitor can manipulate scheduled server events when promoting or demoting a
server. See the section on handle_events for more information.
All cluster operations can be activated manually through MaxCtrl. See
section Manual activation for more details.
Failover replaces a failed master with a running slave. It does the
following:
1. Select the most up-to-date slave of the old master to be the new master. The selection criteria is as follows in descending priority:
   1. gtid_IO_pos (latest event in relay log)
   2. gtid_current_pos (most processed events)
   3. log_slave_updates is on
   4. disk space is not low
2. If the new master has unprocessed relay log items, cancel and try again later.
3. Prepare the new master:
   1. Remove the slave connection the new master used to replicate from the old master.
   2. Disable the read_only-flag.
   3. Enable scheduled server events (if event handling is on). Only events that were enabled on the old master are enabled.
   4. Run the commands in promotion_sql_file.
   5. Start replication from external master if one existed.
4. Redirect all other slaves to replicate from the new master:
   1. STOP SLAVE
   2. CHANGE MASTER TO
   3. START SLAVE
5. Check that all slaves are replicating.
Failover is considered successful if steps 1 to 3 succeed, as the cluster then
has at least a valid master server.
Switchover swaps a running master with a running slave. It does the
following:
1. Prepare the old master for demotion:
   1. Stop any external replication.
   2. Kill connections from super-users since read_only does not affect them.
   3. Enable the read_only-flag to stop writes.
   4. Disable scheduled server events (if event handling is on).
   5. Run the commands in demotion_sql_file.
   6. Flush the binary log (FLUSH LOGS) so that all events are on disk.
2. Wait for the new master to catch up with the old master.
3. Promote new master and redirect slaves as in failover steps 3 and 4. Also redirect the demoted old master.
4. Check that all slaves are replicating.
Similar to failover, switchover is considered successful if the new master was
successfully promoted.
Rejoin joins a standalone server to the cluster or redirects a slave
replicating from a server other than the master. A standalone server is joined
by:
Run the commands in demotion_sql_file.
Enable the read_only-flag.
Disable scheduled server events (if event handling is on).
Start replication: CHANGE MASTER TO and START SLAVE.
A server which is replicating from the wrong master is redirected simply with
STOP SLAVE, RESET SLAVE, CHANGE MASTER TO and START SLAVE commands.
Reset-replication (added in MaxScale 2.3.0) deletes binary logs and resets
gtid:s. This destructive command is meant for situations where the gtid:s in the
cluster are out of sync while the actual data is known to be in sync. The
operation proceeds as follows:
Reset gtid:s and delete binary logs on all servers:
Stop (STOP SLAVE) and delete (RESET SLAVE ALL) all slave connections.
Enable the read_only-flag.
Disable scheduled server events (if event handling is on).
Delete binary logs (RESET MASTER).
Set the sequence number of gtid_slave_pos to zero. This also affects
gtid_current_pos.
Prepare new master:
Disable the read_only-flag.
Enable scheduled server events (if event handling is on). Events are
only enabled if the cluster had a master server when starting the
reset-replication operation. Only events that were enabled on the previous
master are enabled on the new.
Direct other servers to replicate from the new master as in the other
operations.
Manual activation
Cluster operations can be activated manually through the REST API or MaxCtrl.
The commands are only performed when MaxScale is in active mode. The commands
generally match their automatic versions. The exception is rejoin, in which
the manual command allows rejoining even when the joining server has empty
gtid:s. This rule allows the user to force a rejoin on a server without binary
logs.
All commands require the monitor instance name as the first parameter. Failover
selects the new master server automatically and does not require additional
parameters. Rejoin requires the name of the joining server as second parameter.
Replication reset accepts the name of the new master server as second parameter.
If not given, the current master is selected.
Switchover takes one to three parameters. If only the monitor name is given,
switchover will autoselect both the slave to promote and the current master as
the server to be demoted. If two parameters are given, the second parameter is
interpreted as the slave to promote. If three parameters are given, the third
parameter is interpreted as the current master. The user-given current master is
compared to the master server currently deduced by the monitor and if the two
are unequal, an error is given.
Example commands are below:
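With the monitor and server names used in the explanation that follows, and assuming the standard module command syntax, the MaxCtrl invocations would look roughly like this:

```
maxctrl call command mariadbmon failover MyMonitor
maxctrl call command mariadbmon switchover MyMonitor NewMasterServ OldMasterServ
maxctrl call command mariadbmon rejoin MyMonitor OldMasterServ
maxctrl call command mariadbmon reset-replication MyMonitor NewMasterServ
```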
The commands follow the standard module command syntax. All require the monitor
configuration name (MyMonitor) as the first parameter. For switchover, the
last two parameters define the server to promote (NewMasterServ) and the server
to demote (OldMasterServ). For rejoin, the server to join (OldMasterServ) is
required. Replication reset requires the server to promote (NewMasterServ).
It is safe to perform manual operations even with automatic failover, switchover
or rejoin enabled since automatic operations cannot happen simultaneously
with manual ones.
When a cluster modification is initiated via the REST-API, the URL path is of the
form:
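Assembling the components described below, the path presumably takes this shape (a sketch, not verified against a specific MaxScale version):

```
/v1/maxscale/modules/mariadbmon/<operation>?<monitor-instance>&<server-param1>&<server-param2>
```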
<operation> is the name of the command: failover, switchover, rejoin
or reset-replication.
<monitor-instance> is the monitor section name from the MaxScale
configuration file.
<server-param1> and <server-param2> are server parameters as described
above for MaxCtrl. Only switchover accepts both, failover doesn't need any
and both rejoin and reset-replication accept one.
Given a MaxScale configuration file like
with the assumption that server2 is the current master, then the URL
path for making server4 the new master would be:
Example REST-API paths for other commands are listed below.
Queued switchover
Most cluster modification commands wait until the operation either succeeds or
fails. async-switchover is an exception, as it returns immediately. Otherwise
async-switchover works identically to a normal switchover command. Use the
module command fetch-cmd-result to view the result of the queued command.
fetch-cmd-result returns the status or result of the latest manual command,
whether queued or not.
Automatic activation
Failover can activate automatically if auto_failover is on. The activation
begins when the master has been down at least failcount monitor iterations.
Before modifying the cluster, the monitor checks that all prerequisites for the
failover are fulfilled. If the cluster does not seem ready, an error is printed
and the cluster is rechecked during the next monitor iteration.
Switchover can also activate automatically with the
switchover_on_low_disk_space-setting. The operation begins if the master
server is low on disk space but otherwise the operating logic is quite similar
to automatic failover.
Rejoin stands for starting replication on a standalone server or redirecting a
slave replicating from the wrong master (any server that is not the cluster
master). The rejoined servers are directed to replicate from the current cluster
master server, forcing the replication topology to a 1-master-N-slaves
configuration.
A server is categorized as standalone if the server has no slave connections,
not even stopped ones. A server is replicating from the wrong master if the
slave IO thread is connected but the master server id seen by the slave does not
match the cluster master id. Alternatively, the IO thread may be stopped or
connecting but the master server host or port information differs from the
cluster master info. These criteria mean that a STOP SLAVE does not yet set a
slave as standalone.
With auto_rejoin active, the monitor will try to rejoin any servers matching
the above requirements. Rejoin does not obey failcount and will attempt to
rejoin any valid servers immediately. When activating rejoin manually, the
user-designated server must fulfill the same requirements.
Limitations and requirements
Switchover and failover only understand simple topologies. They will not work if
the cluster has multiple masters, relay masters, or if the topology is circular.
The server cluster is assumed to be well-behaving with no significant
replication lag and all commands that modify the cluster complete in a few
seconds (faster than backend_read_timeout and backend_write_timeout).
The backends must all use GTID-based replication, and the domain id should not
change during a switchover or failover. Master and slaves must have
well-behaving GTIDs with no extra events on slave servers.
Failover cannot be performed if MaxScale was started only after the master
server went down. This is because MaxScale needs reliable information on the
gtid domain of the cluster and the replication topology in general to properly
select the new master.
Failover may lose events. If a master goes down before sending new events to at
least one slave, those events are lost when a new master is chosen. If the old
master comes back online, the other servers have likely moved on with a
diverging history and the old master can no longer join the replication cluster.
To reduce the chance of losing data, use semisynchronous replication.
In semisynchronous mode, the master waits for a slave to receive an event before
returning an acknowledgement to the client. This does not yet guarantee a clean
failover. If the master fails after preparing a transaction but before receiving
slave acknowledgement, it will still commit the prepared transaction as part of
its crash recovery. Since the slaves may never have seen this transaction, the
old master has diverged from the slaves. See
for more information.
Even a controlled shutdown of the master may lose events. The server does not by
default wait for all data to be replicated to the slaves when shutting down and
instead simply closes all connections. Before shutting down the master with the
intention of having a slave promoted, run switchover first to ensure that all
data is replicated. For more information on server shutdown, see .
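For example, the switchover could be run through maxctrl before the shutdown. The monitor and server names below are placeholders:

```shell
# Promote replica2 and demote the current master before shutting it down.
maxctrl call command mariadbmon switchover MariaDB-Monitor replica2 master1
```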
Switchover requires that the cluster is "frozen" for the duration of the
operation. This means that no data modifying statements such as INSERT or UPDATE
are executed and the GTID position of the master server is stable. When
switchover begins, the monitor sets the global read_only flag on the old
master backend to stop any updates. read_only does not affect users with the
SUPER-privilege so any such user can issue writes during a switchover. These
writes have a high chance of breaking replication, because the write may not be
replicated to all slaves before they switch to the new master. To prevent this,
any users who commonly do updates should not have the SUPER-privilege. For even
more security, the only SUPER-user session during a switchover should be the
MaxScale monitor user. This also applies to users running scheduled server
events. Although the monitor by default disables events on the master, an
event may already be executing. If the event definer has SUPER-privilege, the
event can write to the database even through read_only.
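The demotion step described above can be pictured as follows. This is illustrative only; the user name is hypothetical and the monitor issues its own equivalent statements:

```sql
-- On the old master at the start of switchover:
SET GLOBAL read_only = ON;
-- read_only does not stop SUPER users, so application accounts should not
-- hold that privilege (hypothetical account name):
REVOKE SUPER ON *.* FROM 'app_user'@'%';
```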
When mixing rejoin with failover/switchover, the backends should have
log_slave_updates on. The rejoining server is likely lagging behind the rest
of the cluster. If the current cluster master does not have binary logs from the
moment the rejoining server lost connection, the rejoining server cannot
continue replication. This is an issue if the master has changed and
the new master does not have log_slave_updates on.
If an automatic cluster operation such as auto-failover or auto-rejoin fails,
all cluster modifying operations are disabled for failcount monitor iterations,
after which the operation may be retried. Similar logic applies if the cluster is
unsuitable for such operations, e.g. replication is not using GTID.
External master support
The monitor detects if a server in the cluster is replicating from an external
master (a server that is not monitored by the monitor). If the replicating
server is the cluster master server, then the cluster itself is considered to
have an external master.
If a failover/switchover happens, the new master server is set to replicate from
the cluster external master server. The username and password for the replication
are defined in replication_user and replication_password. The address and
port used are the ones shown by SHOW ALL SLAVES STATUS on the old cluster
master server. In the case of switchover, the old master also stops replicating
from the external server to preserve the topology.
After failover the new master is replicating from the external master. If the
failed old master comes back online, it is also replicating from the external
server. To normalize the situation, either have auto_rejoin on or manually
execute a rejoin. This will redirect the old master to the current cluster
master.
Enable automatic master failover. When automatic failover is enabled, MaxScale
will elect a new master server for the cluster if the old master goes down. A
server is assumed Down if it cannot be connected to, even if this is caused by
incorrect credentials. Failover triggers if the master stays down for failcount monitor intervals. Failover will not take place if
MaxScale is set passive.
As failover alters replication, it requires more privileges than normal
monitoring. See here for a list of grants.
Failover is designed to be used with simple master-slave topologies. More
complicated topologies, such as multilayered or circular replication, are not
guaranteed to always work correctly. Test before using failover with such
setups.
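A minimal monitor section enabling automatic failover might look like the following. The monitor and server names, credentials and failcount value are examples only:

```ini
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxscale
password=maxscale-password
monitor_interval=2s
auto_failover=true
failcount=5
```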
Enable automatic joining of servers to the cluster. When enabled, MaxScale will
attempt to direct servers to replicate from the current cluster master if they
are not currently doing so. Replication will be started on any standalone
servers. Servers that are replicating from another server will be redirected.
This effectively enforces a 1-master-N-slaves topology. The current master
itself is not redirected, so it can continue to replicate from an external
master. Rejoin is also not performed on any server that is replicating from
multiple sources, as this indicates a complicated topology (this rule is
overridden by enforce_simple_topology).
This feature is often paired with auto_failover to redirect
the former master when it comes back online. Sometimes this kind of rejoin will
fail as the old master may have transactions that were never replicated to the
current one. See limitations for more
information.
As an example, consider the following series of events:
Slave A goes down
Master goes down and a failover is performed, promoting Slave B
Slave A comes back
Old master comes back
Slave A is still trying to replicate from the downed master, since it wasn't
online during failover. If auto_rejoin is on, Slave A will quickly be
redirected to Slave B, the current master. The old master will also rejoin the
cluster if possible.
If enabled, the monitor will attempt to switch over a master server low on
disk space with a slave. The switch is only done if a slave without disk space
issues is found. If maintenance_on_low_disk_space is also enabled, the old
master (now a slave) will be put to maintenance during the next monitor
iteration.
For this parameter to have any effect, disk_space_threshold must be specified
for the server
or the monitor.
Also, disk_space_check_interval
must be defined for the monitor.
This setting tells the monitor to assume that the servers should be arranged in
a 1-master-N-slaves topology and the monitor should try to keep it that way. If
enforce_simple_topology is enabled, the settings assume_unique_hostnames,
auto_failover and auto_rejoin are also activated regardless of their individual
settings.
By default, mariadbmon will not rejoin servers with more than one replication
stream configured into the cluster. Starting with MaxScale 6.2.0, when
enforce_simple_topology is enabled, all servers will be rejoined into the
cluster and any extra replication sources will be removed. This is done to make
automated failover with multi-source external replication possible.
This setting also allows the monitor to perform a failover to a cluster where the master
server has not been seen [Running]. This is usually the case when the master goes down
before MaxScale is started. When using this feature, the monitor will guess the GTID
domain id of the master from the slaves. For reliable results, the GTIDs of the cluster
should be simple.
replication_user and replication_password
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The username and password of the replication user. These are given as the values
for MASTER_USER and MASTER_PASSWORD whenever a CHANGE MASTER TO command is
executed.
Both replication_user and replication_password parameters must be defined if
a custom replication user is used. If neither of the parameters is defined, the
CHANGE MASTER TO command will use the monitor credentials for the replication
user.
The credentials used for replication must have the REPLICATION SLAVE
privilege.
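Creating such a replication user could look like the following. The user name, host pattern and password are placeholders:

```sql
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
```

The monitor parameters replication_user and replication_password would then be set to these credentials.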
replication_password uses the same encryption scheme as other password
parameters. If password encryption is in use, replication_password must be
encrypted with the same key to avoid erroneous decryption.
If set to ON, any CHANGE MASTER TO command generated will set MASTER_SSL=1 to
enable encryption for the replication stream. This setting should only be
enabled if the backend servers are configured for SSL. This typically means
setting ssl_ca, ssl_cert and ssl_key in the server configuration file.
Additionally, credentials for the replication user should require an encrypted
connection (e.g. ALTER USER repl@'%' REQUIRE SSL;).
If the setting is left OFF, MASTER_SSL is not set at all, which will preserve existing
settings when redirecting a slave connection.
Time limit for failover and switchover operations. The default
values are 90 seconds for both. switchover_timeout is also used as the time
limit for a rejoin operation. Rejoin should rarely time out, since it is a
faster operation than switchover.
The timeouts are specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeouts is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second.
If no successful failover/switchover takes place within the configured time
period, a message is logged and automatic failover is disabled. This prevents
further automatic modifications to the misbehaving cluster.
Enable additional master failure verification for automatic failover.
verify_master_failure enables this feature and master_failure_timeout defines
the timeout.
Failure verification is performed by checking whether the slave servers are
still connected to the master and receiving events. An event is either a change
in the Gtid_IO_Pos-field of the SHOW SLAVE STATUS output or a heartbeat
event. Effectively, if a slave has received an event within
master_failure_timeout duration, the master is not considered down when
deciding whether to failover, even if MaxScale cannot connect to the master.
master_failure_timeout should be longer than the Slave_heartbeat_period of
the slave connection to be effective.
If every slave loses its connection to the master (Slave_IO_Running is not
"Yes"), master failure is considered verified regardless of timeout. This allows
faster failover when the master properly disconnects.
For automatic failover to activate, the failcount requirement must also be
met.
The master failure timeout is specified as documented here. If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the timeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second.
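As a sketch, the feature could be enabled in the monitor section like this (the timeout value is an example only):

```ini
verify_master_failure=true
master_failure_timeout=10s
```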
servers_no_promotion
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This is a comma-separated list of server names that will not be chosen for
master promotion during a failover or autoselected for switchover. This does not
affect switchover if the user selects the server to promote. Using this setting
can disrupt new master selection for failover such that a non-optimal server is
chosen. At worst, this will cause replication to break. Alternatively, failover
may fail if all valid promotion candidates are in the exclusion list.
promotion_sql_file and demotion_sql_file
Type: string
Mandatory: No
Dynamic: Yes
Default: None
These optional settings are paths to text files with SQL statements in them.
During promotion or demotion, the contents are read line-by-line and executed on
the backend. Use these settings to execute custom statements on the servers to
complement the built-in operations.
Empty lines or lines starting with '#' are ignored. Any results returned by the
statements are ignored. All statements must succeed for the failover, switchover
or rejoin to continue. The monitor user may require additional privileges and
grants for the custom commands to succeed.
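An illustrative promotion_sql_file could contain, for example (the statements themselves are hypothetical):

```sql
# Lines starting with '#' and empty lines are ignored.
FLUSH TABLES;
SET GLOBAL max_connections = 1000;
```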
When promoting a slave to master during switchover or failover, the
promotion_sql_file is read and executed on the new master server after its
read-only flag is disabled. The commands are run before starting replication
from an external master, if any.
demotion_sql_file is run on an old master during demotion to slave, before the
old master starts replicating from the new master. The file is also run before
rejoining a standalone server to the cluster, as the standalone server is
typically a former master server. When redirecting a slave replicating from a
wrong master, the sql-file is not executed.
Since the queries in the files are run during operations which modify
replication topology, care is required. If promotion_sql_file contains data
modification (DML) queries, the new master server may not be able to
successfully replicate from an external master. demotion_sql_file should never
contain DML queries, as these may not replicate to the slave servers before
slave threads are stopped, breaking replication.
If enabled, the monitor continuously queries the
servers for enabled scheduled events and uses this information when performing
cluster operations, enabling and disabling events as appropriate.
When a server is being demoted, any events with "ENABLED" status are set to
"SLAVESIDE_DISABLED". When a server is being promoted to master, events that are either
"SLAVESIDE_DISABLED" or "DISABLED" are set to "ENABLED" if the same event was also enabled
on the old master server last time it was successfully queried. Events are considered
identical if they have the same schema and name. When a standalone server is rejoined to
the cluster, its events are also disabled since it is now a slave.
The monitor does not check whether the same events were disabled and enabled during a
switchover or failover/rejoin. All events that meet the criteria above are altered.
The monitor does not enable or disable the event scheduler itself. For the
events to run on the new master server, the scheduler should be enabled by the
admin. Enabling it in the server configuration file is recommended.
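For example, in the MariaDB Server configuration file (a standard server option, shown here as a sketch):

```ini
[mariadb]
event_scheduler=ON
```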
Events running at high frequency may cause replication to break in a failover
scenario. If an old master which was failed over restarts, its event scheduler
will be on if set in the server configuration file. Its events will also
remember their "ENABLED"-status and run when scheduled. This may happen before
the monitor rejoins the server and disables the events. This should only be an
issue for events running more often than the monitor interval or events that run
immediately after the server has restarted.
Cooperative monitoring
As of MaxScale 2.5, MariaDB-Monitor supports cooperative monitoring. This means
that multiple monitors (typically in different MaxScale instances) can monitor
the same backend server cluster and only one will be the primary monitor. Only
the primary monitor may perform switchover, failover or rejoin operations.
The primary also decides which server is the master. Cooperative monitoring is
enabled with the cooperative_monitoring_locks-setting.
Even with this setting, only one monitor per server per MaxScale is allowed.
This limitation can be circumvented by defining multiple copies of a server in
the configuration file.
Cooperative monitoring uses server locks
for coordinating between monitors. When cooperating, the monitor regularly
checks the status of a lock named maxscale_mariadbmonitor on every server and
acquires it if free. If the monitor acquires a majority of locks, it is the
primary. If a monitor cannot claim majority locks, it is a secondary monitor.
The primary monitor of a cluster also acquires the lock
maxscale_mariadbmonitor_master on the master server. Secondary monitors check
which server this lock is taken on and only accept that server as the master.
This arrangement is required so that multiple monitors can agree on which server
is the master regardless of replication topology. If a secondary monitor does
not see the master-lock taken, then it won't mark any server as [Master],
causing writes to fail.
The lock-setting defines how many locks are required for primary status.
Setting cooperative_monitoring_locks=majority_of_all means that the primary
monitor
needs n_servers/2 + 1 (rounded down) locks. For example, a cluster of three
servers needs two locks for majority, a cluster of four needs three, and a
cluster of five needs three.
This scheme is resistant against split-brain situations in the sense
that multiple monitors cannot be primary simultaneously. However, a split may
cause both monitors to consider themselves secondary, in which case a master
server won't be detected.
Even without a network split, cooperative_monitoring_locks=majority_of_all
will lead to neither monitor claiming lock majority once too many servers go
down. This scenario is depicted in the image below. Only two out of four servers
are running when three are needed for majority. Although both MaxScales see both
running servers, neither is certain they have majority and the cluster stays in
read-only mode. If the primary server is down, no failover is performed either.
Setting cooperative_monitoring_locks=majority_of_running changes the way
n_servers is calculated. Instead of using the total number of servers, only
servers currently [Running] are considered. This scheme adapts to multiple
servers going down, ensuring that claiming lock majority is always possible.
However, it can lead to multiple monitors claiming primary status in a
split-brain situation. As an example, consider a cluster with servers 1 to 4
with MaxScales A and B, as in the image below. MaxScale A can connect to
servers 1 and 2 (and claim their locks) but not to servers 3 and 4 due to
a network split. MaxScale A thus assumes servers 3 and 4 are down. MaxScale B
does the opposite, claiming servers 3 and 4 and assuming 1 and 2 are down.
Both MaxScales claim two locks out of two available and assume that they have
lock majority. Both MaxScales may then promote their own primaries and route
writes to different servers.
The recommended strategy depends on which failure scenario is more likely and/or
more destructive. If it's unlikely that multiple servers are ever down
simultaneously, then majority_of_all is likely the safer choice. On the other
hand, if split-brain is unlikely but multiple servers may be down
simultaneously, then majority_of_running would keep the cluster operational.
To check if a monitor is primary, fetch monitor diagnostics with maxctrl show monitors or the REST API. The boolean field primary indicates whether the
monitor has lock majority on the cluster. If cooperative monitoring is disabled,
the field value is null. Lock information for individual servers is listed in
the server-specific field lock_held. Again, null indicates that locks are
not in use or the lock status is unknown.
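For example (the monitor name is a placeholder):

```shell
# Prints the monitor diagnostics, including the "primary" field and the
# per-server "lock_held" values.
maxctrl show monitor MariaDB-Monitor
```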
If a MaxScale instance tries to acquire the locks but fails to get majority
(perhaps another MaxScale was acquiring locks simultaneously) it will release
any acquired locks and try again after a random number of monitor ticks. This
prevents multiple MaxScales from fighting over the locks continuously as one
MaxScale will eventually wait less time than the others. Conflict probability
can be further decreased by configuring each monitor with a different
monitor_interval.
The flowchart below illustrates the lock handling logic.
Releasing locks
Monitor cooperation depends on the server locks. The locks are
connection-specific. The owning connection can manually release a lock, allowing
another connection to claim it. Also, if the owning connection closes, the
MariaDB Server process releases the lock. How quickly a lost connection is
detected affects how quickly the primary monitor status moves from one monitor
and MaxScale to another.
If the primary MaxScale or its monitor is stopped normally, the monitor
connections are properly closed, releasing the locks. This allows the secondary
MaxScale to quickly claim the locks. However, if the primary simply vanishes
(broken network), the connection may just look idle. In this case, the
MariaDB Server may take a long time before it considers the monitor connection
lost. This time ultimately depends on TCP keepalive settings on the machines
running MariaDB Server.
On MariaDB Server 10.3.3 and later, the TCP keepalive settings can be configured
for just the server process. See
for information on settings tcp_keepalive_interval, tcp_keepalive_probes and
tcp_keepalive_time. These settings can also be set on the operating system
level, as described here.
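On Linux, the OS-level keepalive settings could be tuned like this (the values are examples only, not recommendations):

```shell
# Start probing after 120 s of idle time, probe every 30 s, give up after
# 3 failed probes.
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=3
```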
As of MaxScale 6.4.16, 22.08.13, 23.02.10, 23.08.6 and 24.02.2, configuring
TCP keepalive is no longer necessary as the monitor sets the session
wait_timeout variable when acquiring a lock. This causes the MariaDB Server to
close the monitor connection if the connection appears idle for too long. The
value of wait_timeout used depends on the monitor interval and connection
timeout settings, and is logged at MaxScale startup.
A monitor can also be ordered to manually release its locks via the module
command release-locks. This is useful for manually changing the primary
monitor. After running the release-command, the monitor will not attempt to
reacquire the locks for one minute, even if it wasn't the primary monitor to
begin with. This command can cause the cluster to become temporarily unusable by
MaxScale. Only use it when there is another monitor ready to claim the locks.
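The command is run through maxctrl (the monitor name is a placeholder):

```shell
maxctrl call command mariadbmon release-locks MariaDB-Monitor
```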
Troubleshooting
Failover/switchover fails
Before performing failover or switchover, the MariaDB Monitor first checks that
prerequisites are fulfilled, printing any found errors. This should catch and
explain most issues with failover or switchover not working. If the operations
are attempted and still fail, then most likely one of the commands the monitor
issued to a server failed or timed out. The log should explain which query failed.
To print out all queries sent to the servers, start MaxScale with
--debug=enable-statement-logging. This setting prints all queries sent to the
backends by monitors and authenticators. The printed queries may include
usernames and passwords.
A typical reason for failure is that a command such as STOP SLAVE takes longer
than the backend_read_timeout of the monitor, causing the connection to break.
As of 2.3, the monitor will retry most such queries if the failure was caused
by a timeout. The retrying continues until the total time for a failover or
switchover has been spent. If the log shows warnings or errors about commands
timing out, increasing the backend timeout settings of the monitor should help.
Other settings to look at are query_retries and query_retry_timeout. These are
general MaxScale settings described in the Configuration guide. Setting
query_retries to 2 is a reasonable first try.
Slave detection shows external masters
If a slave is shown in maxctrl as "Slave of External Server" instead of
"Slave", the reason is likely that the "Master_Host"-setting of the replication connection
does not match the MaxScale server definition. As of 2.3.2, the MariaDB Monitor by default
assumes that the slave connections (as shown by SHOW ALL SLAVES STATUS) use the exact
same "Master_Host" as used in the MaxScale configuration file server definitions. This is
controlled by the setting assume_unique_hostnames.
Using the MariaDB Monitor With Binlogrouter
Since MaxScale 2.2 it is possible to detect a replication setup that includes
the Binlog Server: the required action is to add the binlog server to the list
of servers only if master_id identity is set.
CREATE USER 'maxscale'@'maxscalehost' IDENTIFIED BY 'maxscale-password';
GRANT REPLICATION CLIENT ON *.* TO 'maxscale'@'maxscalehost';
GRANT REPLICATION SLAVE ADMIN ON *.* TO 'maxscale'@'maxscalehost';
GRANT REPLICA MONITOR ON *.* TO 'maxscale'@'maxscalehost';
GRANT FILE ON *.* TO 'maxscale'@'maxscalehost';
GRANT CONNECTION ADMIN ON *.* TO 'maxscale'@'maxscalehost';
GRANT SUPER, RELOAD, PROCESS, SHOW DATABASES, EVENT ON *.* TO 'maxscale'@'maxscalehost';
GRANT SELECT ON mysql.user TO 'maxscale'@'maxscalehost';
GRANT RELOAD, PROCESS, SHOW DATABASES, EVENT, SET USER, READ_ONLY ADMIN ON *.* TO 'maxscale'@'maxscalehost';
GRANT REPLICATION SLAVE ADMIN, BINLOG ADMIN, CONNECTION ADMIN ON *.* TO 'maxscale'@'maxscalehost';
GRANT SELECT ON mysql.user TO 'maxscale'@'maxscalehost';
CREATE USER 'replication'@'replicationhost' IDENTIFIED BY 'replication-password';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'replicationhost';