MaxScale, manual control, external monitors and notification methods

One of the nice things about the “plug and play” approach of MaxScale is that people constantly find ways of using it that were not originally envisaged when we designed MaxScale. One such configuration that I have heard of from multiple sources is using monitoring outside of MaxScale itself. This post will discuss a little about how monitoring works and how it can be moved outside of MaxScale. In particular a simplified example will be presented which shows how to use the notification mechanism in Galera to control MaxScale’s use of the nodes in a Galera cluster.

Monitoring Within MaxScale

Perhaps it is best to start with a little background on what the monitor plugins do within MaxScale, how they work and how they communicate with the other components of MaxScale.

MaxScale monitors the servers for one reason only: to feed the routing algorithms the state they need in order to decide which server should receive the request the router is handling. In particular, the monitoring is not designed to be a mechanism that administrators would use to determine the health of the database cluster or to provide critical alarms. There are other tools that are much more focused on performing that task than MaxScale.

Monitor plugins have one of the least complex interface requirements of any plugin module within MaxScale. Other than the entry points used to configure options and for diagnostic purposes, there are essentially four entry points: a pair to start and stop the monitor, and a pair to register and unregister a server that should be monitored. The monitor implementation is expected to create a thread of its own on which to run. It then monitors the servers that are registered with it, asynchronously from any other operations within MaxScale.

The monitor threads communicate with the rest of MaxScale by setting flag bits on the individual servers to reflect the state of the server and by adding values to some monitor owned fields within the servers. These “monitor owned” fields are related to the measurement of specific quantities, namely the replication lag within a cluster. The mechanism of more interest here is the status bits within the server structure. The bit values that are of interest are:

  • Running – the server is running the database, connections can be made to the server and SQL statements executed.
  • Joined – the server is a member of a multi-master cluster and is able to accept both read and write statements to execute on that cluster.
  • Master – the server is a member of a cluster and can be considered as a master of that cluster, i.e. it can accept both read and write requests.
  • Slave – the server is a member of a cluster and is able to accept read requests for that cluster only.
  • Maintenance – the server is running however it is in a maintenance mode, new connections or operations should not be sent to the server.

It is possible to set all of these bits manually. However, under normal circumstances, when a monitor module has been added to the configuration file, the monitor will periodically change them back to reflect the situation it observes. Therefore, in order to control the state of the servers effectively, either manually or from outside MaxScale, it is important that no monitor module within MaxScale is monitoring the same set of hosts.

Manually Setting Server Status

The MaxScale command line interface application, maxadmin, allows the server status bits to be set and cleared manually using the set server and clear server commands. MaxAdmin will also take commands as arguments on the command line, so a single shell command can be used to set the status of a database server.

$ maxadmin -pskysql set server dbserver1 master

The above command will set the server which has a section name of dbserver1 to be a master server. Note that this has no impact on any other status bits of that server or on the bits of other servers. If the dbserver1 server previously had the slave bit set, it would remain set. Likewise, if the server dbserver2 had the master bit set before the call then it will still be set after this call.

To clear a status bit on a server the clear server command should be used.

$ maxadmin -pskysql clear server dbserver1 master
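Because set server leaves every other bit untouched, moving the master role from one server to another takes several calls. The following is a minimal sketch of how that might be wrapped up; the function name promote_master and the PASSWD variable are illustrative and not part of MaxScale, and only the set server and clear server commands shown above are used.

```shell
#!/bin/sh
# Sketch only: PASSWD defaults to the password used in the examples above.
PASSWD="${PASSWD:-skysql}"

# promote_master NEW OLD  (hypothetical helper, not a MaxScale command)
# Moves the master bit from OLD to NEW and tidies up the slave bits
# that "set server" on its own would leave behind.
promote_master() {
    new=$1
    old=$2
    # Demote the old master first so that two servers are never
    # flagged as master at the same time.
    maxadmin -p"$PASSWD" clear server "$old" master
    maxadmin -p"$PASSWD" set server "$old" slave
    maxadmin -p"$PASSWD" clear server "$new" slave
    maxadmin -p"$PASSWD" set server "$new" master
}
```

Calling promote_master dbserver2 dbserver1 would issue the four maxadmin commands in that order. Whether the old master should become a slave, or simply lose its master bit, depends on the cluster, so treat the second call as an assumption.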

Once a manual method of control is available through these maxadmin commands, it also provides a way to integrate with third party monitors or to use any state change notification that exists within your cluster management software.

Using Galera Notify Command Option

The Galera Cluster provides a mechanism to have the cluster make a call to an external application whenever the cluster state changes. This can be used to have the Galera Cluster control MaxScale directly, rather than have MaxScale monitor the status of the Galera Cluster and respond to the monitored state values. This approach has the advantage that MaxScale is informed immediately of a state change within the Galera Cluster. The disadvantage is that at least one node in the cluster must be running in order to trigger the notification process.

The wsrep_notify_cmd option in the Galera Cluster can be used to define the name of an application or script that is called when the status of the node changes or the membership of the cluster changes. This script is called with a number of parameters that need to be parsed in order to determine what has happened in the cluster. At the simplest level this can be used to set the status of the node on which it is executed. When the status of the current node changes, the wsrep_notify_cmd is called with the --status argument. A very simple script might be


#!/bin/sh

# PASSWD and SERVER must be set before this script is used.

# Pick the value of --status out of the arguments, ignoring the rest.
while [ $# -gt 0 ]; do
    case $1 in
    --status)
        STATUS=$2
        shift
        ;;
    esac
    shift
done

if [ "$STATUS" = "Undefined" ]
then
    # Node state is unknown: take it out of service.
    maxadmin -p"$PASSWD" clear server "$SERVER" synced
    maxadmin -p"$PASSWD" clear server "$SERVER" running
else
    case $STATUS in
    Joiner)
        maxadmin -p"$PASSWD" set server "$SERVER" running
        maxadmin -p"$PASSWD" clear server "$SERVER" synced
        ;;
    Donor)
        # A donor is still treated as usable by this script.
        maxadmin -p"$PASSWD" set server "$SERVER" running
        maxadmin -p"$PASSWD" set server "$SERVER" synced
        ;;
    Joined)
        maxadmin -p"$PASSWD" set server "$SERVER" running
        maxadmin -p"$PASSWD" clear server "$SERVER" synced
        ;;
    Synced)
        maxadmin -p"$PASSWD" set server "$SERVER" running
        maxadmin -p"$PASSWD" set server "$SERVER" synced
        ;;
    Error)
        maxadmin -p"$PASSWD" clear server "$SERVER" running
        maxadmin -p"$PASSWD" clear server "$SERVER" synced
        ;;
    esac
fi

Note that the above script will mark a server as running and joined to the cluster if the Galera status says that it is either a fully synchronised node or a donor node. This differs from the default Galera Monitor plugin, which removes the donor node from consideration when routing traffic.

In order to make this script useful two more things must be done: the variable PASSWD should be set to the password of the admin user for maxadmin, and the variable SERVER should be set to the name by which this server is known in the MaxScale configuration file.
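The notification call carries more than the node status. As a hedged sketch of how the script above might be extended, the variant below also reads a --primary argument, which the Galera documentation describes as yes or no depending on whether the node belongs to the primary component. Treating a node outside the primary component as not synced is an assumption of this sketch, as are the function name handle_notify and the default values of PASSWD and SERVER.

```shell
#!/bin/sh
# Sketch only: PASSWD and SERVER are placeholders to be set per node.
PASSWD="${PASSWD:-skysql}"
SERVER="${SERVER:-dbserver1}"

# handle_notify ARGS...  (hypothetical helper)
# Parses the notification arguments and updates the MaxScale status bits.
handle_notify() {
    status=""
    primary=""
    while [ $# -gt 0 ]; do
        case $1 in
        --status)  status=${2:-};  [ $# -gt 1 ] && shift ;;
        --primary) primary=${2:-}; [ $# -gt 1 ] && shift ;;
        esac
        shift
    done

    # Calls without --status are ignored: leave the bits untouched.
    [ -n "$status" ] || return 0

    # Default to "out of service" (covers Undefined and Error).
    running=clear
    synced=clear
    case $status in
    Joiner|Joined) running=set ;;
    Donor|Synced)  running=set; synced=set ;;
    esac

    # Assumption: a node outside the primary component should not be used.
    if [ "$primary" = "no" ]; then
        synced=clear
    fi

    maxadmin -p"$PASSWD" "$running" server "$SERVER" running
    maxadmin -p"$PASSWD" "$synced" server "$SERVER" synced
}

handle_notify "$@"
```

The status to bit mapping is kept identical to the simple script, so a donor is still marked as synced; only the primary component check is new.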

Other Notification Methods

Although the above example is based around the Galera wsrep_notify_cmd mechanism, the principles still hold for other environments. For example, the MHA scripts can be enhanced in order to allow MHA to control the server status bits of replication clusters. The MonYog HTTP based API can be used such that when MonYog observes a change of state of a node it makes an API call to an HTTP server, which executes the set of maxadmin commands needed to reflect the monitored change within MaxScale. This style of manual (or external) control of MaxScale could also be integrated into Pacemaker resource agents in order to synchronise the changes to the database with the changes in MaxScale.
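Whatever the external tool, the hook it runs can follow the same pattern: decide on the state of a server, then issue the matching maxadmin command. The sketch below is one hedged illustration, using mysqladmin ping as a stand-in for the external tool's own health check; the function name report_server, the PASSWD default and the host address in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: PASSWD, server names and hosts are placeholders.
PASSWD="${PASSWD:-skysql}"

# report_server NAME HOST  (hypothetical helper)
# Probes HOST and sets or clears the running bit of NAME in MaxScale.
report_server() {
    name=$1
    host=$2
    # mysqladmin ping exits with 0 when the server answers.
    if mysqladmin --host="$host" ping >/dev/null 2>&1; then
        maxadmin -p"$PASSWD" set server "$name" running
    else
        maxadmin -p"$PASSWD" clear server "$name" running
    fi
}
```

A call such as report_server dbserver1 192.168.0.11 could then be made from cron or from the external monitor's own notification hook. As before, this only works if no monitor module within MaxScale is watching the same servers.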