Scale-In with MariaDB Xpand 5.3 and 6.0

Overview

MariaDB Xpand 5.3 and 6.0 can be scaled in, which reduces the number of Xpand nodes in the cluster.

Compatibility

Information provided here applies to:

  • MariaDB Xpand 5.3

  • MariaDB Xpand 6.0

Use Cases

Some use cases for scale-in:

  • To reduce operating costs following a peak event (e.g., after Cyber Monday).

  • To allocate servers for other purposes.

  • To remove failing hardware. (See ALTER CLUSTER DROP to drop a permanently failed node.)

Review Target Cluster Configuration

  • MariaDB Xpand requires a minimum of three nodes to support production systems. Going from three or more nodes to a single node is not supported via the steps outlined on this page.

  • When zones are configured, Xpand requires a minimum of three zones.

  • For clusters deployed in zones, Xpand requires an equal number of nodes in each zone.

  • Ensure that the target cluster has sufficient space. See Allocating Disk Space for Fault Tolerance and Availability.
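As a rough pre-check of the zone requirements above, a query along the following lines could count the nodes in each zone. This is a sketch that assumes system.nodeinfo exposes a column named zone; confirm the column name on your Xpand version before relying on it.

```sql
-- Sketch: count nodes per zone before scaling in.
-- Assumption: system.nodeinfo has a column named "zone".
SELECT zone, COUNT(*) AS node_count
FROM system.nodeinfo
GROUP BY zone
ORDER BY zone;
```

After subtracting the nodes you plan to remove, at least three zones should remain, each with the same number of nodes.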

Scale-In

Step 1: Obtain Node IDs

On any Xpand node, obtain the node ID for each Xpand node that will be removed from the cluster by querying the system.nodeinfo system table:

SELECT *
FROM system.nodeinfo
ORDER BY nodeid;

Step 2: Softfail Nodes

On any Xpand node, softfail one or more Xpand nodes by executing ALTER CLUSTER SOFTFAIL and specifying a comma-separated list of node IDs:

ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...

When one or more nodes are softfailed, Xpand directs the Rebalancer to move all data from the softfailed nodes to other nodes within the cluster. The Rebalancer performs this task in the background without interrupting queries.
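For illustration, the following statement softfails two nodes in a single operation. The node IDs 4 and 5 are hypothetical; substitute the IDs obtained in Step 1.

```sql
-- Hypothetical node IDs; replace with the IDs from system.nodeinfo.
ALTER CLUSTER SOFTFAIL 4, 5;
```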

Step 3: Verify Softfailed Nodes

On any Xpand node, verify that the specified Xpand nodes are softfailed by querying the system.softfailed_nodes system table:

SELECT *
FROM system.softfailed_nodes;

Step 4: Wait for Nodes to Softfail

On any Xpand node, monitor the progress of the node softfail operation using one or more of the following methods and wait for all specified Xpand nodes to be softfailed:

  • Monitor the number of containers still waiting to be moved to a new node by querying the number of rows in the system.softfailing_containers system table:

    SELECT COUNT(1)
    FROM system.softfailing_containers;

    When the query returns 0, no containers are waiting to be moved, so all data has been moved off the softfailed nodes and you can proceed to the next step.

  • Monitor which softfailed nodes are ready for removal by comparing the system.softfailed_nodes and system.softfailing_containers system tables:

    SELECT *
    FROM system.softfailed_nodes
    WHERE nodeid NOT IN (
       SELECT DISTINCT nodeid
       FROM system.softfailing_containers
    );

    When the query returns the node IDs of all softfailed nodes, every softfailed node is ready for removal, and you can proceed to the next step.
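To see how much work remains on each node, the container count can also be broken down by node. The following is a minimal sketch that relies only on the nodeid column of system.softfailing_containers used in the query above; the column alias is illustrative.

```sql
-- Remaining containers per softfailing node.
SELECT nodeid, COUNT(*) AS containers_remaining
FROM system.softfailing_containers
GROUP BY nodeid
ORDER BY nodeid;
```

A node that no longer appears in the result has no containers waiting to be moved.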

Step 5: Reform Cluster

On any Xpand node, reform the cluster by executing ALTER CLUSTER REFORM:

ALTER CLUSTER REFORM;

This causes a brief interruption of service while the cluster is reformed.

If your Xpand cluster does not have binary logs, the softfailed nodes have now been removed from the cluster and the operation is complete. You can skip the rest of this procedure.

If your Xpand cluster has binary logs, the softfailed nodes remain part of the cluster after this ALTER CLUSTER REFORM. The cluster assigns them the LEAVING state, so they are not chosen to be acceptors, and the following message is written to the Xpand log:

INFO dbcore/dbstate.c:292 dbprepare_done(): Running dbstart for membership afffe { 1-3 leaving: 2}

In this case, the softfail operation is not complete until the binary logs have been softfailed.

Step 6: Wait for Binary Logs to Softfail

On any Xpand node, monitor the progress of the binary log softfail operation by querying the system.binlog_commits_segments system table and wait for all binary logs to be softfailed:

SELECT COUNT(1)
FROM system.binlog_commits_segments
WHERE softfailed_replicas > 0;

When the query returns 0, all binary logs have been rebalanced off the softfailed nodes, and you can proceed to the next step.

Step 7: Reform Cluster Again

When there are no more softfailing containers (see "Step 4: Wait for Nodes to Softfail") and no more softfailing binary logs (see "Step 6: Wait for Binary Logs to Softfail"), the softfailed nodes are ready to be removed from the cluster, and the following message is written to the Xpand log:

INFO dbcore/softfail.ct:27 softfail_node_msg_signal(): softfailing nodes are ready to be removed: 2
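Both conditions can also be checked from SQL before reforming. The following sketch combines the counts from Step 4 and Step 6 into one result row. The column aliases are illustrative, and the query assumes Xpand accepts scalar subqueries in the select list as MySQL does; if it does not, run the two queries separately.

```sql
-- Both values should be 0 before running ALTER CLUSTER REFORM again.
SELECT
  (SELECT COUNT(1)
   FROM system.softfailing_containers) AS containers_remaining,
  (SELECT COUNT(1)
   FROM system.binlog_commits_segments
   WHERE softfailed_replicas > 0) AS binlog_segments_remaining;
```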

On any Xpand node, reform the cluster again to remove the softfailed nodes by executing ALTER CLUSTER REFORM:

ALTER CLUSTER REFORM;

This causes a brief interruption of service while the softfailed nodes are removed from the cluster and the cluster is reformed.

Optional: Cancel the Softfail Operation

A SOFTFAIL operation can be canceled before it completes. On any Xpand node, execute ALTER CLUSTER UNSOFTFAIL and specify the node IDs:

ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...

The cluster will be restored to its prior configuration.
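As an example, the following sketch cancels the softfail of two nodes and then rechecks the softfailed node list. The node IDs 4 and 5 are hypothetical, and the expectation that canceled nodes drop out of system.softfailed_nodes is an assumption to confirm on your cluster.

```sql
-- Hypothetical node IDs; use the IDs passed to ALTER CLUSTER SOFTFAIL.
ALTER CLUSTER UNSOFTFAIL 4, 5;

-- The canceled nodes are expected to no longer be listed here.
SELECT *
FROM system.softfailed_nodes;
```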

Troubleshooting

The SOFTFAIL operation raises an error when certain issues occur, including:

  • Xpand does not have sufficient storage space to rebalance the data stored on the nodes

  • Xpand does not have enough nodes to protect the data