Scale-In with MariaDB Xpand 5.3 and 6.0
Overview
MariaDB Xpand 5.3 and 6.0 can be scaled in, which reduces the number of Xpand nodes in the cluster.
Compatibility
Information provided here applies to:
MariaDB Xpand 5.3
MariaDB Xpand 6.0
Use Cases
Some use cases for scale-in:
To reduce operating costs following a peak event (e.g., after Cyber Monday).
To allocate servers for other purposes.
To remove failing hardware. (See ALTER CLUSTER DROP to drop a permanently failed node.)
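For the failing-hardware case, ALTER CLUSTER DROP is used in place of the softfail procedure described below. A minimal sketch, assuming the permanently failed node has node ID 4 (a hypothetical ID; obtain the real ID from system.nodeinfo):

```sql
-- Permanently remove the failed node with node ID 4 from the cluster
ALTER CLUSTER DROP 4;
```

Unlike SOFTFAIL, DROP does not wait for data to be moved off the node first, so it is intended for nodes that are already lost.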
Review Target Cluster Configuration
MariaDB Xpand requires a minimum of three nodes to support production systems. Going from three or more nodes to a single node is not supported via the steps outlined on this page.
When zones are configured, Xpand requires a minimum of three zones.
For clusters deployed in zones, Xpand requires an equal number of nodes in each zone.
Ensure that the target cluster has sufficient space. See Allocating Disk Space for Fault Tolerance and Availability.
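For zone deployments, the per-zone node counts can be checked before scaling in. A sketch, assuming the system.nodeinfo system table exposes a zone column for each node:

```sql
-- Count Xpand nodes per zone; every zone should report the same number
SELECT zone, COUNT(*) AS nodes
FROM system.nodeinfo
GROUP BY zone
ORDER BY zone;
```

If the counts would become unequal after removing the planned nodes, revise the set of nodes to be removed.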
Scale-In
Step 1: Obtain Node IDs
On any Xpand node, obtain the node ID for each Xpand node that will be removed from the cluster by querying the system.nodeinfo system table:
SELECT *
FROM system.nodeinfo
ORDER BY nodeid;
Step 2: Softfail Nodes
On any Xpand node, softfail one or more Xpand nodes by executing ALTER CLUSTER SOFTFAIL and specifying a comma-separated list of node IDs:
ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...
When one or more nodes are softfailed, Xpand directs the Rebalancer to move all data from the softfailed nodes to other nodes within the cluster. The Rebalancer performs this task in the background without interrupting queries.
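For example, to softfail the Xpand nodes with IDs 4 and 5 (hypothetical IDs obtained from system.nodeinfo in the previous step):

```sql
-- Softfail nodes 4 and 5; the Rebalancer begins moving their data
-- to the remaining nodes in the background
ALTER CLUSTER SOFTFAIL 4, 5;
```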
Step 3: Verify Softfailed Nodes
On any Xpand node, verify that the specified Xpand nodes are softfailed by querying the system.softfailed_nodes system table:
SELECT *
FROM system.softfailed_nodes;
Step 4: Wait for Nodes to Softfail
On any Xpand node, monitor the progress of the node softfail operation using one or more of the following methods and wait for all specified Xpand nodes to be softfailed:
Monitor the number of containers still waiting to be moved to a new node by querying the number of rows in the system.softfailing_containers system table:

SELECT COUNT(1)
FROM system.softfailing_containers;

When the query returns 0, there are no more containers waiting to be moved, so all nodes have been softfailed, and you can proceed to the next step.

Monitor the number of softfailed nodes that are ready for removal by joining the system.softfailed_nodes and system.softfailing_containers system tables:

SELECT *
FROM system.softfailed_nodes
WHERE nodeid NOT IN
   (SELECT DISTINCT nodeid
    FROM system.softfailing_containers);

When the query returns the node IDs for all softfailed nodes, all softfailed nodes are ready for removal, and you can proceed to the next step.
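The two checks above can also be combined into a single progress query. A sketch, using the same system tables from this step:

```sql
-- One-glance softfail progress: containers still to move, and
-- how many softfailed nodes are already drained and ready for removal
SELECT
   (SELECT COUNT(1) FROM system.softfailing_containers) AS containers_remaining,
   (SELECT COUNT(1) FROM system.softfailed_nodes
    WHERE nodeid NOT IN
       (SELECT DISTINCT nodeid FROM system.softfailing_containers)) AS nodes_ready;
```

When containers_remaining reaches 0 and nodes_ready equals the number of nodes you softfailed, proceed to the next step.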
Step 5: Reform Cluster
On any Xpand node, reform the cluster by executing ALTER CLUSTER REFORM:
ALTER CLUSTER REFORM;
This will initiate a brief interruption of service while the cluster is reformed.
If your Xpand cluster does not have binary logs, the softfailed nodes have now been removed from the cluster and the operation is complete; the rest of this procedure can be skipped.
If your Xpand cluster has binary logs, the previous ALTER CLUSTER REFORM leaves the softfailed nodes in the cluster, but the cluster assigns the nodes the LEAVING state, so they are not chosen as acceptors. The following message is written to the Xpand log:
INFO dbcore/dbstate.c:292 dbprepare_done(): Running dbstart for membership afffe { 1-3 leaving: 2}
In this case, the softfail operation is not complete until the binary logs have been softfailed.
Step 6: Wait for Binary Logs to Softfail
On any Xpand node, monitor the progress of the binary log softfail operation by querying the system.binlog_commits_segments system table and wait for all binary logs to be softfailed:
SELECT COUNT(1)
FROM system.binlog_commits_segments
WHERE softfailed_replicas > 0;
When the query returns 0, all binary logs have been rebalanced, so the binary log softfail is complete, and you can proceed to the next step.
Step 7: Reform Cluster Again
If there are no more softfailing containers (see "Step 4: Wait for Nodes to Softfail") and there are no more softfailing binary logs (see "Step 6: Wait for Binary Logs to Softfail"), the softfailed nodes are ready to be removed from the cluster, so the following message is written to the Xpand log:
INFO dbcore/softfail.ct:27 softfail_node_msg_signal(): softfailing nodes are ready to be removed: 2
On any Xpand node, reform the cluster again to remove the softfailed nodes by executing ALTER CLUSTER REFORM:
ALTER CLUSTER REFORM;
This will initiate a brief interruption of service while the softfailed nodes are removed from the cluster and the cluster is re-formed.
Optional: Cancel the Softfail Operation
On any Xpand node, a SOFTFAIL operation can be canceled before it completes by executing ALTER CLUSTER UNSOFTFAIL and specifying the node IDs:
ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...
The cluster will be restored to its prior configuration.
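For example, to cancel an in-progress softfail of the nodes with IDs 4 and 5 (hypothetical IDs):

```sql
-- Cancel the softfail for nodes 4 and 5; the cluster is restored
-- to its prior configuration
ALTER CLUSTER UNSOFTFAIL 4, 5;
```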
Troubleshooting
The SOFTFAIL operation raises an error when certain issues occur, including:
Xpand does not have sufficient storage space to rebalance the data stored on the nodes
Xpand does not have enough nodes to protect the data