Scale-In with MariaDB Xpand

Overview

MariaDB Xpand can be scaled in, which reduces the number of Xpand nodes in the cluster.

Use Cases

Some use cases for scale-in:

  • To reduce operating costs following a peak event (e.g., after Cyber Monday).

  • To allocate servers for other purposes.

  • To remove failing hardware. (See ALTER CLUSTER DROP to drop a permanently failed node.)

Review Target Cluster Configuration

  • MariaDB Xpand requires a minimum of three nodes to support production systems. Going from three or more nodes to a single node is not supported via the steps outlined on this page.

  • When zones are configured, Xpand requires a minimum of three zones.

  • For clusters deployed in zones, Xpand requires an equal number of nodes in each zone.

  • Ensure that the target cluster has sufficient disk space. See Allocating Disk Space for Fault Tolerance and Availability.
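
The node-count and zone rules above can be checked mechanically before starting. The following is a minimal Python sketch, not an Xpand utility: the function name check_target_layout and its input format (one zone label per node that will remain, or None when zones are not configured) are assumptions made for illustration. It does not check disk space.

```python
from collections import Counter
from typing import List, Optional


def check_target_layout(node_zones: List[Optional[int]]) -> List[str]:
    """Return the problems found in a planned post-scale-in layout.

    node_zones holds one entry per node that will remain in the cluster:
    the node's zone label, or None when zones are not configured.
    An empty list means the node-count and zone rules are satisfied.
    """
    problems = []

    # A production cluster needs at least three nodes.
    if len(node_zones) < 3:
        problems.append("fewer than three nodes would remain")

    # Count the remaining nodes in each configured zone.
    per_zone = Counter(z for z in node_zones if z is not None)
    if per_zone:
        # With zones configured, at least three zones are required.
        if len(per_zone) < 3:
            problems.append("fewer than three zones would remain")
        # Every zone must hold the same number of nodes.
        if len(set(per_zone.values())) > 1:
            problems.append("zones would have unequal node counts")

    return problems
```

For example, removing one node from a six-node cluster spread over three zones leaves the zones unbalanced, so check_target_layout([1, 1, 2, 2, 3]) reports a problem, while removing one node from each zone does not.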

Scale-In

  1. Execute ALTER CLUSTER SOFTFAIL.

    Marking a node as softfailed directs the Xpand Rebalancer to move all data from the node(s) specified to others within the cluster. The Rebalancer proceeds in the background while the database continues to serve your ongoing production needs.

    If necessary, determine the nodeid assigned to a given IP or hostname using the following query:

    SELECT * FROM system.nodeinfo
    ORDER BY nodeid;
    

    To initiate a SOFTFAIL operation, use ALTER CLUSTER SOFTFAIL:

    ALTER CLUSTER SOFTFAIL nodeid [, nodeid] ...
    

    The SOFTFAIL operation returns an error if there is insufficient space to complete it, or if completing it would leave the cluster unable to protect data should an additional node be lost.

    To cancel a SOFTFAIL operation before it completes, use ALTER CLUSTER UNSOFTFAIL. Your system will be restored to its prior configuration.

    ALTER CLUSTER UNSOFTFAIL nodeid [, nodeid] ...
    
  2. Monitor the SOFTFAIL operation.

    Once a node is marked as softfailed, the Rebalancer moves its data to the other nodes. The Rebalancer runs in the background while foreground processing continues to serve your production workload.

    To monitor the progress of the SOFTFAIL operation:

    • Verify that the node(s) you specified are indeed marked for removal:

      SELECT * FROM system.softfailed_nodes;
      
    • The system.softfailing_containers table lists the containers that are slated to be moved as part of the SOFTFAIL operation. When the following query returns 0, the data migration is complete:

      SELECT COUNT(1) FROM system.softfailing_containers;
      
    • The following query shows the list of softfailed node(s) that are ready for removal:

      SELECT * FROM system.softfailed_nodes
      WHERE nodeid NOT IN
         (SELECT DISTINCT nodeid
          FROM system.softfailing_containers);
      
  3. Execute ALTER CLUSTER REFORM.

    Once data has been moved off the nodes and there are no more entries in system.softfailing_containers, run ALTER CLUSTER REFORM:

    ALTER CLUSTER REFORM;
    

    This causes a brief interruption of service while the cluster is re-formed. If you do not have any binlogs, the softfailed node(s) are removed from the cluster, and the scale-in operation is complete. If you have binlogs, continue with the steps that follow.

  4. Wait for binlog softfail.

    If your cluster has binlogs, the softfailed node remains part of the cluster after the previous ALTER CLUSTER REFORM, but it is placed in the LEAVING state and will not be chosen as an acceptor:

    INFO dbcore/dbstate.c:292 dbprepare_done(): Running dbstart for membership afffe { 1-3 leaving: 2}
    

    In the meantime, the binlog_commits table is rebalanced across the non-softfailed nodes. When the following query returns 0, the binlog_commits table has finished rebalancing:

    SELECT COUNT(1) FROM system.binlog_commits_segments WHERE softfailed_replicas > 0;
    

    When the binlog_commits table has finished rebalancing, the following log message appears on all nodes:

    INFO dbcore/softfail.ct:27 softfail_node_msg_signal(): softfailing nodes are ready to be removed: 2
    
  5. Execute ALTER CLUSTER REFORM again.

    If there are no more softfailing containers (see step 2) and the binlog_commits table has finished rebalancing (see step 4), run ALTER CLUSTER REFORM:

    ALTER CLUSTER REFORM;
    

    This will remove the softfailed nodes from the cluster.
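
The decision points in steps 2 through 5 can be summarized in code. The following Python sketch is illustrative only: the function names, parameters, and returned strings are hypothetical, and the inputs are assumed to be results you have already fetched with the queries shown in steps 2 and 4, using whatever client you prefer. It does not connect to a cluster or run any statement.

```python
from typing import Iterable, List


def nodes_ready_for_removal(softfailed_nodes: Iterable[int],
                            container_nodeids: Iterable[int]) -> List[int]:
    """Mirror the last query in step 2.

    softfailed_nodes: nodeid values from system.softfailed_nodes.
    container_nodeids: nodeid values from system.softfailing_containers.
    Returns the softfailed nodes that no longer hold containers to move.
    """
    busy = set(container_nodeids)
    return [n for n in softfailed_nodes if n not in busy]


def next_scale_in_action(softfailing_containers: int,
                         reforms_run: int,
                         has_binlogs: bool,
                         binlog_segments_pending: int) -> str:
    """Suggest the next action in the scale-in procedure.

    softfailing_containers: COUNT(1) from system.softfailing_containers.
    reforms_run: how many times ALTER CLUSTER REFORM has been run so far
        during this scale-in (tracked by the operator).
    has_binlogs: whether the cluster has binlogs.
    binlog_segments_pending: COUNT(1) from system.binlog_commits_segments
        WHERE softfailed_replicas > 0.
    """
    # Step 2: data must be fully moved off the softfailed nodes first.
    if softfailing_containers > 0:
        return "wait: data is still being moved"
    # Step 3: first re-form once no softfailing containers remain.
    if reforms_run == 0:
        return "run ALTER CLUSTER REFORM"
    # Without binlogs, the first re-form completes the scale-in.
    if not has_binlogs:
        return "done"
    # Step 4: wait for binlog_commits to finish rebalancing.
    if binlog_segments_pending > 0:
        return "wait: binlog_commits is still rebalancing"
    # Step 5: second re-form removes the softfailed nodes.
    if reforms_run == 1:
        return "run ALTER CLUSTER REFORM"
    return "done"
```

For instance, with no softfailing containers left, one re-form already run, binlogs present, and no pending binlog segments, next_scale_in_action(0, 1, True, 0) suggests running ALTER CLUSTER REFORM a second time.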