Configure MAX_FAILURES for MariaDB Xpand

Overview

Increasing the value of max_failures increases the number of replicas created, which can significantly slow writes and requires additional disk space. Because of this performance overhead, exceeding MAX_FAILURES = 2 is not recommended.

To change the value of the global max_failures, perform the steps outlined below. In this sample, the number of allowable failures is modified from the default of 1 to 2. This will cause all new tables and indexes to be created with REPLICAS = 3.

Step 1: Ensure There Is Sufficient Disk Space

Ensure that the cluster has sufficient disk space for the additional replicas. For details on how to determine the disk space required, see "Allocate Disk Space for Fault Tolerance and Availability with MariaDB Xpand".
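As a rough illustration of the arithmetic behind this step, the following Python sketch derives the replica count from max_failures (REPLICAS = MAX_FAILURES + 1, as in the sample where 2 failures yields 3 replicas) and estimates the extra space that the additional replicas would occupy. The function names and the `single_copy_bytes` parameter are hypothetical. The estimate assumes that every replica is a full copy of the data, and it ignores any working space or per-node headroom, so treat it as a lower bound rather than a sizing rule.

```python
def replicas_for(max_failures: int) -> int:
    """Replicas needed to tolerate max_failures simultaneous failures."""
    if max_failures < 0:
        raise ValueError("max_failures must be non-negative")
    return max_failures + 1


def extra_space_needed(single_copy_bytes: int,
                       old_max_failures: int,
                       new_max_failures: int) -> int:
    """Additional bytes needed cluster-wide for the new replicas.

    Assumption: each replica is a full copy of the data, so the extra
    space is (number of added replicas) * (size of one copy).
    """
    added = replicas_for(new_max_failures) - replicas_for(old_max_failures)
    # Lowering max_failures never requires additional space.
    return max(added, 0) * single_copy_bytes
```

For example, `replicas_for(2)` returns 3, and moving from max_failures = 1 to 2 with one copy of the data occupying 1 TiB calls for at least 1 TiB of additional space across the cluster under these assumptions.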

Step 2: Set the Cluster-Wide Failure Threshold

sql> ALTER CLUSTER SET MAX_FAILURES = 2;

Note

Running this command will result in a group change and an interruption in service.

Step 3: Alter Existing Tables

Tables created after max_failures is updated automatically have sufficient replicas. Tables created before the update may not have sufficient replicas and need to be altered. The following query generates ALTER statements for all tables with under-protected representations. Here MAX_FAILURES + 1 stands for the required number of replicas; substitute the numeric value (3 in this sample) in both places before running the query.

sql> SELECT CONCAT('ALTER TABLE ', `database`, '.', `table`, ' REPLICAS = MAX_FAILURES + 1;')
     FROM system.table_replicas
     WHERE `database` NOT IN ('system', 'clustrix_dbi', 'clustrix_statd', '_replication')
     GROUP BY `table`, `database`
     HAVING (COUNT(1) / COUNT(DISTINCT slice)) < MAX_FAILURES + 1;

The resulting SQL will look like:

sql> ALTER TABLE foo REPLICAS = 3;

Run the generated script and monitor the Rebalancer as it creates additional replicas. For additional information, see "Manage the Rebalancer for MariaDB Xpand".
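The selection logic of the query in this step can be sketched in Python to show which tables it flags. This is only an illustration of the grouping and comparison, not a replacement for the query. The function name, the input shape (one `(database, table, slice)` tuple per replica, modeled on the columns the query reads from system.table_replicas), and the sample names are assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Schemas skipped by the query in this step.
EXCLUDED_DATABASES = {"system", "clustrix_dbi", "clustrix_statd", "_replication"}


def generate_alters(replica_rows: Iterable[Tuple[str, str, int]],
                    max_failures: int) -> List[str]:
    """Return ALTER statements for tables with too few replicas per slice.

    replica_rows holds one (database, table, slice) tuple per replica.
    """
    target = max_failures + 1
    replica_count = defaultdict(int)   # replicas per (database, table)
    slice_ids = defaultdict(set)       # distinct slices per (database, table)

    for database, table, slice_id in replica_rows:
        if database in EXCLUDED_DATABASES:
            continue
        key = (database, table)
        replica_count[key] += 1
        slice_ids[key].add(slice_id)

    statements = []
    for database, table in sorted(replica_count):
        key = (database, table)
        # Same test as count(1) / count(distinct slice) < target,
        # written without division.
        if replica_count[key] < target * len(slice_ids[key]):
            statements.append(f"ALTER TABLE {database}.{table} REPLICAS = {target};")
    return statements
```

With a hypothetical table `app.foo` that has two slices and two replicas of each, `generate_alters(rows, 2)` yields `ALTER TABLE app.foo REPLICAS = 3;`, while a table that already has three replicas per slice is skipped. The sketch does not quote identifiers, so names that need backticks would have to be handled separately.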

Log Messages

When the value of max_failures is modified, an entry in clustrix.log records the number of simultaneous failures that the cluster now supports:

INFO tm/gtm_resolve.c:168 gtm_r_validate_paxos_f(): group 1cfffe supports 2 simultaneous failures
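To confirm the change from a script, the supported failure count can be pulled out of a log line like the one above. The following Python sketch is a minimal example. The regular expression is inferred from that single sample line, so the exact wording and the assumption that the group identifier is lowercase hexadecimal may not hold for every Xpand version. The function name is hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the sample log entry; the group id is assumed
# to be a lowercase hexadecimal string.
_SUPPORTED_FAILURES = re.compile(
    r"group (?P<group>[0-9a-f]+) supports (?P<failures>\d+) simultaneous failures"
)


def supported_failures(log_line: str) -> Optional[int]:
    """Return the failure count reported in the line, or None if absent."""
    match = _SUPPORTED_FAILURES.search(log_line)
    if match is None:
        return None
    return int(match.group("failures"))
```

Applied to the sample entry, `supported_failures` returns 2. Lines that do not contain the message return `None`, so the function can be used to filter a whole log file.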