Fault Tolerance with MariaDB Xpand


Xpand provides fault tolerance by maintaining multiple copies of data across the cluster. This allows a cluster to lose one or more nodes or zones without losing data and to resume operations automatically.

Built-in Fault Tolerance

By default, Xpand is configured to accommodate a single node failure and automatically maintain 2 copies (replicas) of all data. As long as the cluster has sufficient replicas and a quorum of nodes is available, a cluster can lose a node without experiencing any data loss. Clusters with zones configured can lose a single zone.
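
As a rough illustration of the two conditions above, the following Python sketch checks whether a cluster can lose nodes without data loss. It assumes that quorum means a strict majority of nodes and that every replica of a slice lives on a different node; the function names are hypothetical and are not part of Xpand.

```python
def has_quorum(total_nodes: int, available_nodes: int) -> bool:
    """Assumption: quorum is a strict majority of the cluster's nodes."""
    return available_nodes * 2 > total_nodes


def survives_node_loss(total_nodes: int, failed_nodes: int, replicas: int = 2) -> bool:
    """Return True if the modeled cluster keeps its data and stays available.

    Assumption: replicas of a slice are placed on distinct nodes, so at
    least one copy survives whenever fewer nodes fail than there are replicas.
    """
    available = total_nodes - failed_nodes
    enough_replicas = failed_nodes < replicas
    return enough_replicas and has_quorum(total_nodes, available)


if __name__ == "__main__":
    # Default configuration (2 replicas) in a three-node cluster.
    print(survives_node_loss(total_nodes=3, failed_nodes=1))
    print(survives_node_loss(total_nodes=3, failed_nodes=2))
```

In this model a three-node cluster with the default 2 replicas tolerates one failed node but not two.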

The default fault tolerance settings are acceptable for most clusters.

Zones and Fault Tolerance

Xpand can be configured to be zone-aware. A zone is an arbitrary grouping of Xpand nodes. Different zones could represent different availability zones in a cloud environment, different server racks, different network switches, and/or different power sources. Different zones should not be used to represent different regions, because the distance between regions tends to cause higher network latency, which can negatively affect performance.

To configure Xpand to be zone-aware, you can configure the zone for each node using the ALTER CLUSTER .. ZONE statement.

When Xpand is configured to use zones, Xpand's default zone-aware configuration can tolerate a single zone failure without experiencing any data loss. However, Xpand can be configured to tolerate more failures by setting MAX_FAILURES.

For additional information, see "Zones with MariaDB Xpand".

Using Replication

Setting up a disaster recovery (DR) site for your Xpand cluster allows you to recover from catastrophic failures. A secondary Xpand cluster for DR also makes it easier to transition to new releases. For information about the replication configurations supported by Xpand, see "Configure Replication with MariaDB Xpand".


MariaDB Xpand's fault tolerance is configurable. By default, MariaDB Xpand is configured to tolerate a single node failure or a single zone failure. If the default configuration does not provide sufficient fault tolerance for your needs, Xpand can be configured to provide even more fault tolerance.

To configure Xpand to provide a higher level of fault tolerance, you can configure the maximum number of node or zone failures using the ALTER CLUSTER SET MAX_FAILURES statement.

When the maximum number of failures is increased, Xpand maintains additional replicas for each slice, ensuring that Xpand can handle a greater simultaneous loss of nodes or zones without experiencing data loss.

When Xpand is configured to maintain additional physical replicas, Xpand's overhead increases for updates to logical slices, because each additional replica is updated synchronously when the slice is updated. Each additional replica requires the transaction to perform:

  • Additional network communication to transfer the updated data to the node hosting the replica

  • Additional disk writes to update the replica on disk

In some cases, the additional overhead can decrease throughput and increase commit latency, which can have negative effects for the application. If you choose to configure Xpand to maintain additional replicas, it is recommended to perform performance testing.
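
To make the trade-off concrete, here is a minimal Python sketch of the per-update cost described above. It assumes that Xpand keeps one more replica than the configured maximum number of failures (consistent with the default of 2 replicas for a single failure); the function name and the returned keys are hypothetical, and the counts are a simplified model rather than measured values.

```python
def replica_write_cost(max_failures: int) -> dict:
    """Model the extra synchronous work per slice update.

    Assumption: replicas per slice = max_failures + 1.
    The "extra" counts are relative to the default of 2 replicas.
    """
    if max_failures < 1:
        raise ValueError("max_failures must be at least 1")
    replicas = max_failures + 1
    extra = replicas - 2  # additional replicas beyond the default
    return {
        "replicas": replicas,
        "extra_network_transfers": extra,  # one transfer per added replica
        "extra_disk_writes": extra,        # one write per added replica
    }


if __name__ == "__main__":
    for failures in (1, 2, 3):
        print(failures, replica_write_cost(failures))
```

A model like this only indicates how the work scales; the actual effect on throughput and commit latency should be confirmed through performance testing.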

For additional information, see "MAX_FAILURES for MariaDB Xpand".

Effects of a Node or Zone Failure

When Xpand experiences a node or zone failure:

  1. A node or zone can no longer communicate with other nodes, so it fails a heartbeat check.

  2. A short group change occurs, during which Xpand updates membership. Once this is complete, Xpand resumes processing transactions.

  3. The Rebalancer starts a timer that counts down from the global rebalancer_reprotect_queue_interval_s value (default = 10 minutes).

  4. The Rebalancer establishes a reprotect queue (Recovery Queue) of pending changes for the failed node's or zone's data and tracks all pending changes for that node or zone in that queue. The queue is used only if the failure is temporary.

  5. The next steps depend on whether the failed node or zone returns before the timer exceeds the rebalancer_reprotect_queue_interval_s value.

    If the failed node or zone returns within the interval:

    1. The Rebalancer applies the transactions in the reprotect queue to the returned node or zone.

    2. Operations resume normally.

    If the failed node or zone does not return within the interval:

    1. The Rebalancer discards the queued transactions.

    2. The Rebalancer reprotects slices from the failed node or zone by creating new replicas to replace the ones that were lost. If the failed node(s) contained any ranking replicas, Xpand assigns that role to another replica.

    3. When the reprotect process completes, Xpand uses Email Alerts to send a message indicating that full protection has been restored. Xpand also writes an entry to query.log.

    4. The failed/unavailable node(s) can be safely removed with ALTER CLUSTER DROP.
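
The decision in step 5 can be sketched in Python as follows. This is an illustrative model only: the function name and step labels are hypothetical, the 600-second default mirrors the 10-minute value mentioned above, and treating a return exactly at the interval as "within the interval" is an assumption.

```python
DEFAULT_REPROTECT_INTERVAL_S = 600  # 10 minutes, as described above


def recovery_steps(downtime_s: float,
                   interval_s: float = DEFAULT_REPROTECT_INTERVAL_S) -> list:
    """List the modeled steps for a node or zone that was down for downtime_s."""
    steps = [
        "group change",
        "start reprotect timer",
        "queue pending changes",
    ]
    if downtime_s <= interval_s:
        # Temporary failure: the queue is replayed to the returned node or zone.
        steps += ["apply queued changes", "resume normal operations"]
    else:
        # Failure outlasted the interval: rebuild the lost replicas elsewhere.
        steps += [
            "discard queued changes",
            "reprotect slices with new replicas",
            "send alert and write log entry",
            "failed node can be dropped",
        ]
    return steps


if __name__ == "__main__":
    print(recovery_steps(60))    # node returns after one minute
    print(recovery_steps(3600))  # node is still down after one hour
```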

Group Change Effects on Transactions

If a transaction is interrupted by a group change or encounters a non-fatal error, Xpand automatically retries the transaction in some cases.

Transactions will only be retried if the global value of the autoretry system variable is set to TRUE.

The following types of transactions can be retried:

  • A single statement transaction executed with autocommit = 1

  • The first statement in an explicit transaction

If Xpand retries a transaction and the transaction fails, Xpand returns an error to the application.

The following types of transactions cannot be retried:

  • Subsequent statements in an explicit transaction

  • Stored procedures

  • Stored functions
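
The retry rules above can be summarized in a small Python sketch. The function and parameter names are hypothetical; the sketch only encodes the rules listed in this section and does not call any Xpand API.

```python
def can_autoretry(autoretry: bool,
                  explicit_transaction: bool,
                  statement_position: int = 1,
                  in_stored_routine: bool = False) -> bool:
    """Return True if the modeled statement is eligible for automatic retry.

    statement_position is the 1-based position of the statement inside an
    explicit transaction; it is ignored for autocommit statements.
    """
    if not autoretry:
        return False  # the global autoretry variable must be TRUE
    if in_stored_routine:
        return False  # stored procedures and functions are not retried
    if not explicit_transaction:
        return True   # single statement executed with autocommit = 1
    return statement_position == 1  # only the first statement qualifies


if __name__ == "__main__":
    print(can_autoretry(True, explicit_transaction=False))
    print(can_autoretry(True, explicit_transaction=True, statement_position=2))
```

An application could use a check like this to decide which operations need their own retry logic.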

If Xpand can't retry the transaction and your application is connecting to MaxScale's Read/Write Split (readwritesplit) router, you can configure MaxScale to automatically retry some transactions by configuring the delayed_retry and transaction_replay parameters.

Group Change Effects on Connections

When a group change occurs, connections can be affected:

  • If a connection was opened to a node still in quorum, the connection will remain open after the new group is formed with the available nodes.

  • If a connection was opened to a node that is no longer in quorum, the connection will be lost.

When the connection is lost, users have a couple of ways to automatically re-establish a connection to a valid node:

  • If your application is connecting to MaxScale's Read/Write Split (readwritesplit) router, you can configure MaxScale to automatically reconnect by configuring the master_reconnection parameter.

  • If your application is using a MariaDB Connector that supports automatically reconnecting, you can enable that feature in the connector.

    • If you are using MariaDB Connector/C, the MYSQL_OPT_RECONNECT option can be set with the mysql_optionsv() function:

      /* enable auto-reconnect */
      mysql_optionsv(mysql, MYSQL_OPT_RECONNECT, (void *)"1");
    • If you are using MariaDB Connector/J, the autoReconnect parameter can be set for the connection:

      Connection connection = DriverManager.getConnection("jdbc:mariadb://");
    • If you are using MariaDB Connector/Python, the auto_reconnect parameter can be set for the connection:

      conn = mariadb.connect(