ALTER CLUSTER SET MAX_FAILURES

Overview

Modifies an Xpand deployment, configures the Rebalancer to sets the number of replicas and acceptors to maintain, specifying the maximum number of Xpand Nodes that can fail simultaneously without data loss.

When using the Xpand Storage Engine topology, the details described here only apply when you connect to the Xpand nodes.

DETAILS

ALTER CLUSTER SET MAX_FAILURES num

Using this statement, you can reconfigure the Rebalancer to set the MAX_FAILURES value.

MAX_FAILURES is the number of failures that can occur simultaneously while ensuring that no data is lost. By default for (non-zone) configurations, a single node can become unavailable and Xpand will resume operations without data loss. When zones are configured, a node or a zone (with any number of nodes per zone) can become unavailable with no loss of data.

The value of MAX_FAILURES is also used to determine the default number of replicas created for a table or index. For the default of MAX_FAILURES = 1, new tables are created with REPLICAS = 2.

Prerequisites for MAX_FAILURES

For a deployment to tolerate the configured value for MAX_FAILURES:

  • All representations must have sufficient replicas. If MAX_FAILURES is updated, all tables created previously must have their replicas updated manually.

  • There must be a quorum (at least N/2+1) of nodes available

  • Xpand recommends provisioning enough disk space so that the deployment has enough space to re-protect after an unexpected failure.

MAX_FAILURES = 1

The default configuration for Xpand is MAX_FAILURES = 1, which indicates REPLICAS = 2.

In the following example, Table A is configured with SLICES = 2 and REPLICAS = 2 (default for MAX_FAILURES = 1). Slices are labeled A1, and A2, and A3 and prime notation is used to denote replicas. A1 and A1' are different replicas of the same slice.

MAX_FAILURES = 1 and SLICES = 3

In this configuration any node can be lost with no data loss as the remaining nodes will contain a copy of the slice. After the loss of a node, ClustrixDB will continue to operate using the remaining nodes and work to create additional copies for replicas that were lost.

The default configuration of Xpand is to have slices equal to the number of nodes (hash_dist_min_slices = 0). In this example, A1-A5 are used to label slices, and A1 and A1' are replicas of the same slice.

MAX_FAILURES = 1 and SLICES = 5

Any node can be lost with no data loss as the remaining nodes will contain a copy of the slice. Increasing the number of slices and deployment size increases parallelism and improves performance but does not change the degree of fault tolerance.

MAX_FAILURES = 2

To configure Xpand to tolerate additional failures, you can increase the value of MAX_FAILURES.

Note

Increasing the value of MAX_FAILURES increases the number of replicas required, which can have a significant performance impact to writes and requires additional disk space.

If additional failure protection is desired MAX_FAILURES can be set to 2. Setting MAX_FAILURES=2 will increase the default number of replicas maintained by the deployment and allow it to survive two simultaneous failures as long as the deployment is able to maintain a quorum of nodes (N/2 + 1). This means two nodes can fail simultaneously, or if Zones are configured, a single zone failure and an additional node can fail.

The following examples illustrate fault tolerance for MAX_FAILURES = 2.

A 5 node deployment can sustain 2 simultaneous node failures with no loss of data.

2 Node Failures in Non-zoned Cluster

A 9 node deployment deployed across 3 zones can sustain 2 simultaneous node failures (in any zone or zones) with no data loss.

2 Node Fialures in Zoned Cluster

A 9 node deployment deployed across 3 zones can sustain a zone failure, plus one additional node failure with no data loss:

Combination Zone and Node Failure

However, if a 9 node deployment deployed across 3 zones sustains 2 simultaneous zone failures, the remaining deployment does not meet the quorum requirement (N/2 + 1).

A 10 node deployment deployed across 5 zones can sustain 2 simultaneous zone failures:

Successful 2 Zone Failure

When MAX_FAILURES = 2, there must be a minimum of 5 zones to sustain 2 zone failures in order to meet the quorum requirement (N/2 + 1):

Unsuccessful 2 Zone Failure

Note

Due to the high performance overhead, Xpand does not recommend exceeding MAX_FAILURES = 2.