MAX_FAILURES for MariaDB Xpand
The value of MAX_FAILURES determines the default number of replicas for new tables. When MAX_FAILURES = 1, new tables are created with REPLICAS = 2.
Prerequisites for MAX_FAILURES
For a cluster to tolerate the configured value for MAX_FAILURES:
All representations must have sufficient replicas. If MAX_FAILURES is updated, all tables created previously must have their replicas updated manually.
There must be a quorum (at least N/2+1) of nodes available.
Xpand recommends provisioning enough disk space for the cluster to reprotect after an unexpected failure. For additional information, see "Allocate Disk Space for Fault Tolerance and Availability with MariaDB Xpand".
Due to the high performance overhead, Xpand does not recommend exceeding MAX_FAILURES = 2.
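The prerequisites above can be sketched as a small check. This is an illustration only, not part of Xpand: the function names are hypothetical, N/2+1 is assumed to use integer division, and the rule "replicas = MAX_FAILURES + 1" is an assumption generalized from the documented pairing of MAX_FAILURES = 1 with REPLICAS = 2.

```python
from typing import Sequence


def quorum_size(total_nodes: int) -> int:
    """Smallest node count that forms a quorum: N/2 + 1 (integer division assumed)."""
    return total_nodes // 2 + 1


def replicas_needed(max_failures: int) -> int:
    """Assumption: one more replica than the number of failures to tolerate."""
    return max_failures + 1


def can_tolerate(total_nodes: int, max_failures: int,
                 table_replicas: Sequence[int]) -> bool:
    """Check both prerequisites for a hypothetical cluster.

    table_replicas holds the replica count of each existing table, since
    tables created before a MAX_FAILURES change keep their old replica count.
    """
    enough_replicas = all(r >= replicas_needed(max_failures)
                          for r in table_replicas)
    surviving_nodes = total_nodes - max_failures
    return enough_replicas and surviving_nodes >= quorum_size(total_nodes)


if __name__ == "__main__":
    # 5 nodes, MAX_FAILURES = 2, every table already has 3 replicas.
    print(can_tolerate(5, 2, [3, 3]))
    # Same cluster, but one older table still has only 2 replicas.
    print(can_tolerate(5, 2, [3, 2]))
```

The second call returns False because one table was never updated after the setting changed, which is the case the first prerequisite warns about.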
MAX_FAILURES = 1
The default configuration for Xpand is MAX_FAILURES = 1, which indicates REPLICAS = 2.
In the following example, Table A is configured with SLICES = 3 and REPLICAS = 2 (the default for MAX_FAILURES = 1). Slices are labeled A1, A2, and A3, and prime notation is used to denote replicas: A1 and A1' are different replicas of the same slice.
In this configuration, any single node can be lost with no data loss because the remaining nodes still contain a copy of each slice. After the loss of a node, Xpand continues to operate using the remaining nodes and works to create new copies of the replicas that were lost.
The default configuration of Xpand is to have slices equal to the number of nodes (hash_dist_min_slices = 0). In this example, A1-A5 are used to label slices, and A1 and A1' are replicas of the same slice.
Any node can be lost with no data loss as the remaining nodes will contain a copy of the slice. Increasing the number of slices and cluster size increases parallelism and improves performance but does not change the degree of fault tolerance.
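The two layouts described above (3 slices on 3 nodes, and 5 slices on 5 nodes, each with 2 replicas) can be modeled with a short simulation. The round-robin placement used here is a hypothetical stand-in chosen only so that no node holds two replicas of the same slice; it is not Xpand's actual placement algorithm, and all function names are illustrative.

```python
from itertools import combinations


def place_replicas(num_nodes, num_slices, replicas):
    """Hypothetical placement: replica r of slice s lands on node (s + r) % num_nodes."""
    if replicas > num_nodes:
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {node: set() for node in range(num_nodes)}
    for s in range(num_slices):
        for r in range(replicas):
            placement[(s + r) % num_nodes].add(s)
    return placement


def survives(placement, failed_nodes, num_slices):
    """True if every slice still has at least one copy on a surviving node."""
    remaining = set()
    for node, slices in placement.items():
        if node not in failed_nodes:
            remaining |= slices
    return remaining == set(range(num_slices))


def tolerates(placement, num_slices, failures):
    """True if no combination of `failures` lost nodes causes data loss."""
    return all(survives(placement, set(combo), num_slices)
               for combo in combinations(placement, failures))


if __name__ == "__main__":
    three_nodes = place_replicas(num_nodes=3, num_slices=3, replicas=2)
    five_nodes = place_replicas(num_nodes=5, num_slices=5, replicas=2)
    print(tolerates(three_nodes, 3, failures=1))  # any single node loss
    print(tolerates(five_nodes, 5, failures=1))   # any single node loss
    print(tolerates(five_nodes, 5, failures=2))   # two simultaneous losses
```

In this model both layouts survive any single node loss, while the 5-node layout with 2 replicas does not survive every pair of losses. More slices and nodes add parallelism, but the replica count is what sets the degree of fault tolerance.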
MAX_FAILURES = 2
To configure Xpand to tolerate additional failures, you can increase the value of MAX_FAILURES.
Setting MAX_FAILURES = 2 increases the default number of replicas maintained by the cluster and allows it to survive two simultaneous failures, as long as the cluster can maintain a quorum of nodes (N/2 + 1). This means that two nodes can fail simultaneously or, if zones are configured, a single zone and one additional node can fail.
The following examples illustrate fault tolerance for MAX_FAILURES = 2.
A 5-node cluster can sustain 2 simultaneous node failures with no loss of data.
A 9-node cluster deployed across 3 zones can sustain 2 simultaneous node failures (in any zone or zones) with no data loss.
A 9-node cluster deployed across 3 zones can sustain a zone failure plus one additional node failure with no data loss.
However, if a 9-node cluster deployed across 3 zones sustains 2 simultaneous zone failures, the remaining cluster does not meet the quorum requirement (N/2 + 1).
A 10-node cluster deployed across 5 zones (the maximum number of zones) can sustain 2 simultaneous zone failures.
With MAX_FAILURES = 2, there must be a minimum of 5 zones to sustain 2 zone failures and still meet the quorum requirement (N/2 + 1).
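The quorum arithmetic behind the zone examples can be worked through with a minimal sketch. It assumes nodes are spread evenly across zones and that N/2 + 1 uses integer division; the function names are hypothetical. It checks the quorum requirement only and does not model replica placement.

```python
def quorum_size(total_nodes):
    """Quorum requirement N/2 + 1, assuming integer division."""
    return total_nodes // 2 + 1


def keeps_quorum(total_nodes, zones, failed_zones=0, extra_failed_nodes=0):
    """True if the surviving nodes still form a quorum.

    Assumes every zone holds total_nodes // zones nodes (evenly sized zones).
    """
    nodes_per_zone = total_nodes // zones
    lost = failed_zones * nodes_per_zone + extra_failed_nodes
    return total_nodes - lost >= quorum_size(total_nodes)


if __name__ == "__main__":
    # 9 nodes, 3 zones, two node failures: 7 remain, quorum is 5.
    print(keeps_quorum(9, 3, extra_failed_nodes=2))                   # True
    # 9 nodes, 3 zones, one zone plus one node: 5 remain, quorum is 5.
    print(keeps_quorum(9, 3, failed_zones=1, extra_failed_nodes=1))   # True
    # 9 nodes, 3 zones, two zones: 3 remain, quorum is 5.
    print(keeps_quorum(9, 3, failed_zones=2))                         # False
    # 10 nodes, 5 zones, two zones: 6 remain, quorum is 6.
    print(keeps_quorum(10, 5, failed_zones=2))                        # True
```

With evenly sized zones, losing 2 of 4 zones leaves exactly half of the nodes, which is below N/2 + 1, so 5 is the smallest zone count that keeps quorum after 2 zone failures.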