MariaDB Galera Cluster is a Linux-exclusive, multi-primary cluster designed for MariaDB, offering features such as active-active topology, read/write capabilities on any node, automatic membership and node joining, true parallel replication at the row level, and direct client connections, with an emphasis on the native MariaDB experience.
MariaDB Galera Cluster is a virtually synchronous multi-primary cluster for MariaDB. It is available on Linux only and only supports the InnoDB storage engine (although there is experimental support for MyISAM and, from MariaDB 10.6, Aria; see the wsrep_replicate_myisam system variable or, from MariaDB 10.6, the wsrep_mode system variable). Its main features include:
Active-active multi-primary topology
Read and write to any cluster node
Automatic membership control: failed nodes drop from the cluster
Automatic node joining
True parallel replication, on row level
Direct client connections, native MariaDB look & feel
The above features yield several benefits for a DBMS clustering solution, including:
No replica lag
No lost transactions
Read scalability
Smaller client latencies
The Getting Started with MariaDB Galera Cluster page has instructions on how to get up and running with MariaDB Galera Cluster.
A great resource for Galera users is the codership-team mailing list (codership-team 'at' googlegroups (dot) com). If you use Galera, it is recommended that you subscribe.
MariaDB Galera Cluster is powered by:
MariaDB Server.
The Galera wsrep provider library.
The functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider library package. The following Galera version corresponds to each MariaDB Server version:
In MariaDB 10.4 and later, MariaDB Galera Cluster uses Galera 4. This means that the wsrep API version is 26 and the Galera wsrep provider library is version 4.X.
In MariaDB 10.3 and before, MariaDB Galera Cluster uses Galera 3. This means that the wsrep API version is 25 and the Galera wsrep provider library is version 3.X.
See the Galera documentation for more information about how to interpret these version numbers.
The table at the end of this page lists each version of the Galera 4 wsrep provider and the version of MariaDB in which each one was first released. If you would like to install Galera 4 using yum, apt, or zypper, then the package is called galera-4; a minimal install sketch follows.
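As a minimal sketch, assuming a Debian/Ubuntu system with the MariaDB repositories already configured (exact package and client names vary by distribution), installation and a quick check of the provider version in use might look like this:

```bash
# Install MariaDB Server together with the Galera 4 wsrep provider package
# (repository setup and the exact package names are assumptions in this sketch)
sudo apt install mariadb-server galera-4

# On a running node, check which wsrep provider version the server reports
mariadb -e "SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';"
```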
While a Galera Cluster is designed for high availability, various scenarios can lead to node or cluster outages. This guide describes common failure situations and the procedures to safely recover from them.
This covers situations where nodes are intentionally stopped for maintenance or configuration changes. The examples assume a three-node cluster.
When one node is stopped, it sends a message to the other nodes, and the cluster size is reduced. Properties like Quorum calculation are automatically adjusted. As soon as the node is started again, it rejoins the cluster based on its wsrep_cluster_address variable.
If the write-set cache (gcache.size) on a donor node still has all the transactions that were missed, the node will rejoin using a fast Incremental State Transfer (IST). If not, it will automatically fall back to a full State Snapshot Transfer (SST).
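For illustration, here is a minimal configuration sketch touching the two settings mentioned above; the file path, node names, and cache size are assumptions, not recommendations:

```bash
# Append a minimal Galera section to the server configuration on a node
# (adjust the file path, node names, and cache size for your setup;
# wsrep_provider, the path to the Galera library, must also be set and
# its location varies by distribution)
cat >> /etc/my.cnf.d/galera.cnf <<'EOF'
[galera]
wsrep_on = ON
# Cluster members this node contacts when it (re)joins
wsrep_cluster_address = gcomm://node1,node2,node3
# A larger gcache increases the chance a rejoining node can use IST instead of SST
wsrep_provider_options = "gcache.size=1G"
EOF
```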
The single remaining node forms a Primary Component and can serve client requests. To bring the other nodes back, you simply start them.
However, the single running node must act as a Donor for the state transfer. During the SST, its performance may be degraded, and some load balancers may temporarily remove it from rotation. For this reason, it's best to avoid running with only one node.
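A quick way to observe this from a client is to query the standard Galera status variables, for example:

```bash
# Check how many nodes are in the cluster and what state this node is in
# (a donor will typically report Donor/Desynced while the SST is running)
mariadb -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment');"
```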
When the entire cluster is shut down, you must bootstrap it from the most advanced node to prevent data loss.
Identify the most advanced node: On each server, check the seqno value in the /var/lib/mysql/grastate.dat file. The node with the highest seqno was the last to commit a transaction.
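For example, to print the relevant lines on each node:

```bash
# Run on every node and compare the reported uuid and seqno values
grep -E 'uuid|seqno' /var/lib/mysql/grastate.dat
```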
Bootstrap from that node: Use the galera_new_cluster script to start a new cluster from this node only.
```bash
galera_new_cluster
```
Start the other nodes normally: Once the first node is running, start the MariaDB service on the other nodes. They will join the new cluster via SST.
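Assuming a systemd-managed installation with a unit named mariadb (the unit name is an assumption; on some systems it is mysql or mysqld), this is simply:

```bash
# On each remaining node, once the bootstrapped node is up
sudo systemctl start mariadb
```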
This covers situations where nodes become unavailable due to a power outage, hardware failure, or software crash.
If one node crashes, the two remaining nodes will detect the failure after a timeout period and remove the node from the cluster. Because they still have Quorum (2 out of 3), the cluster continues to operate without service disruption. When the failed node is restarted, it will rejoin automatically as described above.
The single remaining node cannot form a Quorum by itself. It will switch to a non-Primary state and refuse to serve queries to protect data integrity. Any query attempt will result in an error:
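```
ERROR 1047 (08S01): WSREP has not yet prepared node for application use
```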
Recovery:
If the other nodes come back online, the cluster will re-form automatically.
If the other nodes have permanently failed, you must manually force the remaining node to become a new Primary Component. Warning: Only do this if you are certain the other nodes are permanently down.
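```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
```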
After a datacenter power failure or a severe software bug, all nodes may crash at roughly the same time. In that case the grastate.dat file is not updated correctly and will show seqno: -1.
Recovery:
On each node, run mysqld with the --wsrep-recover option. This will read the database logs and report the node's last known transaction position (GTID).
```bash
# Run as the user that owns the data directory; the recovered
# position (UUID:seqno) is printed in the server's log output
mysqld --wsrep-recover
```
Compare the sequence numbers from the recovered position on all nodes.
On the node with the highest sequence number, edit its /var/lib/mysql/grastate.dat file and set safe_to_bootstrap: 1
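Bootstrap the cluster from that node using the galera_new_cluster command.
Start the other nodes normally.

As a sketch of these final steps, assuming the default data directory and a systemd unit named mariadb:

```bash
# On the most advanced node only: mark it safe to bootstrap...
sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat

# ...and bootstrap a new cluster from it
galera_new_cluster

# Then start MariaDB normally on each of the other nodes:
# sudo systemctl start mariadb
```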
A split-brain occurs when a network partition splits the cluster, and no resulting group has a Quorum. This is most common with an even number of nodes. All nodes will become non-Primary.
Recovery:
Choose one of the partitioned groups to become the new Primary Component.
On one node within that chosen group, manually force it to bootstrap:
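```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
```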
This group will now become operational. When network connectivity is restored, the nodes from the other partition will automatically detect this Primary Component and rejoin it.
Never execute the bootstrap command on both sides of a partition. This will create two independent, active clusters with diverging data, leading to severe data inconsistency.
| Galera Version | First Released in MariaDB Version |
|----------------|-----------------------------------|
| 26.4.22 | |
| 26.4.21 | |
| 26.4.20 | |
| 26.4.19 | |
| 26.4.18 | |
| 26.4.16 | |
| 26.4.14 | |
| 26.4.13 | |
| 26.4.12 | |
| 26.4.11 | |
| 26.4.9 | |
| 26.4.8 | |
| 26.4.7 | |
| 26.4.6 | |
| 26.4.5 | |
| 26.4.4 | |
| 26.4.3 | |
| 26.4.2 | |
| 26.4.1 | |
| 26.4.0 | |
This page is licensed: CC BY-SA / Gnu FDL