MariaDB Galera Cluster Overview

MariaDB Enterprise Cluster is a solution designed to handle high workloads exceeding the capacity of a single server. It is based on Galera Cluster technology integrated with MariaDB Enterprise Server and includes features like data-at-rest encryption for added security. This multi-primary replication alternative is ideal for maintaining data consistency across multiple servers, providing enhanced reliability and scalability.

Overview

MariaDB Enterprise Cluster, powered by Galera, is available with MariaDB Enterprise Server. MariaDB Galera Cluster is available with MariaDB Community Server.

In order to handle increasing load and especially when that load exceeds what a single server can process, it is best practice to deploy multiple MariaDB Enterprise Servers with a replication solution to maintain data consistency between them. MariaDB Enterprise Cluster is a multi-primary replication solution that serves as an alternative to the single-primary MariaDB Replication.

An Introduction to Database Replication

Database replication is the process of continuously copying data from one database server (a "node") to another, creating a distributed and resilient system. The goal is for all nodes in this system to contain the same set of data, forming what is known as a database cluster. From the perspective of a client application, this distributed nature is often transparent, allowing it to interact with the cluster as if it were a single database.

Replication Architectures

Primary/Replica

The most common replication architecture is Primary/Replica (also known as Master/Slave). In this model:

  • The Primary node is the authoritative source. It is the only node that accepts write operations (e.g., INSERT, UPDATE, DELETE).

  • The Primary logs these changes and sends them to one or more Replica nodes.

  • The Replicas receive the stream of changes and apply them to their own copy of the data. Replicas are typically used for read-only queries, backups, or as a hot standby for failover.

Multi-Primary Replication

In a multi-primary system, every node in the cluster acts as a primary. This means any node can accept write operations. When a node receives an update, it automatically propagates that change to all other primary nodes in the cluster. Each primary node logs its own changes and communicates them to its peers to maintain synchronization.

Replication Protocols: Asynchronous vs. Synchronous

Beyond the architecture, the replication protocol determines how transactions are confirmed across the cluster.

Asynchronous Replication (Lazy Replication)

In asynchronous replication, the primary node commits a transaction locally first and then sends the changes to the replicas in the background. The transaction is confirmed as complete to the client immediately after it's saved on the primary. This means there is a brief period, known as replication lag, where the replicas have not yet received the latest data.

Synchronous Replication (Eager Replication)

In synchronous replication, a transaction is not considered complete (committed) until it has been successfully applied and confirmed on all participating nodes. When the client receives confirmation, it is a guarantee that the data exists consistently across the cluster.

The Trade-offs of Synchronous Replication

Advantages

Synchronous replication offers several powerful advantages over its asynchronous counterpart:

  • High Availability: Since all nodes are fully synchronized, if one node fails, there is zero data loss. Traffic can be immediately directed to another node without complex failover procedures, as all data replicas are guaranteed to be consistent.

  • Read-After-Write Consistency: Synchronous replication guarantees causality. A SELECT query issued immediately after a transaction will always see the effects of that transaction, even if the query is executed on a different node in the cluster.

Disadvantages

Traditionally, eager replication protocols coordinate nodes one operation at a time, using two-phase commit or distributed locking. A system with n nodes processing o operations at a throughput of t transactions per second must handle m = n × o × t messages per second.

This means that any increase in the number of nodes leads to exponential growth in transaction response times and in the probability of conflicts and deadlocks.

For this reason, asynchronous replication remains the dominant replication protocol for database performance, scalability, and availability. Widely adopted open source databases, such as MySQL and PostgreSQL, only provide asynchronous or semi-synchronous replication solutions.

Galera's Solution: Modern Synchronous Replication

Galera Cluster solves the traditional problems of synchronous replication by using a modern, certification-based approach built on several key innovations:

  • Group Communication: A robust messaging layer ensures that information is delivered to all nodes reliably and in the correct order, forming a solid foundation for data consistency.

  • Write-Set Replication: Instead of coordinating on every individual operation, database changes (writes) are grouped into a single package called a "write-set." This write-set is replicated as a single message, avoiding the high overhead of traditional two-phase commit.

  • Optimistic Execution: Transactions are first executed optimistically on a local node. The resulting write-set is then broadcast to the cluster for a fast, parallel certification process. If it passes certification (meaning no conflicts), it is committed on all nodes.
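
  • Transaction Reordering: This technique allows non-conflicting transactions to be reordered before they are committed, which significantly increases parallelism and reduces the rate of transaction rollbacks.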

The certification-based replication system that Galera Cluster uses is built on these powerful approaches, delivering the benefits of synchronous replication without the traditional performance bottlenecks.

How it Works

MariaDB Enterprise Cluster is built on MariaDB Enterprise Server with Galera Cluster and MariaDB MaxScale. In MariaDB Enterprise Server 10.5 and later, it features enterprise-specific options, such as data-at-rest encryption for the write-set cache, that are not available in other Galera Cluster implementations.

As a multi-primary replication solution, any MariaDB Enterprise Server can operate as a Primary Server. This means that changes made to any node in the cluster replicate to every other node in the cluster, using certification-based replication and global ordering of transactions for the InnoDB storage engine.

MariaDB Enterprise Cluster is only available for Linux operating systems.

Architecture

There are a few things to consider when planning the hardware, virtual machines, or containers for MariaDB Enterprise Cluster.

MariaDB Enterprise Cluster architecture involves deploying multiple instances of MariaDB Enterprise Server. The Servers are configured to use multi-primary replication to maintain consistency between themselves, while MariaDB MaxScale routes reads and writes between them.

The application establishes a client connection to MariaDB MaxScale. MaxScale then routes statements to one of the MariaDB Enterprise Servers in the cluster. Writes made to any node in this cluster replicate to all the other nodes of the cluster.

When MariaDB Enterprise Servers start in a cluster:

  • Each Server attempts to establish network connectivity with the other Servers in the cluster

  • Groups of connected Servers form a component

  • When a Server establishes network connectivity with the Primary Component, it synchronizes its local database with that of the cluster

  • As a member of the Primary Component, the Server becomes operational — able to accept read and write queries from clients
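
During startup, the Primary Component is the Server bootstrapped to run as the Primary Component. Once the cluster is online, the Primary Component is any combination of Servers which includes a minimum of more than half the total number of Servers.

  • A Server or group of Servers that loses network connectivity with the majority of the cluster becomes non-operational.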

In planning the number of systems to provision for MariaDB Enterprise Cluster, it is important to keep cluster operation in mind: ensure that each Server has enough disk space and that the cluster is able to maintain a Primary Component in the event of outages.

  • Each Server requires the minimum amount of disk space needed to store the entire database. The upper storage limit for MariaDB Enterprise Cluster is that of the smallest disk in use.

  • Each switch in use should have an odd number of Servers, with a minimum of three.

  • In a cluster that spans multiple switches, each data center in use should have an odd number of switches, with a minimum of three.

  • In a cluster that spans multiple data centers, use an odd number of data centers, with a minimum of three.

  • Each data center in use should have at least one Server dedicated to backup operations. This can be another cluster node or a separate Replica Server kept in sync using MariaDB Replication.

When planning Servers per switch, switches per data center, and data centers per cluster, this model helps preserve the Primary Component. A minimum of three means that a single Server or switch can fail without taking down the cluster.

Using an odd number reduces the risk of a split-brain situation (that is, a case where two separate groups of Servers believe that they are part of the Primary Component and remain operational).

Cluster Configuration

Nodes in MariaDB Enterprise Cluster are individual MariaDB Enterprise Servers configured to perform multi-primary cluster replication. This configuration is set using a series of system variables in the configuration file.
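
For example, a cluster node's configuration file might contain a section like the following (the individual settings are described in the sections below):

    [mariadb]

    # General Configuration
    bind_address             = 0.0.0.0
    innodb_autoinc_lock_mode = 2

    # Cluster Configuration
    wsrep_cluster_name    = "accounting_cluster"
    wsrep_cluster_address = "gcomm://192.0.2.1,192.0.2.2,192.0.2.3"

    # wsrep Provider
    wsrep_provider = /usr/lib/galera/libgalera_enterprise_smm.so
    wsrep_provider_options = "evs.suspect_timeout=PT10S"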

Additional information on system variables is available in the Reference chapter.

General Configuration

The innodb_autoinc_lock_mode system variable must be set to a value of 2 to enable interleaved lock mode. MariaDB Enterprise Cluster does not support other lock modes.

Ensure also that the bind_address system variable is properly set to allow MariaDB Enterprise Server to listen for TCP/IP connections:
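
    bind_address             = 0.0.0.0
    innodb_autoinc_lock_mode = 2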

Cluster Name and Address

MariaDB Enterprise Cluster requires that you set a name for your cluster, using the wsrep_cluster_name system variable. When nodes connect to each other, they check the cluster name to ensure that they've connected to the correct cluster before replicating data. All Servers in the cluster must have the same value for this system variable.

Using the wsrep_cluster_address system variable, you can define the back-end protocol (always gcomm) and a comma-separated list of the IP addresses or domain names of the other nodes in the cluster.
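
    wsrep_cluster_name    = "accounting_cluster"
    wsrep_cluster_address = "gcomm://192.0.2.1,192.0.2.2,192.0.2.3"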

It is best practice to list all nodes on this system variable, as this is the list the node searches when attempting to reestablish network connectivity with the primary component.

Note: In certain environments, such as deployments in the cloud, you may also need to set the wsrep_node_address system variable, so that MariaDB Enterprise Server properly informs other Servers how to reach it.

Galera Replicator Plugin

MariaDB Enterprise Server connects to other Servers and replicates data from the cluster through a wsrep Provider called the Galera Replicator plugin. In order to enable clustering, specify the path to the relevant .so file using the wsrep_provider system variable.

MariaDB Enterprise Server 10.4 and later installations use an enterprise-build of the Galera Enterprise 4 plugin. This includes all the features of Galera Cluster 4 as well as enterprise features like GCache encryption.

To enable MariaDB Enterprise Cluster, use the libgalera_enterprise_smm.so library:
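
    wsrep_provider = /usr/lib/galera/libgalera_enterprise_smm.so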

Earlier versions of MariaDB Enterprise Server use the older community release of the Galera 3 plugin. This is set using the libgalera_smm.so library:
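
    wsrep_provider = /usr/lib/galera/libgalera_smm.so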

In addition to system variables, there is a set of options that you can pass to the wsrep Provider to configure or to otherwise adjust its operations. This is done through the wsrep_provider_options system variable:
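
    wsrep_provider_options = "evs.suspect_timeout=PT10S"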

Additional information is available in the Reference chapter.

Cluster Replication

MariaDB Enterprise Cluster implements a multi-primary replication solution.

When you write to a table on a node, the node collects the write into a write-set transaction, which it then replicates to the other nodes in the cluster.

Your application can write to any node in the cluster. Each node certifies the replicated write-set. If the transaction has no conflicts, the nodes apply it. If the transaction does have conflicts, it is rejected and all of the nodes revert the changes.

Quorum

The first node you start in MariaDB Enterprise Cluster bootstraps the Primary Component. Each subsequent node that establishes a connection joins and synchronizes with the Primary Component. A cluster achieves a quorum when more than half the nodes are joined to the Primary Component.

When a component forms that has less than half the nodes in the cluster, it becomes non-operational, since it believes there is a running Primary Component to which it has lost network connectivity.

These quorum requirements, combined with the requisite number of odd nodes, avoid a split brain situation, or one in which two separate components believe they are each the Primary Component.

Dynamically Bootstrapping the Cluster

In cases where the cluster goes down and your nodes become non-operational, you can dynamically bootstrap the cluster.

First, find the most up-to-date node (that is, the node with the highest value for the wsrep_last_committed status variable):
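
    SHOW STATUS LIKE 'wsrep_last_committed';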

Once you determine the node with the most recent transaction, you can designate it as the Primary Component by running the following on it:
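
    SET GLOBAL wsrep_provider_options="pc.bootstrap=YES";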

The node bootstraps the Primary Component onto itself. Other nodes in the cluster with network connectivity then submit state transfer requests to this node to bring their local databases into sync with what's available on this node.

State Transfers

From time to time a node can fall behind the cluster. This can occur due to expensive operations being issued to it or due to network connectivity issues that lead to write-sets backing up in the queue. Whatever the cause, when a node finds that it has fallen too far behind the cluster, it attempts to initiate a state transfer.

In a state transfer, the node connects to another node in the cluster and attempts to bring its local database back in sync with the cluster. There are two types of state transfers:

  • Incremental State Transfer (IST)

  • State Snapshot Transfer (SST)

When the donor node receives a state transfer request, it checks its write-set cache (that is, the GCache) to see if it has enough saved write-sets to bring the joiner into sync. If the donor node has the intervening write-sets, it performs an IST operation, where the donor node only sends the missing write-sets to the joiner. The joiner applies these write-sets following the global ordering to bring its local databases into sync with the cluster.

When the donor does not have enough write-sets cached for an IST, it runs an SST operation. In an SST, the donor uses a backup solution, like MariaDB Enterprise Backup, to copy its data directory to the joiner. When the joiner completes the SST, it begins to process the write-sets that came in during the transfer. Once it's in sync with the cluster, it becomes operational.

ISTs provide the best performance for state transfers; the size of the GCache may need adjustment to facilitate their use.

Flow Control

MariaDB Enterprise Server uses Flow Control to throttle transactions when necessary and ensure that all nodes work equitably.

Write-sets that replicate to a node are collected by the node in its received queue. The node then processes the write-sets according to global ordering. Large transactions, expensive operations, or simple hardware limitations can lead to the received queue backing up over time.

When a node's received queue grows beyond certain limits, the node initiates Flow Control. In Flow Control, the node pauses replication to work through the write-sets it already has. Once it has worked the received queue down to a certain size, it re-initiates replication.

Eviction

A node is removed, or evicted, from the cluster if it becomes non-responsive.

In MariaDB Enterprise Cluster, each node monitors network connectivity and response times from every other node. MariaDB Enterprise Cluster evaluates network performance using the EVS Protocol.

When a node finds that another node has poor network connectivity, it adds an entry for that node to its delayed list. If the node becomes responsive again and its network performance improves for a certain amount of time, entries for it are removed from the delayed list. That is, the longer a node has network problems, the longer it must perform well before it is removed from the delayed list.

If the number of entries for a node in the delayed list exceeds a threshold established for the cluster, the EVS Protocol evicts the node from the cluster.

Evicted nodes become non-operational components. They cannot rejoin the cluster until you restart MariaDB Enterprise Server.

Streaming Replication

Under normal operation, huge transactions and long-running transactions are difficult to replicate. MariaDB Enterprise Cluster rejects conflicting transactions and rolls back the changes. A transaction that takes several minutes or longer to run can encounter issues if a small transaction is run on another node and attempts to write to the same table. The large transaction fails because it encounters a conflict when it attempts to replicate.

MariaDB Enterprise Server 10.4 and later support streaming replication for MariaDB Enterprise Cluster. In streaming replication, huge transactions are broken into transactional fragments, which are replicated and applied as the operation runs. This makes it more difficult for intervening sessions to introduce conflicts.

Initiate Streaming Replication

To initiate streaming replication, set the wsrep_trx_fragment_unit and wsrep_trx_fragment_size system variables. You can set the unit to BYTES, ROWS, or STATEMENTS:
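
    SET SESSION wsrep_trx_fragment_unit='STATEMENTS';
    SET SESSION wsrep_trx_fragment_size=5;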

Then, run your transaction.

Streaming replication works best with very large transactions where you don't expect to encounter conflicts. If the statement does encounter a conflict, the rollback operation is much more expensive than usual. As such, it's best practice to enable streaming replication at a session-level and to disable it by setting the wsrep_trx_fragment_size system variable to 0 when it's not needed.
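
    SET SESSION wsrep_trx_fragment_size=0;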

Galera Arbitrator

Deployments on mixed hardware can introduce issues where some MariaDB Enterprise Servers perform better than others. A Server in one part of the world might perform more reliably or be physically closer to most users than others. In cases where a particular MariaDB Enterprise Server holds logical significance for your cluster, you can weight its value in quorum calculations.

Galera Arbitrator is a separate process that runs alongside MariaDB Enterprise Server. While the Arbitrator does not take part in replication, whenever the cluster performs quorum calculations it gives the Arbitrator a vote as though it were another MariaDB Enterprise Server. In effect this means that the system has the vote of MariaDB Enterprise Server plus any running Arbitrators in determining whether it's part of the Primary Component.

Bear in mind that the Galera Arbitrator is a separate package, galera-arbitrator-4, which is not installed by default with MariaDB Enterprise Server.

Scale-out

MariaDB Enterprise Servers that join a cluster attempt to connect to the IP addresses provided to the wsrep_cluster_address system variable. This variable adjusts itself at runtime to include the addresses of all connected nodes.

To scale-out MariaDB Enterprise Cluster, start new MariaDB Enterprise Servers with the appropriate wsrep_cluster_address list and the same wsrep_cluster_name value. The new nodes establish network connectivity with the running cluster and request a state transfer to bring their local database into sync with the cluster.

Once the MariaDB Enterprise Server reports itself as being in sync with the cluster, MariaDB MaxScale can begin including it in the load distribution for the cluster.

Being a multi-primary replication solution means that any MariaDB Enterprise Server in the cluster can handle write operations, but write scale-out is minimal as every Server in the cluster needs to apply the changes.

Failover

MariaDB Enterprise Cluster does not provide failover capabilities on its own. MariaDB MaxScale is used to route client connections to MariaDB Enterprise Server.

Unlike a traditional load balancer, MariaDB MaxScale is aware of changes in the node and cluster states.

MaxScale takes out of the distribution any nodes that initiate a blocking SST operation, enter Flow Control, or otherwise go down, which allows them to recover or catch up without stopping service to the rest of the cluster.

Backups

With MariaDB Enterprise Cluster, each node contains a replica of all the data in the cluster. As such, you can run MariaDB Enterprise Backup on any node to back up the available data. The process for backing up a node is the same as for a single MariaDB Enterprise Server.
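
For example, a full backup could be taken from any node with a command along these lines (a minimal sketch; the user, password, and target directory are placeholders to substitute for your environment):

    mariadb-backup --backup --user=backup_user --password=backup_passwd --target-dir=/backups/full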

Encryption

MariaDB Enterprise Server supports data-at-rest encryption to secure data on disk, and data-in-transit encryption to secure data on the network.

MariaDB Enterprise Server supports data-at-rest encryption of the GCache, the file used by Galera systems to cache write-sets. Encrypting the GCache ensures that the Server encrypts both the data it temporarily caches from the cluster and the data it permanently stores in tablespaces.

For data-in-transit, MariaDB Enterprise Cluster supports the same client connection encryption as MariaDB Server, and additionally provides data-in-transit encryption for Galera replication traffic and for State Snapshot Transfer (SST) traffic.

Data-in-Transit Encryption

MariaDB Enterprise Server 10.6 encrypts Galera replication and SST traffic using the server's TLS configuration by default. With the wsrep_ssl_mode system variable, you can configure the node to use the TLS configuration from the wsrep Provider options instead.

MariaDB Enterprise Server 10.5 and earlier support encrypting Galera replication and SST traffic through wsrep Provider options.

TLS encryption is only available when used by all nodes in the cluster.

Enabling GCache Encryption

To encrypt data-at-rest such as GCache, stop the server, set encrypt_binlog=ON within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when used.
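
    [mariadb]
    ...
    # Controls Binary Log, Relay Log, and GCache Encryption
    encrypt_binlog=ON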

Disabling GCache Encryption

To stop using encryption on the GCache file, stop the server, set encrypt_binlog=OFF within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when used.
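
    [mariadb]
    ...
    # Controls Binary Log, Relay Log, and GCache Encryption
    encrypt_binlog=OFF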



    MariaDB Galera Cluster

    MariaDB Galera Cluster is a virtually synchronous multi-master cluster that runs on Linux only. Its Enterprise version is MariaDB Enterprise Cluster (powered by Galera).


    What is Galera Replication?

    Summary

    In MariaDB Cluster, transactions are replicated using the wsrep API, synchronously ensuring consistency across nodes. Synchronous replication offers high availability and consistency but is complex and potentially slower compared to asynchronous replication. Due to these challenges, asynchronous replication is often preferred for database performance and scalability, as seen in popular systems like MySQL and PostgreSQL, which typically favor asynchronous or semi-synchronous solutions.

    In MariaDB Cluster, the server replicates a transaction at commit time by broadcasting the write set associated with the transaction to every node in the cluster. The client connects directly to the DBMS and experiences behavior that is similar to native MariaDB in most cases. The wsrep API (write set replication API) defines the interface between Galera replication and MariaDB.

    Synchronous vs. Asynchronous Replication

    The basic difference between synchronous and asynchronous replication is that "synchronous" replication guarantees that if a change happened on one node in the cluster, then the change will happen on other nodes in the cluster "synchronously," or at the same time. "Asynchronous" replication gives no guarantees about the delay between applying changes on the "master" node and the propagation of changes to "slave" nodes. The delay with "asynchronous" replication can be short or long. This also implies that if a master node crashes in an "asynchronous" replication topology, then some of the latest changes may be lost.

    Theoretically, synchronous replication has several advantages over asynchronous replication:

    • Clusters utilizing synchronous replication are always highly available. If one of the nodes crashed, then there would be no data loss. Additionally, all cluster nodes are always consistent.

    • Clusters utilizing synchronous replication allow transactions to be executed on all nodes in parallel.

    • Clusters utilizing synchronous replication can guarantee causality across the whole cluster. This means that if a SELECT is executed on one cluster node after a transaction is executed on another cluster node, it should see the effects of that transaction.

    However, in practice, synchronous database replication has traditionally been implemented via the so-called "2-phase commit" or distributed locking, which proved to be very slow. Low performance and complexity of implementation of synchronous replication led to a situation where asynchronous replication remains the dominant means for database performance scalability and availability. Widely adopted open-source databases such as MySQL or PostgreSQL offer only asynchronous or semi-synchronous replication solutions.

    Galera's replication is not completely synchronous. It is sometimes called virtually synchronous replication.

    Certification-Based Replication Method

    An alternative approach to synchronous replication that uses group communication and transaction ordering techniques was suggested by a number of researchers, for example in the papers Database State Machine Approach and Don't Be Lazy, Be Consistent.

    Prototype implementations have shown a lot of promise. We combined our experience in synchronous database replication and the latest research in the field to create the Galera Replication library and the wsrep API.

    Galera replication is a highly transparent, scalable, and virtually synchronous replication solution for database clustering to achieve high availability and improved performance. Galera-based clusters are:

    • Highly available

    • Highly transparent

    • Highly scalable (near-linear scalability may be reached depending on the application)

    Generic Replication Library

    Galera replication functionality is implemented as a shared library and can be linked with any transaction processing system that implements the wsrep API hooks.

    The Galera replication library is a protocol stack providing functionality for preparing, replicating, and applying transaction write sets. It consists of:

    • wsrep API specifies the interface and responsibilities for the DBMS and the replication provider

    • wsrep hooks is the wsrep integration in the DBMS engine.

    • Galera provider implements the wsrep API for Galera library

    • certification layer takes care of preparing write sets and performing certification
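
    • replication layer manages the replication protocol and provides total ordering capabilities

    • GCS framework provides a plugin architecture for group communication systems

    • many GCS implementations can be adapted; the Galera developers have experimented with Spread and their in-house implementations, vsbes and Gemini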

    Many components in the Galera replication library were redesigned and improved with the introduction of Galera 4.

    Galera Slave Threads

    Although the Galera provider certifies the write set associated with a transaction at commit time on each node in the cluster, this write set is not necessarily applied on that cluster node immediately. Instead, the write set is placed in the cluster node's receive queue, and it is eventually applied by one of the node's Galera slave threads.

    The number of Galera slave threads can be configured with the wsrep_slave_threads system variable.

    The Galera slave threads are able to determine which write sets are safe to apply in parallel. However, if your cluster nodes seem to have frequent consistency problems, then setting wsrep_slave_threads to 1 will probably fix the problem.

    When a cluster node's state, as seen by wsrep_local_state_comment, is JOINED, then increasing the number of slave threads may help the cluster node catch up with the cluster more quickly. In this case, it may be useful to set the number of threads to twice the number of CPUs on the system.

    Streaming Replication

    Streaming replication was introduced in Galera 4.

    In older versions of MariaDB Cluster, there was a 2GB limit on the size of the transaction you could run. The node waits on the transaction commit before performing replication and certification. With large transactions, long-running writes, and changes to huge datasets, there was a greater possibility of a conflict forcing a rollback on an expensive operation.

    Using streaming replication, the node breaks huge transactions up into smaller and more manageable fragments; it then replicates these fragments to the cluster as it works instead of waiting for the commit. Once certified, the fragment can no longer be aborted by conflicting transactions. As this can have performance consequences both during execution and in the event of rollback, it is recommended that you only use it with large transactions that are unlikely to experience conflict.

    For more information on streaming replication, see the documentation.

    Group Commits

    Group Commit support for MariaDB Cluster was introduced in Galera 4.

    In MariaDB Group Commit, groups of transactions are flushed together to disk to improve performance. In previous versions of MariaDB, this feature was not available in MariaDB Cluster, as it interfered with the global ordering of transactions for replication. MariaDB Cluster can now take advantage of Group Commit.

    For more information on Group Commit, see the documentation.

    See Also

    • Galera Cluster: Galera Replication

    • Codership: Using Galera Cluster

    • What is MariaDB Galera Cluster?

    • Galera Use Cases

    • Getting Started with MariaDB/Galera Cluster

    What is MariaDB Galera Cluster?

    MariaDB Galera Cluster is a Linux-exclusive, multi-primary cluster designed for MariaDB, offering features such as active-active topology, read/write capabilities on any node, automatic membership and node joining, true parallel replication at the row level, and direct client connections, with an emphasis on the native MariaDB experience.

    About


    MariaDB Galera Cluster is a virtually synchronous multi-primary cluster for MariaDB. It is available on Linux only, and only supports the InnoDB storage engine (although there is experimental support for MyISAM and, from MariaDB 10.6, Aria; see the wsrep_replicate_myisam system variable, or, from MariaDB 10.6, the wsrep_mode system variable).

    Features

    • Virtually synchronous replication

    • Active-active multi-primary topology

    • Read and write to any cluster node

    • Automatic membership control: failed nodes drop from the cluster
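
    • Automatic node joining

    • True parallel replication, on row level

    • Direct client connections, native MariaDB look & feel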

    Benefits

    The above features yield several benefits for a DBMS clustering solution, including:

    • No replica lag

    • No lost transactions

    • Read scalability

    • Smaller client latencies

    The Getting Started with MariaDB Galera Cluster page has instructions on how to get up and running with MariaDB Galera Cluster.

    A great resource for Galera users is Codership on Google Groups (codership-team 'at' googlegroups (dot) com) - if you use Galera, it is recommended you subscribe.

    Galera Versions

    MariaDB Galera Cluster is powered by:

    • MariaDB Server.

    • The Galera wsrep provider library.

    The functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider library package. The following Galera version corresponds to each MariaDB Server version:

    • In MariaDB 10.4 and later, MariaDB Galera Cluster uses Galera 4. This means that the wsrep API version is 26 and the Galera wsrep provider library is version 4.X.

    • In MariaDB 10.3 and before, MariaDB Galera Cluster uses Galera 3. This means that the wsrep API is version 25 and the Galera wsrep provider library is version 3.X.

    See Deciphering Galera Version Numbers for more information about how to interpret these version numbers.

    Galera 4 Versions

    Each version of the Galera 4 wsrep provider was first released in a corresponding version of MariaDB. If you would like to install Galera 4 using yum, apt, or zypper, then the package is called galera-4.

    The Galera 4 wsrep provider releases range from 26.4.0 through 26.4.22, each first shipped in a corresponding MariaDB Server release.

    Cluster Failure and Recovery Scenarios

    While a Galera Cluster is designed for high availability, various scenarios can lead to node or cluster outages. This guide describes common failure situations and the procedures to safely recover from them.

    Graceful Shutdown Scenarios

    This covers situations where nodes are intentionally stopped for maintenance or configuration changes, based on a three-node cluster.

    One Node is Gracefully Stopped

    When one node is stopped, it sends a message to the other nodes, and the cluster size is reduced. Properties like Quorum calculation are automatically adjusted. As soon as the node is started again, it rejoins the cluster based on its wsrep_cluster_address variable.

    If the write-set cache (gcache.size) on a donor node still has all the transactions that were missed, the node will rejoin using a fast Incremental State Transfer (IST). If not, it will automatically fall back to a full State Snapshot Transfer (SST).

    Two Nodes Are Gracefully Stopped

    The single remaining node forms a Primary Component and can serve client requests. To bring the other nodes back, you simply start them.

    However, the single running node must act as a Donor for the state transfer. During the SST, its performance may be degraded, and some load balancers may temporarily remove it from rotation. For this reason, it's best to avoid running with only one node.

    All Three Nodes Are Gracefully Stopped

    When the entire cluster is shut down, you must bootstrap it from the most advanced node to prevent data loss.

    1. Identify the most advanced node: On each server, check the seqno value in the /var/lib/mysql/grastate.dat file (a sample file is shown after these steps). The node with the highest seqno was the last to commit a transaction.

    2. Bootstrap from that node: Use the galera_new_cluster command to start a new cluster from this node only.

      galera_new_cluster

    3. Start the other nodes normally: Once the first node is running, start the MariaDB service on the other nodes. They will join the new cluster via SST.
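
    A grastate.dat file typically looks like the following (the uuid and seqno values here are only illustrative):

      # GALERA saved state
      version: 2.1
      uuid:    5981f182-a4cc-11e6-98cc-77fabedd360d
      seqno:   1352215
      safe_to_bootstrap: 0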

    Unexpected Node Failure (Crash) Scenarios

    This covers situations where nodes become unavailable due to a power outage, hardware failure, or software crash.

    One Node Disappears from the Cluster

    If one node crashes, the two remaining nodes will detect the failure after a timeout period and remove the node from the cluster. Because they still have Quorum (2 out of 3), the cluster continues to operate without service disruption. When the failed node is restarted, it will rejoin automatically as described above.

    Two Nodes Disappear from the Cluster

    The single remaining node cannot form a Quorum by itself. It will switch to a non-Primary state and refuse to serve queries to protect data integrity. Any query attempt will result in an error:
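
      ERROR 1047 (08S01): WSREP has not yet prepared node for application use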

    Recovery:

    • If the other nodes come back online, the cluster will re-form automatically.

    • If the other nodes have permanently failed, you must manually force the remaining node to become a new Primary Component. Warning: Only do this if you are certain the other nodes are permanently down.
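
    To force the remaining node to become a new Primary Component, run:

      SET GLOBAL wsrep_provider_options='pc.bootstrap=true';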

    All Nodes Go Down Without a Proper Shutdown

    In a datacenter power failure or a severe bug, all nodes may crash. The grastate.dat file will not be updated correctly and will show seqno: -1.

    Recovery:

    1. On each node, run mysqld with the --wsrep-recover option. This will read the database logs and report the node's last known transaction position (GTID).

      mysqld --wsrep-recover

    2. Compare the sequence numbers from the recovered position on all nodes.

    3. On the node with the highest sequence number, edit its /var/lib/mysql/grastate.dat file and set safe_to_bootstrap: 1
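
    4. Bootstrap the cluster from that node using the galera_new_cluster command.

    5. Start the other nodes normally.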

    Recovering from a Split-Brain Scenario

    A split-brain occurs when a network partition splits the cluster, and no resulting group has a Quorum. This is most common with an even number of nodes. All nodes will become non-Primary.

    Recovery:

    1. Choose one of the partitioned groups to become the new Primary Component.

    2. On one node within that chosen group, manually force it to bootstrap:
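
      SET GLOBAL wsrep_provider_options='pc.bootstrap=true';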

    3. This group will now become operational. When network connectivity is restored, the nodes from the other partition will automatically detect this Primary Component and rejoin it.

    Never execute the bootstrap command on both sides of a partition. This will create two independent, active clusters with diverging data, leading to severe data inconsistency.

    See Also

    • Codership on Google Groups (codership-team 'at' googlegroups (dot) com) - A great mailing list for Galera users.

    • Getting Started with MariaDB Galera Cluster

    • MariaDB Galera Cluster - Known Limitations


    This page is licensed: CC BY-SA / Gnu FDL


    This page is: Copyright © 2025 MariaDB. All rights reserved.
