Get started quickly with MariaDB Galera Cluster using these guides. Follow step-by-step instructions to deploy and configure a highly available, multi-master cluster for your applications.
MariaDB Galera Cluster
Complete reference documentation for implementing, configuring, and using MariaDB Galera Cluster in production.
Articles on upgrading between MariaDB versions with Galera Cluster
Galera Security
MariaDB Galera security encrypts replication/SST traffic and ensures integrity through firewalls, secure credentials, and network isolation.
Installation & Deployment
Galera Architecture
Reference
Galera Cluster for MariaDB offers synchronous multi-master replication with high availability, no data loss, and simplified, consistent scaling.
Galera Management
Galera Management in MariaDB handles synchronous multi-master replication, ensuring high availability, data consistency, failover, and seamless node provisioning across clusters.
General Operations
State Snapshot Transfers (SSTs) in Galera Cluster
State Snapshot Transfers (SSTs) in MariaDB Galera Cluster copy the full dataset from a donor node to a new or recovering joiner node, ensuring data consistency before the joiner joins replication.
Load Balancing
High Availability
MariaDB ensures high availability with Replication for async/semi-sync data copying and Galera Cluster for sync multi-master with failover and zero data loss.
WSREP Variable Details
Configuration
What is Galera Replication?
Summary
In MariaDB Cluster, transactions are replicated using the wsrep API, synchronously ensuring consistency across nodes. Synchronous replication offers high availability and consistency but is complex and potentially slower compared to asynchronous replication. Due to these challenges, asynchronous replication is often preferred for database performance and scalability, as seen in popular systems like MySQL and PostgreSQL, which typically favor asynchronous or semi-synchronous solutions.
In MariaDB Cluster, the server replicates a transaction at commit time by broadcasting the write set associated with the transaction to every node in the cluster. The client connects directly to the DBMS and experiences behavior that is similar to native MariaDB in most cases. The wsrep API (write set replication API) defines the interface between Galera replication and MariaDB.
Synchronous vs. Asynchronous Replication
The basic difference between synchronous and asynchronous replication is that "synchronous" replication guarantees that if a change happened on one node in the cluster, then the change will happen on other nodes in the cluster "synchronously," or at the same time. "Asynchronous" replication gives no guarantees about the delay between applying changes on the "master" node and the propagation of changes to "slave" nodes. The delay with "asynchronous" replication can be short or long. This also implies that if a master node crashes in an "asynchronous" replication topology, then some of the latest changes may be lost.
Theoretically, synchronous replication has several advantages over asynchronous replication:
Clusters utilizing synchronous replication are always highly available. If one of the nodes crashed, then there would be no data loss. Additionally, all cluster nodes are always consistent.
Clusters utilizing synchronous replication allow transactions to be executed on all nodes in parallel.
Clusters utilizing synchronous replication can guarantee causality across the whole cluster. This means that if a SELECT is executed on one cluster node after a transaction has committed on another cluster node, it will see the effects of that transaction.
However, in practice, synchronous database replication has traditionally been implemented via the so-called "2-phase commit" or distributed locking, which proved to be very slow. Low performance and complexity of implementation of synchronous replication led to a situation where asynchronous replication remains the dominant means for database performance scalability and availability. Widely adopted open-source databases such as MySQL or PostgreSQL offer only asynchronous or semi-synchronous replication solutions.
Galera's replication is not completely synchronous. It is sometimes called virtually synchronous replication.
Certification-Based Replication Method
An alternative approach to synchronous replication, based on group communication and transaction ordering techniques, was suggested by a number of researchers. Prototype implementations showed a lot of promise. We combined our experience in synchronous database replication and the latest research in the field to create the Galera Replication library and the wsrep API.
Galera replication is a highly transparent, scalable, and virtually synchronous replication solution for database clustering to achieve high availability and improved performance. Galera-based clusters are:
Highly available
Highly transparent
Highly scalable (near-linear scalability may be reached depending on the application)
Generic Replication Library
Galera replication functionality is implemented as a shared library and can be linked with any transaction processing system that implements the wsrep API hooks.
The Galera replication library is a protocol stack providing functionality for preparing, replicating, and applying transaction write sets. It consists of:
wsrep API: specifies the interface and responsibilities for the DBMS and the replication provider.
wsrep hooks: the wsrep integration in the DBMS engine.
Galera provider: the implementation of the wsrep API by the Galera library.
Many components in the Galera replication library were redesigned and improved with the introduction of Galera 4.
Galera Slave Threads
Although the Galera provider certifies the write set associated with a transaction at commit time on each node in the cluster, the write set is not necessarily applied on that node immediately. Instead, it is placed in the node's receive queue and is eventually applied by one of the node's Galera slave threads.
The number of Galera slave threads can be configured with the wsrep_slave_threads system variable.
The Galera slave threads are able to determine which write sets are safe to apply in parallel. However, if your cluster nodes seem to have frequent consistency problems, setting wsrep_slave_threads to 1 will probably fix the problem.
When a cluster node's state, as seen by wsrep_local_state_comment, is JOINED, increasing the number of slave threads may help the node catch up with the cluster more quickly. In this case, it may be useful to set the number of threads to twice the number of CPUs on the system.
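For example, the node's state and applier thread count can be checked and adjusted at runtime (the value shown is illustrative for a 4-CPU host):

```sql
-- Check the node's current state and applier thread count
SHOW STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';

-- Raise the applier thread count, e.g. to twice the number of CPUs
SET GLOBAL wsrep_slave_threads = 8;
```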
Streaming Replication
Streaming replication was introduced in Galera 4.
In older versions of MariaDB Cluster, there was a 2GB limit on the size of the transaction you could run. The node waits on the transaction commit before performing replication and certification. With large transactions, long-running writes, and changes to huge datasets, there was a greater possibility of a conflict forcing a rollback on an expensive operation.
Using streaming replication, the node breaks huge transactions up into smaller and more manageable fragments; it then replicates these fragments to the cluster as it works instead of waiting for the commit. Once certified, the fragment can no longer be aborted by conflicting transactions. As this can have performance consequences both during execution and in the event of rollback, it is recommended that you only use it with large transactions that are unlikely to experience conflict.
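As a sketch, streaming replication can be enabled for a single large transaction using the wsrep_trx_fragment_unit and wsrep_trx_fragment_size session variables (the fragment size here is illustrative):

```sql
-- Replicate a fragment after every 10,000 rows modified (current session only)
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

-- ... run the large transaction here ...

-- Turn streaming replication back off for the session
SET SESSION wsrep_trx_fragment_size = 0;
```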
For more information on streaming replication, see the documentation.
Group Commits
Group Commit support for MariaDB Cluster was introduced in Galera 4.
In MariaDB Group Commit, groups of transactions are flushed together to disk to improve performance. In previous versions of MariaDB, this feature was not available in MariaDB Cluster, as it interfered with the global ordering of transactions for replication. MariaDB Cluster can now take advantage of Group Commit.
For more information on Group Commit, see the documentation.
gcomm: the option to use for a working implementation.
dummy: used for running tests and profiling; it does not perform any actual replication, and all subsequent parameters are ignored.
Cluster address
An empty cluster address (gcomm://) causes the node to bootstrap a new cluster, so an empty address should never be hardcoded into any configuration file.
To connect the node to an existing cluster, the cluster address should contain the address of any member of the cluster you want to join.
The cluster address can also contain a comma-separated list of multiple members of the cluster. It is good practice to list all possible members of the cluster, for example: gcomm://<node1 name or IP>,<node2 name or IP>,<node3 name or IP>
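For example, a three-node cluster address might be set in the configuration file like this (the hostnames are placeholders):

```ini
[galera]
wsrep_cluster_address = gcomm://node1.example.com,node2.example.com,node3.example.com
```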
Option list
The wsrep_provider_options system variable is used to set Galera provider options. These parameters can also be provided (and overridden) as part of the URL. Unlike options provided in a configuration file, they do not persist and must be resubmitted with each connection.
A useful option to set is pc.wait_prim=no, to ensure the server starts running even if it cannot determine a primary component. This is useful if all members go down at the same time.
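For example, provider options can be appended to the URL after a ? separator (the hostname is a placeholder):

```ini
wsrep_cluster_address = gcomm://node1.example.com?pc.wait_prim=no
```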
Port
By default, gcomm listens on all interfaces. The port is either provided in the cluster address or will default to 4567 if not set.
This page is licensed: CC BY-SA / Gnu FDL
Configuring Auto-Eviction
Auto-Eviction enhances cluster stability by automatically removing non-responsive or "unhealthy" nodes in MariaDB Galera Cluster. This prevents a single problematic node from degrading the entire cluster's performance. In a Galera Cluster, each node monitors the network response times of other nodes. If a node becomes unresponsive due to reasons like memory swapping, network congestion, or a hung process, it can delay and potentially disrupt cluster operations. Auto-Eviction provides a deterministic method to isolate these misbehaving nodes effectively.
Auto-Eviction Process
The Auto-Eviction process is based on a consensus mechanism among the healthy cluster members.
Galera Replication is a core technology enabling MariaDB Galera Cluster to provide a highly available and scalable database solution. It is characterized by its virtually synchronous replication, ensuring strong data consistency across all cluster nodes.
wsrep_sst_method
Overview
State snapshot transfer method.
Details
ssl_ca
Overview
CA file in PEM format (check OpenSSL docs, implies --ssl).
Galera Replication is a multi-primary replication solution for database clustering. Unlike traditional asynchronous or semi-synchronous replication, Galera ensures that transactions are committed on all nodes (or fail on all) before the client receives a success confirmation. This mechanism eliminates data loss and minimizes replica lag, making all nodes active and capable of handling read and write operations.
2. How Galera Replication Works
The core of Galera Replication revolves around the concept of write sets and the wsrep API:
Write Set Broadcasting: When a client commits a transaction on any node in the cluster, the originating node captures the changes (the "write set") associated with that transaction. This write set is then broadcast to all other nodes in the cluster.
Certification and Application: Each receiving node performs a "certification" test to ensure that the incoming write set does not conflict with any concurrent transactions being committed locally.
If the write set passes certification, it is applied to the local database, and the transaction is committed on that node.
If a conflict is detected, the conflicting transaction (usually the one that was executed locally) is aborted, ensuring data consistency across the cluster.
Virtually Synchronous: The term "virtually synchronous" means that while the actual data application might happen slightly after the commit on the initiating node, the commit order is globally consistent, and all successful transactions are guaranteed to be applied on all active nodes. A transaction is not truly considered committed until it has passed certification on all nodes.
wsrep API: This API defines the interface between the Galera replication library (the "wsrep provider") and the database server (MariaDB). It allows the database to expose hooks for Galera to capture and apply transaction write sets.
3. Key Characteristics
Multi-Primary (Active-Active): All nodes in a Galera Cluster can be simultaneously used for both read and write operations.
Synchronous Replication (Virtual): Data is consistent across all nodes at all times, preventing data loss upon node failures.
Automatic Node Provisioning (SST/IST): When a new node joins or an existing node rejoins, Galera automatically transfers the necessary state to bring it up to date.
State Snapshot Transfer (SST): A full copy of the database is transferred from an existing node to the joining node.
Incremental State Transfer (IST): Only missing write sets are transferred if the joining node is not too far behind.
Automatic Membership Control: Nodes automatically detect and manage cluster membership changes (nodes joining or leaving).
Galera Replication essentially transforms a set of individual MariaDB servers into a robust, highly available, and consistent distributed database system.
Using MariaDB Replication with MariaDB Galera Cluster
MariaDB Galera Cluster provides high availability with synchronous replication, while adding asynchronous replication boosts redundancy for disaster recovery or reporting.
Monitoring and Delay List: Each node in the cluster monitors the group communication response times from all its peers. If a given node fails to respond within the expected timeframes, the other nodes will add an entry for it to their internal "delayed list."
Eviction Trigger: If a majority of the cluster nodes independently add the same peer to their delayed lists, it triggers the Auto-Eviction protocol.
Eviction: The cluster evicts the unresponsive node, removing it from the cluster membership. The evicted node will enter a non-primary state and must be restarted to rejoin the cluster.
The sensitivity of this process is determined by the evs.auto_evict parameter.
Configuration
Auto-Eviction is configured by passing the evs.auto_evict parameter within the wsrep_provider_options system variable in your MariaDB configuration file (my.cnf).
The value of evs.auto_evict determines the threshold for eviction. It defines how many times a peer can be placed on the delayed list before the node votes to evict it.
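For example (the threshold value here is illustrative):

```ini
[galera]
wsrep_provider_options = "evs.auto_evict=5"
```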
In the example above, if a node registers that a peer has been delayed 5 times, it will vote to have that peer evicted from the cluster.
To disable Auto-Eviction, you can set the value to 0:
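```ini
[galera]
wsrep_provider_options = "evs.auto_evict=0"
```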
Even when disabled, the node will continue to monitor response times and log information about delayed peers; it just won't vote to evict them.
Related Parameters for Failure Detection
The Auto-Eviction feature is directly related to the EVS (Extended Virtual Synchrony) protocol parameters that control how the cluster detects unresponsive nodes in the first place. These parameters define what it means for a node to be "delayed."
evs.inactive_check_period: the frequency at which the node checks for inactive peers.
evs.suspect_timeout: the time after which a non-responsive node is marked as "suspect."
evs.inactive_timeout: the time after which a non-responsive node is marked as "inactive" and removed.
Tuning these values in conjunction with evs.auto_evict allows you to define how aggressively the cluster will fence off struggling nodes.
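A combined sketch, assuming ISO 8601 duration values for the EVS timeouts (the values shown are illustrative and should be tuned to your network):

```ini
[galera]
wsrep_provider_options = "evs.auto_evict=5;evs.suspect_timeout=PT5S;evs.inactive_timeout=PT15S"
```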
The recommended strategy for creating a full, consistent backup of a MariaDB Galera Cluster is to perform the backup on a single node. Because all nodes in a healthy cluster contain the same data, a complete backup from one node represents a snapshot of the entire cluster at a specific point in time.
The preferred tool for this is mariadb-backup, which creates a "hot" backup without blocking the node from serving traffic for an extended period.
The Challenge of Consistency in a Live Cluster
While taking a backup, the donor node is still receiving and applying transactions from the rest of the cluster. If the backup process is long, it's possible for the data at the end of the backup to be newer than the data at the beginning, leading to an inconsistent state within the backup files.
To prevent this, it's important to temporarily pause the node's replication stream during the backup process.
Recommended Backup Procedure
This procedure ensures a fully consistent backup with minimal impact on the cluster's availability.
1. Select a Backup Node
Choose a node from your cluster to serve as the backup source. It's a good practice to use a non-primary node if you are directing writes to a single server.
2. Desynchronize the Node (Pause Replication)
To guarantee consistency, you should temporarily pause the node's ability to apply new replicated transactions. This is done by setting the wsrep_desync system variable to ON.
Take the selected node out of your rotation so it no longer receives application traffic.
Connect to the node with a mariadb client and execute:
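```sql
-- Pause replication on this node (it becomes desynced from the cluster)
SET GLOBAL wsrep_desync = ON;
```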
The node will finish applying any transactions already in its queue and then pause, entering a desynced state. The rest of the cluster will continue to operate normally.
3. Perform the Backup
With the node's replication paused, run the mariadb-backup utility to create a full backup.
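A minimal sketch, assuming a backup user and target directory of your choosing (both are placeholders):

```shell
mariadb-backup --backup \
  --target-dir=/var/backups/galera-full \
  --user=backup_user --password=backup_password
```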
4. Resynchronize the Node
Once the backup is complete, you can allow the node to rejoin the cluster's replication stream.
Connect to the node again and execute:
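```sql
-- Resume replication; the node rejoins the cluster's replication stream
SET GLOBAL wsrep_desync = OFF;
```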
The node will now request an Incremental State Transfer (IST) from its peers to receive all the transactions it missed while it was desynchronized and quickly catch up.
Once the node is fully synced (you can verify this by checking that wsrep_local_state_comment is Synced), add it back to your load balancer's rotation.
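To verify:

```sql
SHOW STATUS LIKE 'wsrep_local_state_comment';
-- The node is fully caught up when the value is "Synced"
```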
This procedure ensures you get a fully consistent snapshot of your cluster's data with zero downtime for your application.
Galera Test Repositories
To facilitate development and QA, we have created some test repositories for the Galera wsrep provider.
These are test repositories. There will be periods when they do not work at all, or work incorrectly, or possibly cause earthquakes, typhoons, and tornadoes. You have been warned.
Galera Test Repositories for YUM
Replace ${dist} in the code below with the YUM-based distribution you are testing. Valid distributions are:
centos5-amd64
centos5-x86
centos6-amd64
Galera Test Repositories for APT
Replace ${dist} in the code below with the APT-based distribution you are testing. Valid ones are:
wheezy
jessie
sid
Load Balancing in MariaDB Galera Cluster
While a client application can connect directly to any node in a MariaDB Galera Cluster, this is not a practical approach for a production environment. A direct connection creates a single point of failure and does not allow the application to take advantage of the cluster's high availability and read-scaling capabilities.
A load balancer or database proxy is an essential component that sits between your application and the cluster. Its primary responsibilities are:
Provide a Single Endpoint: Your application connects to the load balancer's virtual IP address, not to the individual database nodes.
Health Checks: The load balancer constantly monitors the health of each cluster node (e.g., is it Synced? is it up or down?).
Traffic Routing: It intelligently distributes incoming client connections and queries among the healthy nodes in the cluster.
Automatic Failover: If a node fails, the load balancer automatically stops sending traffic to it, providing seamless failover for your application.
Recommended Load Balancer: MariaDB MaxScale
For MariaDB Galera Cluster, the recommended load balancer is MariaDB MaxScale. Unlike a generic TCP proxy, MaxScale is a database-aware proxy that understands the Galera Cluster protocol. This allows it to make intelligent routing decisions based on the real-time state of the cluster nodes.
Common Routing Strategies
A database-aware proxy like MaxScale can be configured to use several different routing strategies.
Read-Write Splitting (Recommended)
This is the most common and highly recommended strategy for general-purpose workloads.
How it Works: The load balancer is configured to send all write operations (INSERT, UPDATE, DELETE) to a single, designated primary node. All read operations (SELECT) are then distributed across the remaining available nodes.
Advantages: directing all writes to one node minimizes transaction conflicts, since two nodes modifying the same row at the same time would otherwise lead to deadlocks and rollbacks, while the remaining nodes are fully utilized to scale out read-intensive workloads.
Read Connection Load Balancing
In this simpler strategy, the load balancer distributes all connections evenly across all available nodes.
How it Works: Each new connection is sent to the next available node in a round-robin fashion.
Disadvantages: This approach can easily lead to transaction conflicts if your application sends writes to multiple nodes simultaneously. It is generally only suitable for applications that are almost exclusively read-only.
Other Load Balancing Solutions
While MariaDB MaxScale is the recommended solution, other proxies and load balancers can also be used with Galera Cluster, including:
ProxySQL: Another popular open-source, database-aware proxy.
HAProxy: A very common and reliable TCP load balancer. When used with Galera, HAProxy is typically configured with a simple TCP health check or a custom script to determine node availability.
Cloud Load Balancers: Cloud providers like AWS (ELB/NLB), Google Cloud, and Azure offer native load balancing services that can be used to distribute traffic across a Galera Cluster.
wsrep_certificate_expiration_hours_warning
Overview
Prints a warning if the X509 certificate used for wsrep connections is due to expire within the number of hours given as the value. If the value is 0, warnings are not printed.
Usage
The wsrep_certificate_expiration_hours_warning system variable can be set in a configuration file:
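For example (the 48-hour threshold is illustrative):

```ini
[mariadb]
wsrep_certificate_expiration_hours_warning=48
```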
The global value of the wsrep_certificate_expiration_hours_warning system variable can also be set dynamically at runtime by executing :
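```sql
-- Warn when the wsrep certificate expires within 48 hours (illustrative value)
SET GLOBAL wsrep_certificate_expiration_hours_warning=48;
```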
When the wsrep_certificate_expiration_hours_warning system variable is set dynamically at runtime, its value will be reset the next time the server restarts. To make the value persist on restart, set it in a configuration file too.
Details
The wsrep_certificate_expiration_hours_warning system variable can be used to configure certificate expiration warnings for MariaDB Enterprise Cluster, powered by Galera:
When the wsrep_certificate_expiration_hours_warning system variable is set to 0, certificate expiration warnings are not printed to the MariaDB error log.
When the wsrep_certificate_expiration_hours_warning system variable is set to a value N greater than 0, certificate expiration warnings are printed to the MariaDB error log when the node's certificate expires within N hours.
Parameters
socket.ssl_cert
Overview
Defines the path to the SSL certificate.
The wsrep_provider_options system variable applies to MariaDB Enterprise Cluster, powered by Galera and to Galera Cluster available with MariaDB Community Server. This page relates specifically to the socket.ssl_cert wsrep_provider_options.
Details
The node uses the certificate as a self-signed public key in encrypting replication traffic over SSL. You can use either an absolute path or one relative to the working directory. The file must use PEM format.
Examples
Display Current Value
wsrep_provider_options define optional settings the node passes to the wsrep provider.
To display current wsrep_provider_options values:
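```sql
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
```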
The expected output will display the option and the value. Options with no default value, for example SSL options, will not be displayed in the output.
Set in Configuration File
When changing a setting for a wsrep_provider_options in the config file, you must list EVERY option that is to have a value other than the default value. Options that are not explicitly listed are reset to the default value.
Options are set in the my.cnf configuration file. Use the ; delimiter to set multiple options.
The configuration file must be updated on each node. A restart to each node is needed for changes to take effect.
Use a quoted string that includes every option where you want to override the default value. Options that are not in the list will reset to their default value.
To set the option in the configuration file:
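For example (the paths are placeholders; remember to repeat every option that should keep a non-default value in the quoted string):

```ini
[galera]
wsrep_provider_options="socket.ssl=yes;socket.ssl_cert=/etc/my.cnf.d/certificates/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certificates/server-key.pem;socket.ssl_ca=/etc/my.cnf.d/certificates/ca.pem"
```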
Set Dynamically
The socket.ssl_cert option cannot be set dynamically. It can only be set in the configuration file.
Trying to change a non-dynamic option with SET results in an error:
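For illustration (the exact error message depends on the server version, and the path is a placeholder):

```sql
-- This fails, because socket.ssl_cert is not a dynamic option
SET GLOBAL wsrep_provider_options = 'socket.ssl_cert=/etc/my.cnf.d/certificates/new-cert.pem';
```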
Overview of Hybrid Replication
Hybrid replication leverages standard, asynchronous MariaDB Replication to copy data from a synchronous MariaDB Galera Cluster to an external server or another cluster. This configuration establishes a one-way data flow, where the entire Galera Cluster serves as the source (primary) for one or more asynchronous replicas. This advanced setup combines the strengths of both replication methods: synchronous replication ensures high availability within the primary site, while asynchronous replication caters to specific use cases, allowing for flexible data distribution.
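As a sketch, an external replica might be pointed at one node of the cluster like this (the hostname and credentials are placeholders, and GTID handling requires careful configuration):

```sql
-- On the external replica: replicate asynchronously from one Galera node
CHANGE MASTER TO
  MASTER_HOST='galera-node1.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_USE_GTID=slave_pos;
START SLAVE;
```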
Common Use Cases
Implementing a hybrid replication setup is a powerful technique for solving several common business needs:
wsrep_sst_common
wsrep_sst_common Variables
The wsrep_sst_common script provides shared functionality used by various State Snapshot Transfer (SST) methods in Galera Cluster. It centralizes the handling of common configurations such as authentication credentials, SSL/TLS encryption parameters, and other security-related settings. This ensures consistent and secure communication between cluster nodes during the SST process.
Building the Galera wsrep Package on Ubuntu and Debian
The instructions on this page were used to create the galera package on the Ubuntu and Debian Linux distributions. This package contains the wsrep provider for .
The version of the wsrep provider is 25.3.5. We also provide 25.2.9 for those who need or want it. Prior to that, the wsrep version was 23.2.7.
Install prerequisites:
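A sketch of typical build prerequisites on Debian and Ubuntu (the exact package list may vary by release):

```shell
sudo apt-get update
sudo apt-get install -y build-essential scons check debhelper devscripts
```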
Certification-Based Replication
Certification-based replication uses group communication and transaction ordering techniques to achieve synchronous replication.
Transactions execute optimistically on a single node, or replica, and then, at commit time, run a coordinated certification process to enforce global consistency. Global coordination is achieved with the help of a broadcast service that establishes a global total order among concurrent transactions.
Requirements for Certification-Based Replication
It is not possible to implement certification-based replication for all database systems. It requires certain features of the database in order to work:
socket.ssl_ca
Overview
Defines the path to the SSL Certificate Authority (CA) file.
The wsrep_provider_options system variable applies to MariaDB Enterprise Cluster, powered by Galera and to Galera Cluster available with MariaDB Community Server. This page relates specifically to the socket.ssl_ca wsrep_provider_options.
socket.ssl
Overview
Explicitly enables TLS usage by the wsrep provider.
The wsrep_provider_options system variable applies to MariaDB Enterprise Cluster, powered by Galera and to Galera Cluster available with MariaDB Community Server. This page relates specifically to the socket.ssl wsrep_provider_options.
socket.ssl_key
Overview
Defines the path to the SSL certificate key.
The wsrep_provider_options system variable applies to MariaDB Enterprise Cluster, powered by Galera and to Galera Cluster available with MariaDB Community Server. This page relates specifically to the socket.ssl_key wsrep_provider_options.
gcs.check_appl_proto
Controls whether the node performs application-level protocol version checks when joining a cluster.
The wsrep_provider_options system variable applies to MariaDB Enterprise Cluster, powered by Galera and to Galera Cluster available with MariaDB Community Server. This page relates specifically to the gcs.check_appl_proto wsrep_provider_options.
Disaster Recovery (DR): Galera Cluster provides high availability and automatic failover. Use asynchronous replication for a distant replica, promoting it during site outages.
Feeding Analytics/BI Systems: Replicate from the OLTP Galera Cluster to a data warehouse or analytics server to run heavy queries without affecting production performance.
Upgrades and Migrations: Use an asynchronous replica to test new MariaDB versions or migrate to new hardware with minimal downtime.
Key Challenges and Considerations
Before implementing a hybrid setup, it is critical to understand the technical challenges:
GTID Management: Galera Cluster and MariaDB Replication use different GTID formats and implementations, requiring careful configuration to avoid conflicts.
Replication Lag: The external replica experiences the usual latencies of asynchronous replication, causing it to lag behind the real-time state of the cluster.
Failover Complexity: Failover within Galera Cluster is automatic, but failing over to the asynchronous DR replica is manual and requires careful planning.
Description: Defines the authentication credentials used by the State Snapshot Transfer (SST) process, typically formatted as user:password. These credentials are essential for authenticating the SST user on the donor node, ensuring that only authorized joiner nodes can initiate and receive data during the SST operation. Proper configuration of this variable is critical to maintain the security and integrity of the replication process between Galera cluster nodes.
tca (tcert)
Description: Specifies the Certificate Authority (CA) certificate file used for SSL/TLS encryption during State Snapshot Transfers (SSTs). When encryption is enabled, this certificate allows the joining node (client) to authenticate the identity of the donor node, ensuring secure and trusted communication between them.
tcapath (tcap)
Description: Specifies the path to a directory that contains a collection of trusted Certificate Authority (CA) certificates. Instead of providing a single CA certificate file, this option allows the use of multiple CA certificates stored in separate files within the specified directory. It is useful in environments where trust needs to be established with multiple certificate authorities.
tcert (tpem)
Description: This variable stores the path to the TLS/SSL certificate file for the specific node. The certificate, typically in PEM format, is used by the node to authenticate itself to other nodes during secure SST operations. It is derived from the tcert option in the [sst] section.
tkey (tkey)
Description: Represents the private key file that corresponds to the public key certificate specified by tpem. This private key is essential for decrypting data and establishing a secure connection during State Snapshot Transfer (SST). It enables the receiving node to authenticate encrypted information and participate in secure replication within the cluster.
Example
Set in Configuration File
To configure common SST options, add them to the [sst] group in your configuration file:
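For example (the certificate paths are placeholders):

```ini
[sst]
tca=/etc/my.cnf.d/certificates/ca.pem
tcert=/etc/my.cnf.d/certificates/server-cert.pem
tkey=/etc/my.cnf.d/certificates/server-key.pem
```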
Build the packages by executing build.sh under scripts/ directory with -p switch:
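```shell
# From the top of the galera source tree
./scripts/build.sh -p
```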
When finished, you will have the Debian packages for galera library and arbitrator in the parent directory.
Running galera test suite
If you want to run the Galera test suite (mysql-test-run --suite=galera), you need to install the Galera library as either /usr/lib/galera/libgalera_smm.so or /usr/lib64/galera/libgalera_smm.so.
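For example:

```shell
# Place the built library where mysql-test-run expects it
sudo install -D libgalera_smm.so /usr/lib/galera/libgalera_smm.so
```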
The node uses the CA file to verify the signature on the certificate. You can use either an absolute path or one relative to the working directory. The file must use PEM format.
Option Name
socket.ssl_ca
Default Value
"" (an empty string)
Dynamic
NO
Debug
NO
Examples
Display Current Value
wsrep_provider_options define optional settings the node passes to the wsrep provider.
To display current wsrep_provider_options values:
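For example:

```sql
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
```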
The expected output will display the option and the value. Options with no default value, for example SSL options, will not be displayed in the output.
Set in Configuration File
When changing a setting for a wsrep_provider_options in the config file, you must list EVERY option that is to have a value other than the default value. Options that are not explicitly listed are reset to the default value.
Options are set in the my.cnf configuration file. Use the ; delimiter to set multiple options.
The configuration file must be updated on each node. A restart to each node is needed for changes to take effect.
Use a quoted string that includes every option where you want to override the default value. Options that are not in the list will reset to their default value.
To set the option in the configuration file:
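A sketch of a configuration entry, assuming illustrative certificate paths; the related SSL options are included in the quoted string because unlisted options reset to their defaults:

```ini
[mariadb]
wsrep_provider_options="socket.ssl=YES;socket.ssl_ca=/etc/my.cnf.d/certs/ca-cert.pem;socket.ssl_cert=/etc/my.cnf.d/certs/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certs/server-key.pem"
```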
Set Dynamically
The socket.ssl_ca option cannot be set dynamically. It can only be set in the configuration file.
Trying to change a non-dynamic option with SET results in an error:
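For example, a statement along these lines fails because the option is not dynamic:

```sql
-- socket.ssl_ca cannot be changed at runtime; this statement returns an error
SET GLOBAL wsrep_provider_options='socket.ssl_ca=/etc/my.cnf.d/certs/ca-cert.pem';
```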
Details
The socket.ssl option specifies whether SSL encryption should be used.
Option Name
socket.ssl
Default Value
NO
Dynamic
NO
Debug
NO
Examples
Display Current Value
wsrep_provider_options define optional settings the node passes to the wsrep provider.
To display current wsrep_provider_options values:
The expected output will display the option and the value. Options with no default value, for example SSL options, will not be displayed in the output.
Set in Configuration File
When changing a setting for a wsrep_provider_options in the config file, you must list EVERY option that is to have a value other than the default value. Options that are not explicitly listed are reset to the default value.
Options are set in the my.cnf configuration file. Use the ; delimiter to set multiple options.
The configuration file must be updated on each node. A restart to each node is needed for changes to take effect.
Use a quoted string that includes every option where you want to override the default value. Options that are not in the list will reset to their default value.
To set the option in the configuration file:
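A sketch of a configuration entry enabling SSL; the certificate paths are illustrative, and the related SSL options are listed because unlisted options reset to their defaults:

```ini
[mariadb]
wsrep_provider_options="socket.ssl=YES;socket.ssl_ca=/etc/my.cnf.d/certs/ca-cert.pem;socket.ssl_cert=/etc/my.cnf.d/certs/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certs/server-key.pem"
```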
Set Dynamically
The socket.ssl option cannot be set dynamically. It can only be set in the configuration file.
Trying to change a non-dynamic option with SET results in an error:
Details
The node uses the certificate key, that is, the private key corresponding to its certificate, to encrypt replication traffic over SSL. You can use either an absolute path or one relative to the working directory. The file must use PEM format.
Option Name
socket.ssl_key
Default Value
"" (an empty string)
Dynamic
NO
Debug
NO
Examples
Display Current Value
wsrep_provider_options define optional settings the node passes to the wsrep provider.
To display current wsrep_provider_options values:
The expected output will display the option and the value. Options with no default value, for example SSL options, will not be displayed in the output.
Set in Configuration File
When changing a setting for a wsrep_provider_options in the config file, you must list EVERY option that is to have a value other than the default value. Options that are not explicitly listed are reset to the default value.
Options are set in the my.cnf configuration file. Use the ; delimiter to set multiple options.
The configuration file must be updated on each node. A restart to each node is needed for changes to take effect.
Use a quoted string that includes every option where you want to override the default value. Options that are not in the list will reset to their default value.
To set the option in the configuration file:
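A sketch of a configuration entry, assuming an illustrative key path; the other SSL options appear in the quoted string because unlisted options reset to their defaults:

```ini
[mariadb]
wsrep_provider_options="socket.ssl=YES;socket.ssl_ca=/etc/my.cnf.d/certs/ca-cert.pem;socket.ssl_cert=/etc/my.cnf.d/certs/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certs/server-key.pem"
```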
Set Dynamically
The socket.ssl_key option cannot be set dynamically. It can only be set in the configuration file.
Trying to change a non-dynamic option with SET results in an error:
Place this code block in a file at /etc/yum.repos.d/galera.repo
[galera-test]
name = galera-test
baseurl = http://yum.mariadb.org/galera/repo/rpm/${dist}
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
# run the following command:
sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xcbcb082a1bb943db 0xF1656F24C74CD1D8
# Add the following line to your /etc/apt/sources.list file:
deb http://yum.mariadb.org/galera/repo/deb ${dist} main
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
Transactional Database: The database must be transactional. Specifically, it has to be able to roll back uncommitted changes.
Atomic Changes: Replication events must be able to change the database atomically. All of a series of database operations in a transaction must occur, or nothing occurs.
Global Ordering: Replication events must be ordered globally. Specifically, they are applied on all instances in the same order.
How the Process Works
Certification-Based Replication
The main idea in certification-based replication is that a transaction executes conventionally until it reaches the commit point, assuming there is no conflict. This is called optimistic execution.
When the client issues a COMMIT command, but before the actual commit occurs, all changes made to the database by the transaction and the primary keys of the changed rows are collected into a write-set. The database then sends this write-set to all of the other nodes.
The write-set then undergoes a deterministic certification test, using the primary keys. This is done on each node in the cluster, including the node that originates the write-set. It determines whether or not the node can apply the write-set.
If the certification test fails, the node drops the write-set and the cluster rolls back the original transaction. If the test succeeds, however, the transaction commits and the write-set is applied to the rest of the cluster.
Galera Cluster assigns each transaction a global ordinal sequence number, or seqno, during replication. When a transaction reaches the commit point, the node checks the sequence number against that of the last successful transaction. The interval between the two is the area of concern, given that transactions that occur within this interval have not seen the effects of each other. All transactions in this interval are checked for primary key conflicts with the transaction in question. The certification test fails if it detects a conflict.
The procedure is deterministic and all replicas receive transactions in the same order. Thus, all nodes reach the same decision about the outcome of the transaction. The node that started the transaction can then notify the client application whether or not it has committed the transaction.
Galera Cluster automatically uses the highest protocol version supported by all nodes. This prevents older nodes, which lack support for newer features, from joining or disrupting the cluster until an upgrade solution is available.
However, MySQL and MariaDB have evolved differently, and their internal protocol versions are incomparable. This incompatibility prevents a mixed-node cluster (MySQL nodes and MariaDB nodes) from forming, which blocks rolling migrations.
Migration Usage: When migrating from another Galera-based cluster (e.g., Percona XtraDB Cluster) to MariaDB Galera Cluster, this parameter must be set to FALSE (OFF) on all nodes to disable the protocol check. Once the cluster is fully migrated to MariaDB, it should be set back to TRUE.
Known reporting issue in early versions
In early versions, the variable may appear as OFF even though the default behavior is TRUE. Explicitly configure it to ensure the desired state during migration.
Option Name
gcs.check_appl_proto
Default Value
TRUE
Dynamic
NO
Debug
NO
Examples
Display Current Value
wsrep_provider_options define optional settings the node passes to the wsrep provider.
To display current wsrep_provider_options values:
The expected output will display the option and the value. Options with no default value will not be displayed in the output.
Set in Configuration File
When changing a setting for a wsrep_provider_options in the config file, you must list EVERY option that is to have a value other than the default value. Options that are not explicitly listed are reset to the default value.
Options are set in the my.cnf configuration file. Use the ; delimiter to set multiple options.
The configuration file must be updated on each node. A restart to each node is needed for changes to take effect.
Use a quoted string that includes every option where you want to override the default value. Options that are not in the list will reset to their default value.
To set the option in the configuration file (example for migration):
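A sketch of a migration-time entry, disabling the application protocol check as described above:

```ini
# Disable the protocol check during migration; restore TRUE once complete
[mariadb]
wsrep_provider_options="gcs.check_appl_proto=FALSE"
```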
Set Dynamically
The gcs.check_appl_proto option cannot be set dynamically. It can only be set in the configuration file.
Trying to change a non-dynamic option with SET results in an error:
This system variable specifies the logical name of the cluster. All cluster nodes that connect to one another must have the same logical name in order to form a component or join the Primary Component.
Parameters
Examples
Configuration
Set the cluster name using an options file:
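For example, assuming a hypothetical cluster name of my_cluster:

```ini
[mariadb]
wsrep_cluster_name="my_cluster"
```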
Show Configuration
To view the current cluster name, use the statement:
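For example:

```sql
SHOW GLOBAL VARIABLES LIKE 'wsrep_cluster_name';
```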
MariaDB Galera Cluster Overview
MariaDB Enterprise Cluster is a solution designed to handle high workloads exceeding the capacity of a single server. It is based on Galera Cluster technology integrated with MariaDB Enterprise Server and includes features like data-at-rest encryption for added security. This multi-primary replication alternative is ideal for maintaining data consistency across multiple servers, providing enhanced reliability and scalability.
Overview
MariaDB Enterprise Cluster, powered by Galera, is available with MariaDB Enterprise Server. MariaDB Galera Cluster is available with MariaDB Community Server.
In order to handle increasing load and especially when that load exceeds what a single server can process, it is best practice to deploy multiple MariaDB Enterprise Servers with a replication solution to maintain data consistency between them. MariaDB Enterprise Cluster is a multi-primary replication solution that serves as an alternative to the single-primary MariaDB Replication.
An Introduction to Database Replication
Database replication is the process of continuously copying data from one database server (a "node") to another, creating a distributed and resilient system. The goal is for all nodes in this system to contain the same set of data, forming what is known as a database cluster. From the perspective of a client application, this distributed nature is often transparent, allowing it to interact with the cluster as if it were a single database.
Replication Architectures
Primary/Replica
The most common replication architecture is Primary/Replica (also known as Master/Slave). In this model:
The Primary node is the authoritative source. It is the only node that accepts write operations (e.g., INSERT, UPDATE, DELETE).
The Primary logs these changes and sends them to one or more Replica nodes.
Multi-Primary Replication
In a multi-primary system, every node in the cluster acts as a primary. This means any node can accept write operations. When a node receives an update, it automatically propagates that change to all other primary nodes in the cluster. Each primary node logs its own changes and communicates them to its peers to maintain synchronization.
Replication Protocols: Asynchronous vs. Synchronous
Beyond the architecture, the replication protocol determines how transactions are confirmed across the cluster.
Asynchronous Replication (Lazy Replication)
In asynchronous replication, the primary node commits a transaction locally first and then sends the changes to the replicas in the background. The transaction is confirmed as complete to the client immediately after it's saved on the primary. This means there is a brief period, known as replication lag, where the replicas have not yet received the latest data.
Synchronous Replication (Eager Replication)
In synchronous replication, a transaction is not considered complete (committed) until it has been successfully applied and confirmed on all participating nodes. When the client receives confirmation, it is a guarantee that the data exists consistently across the cluster.
The Trade-offs of Synchronous Replication
Advantages
Synchronous replication offers several powerful advantages over its asynchronous counterpart:
High Availability: Since all nodes are fully synchronized, if one node fails, there is zero data loss. Traffic can be immediately directed to another node without complex failover procedures, as all data replicas are guaranteed to be consistent.
Read-After-Write Consistency: Synchronous replication guarantees causality. A SELECT query issued immediately after a transaction will always see the effects of that transaction, even if the query is executed on a different node in the cluster.
Disadvantages
Traditionally, eager replication protocols coordinate nodes one operation at a time, using two-phase commit or distributed locking. A system with n nodes processing o operations per transaction at a throughput of t transactions per second generates m = n × o × t messages per second.
This means that any increase in the number of nodes leads to an exponential growth in transaction response times and in the probability of conflicts and deadlocks.
For this reason, asynchronous replication remains the dominant replication protocol for database performance, scalability, and availability. Widely adopted open source databases, such as MySQL and PostgreSQL, primarily provide asynchronous or semi-synchronous replication solutions.
Galera's Solution: Modern Synchronous Replication
Galera Cluster solves the traditional problems of synchronous replication by using a modern, certification-based approach built on several key innovations:
Group Communication: A robust messaging layer ensures that information is delivered to all nodes reliably and in the correct order, forming a solid foundation for data consistency.
Write-Set Replication: Instead of coordinating on every individual operation, database changes (writes) are grouped into a single package called a "write-set." This write-set is replicated as a single message, avoiding the high overhead of traditional two-phase commit.
Optimistic Execution: Transactions are first executed optimistically on a local node. The resulting write-set is then broadcast to the cluster for a fast, parallel certification process. If it passes certification (meaning no conflicts), it is committed on all nodes.
The certification-based replication system that Galera Cluster uses is built on these powerful approaches, delivering the benefits of synchronous replication without the traditional performance bottlenecks.
How it Works
MariaDB Enterprise Cluster is built on MariaDB Enterprise Server with Galera Cluster and MariaDB MaxScale. In MariaDB Enterprise Server 10.5 and later, it features enterprise-specific options, such as data-at-rest encryption for the write-set cache, that are not available in other Galera Cluster implementations.
As a multi-primary replication solution, any MariaDB Enterprise Server can operate as a Primary Server. This means that changes made to any node in the cluster replicate to every other node in the cluster, using certification-based replication and global ordering of transactions for the InnoDB storage engine.
MariaDB Enterprise Cluster is only available for Linux operating systems.
Architecture
There are a few things to consider when planning the hardware, virtual machines, or containers for MariaDB Enterprise Cluster.
MariaDB Enterprise Cluster architecture involves deploying multiple instances of MariaDB Enterprise Server. The Servers are configured to use multi-primary replication to maintain consistency between themselves while MariaDB MaxScale routes reads and writes between them.
The application establishes a client connection to MariaDB MaxScale. MaxScale then routes statements to one of the MariaDB Enterprise Servers in the cluster. Writes made to any node in this cluster replicate to all the other nodes of the cluster.
When MariaDB Enterprise Servers start in a cluster:
Each Server attempts to establish network connectivity with the other Servers in the cluster
Groups of connected Servers form a component
When a Server establishes network connectivity with the Primary Component, it synchronizes its local database with that of the cluster
In planning the number of systems to provision for MariaDB Enterprise Cluster, it is important to keep cluster operation in mind: ensure that each Server has enough disk space and that the cluster is able to maintain a Primary Component in the event of outages.
Each Server requires the minimum amount of disk space needed to store the entire database. The upper storage limit for MariaDB Enterprise Cluster is that of the smallest disk in use.
Each switch in use should connect an odd number of Servers, three or more.
In a cluster that spans multiple switches, each data center in use should have an odd number of switches, three or more.
When planning Servers to the switch, switches to the data center, and data centers in the cluster, this model helps preserve the Primary Component. A minimum of three in use means that a single Server or switch can fail without taking down the cluster.
Using an odd number of three or more reduces the risk of a split-brain situation (that is, a case where two separate groups of Servers believe that they are part of the Primary Component and remain operational).
Cluster Configuration
Nodes in MariaDB Enterprise Cluster are individual MariaDB Enterprise Servers configured to perform multi-primary cluster replication. This configuration is set using a series of system variables in the configuration file.
Additional information on system variables is available in the Reference chapter.
General Configuration
The innodb_autoinc_lock_mode system variable must be set to a value of 2 to enable interleaved lock mode. MariaDB Enterprise Cluster does not support other lock modes.
Ensure also that the bind_address system variable is properly set to allow MariaDB Enterprise Server to listen for TCP/IP connections:
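A minimal sketch of these general settings; the bind address shown listens on all interfaces and is illustrative only:

```ini
[mariadb]
# Interleaved lock mode is required by MariaDB Enterprise Cluster
innodb_autoinc_lock_mode=2
# Listen for TCP/IP connections on all interfaces (adjust for your network)
bind_address=0.0.0.0
```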
Cluster Name and Address
MariaDB Enterprise Cluster requires that you set a name for your cluster, using the wsrep_cluster_name system variable. When nodes connect to each other, they check the cluster name to ensure that they've connected to the correct cluster before replicating data. All Servers in the cluster must have the same value for this system variable.
Using the wsrep_cluster_address system variable, you can define the back-end protocol (always gcomm) and comma-separated list of the IP addresses or domain names of the other nodes in the cluster.
It is best practice to list all nodes on this system variable, as this is the list the node searches when attempting to reestablish network connectivity with the primary component.
Note: In certain environments, such as deployments in the cloud, you may also need to set the wsrep_node_address system variable, so that MariaDB Enterprise Server properly informs other Servers how to reach it.
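A sketch of these settings, using hypothetical node addresses from the documentation range 192.0.2.0/24:

```ini
[mariadb]
wsrep_cluster_name="my_cluster"
wsrep_cluster_address="gcomm://192.0.2.1,192.0.2.2,192.0.2.3"
```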
Galera Replicator Plugin
MariaDB Enterprise Server connects to other Servers and replicates data from the cluster through a wsrep Provider called the Galera Replicator plugin. In order to enable clustering, specify the path to the relevant .so file using the wsrep_provider system variable.
MariaDB Enterprise Server 10.4 and later installations use an enterprise-build of the Galera Enterprise 4 plugin. This includes all the features of Galera Cluster 4 as well as enterprise features like GCache encryption.
To enable MariaDB Enterprise Cluster, use the libgalera_enterprise_smm.so library:
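For example, assuming the plugin is installed at a typical library path (the exact path depends on your distribution):

```ini
[mariadb]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_enterprise_smm.so
```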
Earlier versions of MariaDB Enterprise Server use the older community release of the Galera 3 plugin. This is set using the libgalera_smm.so library:
In addition to system variables, there is a set of options that you can pass to the wsrep Provider to configure or to otherwise adjust its operations. This is done through the wsrep_provider_options system variable:
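A sketch with illustrative values; as with any wsrep Provider configuration, every option that should differ from its default must appear in the quoted string:

```ini
[mariadb]
wsrep_provider_options="gcache.size=2G;gcs.fc_limit=256"
```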
Additional information is available in the Reference chapter.
Cluster Replication
MariaDB Enterprise Cluster implements a multi-primary replication solution.
When you write to a table on a node, the node collects the write into a write-set transaction, which it then replicates to the other nodes in the cluster.
Your application can write to any node in the cluster. Each node certifies the replicated write-set. If the transaction has no conflicts, the nodes apply it. If the transaction does have conflicts, it is rejected and all of the nodes revert the changes.
Quorum
The first node you start in MariaDB Enterprise Cluster bootstraps the Primary Component. Each subsequent node that establishes a connection joins and synchronizes with the Primary Component. A cluster achieves a quorum when more than half the nodes are joined to the Primary Component.
When a component forms that has less than half the nodes in the cluster, it becomes non-operational, since it believes there is a running Primary Component to which it has lost network connectivity.
These quorum requirements, combined with the requisite number of odd nodes, avoid a split brain situation, or one in which two separate components believe they are each the Primary Component.
Dynamically Bootstrapping the Cluster
In cases where the cluster goes down and your nodes become non-operational, you can dynamically bootstrap the cluster.
First, find the most up-to-date node (that is, the node with the highest value for the wsrep_last_committed status variable):
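For example, run the following on each node and compare the results:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
```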
Once you determine the node with the most recent transaction, you can designate it as the Primary Component by running the following on it:
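The bootstrap is triggered through the pc.bootstrap wsrep provider option:

```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
```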
The node bootstraps the Primary Component onto itself. Other nodes in the cluster with network connectivity then submit state transfer requests to this node to bring their local databases into sync with what's available on this node.
State Transfers
From time to time a node can fall behind the cluster. This can occur due to expensive operations being issued to it or due to network connectivity issues that lead to write-sets backing up in the queue. Whatever the cause, when a node finds that it has fallen too far behind the cluster, it attempts to initiate a state transfer.
In a state transfer, the node connects to another node in the cluster and attempts to bring its local database back in sync with the cluster. There are two types of state transfers:
Incremental State Transfer (IST)
State Snapshot Transfer (SST)
When the donor node receives a state transfer request, it checks its write-set cache (that is, the GCache) to see if it has enough saved write-sets to bring the joiner into sync. If the donor node has the intervening write-sets, it performs an IST operation, where the donor node only sends the missing write-sets to the joiner. The joiner applies these write-sets following the global ordering to bring its local databases into sync with the cluster.
When the donor does not have enough write-sets cached for an IST, it runs an SST operation. In an SST, the donor uses a backup solution, like MariaDB Enterprise Backup, to copy its data directory to the joiner. When the joiner completes the SST, it begins to process the write-sets that came in during the transfer. Once it's in sync with the cluster, it becomes operational.
ISTs provide the best performance for state transfers; the size of the GCache may need adjustment to facilitate their use.
Flow Control
MariaDB Enterprise Server uses Flow Control to throttle transactions when necessary and ensure that all nodes keep pace with the cluster.
Write-sets that replicate to a node are collected by the node in its received queue. The node then processes the write-sets according to global ordering. Large transactions, expensive operations, or simple hardware limitations can lead to the received queue backing up over time.
When a node's received queue grows beyond certain limits, the node initiates Flow Control. In Flow Control, the node pauses replication to work through the write-sets it already has. Once it has worked the received queue down to a certain size, it re-initiates replication.
Eviction
A node is removed, or evicted, from the cluster if it becomes non-responsive.
In MariaDB Enterprise Cluster, each node monitors network connectivity and response times from every other node. MariaDB Enterprise Cluster evaluates network performance using the EVS Protocol.
When a node finds another to have poor network connectivity, it adds an entry to the delayed list. If the node becomes active again and its network performance improves for a certain amount of time, its entries are removed from the delayed list. In other words, the longer a node has had network problems, the longer it must perform well before it is cleared from the delayed list.
If the number of entries for a node in the delayed list exceeds a threshold established for the cluster, the EVS Protocol evicts the node from the cluster.
Evicted nodes become non-operational components. They cannot rejoin the cluster until you restart MariaDB Enterprise Server.
Streaming Replication
Under normal operation, huge transactions and long-running transactions are difficult to replicate. MariaDB Enterprise Cluster rejects conflicting transactions and rolls back the changes. A transaction that takes several minutes or longer to run can encounter issues if a small transaction is run on another node and attempts to write to the same table. The large transaction fails because it encounters a conflict when it attempts to replicate.
MariaDB Enterprise Server 10.4 and later support streaming replication for MariaDB Enterprise Cluster. In streaming replication, huge transactions are broken into transactional fragments, which are replicated and applied as the operation runs. This makes it more difficult for intervening sessions to introduce conflicts.
Initiate Streaming Replication
To initiate streaming replication, set the wsrep_trx_fragment_unit and wsrep_trx_fragment_size system variables. You can set the unit to BYTES, ROWS, or STATEMENTS:
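For example, to replicate fragments of ten rows each (the values shown are illustrative):

```sql
SET SESSION wsrep_trx_fragment_unit='ROWS';
SET SESSION wsrep_trx_fragment_size=10;
```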
Then, run your transaction.
Streaming replication works best with very large transactions where you don't expect to encounter conflicts. If the statement does encounter a conflict, the rollback operation is much more expensive than usual. As such, it's best practice to enable streaming replication at a session-level and to disable it by setting the wsrep_trx_fragment_size system variable to 0 when it's not needed.
Galera Arbitrator
Deployments on mixed hardware can introduce issues where some MariaDB Enterprise Servers perform better than others. A Server in one part of the world might perform more reliably or be physically closer to most users than others. In cases where a particular MariaDB Enterprise Server holds logical significance for your cluster, you can weight its value in quorum calculations.
Galera Arbitrator is a separate process that runs alongside MariaDB Enterprise Server. While the Arbitrator does not take part in replication, whenever the cluster performs quorum calculations it gives the Arbitrator a vote as though it were another MariaDB Enterprise Server. In effect this means that the system has the vote of MariaDB Enterprise Server plus any running Arbitrators in determining whether it's part of the Primary Component.
Bear in mind that the Galera Arbitrator is a separate package, galera-arbitrator-4, which is not installed by default with MariaDB Enterprise Server.
Scale-out
MariaDB Enterprise Servers that join a cluster attempt to connect to the IP addresses provided to the wsrep_cluster_address system variable. This variable adjusts itself at runtime to include the addresses of all connected nodes.
To scale-out MariaDB Enterprise Cluster, start new MariaDB Enterprise Servers with the appropriate wsrep_cluster_address list and the same wsrep_cluster_name value. The new nodes establish network connectivity with the running cluster and request a state transfer to bring their local database into sync with the cluster.
Once the MariaDB Enterprise Server reports itself as being in sync with the cluster, MariaDB MaxScale can begin including it in the load distribution for the cluster.
Being a multi-primary replication solution means that any MariaDB Enterprise Server in the cluster can handle write operations, but write scale-out is minimal as every Server in the cluster needs to apply the changes.
Failover
MariaDB Enterprise Cluster does not provide failover capabilities on its own. MariaDB MaxScale is used to route client connections to MariaDB Enterprise Server.
Unlike a traditional load balancer, MaxScale is aware of changes in the node and cluster states.
MaxScale takes nodes out of the distribution that initiate a blocking SST operation or Flow Control or otherwise go down, which allows them to recover or catch up without stopping service to the rest of the cluster.
Backups
With MariaDB Enterprise Cluster, each node contains a replica of all the data in the cluster. As such, you can run a backup on any node to back up the available data. The process for backing up a node is the same as for a single MariaDB Enterprise Server.
Encryption
MariaDB Enterprise Server supports data-at-rest encryption to secure data on disk, and data-in-transit encryption to secure data on the network.
MariaDB Enterprise Server supports data-at-rest encryption of the GCache, the file used by Galera systems to cache write-sets. Encrypting the GCache ensures the Server encrypts both the data it temporarily caches from the cluster and the data it permanently stores in tablespaces.
For data-in-transit, MariaDB Enterprise Cluster supports the same encryption as MariaDB Server and additionally provides data-in-transit encryption for Galera replication traffic and for State Snapshot Transfer (SST) traffic.
Data-in-Transit Encryption
MariaDB Enterprise Server 10.6 encrypts Galera replication and SST traffic using the server's TLS configuration by default. With the wsrep_ssl_mode system variable, you can adjust which TLS configuration the node uses.
MariaDB Enterprise Server 10.5 and earlier support encrypting Galera replication and SST traffic through wsrep provider options.
TLS encryption is only available when used by all nodes in the cluster.
Enabling GCache Encryption
To encrypt data-at-rest such as GCache, stop the server, set encrypt_binlog=ON within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when used.
Disabling GCache Encryption
To stop using encryption on the GCache file, stop the server, set encrypt_binlog=OFF within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when used.
MariaDB Galera Cluster is a Linux-exclusive, multi-primary cluster designed for MariaDB, offering features such as active-active topology, read/write capabilities on any node, automatic membership and node joining, true parallel replication at the row level, and direct client connections, with an emphasis on the native MariaDB experience.
About
MariaDB Galera Cluster is a virtually synchronous multi-primary cluster for MariaDB. It is available on Linux only and supports only the InnoDB storage engine, although there is experimental support for some other storage engines; see the relevant system variables for details.
Features
Active-active multi-primary topology
Read and write to any cluster node
Benefits
The above features yield several benefits for a DBMS clustering solution, including:
No replica lag
No lost transactions
Read scalability
Smaller client latencies
The page has instructions on how to get up and running with MariaDB Galera Cluster.
A great resource for Galera users is the codership-team mailing list (codership-team 'at' googlegroups (dot) com). If you use Galera, it is recommended that you subscribe.
Galera Versions
MariaDB Galera Cluster is powered by:
MariaDB Server.
The Galera wsrep provider library.
The functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider package. The following provider version corresponds to each MariaDB Server version:
In MariaDB 10.4 and later, MariaDB Galera Cluster uses Galera 4. This means that the wsrep API version is 26 and the Galera wsrep provider library is version 4.X.
In MariaDB 10.3 and before, MariaDB Galera Cluster uses Galera 3. This means that the wsrep API is version 25 and the Galera wsrep provider library is version 3.X.
See the Galera versions documentation for more information about how to interpret these version numbers.
Galera 4 Versions
The following table lists each version of the Galera 4 wsrep provider, along with the MariaDB version in which each was first released. If you would like to install Galera 4 using yum, apt, or zypper, the package is called galera-4.
Galera Version
Released in MariaDB Version
Cluster Failure and Recovery Scenarios
While a Galera Cluster is designed for high availability, various scenarios can lead to node or cluster outages. This guide describes common failure situations and the procedures to safely recover from them.
Graceful Shutdown Scenarios
This covers situations where nodes are intentionally stopped for maintenance or configuration changes, based on a three-node cluster.
One Node is Gracefully Stopped
When one node is stopped, it sends a message to the other nodes, and the cluster size is reduced. Properties like Quorum calculation are automatically adjusted. As soon as the node is started again, it rejoins the cluster based on its wsrep_cluster_address variable.
If the write-set cache (gcache.size) on a donor node still has all the transactions that were missed, the node will rejoin using a fast Incremental State Transfer (IST). If not, it will automatically fall back to a full State Snapshot Transfer (SST).
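For example, to enlarge the write-set cache so that short outages can be served by IST rather than a full SST, you can set gcache.size in the option file. The 2G figure is illustrative; size it to your write volume and expected downtime:

```ini
[mariadb]
wsrep_provider_options = "gcache.size=2G"
```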
Two Nodes Are Gracefully Stopped
The single remaining node forms a Primary Component and can serve client requests. To bring the other nodes back, you simply start them.
However, the single running node must act as a Donor for the state transfer. During the SST, its performance may be degraded, and some load balancers may temporarily remove it from rotation. For this reason, it's best to avoid running with only one node.
All Three Nodes Are Gracefully Stopped
When the entire cluster is shut down, you must bootstrap it from the most advanced node to prevent data loss.
Identify the most advanced node: On each server, check the seqno value in the /var/lib/mysql/grastate.dat file. The node with the highest seqno was the last to commit a transaction.
Bootstrap from that node: Use the appropriate MariaDB script to start a new cluster from this node only.
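The check-and-bootstrap procedure can be sketched as follows. The grastate.dat contents below are an illustrative sample; on a real node the file lives at /var/lib/mysql/grastate.dat:

```shell
# Create a sample grastate.dat (illustrative contents only)
cat > /tmp/grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid:    5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:   8204503945773
safe_to_bootstrap: 0
EOF

# Extract the saved sequence number; the node with the highest seqno wins
awk '/^seqno:/ {print $2}' /tmp/grastate.dat

# On the most advanced node only, bootstrap the new cluster:
#   sudo galera_new_cluster
```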
Unexpected Node Failure (Crash) Scenarios
This covers situations where nodes become unavailable due to a power outage, hardware failure, or software crash.
One Node Disappears from the Cluster
If one node crashes, the two remaining nodes will detect the failure after a timeout period and remove the node from the cluster. Because they still have Quorum (2 out of 3), the cluster continues to operate without service disruption. When the failed node is restarted, it will rejoin automatically as described above.
Two Nodes Disappear from the Cluster
The single remaining node cannot form a Quorum by itself. It will switch to a non-Primary state and refuse to serve queries to protect data integrity. Any query attempt will result in an error:
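For example, a client query against a non-Primary node fails with an error similar to the following (exact wording varies by version):

```sql
SELECT * FROM test.t1;
-- ERROR 1047 (08S01): WSREP has not yet prepared node for application use
```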
Recovery:
If the other nodes come back online, the cluster will re-form automatically.
If the other nodes have permanently failed, you must manually force the remaining node to become a new Primary Component. Warning: Only do this if you are certain the other nodes are permanently down.
All Nodes Go Down Without a Proper Shutdown
In a datacenter power failure or a severe bug, all nodes may crash. The grastate.dat file will not be updated correctly and will show seqno: -1.
Recovery:
On each node, run mysqld with the --wsrep-recover option. This will read the database logs and report the node's last known transaction position (GTID).
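As a sketch, the recovery run looks like this; the printed position is illustrative:

```shell
# Report the last committed position; the result is written to the error log
sudo mysqld --wsrep-recover
# Look for a line such as:
#   [Note] WSREP: Recovered position: 5ee99582-bb8d-11e2-b8e3-23de375c1d30:8204503945773
```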
Compare the sequence numbers from the recovered position on all nodes.
Recovering from a Split-Brain Scenario
A split-brain occurs when a network partition splits the cluster, and no resulting group has a Quorum. This is most common with an even number of nodes. All nodes will become non-Primary.
Recovery:
Choose one of the partitioned groups to become the new Primary Component.
On one node within that chosen group, manually force it to bootstrap:
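Bootstrapping is done by setting the pc.bootstrap provider option on that node:

```sql
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=true';
```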
This group will now become operational. When network connectivity is restored, the nodes from the other partition will automatically detect this Primary Component and rejoin it.
Never execute the bootstrap command on both sides of a partition. This will create two independent, active clusters with diverging data, leading to severe data inconsistency.
See Also
The codership-team mailing list (codership-team 'at' googlegroups (dot) com) - a great mailing list for Galera users.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB Galera Cluster Usage Guide
Quickstart Guide: MariaDB Galera Cluster Usage
This guide provides essential information for effectively using and interacting with a running MariaDB Galera Cluster. It covers connection methods, operational considerations, monitoring, and best practices for applications.
1. Connecting to the Cluster
Since Galera Cluster is multi-primary, any node can accept read and write connections.
a. Using a Load Balancer (Recommended for Production):
Deploying a load balancer or proxy (like MariaDB MaxScale, ProxySQL, or HAProxy) is the recommended approach.
MariaDB MaxScale: Provides intelligent routing (e.g., the readwritesplit and readconnroute routers), connection pooling, and advanced cluster awareness (e.g., binlogrouter for replication clients, switchover for failover).
Other Load Balancers: Configure them to distribute connections across your Galera nodes, typically using health checks on port 3306 or other cluster-specific checks.
b. Direct Connection:
You can connect directly to any individual node's IP address or hostname using standard MariaDB client tools or connectors (e.g., mariadb command-line client, MariaDB Connector/J, Connector/Python).
Example (command-line):
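A minimal sketch of a direct connection; the host, port, and user are illustrative:

```shell
mariadb --host=192.0.2.11 --port=3306 --user=app_user -p
```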
While simple, this method lacks automatic failover; your application would need to handle connection retries and failover logic.
2. Basic Operations (Reads & Writes)
Active-Active: You can perform both read and write operations on any node in the cluster. All successful write operations are synchronously replicated to all other nodes.
Transactions: Standard SQL transactions (START TRANSACTION, COMMIT, ROLLBACK) work as expected. Galera handles the replication of committed transactions.
3. DDL (Data Definition Language) Operations
DDL operations (like CREATE TABLE, ALTER TABLE, DROP TABLE) require special attention in a synchronous multi-primary cluster to avoid conflicts and outages.
Total Order Isolation (TOI) - Default:
This is Galera's default DDL method.
The DDL statement is executed on all nodes in the same order, and it temporarily blocks other transactions on all nodes while it applies.
4. Monitoring Cluster Status
Regularly monitor your Galera Cluster to ensure its health and consistency.
wsrep_cluster_size: Number of nodes currently in the Primary Component.
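For example:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
```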
Expected value: the total number of nodes configured (e.g., 3).
wsrep_local_state_comment / wsrep_local_state: The state of the current node.
5. Handling Node Failures and Recovery
Galera Cluster is designed for automatic recovery, but understanding the process is key.
Node Failure: If a node fails, the remaining nodes continue to operate as the Primary Component. The failed node will automatically attempt to rejoin when it comes back online.
Split-Brain Scenarios: If the network partitions the cluster, nodes will try to form a "Primary Component." The partition with the majority of nodes forms the new Primary Component. If no majority can be formed (e.g., a 2-node cluster splits), the cluster will become inactive. A 3-node or higher cluster is recommended to avoid this.
Manual Bootstrapping (Last Resort): If the entire cluster goes down or a split-brain occurs where no Primary Component forms, you might need to manually "bootstrap" a new Primary Component from one of the healthy nodes.
6. Application Best Practices
Use Connection Pooling: Essential for managing connections efficiently in high-traffic applications.
Short Transactions: Keep transactions as short and concise as possible to minimize conflicts and improve throughput. Long-running transactions increase the risk of rollbacks due to certification failures.
Primary Keys: All tables should have a primary key. Galera relies on primary keys for efficient row-level replication. Tables without primary keys can cause performance degradation and issues.
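One way to locate offending tables is to query information_schema; this is a sketch, and you may want to adjust the excluded schemas for your environment:

```sql
-- List base tables that have no PRIMARY KEY constraint
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.constraint_type IS NULL;
```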
By following these guidelines, you can effectively manage and operate your MariaDB Galera Cluster for high availability and performance.
Performing Schema Upgrades in Galera Cluster
Performing schema changes (i.e., Data Definition Language or DDL statements like ALTER TABLE, CREATE INDEX) in a MariaDB Galera Cluster requires special handling. Because Galera is a multi-primary cluster where all nodes must remain in sync, a schema change on one node must be safely replicated to all other nodes without causing inconsistencies or blocking the entire cluster for an extended period.
MariaDB Galera Cluster provides two methods for handling schema upgrades:
Total Order Isolation (TOI): Default and safest method. The DDL statement is replicated to all nodes, blocking the entire cluster until all preceding transactions complete.
Rolling Schema Upgrade (RSU): Advanced, non-blocking method. The DDL is executed on the local node, with changes applied manually to each node in sequence, keeping the cluster online.
The method used is controlled by the wsrep_OSU_method system variable.
Total Order Isolation (TOI)
Total Order Isolation is the default method for schema upgrades (wsrep_OSU_method = 'TOI'). It ensures maximum data consistency by treating the DDL statement like any other replicated write, applied in the same total order on every node.
How TOI Works
When you execute a DDL statement, such as ALTER TABLE..., on any node in a cluster, the following process occurs:
Replication: The statement is replicated across all nodes in the cluster.
Transaction Wait: Each node waits for any pre-existing transactions to complete before proceeding.
Execution: Once caught up, the node executes the DDL statement.
Advantages of TOI
Simplicity and Safety: It is the easiest and safest method. It guarantees that the schema is identical on all nodes at all times.
Consistency: There is no risk of data drifting or replication errors due to schema mismatches.
Disadvantages of TOI
A major drawback of TOI is that DDL statements block the entire cluster, preventing any node from processing write transactions during a schema change. This can lead to significant application downtime, especially for large tables that take a long time to alter.
When to Use TOI
TOI is the recommended method for:
Schema changes that are known to be very fast.
Environments where a short period of cluster-wide write unavailability is acceptable.
Situations where schema consistency is the absolute highest priority.
Rolling Schema Upgrade (RSU)
Rolling Schema Upgrade is a non-blocking method (wsrep_OSU_method = 'RSU') that allows you to perform schema changes without taking the entire cluster offline.
How RSU Works
The RSU method tells the cluster to not replicate the DDL statement. The change is only applied to the local node where you execute the command. It is then the administrator's responsibility to apply the same change to the other nodes one by one.
Steps to Apply Schema Changes to a Cluster
Set the RSU Method:
On the first node, set the session to RSU mode:
SET SESSION wsrep_OSU_method = 'RSU';
Remove the Node from Rotation:
Remove the node from the load balancer rotation to stop it from receiving traffic.
Apply the Schema Change:
Execute the DDL statement (e.g., ALTER TABLE ...) on the isolated node.
Advantages of RSU
High Availability: The cluster remains online and available to serve traffic throughout the entire process, as you only ever take one node out of rotation at a time.
No Cluster-Wide Blocking: Application writes can continue on the active nodes.
Disadvantages of RSU
Complexity and Risk: The process is manual and more complex, which introduces a higher risk of human error.
Temporary Inconsistency: For the duration of the upgrade, your cluster will have a mixed schema, where some nodes have the old schema and others have the new one. This can cause replication errors or failed queries if a transaction that relies on the new schema is sent to a node that has not yet been upgraded.
When to Use RSU
RSU is the best method for:
Applying long-running schema changes to large tables where cluster downtime is not acceptable.
Environments where high availability is the top priority.
It requires careful planning and a good understanding of your application's queries to ensure that no replication errors occur during the upgrade process.
This page is licensed: CC BY-SA / Gnu FDL
Quorum Control with Weighted Votes
This page is a deep-dive into the advanced feature of weighted quorum. For a general overview of Quorum, its role in monitoring, and basic recovery, see Understanding Quorum, Monitoring, and Recovery.
MariaDB Galera Cluster supports a weighted quorum, where each node can be assigned a weight in the range of 0 to 255, with which it will participate in quorum calculations. This provides fine-grained control over which nodes are most critical for forming a Primary Component, especially in complex or geographically distributed topologies.
By default, every node has a weight of 1. You can customize a node's weight at runtime by setting the pc.weight provider option:
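For example, to give the current node a weight of 3:

```sql
SET GLOBAL wsrep_provider_options = 'pc.weight=3';
```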
Quorum Calculation with Weights
The quorum is preserved if, and only if, the sum of the weights of the nodes in a new component is strictly more than half the total weight of the preceding Primary Component (minus any nodes that left gracefully).
The formal calculation (with w_i the pc.weight of member i) is:

  sum(w_i : i in M_current) > 1/2 × ( sum(w_i : i in M_last) − sum(w_i : i in M_left) )

Where:
M_last: Members of the last seen Primary Component.
M_left: Members that are known to have left gracefully.
M_current: Members of the current component being evaluated.
Changing a node's weight is a cluster-wide membership event. If a network partition occurs at the exact moment a weight-change message is being delivered, it can lead to a corner case where the entire cluster becomes non-primary.
Practical Examples of Weighted Quorum
Prioritizing a Primary Node
In a three-node cluster, to make node1 the most critical for maintaining the Primary Component:
node1: pc.weight = 2
node2: pc.weight = 1
node3: pc.weight = 0
With these weights (total 3), if node2 and node3 fail, node1 still holds more than half the total weight and remains primary. If node1 fails, the remaining weight is not a majority, so the other two nodes become non-primary.
Simple Primary/Replica Failover
In a two-node cluster, to ensure node1 is always the primary in case of a network split:
node1 (Primary): pc.weight = 1
node2 (Replica): pc.weight = 0
Primary and Secondary Site Scenario
For a four-node cluster with two nodes at a primary site and two at a secondary site:
Primary Site:
node1: pc.weight = 2
node2: pc.weight = 2
Secondary Site:
node3: pc.weight = 1
node4: pc.weight = 1
If the secondary site, or the network link between the sites, fails, the primary site maintains quorum. Additionally, one node at the primary site can fail without causing an outage.
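The arithmetic behind this example can be checked directly against the quorum rule: the remaining weight must be strictly more than half of the previous total.

```shell
prev_total=6    # node1(2) + node2(2) + node3(1) + node4(1)
remaining=4     # primary site survives: node1(2) + node2(2)

# Quorum is kept when 2 * remaining > prev_total (strict majority by weight)
if [ $((2 * remaining)) -gt "$prev_total" ]; then
  echo "quorum kept"
else
  echo "quorum lost"
fi
```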
This page is licensed: CC BY-SA / Gnu FDL
Securing Communications in Galera Cluster
By default, Galera Cluster replicates data between each node without encrypting it. This is generally acceptable when the cluster nodes run on the same host or in networks where security is guaranteed through other means. However, when cluster nodes are on separate networks, or in a high-risk network, the lack of encryption introduces security concerns: a malicious actor could eavesdrop on the traffic, or obtain a complete copy of the data by triggering an SST.
To mitigate this concern, Galera Cluster allows you to encrypt data in transit as it is replicated between each cluster node using the Transport Layer Security (TLS) protocol. TLS was formerly known as Secure Sockets Layer (SSL), but, strictly speaking, SSL is a predecessor of TLS, and that version of the protocol is now considered insecure. The documentation still often uses the term SSL, and for compatibility reasons TLS-related server system and status variables still use the prefix ssl_, but internally MariaDB only supports its secure successors.
In order to secure connections between the cluster nodes, you need to ensure that all servers were compiled with TLS support; see the TLS documentation to determine how to check whether a server was compiled with TLS support.
For each cluster node, you also need a certificate, private key, and the Certificate Authority (CA) chain to verify the certificate. If you want to use self-signed certificates created with OpenSSL, see the certificate-creation documentation for information on how to create them.
Securing Galera Cluster Replication Traffic
In order to enable TLS for Galera Cluster's replication traffic, there are a number of wsrep_provider_options that you need to set:
Set the path to the server's certificate with the socket.ssl_cert wsrep_provider_option.
Set the path to the server's private key with the socket.ssl_key wsrep_provider_option.
Set the path to the certificate authority (CA) chain that can verify the server's certificate with the socket.ssl_ca wsrep_provider_option.
It is also a good idea to set MariaDB Server's regular TLS-related system variables, so that TLS is enabled for regular client connections as well.
For example, to set these variables for the server, add them to a relevant server option group in an option file:
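A sketch of such an option-file fragment, using the socket.ssl_* provider options; the certificate paths are illustrative assumptions:

```ini
[mariadb]
ssl_cert = /etc/my.cnf.d/certificates/server-cert.pem
ssl_key  = /etc/my.cnf.d/certificates/server-key.pem
ssl_ca   = /etc/my.cnf.d/certificates/ca.pem
wsrep_provider_options = "socket.ssl_cert=/etc/my.cnf.d/certificates/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certificates/server-key.pem;socket.ssl_ca=/etc/my.cnf.d/certificates/ca.pem"
```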
And then restart the server to make the changes persistent.
By setting both MariaDB Server's TLS-related system variables and Galera Cluster's TLS-related wsrep_provider_options, the server can secure both external client connections and Galera Cluster's replication traffic.
Securing State Snapshot Transfers
The method that you would use to enable TLS for State Snapshot Transfers (SSTs) depends on the value of the wsrep_sst_method variable.
mariadb-backup
See the mariadb-backup SST documentation for more information.
xtrabackup-v2
See the xtrabackup-v2 SST documentation on TLS for more information.
mysqldump
This SST method simply uses the mariadb-dump (previously mysqldump) utility, so TLS is enabled by configuring TLS for ordinary client connections.
rsync
This SST method supports encryption in transit via stunnel. See the rsync SST documentation for more information.
This page is licensed: CC BY-SA / Gnu FDL
Upgrading from MariaDB 10.5 to MariaDB 10.6 with Galera Cluster
Galera Cluster ships with MariaDB Server. Upgrading a Galera Cluster node is very similar to upgrading a server from MariaDB 10.5 to MariaDB 10.6. For more information on that process, as well as on incompatibilities between versions, see the upgrade guide.
Performing a Rolling Upgrade
The following steps can be used to perform a rolling upgrade from MariaDB 10.5 to MariaDB 10.6 when using Galera Cluster. In a rolling upgrade, each node is upgraded individually, so the cluster is always operational. There is no downtime from the application's perspective.
First, before you get started:
First, take a look at the release notes and changelog to see what has changed between the major versions.
Check whether any system variables or options have been changed or removed. Make sure that your server's configuration is compatible with the new MariaDB version before upgrading.
Check whether replication has changed in the new MariaDB version in any way that could cause issues while the cluster contains upgraded and non-upgraded nodes.
Before you upgrade, it would be best to take a backup of your database. This is always a good idea to do before an upgrade. We would recommend mariadb-backup.
Then, for each node, perform the following steps:
1
Modify the repository configuration, so the system's package manager installs the new MariaDB version. The exact steps depend on whether you use apt, yum/dnf, or zypper repositories; see your distribution's repository documentation for more information.
When this process is done for one node, move onto the next node.
When upgrading the Galera wsrep provider, sometimes the Galera protocol version can change. The Galera wsrep provider should not start using the new protocol version until all cluster nodes have been upgraded to the new version, so this is not generally an issue during a rolling upgrade. However, this can cause issues if you restart a non-upgraded node in a cluster where the rest of the nodes have been upgraded.
This page is licensed: CC BY-SA / Gnu FDL
Installing Galera from Source
There are binary installation packages available for RPM and Debian-based distributions, which will pull in all required Galera dependencies.
If these are not available, you will need to build Galera from source.
The wsrep API for Galera Cluster is included in MariaDB Server by default, so you can follow the usual instructions for building MariaDB from source.
Preparation
make cannot manage dependencies for the build process, so the following packages need to be installed first:
Upgrading from MariaDB 10.4 to MariaDB 10.5 with Galera Cluster
Galera Cluster ships with MariaDB Server. Upgrading a Galera Cluster node is very similar to upgrading a server from MariaDB 10.4 to MariaDB 10.5. For more information on that process, as well as on incompatibilities between versions, see the upgrade guide.
Performing a Rolling Upgrade
The following steps can be used to perform a rolling upgrade from MariaDB 10.4 to MariaDB 10.5 when using Galera Cluster. In a rolling upgrade, each node is upgraded individually, so the cluster is always operational. There is no downtime from the application's perspective.
Recovering a Primary Component
In a MariaDB Galera Cluster, an individual node is considered to have "failed" when it loses communication with the cluster's Primary Component. This can happen for many reasons, including hardware failure, a software crash, loss of network connectivity, or a critical error during a state transfer.
From the perspective of the cluster, a node has failed when the other members can no longer see it. From the perspective of the failed node itself (assuming it hasn't crashed), it has simply lost its connection to the Primary Component and will enter a non-operational state to protect data integrity.
The EVS Protocol
Node failure detection is handled automatically by Galera's EVS (Extended Virtual Synchrony) protocol.
Understanding Quorum, Monitoring, and Recovery
Quorum is essential for maintaining data consistency in a MariaDB Galera Cluster by safeguarding against network partitions and split-brain scenarios. It ensures that the cluster processes database queries and transactions only when a majority of nodes are operational, healthy, and in communication.
Primary Component
This majority group is known as the Primary Component. Nodes not in this group switch to a non-primary state, halting queries and entering a read-only "safe mode" to prevent data discrepancies. The primary function of Quorum is to avoid "split-brain" scenarios, which occur when network partitions lead to parts of the cluster operating independently and accepting writes. By ensuring only the partition with a majority of nodes becomes the Primary Component, Quorum effectively prevents these inconsistencies.
Using MariaDB Replication with MariaDB Galera Cluster
MariaDB replication and MariaDB Galera Cluster can be used together. However, there are some things that have to be taken into account.
Tutorials
If you want to use MariaDB replication and MariaDB Galera Cluster together, then the following tutorials may be useful:
Manual SST of Galera Cluster Node With mariadb-backup
Perform a manual node provision. This guide details the steps to manually backup a donor and restore it to a joiner node in a Galera Cluster.
Sometimes it can be helpful to perform a "manual SST" when Galera's normal SST methods fail. This can be especially useful when the cluster's dataset is very large, since a normal SST can take a long time to fail in that case.
A manual SST essentially consists of taking a backup of the donor, loading the backup on the joiner, and then manually editing the cluster state on the joiner node. This page shows how to perform this process with mariadb-backup.
Process
1
Resetting the Quorum (Cluster Bootstrap)
This page provides a step-by-step guide for an emergency recovery procedure. For a general overview of what Quorum is and how to monitor it, see
When a network failure or a crash affects over half of your cluster nodes, the cluster might lose its Primary Component. In such cases, the remaining nodes may return an Unknown command error for many queries. This behavior is a safeguard to prevent data inconsistency.
You can confirm this by checking the wsrep_cluster_status on all nodes:
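For example:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
-- A node inside the Primary Component reports 'Primary';
-- a partitioned node reports 'non-Primary'.
```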
Introduction to Galera Architecture
MariaDB Galera Cluster provides a synchronous replication system that uses an approach often called eager replication. In this model, nodes in a cluster synchronize with all other nodes by applying replicated updates as a single transaction. This means that when a transaction COMMITs, all nodes in the cluster have the same value. This process is accomplished using certification-based replication through a group communication framework.
Core Architectural Components
The internal architecture of MariaDB Galera Cluster revolves around four primary components:
Using MariaDB GTIDs with MariaDB Galera Cluster
MariaDB's global transaction IDs (GTIDs) are very useful when used with MariaDB replication, which is primarily what that feature was developed for. Galera Cluster, on the other hand, was developed by Codership for all MySQL and MariaDB variants, and the initial development of the technology pre-dated MariaDB's GTID implementation. As a side effect, Galera Cluster only partially supports MariaDB's GTID implementation, at least in older versions.
GTID Support for Write Sets Replicated by Galera Cluster
Galera Cluster has its own transaction ordering scheme that is substantially different from MariaDB's GTIDs. However, it would still be beneficial if MariaDB were able to associate a Galera Cluster write set with a GTID.
Known Limitations
This article contains information on known problems and limitations of MariaDB Galera Cluster.
Limitations from codership.com:
Currently, replication works only with the InnoDB storage engine. Any writes to tables of other types, including system (mysql.*) tables, are not replicated (this limitation excludes DDL statements such as CREATE USER, which implicitly modify system tables and are replicated at the statement level).
[mariadb]
...
# warn 3 days before certificate expiration
wsrep_certificate_expiration_hours_warning=72
SET GLOBAL wsrep_certificate_expiration_hours_warning=72;
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
Resume Processing: After execution, the node can process new transactions.
Return the Node to Rotation: Once the ALTER statement is complete, add the node back to the load balancer.
Repeat for All Nodes: Repeat steps 1-4 for each node in the cluster, one at a time, until all nodes have the updated schema.
It ensures consistency but can cause brief pauses in application activity, especially on busy clusters.
Best Practice: Execute DDL during maintenance windows or low-traffic periods.
Rolling Schema Upgrade (RSU) / Percona's pt-online-schema-change:
For large tables or critical production systems, use tools like pt-online-schema-change (from Percona Toolkit) which performs DDL without blocking writes.
This tool works by creating a new table, copying data, applying changes, and then swapping the tables. It's generally preferred for minimizing downtime for ALTER TABLE operations.
wsrep_OSU_method:
This system variable controls how DDL operations are executed.
TOI (default): Total Order Isolation.
RSU: Rolling Schema Upgrade (requires applying the DDL manually on each node in turn; often combined with tools like pt-online-schema-change).
NBO (Non-Blocking Operations): A newer method allowing non-blocking DDL for some operations, but not fully implemented for all DDL types. Use with caution and test thoroughly.
Common wsrep_local_state / wsrep_local_state_comment values:
Joining (1): Node is in the process of joining the cluster.
Donor/Desynced (2): Node is transferring state to another node.
Joined (3): Node has joined the cluster and is catching up.
Synced (4): Node is fully synchronized and operational.
wsrep_incoming_addresses: Comma-separated list of the incoming (client) addresses of the current cluster members.
wsrep_cert_deps_distance: Average distance between the highest and lowest sequence numbers that can be applied in parallel; an indicator of the potential degree of parallelization in the workload.
wsrep_flow_control_paused: Percentage of time the node was paused due to flow control. High values indicate a bottleneck.
wsrep_local_recv_queue / wsrep_local_send_queue: Size of the receive/send queue. Ideally, these should be close to 0. Sustained high values indicate replication lag or node issues.
Choose the node that was most up-to-date.
Stop MariaDB on that node.
Start it with sudo galera_new_cluster, which starts the server with the --wsrep-new-cluster option.
Start other nodes normally; they will rejoin the bootstrapped component.
Retry Logic: Implement retry logic in your application for failed transactions (e.g., due to certification failures, deadlock, or temporary network issues).
Connect to a Load Balancer: Always direct your application's connections through a load balancer or proxy to leverage automatic failover and intelligent routing.
The Replicas receive the stream of changes and apply them to their own copy of the data. Replicas are typically used for read-only queries, backups, or as a hot standby for failover.
Transaction Reordering: This clever technique allows for the reordering of non-conflicting transactions before they are committed, which significantly increases parallelism and reduces the rate of transaction rollbacks.
As a member of the Primary Component, the server becomes operational, able to accept read and write queries from clients.
During startup, the Primary Component is the server bootstrapped to run as the Primary Component. Once the cluster is online, the Primary Component is any group of servers that includes more than half the total number of servers.
A Server or group of Servers that loses network connectivity with the majority of the cluster becomes non-operational.
In a cluster that spans multiple data centers, use an odd number of data centers, with a minimum of three.
Each data center in use should have at least one Server dedicated to backup operations. This can be another cluster node or a separate Replica Server kept in sync using MariaDB Replication.
Check whether any new features have been added to the new MariaDB version. If a new feature in the new MariaDB version cannot be replicated to the old MariaDB version, then do not use that feature until all cluster nodes have been upgraded to the new MariaDB version.
Next, make sure that the Galera version numbers are compatible.
If you are upgrading from the most recent release of the previous major series, then the Galera versions will be compatible.
You want to have a large enough gcache to avoid a State Snapshot Transfer (SST) during the rolling upgrade. The gcache size can be configured by setting gcache.size, for example: wsrep_provider_options="gcache.size=2G"
2
If you use a load balancing proxy such as MaxScale or HAProxy, make sure to drain the server from the pool so it does not receive any new connections.
3
Stop MariaDB.
4
Uninstall the old version of MariaDB and the Galera wsrep provider.
sudo apt-get remove mariadb-server galera
sudo yum remove MariaDB-server galera
sudo zypper remove MariaDB-server galera
5
Install the new version of MariaDB and the Galera wsrep provider.
Install the packages using your distribution's package manager (apt, yum/dnf, or zypper); see the corresponding installation documentation for more information.
6
Make any desired changes to configuration options in option files, such as my.cnf. This includes removing any system variables or options that are no longer supported.
7
On Linux distributions that use systemd, you may need to increase the service startup timeout, as the default timeout of 90 seconds may not be sufficient. See the systemd documentation for more information.
8
Start MariaDB.
9
Run mysql_upgrade with the --skip-write-binlog option.
mysql_upgrade does two things:
Ensures that the system tables in the mysql database are fully compatible with the new version.
Does a very quick check of all tables and marks them as compatible with the new version of MariaDB.
RPM-based:
Debian-based:
If you are running on an alternative system, or the packages above are not available, the following packages are required. You will need to check the repositories for the correct package names on your distribution; these may differ between distributions or require additional packages:
MariaDB Database Server with wsrep API
Git, CMake (on Fedora, both cmake and cmake-fedora are required), GCC and GCC-C++, Automake, Autoconf, and Bison, as well as development releases of libaio and ncurses.
Building
You can use Git to download the source code, as MariaDB source code is available through GitHub.
1. Clone the repository:
Check out the branch (e.g., 10.5-galera or 11.1-galera), for example:
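For example (the repository URL is the official MariaDB server mirror on GitHub; pick the branch matching your target version):

```
git clone https://github.com/MariaDB/server.git mariadb-server
cd mariadb-server
git checkout 11.1-galera
```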
Building the Database Server
The standard and Galera Cluster database servers are the same, except that for Galera Cluster, the wsrep API patch is included. Enable the patch with the CMake configuration options WITH_WSREP and WITH_INNODB_DISALLOW_WRITES. To build the database server, run the following commands:
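A minimal sketch of the build (the two flags named above; a production build will usually want additional CMake options):

```
cmake -DWITH_WSREP=ON -DWITH_INNODB_DISALLOW_WRITES=ON .
make
sudo make install
```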
There are also some build scripts in the *BUILD/* directory, which may be more convenient to use. For example, the following pre-configures the build options discussed above:
There are several others as well, so you can select the most convenient.
Besides the server with Galera support, you will also need a Galera provider.
Preparation
make cannot manage dependencies itself, so the following packages need to be installed first:
If you are running on a different distribution, or these commands are not available, the following packages are required. You will need to check the repositories for the correct package names on your distribution, as these may differ between distributions or require additional packages:
Galera Replication Plugin
SCons, as well as development releases of Boost (libboost_program_options, libboost_headers1), Check and OpenSSL.
Building
Run:
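The command referenced here clones the Galera provider source. A sketch using the upstream Codership repository:

```
git clone https://github.com/codership/galera.git
```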
After this, the source files for the Galera provider will be in the galera directory.
Building the Galera Provider
The Galera Replication Plugin both implements the wsrep API and operates as the database server's wsrep Provider. To build, cd into the galera/ directory and do:
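Galera builds with SCons. A sketch (the submodule step is needed when the wsrep headers are tracked as a git submodule in your checkout):

```
cd galera/
git submodule init
git submodule update
scons
```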
The path to libgalera_smm.so needs to be defined in the my.cnf configuration file.
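For example, in my.cnf (the provider path below is a common location, but it varies by distribution and install prefix):

```ini
[mysqld]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
```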
Building Galera Replication Plugin from source on FreeBSD runs into issues due to Linux dependencies. To overcome these, either install the binary package: pkg install galera, or use the ports build available at /usr/ports/databases/galera.
Configuration
After building, a number of other steps are necessary:
Create the database server user and group:
Install the database (the path may be different if you specified CMAKE_INSTALL_PREFIX):
If you want to install the database in a location other than /usr/local/mysql/data, use the --basedir or --datadir options.
Change the user and group permissions for the base directory.
Create a system unit for the database server.
Galera Cluster can now be started using the service command and is set to start at boot.
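The steps above can be sketched as follows (paths assume the default CMAKE_INSTALL_PREFIX of /usr/local/mysql; the systemd unit path is an assumption, so check support-files/ in your source tree):

```
sudo groupadd mysql
sudo useradd -g mysql mysql
cd /usr/local/mysql
sudo ./scripts/mysql_install_db --user=mysql
sudo chown -R mysql:mysql /usr/local/mysql
# Install the systemd unit shipped with the source (path is an assumption):
sudo cp support-files/systemd/mariadb.service /etc/systemd/system/
sudo systemctl enable mariadb
```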
This page is licensed: CC BY-SA / Gnu FDL
First, before you get started:
First, take a look at the release notes to see what has changed between the major versions.
Check whether any system variables or options have been changed or removed. Make sure that your server's configuration is compatible with the new MariaDB version before upgrading.
Check whether replication has changed in the new MariaDB version in any way that could cause issues while the cluster contains upgraded and non-upgraded nodes.
Check whether any new features have been added to the new MariaDB version. If a new feature in the new MariaDB version cannot be replicated to the old MariaDB version, then do not use that feature until all cluster nodes have been upgraded to the new MariaDB version.
Next, make sure that the Galera version numbers are compatible.
If you are upgrading from the most recent release to , then the versions will be compatible.
See the documentation on Galera versions for information about which MariaDB releases use which Galera wsrep provider versions.
You want to have a large enough gcache to avoid a State Snapshot Transfer (SST) during the rolling upgrade. The gcache size can be configured by setting gcache.size. For example: wsrep_provider_options="gcache.size=2G"
Before you upgrade, it is best to take a backup of your database. This is always a good idea to do before an upgrade.
Then, for each node, perform the following steps:
1
Modify the repository configuration, so the system's package manager installs the new version of MariaDB.
See the installation instructions for your distribution for more information.
2
If you use a load balancing proxy, such as HAProxy, make sure to drain the server from the pool so it does not receive any new connections.
3
Stop MariaDB.
4
Uninstall the old version of MariaDB and the Galera wsrep provider.
5
Install the new version of MariaDB and the Galera wsrep provider.
See the installation instructions for your distribution for more information.
6
Make any desired changes to configuration options in your option files, such as my.cnf. This includes removing any system variables or options that are no longer supported.
7
On Linux distributions that use systemd, you may need to increase the service startup timeout, as the default timeout of 90 seconds may not be sufficient.
8
Start MariaDB.
9
Run mysql_upgrade with the --skip-write-binlog option.
mysql_upgrade does two things:
Ensures that the system tables in the mysql database are fully compatible with the new version.
When this process is done for one node, move onto the next node.
When upgrading the Galera wsrep provider, sometimes the Galera protocol version can change. The Galera wsrep provider should not start using the new protocol version until all cluster nodes have been upgraded to the new version, so this is not generally an issue during a rolling upgrade. However, this can cause issues if you restart a non-upgraded node in a cluster where the rest of the nodes have been upgraded.
If a node hasn't sent a packet within the evs.keepalive_period, other nodes begin sending heartbeat beacons to it.
If the node remains silent for the duration of evs.suspect_timeout, the other nodes will mark it as "suspect."
Once all members of the Primary Component agree that a node is suspect, it is declared inactive.
Additionally, if no messages are received from a node for a period greater than evs.inactive_timeout, it is declared failed immediately, regardless of consensus.
Cluster Fault Tolerance
A safeguard mechanism ensures the cluster remains operational even if some nodes become unresponsive. If a node is active but overwhelmed—perhaps from excessive memory swapping—it will be labeled as failed. This process ensures that one struggling node doesn't disrupt the entire cluster's functionality.
The Availability vs. Partition Tolerance Trade-off
Within the context of the CAP Theorem (Consistency, Availability, Partition Tolerance), Galera Cluster strongly prioritizes Consistency. This leads to a direct trade-off when configuring the failure detection timeouts, especially on unstable networks like a Wide Area Network (WAN).
Low Timeouts
Setting low values for evs.suspect_timeout allows the cluster to detect a genuinely failed node very quickly, minimizing downtime. However, on an unstable network, this can lead to "false positives," where a temporarily slow node is incorrectly evicted.
High Timeouts
Setting higher values makes the cluster more tolerant of network partitions and slow nodes. However, if a node truly fails, the cluster will remain unavailable for a longer period while it waits for the timeout to expire.
Recovering a Single Failed Node
Recovery from a single node failure is typically automatic. If one node in a cluster with three or more members fails, the rest of the cluster maintains Quorum and continues to operate. When the failed node comes back online, it will automatically connect to the cluster and initiate a State Transfer to synchronize its data. No data is lost in a single node failure.
Recovering the Primary Component After a Full Cluster Outage
A full cluster outage occurs when all nodes shut down or when Quorum is lost completely, leaving no Primary Component. In this scenario, you must manually intervene to safely restart the cluster.
Manual Bootstrap (Using grastate.dat)
This is the traditional recovery method. You must manually identify the node with the most recent data and force it to become the first node in a new cluster.
Stop all nodes in the cluster.
Identify the most advanced node by checking the seqno value in the grastate.dat file in each node's data directory. The node with the highest seqno is the correct one to start from.
Bootstrap the new Primary Component by starting the MariaDB service on that single advanced node using a special command (e.g., galera_new_cluster).
Start the other nodes normally. They will connect to the new Primary Component and sync their data.
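The seqno comparison in step 2 can be sketched in shell. This assumes the grastate.dat files were gathered locally under hypothetical names (node1.dat, node2.dat, ...), one copy per node:

```shell
# Print the file (i.e., the node) whose "seqno:" field is highest.
# That node is the one to bootstrap from.
highest_seqno_node() {
  for f in "$@"; do
    # emit "seqno filename" pairs so the list can be sorted numerically
    printf '%s %s\n' "$(awk '/^seqno:/ {print $2}' "$f")" "$f"
  done | sort -n | tail -n 1 | cut -d' ' -f2
}
```

A node that shut down uncleanly may show seqno: -1; the numeric sort naturally ranks it below any node with a recorded position.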
Automatic Recovery with pc.recovery
Modern versions of Galera Cluster enable the pc.recovery parameter by default. This feature attempts to automate the recovery of the Primary Component.
When pc.recovery is enabled, nodes that were part of the last known Primary Component will save the state of that component to a file on disk called gvwstate.dat. If the entire cluster goes down, it can automatically recover its state once all the nodes from that last saved component achieve connectivity with each other.
Understanding the gvwstate.dat file
The gvwstate.dat file is created in the data directory of a node when it is part of a Primary Component and is deleted upon graceful shutdown. It contains the node's own UUID and its view of the other members of the component. An example:
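An illustrative gvwstate.dat (the UUIDs are made up for the example; the #vwbeg and #vwend markers delimit the saved view):

```
my_uuid: c5d5d990-30ee-11e4-aab1-46d0ed84b408
#vwbeg
view_id: 3 bc85bd53-31ac-11e4-9895-1f2ce13f2542 2
bootstrap: 0
member: bc85bd53-31ac-11e4-9895-1f2ce13f2542 0
member: c5d5d990-30ee-11e4-aab1-46d0ed84b408 0
#vwend
```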
my_uuid: The UUID of the node that owns this file.
view_id: An identifier for the specific cluster view.
member: The UUIDs of all nodes that were part of this saved Primary Component.
Advanced Procedure: Modifying the Saved State
Avoid manually editing the gvwstate.dat file unless absolutely necessary. Doing so may cause data inconsistency or prevent the cluster from starting. This action should only be considered in critical recovery situations.
In the rare case that you need to force a specific set of nodes to form a new Primary Component, you can manually edit the gvwstate.dat file on each of those nodes. By ensuring that each node's file lists itself and all other desired members in the member fields, you can force them to recognize each other and form a new component when you start them.
Failures During State Transfers
A node failure can also occur if a State Snapshot Transfer (SST) is interrupted. This will cause the receiving node (the "joiner") to abort its startup process. To recover, simply restart the MariaDB service on the failed joiner node.
Quorum is achieved when more than 50% of the total nodes in the last known membership are in communication.
Odd Number of Nodes (Recommended): In a 3-node cluster, a majority is 2. The cluster can tolerate the failure of 1 node and remain operational.
Even Number of Nodes: In a 2-node cluster, a majority is also 2. If one node fails, the remaining node represents only 50% of the cluster, which is not a majority, and it will lose Quorum. This is why a 2-node cluster has no fault tolerance without an external voting member.
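The majority rule above reduces to a one-line check: a component keeps Quorum only when it holds strictly more than half of the last known membership. A sketch:

```shell
# has_quorum ALIVE TOTAL
# succeeds (exit 0) when ALIVE nodes out of TOTAL form a strict majority
has_quorum() {
  alive=$1
  total=$2
  # 2*alive > total  <=>  alive > total/2, without integer-division pitfalls
  [ $(( 2 * alive )) -gt "$total" ]
}
```

For example, has_quorum 2 3 succeeds (a 3-node cluster survives one failure), while has_quorum 1 2 fails, which is exactly why a 2-node cluster has no fault tolerance.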
The Galera Arbitrator (garbd)
The Galera Arbitrator (garbd) is the standard solution for clusters with an even number of nodes. It is a lightweight, stateless daemon that acts as a voting member in the cluster without being a full database node. It participates in Quorum calculations, effectively turning an even-numbered cluster into an odd-numbered one. In a 2-node cluster, adding garbd makes the total number of voting members 3, allowing the cluster to maintain Quorum if one database node fails.
Understanding and Recovering from a Split-Brain
A split-brain occurs when a network partition divides the cluster and no resulting group of nodes has a majority (e.g., a 4-node cluster splitting into two groups of 2). By design, both halves of the cluster will fail to achieve a majority, and all nodes will enter a non-Primary state.
If you need to restore service before the network issue is fixed, you must manually intervene:
Choose ONE side of the partition to become the new Primary Component.
On a single node within that chosen group, execute the following command:
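The command in question is Galera's pc.bootstrap provider option, executed from a database client on the chosen node:

```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
```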
The nodes in this group will now form a new Primary Component. When network connectivity is restored, the nodes from the other partition will automatically rejoin.
Never execute the bootstrap command on both sides of a partition, as this will create two independent, active clusters with diverging data.
Advanced Quorum Control
As a more advanced alternative to garbd for fine-grained control, nodes can also be assigned a specific voting weight.
You can check the health of the cluster and its Quorum status at any time by querying the following status variables.
wsrep_cluster_status: Status of the component the node belongs to. Healthy value: Primary.
wsrep_cluster_size: Number of nodes in the current component. Healthy value: matches the expected total.
wsrep_cluster_state_uuid: Unique identifier for the cluster's state. Healthy value: the same on all nodes.
wsrep_cluster_conf_id: Identifier for the cluster membership group. Healthy value: the same on all nodes.
Recovering from a Full Cluster Shutdown
If the entire cluster loses Quorum (e.g., from a simultaneous crash or shutdown), you must manually bootstrap a new Primary Component to restore service. This must be done from the node that contains the most recent data to avoid any data loss.
Identifying the Most Advanced Node
MariaDB Galera Cluster provides a safe_to_bootstrap flag in the /var/lib/mysql/grastate.dat file to make this process safer and easier.
After a Graceful Shutdown
The last node to shut down will be the most up-to-date and will have safe_to_bootstrap: 1 set in its grastate.dat file. You should always look for and bootstrap from this node.
After a Cluster-wide Crash
If all nodes crashed, they will all likely have safe_to_bootstrap: 0. In this case, you must manually determine the most advanced node by finding the one with the highest seqno in its grastate.dat file or by using the --wsrep-recover utility.
Bootstrapping and Restarting
Once you have identified the correct node, you will start the MariaDB service on that node only using a special bootstrap command (e.g., galera_new_cluster). After it comes online and forms a new Primary Component, you can start the other nodes normally, and they will rejoin the cluster.
If the node is a replication master, then its replication slaves only replicate transactions that are in the binary log, so this means that the transactions that correspond to Galera Cluster write-sets would not be replicated by any replication slaves by default. If you would like a node to write its replicated write sets to the binary log, then you will have to set log_slave_updates=ON. If the node has any replication slaves, then this would also allow those slaves to replicate the transactions that corresponded to those write sets.
If a Galera Cluster node is also a replication master or replication slave, then some additional configuration may be needed.
If the node is a replication slave, then the node's slave SQL thread will be applying transactions that it replicates from its replication master. Transactions applied by the slave SQL thread will only generate Galera Cluster write-sets if the node has log_slave_updates=ON set. Therefore, in order to replicate these transactions to the rest of the nodes in the cluster, log_slave_updates=ON must be set.
If the node is a replication slave, then it is probably also a good idea to enable wsrep_restart_slave. When this is enabled, the node will restart its slave threads whenever it rejoins the cluster.
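A sketch of the option-file settings described above for a cluster node that is also a replication slave:

```ini
[mysqld]
log_slave_updates=ON
wsrep_restart_slave=ON
```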
Parallel Replication Support
Historically, Galera Cluster nodes acting as asynchronous replication slaves were restricted to single-threaded execution (slave_parallel_threads=0). Enabling parallel replication often resulted in deadlocks due to conflicts between the replication stream's commit ordering and Galera's internal pre-commit ordering.
As of MariaDB 12.1.1, this limitation has been resolved.
This fix is specific to MariaDB 12.1.1 and newer versions. It has not been backported to earlier release series such as 10.5, 10.6, 10.11, or 11.4.
On supported versions, you can safely configure slave_parallel_threads to a value greater than 0 to improve the performance of incoming replication streams.
It is most common to set server_id to the same value on each node in a given cluster. Since MariaDB Galera Cluster uses a virtually synchronous certification-based replication, all nodes should have the same data, so in a logical sense, a cluster can be considered in many cases a single logical server for purposes related to replication. The binary log of each cluster node might even contain roughly the same transactions, if log_slave_updates=ON is set, if wsrep GTID mode is enabled, and if non-Galera transactions are not being executed on any nodes.
Setting a Different server_id on Each Cluster Node
There are cases when it might make sense to set a different server_id value on each node in a given cluster. For example, if log_slave_updates=ON is set and if another cluster or a standard MariaDB Server is using replication to replicate transactions from each cluster node individually, then it would be required to set a different server_id value on each node for this to work.
Keep in mind that if replication is set up in a scenario where each cluster node has a different server_id value, and if the replication topology is set up in such a way that a cluster node can replicate the same transactions through Galera and through MariaDB replication, then you may need to configure the cluster node to ignore these transactions when setting up MariaDB replication. You can do so by setting IGNORE_SERVER_IDS to the server IDs of all nodes in the same cluster when executing CHANGE MASTER TO. For example, this might be required when circular replication is set up between two separate clusters, and each cluster node has a different server_id value, and each cluster has log_slave_updates=ON set.
Ensure the donor and joiner nodes have the same mariadb-backup version.
2
Create backup directory on donor
3
Take backup
Take a full backup of the donor node with mariadb-backup. The --galera-info option should also be provided, so that the node's cluster state is also backed up.
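For example (the target directory and credentials below are placeholders; substitute your own):

```
mariadb-backup --backup --galera-info \
  --target-dir=/var/mariadb/backup/ \
  --user=backup_user --password=backup_password
```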
4
Verify the MariaDB Server process is stopped
Verify that the MariaDB Server process is stopped on the joiner node. This will depend on your service manager.
For example, on systemd systems, you can execute:
5
Create the backup directory on the joiner node.
6
Copy backup
Copy the backup from the donor node to the joiner node.
7
Prepare backup
Prepare the backup with mariadb-backup --prepare on the joiner node.
8
Get the ID
Get the Galera Cluster version ID from the donor node's grastate.dat file.
For example, a very common version number is "2.1".
1
Get the node's cluster state
Get the state from the Galera info file in the backup that was copied to the joiner node.
The name of this file depends on the MariaDB version:
MariaDB 11.4 and later: mariadb_backup_galera_info
MariaDB 11.3 and earlier: xtrabackup_galera_info
For MariaDB 11.4 and later:
For MariaDB 11.3 and earlier:
The file contains the values of the wsrep_local_state_uuid and wsrep_last_committed status variables. The values are written in the format uuid:seqno.
2
Create a grastate.dat file
Create the file in the backup directory of the joiner node. The Galera Cluster version ID, the cluster uuid, and the seqno from previous steps will be used to fill in the relevant fields.
For example, with the example values from the last two steps, we could do:
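A sketch of creating the file (the uuid and seqno below are hypothetical illustration values; substitute the ones read from the backup's Galera info file, and version 2.1 matches the common Galera Cluster version ID mentioned above):

```shell
# Write a minimal grastate.dat for the joiner.
cat > grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid: d38587ce-246c-11e5-bcb8-6396c9910dcd
seqno: 1352215
safe_to_bootstrap: 0
EOF
```

In the real procedure, write this file into the backup directory on the joiner node (for example /var/mariadb/backup/grastate.dat) before copying the backup into the datadir.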
3
Remove contents
Remove the existing contents of the datadir on the joiner node.
4
Copy contents
Copy the contents of the backup directory to the datadir on the joiner node.
5
Check datadir permissions
Make sure the permissions of the datadir are correct on the joiner node.
6
Start the MariaDB Server process on the joiner node.
This will depend on your service manager. For example, on systemd systems, you may execute:
7
Watch the MariaDB error log
On the joiner node, verify that the node does not need to perform a full State Snapshot Transfer (SST) due to the manual SST.
If none of your nodes return a value of Primary, you must manually intervene to reset the Quorum and bootstrap a new Primary Component.
Find the Most Advanced Node
Before you can reset the Quorum, you must identify the most advanced node in the cluster. This is the node whose local database committed the last transaction. Starting the cluster from any other node can result in data loss.
The "Safe-to-Bootstrap" Feature
To facilitate a safe restart and prevent an administrator from choosing the wrong node, modern versions of Galera Cluster include a "Safe-to-Bootstrap" feature.
When a cluster is shut down gracefully, the last node to be stopped will be the most up-to-date. Galera tracks this and marks only that last node as safe to bootstrap from by setting a flag in its state file. If you attempt to bootstrap from a node marked as unsafe, Galera will refuse and show a message in the logs. In the case of a sudden, simultaneous crash, all nodes will be considered unsafe, requiring manual intervention.
Procedure for Selecting the Right Node
The procedure to select the right node depends on how the cluster was stopped.
Orderly Cluster Shutdown
In the case of a planned, orderly shutdown, you only need to follow the recommendation of the "Safe-to-Bootstrap" feature. On each node, inspect the /var/lib/mysql/grastate.dat file and look for the one where safe_to_bootstrap: 1 is set.
Use this node for the bootstrap.
Full Cluster Crash
In the case of a hard crash, all nodes will likely have safe_to_bootstrap: 0. You must therefore manually determine which node is the most advanced.
On each node, run the mysqld daemon with the --wsrep-recover option. This will read the InnoDB storage engine logs and report the last known transaction position in the MariaDB error log.
Inspect the error log for a line similar to this:
Compare the sequence number (the number after the colon) from all nodes. The node with the highest sequence number is the most advanced.
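The comparison can be scripted. A sketch that pulls the seqno out of an error log produced by mysqld --wsrep-recover (the log filename is an assumption; adjust to your system):

```shell
# Print the seqno from the last "Recovered position" line in a log file.
# The log line has the form:
#   WSREP: Recovered position: <uuid>:<seqno>
recovered_seqno() {
  grep 'WSREP: Recovered position' "$1" | tail -n 1 | awk -F: '{print $NF}'
}
```

Run it against each node's error log and bootstrap from the node that reports the highest number.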
On that most advanced node, you can optionally edit the /var/lib/mysql/grastate.dat file and set safe_to_bootstrap: 1 to signify that you have willfully chosen this node.
Bootstrap the New Primary Component
Once you have identified the most advanced node, there are two methods to bootstrap the new Primary Component from it.
Automatic Bootstrap (Recommended)
This method is recommended if the mysqld process is still running on the most advanced node. It is non-destructive and can preserve the GCache, increasing the chance of a fast Incremental State Transfer (IST) for the other nodes.
To perform an automatic bootstrap, connect to the most advanced node with a database client and execute:
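The command referenced here is Galera's pc.bootstrap provider option:

```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
```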
This node will now form a new Primary Component by itself.
Manual Bootstrap
This method involves a full shutdown and a special startup of the most advanced node.
Ensure the mysqld service is stopped on all nodes in the cluster.
On the most advanced node only, start the cluster using the galera_new_cluster command:
Start the Remaining Nodes
After the first node is successfully running and has formed the new Primary Component, start the MariaDB service normally on all of the other nodes.
They will detect the existing Primary Component, connect to it, and automatically initiate a State Transfer to synchronize their data and rejoin the cluster.
The foundation of the cluster is the standard MariaDB Server, typically using the InnoDB storage engine. This component serves clients that connect to it and executes queries as a normal database server would.
The wsrep API
The wsrep API is a generic replication plugin interface for databases. It defines a set of application callbacks and replication plugin calls that connect the MariaDB Server to the replication provider. It consists of two main elements:
wsrep Hooks: These hooks integrate with the database server engine to enable write-set replication.
dlopen(): This function is used to make the wsrep_provider (the Galera Plugin) available to the wsrep hooks.
In this model, the wsrep API considers the database to have a "state." When clients modify the database content, its state changes. The API represents these changes as a series of atomic transactions. In a healthy cluster, all nodes always have the same state because they synchronize by replicating and applying these state changes in the same serial order.
From a technical perspective, the process flow is as follows:
A state change (transaction) occurs on one node in the cluster.
Within the MariaDB Server, the wsrep hooks translate these changes into a write-set.
The dlopen() function makes the wsrep provider's functions available to the hooks.
The Galera Replication plugin handles the certification and replication of the write-set to the rest of the cluster.
On each node in the cluster, the write-set is applied as a high-priority transaction.
Galera Replication Plugin
The Galera Replication Plugin implements the wsrep API and acts as the wsrep Provider. It handles the core replication service functionality. The plugin itself consists of the following components:
Layer
Description
Certification Layer
Prepares write-sets and performs certification checks to ensure they can be applied without conflict.
Replication Layer
Manages the replication protocol and provides total ordering capability for transactions.
Group Communication Framework
Provides the plugin architecture for the various group communication systems that connect to Galera Cluster.
Group Communication (GComm) Framework
The Galera Replication Plugin uses a Group Communication (GComm) framework for its messaging layer. The GComm system implements a virtual synchrony Quality of Service (QoS), which unifies data delivery and cluster membership services into a clear, formal model.
While virtual synchrony guarantees data consistency, it does not guarantee the temporal synchrony needed for smooth multi-primary operations. To address this, Galera Cluster implements its own runtime-configurable Flow Control, which keeps nodes synchronized to within a fraction of a second.
The GComm framework also provides a total ordering of messages from multiple sources, which it uses to generate Global Transaction IDs in a multi-primary cluster.
Fundamental Concepts
Global Transaction ID (GTID)
To keep the database state identical across all nodes, the wsrep API uses a Global Transaction ID (GTID). This allows the cluster to uniquely identify every state change and to know the current state of any node in relation to others. An example GTID looks like this:
45eec521-2f34-11e0-0800-2a36050b826b:94530586304
It consists of two components:
State UUID: A unique identifier for the database state and its sequence of changes.
Ordinal Sequence Number (seqno): A 64-bit signed integer that denotes the position of the transaction in the sequence.
Read and Write Scaling
A direct benefit of Galera's multi-master architecture is the ability to scale both read and write operations.
Write Scaling: Because every node in the cluster can accept write operations, you can distribute your application's write traffic across multiple nodes. This can increase write throughput, though it's important to remember that all writes must still be replicated and certified on all nodes, which can introduce contention on high-velocity workloads.
Read Scaling: This is the most significant performance advantage. Since all nodes are kept synchronized, they all contain the same data. This allows you to distribute read queries across all nodes in the cluster, providing excellent horizontal scaling for read-heavy applications. This architecture is ideal for use with a load balancer (like MariaDB MaxScale) that can perform read-write splitting.
This scaling is typically managed by a load balancer, which distributes traffic intelligently across the cluster.
MariaDB has a feature called wsrep GTID mode. When this mode is enabled, MariaDB uses some tricks to try to associate each Galera Cluster write set with a GTID that is globally unique, but that is also consistent for that write set on each cluster node. These tricks work in some cases, but GTIDs can still become inconsistent among cluster nodes.
Enabling Wsrep GTID Mode
Several things need to be configured for wsrep GTID mode to work:
wsrep_gtid_domain_id needs to be set to the same value on all nodes in a given cluster, so that each cluster node uses the same domain when assigning GTIDs for Galera Cluster's write sets. When replicating between two clusters, each cluster should have this set to a different value, so that each cluster uses different domains when assigning GTIDs for their write sets.
log_slave_updates needs to be enabled on all nodes in the cluster. See MDEV-9855.
log_bin needs to be set to the same path on all nodes in the cluster.
And as an extra safety measure:
gtid_domain_id should be set to a different value on all nodes in a given cluster, and each of these values should be different than the configured wsrep_gtid_domain_id value. This is to prevent a node from using the same domain used for Galera Cluster's write sets when assigning GTIDs for non-Galera transactions, such as DDL executed with wsrep_OSU_method=RSU set or DML executed with wsrep_on=OFF set.
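Putting the above together, a per-node option-file sketch (the domain and server IDs are illustrative; wsrep_gtid_domain_id is shared cluster-wide, while gtid_domain_id differs per node):

```ini
[mysqld]
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=1   # same value on every node in this cluster
log_slave_updates=ON
log_bin=mariadb-bin      # same path on all nodes
gtid_domain_id=101       # different on each node, and different from wsrep_gtid_domain_id
```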
If you want to avoid writes accidentally creating local GTIDs, you can prevent this by setting:
If a Galera Cluster node is also a replication slave, then that node's slave SQL thread will be applying transactions that it replicates from its replication master. If the node has log_slave_updates=ON set, then each transaction that the slave SQL thread applies will also generate a Galera Cluster write set that is replicated to the rest of the nodes in the cluster.
The node acting as slave includes the transaction's original Gtid_Log_Event in the replicated write set, so all nodes should associate the write set with its original GTID. See MDEV-13431.
Writes are only replicated for InnoDB tables; writes to tables using other storage engines, including the system (mysql.*) tables, are not replicated. DDL statements, which implicitly modify the mysql.* tables, are replicated at the statement level. There is, however, experimental support for replicating MyISAM tables (see the wsrep_replicate_myisam system variable).
Unsupported explicit locking includes LOCK TABLES and the user-level locking functions (GET_LOCK(), RELEASE_LOCK(), and so on). Using transactions properly should be able to overcome these limitations. Global locking operators like FLUSH TABLES WITH READ LOCK are supported.
All tables should have a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key. Also, rows in tables without a primary key may appear in a different order on different nodes.
The general query log and the slow query log cannot be directed to a table. If you enable these logs, then you must forward the log to a file by setting log_output=FILE.
Transaction size. While Galera does not explicitly limit the transaction size, a write set is processed as a single memory-resident buffer, and as a result, extremely large transactions (e.g., LOAD DATA) may adversely affect node performance. To avoid that, the wsrep_max_ws_rows and wsrep_max_ws_size system variables limit transaction rows to 128K and the transaction size to 2GB by default. If necessary, users may want to increase those limits. Future versions will add support for transaction fragmentation.
Other observations, in no particular order:
If you are using mysqldump for state transfer, and it fails for whatever reason (e.g., you do not have the database account it attempts to connect with, or it does not have the necessary permissions), you will see an SQL SYNTAX error in the server error log. Don't let it fool you; this is just a fancy way to deliver a message (the pseudo-statement inside the bogus SQL will contain the error message).
Do not use transactions of any essential size. Just to insert 100K rows, the server might require an additional 200-300 Mb. In a less fortunate scenario, it can be 1.5 GB for 500K rows, or 3.5 GB for 1M rows. See MDEV-466 for some numbers (you'll see that it's closed, but it's not closed because it was fixed).
Locking is lax when DDL is involved. For example, if your DML transaction uses a table, and a parallel DDL statement is started, in the normal MySQL setup it would have waited for the metadata lock, but in the Galera context it will be executed right away. This happens even if you are running a single node, as long as you have configured it as a cluster node. This behavior might cause various side effects; the consequences have not been investigated yet. Try to avoid such parallelism.
Do not rely on auto-increment values to be sequential. Galera uses a mechanism based on autoincrement increment to produce unique non-conflicting sequences, so on every single node the sequence will have gaps.
A command may fail with ER_UNKNOWN_COM_ERROR, producing a 'WSREP has not yet prepared node for application use' (or 'Unknown command' in older versions) error message. This happens when a cluster is suspected to be split and the node is in the smaller part, for example during a network glitch, when nodes temporarily lose each other. It can also occur during state transfer. The node takes this measure to prevent data inconsistency. It is usually a temporary state, and can be detected by checking the wsrep_ready status variable. The node, however, allows the SHOW and SET commands during this period.
After a temporary split, if the 'good' part of the cluster is still reachable and its state was modified, resynchronization occurs. As a part of it, nodes of the 'bad' part of the cluster drop all client connections. This might be quite unexpected, especially if the client was idle and did not even know anything was happening. Please also note that after the connection to the isolated node is restored, if there is write traffic on the cluster, it can take a long time for the node to synchronize. During that time the 'good' node reports that the cluster is already of normal size and synced, while the rejoining node reports it has only joined (but is not synced), and its connections keep getting 'unknown command'. The condition should pass eventually.
While binlog_format is checked on startup and can only be ROW, it can still be changed at runtime. Do NOT change binlog_format at runtime: it is not only likely to cause replication failure, it may make all other nodes crash.
If you are using rsync for state transfer and a node crashes before the state transfer is over, the rsync process might hang forever, occupying the port and preventing the node from restarting. The problem will show up as 'port in use' in the server error log. Find the orphaned rsync process and kill it manually.
Performance: by design, the performance of the cluster cannot be higher than the performance of the slowest node. However, even if you have only one node, its performance can be considerably lower than running the same server in standalone mode (without the wsrep provider). This is particularly true for big enough transactions, even those well within the transaction-size limitations quoted above.
Windows is not supported.
Replication filters: when using a Galera cluster, replication filters should be used with caution. See the Replication Filters section below for more details.
Flashback isn't supported in Galera due to an incompatible binary log format.
FLUSH PRIVILEGES is not replicated.
The query cache needed to be disabled by setting query_cache_size=0 prior to MariaDB Galera Cluster 5.5.40 and MariaDB Galera Cluster 10.0.14.
In an asynchronous replication setup where a master replicates to a Galera node acting as a slave, parallel replication (slave-parallel-threads > 1) on the slave is currently not supported.
The disk-based Galera gcache is not encrypted.
Nodes may have different table definitions, especially temporarily during schema upgrade operations, but the same restrictions apply as they do for row-based replication.
The wsrep_sst_mariabackup script handles the actual data transfer and processing during an SST. The variables it reads from the [sst] group control aspects of the backup format, compression, transfer mechanism, and logging.
The wsrep_sst_mariabackup script parses the following options:
streamfmt (sfmt)
Default: mbstream
Description: Defines the streaming format used by mariabackup.
transferfmt (tfmt)
Default: socat
Description: Specifies the transfer utility used to move the data stream from the donor to the joiner node; socat is the default.
sockopt (socket options)
Description: Allows additional socket options to be passed to the underlying network communication. This includes settings for TCP buffers, keep-alives, or other network-related tunables to optimize transfer performance.
progress
Description: Controls whether progress information about the SST is displayed or logged. Enabling this option provides visual indicators or detailed log entries about the transfer's advancement.
time (ttime)
Default: 0
Description: When set to 1, logs the time spent on specific operations during the SST process to help with performance analysis.
cpat
Description: Related to a "copy pattern" or specific path handling during the SST. Its function depends on how the wsrep_sst_mariabackup script uses this pattern for file or directory management.
compressor (scomp)
Description: Specifies the compression utility to be used on the data stream before transfer. Common values include gzip, pigz, lz4, or qpress, which reduce the data size for faster transmission over the network.
decompressor (sdecomp)
Description: Specifies the decompression utility to be used on the receiving end (joiner node) to decompress the data stream that was compressed by scomp. It should correspond to the scomp setting.
rlimit (resource limit)
Description: Sets a maximum data transfer rate for State Snapshot Transfers (SSTs) in which the node serves as a donor. The rlimit parameter accepts any value supported by the pv utility's --rate-limit option. Note that using this option requires the pv utility to be installed.
use-extra (uextra)
Default: 0
Description: Controls the SST connection method that mariabackup uses; when enabled, an additional connection is used for the transfer.
sst-special-dirs (speciald)
Default: 1
Description: A boolean flag that controls whether mariabackup applies special handling to non-default InnoDB data and log directory locations during the SST.
sst-initial-timeout (stimeout)
Default: 300
Description: Sets an initial timeout in seconds for the SST process. If the SST operation does not establish a connection or make progress within this period, it will be aborted.
sst-syslog (ssyslog)
Default: 0
Description: A boolean flag (0 or 1) that controls whether SST-related messages should be logged to syslog. This can be useful for centralized logging and monitoring of Galera cluster events.
sst-log-archive (sstlogarchive)
Default: 1
Description: A boolean flag (0 or 1) that determines whether SST logs should be archived. Archiving logs helps in post-mortem analysis and troubleshooting of SST failures.
sst-log-archive-dir (sstlogarchivedir)
Description: Specifies the directory where SST logs should be archived if sstlogarchive is enabled.
Example
Set in Configuration File
To configure wsrep_sst_mariabackup options, add them to the [sst] group in your configuration file:
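A minimal sketch of such an [sst] section follows; the option values here are illustrative choices, not recommendations:

```ini
[sst]
# Streaming and transfer utilities (these are the script defaults)
streamfmt=mbstream
transferfmt=socat
# Compress on the donor and decompress on the joiner (pigz is one common choice)
compressor='pigz'
decompressor='pigz -dc'
# Cap the donor's transfer rate; accepts any value pv's --rate-limit accepts
rlimit=80m
# Log progress and archive SST logs for later troubleshooting
progress=1
sst-log-archive=1
sst-log-archive-dir=/var/log/mysql/sst/
```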
Using Streaming Replication for Large Transactions
Streaming Replication optimizes replication of large or long-running transactions in MariaDB Galera Cluster. Typically, a node executes a transaction fully and replicates the complete write-set to other nodes at commit time. Although efficient for most workloads, this approach can be challenging for very large or lengthy transactions.
With Streaming Replication, the initiating node divides the transaction into smaller fragments. These fragments are certified and replicated to other nodes while the transaction is ongoing. Once a fragment is certified and applied to the replicas, it becomes immune to abortion by conflicting transactions, thus improving the chances of the entire transaction succeeding. This method also supports processing of transaction write-sets over two Gigabytes.
Streaming Replication is available in Galera Cluster 4.0 and later versions. MariaDB Server 10.4 and newer, on supported platforms, include Galera 4.
When to Use Streaming Replication
In most cases, the standard replication method is sufficient. Streaming Replication is a specialized tool for specific scenarios. The best practice is to enable it only at the session level for the specific transactions that require it.
Large Data Transactions
This is the primary use case. When performing a massive INSERT, UPDATE, or DELETE, normal replication requires the originating node to hold the entire transaction locally and then send a very large write-set at commit time. This can cause two problems:
A significant replication lag, as the entire cluster waits for the large write-set to be transferred and applied.
The replica nodes, while busy applying the large transaction, cannot commit other transactions, which can trigger Flow Control and throttle the entire cluster.
With Streaming Replication, the node replicates the data in fragments throughout the transaction's lifetime. This spreads the network load and allows replica nodes to apply other concurrent transactions between fragments, minimizing the impact on the overall cluster.
Long-Running Transactions
A transaction that remains open for a long time has a higher chance of conflicting with another transaction that commits first. When this happens, the long-running transaction is aborted.
Streaming Replication mitigates this by committing the transaction in fragments. Once a fragment is certified and applied, it is "locked in" and cannot be aborted by a new conflicting transaction.
Certification keys derive from record locks, not gap locks. If a streaming transaction holds a gap lock, another node's transaction can still apply an INSERT in that gap, potentially aborting the streaming transaction.
High-Contention ("Hot") Records
For applications that frequently update the same row (e.g., a counter, a job queue, or a locking scheme), Streaming Replication can be used to force a critical update to replicate immediately. This effectively locks the hot record, preventing other transactions from modifying it and increasing the chance that the critical transaction will commit successfully.
How to Enable and Use Streaming Replication
Streaming Replication should be enabled at the session level just for the transactions that need it. This is controlled by two session variables:
wsrep_trx_fragment_unit defines what a "unit" of replication is.
wsrep_trx_fragment_size defines how many units make up a fragment.
To enable streaming, you set both variables:
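For example, to replicate a fragment after every 10 statements:

```sql
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 10;
```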
In the above example, the node will create, certify, and replicate a fragment after every 10 SQL statements within the transaction.
The available fragment units for wsrep_trx_fragment_unit are:
bytes — The fragment size is measured in bytes of the write-set.
rows — The fragment size is measured in the number of rows modified by the transaction.
statements — The fragment size is measured in the number of SQL statements executed.
To disable Streaming Replication, you can set wsrep_trx_fragment_size back to 0.
Managing a "Hot Record"
Consider an application that manages a work order queue. To prevent two users from getting the same queue position, you can use Streaming Replication for the single critical update.
Begin the transaction:
After reading necessary data, enable Streaming Replication for just the next statement:
Perform the critical update. This statement will be immediately fragmented and replicated:
This ensures the queue_position update is replicated and certified across the cluster before the rest of the transaction proceeds, preventing race conditions.
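The steps above can be sketched as follows; the work_orders table and its columns are hypothetical:

```sql
START TRANSACTION;
-- 1. Read whatever data is needed under normal replication.
SELECT MAX(queue_position) FROM work_orders;
-- 2. Stream the next, critical statement as its own fragment.
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 1;
-- 3. The critical update is fragmented and replicated immediately.
UPDATE work_orders SET queue_position = queue_position + 1 WHERE order_id = 42;
-- 4. Switch streaming off again for the remainder of the transaction.
SET SESSION wsrep_trx_fragment_size = 0;
COMMIT;
```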
Limitations and Performance Considerations
Before using Streaming Replication, consider the following limitations:
Performance Overhead
When Streaming Replication is enabled, Galera records all write-sets to a log table (mysql.wsrep_streaming_log) on every node to ensure persistence in case of a crash. This adds write overhead and can impact performance, which is why it should only be used when necessary.
Cost of Rollbacks
If a streaming transaction needs to be rolled back after some fragments have already been applied, the rollback operation consumes system resources on all nodes as they undo the previously applied fragments. Frequent rollbacks of streaming transactions can become a performance problem.
For these reasons, it is always a good application design policy to use shorter, smaller transactions whenever possible.
This page is licensed: CC BY-SA / Gnu FDL
Configuring MariaDB Galera Cluster
A number of options need to be set in order for Galera Cluster to work when using MariaDB. These should be set in the MariaDB option file.
Mandatory Options
Several options are mandatory, which means that they must be set in order for Galera Cluster to be enabled or to work properly with MariaDB. The mandatory options are:
wsrep_provider — Path to the Galera library
— See
— See
wsrep_on=ON — Enable wsrep replication
— This is the default value, or alternately (before MariaDB 10.6) or (MariaDB 10.6 and later).
— This is the default value, and should not be changed.
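Put together, a typical mandatory configuration looks something like the following sketch; the provider path and node addresses vary by installation and are only examples:

```ini
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so   # path differs by distribution
wsrep_cluster_address=gcomm://192.0.2.1,192.0.2.2,192.0.2.3
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
```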
Performance-related Options
These are optional optimizations that can be made to improve performance.
— This is not usually recommended in the case of standard MariaDB. However, it is a safer, recommended option with Galera Cluster, since inconsistencies can always be fixed by recovering from another node.
innodb_autoinc_lock_mode=2 — This tells InnoDB to use the interleaved lock mode. Interleaved is the fastest and most scalable lock mode, and it should be used when binlog_format is set to ROW.
By setting the auto-increment lock mode for InnoDB to interleaved, you allow slave threads to operate in parallel.
wsrep_slave_threads — This makes state transfers quicker for new nodes. You should start with four slave threads per CPU core.
The logic here is that, in a balanced system, four slave threads can typically saturate a CPU core. However, I/O performance can increase this figure several times over. For example, a single-core ThinkPad R51 with a 4200 RPM drive can use thirty-two slave threads. The value should not be set higher than the average value of the wsrep_cert_deps_distance status variable.
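As a sketch, the performance-related options above might be set like this; the values are starting points to be tuned for your workload, not recommendations:

```ini
innodb_flush_log_at_trx_commit=0
innodb_autoinc_lock_mode=2
wsrep_slave_threads=16   # e.g. four applier threads per core on a 4-core host
```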
Writing Replicated Write Sets to the Binary Log
Like with MariaDB replication, write sets that are received by a node via Galera's replication are not written to the binary log by default. If you would like a node to write its replicated write sets to the binary log, then you will have to set log_slave_updates=ON. This is especially helpful if the node is a replication master.
Replication Filters
Like with MariaDB replication, replication filters can be used to filter write sets from being replicated by Galera Cluster's replication. However, they should be used with caution because they may not work as you'd expect.
The following replication filters are honored for DML, but not DDL:
The following replication filters are honored for DML and DDL for tables that use both the and storage engines:
However, it should be kept in mind that if replication filters cause inconsistencies that lead to replication errors, then nodes may abort.
See also and .
Network Ports
Galera Cluster needs access to the following ports:
Standard MariaDB Port (default: 3306) - For MySQL client connections and State Snapshot Transfers that use the mysqldump method. This can be changed by setting port.
Galera Replication Port (default: 4567) - For Galera Cluster replication traffic; multicast replication uses both UDP transport and TCP on this port. This can be changed via the gmcast.listen_addr provider option.
If you want to run multiple Galera Cluster instances on one server, then you can do so by starting each instance with , or if you are using , then you can use the relevant .
You need to ensure that each instance is configured with a different .
You also need to ensure that each instance is configured with different .
This page is licensed: CC BY-SA / Gnu FDL
Flow Control in Galera Cluster
Flow Control is a key feature in MariaDB Galera Cluster that ensures nodes remain synchronized. In synchronous replication, no node should lag significantly in processing transactions.
Picture the cluster as an assembly line; if one worker slows down, the whole line must adjust to prevent a breakdown.
Flow Control manages this by aligning all nodes' replication processes:
Preventing Memory Overflow
Without Flow Control, a slow node's replication queue can grow unchecked, consuming all server memory and potentially crashing the MariaDB process due to an Out-Of-Memory (OOM) error.
Maintaining Synchronization
It maintains synchronization across the cluster, ensuring all nodes have nearly identical database states at all times.
Flow Control Sequence
The Flow Control process is an automatic feedback loop triggered by the state of a node's replication queue.
Queue Growth: A node (the "slow node") begins receiving write-sets from its peers faster than it can apply them. This causes its local receive queue, measured by the wsrep_local_recv_queue status variable, to grow.
Upper Limit Trigger: When the receive queue size exceeds the configured upper limit, defined by the gcs.fc_limit provider option, the slow node triggers Flow Control.
Monitoring Flow Control
As an administrator, observing Flow Control is a key indicator of a performance bottleneck in your cluster. You can monitor it using the following global :
Variable Name
Description
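These status variables can be inspected with, for example:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';
```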
Troubleshooting Flow Control Issues
If you observe frequent Flow Control pauses, it is essential to identify and address the underlying cause.
Key Configuration Parameters
These my.cnf parameters control the sensitivity of Flow Control:
Parameter
Description
Default Value
Modifying these values is an advanced tuning step. In most cases, it is better to fix the underlying cause of the bottleneck rather than relaxing the Flow Control limits.
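If you do decide to adjust them, the Galera provider options can also be changed at runtime; the values shown here are illustrative only:

```sql
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=256; gcs.fc_factor=0.8';
```

The same settings can be made persistent via wsrep_provider_options in the option file.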
Common Causes and Solutions
Cause
Description
Solution
This page is licensed: CC BY-SA / Gnu FDL
Upgrading from MariaDB 10.3 to MariaDB 10.4 with Galera Cluster
MariaDB starting with
Since MariaDB 10.1, the MySQL-wsrep patch has been merged into MariaDB Server. Therefore, in MariaDB 10.1 and above, the functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider library package.
Beginning in MariaDB 10.1, Galera Cluster ships with the MariaDB Server. Upgrading a Galera Cluster node is very similar to upgrading a server from MariaDB 10.3 to MariaDB 10.4. For more information on that process, as well as incompatibilities between versions, see the relevant upgrade guide.
Performing a Rolling Upgrade
The following steps can be used to perform a rolling upgrade from MariaDB 10.3 to MariaDB 10.4 when using Galera Cluster. In a rolling upgrade, each node is upgraded individually, so the cluster is always operational. There is no downtime from the application's perspective.
First, before you get started:
First, take a look at the release notes and changelogs to see what has changed between the major versions.
Check whether any system variables or options have been changed or removed. Make sure that your server's configuration is compatible with the new MariaDB version before upgrading.
Check whether replication has changed in the new MariaDB version in any way that could cause issues while the cluster contains upgraded and non-upgraded nodes.
Before you upgrade, it would be best to take a backup of your database. This is always a good idea to do before an upgrade. We would recommend mariadb-backup.
Then, for each node, perform the following steps:
1
Modify the repository configuration, so the system's package manager installs MariaDB 10.4.
When this process is done for one node, move onto the next node.
When upgrading the Galera wsrep provider, sometimes the Galera protocol version can change. The Galera wsrep provider should not start using the new protocol version until all cluster nodes have been upgraded to the new version, so this is not generally an issue during a rolling upgrade. However, this can cause issues if you restart a non-upgraded node in a cluster where the rest of the nodes have been upgraded.
This page is licensed: CC BY-SA / Gnu FDL
Galera Cluster Deployment Variants
MariaDB Galera Cluster is flexible and can be deployed in several different topologies to meet various business needs, from high availability within a single data center to geographically distributed disaster recovery. The primary deployment patterns are designed for Local Area Networks (LAN) and Wide Area Networks (WAN).
Standard LAN Cluster (Single Data Center)
This is the most common deployment pattern for achieving high availability and read scaling within a single data center.
Topology
The cluster consists of an odd number of nodes (typically 3 or 5) located in the same data center, connected by a high-speed, low-latency network.
Purpose
Purpose
Description
Key Consideration
While this provides excellent protection against server failure, the entire cluster is vulnerable to a full data center outage.
Wide Area Network (WAN) Cluster (Multi-Data Center)
This pattern is designed for disaster recovery and for providing lower latency to a geographically distributed user base.
Topology
The cluster nodes are distributed across two or more physical locations (data centers), connected by a WAN link. To maintain quorum, it is essential to have an odd number of nodes and an odd number of locations. A typical setup involves three data centers with one or more nodes in each.
Purpose:
Aspect
Description
Key Considerations:
Consideration
Description
Two-Node Cluster with a Galera Arbitrator
This is a special deployment variant used to achieve high availability with only two full data nodes.
Topology
The cluster consists of two MariaDB Galera nodes and one lightweight Galera Arbitrator (garbd) process, which runs on a third, separate machine.
Purpose:
Cost-Effective High Availability: It provides automatic failover for a two-node cluster without the resource cost of a third full database server.
How it Works
The Galera Arbitrator acts as a voting member for quorum calculations but does not store any data or handle any client traffic. This creates a 3-member cluster from a voting perspective. If one of the data nodes fails, the remaining data node and the arbitrator still form a majority (2 out of 3), allowing the cluster to maintain quorum and stay online.
Key Consideration
If the node running the arbitrator fails, the cluster reverts to a standard 2-node setup with no automatic failover capability. Therefore, the arbitrator itself should be monitored and kept highly available.
This page is licensed: CC BY-SA / Gnu FDL
Managing Sequences in Galera Cluster
A SEQUENCE allows for the generation of unique integers independent of any specific table. While standard sequences function normally in a standalone MariaDB server, using them in a MariaDB Galera Cluster requires specific configurations to ensure conflict-free operation and optimal performance.
Streaming Replication Support in MariaDB
Starting from MariaDB 10.11.16 (and Galera 26.4.16), sequences are fully supported in transactions utilizing streaming replication. In earlier versions, using NEXTVAL() within a transaction where wsrep_trx_fragment_size > 0 would cause an ERROR 1235. The WSREP API now ensures proper serialization of sequence state in transaction fragments, allowing sequences to be used effectively in large-scale ETL and batch operations. See MDEV-34124.
Configuring Sequences for Galera
Because Galera is a multi-primary system, multiple nodes may attempt to generate sequence values simultaneously. To prevent duplicate values and certification failures, the cluster utilizes an offset-based generation strategy.
Mandatory: INCREMENT BY 0
For a sequence to function correctly in a multi-node environment, it must be defined with INCREMENT BY 0.
This setting instructs the node to ignore the sequence's internal increment logic. Instead, the node applies the cluster-wide wsrep_auto_increment_control logic using the following formula:

Next Value = Node_Offset + (Cluster_Size × N)

Where:
Node_Offset: The unique identifier for the specific node (e.g., 1, 2, or 3).
Cluster_Size: The total number of nodes in the cluster.
N: The iteration count (0, 1, 2, ...).
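A sketch of a sequence defined for Galera, with comments showing the values each node would generate in a hypothetical 3-node cluster:

```sql
CREATE SEQUENCE booking_seq START WITH 1 INCREMENT BY 0 CACHE 0;

-- With wsrep_auto_increment_control active on a 3-node cluster:
--   Node 1 (offset 1) generates 1, 4, 7, 10, ...
--   Node 2 (offset 2) generates 2, 5, 8, 11, ...
--   Node 3 (offset 3) generates 3, 6, 9, 12, ...
SELECT NEXTVAL(booking_seq);
```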
Visualizing the Offset Logic
The following diagram illustrates how two nodes in a 3-node cluster generate unique IDs simultaneously without communicating or locking.
This ensures that nodes generate interleaved, non-conflicting IDs, preventing certification failures (Error 1213) without requiring network locks.
Cache Configuration Strategies
The CACHE option is the primary lever for balancing performance against data continuity. In Galera, replication introduces a "Flush-on-Sync" behavior: when any node commits a sequence update, other nodes must sync their state, discarding any unused values in their local cache.
Cache Setting
Usage Scenario
Description
Use Case: Active-Active Ticket Reservation System
A common requirement for Galera is a distributed system (e.g., ticket sales) where users in different regions must be able to book items simultaneously without "race conditions" or duplicate Booking IDs.
In this example, we configure a sequence to allow high-speed concurrent bookings from multiple nodes.
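A sketch of such a setup; the sequence and table names are hypothetical, and the cache size is an illustrative trade-off between speed and gaps:

```sql
-- INCREMENT BY 0 delegates offsets to the cluster; CACHE pre-allocates
-- values per node so concurrent bookings do not contend over the network.
CREATE SEQUENCE booking_seq START WITH 1000 INCREMENT BY 0 CACHE 100;

CREATE TABLE bookings (
    booking_id BIGINT PRIMARY KEY,
    customer   VARCHAR(100),
    booked_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Any node can run this concurrently without producing duplicate IDs.
INSERT INTO bookings (booking_id, customer)
VALUES (NEXTVAL(booking_seq), 'alice');
```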
Troubleshooting
Error
Cause
Resolution
This page is licensed: CC BY-SA / Gnu FDL
Upgrading from MariaDB 10.6 to MariaDB 10.11 with Galera Cluster
Galera Cluster ships with the MariaDB Server. Upgrading a Galera Cluster node is very similar to upgrading a server from MariaDB 10.6 to MariaDB 10.11. For more information on that process, as well as incompatibilities between versions, see the relevant upgrade guide.
Methods
Stopping all nodes, upgrading all nodes, then starting the nodes
Rolling upgrade with IST
Note that rolling upgrade with SST does not work.
Performing a Rolling Upgrade
The following steps can be used to perform a rolling upgrade from MariaDB 10.6 to MariaDB 10.11 when using Galera Cluster. In a rolling upgrade, each node is upgraded individually, so the cluster is always operational. There is no downtime from the application's perspective.
First, before you get started:
First, take a look at the release notes and changelogs to see what has changed between the major versions.
Check whether any system variables or options have been changed or removed. Make sure that your server's configuration is compatible with the new MariaDB version before upgrading.
Check whether replication has changed in the new MariaDB version in any way that could cause issues while the cluster contains upgraded and non-upgraded nodes.
Before you upgrade, it would be best to take a backup of your database. This is always a good idea to do before an upgrade. We would recommend mariadb-backup.
Then, for each node, perform the following steps:
1
Modify the repository configuration, so the system's package manager installs MariaDB 10.11.
When this process is done for one node, move onto the next node.
When upgrading the Galera wsrep provider, sometimes the Galera protocol version can change. The Galera wsrep provider should not start using the new protocol version until all cluster nodes have been upgraded to the new version, so this is not generally an issue during a rolling upgrade. However, this can cause issues if you restart a non-upgraded node in a cluster where the rest of the nodes have been upgraded.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB Galera Cluster Guide
MariaDB Galera Cluster quickstart guide
Quickstart Guide: MariaDB Galera Cluster
MariaDB Galera Cluster provides a multi-primary (active-active) cluster solution for MariaDB, enabling high availability, read/write scalability, and true synchronous replication. This means any node can handle read and write operations, with changes instantly replicated to all other nodes, ensuring no replica lag and no lost transactions. It's exclusively available on Linux.
Using the Notification Command (wsrep_notify_cmd)
MariaDB Galera Cluster provides a powerful automation feature through the wsrep_notify_cmd system variable. When this variable is configured, the MariaDB server will automatically execute a specified command or script in response to changes in the cluster's membership or the local node's state.
This is extremely useful for integrating the cluster with external systems:
System
Description
Rapid Node Recovery with IST and the GCache
This page provides a deep-dive into Incremental State Transfer (IST), a method for a node to synchronize with the cluster. For information on the fallback mechanism, see State Snapshot Transfers (SSTs).
If one data center experiences a complete outage, the cluster can maintain quorum and continue to operate from the remaining locations.
Reduced Latency
Client applications can be directed to the topologically closest node, reducing network latency for read operations.
Network Latency
WAN replication is sensitive to network latency. The round-trip time between the most distant nodes will set the baseline for transaction commit latency. High latency affects writes.
Network Stability
The WAN link must be stable and reliable. Frequent network partitions can lead to nodes being evicted from the cluster, impacting stability.
Segments
Nodes within the same location can be configured into a segment using the gmcast.segment parameter, optimizing communication by using the fast local network for replication.
Pause Message: The node broadcasts a "Flow Control PAUSE" message to all other nodes in the cluster.
Replication Pauses: Upon receiving this message, all nodes in the cluster temporarily stop replicating new transactions. They continue to process any transactions already in their queues.
Queue Clears: The slow node now has a chance to catch up and apply the transactions from its backlog without new ones arriving.
Lower Limit Trigger: When the node's receive queue size drops below a lower threshold (defined as gcs.fc_limit * gcs.fc_factor), the node broadcasts a "Flow Control RESUME" message.
Replication Resumes: The entire cluster resumes normal replication.
Indicates the fraction of time since the last FLUSH STATUS that the node has been paused by Flow Control. A value near 0.0 is healthy; 0.2 or higher indicates issues.
Check whether any new features have been added to the new MariaDB version. If a new feature in the new MariaDB version cannot be replicated to the old MariaDB version, then do not use that feature until all cluster nodes have been upgraded to the new MariaDB version.
Next, make sure that the Galera version numbers are compatible.
If you are upgrading from the most recent release of MariaDB 10.3 to MariaDB 10.4, then the versions will be compatible. MariaDB 10.3 uses Galera 3 (i.e. Galera wsrep provider versions 25.3.x), and MariaDB 10.4 uses Galera 4 (i.e. Galera wsrep provider versions 26.4.x). This means that upgrading to MariaDB 10.4 also upgrades the system to Galera 4. However, Galera 3 and Galera 4 should be compatible for the purposes of a rolling upgrade, as long as you are using Galera 26.4.2 or later.
Ideally, you want to have a large enough gcache to avoid a State Snapshot Transfer (SST) during the rolling upgrade. The gcache size can be configured by setting gcache.size, for example: wsrep_provider_options="gcache.size=2G"
2
If you use a load balancing proxy such as MaxScale or HAProxy, make sure to drain the server from the pool so it does not receive any new connections.
3
Stop MariaDB.
4
Uninstall the old version of MariaDB and the Galera wsrep provider.
5
Install the new version of MariaDB and the Galera wsrep provider.
6
Make any desired changes to configuration options in option files, such as my.cnf. This includes removing any system variables or options that are no longer supported.
7
On Linux distributions that use systemd, you may need to increase the service startup timeout, as the default timeout of 90 seconds may not be sufficient.
8
Start MariaDB.
9
Run mysql_upgrade with the --skip-write-binlog option.
mysql_upgrade does two things:
Ensures that the system tables in the mysql database are fully compatible with the new version.
Does a very quick check of all tables and marks them as compatible with the new version of MariaDB.
Check whether any new features have been added to the new MariaDB version. If a new feature in the new MariaDB version cannot be replicated to the old MariaDB version, then do not use that feature until all cluster nodes have been upgraded to the new MariaDB version.
Next, make sure that the Galera version numbers are compatible.
If you are upgrading from the most recent release of MariaDB 10.6 to MariaDB 10.11, then the versions will be compatible.
You want to have a large enough gcache to avoid a State Snapshot Transfer (SST) during the rolling upgrade. The gcache size can be configured by setting gcache.size, for example: wsrep_provider_options="gcache.size=2G"
2
If you use a load balancing proxy such as or HAProxy, make sure to drain the server from the pool so it does not receive any new connections.
3
Stop MariaDB.
4
Uninstall the old version of MariaDB and the Galera wsrep provider.
5
Install the new version of MariaDB and the Galera wsrep provider.
6
Make any desired changes to configuration options in option files, such as my.cnf. This includes removing any system variables or options that are no longer supported.
7
On Linux distributions that use systemd, you may need to increase the service startup timeout, as the default timeout of 90 seconds may not be sufficient.
8
Start MariaDB.
9
Run mysql_upgrade with the --skip-write-binlog option.
mysql_upgrade does two things:
Ensures that the system tables in the mysql database are fully compatible with the new version.
Does a very quick check of all tables and marks them as compatible with the new version of MariaDB.
At least three nodes: For redundancy and avoiding split-brain scenarios (bare-metal or virtual machines).
Linux Operating System: A compatible Debian-based (e.g., Ubuntu, Debian) or RHEL-based (e.g., CentOS, Fedora) distribution.
Synchronized Clocks: All nodes should have NTP configured for time synchronization.
SSH Access: Root or sudo access to all nodes for installation and configuration.
Network Connectivity: All nodes must be able to communicate with each other over specific ports (see Firewall section). Low latency between nodes is ideal.
rsync: Install rsync on all nodes, as it's commonly used for State Snapshot Transfers (SST).
sudo apt install rsync (Debian/Ubuntu)
2. Installation (on Each Node)
Install MariaDB Server and the Galera replication provider on all nodes of your cluster.
a. Add MariaDB Repository:
It's recommended to install from the official MariaDB repositories to get the latest stable versions. Use the MariaDB Repository Configuration Tool (search "MariaDB Repository Generator") to get specific instructions for your OS and MariaDB version.
Example for Debian/Ubuntu (MariaDB 10.11):
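A sketch using the official mariadb_repo_setup script; verify the current URL and version syntax against the MariaDB Repository Configuration Tool before running:

```bash
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version="mariadb-10.11"
sudo apt update
```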
b. Install MariaDB Server and Galera:
c. Secure MariaDB Installation:
Run the security script on each node to set the root password and remove insecure defaults.
Set a strong root password.
Answer Y to remove anonymous users, disallow remote root login, remove test database, and reload privilege tables.
3. Firewall Configuration (on Each Node)
Open the necessary ports on each node's firewall to allow inter-node communication.
Adjust for your firewall system (e.g., firewalld for RHEL-based systems).
4. Configure Galera Cluster (galera.cnf on Each Node)
Create a configuration file (e.g., /etc/mysql/conf.d/galera.cnf) on each node. The content will be largely identical, with specific changes for each node's name and address.
Example galera.cnf content:
Important:
wsrep_cluster_address: List the IP addresses of all nodes in the cluster on every node.
wsrep_node_name: Must be unique for each node (e.g., node1, node2, node3).
wsrep_node_address: Set to the specific IP address of the node you are configuring.
5. Start the Cluster
a. Bootstrapping the First Node:
Start MariaDB on the first node with the --wsrep-new-cluster option. This tells it to form a new cluster. Do this only once for the initial node of a new cluster.
b. Starting Subsequent Nodes:
For the second and third nodes, start the MariaDB service normally. They will discover and join the existing cluster using the wsrep_cluster_address specified in their configuration.
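For example, a plain service start on each remaining node is enough; no bootstrap option is used here:

```bash
sudo systemctl start mariadb
```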
6. Verify Cluster Operation
After all nodes are started, verify that they have joined the cluster.
a. Check Cluster Size (on any node):
Connect to MariaDB on any node and check the cluster status:
Inside the MariaDB shell:
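For example, the standard status query for the cluster size:

```sql
SHOW STATUS LIKE 'wsrep_cluster_size';
```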
The value should match the number of nodes in your cluster (e.g., 3).
b. Test Replication:
On node1, create a new database and a table:
On node2 (or node3), connect to MariaDB and check for the new database and table:
Update a service discovery tool with the current list of active cluster members.
Configuration
To use this feature, you set the wsrep_notify_cmd variable in your MariaDB configuration file (my.cnf) to the full path of the script you want to execute:
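For example (the script path is illustrative):

```ini
[mysqld]
wsrep_notify_cmd=/usr/local/bin/wsrep_notify.sh
```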
The MariaDB server user must have execute permission on the specified script.
Passed Parameters
When a cluster event occurs, the server executes the configured script and passes several arguments to it, providing context about the event. The script can then use these arguments to take appropriate action.
The script is called with the following parameters:
Status ($1): The new status of the node (e.g., Synced, Donor).
View ID ($2): A unique identifier for the current cluster membership view.
Members List ($3): A comma-separated list of the names of all members in the current view.
Node Status ($1)
Joining: The node is starting to join the cluster.
Joined: The node has finished a state transfer and is catching up.
Synced: The node is a fully operational member of the cluster.
Donor: The node is currently providing a State Snapshot Transfer (SST).
Desynced: The node has been manually desynchronized (wsrep_desync=ON).
View ID ($2)
The View ID is a unique identifier composed of the view sequence number and the UUID of the node that initiated the view change. It changes every time a node joins or leaves the cluster.
Send custom alerts to a monitoring system when a node's status changes.
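A minimal notification handler, following the positional parameters described above ($1 = status, $2 = view ID, $3 = member list), might look like the sketch below. The function name and actions are illustrative assumptions, not a fixed interface.

```shell
# Sketch of a wsrep_notify_cmd handler. Galera invokes the configured script
# on every status change; here the body is wrapped in a function for clarity.
notify() {
  status="$1"; view_id="$2"; members="$3"
  case "$status" in
    Synced) echo "node synced (view $view_id, members: $members)" ;;
    Donor)  echo "node is SST donor; drain it from the load balancer" ;;
    *)      echo "node status changed to $status" ;;
  esac
}

# Example invocation, as the server would call the script:
notify Synced "a1b2c3d4:42" "node1,node2,node3"
```

In a real deployment, the echo calls would be replaced with calls to your load balancer API or monitoring system.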
An Incremental State Transfer (IST) is the fast and efficient process where a joining node receives only the missing transactions it needs to catch up with the cluster, rather than receiving a full copy of the entire database.
This is the preferred provisioning method because it is:
Fast: Transferring only the missing changes is significantly faster than copying the entire dataset.
Non-Blocking: The donor node can continue to serve read and write traffic while an IST is in progress.
Conditions for IST
IST is an automatic process, but it is only possible if the following conditions are met:
The joining node has previously been a member of the cluster (its state UUID matches the cluster's).
All of the write-sets that the joiner is missing are still available in the donor node's Write-set Cache (GCache).
If these conditions are not met, the cluster automatically falls back to performing a full State Snapshot Transfer (SST).
Skipping Foreign Key Checks
This functionality is available from MariaDB 12.0.
Appliers need to verify foreign key constraints during normal operation in multi-active topologies. Therefore, appliers are configured to enable FK checking.
However, during node joining, in IST and the subsequent catch-up period, the node is still idle from the point of view of local connections, and the only source of incoming transactions is the cluster sending certified write-sets for applying. IST happens with parallel applying, so foreign key checks can cause lock conflicts between appliers accessing FK child and parent tables. Excessive FK checking also slows down the IST process.
To address that issue, you can relax FK checks for appliers during IST and catch-up periods. The relaxed FK check mode is configurable by setting this flag:
When this operation mode is set, and the node is processing IST or catch-up, appliers skip FK checking.
The Write-Set Cache (GCache)
The GCache is a special cache on each node whose primary purpose is to store recent write-sets specifically to facilitate Incremental State Transfers. The size and configuration of the GCache are therefore critical for the cluster's recovery speed and high availability.
How the GCache Enables IST
When a node attempts to rejoin the cluster, it reports the sequence number (seqno) of the last transaction it successfully applied. The potential donor node then checks its GCache for the very next seqno in that sequence.
The donor has the necessary history. It streams all subsequent write-sets from its GCache to the joiner. The joiner applies them in order and quickly becomes Synced.
The node was disconnected for too long, and the required history has been purged from the cache. IST is not possible, and an SST is initiated.
Configuring the GCache
You can control the GCache behavior with several parameters in the [galera] section of your configuration file (my.cnf).
gcache.size: Controls the size of the on-disk ring-buffer file. A larger GCache can hold more history, increasing the chance of a fast IST over SST.
gcache.dir: Specifies where GCache files are stored. Best practice is to place it on the fastest available storage, such as SSD or NVMe.
gcache.recover: Enabled by default, it allows a node to recover its GCache after a restart, enabling immediate service as a donor for IST.
Tuning gcache.size
The gcache.size parameter is the most critical setting for ensuring nodes can use IST. A GCache that is too small is the most common reason for a cluster falling back to a full SST.
The ideal size depends on your cluster's write rate and the amount of downtime you want to tolerate for a node before forcing an SST. For instance, do you want a node that is down for 1 hour for maintenance to recover instantly (IST), or can you afford a full SST?
Calculating Size Based on Write Rate
The most accurate way to size your GCache is to base it on your cluster's write rate.
1
Find your cluster's write rate:
You can calculate this using the wsrep_received_bytes status variable. First, check the value and note the time:
Wait for a significant interval during peak load (e.g., 3600 seconds, or 1 hour). Run the query again:
Now, calculate the rate (bytes per second):
write_rate = (recv2 − recv1) / (time2 − time1)
2
Calculate your desired GCache size:
Decide on the time window you want to support (e.g., 2 hours = 7200 seconds).
In this example, a gcache.size of 140M would allow a node to be down for 2 hours and still rejoin using IST.
3
Check your current GCache validity period:
Conversely, you can use your write rate to see how long your current GCache size is valid:
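Putting these steps together, a worked sketch with illustrative numbers (the variable values are assumptions chosen to roughly match the 140M example above):

```shell
# Worked example of the GCache sizing formula.
recv1=1000000000          # wsrep_received_bytes at time1
recv2=1070000000          # wsrep_received_bytes one hour later
time1=0; time2=3600       # measurement interval in seconds

write_rate=$(( (recv2 - recv1) / (time2 - time1) ))   # bytes per second
window=7200               # tolerate 2 hours of node downtime
gcache_bytes=$(( write_rate * window ))

echo "write rate: ${write_rate} B/s"
echo "suggested gcache.size: ~$(( gcache_bytes / 1000000 ))M"

# Conversely: how long would an existing 2G GCache last at this write rate?
current=2000000000
echo "a 2G GCache covers ~$(( current / write_rate / 3600 )) hours of downtime"
```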
A General Heuristic for Sizing
If you cannot calculate the write rate, you can use a simpler heuristic based on your data directory size as a starting point.
Start with the size of your data directory.
Subtract the size of the GCache's ring buffer file itself (default: galera.cache).
Consider your SST method:
If you use mysqldump for SST, you can also subtract the size of your InnoDB log files (as mysqldump does not copy them).
If you use rsync or xtrabackup, the log files are copied, so they should be part of the total size.
These calculations are guidelines. If your cluster nodes frequently request SSTs, it is a clear sign your gcache.size is too small. In cases where you must avoid SSTs as much as possible, you should use a much larger GCache than suggested, assuming you have the available storage.
CACHE 1 (Multi-Primary): Recommended for multi-primary writes. Disables pre-allocation. Every NEXTVAL() triggers a commit and flush. No values are lost during sync, ensuring continuity, but at the cost of higher I/O and latency.
CACHE 1000 (Single Writer): Recommended for write-heavy single nodes. Allows the writer node to batch updates to disk, offering performance similar to a standalone server. However, if a secondary node writes to the sequence, the primary node's cache is discarded, creating gaps.
CACHE 50 (Hybrid): A balanced approach. Reduces disk flushes significantly compared to CACHE 1 while limiting the size of potential gaps during occasional concurrent writes.
ERROR 1235 ... doesn't yet support SEQUENCEs
Cause: The server version is older than 10.11.16 and Streaming Replication is enabled.
Solution: Upgrade to a supported version or disable Streaming Replication (SET SESSION wsrep_trx_fragment_size=0).
ERROR 1213: Deadlock found when trying to get lock
Cause: The sequence was likely defined with INCREMENT BY 1 (default), causing nodes to contend for the same value.
Solution: Alter the sequence to use the Galera offset logic: ALTER SEQUENCE my_seq INCREMENT BY 0;
Definitive Galera Cluster monitoring: SHOW GLOBAL STATUS wsrep_% variables, Primary quorum checks, wsrep_local_state_comment, and flow control metrics.
From a database client, you can check the status of write-set replication throughout the cluster using standard queries. Status variables that relate to write-set replication have the prefix wsrep_, meaning that you can display them all using the following query:
Understanding Quorum and Cluster Integrity
The most fundamental aspect of a healthy cluster is Quorum. Quorum is a mechanism that ensures data consistency by requiring a majority of nodes to be online and in communication to form a Primary Component. Only the Primary Component will process transactions. This prevents split-brain scenarios, where a network partition could otherwise lead to data conflicts.
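The majority rule can be made concrete with a quick sketch of quorum sizes; this also shows why three nodes is the practical minimum:

```shell
# Quorum is a strict majority of the nodes in the component.
for total in 2 3 4 5; do
  quorum=$(( total / 2 + 1 ))
  echo "$total-node cluster: quorum=$quorum, tolerates $(( total - quorum )) failed node(s)"
done
```

Note that a 2-node cluster tolerates zero failures (and an even split cannot form a majority), which is why at least three nodes are recommended.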
You can check the cluster's integrity and Quorum status using these key variables. For a healthy cluster, the values for these variables must be identical on every node.
Parameter
Description
Expected Value
Checking Individual Node Status
You can monitor the status of individual nodes to ensure they are in working order and able to receive write-sets.
Status Variable
Description
Expected Value
Notes
Understanding Galera Node States
The value of wsrep_local_state_comment tells you exactly what a node is doing. The most common states include:
Node Status
Description
Checking Replication Health
These can help identify performance issues and bottlenecks.
Many status variables are differential and reset after each FLUSH STATUS command.
Metric Name
Description
Recovering a Cluster After a Full Outage
If the entire cluster shuts down or loses its Primary Component, you must manually re-establish a Primary Component by bootstrapping from the most advanced node.
1
Identify the Most Advanced Node
The "most advanced" node is the one that contains the most recent data. You must bootstrap the cluster from this node to avoid any data loss.
This page is licensed: CC BY-SA / Gnu FDL
wsrep_ssl_mode
This system variable is available from MariaDB 11.4 and 10.6.
Select which SSL implementation is used for wsrep provider communications: PROVIDER - wsrep provider internal SSL implementation; SERVER - use server side SSL implementation; SERVER_X509 - as SERVER and require valid X509 certificate.
Usage
The wsrep_ssl_mode system variable is used to configure the WSREP TLS Mode used by MariaDB Enterprise Cluster, powered by Galera.
When set to SERVER or SERVER_X509, MariaDB Enterprise Cluster uses the TLS configuration for MariaDB Enterprise Server:
When set to PROVIDER, MariaDB Enterprise Cluster obtains its TLS configuration from the wsrep_provider_options system variable:
Details
The wsrep_ssl_mode system variable configures the WSREP TLS Mode. The following WSREP TLS Modes are supported:
When the wsrep_ssl_mode system variable is set to PROVIDER, each node obtains its TLS configuration from the wsrep_provider_options system variable. The following options are used:
When the wsrep_ssl_mode system variable is set to SERVER or SERVER_X509, each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. The following system variables are used:
Parameters
Galera Cluster System Tables
Starting with Galera 4 (used in and later), several system tables related to replication are available in the mysql database. These tables can be queried by administrators to get a real-time view of the cluster's layout, membership, and current operations.
You can view these tables with the following query:
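For example, listing the Galera system tables by their wsrep prefix:

```sql
SHOW TABLES FROM mysql LIKE 'wsrep%';
```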
mysql vs. mariadb
Galera Load Balancer (glbd)
The Galera Load Balancer (glbd) is no longer under active development.
It is provided here for historical and reference purposes only.
For new deployments, we recommend using a modern, fully supported proxy.
Galera Load Balancer (glbd) is a simple, multi-threaded TCP connection balancer, optimized for database workloads. It was inspired by pen, but unlike pen, GLB focuses only on balancing generic TCP connections.
COMMIT;
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 10;
START TRANSACTION;
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 1;
UPDATE work_orders
SET queue_position = queue_position + 1;
SET SESSION wsrep_trx_fragment_size = 0;
sudo apt-get remove mariadb-server galera
sudo yum remove MariaDB-server galera
sudo zypper remove MariaDB-server galera
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE messages (id INT AUTO_INCREMENT PRIMARY KEY, text VARCHAR(255));
INSERT INTO messages (text) VALUES ('Hello from node1!');
SHOW DATABASES; -- test_db should appear
USE test_db;
SELECT * FROM messages; -- 'Hello from node1!' should appear
sudo apt install mariadb-server mariadb-client galera-4 -y # For MariaDB 10.4+ or later, galera-4 is the provider.
# For older versions (e.g., 10.3), use galera-3.
sudo mariadb-secure-installation
# Example for UFW (Ubuntu)
sudo ufw allow 3306/tcp # MariaDB client connections
sudo ufw allow 4567/tcp # Galera replication (multicast and unicast)
sudo ufw allow 4567/udp # Galera replication (multicast)
sudo ufw allow 4568/tcp # Incremental State Transfer (IST)
sudo ufw allow 4444/tcp # State Snapshot Transfer (SST)
sudo ufw reload
sudo ufw enable # If firewall is not already enabled
[mysqld]
# Basic MariaDB settings
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0 # Binds to all network interfaces. Adjust if you have a specific private IP for cluster traffic.
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so # Adjust path if different (e.g., /usr/lib64/galera-4/libgalera_smm.so)
# Galera Cluster Configuration
wsrep_cluster_name="my_galera_cluster" # A unique name for your cluster
# IP addresses of ALL nodes in the cluster, comma-separated.
# Use private IPs if available for cluster communication.
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,node3_ip_address"
# This node's specific configuration
wsrep_node_name="node1" # Must be unique for each node (e.g., node1, node2, node3)
wsrep_node_address="node1_ip_address" # This node's own IP address
sudo systemctl stop mariadb # Ensure it's stopped
sudo galera_new_cluster # This command often wraps the systemctl start --wsrep-new-cluster
# Alternatively: sudo systemctl start mariadb --wsrep-new-cluster
CREATE SEQUENCE seq_tickets START WITH 1 INCREMENT BY 0;
-- 1. Create a sequence optimized for concurrent access
-- CACHE 1 ensures no IDs are skipped if customers bounce between nodes
CREATE SEQUENCE seq_booking_id START WITH 1000 INCREMENT BY 0 CACHE 1;
-- 2. The Application Logic (Run on Node A or Node B)
START TRANSACTION;
-- Generate a unique Booking ID instantly (No cluster lock needed)
SELECT NEXTVAL(seq_booking_id) INTO @new_id;
-- Insert the reservation
INSERT INTO bookings (id, customer, event)
VALUES (@new_id, 'Jane Doe', 'Concert 2025');
COMMIT;
SHOW GLOBAL STATUS LIKE 'wsrep_%'
A boolean indicating if the current component is the Primary Component (1 for true, 0 for false).
sudo yum install rsync (RHEL/CentOS)
WSREP TLS Mode
Values
Description
Provider
PROVIDER
TLS is optional for Enterprise Cluster replication traffic.
Each node obtains its TLS configuration from the wsrep_provider_options system variable. When the provider is not configured to use TLS on a node, the node will connect to the cluster without TLS.
The Provider WSREP TLS Mode is backward compatible with ES 10.5 and earlier. When performing a rolling upgrade from ES 10.5 and earlier, the Provider WSREP TLS Mode can be configured on the upgraded nodes.
Server
SERVER
TLS is mandatory for Enterprise Cluster replication traffic, but X509 certificate verification is not performed.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect to the cluster.
The Server WSREP TLS Mode is the default in ES 10.6.
Server X509
SERVER_X509
TLS and X509 certificate verification are mandatory for Enterprise Cluster replication traffic.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect to the cluster.
Optionally set this system variable to the path of the CA chain directory. The directory must have been processed by openssl rehash. When your CA chain is stored in a single file, use the ssl_ca system variable instead.
Examine the grastate.dat file located in the MariaDB data directory (e.g., /var/lib/mysql/).
Look for the seqno: value in this file. The node with the highest seqno is the most advanced node. If a node crashed or was shut down uncleanly, its seqno may be -1; such nodes should not be used to bootstrap if a node with a positive seqno is available.
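A sketch of the comparison, using sample grastate.dat files created in a temporary directory (the paths, UUID, and seqno values are illustrative):

```shell
# Extract seqno from each node's grastate.dat to find the most advanced node.
d=$(mktemp -d)
mkdir -p "$d/node1" "$d/node2"
printf 'version: 2.1\nuuid: 6c9e8e29-0000-0000-0000-000000000000\nseqno: 1532\n' > "$d/node1/grastate.dat"
printf 'version: 2.1\nuuid: 6c9e8e29-0000-0000-0000-000000000000\nseqno: -1\n'   > "$d/node2/grastate.dat"

for f in "$d"/node*/grastate.dat; do
  seqno=$(awk '/^seqno:/ {print $2}' "$f")
  echo "$f: seqno=$seqno"
done
# Bootstrap from the node with the highest non-negative seqno (node1 here).
```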
2
Bootstrap the New Primary Component
Once you have identified the most advanced node, start the MariaDB service only on that node using the bootstrap command:
Alternatively, you can start mariadbd directly with the --wsrep-new-cluster option.
This node will come online and form a new Primary Component by itself, with a cluster size of 1.
3
Start the Other Nodes
After the first node is successfully running as a new Primary Component, start the MariaDB service normally on all of the other nodes.
They will detect the existing Primary Component, connect to it, and automatically initiate a State Transfer (IST or SST) to synchronize their data and rejoin the cluster.
wsrep_cluster_status
Status of the component the node belongs to. The healthy value is Primary. Any other value indicates issues.
Primary
wsrep_cluster_size
Number of nodes in the current component. This should match the expected total nodes in the cluster.
Matches total expected nodes
wsrep_cluster_state_uuid
Unique identifier for the cluster's state. It must be consistent across all nodes.
Same on all nodes
wsrep_cluster_conf_id
Identifier for the cluster membership group. It must be the same on all nodes.
wsrep_ready
Indicates if the node can accept queries.
ON
If OFF, the node will reject almost all queries.
wsrep_connected
Indicates if the node has network connectivity with other nodes.
ON
If OFF, the node is isolated.
wsrep_local_state_comment
Shows the current node state in a readable format.
N/A
Synced: The node is a healthy, fully operational, and active member of the cluster.
Joining: The node is establishing a connection and synchronizing with the cluster.
Joined: The node has received a state transfer but is applying transactions to catch up before syncing.
Initialized: The node is not connected to any cluster component.
wsrep_local_recv_queue_avg
Average size of the queue of write-sets waiting to be applied. A value consistently higher than 0.0 indicates falling behind and may trigger Flow Control.
wsrep_flow_control_paused
Fraction of time the node has been paused by Flow Control. A value close to 0.0 is ideal; a high value indicates a performance bottleneck.
wsrep_local_send_queue_avg
Average size of the queue of write-sets waiting to be sent to other nodes. Values much greater than 0.0 can indicate network throughput issues.
wsrep_cert_deps_distance
Represents the node's potential for parallel transaction application, helping to optimally tune the wsrep_slave_threads parameter.
You'll see queries referencing the mysql database (e.g., FROM mysql.wsrep_cluster). This is intentional. MariaDB, a MySQL fork, retains the mysql name for its internal system schema to ensure historical and backward compatibility where it manages user permissions and system tables.
This is different from the command-line client, which should always be invoked as mariadb.
These tables are managed by the cluster itself and should not be modified by users, with the exception of wsrep_allowlist.
wsrep_allowlist
This table stores a list of allowed IP addresses that can join the cluster and perform a state transfer (IST/SST). It is a security feature to prevent unauthorized nodes from joining.
To add a new node to the allowlist, you can INSERT its IP address:
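A sketch, assuming the allowlist stores addresses in an ip column (check the table definition on your version first; the address is illustrative):

```sql
INSERT INTO mysql.wsrep_allowlist (ip) VALUES ('192.168.0.4');
```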
If a node attempts to join and its IP address is not in the allowlist, the join will fail. The DONOR nodes will log a warning similar to this:
The joining node will fail with a connection timeout error.
wsrep_cluster
This table contains a single row with a high-level view of the cluster's identity, state, and capabilities.
Attribute
Description
cluster_uuid
The unique identifier for the cluster.
view_id
Corresponds to the wsrep_cluster_conf_id status variable, representing the current membership view ID.
view_seqno
The global transaction sequence number associated with this cluster view.
protocol_version
The wsrep protocol version in use.
capabilities
A bitmask of capabilities provided by the Galera library.
You can query its contents like this:
wsrep_cluster_members
This table provides a real-time list of all the nodes that are currently members of the cluster component.
Node
Description
node_uuid
The unique identifier for each individual node.
cluster_uuid
The UUID of the cluster this node belongs to.
node_name
The human-readable name of the node, set by the wsrep_node_name parameter.
node_incoming_address
The IP address and port where the node is listening for client connections.
Querying this table gives you a quick overview of the current cluster membership:
wsrep_streaming_log
This table contains metadata for Streaming Replication transactions that are currently in progress. Each row represents a write-set fragment. The table is typically empty unless a large or long-running transaction with streaming enabled is active.
Fragment
Description
node_uuid
The UUID of the node where the streaming transaction originated.
trx_id
The transaction identifier.
seqno
The sequence number of the specific write-set fragment.
flags
Flags associated with the fragment.
frag
The binary log events contained in the fragment.
Example of querying the table during a streaming transaction:
Features
Feature
Description
Server Draining
Remove servers smoothly without interrupting active connections.
High Performance
Uses Linux epoll API (2.6+).
Multithreading
Leverages multi-core CPUs for better performance.
Optional Watchdog Module
Monitors server health.
Seamless Client Integration
Uses libglb for load balancing without changing applications, by intercepting connect() calls.
Installation
GLB must be built from source. There are no pre-built packages.
This installs:
glbd (daemon) → /usr/sbin
libglb (shared library)
Running as a Service
To run as a service:
Manage with:
Configuration
GLB can be configured either via command-line options or via a configuration file.
Command-Line Options
Configuration File (glbd.cfg)
Parameter
Description
LISTEN_ADDR
Address/port GLB listens on for client connections
DEFAULT_TARGETS
Space-separated list of backend servers
OTHER_OPTIONS
Extra GLB options (e.g. balancing policy)
Example:
Destination Selection Policies
GLB supports five policies:
Policy
Description
Least Connected (default)
Routes new connections to the server with the fewest active connections (adjusted for weight).
Round Robin
Sequentially cycles through available servers.
Single
Routes all connections to the highest-weight server until it fails or a higher-weight server is available.
Random
Distributes connections randomly among servers.
Source Tracking
Routes all connections from the same client IP to the same server (best-effort).
-T | --top option: restricts balancing to servers with the highest weight.
Runtime Management
GLB can be managed at runtime via:
FIFO file
Control socket (-c <addr:port>)
Commands
Command
Example
Description
Add/Modify server
echo "192.168.0.1:3307:5" | nc 127.0.0.1 4444
Add backend with weight 5
Drain server
echo "192.168.0.1:3307:0" | nc 127.0.0.1 4444
Stop new connections, keep existing
Delete server
echo "192.168.0.1:3307:-1" | nc 127.0.0.1 4444
Remove backend and close active connections
Get routing table
echo "getinfo" | nc 127.0.0.1 4444
Show backends, weight, usage, connections
Performance Statistics
Example:
Field
Description
in / out
Bytes received/sent via client interface
recv / send
Bytes passed and number of recv()/send() calls
conns
Created / concurrent connections
poll
Read-ready / write-ready / total poll calls
elapsed
Time since last report (seconds)
Watchdog
The watchdog module performs asynchronous health checks beyond simple TCP reachability.
Enable with:
Runs mysql.sh with host:port as first argument.
Exit code 0 = healthy, non-zero = failure.
Use -i to set check interval.
With Galera, -D|--discover enables auto-discovery of nodes.
libglb (Shared Library)
libglb enables transparent load balancing by intercepting the connect() system call.
Basic Example
Environment Variables
Variable
Description
GLB_WATCHDOG
Same as --watchdog option
GLB_TARGETS
Comma-separated list of backends (H:P:W)
GLB_BIND
Local bind address for intercepted connections
GLB_POLICY
Balancing policy (single, random, source)
GLB_CONTROL
Control socket for runtime commands
Operational Limits
Limited by system open files (ulimit -n)
With default 1024 → ~493 connections
With 4096 (typical unprivileged user) → ~2029 connections
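These figures are consistent with roughly two file descriptors per proxied connection plus a fixed overhead of about 38 descriptors. This is a back-calculation from the numbers above, not a documented formula:

```shell
# Approximate usable connections for a given open-file limit,
# assuming 2 fds per connection and ~38 fds of fixed overhead.
overhead=38
for fds in 1024 4096; do
  echo "ulimit -n $fds -> ~$(( (fds - overhead) / 2 )) connections"
done
```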
You should have an IBM Cloud account; if you do not, you can register for one.
At the end of the tutorial, you will have a MariaDB Galera cluster up and running. IBM Cloud uses Bitnami charts to deploy MariaDB Galera with Helm.
We will provision a new Kubernetes cluster; if you already have one, skip to Step 2.
We will deploy the IBM Cloud Block Storage plug-in; if you already have it, skip to Step 3.
MariaDB Galera deployment
Step 1: Provision Kubernetes Cluster
Click the Catalog button on the top
Select Service from the catalog
Search for Kubernetes Service and click on it
You are now at the Kubernetes deployment page; you need to specify some details about the cluster.
Choose a standard or free plan; the free plan only has one worker node and no subnet. To provision a standard cluster, you will need to upgrade your account to Pay-As-You-Go.
To upgrade to a Pay-As-You-Go account, complete the following steps:
Now choose your location settings.
Choose Geography (continent)
Choose Single or Multizone. In a single zone, your data is only kept in one datacenter; with Multizone, it is distributed to multiple zones, and thus safer in an unforeseen zone failure.
Choose a Worker Zone if using Single zones or Metro if Multizone
If you wish to use Multizone, please set up your account accordingly.
If there is no available VLAN at your selected location, a new VLAN will be created for you.
Choose a Worker node setup or use the preselected one, set Worker node amount per zone
Choose the Master Service Endpoint. In VRF-enabled accounts, you can choose private-only to make your master accessible on the private network or via VPN tunnel. Choose public-only to make your master publicly accessible. When you have a VRF-enabled account, your cluster is set up by default to use both private and public endpoints.
Give cluster a name
Give desired tags to your cluster.
Click create
Wait for your cluster to be provisioned.
Your cluster is ready for usage
Step 2: Deploy IBM Cloud Block Storage Plug-in
The Block Storage plug-in is a persistent, high-performance iSCSI storage that you can add to your apps by using Kubernetes Persistent Volumes (PVs).
Click the Catalog button on the top
Select Software from the catalog
Search for IBM Cloud Block Storage plug-in and click on it
On the application page, click the dot next to the cluster you wish to use.
Click on Enter or Select Namespace and choose the default namespace or use a custom one (if you get an error, please wait 30 minutes for the cluster to finalize).
Give a name to this workspace
Click install and wait for the deployment
Step 3: Deploy MariaDB Galera
We will deploy MariaDB on our cluster
Click the Catalog button on the top
Select Software from the catalog
Search for MariaDB and click on it
On the application page, click the dot next to the cluster you wish to use.
Click on Enter or Select Namespace and choose the default namespace or use a custom one.
Give a unique name to the workspace, one you can easily recognize.
Select which resource group you want to use; it's for access control and billing purposes.
Give tags to your MariaDB Galera deployment.
Click on Parameters with default values. You can set deployment values or use the default ones.
Please set the MariaDB Galera root password in the parameters
After finishing everything, tick the box next to the agreements and click install
The MariaDB Galera workspace will start installing, wait a couple of minutes
Your MariaDB Galera workspace has been successfully deployed
Verify MariaDB Galera Installation
Go to the IBM Cloud console in your browser.
Click on Clusters
Click on your Cluster
Now you are at your clusters overview, here Click on Actions and Web terminal from the dropdown menu
Click install and wait a couple of minutes.
Click on Actions
Click Web terminal, and a terminal will open up
Type the following in the terminal; please change NAMESPACE to the namespace you chose at the deployment setup:
Enter your pod with bash; please replace PODNAME with your mariadb pod's name
After you are in your pod, please verify that MariaDB is running on your pod's cluster. Enter the root password after the prompt.
You have successfully deployed MariaDB Galera on IBM Cloud!
This page is licensed: CC BY-SA / Gnu FDL
Configuring MariaDB Replication between MariaDB Galera Cluster and MariaDB Server
MariaDB replication can be used to replicate between MariaDB Galera Cluster and MariaDB Server. This article will discuss how to do that.
Configuring the Cluster
Before we set up replication, we need to ensure that the cluster is configured properly. This involves the following steps:
Set log_slave_updates=ON on all nodes in the cluster. See the documentation on configuring a cluster node as a replication primary for more information on why this is important. It is also needed to enable wsrep GTID mode.
Set server_id to the same value on all nodes in the cluster. See the server_id documentation for more information on what this means.
Configuring Wsrep GTID Mode
If you want to use GTID replication, then you also need to configure some things to enable wsrep GTID mode. For example:
wsrep_gtid_mode=ON needs to be set on all nodes in the cluster.
wsrep_gtid_domain_id needs to be set to the same value on all nodes in the cluster so that each cluster node uses the same domain when assigning GTIDs for Galera Cluster's write sets.
log_slave_updates needs to be enabled on all nodes in the cluster. See MDEV-9855 about that.
And as an extra safety measure:
gtid_domain_id should be set to a different value on all nodes in a given cluster, and each of these values should be different than the configured wsrep_gtid_domain_id value. This is to prevent a node from using the same domain used for Galera Cluster's write sets when assigning GTIDs for non-Galera transactions, such as DDL executed with wsrep_OSU_method=RSU set or DML executed with wsrep_on=OFF set.
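Putting the settings above together, each cluster node's configuration might look like the following sketch; the specific IDs and log file name are illustrative assumptions:

```ini
# my.cnf fragment for a Galera cluster node (example values)
[mariadb]
log_bin = mariadb-bin      # same path on all cluster nodes
log_slave_updates = ON     # write replicated transactions to the binary log
server_id = 1              # same value on all cluster nodes
wsrep_gtid_mode = ON
wsrep_gtid_domain_id = 1   # same value on all cluster nodes
gtid_domain_id = 2         # different on each node, and different from wsrep_gtid_domain_id
```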
Configuring the Replica
Before we set up replication, we also need to ensure that the MariaDB Server replica is configured properly. This involves the following steps:
Set server_id to a different value than the one that the cluster nodes are using.
Set gtid_domain_id to a value that is different than the wsrep_gtid_domain_id and gtid_domain_id values that the cluster nodes are using.
Set log_bin and log_slave_updates=ON if you want the replica to log the transactions that it replicates.
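A matching sketch for the MariaDB Server replica; again, the specific values are illustrative assumptions:

```ini
# my.cnf fragment for the MariaDB Server replica (example values)
[mariadb]
server_id = 2           # different from the cluster nodes' server_id
gtid_domain_id = 9      # different from the cluster's wsrep_gtid_domain_id
                        # and gtid_domain_id values
log_bin = mariadb-bin   # optional: log the transactions the replica applies
log_slave_updates = ON  # optional: log replicated transactions
```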
Setting up Replication
Our process to set up replication is going to be similar to the standard process for setting up a replica from a backup, but it will be modified a bit to work in this context.
Start the cluster
The very first step is to start the nodes in the cluster. The first node will have to be bootstrapped. The other nodes can be started normally.
Once the nodes are started, you need to pick a specific node that will act as the replication primary for the MariaDB Server.
1
Backup the Database on the Cluster's Primary Node and Prepare It
The first step is to simply take and prepare a fresh full backup of the node that you have chosen to be the replication primary. For example:
And then you would prepare the backup as you normally would. For example:
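The backup and prepare steps might look like this with mariadb-backup; the target directory and credentials are placeholders:

```shell
# Take a full backup of the chosen primary node
mariadb-backup --backup \
   --target-dir=/var/mariadb/backup/ \
   --user=mariadb-backup --password=mypassword

# Prepare the backup so it is consistent and ready to be restored
mariadb-backup --prepare \
   --target-dir=/var/mariadb/backup/
```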
2
Start the New Replica
Now that the backup has been restored to the MariaDB Server replica, you can start the MariaDB Server process.
1
Create a Replication User on the Cluster's Primary
Before the MariaDB Server replica can begin replicating from the cluster's primary, you need to create a user account on the primary that the replica can use to connect, and you need to grant the user account the REPLICATION SLAVE privilege. For example:
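For instance, a hypothetical replication account; the user name, host, and password are placeholders for illustration:

```sql
CREATE USER 'repl'@'replica.example.com' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica.example.com';
```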
2
Setting up Circular Replication
You can also set up circular replication between the cluster and MariaDB Server, which means that the MariaDB Server replicates from the cluster, and the cluster also replicates from the MariaDB Server.
1
Create a Replication User on the MariaDB Server Primary
Before circular replication can begin, you also need to create a user account on the MariaDB Server, since it will be acting as the replication primary to the cluster's replica, and you need to grant the user account the REPLICATION SLAVE privilege. For example:
2
This page is licensed: CC BY-SA / Gnu FDL
Galera Cluster Status Variables
Viewing Galera Cluster Status Variables
Galera status variables can be viewed with the SHOW STATUS statement.
See also the .
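For example, to view all Galera status variables at once:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_%';
```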
List of Galera Cluster status variables
MariaDB Galera Cluster has the following status variables:
wsrep_applier_thread_count
Description: Stores the current number of applier threads to make clear how many slave threads of this type there are.
wsrep_apply_oooe
Description: How often write sets have been applied out of order, an indicator of parallelization efficiency.
wsrep_apply_oool
Description: How often write sets with a higher sequence number were applied before ones with a lower sequence number, implying slow write sets.
wsrep_apply_waits
Description: Number of times the applier had to wait for an earlier transaction (lower seqno) to be applied before it could apply the next write set.
wsrep_apply_window
Description: Average distance between highest and lowest concurrently applied seqno.
wsrep_cert_deps_distance
Description: Average distance between the highest and the lowest sequence numbers that can possibly be applied in parallel, or the potential degree of parallelization.
wsrep_cert_index_size
Description: The number of entries in the certification index.
wsrep_cert_interval
Description: Average number of transactions received while a transaction replicates.
wsrep_cluster_capabilities
Description:
wsrep_cluster_conf_id
Description: Total number of cluster membership changes that have taken place.
wsrep_cluster_size
Description: Number of nodes currently in the cluster.
wsrep_cluster_state_uuid
Description: UUID state of the cluster. If it matches the value in wsrep_local_state_uuid, the local and cluster nodes are in sync.
wsrep_cluster_status
Description: Cluster component status. Possible values are PRIMARY (primary group configuration, quorum present), NON_PRIMARY (non-primary group configuration, quorum lost), or DISCONNECTED (not connected to group, retrying).
wsrep_cluster_weight
Description: The total weight of the current members in the cluster. The value is counted as a sum of pc.weight of the nodes in the current primary component.
wsrep_commit_oooe
Description: How often a transaction was committed out of order.
wsrep_commit_oool
Description: No meaning.
wsrep_commit_window
Description: Average distance between highest and lowest concurrently committed seqno.
wsrep_connected
Description: Whether or not MariaDB is connected to the wsrep provider. Possible values are ON or OFF.
wsrep_desync_count
Description: Returns the number of operations in progress that require the node to temporarily desync from the cluster.
wsrep_evs_delayed
Description: Provides a comma-separated list of all the nodes this node has registered on its delayed list.
wsrep_evs_evict_list
Description: Lists the UUIDs of all nodes evicted from the cluster. Evicted nodes cannot rejoin the cluster until you restart their mysqld processes.
wsrep_evs_repl_latency
Description: This status variable provides figures for the replication latency on group communication. It measures latency (in seconds) from the time point when a message is sent out to the time point when a message is received. As replication is a group operation, this essentially gives you the slowest ACK and longest RTT in the cluster. The format is min/avg/max/stddev
wsrep_evs_state
Description: Shows the internal state of the EVS protocol.
wsrep_flow_control_paused
Description: The fraction of time since the last FLUSH STATUS command that replication was paused due to flow control.
wsrep_flow_control_paused_ns
Description: The total time spent in a paused state measured in nanoseconds.
wsrep_flow_control_recv
Description: Number of FC_PAUSE events received as well as sent since the most recent status query.
wsrep_flow_control_sent
Description: Number of FC_PAUSE events sent since the most recent status query.
wsrep_gcomm_uuid
Description: The UUID assigned to the node.
wsrep_incoming_addresses
Description: Comma-separated list of incoming server addresses in the cluster component.
wsrep_last_committed
Description: Sequence number of the most recently committed transaction.
wsrep_local_bf_aborts
Description: Total number of local transactions aborted by high-priority (brute-force) replication applier threads.
wsrep_local_cached_downto
Description: The lowest sequence number, or seqno, in the write-set cache (GCache).
wsrep_local_cert_failures
Description: Total number of local transactions that failed the certification test and consequently issued a voluntary rollback.
wsrep_local_commits
Description: Total number of local transactions committed on the node.
wsrep_local_index
Description: The node's index in the cluster. The index is zero-based.
wsrep_local_recv_queue
Description: Current length of the receive queue, which is the number of write sets waiting to be applied.
wsrep_local_recv_queue_avg
Description: Average length of the receive queue since the most recent status query. If this value is noticeably larger than zero, the node is likely to be overloaded and cannot apply the write sets as quickly as they arrive, resulting in replication throttling.
wsrep_local_recv_queue_max
Description: The maximum length of the recv queue since the last FLUSH STATUS command.
wsrep_local_recv_queue_min
Description: The minimum length of the recv queue since the last FLUSH STATUS command.
wsrep_local_replays
Description: Total number of transaction replays due to asymmetric lock granularity.
wsrep_local_send_queue
Description: Current length of the send queue, which is the number of write sets waiting to be sent.
wsrep_local_send_queue_avg
Description: Average length of the send queue since the most recent status query. If this value is noticeably larger than zero, there are most likely network throughput or replication throttling issues.
wsrep_local_send_queue_max
Description: The maximum length of the send queue since the last FLUSH STATUS command.
wsrep_local_send_queue_min
Description: The minimum length of the send queue since the last FLUSH STATUS command.
wsrep_local_state
Description: Internal Galera Cluster FSM state number.
wsrep_local_state_comment
Description: Human-readable explanation of the state.
wsrep_local_state_uuid
Description: The node's UUID state. If it matches the value in wsrep_cluster_state_uuid, the local and cluster nodes are in sync.
wsrep_open_connections
Description: The number of open connection objects inside the wsrep provider.
wsrep_open_transactions
Description: The number of locally running transactions that have been registered inside the wsrep provider. This means transactions that have made operations that have caused write set population to happen. Transactions that are read-only are not counted.
wsrep_protocol_version
Description: The wsrep protocol version being used.
wsrep_provider_name
Description: The name of the provider. The default is "Galera".
wsrep_provider_vendor
Description: The vendor string.
wsrep_provider_version
Description: The version number of the Galera wsrep provider.
wsrep_ready
Description: Whether or not the Galera wsrep provider is ready. Possible values are ON or OFF.
wsrep_received
Description: Total number of write sets received from other nodes.
wsrep_received_bytes
Description: Total size in bytes of all write sets received from other nodes.
wsrep_repl_data_bytes
Description: Total size of data replicated.
wsrep_repl_keys
Description: Total number of keys replicated.
wsrep_repl_keys_bytes
Description: Total size of keys replicated.
wsrep_repl_other_bytes
Description: Total size of other bits replicated.
wsrep_replicated
Description: Total number of write sets replicated to other nodes.
wsrep_replicated_bytes
Description: Total size in bytes of all write sets replicated to other nodes.
wsrep_rollbacker_thread_count
Description: Stores the current number of rollbacker threads to make clear how many slave threads of this type there are.
wsrep_thread_count
Description: Total number of wsrep (applier/rollbacker) threads.
This page is licensed: CC BY-SA / Gnu FDL
Configuring MariaDB Replication between Two MariaDB Galera Clusters
MariaDB replication can be used for replication between two MariaDB Galera Clusters. This article will discuss how to do that.
Configuring the Clusters
Before we set up replication, we need to ensure that the clusters are configured properly. This involves the following steps:
Galera Use Cases
MariaDB Galera Cluster ensures high availability and disaster recovery through synchronous multi-master replication. It's ideal for active-active setups, providing strong consistency and automatic failover, perfect for critical applications needing continuous uptime.
To understand these use cases, it helps to see how Galera's core features are related:
High Availability (HA) for Mission-Critical Applications
Galera's core strength is its synchronous replication, ensuring that data is written to all nodes simultaneously. This makes it ideal for applications where data loss is unacceptable and downtime must be minimal.
As a related hardening measure, recent Galera versions support an allowlist of node IP addresses, stored in the mysql.wsrep_allowlist table, which has a single ip column:

```
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| ip    | char(64) | NO   | PRI | NULL    |       |
+-------+----------+------+-----+---------+-------+
```

An address is permitted to join the cluster by inserting it:

```sql
INSERT INTO mysql.wsrep_allowlist(ip) VALUES('18.193.102.155');
```

Connections from addresses not on the allowlist are rejected with a warning such as:

```
[Warning] WSREP: Connection not allowed, IP 3.70.155.51 not found in allowlist.
```

The cluster size can then be verified with:

```shell
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
```
log_bin needs to be set to the same path on all nodes in the cluster. See MDEV-9856 about that.
Copy the Backup to the Replica
Once the backup is done and prepared, you can copy it to the MariaDB Server that will be acting as replica. For example:
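For example, with rsync; the host name and paths are placeholders:

```shell
# Copy the prepared backup to the replica host
rsync -avP /var/mariadb/backup replica.example.com:/var/mariadb/backup
```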
3
Restore the Backup on the Second Cluster's Replica
At this point, you can restore the backup to the replica's datadir, as you normally would. For example:
And adjusting file permissions, if necessary:
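A sketch of the restore and permission steps, assuming the default datadir location:

```shell
# Restore the prepared backup into the (empty) datadir
mariadb-backup --copy-back --target-dir=/var/mariadb/backup/

# Fix ownership so the server can read its files
chown -R mysql:mysql /var/lib/mysql/
```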
Start Replication on the New Replica
At this point, you need to get the replication coordinates of the primary from the original backup.
The coordinates will be in the xtrabackup_binlog_info file.
mariadb-backup dumps replication coordinates in two forms: GTID coordinates, and file and position coordinates, like the ones you would normally see from SHOW MASTER STATUS output. In this case, it is probably better to use the GTID coordinates.
For example:
Regardless of the coordinates you use, you will have to set up the primary connection using CHANGE MASTER TO and then start the replication threads with START SLAVE.
If you want to use GTIDs, then you will have to first set gtid_slave_pos to the GTID coordinates that we pulled from the xtrabackup_binlog_info file, and we would set MASTER_USE_GTID=slave_pos in the CHANGE MASTER TO command. For example:
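A sketch of the GTID-based setup; the GTID value, host, and credentials are placeholders:

```sql
SET GLOBAL gtid_slave_pos = '0-1-1';
CHANGE MASTER TO
   MASTER_HOST = 'cluster-node1.example.com',
   MASTER_USER = 'repl',
   MASTER_PASSWORD = 'password',
   MASTER_USE_GTID = slave_pos;
START SLAVE;
```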
If you want to use the file and position coordinates, then you would set MASTER_LOG_FILE and MASTER_LOG_POS in the CHANGE MASTER TO command to the file and position coordinates that we pulled from the xtrabackup_binlog_info file. For example:
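A sketch of the file-and-position-based setup; the log file name, position, host, and credentials are placeholders:

```sql
CHANGE MASTER TO
   MASTER_HOST = 'cluster-node1.example.com',
   MASTER_USER = 'repl',
   MASTER_PASSWORD = 'password',
   MASTER_LOG_FILE = 'mariadb-bin.000096',
   MASTER_LOG_POS = 568;
START SLAVE;
```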
3
Check the Status of the New Replica
You should be done setting up the replica now, so you should check its status with SHOW SLAVE STATUS. For example:
Now that the MariaDB Server is up, ensure that it does not start accepting writes yet if you want to set up circular replication between the cluster and the MariaDB Server.
Start Circular Replication on the Cluster
How this is done would depend on whether you want to use the coordinates or the file and position coordinates.
Regardless, you need to ensure that the second cluster is not accepting any writes other than those that it replicates from the cluster at this stage.
To get the GTID coordinates on the MariaDB Server, you can check gtid_current_pos by executing:
Then on the node acting as a replica in the cluster, you can set up replication by setting gtid_slave_pos to the GTID that was returned and then executing CHANGE MASTER TO:
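A sketch of both sides of this step; the GTID value, host, and credentials are placeholders:

```sql
-- On the MariaDB Server (the primary for this direction):
SELECT @@GLOBAL.gtid_current_pos;

-- On the cluster node acting as replica:
SET GLOBAL gtid_slave_pos = '1-2-3';
CHANGE MASTER TO
   MASTER_HOST = 'mariadb-server.example.com',
   MASTER_USER = 'repl',
   MASTER_PASSWORD = 'password',
   MASTER_USE_GTID = slave_pos;
START SLAVE;
```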
To get the file and position coordinates on the MariaDB Server, you can execute SHOW MASTER STATUS:
Then on the node acting as a replica in the cluster, you would set master_log_file and master_log_pos in the CHANGE MASTER TO command. For example:
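A sketch using file and position coordinates; the values are placeholders:

```sql
-- On the MariaDB Server:
SHOW MASTER STATUS;

-- On the cluster node acting as replica:
CHANGE MASTER TO
   MASTER_HOST = 'mariadb-server.example.com',
   MASTER_USER = 'repl',
   MASTER_PASSWORD = 'password',
   MASTER_LOG_FILE = 'mariadb-bin.000001',
   MASTER_LOG_POS = 4;
START SLAVE;
```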
3
Check the Status of the Circular Replication
You should be done setting up the circular replication on the node in the first cluster now, so you should check its status with SHOW SLAVE STATUS. For example:
wsrep_gtid_domain_id needs to be set to the same value on all nodes in a given cluster so that each cluster node uses the same domain when assigning GTIDs for Galera Cluster's write sets. Each cluster should have this set to a different value so that each cluster uses different domains when assigning GTIDs for their write sets.
log_slave_updates needs to be enabled on all nodes in the cluster. See MDEV-9855 about that.
log_bin needs to be set to the same path on all nodes in the cluster. See MDEV-9856 about that.
And as an extra safety measure:
gtid_domain_id should be set to a different value on all nodes in a given cluster, and each of these values should be different than the configured wsrep_gtid_domain_id value. This is to prevent a node from using the same domain used for Galera Cluster's write sets when assigning GTIDs for non-Galera transactions, such as DDL executed with wsrep_OSU_method=RSU set or DML executed with wsrep_on=OFF set.
Configuring Parallel Replication
To improve the performance of the replication stream between clusters, it is recommended to enable parallel replication on the nodes in the destination cluster (the cluster acting as the replica).
Setting up Replication
Our process to set up replication is going to be similar to the standard process for setting up a replica from a backup, but it will be modified a bit to work in this context.
1
Start the First Cluster
The very first step is to start the nodes in the first cluster. The first node will have to be bootstrapped. The other nodes can be started normally.
Once the nodes are started, you need to pick a specific node that will act as the replication primary for the second cluster.
2
Backup the Database on the First Cluster's Primary Node and Prepare It
The first step is to simply take and prepare a fresh full backup of the node that you have chosen to be the replication primary. For example:
And then you would prepare the backup as you normally would. For example:
3
Copy the Backup to the Second Cluster's Replica
Once the backup is done and prepared, you can copy it to the node in the second cluster that will be acting as replica. For example:
4
Restore the Backup on the Second Cluster's Replica
At this point, you can restore the backup to the replica's datadir, as you normally would. For example:
And adjusting file permissions, if necessary:
5
Bootstrap the Second Cluster's Replica
Now that the backup has been restored to the second cluster's replica, you can start the server by bootstrapping the node.
6
Create a Replication User on the First Cluster's Primary
Before the second cluster's replica can begin replicating from the first cluster's primary, you need to create a user account on the primary that the replica can use to connect, and you need to grant the user account the REPLICATION SLAVE privilege. For example:
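As before, a hypothetical replication account on the first cluster's primary; the user name, host, and password are placeholders:

```sql
CREATE USER 'repl'@'dc2.example.com' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'dc2.example.com';
```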
7
Start Replication on the Second Cluster's Replica
At this point, you need to get the replication coordinates of the primary from the original backup.
The coordinates will be in the xtrabackup_binlog_info file.
mariadb-backup dumps replication coordinates in two forms: GTID coordinates, and file and position coordinates.
8
Check the Status of the Second Cluster's Replica
You should be done setting up the replica now, so you should check its status with SHOW SLAVE STATUS. For example:
9
Start the Second Cluster
If the replica is replicating normally, then the next step would be to start the MariaDB Server process on the other nodes in the second cluster.
Now that the second cluster is up, ensure that it does not start accepting writes yet if you want to set up circular replication between the two clusters.
Setting up Circular Replication
You can also set up circular replication between the two clusters, which means that the second cluster replicates from the first cluster, and the first cluster also replicates from the second cluster.
1
Create a Replication User on the Second Cluster's Primary
Before circular replication can begin, you also need to create a user account on the second cluster's primary that the first cluster's replica can use to connect, and you need to grant the user account the REPLICATION SLAVE privilege. For example:
2
Start Circular Replication on the First Cluster
How this is done would depend on whether you want to use the coordinates or the file and position coordinates.
Regardless, you need to ensure that the second cluster is not accepting any writes other than those that it replicates from the first cluster at this stage.
To get the GTID coordinates on the second cluster, you can check gtid_current_pos by executing:
3
Check the Status of the Circular Replication
You should be done setting up the circular replication on the node in the first cluster now, so you should check its status with SHOW SLAVE STATUS. For example:
Financial Trading Platforms: These systems demand immediate data consistency across all read and write operations.
E-commerce and Online Retail: Ensures immediate consistency in inventory levels, shopping carts, and order statuses.
Billing and CRM Systems: Applications where customer data must be continuously available and instantly up-to-date, 24/7.
This diagram shows how a proxy like MaxScale handles a node failure. The application is shielded from the downtime, and traffic is automatically rerouted to the healthy nodes.
How It Really Works: The "Synchronous" Nuance
When you hear "synchronous," it doesn't mean every node writes to disk at the exact same millisecond. The process is more elegant:
A client sends a COMMIT to one node (e.g., Node A).
Node A packages the transaction and replicates it to Node B and Node C.
Node B and Node C check the transaction for conflicts (called certification) and signal "OK" back to Node A.
Only after Node A gets an "OK" from all other nodes does it tell the client, "Your transaction is committed."
All nodes then apply the write.
As a result, the data is "safe" on all nodes before the application is ever told the write was successful.
In-Depth Use Case: E-commerce Inventory Control
You have one "Super-Widget" left in stock. Two customers, accessing different nodes, click "Buy" simultaneously.
Without Galera (Traditional Replication):
You risk selling the widget twice due to replication lag.
With Galera Cluster:
Both "buy" transactions (UPDATE inventory SET stock=0...) are sent for cluster certification. The cluster instantly detects the conflict:
One transaction "wins" certification and commits.
The other transaction fails certification and gets a "deadlock" error.
Result: Data integrity is fully maintained.
Always Use a Proxy. Your application shouldn't know about individual nodes. Place a cluster-aware proxy like MariaDB MaxScale in front of your cluster.
Design for 3 (or 5). A Galera Cluster needs a minimum of three nodes (or any odd number) to maintain quorum—the ability to have a "majority vote" and avoid a "split-brain" scenario.
The Trade-Offs
Latency: The synchronous check adds a small amount of latency to every COMMIT.
Application Deadlocks: Your application must be built to handle "deadlock" errors by retrying the transaction.
Zero-Downtime Maintenance and Upgrades
Galera allows for a rolling restart of the cluster members. By taking one node down at a time, performing maintenance, and bringing it back up, the cluster remains operational.
Examples:
Continuous Operations Environments: Organizations with strict SLAs that prohibit maintenance windows.
Database Scaling and Infrastructure Changes: Adding or removing cluster nodes (scaling out or in) without interrupting service.
This flowchart shows the "rolling" process for a 3-node cluster.
How It Really Works: The "Graceful" Maintenance Process
MariaDB Maintenance Process
Isolate the Node
Configure your proxy (e.g., MaxScale) to stop routing new connections to the targeted node for maintenance.
Perform Maintenance
Safely stop the MariaDB service by executing systemctl stop mariadb. Proceed with applying OS patches or upgrading MariaDB binaries.
Restart & Resync
Upon restarting MariaDB, it will automatically synchronize with the cluster. The Incremental State Transfer (IST) ensures only the missed changes are applied.
Rejoin
After syncing, enable the node again in the proxy.
Repeat
Apply these steps to other nodes individually.
Reduced Capacity
While one node is down for maintenance, your 3-node cluster is temporarily running as a 2-node cluster. It's wise to perform maintenance during low-traffic periods.
IST vs. SST
The fast, automatic sync is IST (Incremental State Transfer). If a node is down for too long, it may trigger a State Snapshot Transfer (SST)—a full copy of the entire database. SSTs are resource-intensive.
Disaster Recovery and Geo-Redundancy
Galera can be deployed across multiple physical locations, providing a robust solution for disaster recovery by surviving the complete loss of one site.
Examples:
Multi-Data Center Deployment: Deploying a cluster across three or more geographically separated data centers.
Disaster Recovery Setup: Deploying one cluster in a data center using asynchronous replication to a second cluster in a separate data center.
In-Depth Look: Two Different DR Patterns
This use case covers two distinct architectures with different goals:
This is a single Galera cluster with nodes stretched across multiple data centers. A COMMIT in New York is not "OK'd" until the data is safely certified by the London node. This gives Zero Data Loss (RPO=0) but has a major performance impact.
This is the more common setup. A primary cluster in DC-1 runs at full speed. It asynchronously replicates its data to a separate node/cluster in DC-2. This is fast, but allows for minimal data loss (RPO > 0) in a disaster.
Choosing the right DR pattern
| Feature | Synchronous WAN Cluster | Asynchronous DR Cluster |
| --- | --- | --- |
| Primary Goal | 100% Data Consistency | Primary Site Performance |
| Data Loss (RPO) | Zero (RPO=0) | Seconds to Minutes (RPO > 0) |
| Performance Impact | Very High. All writes are as slow as the RTT to the farthest data center. | None. Primary cluster runs at local network speed. |
| Best For | Financials or other applications where data loss is impossible to tolerate. | |
Scaling Out Write Workloads (Limited)
While synchronous replication adds some overhead, Galera fundamentally allows any node to accept write queries. This is best combined with a proxy like MaxScale to intelligently distribute traffic.
Examples:
Load Balanced Read/Write Traffic: Using MaxScale's Read/Write Split Router to direct reads to any node and writes to a single "Primary" node.
High-Volume Write Environments: Suitable for applications with a high volume of concurrent, non-conflicting write operations.
Myth vs. Reality: Write Throughput in Distributed Systems
Myth: "With 3 nodes, I achieve 3x the write throughput."
Reality: False. Every write must be processed by all three nodes.
Nuance: Enjoy excellent read-scaling. Write scaling is only possible if writes are non-contended (not targeting the same rows).
In-Depth Use Case: The "Read-Write Split" Strategy (Recommended)
This is the most common and recommended architecture. MaxScale's readwritesplit router automatically designates one node as the "Primary" (for writes) and load-balances reads across the others. If the Primary node fails, MaxScale automatically promotes a new one.
| Strategy | "True Multi-Master" | "Read-Write Split" (Recommended) |
| --- | --- | --- |
| How it Works | The application (or proxy) sends writes to all nodes in the cluster. | A proxy (MaxScale) designates one node as "Primary" and sends 100% of writes to it. |
| Pros | Fully utilizes all nodes for writes; no single point of failure for write ingress. | No application deadlocks. Zero certification failures. Simple for the application. |
| Cons | High risk of deadlocks. If two clients update the same row on different nodes, one fails. | Write throughput is limited to what a single node can handle. |
| Best For | Very specific applications that are 100% guaranteed to have no write conflicts. | |
Read-Write Split Strategy
For most applications, using readwritesplit is the safest, most reliable, and effective strategy.
Keep Transactions Small: Large UPDATE operations on a single node can stall the entire cluster during the certification/commit phase.
Trade-Off: readwritesplit is not sharding. Galera focuses on high availability rather than infinite write-scaling. If your application demands more writes than a single powerful server can handle, consider implementing a sharded solution.
These topics will be discussed in more detail below.
Dear Schema Designer:
InnoDB only, always have PK.
Dear Developer:
Check for errors, even after COMMIT.
Moderate sized transactions.
Don't make assumptions about AUTO_INCREMENT values.
Handling of "critical reads" is quite different (arguably better).
Read/Write split is not necessary, but is still advised in case the underlying structure changes in the future.
Dear DBA:
Building the machines is quite different. (Not covered here)
ALTERs are handled differently.
TRIGGERs and EVENTs may need checking.
Overview of cross-colo writing
(This overview is valid even for same-datacenter nodes, but the issues of latency vanish.)
Cross-colo latency is 'different' with Galera than with traditional replication, but not necessarily better or worse. The latency happens at a very different time for Galera.
In 'traditional' replication, these steps occur:
Client talks to Master. If Client and Master are in different colos, this has a latency hit.
Each SQL statement sent to the Master is another latency hit, including(?) the COMMIT (unless using autocommit).
Replication to Slave(s) is asynchronous, so this does not impact the client writing to the Master.
In Galera-based replication:
Client talks to any Master -- possibly with cross-colo latency. Or you could arrange to have Galera nodes co-located with clients to avoid this latency.
At COMMIT time (or end of statement, in the case of autocommit=1), Galera makes one roundtrip to the other nodes.
The COMMIT usually succeeds, but could fail if some other node is messing with the same rows. (Galera retries on autocommit failures.)
For an N-statement transaction in a typical 'traditional' replication setup:
0 or N (N+2?) latency hits, depending on whether the Client is co-located with the Master.
Replication latencies and delays lead to issues with "Critical Reads".
In Galera:
0 latency hits (assuming Client is 'near' some node)
1 latency hit for the COMMIT.
0 (usually) for Critical Read (details below)
Bottom line: Depending on where your Clients are, and whether you clump statements into BEGIN...COMMIT transactions, Galera may be faster or slower than traditional replication in a WAN topology.
AUTO_INCREMENT
By using wsrep_auto_increment_control = ON, the values of auto_increment_increment and auto_increment_offset will be automatically adjusted as nodes come/go.
If you are building a Galera cluster by starting with one node as a Slave to an existing non-Galera system, and if you have multi-row INSERTs that depend on AUTO_INCREMENTs, then read this Percona blog
Bottom line: There may be gaps in AUTO_INCREMENT values. Consecutive rows, even on one connection, will not have consecutive ids.
Beware of Proxies that try to implement a "read/write split". In some situations, a reference to LAST_INSERT_ID() will be sent to a "Slave".
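On a three-node cluster with wsrep_auto_increment_control = ON, the adjusted settings might look like this; the reported values are illustrative:

```sql
SHOW VARIABLES LIKE 'auto_increment%';
-- Node 1 might report: auto_increment_increment = 3, auto_increment_offset = 1
-- Node 2:              auto_increment_increment = 3, auto_increment_offset = 2
-- Node 3:              auto_increment_increment = 3, auto_increment_offset = 3
-- So node 1 generates ids 1, 4, 7, ...; node 2 generates 2, 5, 8, ...; etc.
```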
InnoDB only
For effective replication of data, you must use only InnoDB. This eliminates
FULLTEXT index (until 5.6)
SPATIAL index
MyISAM's PK as second column
You can use MyISAM and MEMORY for data that does not need to be replicated.
Also, you should use "START TRANSACTION READ ONLY" wherever appropriate.
Check after COMMIT
Check for errors after issuing COMMIT. A "deadlock" can occur due to writes on other node(s).
Possible exception (could be useful for legacy code without such checks): Treat the system as single-Master, plus Slaves. By writing only to one node, COMMIT should always succeed(?)
What about autocommit = 1? wsrep_retry_autocommit tells Galera to retry a single autocommitted statement up to N times if it fails certification. So, there is still a chance (very slim) of getting a deadlock on such a statement. The default setting of "1" retry is probably good.
Always have PRIMARY KEY
"Row Based Replication" will be used; this requires a PK on every table. A non-replicated table (eg, MyISAM) does not have to have a PK.
Transaction "size"
(This section assumes you have Galera nodes in multiple colos.) Because of some of the issues discussed, it is wise to group your write statements into moderate sized BEGIN...COMMIT transactions. There is one latency hit per COMMIT or autocommit. So, combining statements will decrease those hits. On the other hand, it is unwise (for other reasons) to make huge transactions, such as inserting/modifying millions of rows in a single transaction.
To deal with failure on COMMIT, design your code so you can redo the SQL statements in the transaction without messing up other data. For example, move "normalization" statements out of the main transaction; there is arguably no compelling reason to roll them back if the main code rolls back.
In any case, doing what is "right" for the business logic overrides other considerations.
Galera's effective transaction isolation level is between SERIALIZABLE and REPEATABLE READ; the tx_isolation variable is ignored.
Set wsrep_log_conflicts to get errors put in the regular MySQL mysqld.err.
XA transactions cannot be supported. (Galera is already doing a form of XA in order to do its thing.)
Critical reads
Here is a 'simple' (but not 'free') way to assure that a read-after-write, even from a different connection, will see the updated data.
Set wsrep_sync_wait before the first SELECT that must see the write; for non-SELECT statements, a different bit of the bitmask applies. (TBD: Would 0xffff always work?) (Before Galera 3.6, this was wsrep_causal_reads = ON.) See the documentation for wsrep_sync_wait.
This setting stalls the SELECT until all current updates have been applied to the node. That is sufficient to guarantee that a previous write will be visible. The time cost is usually zero. However, a large UPDATE could lead to a delay. Because of RBR and parallel application, delays are likely to be less than on traditional replication. Zaitsev's blog
It may be more practical (for a web app) to simply set wsrep_sync_wait right after connecting.
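A sketch of the critical-read pattern described above (table and column names are hypothetical):

```sql
-- Stall this SELECT until all write sets already committed anywhere
-- in the cluster have been applied on this node.
SET SESSION wsrep_sync_wait = 1;   -- bit 1 covers READ statements
SELECT balance FROM accounts WHERE id = 42;
SET SESSION wsrep_sync_wait = 0;   -- turn the stall back off
```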
MyISAM and MEMORY
As said above, use InnoDB only. However, here is more info on the MyISAM (and hence FULLTEXT, SPATIAL, etc) issues. MyISAM and MEMORY tables are not replicated.
Having MyISAM not replicated can be a big benefit -- you can "CREATE TEMPORARY TABLE ... ENGINE=MyISAM" and have it exist on only one node. RBR assures that any data transferred from that temp table into a 'real' table can still be replicated.
Replicating GRANTs
GRANTs and related operations act on the MyISAM tables in the database mysql. The GRANT statements will(?) be replicated, but the underlying tables will not.
ALTERs
Many DDL changes on Galera can be achieved without downtime, even if they take a long time. There are two approaches:
Rolling Schema Upgrade (RSU): manually execute the DDL on each node in the cluster. The node will desync while executing the DDL.
Total Order Isolation (TOI): Galera automatically replicates the DDL to each node in the cluster, and it synchronizes each node so that the statement is executed at the same time (in the replication sequence) on all nodes.
Caution: Since there is no way to synchronize the clients with the DDL, you must make sure that the clients are happy with either the old or the new schema. Otherwise, you will probably need to take down the entire cluster while simultaneously switching over both the schema and the client code.
Fast DDL operations can usually be executed in TOI mode:
DDL operations that support the NOCOPY and INSTANT algorithms are usually very fast.
DDL operations that support the INPLACE algorithm may be fast or slow, depending on whether the table needs to be rebuilt.
For a list of which operations support which algorithms, see .
If you need to use RSU mode, then do the following separately for each node:
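A sketch of one RSU round on a single node (repeat on each node in turn; the table and column are illustrative):

```sql
SET SESSION wsrep_OSU_method = 'RSU';  -- this node desyncs while running the DDL
ALTER TABLE t ADD COLUMN c INT;        -- executed locally, not replicated
SET SESSION wsrep_OSU_method = 'TOI';  -- restore the default method
```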
Single "Master" Configuration
You can 'simulate' Master + Slaves by having clients write only to one node.
No need to check for errors after COMMIT.
Lose the latency benefits.
DBA tricks
Remove node from cluster; back it up; put it back in. Syncup is automatic.
Remove node from cluster; use it for testing, etc; put it back in. Syncup is automatic.
Rolling hardware/software upgrade: Remove; upgrade; put back in. Repeat.
Variables that may need to be different
- If you are writing to multiple nodes, and you use AUTO_INCREMENT, then auto_increment_increment will automatically be set equal to the current number of nodes.
/ - Do not use.
- ROW is required for Galera.
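The corresponding configuration-file settings might look like this (Galera manages the auto_increment variables itself while wsrep_auto_increment_control is ON, the default):

```ini
[mysqld]
binlog_format = ROW   # ROW is required for Galera
# auto_increment_increment / auto_increment_offset are adjusted
# automatically while wsrep_auto_increment_control = ON
```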
Miscellany
Until recently, FOREIGN KEYs were buggy.
LOAD DATA is auto chunked. That is, it is passed to other nodes piecemeal, not all at once.
DROP USER may not replicate?
A slight difference in ROLLBACK on conflict: InnoDB rolls back the smaller transaction; Galera rolls back the last one.
SET GLOBAL wsrep_debug = 1; leads to a lot of debug info in the error log.
Large UPDATEs / DELETEs should be broken up. This admonition is valid for all databases, but there are additional issues in Galera.
WAN: May need to increase (from the defaults) wsrep_provider_options = evs...
MySQL/Percona 5.6 or MariaDB 10 is recommended when going to Galera.
GTIDs
See .
How many nodes to have in a cluster
If all the servers are in the same 'vulnerability zone' -- eg, rack or data center -- have an odd number (at least 3) of nodes.
When spanning colos, you need 3 (or more) data centers in order to be 'always' up, even during a colo failure. With only 2 data centers, Galera can automatically recover from one colo outage, but not the other. (You pick which.)
If you use 3 or 4 colos, these numbers of nodes per colo are safe:
3 nodes: 1+1+1 (1 node in each of 3 colos)
4 nodes: 1+1+1+1 (4 nodes won't work in 3 colos)
5 nodes: 2+2+1, 2+1+1+1 (5 nodes spread 'evenly' across the colos)
Postlog
Posted 2013; VARIABLES: 2015; Refreshed Feb. 2016
See also
Rick James graciously allowed us to use this article in the documentation.
His site has other useful tips, how-tos, optimizations, and debugging tips.
Original source:
This page is licensed: CC BY-SA / Gnu FDL
Introduction to State Snapshot Transfers (SSTs)
In a State Snapshot Transfer (SST), the cluster provisions nodes by transferring a full data copy from one node to another. When a new node joins the cluster, the new node initiates a State Snapshot Transfer to synchronize its data with a node that is already part of the cluster.
Types of SSTs
There are two conceptually different ways to transfer a state from one MariaDB server to another:
Logical: The only SST method of this type is the mysqldump SST method, which uses the mysqldump utility to get a logical dump of the donor. This SST method requires the joiner node to be fully initialized and ready to accept connections before the transfer. This method is, by definition, blocking, in that it blocks the donor node from modifying its state for the duration of the transfer. It is also the slowest of all, which might be an issue for a cluster under heavy load.
Physical: SST methods of this type physically copy the data files from the donor node to the joiner node. This requires that the joiner node be initialized after the transfer. The SST method and a few other SST methods fall into this category. These SST methods are much faster than the mysqldump SST method, but they have certain limitations. For example, they can be used only on server startup, and the joiner node must be configured very similarly to the donor node (e.g., should be the same, and so on). Some of the SST methods in this category are non-blocking on the donor node, meaning that the donor node is still able to process queries while donating the SST (e.g. the SST method is non-blocking).
SST Methods
SST methods are supported via a scriptable interface. New SST methods could potentially be developed by creating new SST scripts. The scripts usually have names of the form wsrep_sst_<method> where <method> is one of the SST methods listed below.
You can choose your SST method by setting the system variable. It can be changed dynamically with on the node that you intend to be an SST donor. For example:
It can also be set in a server in an prior to starting up a node:
For an SST to work properly, the donor and joiner node must use the same SST method. Therefore, it is recommended to set to the same value on all nodes, since any node will usually be a donor or joiner node at some point.
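For instance, the method could be set dynamically on the intended donor node:

```sql
SET GLOBAL wsrep_sst_method = 'mariadb-backup';
```

or, equivalently, persisted as `wsrep_sst_method = mariadb-backup` under the `[mysqld]` (or `[galera]`) option group in a configuration file before starting the node.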
MariaDB Galera Cluster comes with the following built-in SST methods:
mariadb-backup
This SST method uses the mariadb-backup utility for performing SSTs. It is one of the two non-blocking methods. This is the recommended SST method if you require the ability to run queries on the donor node during the SST. Note that if you use the mariadb-backup SST method, then you also need to have socat installed on the server. This is needed to stream the backup from the donor to the joiner. This is a limitation inherited from the xtrabackup-v2 SST method.
This SST method supports
This SST method supports .
This SST method is available from and .
With this SST method, it is impossible to upgrade the cluster between some major versions; see .
See for more information.
rsync / rsync_wan
rsync is the default method. This method uses the utility to create a snapshot of the donor node. rsync should be available by default on all modern Linux distributions. The donor node is blocked with a read lock during the SST. This is the fastest SST method, especially for large datasets since it copies binary data. Because of that, this is the recommended SST method if you do not need to allow the donor node to execute queries during the SST.
The rsync method runs rsync in --whole-file mode, on the assumption that nodes are connected by fast local network links, so that the default delta-transfer mode would consume more processing time than it saves in data transfer bandwidth. If you have a distributed cluster with slow links between nodes, the rsync_wan method runs rsync in the default delta-transfer mode, which may reduce data transfer time substantially when an older datadir state is already present on the joiner node. Both methods are actually implemented by the same script; wsrep_sst_rsync_wan is just a symlink to the wsrep_sst_rsync script, and the actual rsync mode to use is determined by the name by which the script was called.
This SST method supports
This SST method supports .
The rsync SST method does not support tables created with the clause. Use the as an alternative to support this feature.
Use of this SST method could result in data corruption when using (the default). wsrep_sst_method=rsync is a reliable way to upgrade the cluster to a newer major version.
can be used to encrypt data over the wire. Be sure to have stunnel installed. You will also need to generate certificates and keys. See for information on how to do that. Once you have the keys, you will need to add the tkey and tcert options to the [sst] option group in your MariaDB configuration file, such as:
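An illustrative [sst] section (the paths are placeholders for your own key and certificate files):

```ini
[sst]
tkey  = /etc/my.cnf.d/certificates/client-key.pem
tcert = /etc/my.cnf.d/certificates/client-cert.pem
```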
You also need to run the certificate directory through .
cannot be used to encrypt data over the wire.
mysqldump
This SST method runs mysqldump on the donor node and pipes the output to the client connected to the joiner node. The mysqldump SST method needs a username/password pair set in the variable in order to get the dump. The donor node is blocked with a read lock during the SST. This is the slowest SST method.
This SST method supports .
This SST method supports .
xtrabackup-v2
Percona XtraBackup is not supported in MariaDB. is the recommended backup method to use instead of Percona XtraBackup. See for more information.
This SST method uses the utility for performing SSTs. It is one of the two non-blocking methods. Note that if you use the xtrabackup-v2 SST method, you also need to have socat installed on the server. Since Percona XtraBackup is a third-party product, this SST method requires an additional installation and some additional configuration. Please refer to for information from the vendor.
This SST method does not support
This SST method does not support .
This SST method is available from MariaDB Galera Cluster 5.5.37 and MariaDB Galera Cluster 10.0.10.
See xtrabackup-v2 SST method for more information.
xtrabackup
Percona XtraBackup is not supported in MariaDB. is the recommended backup method to use instead of Percona XtraBackup. See for more information.
This SST method is an older SST method that uses the utility for performing SSTs. The xtrabackup-v2 SST method should be used instead of the xtrabackup SST method starting from .
This SST method does not support
This SST method does not support .
Authentication
All SST methods except rsync require authentication via username and password. You can tell the client what username and password to use by setting the system variable. It can be changed dynamically with on the node that you intend to be an SST donor. For example:
It can also be set in a server in an prior to starting up a node:
Some authentication plugins do not require a password. For example, the and authentication plugins do not require a password. If you are using a user account that does not require a password in order to log in, then you can just leave the password component empty. For example:
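As a sketch, with hypothetical user names:

```sql
-- Username and password, separated by a colon
SET GLOBAL wsrep_sst_auth = 'sst_user:sst_password';
-- Passwordless account (e.g. socket-based authentication):
-- leave the password component empty
SET GLOBAL wsrep_sst_auth = 'sst_user:';
```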
See the relevant description or page for each SST method to find out what privileges need to be granted to the user and whether the privileges are needed on the donor node or joiner node for that method.
SSTs and Systemd
MariaDB's unit file has a default startup timeout of about 90 seconds on most systems. If an SST takes longer than this default startup timeout on a joiner node, then systemd will assume that mysqld has failed to start up, which causes systemd to kill the mysqld process on the joiner node. To work around this, you can reconfigure the MariaDB systemd unit to have an infinite timeout, such as by executing one of the following commands:
If you are using systemd 228 or older, then you can execute the following to set an infinite timeout:
Systemd 229 added support for TimeoutStartSec=infinity, so if you are using systemd 229 or later, then you can execute the following to set an infinite timeout:
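For example, a systemd drop-in file (created with `systemctl edit mariadb.service`) could contain:

```ini
[Service]
# systemd 228 and older: a value of 0 disables the start timeout
# systemd 229 and later: 'infinity' is also accepted
TimeoutStartSec=infinity
```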
See for more details.
Note that systemd also provides a mechanism that allows services to extend the startup timeout during long-running processes. Starting with , , and , on systems with systemd versions that support it, MariaDB uses this feature to extend the startup timeout during long SSTs. Therefore, if you are using systemd 236 or later, then you should not need to manually override TimeoutStartSec, even if your SSTs run for longer than the configured value. See for more information.
SST Failure
An SST failure generally renders the joiner node unusable. Therefore, when an SST failure is detected, the joiner node will abort.
Restarting a node after a mysqldump SST failure may require manual restoration of the administrative tables.
SSTs and Data at Rest Encryption
Look at the description of each SST method to determine which methods support .
For logical SST methods like mysqldump, each node should be able to have different . For physical SST methods, all nodes need to have the same , since the donor node will copy encrypted data files to the joiner node, and the joiner node will need to be able to decrypt them.
Minimal Cluster Size
In order to avoid a split-brain condition, the minimum recommended number of nodes in a cluster is 3.
When using an SST method that blocks the donor, there is yet another reason to require a minimum of 3 nodes. In a 3-node cluster, if one node is acting as an SST joiner and one other node is acting as an SST donor, then there is still one more node to continue executing queries.
Manual SSTs
In some cases, if Galera Cluster's automatic SSTs repeatedly fail, then it can be helpful to perform a "manual SST". See the following pages on how to do that:
Known Issues
mysqld_multi
SST scripts can't currently read the mysqld<#> option groups in an option file that are read by instances managed by mysqld_multi.
CREATE USER 'repl'@'dc2-dbserver1' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'dc2-dbserver1';
CREATE USER 'repl'@'c1dbserver1' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'c1dbserver1';
SET GLOBAL slave_parallel_threads = 4; -- Adjust based on workload
SET GLOBAL slave_parallel_mode = 'optimistic';
and file and position coordinates, like the ones you would normally see from output. In this case, it is probably better to use the coordinates.
For example:
Regardless of the coordinates you use, you will have to set up the primary connection using and then start the replication threads with .
If you want to use GTIDs, then you will have to first set to the coordinates that we pulled from the file, and we would set MASTER_USE_GTID=slave_pos in the command. For example:
If you want to use the file and position coordinates, then you would set MASTER_LOG_FILE and MASTER_LOG_POS in the command to the file and position coordinates that we pulled the file. For example:
Then on the first cluster, you can set up replication by setting to the GTID that was returned and then executing :
To get the file and position coordinates on the second cluster, you can execute :
Then on the first cluster, you would set master_log_file and master_log_pos in the command. For example:
Tricks in replication (eg, BLACKHOLE) may not work.
Several variables need to be set differently.
Since replication is asynchronous, a client (on the same or a subsequent connection) cannot be guaranteed to see that data on the slave. This is a "critical read". The async replication delay forces apps to take some evasive action.
Failure of the COMMIT is reported to the Client, who should simply replay the SQL statements from the BEGIN.
Later, the whole transaction will be applied (with possibility of conflict) on the other nodes.
Critical Read -- details below
DDL operations that only support the COPY algorithm are usually very slow.
- 2
- ON: When an IST occurs, want there to be no torn pages? (With FusionIO or other drives that guarantee atomicity, OFF is better.)
- 2 or 0. IST or SST will recover from loss if you have 1.
wsrep_sync_wait (previously wsrep_causal_reads) - used transiently to deal with "critical reads".
6 nodes: 2+2+2, 2+2+1+1
7 nodes: 3+2+2, 3+3+1, 2+2+2+1, 3+2+1+1
There may be a way to "weight" the nodes differently; that would allow a few more configurations. With "weighting", give each colo the same weight; then subdivide the weight within each colo evenly. Four nodes in 3 colos: (1/6+1/6) + 1/3 + 1/3 That way, any single colo failure cannot lead to "split brain".
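Galera exposes node weighting through the pc.weight provider option. A sketch for the "four nodes in three colos" example above, with the fractional weights scaled to integers (1/6 → 1, 1/3 → 2):

```ini
# On each of the two nodes sharing a colo:
wsrep_provider_options = "pc.weight=1"
# On the single node in each of the other two colos:
wsrep_provider_options = "pc.weight=2"
# Total weight = 1+1+2+2 = 6; losing any one colo leaves a majority.
```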
The current version of the Galera wsrep provider library is 26.4.21 for Galera 4. For convenience, packages containing this library are included in the MariaDB repositories.
Currently, MariaDB Galera Cluster only supports the storage engine (although there is experimental support for and, from , ).
Galera Cluster Support in MariaDB Server
MariaDB Galera Cluster is powered by:
MariaDB Server.
The patch for MySQL Server and MariaDB Server. The patch currently supports only Unix-like operating systems.
The .
The patch has been merged into MariaDB Server. This means that the functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider library package. The following version corresponds to each MariaDB Server version:
MariaDB Galera Cluster uses Galera 4. This means that the wsrep API is version 26 and the Galera wsrep provider library is version 4.
See for more information about how to interpret these version numbers.
See for more information about which specific version is included in each release of MariaDB Server.
In supported builds, Galera Cluster functionality can be enabled by setting some configuration options that are mentioned below. Galera Cluster functionality is not enabled in a standard MariaDB Server installation unless explicitly enabled with these configuration options.
Prerequisites
Swap Size Requirements
During normal operation, a MariaDB Galera node consumes no more memory than a regular MariaDB server. Additional memory is consumed for the certification index and uncommitted write sets, but normally this should not be noticeable in a typical application. There is one exception, though:
Writeset caching during state transfer
When a node is receiving a state transfer, it cannot process and apply incoming writesets because it has no state to apply them to yet. Depending on the state transfer mechanism, the node that sends the state transfer may not be able to apply writesets either. Thus, both nodes need to cache those writesets for a catch-up phase. Currently, the writesets are cached in memory, and if the system runs out of memory, either the state transfer will fail or the cluster will block waiting for the state transfer to end.
To control memory usage for writeset caching, check the following Galera parameters: gcs.recv_q_hard_limit, gcs.recv_q_soft_limit, and gcs.max_throttle.
Limitations
Before using MariaDB Galera Cluster, we would recommend reading through the , so you can be sure that it is appropriate for your application.
Installing MariaDB Galera Cluster
To use MariaDB Galera Cluster, there are two primary packages that you need to install:
A MariaDB Server version that supports Galera Cluster.
The Galera wsrep provider library.
As mentioned in the previous section, Galera Cluster support is actually included in the standard MariaDB Server packages. That means that installing the MariaDB Galera Cluster package is the same as installing the standard MariaDB Server package in those versions. However, you will also have to install an additional package to obtain the Galera wsrep provider library.
Some SST methods may also require additional packages to be installed. The SST method is generally the best option for large clusters that expect a heavy load.
Installing MariaDB Galera Cluster with a Package Manager
MariaDB Galera Cluster can be installed via a package manager on Linux. In order to do so, your system needs to be configured to install from one of the MariaDB repositories.
You can configure your package manager to install it from MariaDB Corporation's MariaDB Package Repository by using the .
You can also configure your package manager to install it from MariaDB Foundation's MariaDB Repository by using the .
Installing MariaDB Galera Cluster with yum/dnf
On RHEL, CentOS, Fedora, and other similar Linux distributions, it is highly recommended to install the relevant from MariaDB's
repository using or . Starting with RHEL 8 and Fedora 22, yum has been replaced by dnf, which is the next major version of yum. However, yum commands still work on many systems that use dnf.
To install MariaDB Galera Cluster with yum or dnf, follow the instructions at .
Installing MariaDB Galera Cluster with apt-get
On Debian, Ubuntu, and other similar Linux distributions, it is highly recommended to install the relevant from MariaDB's
repository using .
To install MariaDB Galera Cluster with apt-get, follow the instructions at .
Installing MariaDB Galera Cluster with zypper
On SLES, OpenSUSE, and other similar Linux distributions, it is highly recommended to install the relevant from MariaDB's repository using .
To install MariaDB Galera Cluster with zypper, follow the instructions at .
Installing MariaDB Galera Cluster with a Binary Tarball
To install MariaDB Galera Cluster with a binary tarball, follow the instructions at .
To make the location of the libgalera_smm.so library in binary tarballs more similar to its location in other packages, the library is now found at lib/galera/libgalera_smm.so in the binary tarballs, and there is a symbolic link in the lib directory that points to it.
Installing MariaDB Galera Cluster from Source
To install MariaDB Galera Cluster by compiling it from source, you will have to compile both MariaDB Server and the Galera wsrep provider library. For some information on how to do this, see the pages at . The pages at and Galera Cluster Documentation: Building Galera Cluster for MySQL may also be helpful.
Configuring MariaDB Galera Cluster
A number of options need to be set in order for Galera Cluster to work when using MariaDB. See for more information.
Bootstrapping a New Cluster
The first node of a new cluster needs to be bootstrapped by starting on that node with the option. This option tells the node that there is no existing cluster to connect to. The node will create a new UUID to identify the new cluster.
Do not use the option when connecting to an existing cluster. Restarting the node with this option set will cause the node to create a new UUID to identify the cluster again, and the node won't reconnect to the old cluster. See the next section about how to reconnect to an existing cluster.
For example, if you were manually starting on a node, then you could bootstrap it by executing the following:
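A sketch of a manual bootstrap (the server binary is named mysqld in older releases; mariadbd in newer ones):

```shell
# Bootstrap the first node of a brand-new cluster
mariadbd --wsrep-new-cluster
```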
However, keep in mind that most users are not going to be starting manually. Instead, most users will use a to start . See the following sections on how to bootstrap a node with the most common service managers.
Systemd and Bootstrapping
On operating systems that use , a node can be bootstrapped in the following way:
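For example:

```shell
sudo galera_new_cluster
```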
This wrapper uses to run with the option.
If you are using the service that supports the , then you can bootstrap a specific instance by specifying the instance name as a suffix. For example:
Systemd support and the galera_new_cluster script were added.
SysVinit and Bootstrapping
On operating systems that use , a node can be bootstrapped in the following way:
This runs with the option.
Adding Another Node to a Cluster
Once you have a cluster running and you want to add/reconnect another node to it, you must supply an address of one or more of the existing cluster members in the option. For example, if the first node of the cluster has the address 192.168.0.1, then you could add a second node to the cluster by setting the following option in a server in an :
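For example (addresses beyond the 192.168.0.1 from the text are illustrative):

```ini
[galera]
wsrep_cluster_address = "gcomm://192.168.0.1"
# Better: list every cluster member, e.g.
# wsrep_cluster_address = "gcomm://192.168.0.1,192.168.0.2,192.168.0.3"
```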
The new node only needs to connect to one of the existing cluster nodes. Once it connects to one of the existing cluster nodes, it will be able to see all of the nodes in the cluster. However, it is generally better to list all nodes of the cluster in , so that any node can join a cluster by connecting to any of the other cluster nodes, even if one or more of the cluster nodes are down. It is even OK to list a node's own IP address in , since Galera Cluster is smart enough to ignore it.
Once all members agree on the membership, the cluster's state will be exchanged. If the new node's state is different from that of the cluster, then it will request an IST or SST to make itself consistent with the other nodes.
Restarting the Cluster
If you shut down all nodes at the same time, then you have effectively terminated the cluster. Of course, the cluster's data still exists, but the running cluster no longer exists. When this happens, you'll need to bootstrap the cluster again.
If the cluster is not bootstrapped and on the first node is just started normally, then the node will try to connect to at least one of the nodes listed in the option. If no nodes are currently running, then this will fail. Bootstrapping the first node solves this problem.
Determining the Most Advanced Node
In some cases Galera will refuse to bootstrap a node if it detects that it might not be the most advanced node in the cluster. Galera makes this determination if the node was not the last one in the cluster to be shut down or if the node crashed. In those cases, manual intervention is needed.
If you know for sure which node is the most advanced you can edit the grastate.dat file in the . You can set safe_to_bootstrap=1 on the most advanced node.
You can determine which node is the most advanced by checking grastate.dat on each node and looking for the node with the highest seqno. If the node crashed and seqno=-1, then you can find the most advanced node by recovering the seqno on each node with the option. For example:
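A sketch of recovering the position manually (the exact binary name and log destination vary by version):

```shell
mariadbd --wsrep_recover
# Then look in the log output for a line of the form:
#   WSREP: Recovered position: <cluster UUID>:<seqno>
```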
Systemd and Galera Recovery
On operating systems that use , the position of a node can be recovered by running the galera_recovery script. For example:
If you are using the service that supports the , then you can recover the position of a specific instance by specifying the instance name as a suffix. For example:
The galera_recovery script recovers the position of a node by running with the option.
When the galera_recovery script runs , it does not write to the error log. Instead, it redirects log output to a file named with the format /tmp/wsrep_recovery.XXXXXX, where XXXXXX is replaced with random characters.
When Galera is enabled, MariaDB's service automatically runs the galera_recovery script prior to starting MariaDB, so that MariaDB starts with the proper Galera position.
Support for and the galera_recovery script were added.
State Snapshot Transfers (SSTs)
In a State Snapshot Transfer (SST), the cluster provisions nodes by transferring a full data copy from one node to another. When a new node joins the cluster, the new node initiates a State Snapshot Transfer to synchronize its data with a node that is already part of the cluster.
See for more information.
Incremental State Transfers (ISTs)
In an Incremental State Transfer (IST), the cluster provisions nodes by transferring only a node's missing writesets from one node to another. When a node rejoins the cluster, it initiates an Incremental State Transfer to synchronize its data with a node that is already part of the cluster.
If a node has only been out of a cluster for a little while, then an IST is generally faster than an SST.
Data at Rest Encryption
MariaDB Galera Cluster supports . See for some disclaimers on how SSTs are affected when encryption is configured.
Some data still cannot be encrypted:
The disk-based Galera gcache is not encrypted.
Monitoring
Status Variables
can be queried with the standard command. For example:
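A sketch of the query (the variable names shown in the comment are a few of the commonly monitored ones):

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_%';
-- e.g. wsrep_cluster_size, wsrep_cluster_status, wsrep_local_state_comment
```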
Cluster Change Notifications
The cluster nodes can be configured to invoke a command when cluster membership or node status changes. This mechanism can also be used to communicate the event to some external monitoring agent. This is configured by setting . See for more information.
See Also
Footnotes
This page is licensed: CC BY-SA / Gnu FDL
MariaDB Enterprise Cluster Security
The features described on this page are available from MariaDB Enterprise Server 10.6.
WSREP stands for Write-Set Replication.
MariaDB Enterprise Cluster, powered by Galera, adds some security features:
New TLS Modes have been implemented, which can be used to configure mandatory TLS and X.509 certificate verification for Enterprise Cluster:
have been implemented for Enterprise Cluster replication traffic.
have been implemented for SSTs that use MariaDB Enterprise Backup or Rsync.
WSREP TLS Modes
MariaDB Enterprise Cluster, powered by Galera, adds the system variable, which configures the WSREP TLS Mode used for Enterprise Cluster replication traffic.
The following WSREP TLS Modes are supported:
WSREP TLS Mode
Values
Description
WSREP TLS Modes: Provider
MariaDB Enterprise Cluster supports the Provider WSREP TLS Mode, which is equivalent to Enterprise Cluster's TLS implementation in earlier versions of MariaDB Server. The Provider WSREP TLS Mode is primarily intended for backward compatibility, and it is most useful for users who need to perform a rolling upgrade to Enterprise Server 10.6.
The Provider WSREP TLS Mode can be configured by setting the system variable to PROVIDER.
TLS is optional in the Provider WSREP TLS Mode. When the provider is not configured to use TLS on a node, the node will connect to the cluster without TLS.
Each node obtains its TLS configuration from the system variable. The following options are used:
WSREP Provider Option
Description
For example:
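An illustrative setting (the paths are placeholders; socket.ssl_cert, socket.ssl_key, and socket.ssl_ca are standard Galera provider options):

```ini
[mariadb]
wsrep_provider_options = "socket.ssl_cert=/certs/server-cert.pem;socket.ssl_key=/certs/server-key.pem;socket.ssl_ca=/certs/ca.pem"
```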
WSREP TLS Modes: Server and Server X.509
MariaDB Enterprise Cluster adds the Server and Server X.509 WSREP TLS Modes for users who require mandatory TLS.
The Server WSREP TLS Mode can be configured by setting the system variable to SERVER. In the Server WSREP TLS Mode, TLS is mandatory, but X.509 certificate verification is not performed. The Server WSREP TLS Mode is the default.
The Server X.509 WSREP TLS Mode can be configured by setting the system variable to SERVER_X509. In the Server X.509 WSREP TLS Mode, TLS and X.509 certification verification are mandatory.
In MariaDB Enterprise Server 10.6.8-4 and higher, TLS is not mandatory in the Server WSREP TLS Mode. When MariaDB Enterprise Server is not configured to use TLS on a node, or TLS is not working, the Galera library will not activate the TLS service, and connections between nodes will be unencrypted. Prior to 10.6.8-4, TLS is mandatory in the Server WSREP TLS Mode, but X.509 certificate verification is not performed.
For both the Server and Server X.509 WSREP TLS Modes, each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. The following system variables are used:
System Variables
Description
For example:
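A minimal sketch for the Server WSREP TLS Mode, using the server's standard TLS system variables; paths are placeholders:

```ini
[mariadb]
...
ssl_cert = /etc/my.cnf.d/certificates/server-cert.pem
ssl_key = /etc/my.cnf.d/certificates/server-key.pem
ssl_ca = /etc/my.cnf.d/certificates/ca.pem
wsrep_ssl_mode = SERVER
```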
SST TLS Modes
MariaDB Enterprise Cluster, powered by Galera, adds the ssl-mode option, which configures the SST TLS Mode for State Snapshot Transfers (SSTs). The ssl-mode option is supported by the following SST methods, which can be configured using the wsrep_sst_method system variable:
SST Method
wsrep_sst_method
The following SST TLS Modes are supported:
SST/TLS Mode
Values
Description
SST TLS Modes: Backward Compatible
In MariaDB Enterprise Server 10.6, MariaDB Enterprise Cluster adds the Backward Compatible SST TLS Mode for SSTs that use MariaDB Enterprise Backup or Rsync. The Backward Compatible SST TLS Mode is primarily intended for backward compatibility with ES 10.5 and earlier, and it is most useful for users who need to perform a rolling upgrade to ES 10.6.
The Backward Compatible SST TLS Mode is the default, but it can also be configured by setting the ssl_mode option to DISABLED in a configuration file in the [sst] group.
TLS is optional in the Backward Compatible SST TLS Mode. When the SST is not configured to use TLS, the SST will occur without TLS.
Each node obtains its TLS configuration from a configuration file in the [sst] group. The following options are used:
Option
Description
For example:
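A minimal sketch for the Backward Compatible SST TLS Mode, using the tca, tcert, and tkey options described above; paths are placeholders:

```ini
[sst]
tca = /etc/my.cnf.d/certificates/ca.pem
tcert = /etc/my.cnf.d/certificates/server-cert.pem
tkey = /etc/my.cnf.d/certificates/server-key.pem
```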
SST TLS Modes: Server and Server X.509
MariaDB Enterprise Cluster adds the Server and Server X.509 SST TLS Modes for SSTs that use MariaDB Enterprise Backup or Rsync. The Server and Server X.509 SST TLS Modes are intended for users who require mandatory TLS.
The Server SST TLS Mode can be configured by setting the ssl_mode option to REQUIRED in a configuration file in the [sst] group. In the Server SST TLS Mode, TLS is mandatory, but X.509 certificate verification is not performed.
The Server X.509 SST TLS Mode can be configured by setting the ssl_mode option to VERIFY_CA or VERIFY_IDENTITY in a configuration file in the [sst] group. In the Server X.509 SST TLS Mode, TLS and X.509 certificate verification are mandatory. Prior to the state transfer, the Donor node will verify the Joiner node's X.509 certificate, and the Joiner node will verify the Donor node's X.509 certificate.
TLS is mandatory in both the Server and Server X.509 SST TLS Modes. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect during an SST.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. The following system variables are used:
System Variable
Description
For example:
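A minimal sketch for the Server X.509 SST TLS Mode, combining the server's TLS system variables with the ssl-mode option; paths are placeholders:

```ini
[mariadb]
...
ssl_cert = /etc/my.cnf.d/certificates/server-cert.pem
ssl_key = /etc/my.cnf.d/certificates/server-key.pem
ssl_ca = /etc/my.cnf.d/certificates/ca.pem

[sst]
ssl-mode = VERIFY_CA
```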
When the tca, tcert, and tkey options are configured, the Server and Server X.509 SST TLS Modes use those parameters instead of the MariaDB Enterprise Server system variables. In that case, the following message will be written to the MariaDB error log:
Cluster Name Verification
MariaDB Enterprise Cluster, powered by Galera, adds cluster name verification for Joiner nodes, which ensures that the Joiner node does not perform a State Snapshot Transfer (SST) or an Incremental State Transfer (IST) for the wrong cluster.
Prior to performing a State Snapshot Transfer (SST) or Incremental State Transfer (IST), the Donor node checks the wsrep_cluster_name value configured by the Joiner node to verify that the node belongs to the cluster.
Certificate Expiration Warnings
MariaDB Enterprise Cluster, powered by Galera, can be configured to write certificate expiration warnings to the MariaDB error log when the node's X.509 certificate is close to expiration.
Certificate expiration warnings can be configured using the wsrep_certificate_expiration_hours_warning system variable:
When the wsrep_certificate_expiration_hours_warning system variable is set to 0, certificate expiration warnings are not printed to the MariaDB Error Log.
When the wsrep_certificate_expiration_hours_warning system variable is set to a value N, which is greater than 0, certificate expiration warnings are printed to the MariaDB Error Log when the node's certificate expires in N hours or less.
For example:
Enable TLS without Downtime
MariaDB Enterprise Cluster, powered by Galera, adds new capabilities that allow TLS to be enabled for Enterprise Cluster replication traffic without downtime.
Enabling TLS without downtime relies on two new options implemented for the wsrep_provider_options system variable:
Option
Dynamic
Default
Description
SET GLOBAL gtid_slave_pos = "0-1-2";
CHANGE MASTER TO
MASTER_HOST="c1dbserver1",
MASTER_PORT=3310,
MASTER_USER="repl",
MASTER_PASSWORD="password",
MASTER_USE_GTID=slave_pos;
START SLAVE;
CREATE USER 'repl'@'c2dbserver1' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'c2dbserver1';
SHOW SLAVE STATUS\G
SHOW GLOBAL VARIABLES LIKE 'gtid_current_pos';
SHOW SLAVE STATUS\G
mariadb-bin.000096 568 0-1-2
SET GLOBAL gtid_slave_pos = "0-1-2";
CHANGE MASTER TO
MASTER_HOST="c2dbserver1",
MASTER_PORT=3310,
MASTER_USER="repl",
MASTER_PASSWORD="password",
MASTER_USE_GTID=slave_pos;
START SLAVE;
SET SESSION wsrep_sync_wait = 1;
SELECT ...
SET SESSION wsrep_sync_wait = 0;
SET SESSION wsrep_OSU_method='RSU';
ALTER TABLE tab <alter options here>;
SET SESSION wsrep_OSU_method='TOI';
Cluster name verification checks that a Joiner node belongs to the cluster prior to performing a State Snapshot Transfer (SST) or an Incremental State Transfer (IST).
Certificate expiration warnings are written to the MariaDB error log when the node's X.509 certificate is close to expiration.
TLS is optional for Enterprise Cluster replication traffic.
Each node obtains its TLS configuration from the wsrep_provider_options system variable. When the provider is not configured to use TLS on a node, the node will connect to the cluster without TLS.
The Provider WSREP TLS Mode is backward compatible with ES 10.5 and earlier. When performing a rolling upgrade from ES 10.5 and earlier, the Provider WSREP TLS Mode can be configured on the upgraded nodes.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration.
Starting with MariaDB Enterprise Server 10.6.8-4, TLS is not mandatory in the Server WSREP TLS Mode. If MariaDB Enterprise Server is not configured to use TLS on a node, or TLS is not working, the Galera library will not activate the TLS service; connections will not fail, but they will be unencrypted.
Prior to 10.6.8-4, TLS is mandatory in the Server WSREP TLS Mode; X.509 certificate verification is not performed, and if MariaDB Enterprise Server is not configured to use TLS, the node will fail to connect to the cluster.
The Server WSREP TLS Mode is the default in ES 10.6.
TLS and X.509 certificate verification are mandatory for Enterprise Cluster replication traffic.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect to the cluster.
Optionally set this system variable to the path of the CA chain directory. The directory must have been processed by openssl rehash. When your CA chain is stored in a single file, use the ssl_ca system variable instead.
Each node obtains its TLS configuration from the tca, tcert, and tkey options. When the SST is not configured to use TLS on a node, the node will connect during the SST without TLS.
The Backward Compatible SST TLS Mode is backward compatible with ES 10.5 and earlier, so it is suitable for rolling upgrades.
The Backward Compatible SST TLS Mode is the default in ES 10.6.
TLS is mandatory for SST traffic, but X.509 certificate verification is not performed.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect during an SST.
TLS and X.509 certificate verification are mandatory for SST traffic.
Each node obtains its TLS configuration from the node's MariaDB Enterprise Server configuration. When MariaDB Enterprise Server is not configured to use TLS on a node, the node will fail to connect during an SST.
Prior to the state transfer, the Donor node will verify the Joiner node's X.509 certificate, and the Joiner node will verify the Donor node's X.509 certificate.
tca
Set this option to the path of the CA chain file.
tcert
Set this option to the path of the node's X.509 certificate file.
tkey
Set this option to the path of the node's private key file.
Set this system variable to the path of the node's private key file.
socket.dynamic
No
false
When set to true, the node will allow TLS and non-TLS communications at the same time.
socket.ssl_reload
Yes
N/A
When set to true with the SET GLOBAL statement, Enterprise Cluster dynamically re-initializes its TLS context.
This is most useful if you need to replace a certificate that is about to expire without restarting the server.
The paths to the certificate and key files cannot be changed dynamically, so the updated certificates and keys must be placed at the same paths defined by the relevant TLS variables.
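For example, after the replacement certificate and key files have been placed at the existing paths, the reload can be triggered like this (a sketch of the documented socket.ssl_reload usage):

```sql
-- Replace the certificate and key files on disk first, at the same paths,
-- then ask the provider to re-initialize its TLS context:
SET GLOBAL wsrep_provider_options = 'socket.ssl_reload=1';
```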
new ssl configuration options (ssl-ca, ssl-cert and ssl-key) are ignored by SST due to presence of the tca, tcert and/or tkey in the [sst] section
[mariadb]
...
# warn 3 days before certificate expiration
wsrep_certificate_expiration_hours_warning=72
Building the Galera wsrep Package on Fedora
The instructions on this page were used to create the galera package on the Fedora Linux distribution. This package contains the wsrep provider for MariaDB Galera Cluster.
The following table lists each version of the Galera 4 wsrep provider and the version of MariaDB in which each one was first released. If you would like to install Galera 4 using a package manager, the package is called galera-4.
Galera Version
Released in MariaDB Version
26.4.21
, , , , ,
The following table lists each version of the Galera 3 wsrep provider and the version of MariaDB in which each one was first released. If you would like to install Galera 3 using a package manager, the package is called galera.
Galera Version
Released in MariaDB Version
The following table lists each version of the Galera 2 wsrep provider and the version of MariaDB Galera Cluster in which each one was first released.
Galera Version
Released in MariaDB Galera Cluster Version
For convenience, a galera package containing the preferred wsrep provider is included in the MariaDB repositories (the preferred versions are bolded in the table above).
Install the prerequisites:
Clone the source repository and check out the mariadb-3.x branch:
Build the packages by executing the build.sh script under the scripts/ directory with the -p switch:
When finished, you will have an RPM package containing the Galera library, arbitrator, and related files in the current directory. Note: The same set of instructions can be applied to other RPM-based platforms to generate the Galera package.
This page is licensed: CC BY-SA / Gnu FDL
mariadb-backup SST Method
Configure State Snapshot Transfers for Galera. Learn to use mariadb-backup for non-blocking data transfer when a new node joins a cluster.
The mariabackup SST method uses the mariadb-backup utility for performing SSTs. It is one of the methods that does not block the donor node. mariadb-backup was originally forked from Percona XtraBackup, and similarly, the mariabackup SST method was originally forked from the xtrabackup-v2 SST method.
The socat utility must be installed on the server. It is needed to stream the backup from the donor node to the joiner node. This is a limitation that was inherited from the xtrabackup-v2 SST method.
Choosing mariadb-backup for SSTs
To use the mariadb-backup SST method, you must set wsrep_sst_method=mariabackup on both the donor and joiner nodes. It can be changed dynamically with SET GLOBAL on the node that you intend to be an SST donor. For example:
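```sql
SET GLOBAL wsrep_sst_method = 'mariabackup';
```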
It can also be set in a server option group in an option file prior to starting up a node:
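```ini
[mariadb]
...
wsrep_sst_method = mariabackup
```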
For an SST to work properly, the donor and joiner node must use the same SST method. Therefore, it is recommended to set wsrep_sst_method to the same value on all nodes, since any node will usually be a donor or joiner node at some point.
Major Version Upgrades
The InnoDB redo log format has been changed in some major versions in a way that will not allow crash recovery or the preparation of a backup from an older major version. Because of this, the mariabackup SST method cannot be used for some major-version upgrades, unless you temporarily edit the wsrep_sst_mariabackup script so that the --prepare step on the newer-major-version joiner is executed using the older-major-version mariadb-backup tool.
The default method wsrep_sst_method=rsync works for major-version upgrades; see MDEV-27437.
Configuration Options
The mariabackup SST method is configured by placing options in the [sst] section of a MariaDB configuration file (e.g., /etc/my.cnf.d/server.cnf). These settings are parsed by the wsrep_sst_mariabackup and wsrep_sst_common scripts.
The command-line utility is mariadb-backup; this tool was previously called mariabackup. The SST method itself retains the original name mariabackup (as in wsrep_sst_method=mariabackup).
Primary Transfer and Format Options
These options control the core data transfer mechanism.
Option
Default Value
Description
streamfmt
mbstream
Specifies the backup streaming format. mbstream is the native format for mariadb-backup.
transferfmt
socat
Defines the network utility for data transfer.
sockopt
A string of socket options passed to the socat utility.
rlimit
Compression Options
These options configure on-the-fly compression to reduce network bandwidth.
Option
Description
compressor
The command-line string for compressing the data stream on the donor (e.g., "lz4 -z").
decompressor
The command-line string for decompressing the data stream on the joiner (e.g., "lz4 -d").
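For example, the lz4 commands shown in the table could be wired up as follows, assuming lz4 is installed on both the donor and the joiner:

```ini
[sst]
compressor = "lz4 -z"
decompressor = "lz4 -d"
```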
Authentication and Security (TLS)
These options manage user authentication and stream encryption.
Option
Description
wsrep-sst-auth
The authentication string in user:password format. The user requires RELOAD, PROCESS, LOCK TABLES, and REPLICATION CLIENT privileges.
tcert
Path to the TLS certificate file for securing the transfer.
tkey
Path to the TLS private key file.
tca
Path to the TLS Certificate Authority (CA) file.
Logging and Miscellaneous Options
Option
Default Value
Description
progress
Set to 1 to show transfer progress (requires pv utility).
sst-initial-timeout
300
Timeout in seconds for the initial connection.
sst-log-archive
1
Set to 1 to archive the previous SST log.
cpat
Pass-through mariadb-backup Options
This feature allows mariadb-backup specific options to be passed through the SST script.
Option
Default Value
Description
use-extra
0
Must be set to 1 to enable pass-through functionality.
Example: Using Native Encryption and Threading
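As a sketch, use-extra=1 enables the pass-through behavior described above, and threading can be configured in the [mariabackup] option group, which the mariadb-backup utility reads directly. Verify the exact option handling against your wsrep_sst_mariabackup script; the values here are illustrative:

```ini
[sst]
use-extra = 1

[mariabackup]
# read directly by the mariadb-backup utility
parallel = 4
```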
Authentication and Privileges
To use the mariadb-backup SST method, the mariadb-backup utility must be able to authenticate locally on the donor node to create a backup stream. There are two ways to manage this authentication:
Automatic User Account Management (ES 11.4+)
Starting with MariaDB Enterprise Server 11.4, the cluster can automatically manage the SST user account. This method is more secure and requires less configuration because it avoids storing plain-text passwords in configuration files.
When this feature is used:
The donor node automatically creates a temporary internal user (e.g., 'wsrep.sst.<timestamp>_<node_id>'@localhost) with a generated password when the SST process begins.
The necessary privileges (RELOAD, PROCESS, LOCK TABLES, etc.) are automatically granted to this temporary user.
Once the SST process completes, the donor node automatically drops the user.
To enable automatic user management:
Ensure that the wsrep_sst_auth system variable is not set (or is left blank) in your configuration file.
If you explicitly define wsrep_sst_auth in your configuration, the server will revert to the manual behavior and attempt to authenticate using the credentials provided in that variable.
Manual User Configuration
For versions prior to 11.4, or if you prefer to manage the user manually, you must create a user and provide the credentials to the server.
You can tell the donor node what username and password to use by setting the wsrep_sst_auth system variable. It can be changed dynamically with SET GLOBAL on the node that you intend to be an SST donor:
It can also be set in a server option group in an option file prior to starting up a node:
Some authentication plugins do not require a password. For example, the unix_socket and gssapi authentication plugins do not require a password. If you are using a user account that does not require a password in order to log in, then you can just leave the password component of wsrep_sst_auth empty. For example:
The user account that performs the backup for the SST needs the same privileges as mariadb-backup, which are the RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR, and REPLICA MONITOR privileges. To be safe, ensure that these privileges are set on each node in your cluster. mariadb-backup connects locally on the donor node to perform the backup, so the following user should be sufficient:
Passwordless Authentication - Unix Socket
It is possible to use the unix_socket authentication plugin for the user account that performs SSTs. This provides the benefit of not needing to configure a plain-text password in wsrep_sst_auth.
The user account would have to have the same name as the operating system user account that is running the mysqld process. On many systems, this is the user account configured as the user option, and it tends to default to mysql.
For example, if the unix_socket authentication plugin is already installed, then you could execute the following to create the user account:
To configure wsrep_sst_auth, set the following in a server option group in an option file prior to starting up a node:
Passwordless Authentication - GSSAPI
It is possible to use the gssapi authentication plugin for the user account that performs SSTs. This provides the benefit of not needing to configure a plain-text password in wsrep_sst_auth.
The following steps would need to be done beforehand:
You will need to install the package containing the gssapi authentication plugin.
You will need to install the plugin in MariaDB, so that the gssapi authentication plugin is available to use.
You will also need to complete the GSSAPI-specific setup, such as creating a keytab for the MariaDB server and configuring the service principal.
For example, you could execute the following to create the user account in MariaDB:
To configure wsrep_sst_auth, set the following in a server option group in an option file prior to starting up a node:
Choosing a Donor Node
When mariadb-backup is used to create the backup for the SST on the donor node, mariadb-backup briefly requires a system-wide lock at the end of the backup. This is done with BACKUP STAGE BLOCK_COMMIT.
If a specific node in your cluster is acting as the primary node by receiving all of the application's write traffic, then this node should not usually be used as the donor node, because the system-wide lock could interfere with the application. In this case, you can define one or more preferred donor nodes by setting the wsrep_sst_donor system variable.
For example, let's say that we have a 5-node cluster with the nodes node1, node2, node3, node4, and node5, and let's say that node1 is acting as the primary node. The preferred donor nodes for node2 could be configured by setting the following in a server option group in an option file prior to starting up a node:
The trailing comma tells the server to allow any other node as donor when the preferred donors are not available. Therefore, if node1 is the only node left in the cluster, the trailing comma allows it to be used as the donor node.
Socat Dependency
During the SST process, the donor node uses socat to stream the backup to the joiner node. Then the joiner node prepares the backup before restoring it. The socat utility must be installed on both the donor node and the joiner node in order for this to work. Otherwise, the MariaDB error log will contain an error like:
This SST method supports three different TLS methods. The specific method can be selected by setting the encrypt option in the [sst] section of the MariaDB configuration file. The options are:
TLS using OpenSSL encryption built into socat (encrypt=2)
TLS using OpenSSL encryption with Galera-compatible certificates and keys (encrypt=3)
TLS using OpenSSL encryption with standard MySQL/MariaDB SSL certificates (encrypt=4)
Note that encrypt=1 refers to a TLS encryption method that has been deprecated and removed.
TLS Using OpenSSL Encryption Built into Socat
To generate keys compatible with this encryption method, follow these directions.
First, generate the keys and certificates:
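A sketch of the key generation, assuming self-signed certificates are acceptable; the file names, key size, and validity period are examples:

```shell
# Generate a self-signed key and certificate for socat
FILENAME=sst
openssl genrsa -out $FILENAME.key 2048
openssl req -new -key $FILENAME.key -x509 -days 3653 -out $FILENAME.crt -subj "/CN=sst"
# Bundle the key and certificate into a single .pem file for socat
cat $FILENAME.key $FILENAME.crt > $FILENAME.pem
chmod 600 $FILENAME.key $FILENAME.pem
```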
On some systems, you may also have to add dhparams to the certificate:
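A sketch of appending DH parameters to the bundle; the 2048-bit size is an example:

```shell
# Generate DH parameters and append them to the socat certificate bundle
openssl dhparam -out dhparams.pem 2048
cat dhparams.pem >> sst.pem
```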
Next, copy the certificate and keys to all nodes in the cluster.
When done, configure the following on all nodes in the cluster:
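A sketch of the node configuration, assuming the bundle generated above was copied to /etc/my.cnf.d/certificates/:

```ini
[sst]
encrypt = 2
tcert = /etc/my.cnf.d/certificates/sst.pem
tca = /etc/my.cnf.d/certificates/sst.crt
```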
Make sure to replace the paths with whatever is relevant on your system. This should allow your SSTs to be encrypted.
TLS Using OpenSSL Encryption With Galera-Compatible Certificates and Keys
To generate keys compatible with this encryption method, follow these directions.
First, generate the keys and certificates:
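A sketch of the key generation; Galera-compatible keys must not be passphrase-protected, and the file names and validity period are examples:

```shell
# Generate a key and self-signed certificate without a passphrase
openssl genrsa -out server-key.pem 2048
openssl req -new -key server-key.pem -x509 -days 3653 -out server-cert.pem -subj "/CN=galera"
```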
Next, copy the certificate and keys to all nodes in the cluster.
When done, configure the following on all nodes in the cluster:
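A sketch of the node configuration, assuming the key and certificate were copied to /etc/my.cnf.d/certificates/:

```ini
[sst]
encrypt = 3
tkey = /etc/my.cnf.d/certificates/server-key.pem
tcert = /etc/my.cnf.d/certificates/server-cert.pem
```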
Make sure to replace the paths with whatever is relevant on your system. This should allow your SSTs to be encrypted.
Logs
The mariadb-backup SST method has its own logging outside of the MariaDB Server logging.
Logging to SST Logs
Logging for mariadb-backup SSTs works the following way.
By default, on the donor node, it logs to mariadb-backup.backup.log. This log file is located in the datadir.
By default, on the joiner node, it logs to mariadb-backup.prepare.log and mariadb-backup.move.log. These log files are also located in the datadir.
By default, before a new SST is started, existing mariadb-backup SST log files are compressed and moved to /tmp/sst_log_archive. This behavior can be disabled by setting sst-log-archive=0 in the [sst] section of an option file. Similarly, the archive directory can be changed by setting sst-log-archive-dir:
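For example, an illustrative sketch; the archive path is a placeholder:

```ini
[sst]
sst-log-archive = 1
sst-log-archive-dir = /var/log/mysql/sst/
```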
The SST logs can be redirected to the syslog instead by setting the following in the [sst] section of an option file:
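A sketch, assuming the sst-syslog option of the SST scripts:

```ini
[sst]
sst-syslog = 1
```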
You can also redirect the SST logs to the syslog by setting the following in the [mysqld_safe] group of an option file:
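A sketch of the mysqld_safe variant:

```ini
[mysqld_safe]
syslog
```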
Performing SSTs With IPv6 Addresses
If you are performing mariadb-backup SSTs with IPv6 addresses, then the socat utility needs to be passed the pf=ip6 option. This can be done by setting the sockopt option in the [sst] section of an option file:
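A sketch; the leading comma appends pf=ip6 to socat's existing socket options:

```ini
[sst]
sockopt = ",pf=ip6"
```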
CREATE USER 'mariadbbackup'@'localhost' IDENTIFIED BY 'mypassword';
GRANT RELOAD, PROCESS, LOCK TABLES,
BINLOG MONITOR ON *.* TO 'mariadbbackup'@'localhost';
CREATE USER 'mysql'@'localhost' IDENTIFIED VIA unix_socket;
GRANT RELOAD, PROCESS, LOCK TABLES,
REPLICATION CLIENT ON *.* TO 'mysql'@'localhost';
[mariadb]
...
wsrep_sst_auth = mysql:
CREATE USER 'mariadbbackup'@'localhost' IDENTIFIED VIA gssapi;
GRANT RELOAD, PROCESS, LOCK TABLES,
BINLOG MONITOR ON *.* TO 'mariadbbackup'@'localhost';
[mariadb]
...
wsrep_sst_auth = mariadbbackup:
[mariadb]
...
wsrep_sst_donor=node3,node4,node5,
WSREP_SST: [ERROR] socat not found in path: /usr/sbin:/sbin:/usr//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin (20180122 14:55:32.993)
The following options can be set as part of the Galera wsrep_provider_options variable. Dynamic options can be changed while the server is running.
Options need to be provided as a semicolon (;) separated list on a single line. Options that are not explicitly set are set to their default value.
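For example, a sketch of the semicolon-separated format; the option values here are purely illustrative:

```ini
[mariadb]
...
wsrep_provider_options = "gcache.size=2G;gcs.fc_limit=128;evs.send_window=512"
```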
Note that before Galera 3, the repl tag was named replicator.
base_dir
Description: Specifies the data directory
base_host
Description: For internal use. Should not be manually set.
Default: 127.0.0.1 (detected network address)
base_port
Description: For internal use. Should not be manually set.
Default: 4567
cert.log_conflicts
Description: Certification failure log details.
Dynamic: Yes
Default: no
cert.optimistic_pa
Description: Controls parallel application of actions on the replica. If set, the full range of parallelization as determined by the certification algorithm is permitted. If not set, the parallel applying window will not exceed that seen on the primary, and applying will start no sooner than after all actions it has seen on the master are committed.
Dynamic: Yes
Default: yes
debug
Description: Enable debugging.
Dynamic: Yes
Default: no
evs.auto_evict
Description: Number of entries the node permits for a given delayed node before triggering the Auto Eviction protocol. An entry is added to a delayed list for each delayed response from a node. If set to 0, the default, the Auto Eviction protocol is disabled for this node.
Dynamic: No
Default: 0
evs.causal_keepalive_period
Description: Used by the developers only, and not manually serviceable.
Dynamic: No
Default: The value of evs.keepalive_period.
evs.debug_log_mask
Description: Controls EVS debug logging. Only effective when is on.
Dynamic: Yes
Default: 0x1
evs.delay_margin
Description: Time that response times can be delayed before this node adds an entry to the delayed list. Must be set to a higher value than the round-trip delay time between nodes.
Dynamic: No
Default: PT1S
evs.delayed_keep_period
Description: Time that this node requires a previously delayed node to remain responsive before it is removed from the delayed list.
Dynamic: No
Default: PT30S
evs.evict
Description: When set to the gcomm UUID of a node, that node is evicted from the cluster. When set to an empty string, the eviction list is cleared on the node where it is set.
Dynamic: No
Default: Empty string
evs.inactive_check_period
Description: Frequency of checks for peer inactivity (looking for nodes with delayed responses), after which nodes may be added to the delayed list, and later evicted.
Dynamic: No
Default: PT0.5S
evs.inactive_timeout
Description: Time limit that a node can be inactive before being pronounced as dead.
Dynamic: No
Default: PT15S
evs.info_log_mask
Description: Controls extra EVS info logging. Bits:
0x1 – extra view change information
0x2 – extra state change information
evs.install_timeout
Description: Timeout on waits for install message acknowledgments. Replaces evs.consensus_timeout.
Dynamic: Yes
Default: PT7.5S
evs.join_retrans_period
Description: Time period for how often retransmission of EVS join messages when forming cluster membership should occur.
Dynamic: Yes
Default: PT1S
evs.keepalive_period
Description: How often keepalive signals should be transmitted when there's no other traffic.
Dynamic: Yes
Default: PT1S
evs.max_install_timeouts
Description: Number of membership install rounds to attempt before timing out. The total rounds will be this value plus two.
Dynamic: No
Default: 3
evs.send_window
Description: Maximum number of packets that can be replicated at a time. Must be more than evs.user_send_window, which applies to data packets only (double evs.user_send_window is recommended). In WAN environments it can be set much higher than the default, for example 512.
Dynamic: Yes
Default: 4
evs.stats_report_period
Description: Reporting period for EVS statistics.
Dynamic: No
Default: PT1M
evs.suspect_timeout
Description: A node will be suspected to be dead after this period of inactivity. If all nodes agree, the node is dropped from the cluster before evs.inactive_timeout is reached.
Dynamic: No
Default: PT5S
evs.use_aggregate
Description: If set to true (the default), small packets will be aggregated into one where possible.
Dynamic: No
Default: true
evs.user_send_window
Description: Maximum number of data packets that can be replicated at a time. Must be smaller than evs.send_window (half is recommended). In WAN environments it can be set much higher than the default, for example 512.
Dynamic: Yes
Default: 2
evs.version
Description: EVS protocol version. Defaults to 0 for backward compatibility. Certain EVS features (e.g. auto eviction) require more recent versions.
Dynamic: No
Default: 0
evs.view_forget_timeout
Description: Time after which past views will be dropped from the view history.
Dynamic: No
Default: P1D
gcache.dir
Description: Directory where GCache files are placed.
Dynamic: No
Default: The working directory
gcache.keep_pages_size
Description: Total size of the page storage pages for caching. One page is always present if only page storage is enabled.
Dynamic: No
Default: 0
gcache.mem_size
Description: Maximum size of the malloc() store for setups that have spare RAM.
Dynamic: No
Default: 0
gcache.name
Description: GCache ring buffer storage file name. By default it is placed in the working directory; moving it to another location or partition can reduce disk IO.
Dynamic: No
Default: ./galera.cache
gcache.page_size
Description: Size of the page storage page files. These are prefixed by gcache.page. Can be set as large as the disk can handle.
Dynamic: No
Default: 128M
gcache.recover
Description: Whether or not gcache recovery takes place when the node starts up. If it is possible to recover gcache, the node can then provide IST to other joining nodes, which assists when the whole cluster is restarted.
Dynamic: No
Default: no
gcache.size
Description: Gcache ring buffer storage size (the space the node uses for caching write sets), preallocated on startup.
Dynamic: No
Default: 128M
gcomm.thread_prio
The fifo and rr real-time scheduling policies require mariadb service permissions at the OS level.
Description: Gcomm thread policy and priority (in the format policy:priority). Priority is an integer, while policy can be one of:
fifo: First-in, first-out scheduling. Always preempt other, batch or idle threads and can only be preempted by other fifo threads of a higher priority or blocked by an I/O request.
gcs.fc_debug
Description: If set to a value greater than zero (the default is 0), debug statistics about SST flow control will be posted after each specified number of writesets.
Dynamic: No
Default: 0
gcs.fc_factor
Description: Fraction of gcs.fc_limit below which the recv queue must drop before replication resumes.
Dynamic: Yes
Default: 1.0
gcs.fc_limit
Description: If the recv queue exceeds this many writesets, replication is paused. Can be increased greatly in master-slave setups. Replication will resume again according to the gcs.fc_factor setting.
Dynamic: Yes
Default: 16
gcs.fc_master_slave
Description: Whether to assume that the cluster only contains one master. Deprecated since Galera 4.10; see gcs.fc_single_primary.
Dynamic: No
Default: no
gcs.fc_single_primary
Description: Defines whether there is more than one source of replication.
As the number of nodes in the cluster grows, the larger the calculated gcs.fc_limit gets. At the same time, the number of writes from the nodes increases.
When this parameter value is set to NO (multi-primary), the gcs.fc_limit parameter is dynamically modified to give more margin for each node to be a bit further behind applying writes.
The gcs.fc_limit parameter is modified by the square root of the cluster size, that is, in a four-node cluster it is two times higher than the base value. This is done to compensate for the increasing replication rate noise.
Dynamic: No
Default: no
gcs.max_packet_size
Description: Maximum packet size, after which writesets become fragmented.
Dynamic: No
Default: 64500
gcs.max_throttle
Description: How much we can throttle replication rate during state transfer (to avoid running out of memory). Set it to 0.0 if stopping replication is acceptable for the sake of completing state transfer.
Dynamic: No
Default: 0.25
gcs.recv_q_hard_limit
Description: Maximum size of the recv queue. If exceeded, the server aborts. Half of available RAM plus swap is a recommended size.
Dynamic: No
Default: LLONG_MAX
gcs.recv_q_soft_limit
Description: Fraction of gcs.recv_q_hard_limit after which the replication rate is throttled. The rate of throttling increases linearly from zero (the regular, varying rate of replication) at and below gcs.recv_q_soft_limit to one (full throttling) at gcs.recv_q_hard_limit.
Dynamic: No
Default: 0.25
gcs.sync_donor
Description: Whether or not the rest of the cluster should stay in sync with the donor. If set to yes (the default is no), the whole cluster is blocked while the donor is blocked by state transfer.
Dynamic: No
Default: no
gmcast.listen_addr
Description: Address on which Galera listens for connections from other nodes. Can be used to override the default listening port, which is otherwise obtained from the connection address.
Specifying a hostname isn't supported. Use an IP address instead.
Note that supports TCP, SSL, and hostnames.
gmcast.mcast_addr
Description: Not set by default, but if set, UDP multicast will be used for replication. Must be identical on all nodes. For example: gmcast.mcast_addr=239.192.0.11
Dynamic: No
Default: None
gmcast.mcast_ttl
Description: Multicast packet TTL (time to live) value.
Dynamic: No
Default: 1
gmcast.peer_timeout
Description: Connection timeout for initiating message relaying.
Dynamic: No
Default: PT3S
gmcast.segment
Description: Defines the segment to which the node belongs. By default, all nodes are placed in the same segment (0). Usually, you would place all nodes in the same datacenter in the same segment. Galera protocol traffic is only redirected to one node in each segment, and then relayed to other nodes in that same segment, which saves cross-datacenter network traffic at the expense of some extra latency. State transfers are also, preferably but not exclusively, taken from the same segment. If there are no nodes available in the same segment, state transfer will be taken from a node in another segment.
Dynamic: No
Default:
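As a sketch of the segment layout described above (the datacenter names and node roles here are hypothetical), a node in a second datacenter could be configured like this in my.cnf:

```ini
# Nodes in DC1 keep the default segment 0; nodes in DC2 use segment 1,
# so cross-datacenter traffic is relayed through one node per segment.
[mariadb]
wsrep_provider_options="gmcast.segment=1"
```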
gmcast.time_wait
Description: Waiting time before allowing a peer that was declared outside of the stable view to reconnect.
Dynamic: No
Default: PT5S
gmcast.version
Description: Deprecated option. Gmcast version.
Dynamic: No
Default: 0
ist.recv_addr
Description: Address for listening for Incremental State Transfer.
Dynamic: No
Default: <port+1> from
ist.recv_bind
Description: Address to bind to for receiving Incremental State Transfers.
Dynamic: No
Default: Empty string
pc.announce_timeout
Description: Period of time for which cluster joining announcements are sent every 1/2 second.
Dynamic: No
Default: PT3S
pc.checksum
Description: Whether to checksum replicated messages at the PC level; intended for debugging. Safe to turn off. The default is false (true in earlier releases).
Dynamic: No
Default: false
pc.ignore_quorum
Description: Whether to ignore quorum calculations, for example when a master splits from several slaves, it will remain in operation if set to true (false is default). Use with care however, as in master-slave setups, slaves will not automatically reconnect to the master if set.
Dynamic: Yes
Default: false
pc.ignore_sb
Description: Whether to permit updates to be processed even in the case of split brain (when a node is disconnected from its remaining peers). Safe in master-slave setups, but could lead to data inconsistency in a multi-master setup.
Dynamic: Yes
Default: false
pc.linger
Description: Time that the PC protocol waits for EVS termination.
Dynamic: No
Default: PT20S
pc.npvo
Description: If set to true (false is the default), when there are primary component conflicts, the most recent component overrides the older ones.
Dynamic: No
Default: false
pc.recovery
Description: If set to true (the default), the Primary Component state is stored on disk so that, in the case of a full cluster crash (e.g. a power outage), automatic recovery is possible. Subsequent graceful full cluster restarts will require explicit bootstrapping for a new Primary Component.
Dynamic: No
Default: true
pc.version
Description: Deprecated option. PC protocol version.
Dynamic: No
Default: 0
pc.wait_prim
Description: When set to true (the default), the node will wait for a primary component for the period of time specified by pc.wait_prim_timeout. Used to bring up non-primary components and make them primary using pc.bootstrap.
Dynamic: No
Default: true
pc.wait_prim_timeout
Description: Time to wait for a primary component. See pc.wait_prim.
Dynamic: No
Default: PT30S
pc.weight
Description: Node weight, used for quorum calculation. See the Codership article .
Dynamic: Yes
Default: 1
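The weighted quorum rule can be sketched as follows. This is a deliberately simplified model (real Galera also accounts for nodes that left gracefully and for the exact membership history), intended only to illustrate how pc.weight enters the calculation.

```python
def has_quorum(surviving_weights, previous_total_weight):
    """Simplified sketch: the surviving partition stays primary only if
    its combined weight is strictly more than half of the previous
    membership's total weight."""
    return sum(surviving_weights) * 2 > previous_total_weight

# Three nodes with weight 1 each: two survivors keep quorum.
print(has_quorum([1, 1], 3))  # True
# A single survivor of three equal nodes does not.
print(has_quorum([1], 3))     # False
```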
protonet.backend
Description: Deprecated option. Transport backend to use. Only ASIO is supported currently.
Dynamic: No
Default: asio
protonet.version
Description: Deprecated option. Protonet version.
Dynamic: No
Default: 0
repl.causal_read_timeout
Description: Timeout period for causal reads.
Dynamic: Yes
Default: PT30S
repl.commit_order
Description: Whether or not out-of-order committing is permitted, and under what conditions. By default it is not permitted, but setting this can improve parallel performance.
0 BYPASS: No commit order monitoring is done (useful for measuring the performance penalty).
1 OOOC: Out-of-order committing is permitted for all transactions.
repl.key_format
Description: Format for key replication. Can be one of:
FLAT8 - shorter key with a higher probability of false positives when matching
FLAT16 - longer key with a lower probability of false positives when matching
repl.max_ws_size
Description:
Dynamic:
Default: 2147483647
repl.proto_max
Description:
Dynamic:
Default: 9
socket.checksum
Description: Method used for generating checksum. Note: If Galera 25.2.x and 25.3.x are both being used in the cluster, MariaDB with Galera 25.3.x must be started with wsrep_provider_options='socket.checksum=1' in order to make it backward compatible with Galera v2. Galera wsrep providers other than 25.3.x or 25.2.x are not supported.
Dynamic: No
Default: 2
socket.dynamic
Description: Allow both encrypted and unencrypted connections between nodes. Typically this should be set to false (the default). When set to true, encrypted connections are still preferred, but the node falls back to unencrypted connections when encryption is not possible, e.g. when it is not yet enabled on all nodes. It needs to be true on all nodes when enabling or disabling encryption via a rolling restart. As this setting can't be changed at runtime, a rolling restart to enable or disable encryption may need three restarts per node in total: one to enable socket.dynamic on each node, one to change the actual encryption settings on each node, and a final round to set socket.dynamic back to false.
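The first round of that rolling procedure can be sketched in my.cnf as follows (a sketch of the documented workflow, not a complete encryption setup):

```ini
# Round 1: enable dynamic sockets on every node, restarting each node in turn.
[mariadb]
wsrep_provider_options="socket.dynamic=true"

# Round 2 would change the actual TLS settings (socket.ssl and friends)
# node by node, and round 3 sets socket.dynamic=false again.
```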
socket.recv_buf_size
Description: Size in bytes of the receive buffer used on the network sockets between nodes, passed on to the kernel via the SO_RCVBUF socket option.
Dynamic: No
Default:
socket.send_buf_size
Description: Size in bytes of the send buffer used on the network sockets between nodes, passed on to the kernel via the SO_SNDBUF socket option.
Dynamic: No
Default: Auto
socket.ssl
Description: Explicitly enables TLS usage by the wsrep Provider.
Dynamic: No
Default: NO
socket.ssl_ca
Description: Path to Certificate Authority (CA) file. Implicitly enables the option.
Dynamic: No
socket.ssl_cert
Description: Path to TLS certificate. Implicitly enables the option.
Dynamic: No
socket.ssl_cipher
Description: TLS cipher to use. Implicitly enables the option. Since defaults to the value of the system variable.
Dynamic: No
Default: system default, before defaults to AES128-SHA.
socket.ssl_compression
Description: Compression to use on TLS connections. Implicitly enables the option.
Dynamic: No
socket.ssl_key
Description: Path to TLS key file. Implicitly enables the option.
Dynamic: No
socket.ssl_password_file
Description: Path to password file to use in TLS connections. Implicitly enables the option.
Dynamic: No
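The socket.ssl* options above are typically set together. A my.cnf sketch, with hypothetical certificate paths that you would replace with your own:

```ini
# Hypothetical certificate paths; adjust for your deployment.
[mariadb]
wsrep_provider_options="socket.ssl=yes;socket.ssl_ca=/etc/my.cnf.d/certs/ca-cert.pem;socket.ssl_cert=/etc/my.cnf.d/certs/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certs/server-key.pem"
```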
See Also
This page is licensed: CC BY-SA / Gnu FDL
0x4 – statistics
0x8 – profiling (only available in builds with profiling enabled)
Dynamic: No
Default: 0
Introduced: , ,
rr: Round-robin scheduling. Always preempt other, batch or idle threads. Runs for a fixed period of time after which the thread is stopped and moved to the end of the list, being replaced by another round-robin thread with the same priority. Otherwise runs until preempted by other rr threads of a higher priority or blocked by an I/O request.
other: Default scheduling on Linux. Threads run until preempted by a thread of a higher priority or a superior scheduling designation, or blocked by an I/O request.
Permissions: Using the fifo or rr real-time scheduling policies requires granting the mariadb service the necessary permissions at the OS level. On systemd-based distributions, this is done by adjusting the resource limits for the service.
The recommended method is to create a systemd override file:
Open the MariaDB service unit for editing:
Add the following content to the file. This grants the service the ability to set real-time priorities:
Save the file and exit the editor.
Reload the systemd daemon and restart the MariaDB service to apply the changes:
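The steps above can be sketched as follows; the specific limit value is an assumption and may need adjusting to your policy. Open the unit with `sudo systemctl edit mariadb.service`, add the override below, save, then run `sudo systemctl daemon-reload` followed by `sudo systemctl restart mariadb`:

```ini
# systemd override sketch: allow the mariadb service to request
# real-time scheduling priorities (needed for the fifo/rr policies).
[Service]
LimitRTPRIO=99
```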
Dynamic: No
Default: Empty string
You can specify the setting using either TCP or SSL, like this:
gmcast.listen_addr=tcp://192.168.8.111:4567
gmcast.listen_addr=ssl://192.168.8.111:4567
If your system supports IPv6, you can also specify it like this:
gmcast.listen_addr=tcp://[::]:4567
Here, 4567 is the Galera port.
Dynamic: No
Default: tcp://0.0.0.0:4567
0
Range: 0 to 255
Introduced: , ,
2 LOCAL_OOOC: Out-of-order committing is permitted for local transactions only.
3 NO_OOOC: Out-of-order committing is not permitted at all.
Dynamic: No
Default: 3
FLAT8A - shorter key with a higher probability of false positives when matching, includes annotations for debug purposes
FLAT16A - longer key with a lower probability of false positives when matching, includes annotations for debug purposes
Complete Galera Cluster System Variables reference for MariaDB. Complete guide for configuration values, scope settings, and performance impact.
This page documents system variables related to Galera Cluster. For options that are not system variables, see . See for a complete list of system variables and instructions on setting them. Also see the .
wsrep_allowlist
Description:
Allowed IP addresses, comma-delimited.
Note that when setting gmcast.listen_addr=tcp://[::]:4567 on a dual-stack system (for instance, Linux with net.ipv6.bindv6only = 0), IPv4 addresses need to be allow-listed using their IPv4-mapped IPv6 form (e.g. ::ffff:1.2.3.4).
wsrep_applier_retry_count
Description: Maximum number of applier retry attempts. Previously, replication applying always stopped at the first non-ignored failure during event applying, and the node emergency-aborted (or started inconsistency voting). Some failures, however, can be concurrency related, and applying may succeed if the operation is retried at a later time. This variable controls the retry-applying feature. It is set to 0 by default, which means no retrying.
Command line: --wsrep-applier-retry-count=value
wsrep_auto_increment_control
Description: If set to 1 (the default), automatically adjusts the and variables according to the size of the cluster, and readjusts them when the cluster size changes. This avoids replication conflicts due to . In a primary-replica environment, can be set to OFF.
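The adjustment described above can be sketched as follows. The node-index assignment is internal to Galera, so this is only an illustration of why the scheme avoids AUTO_INCREMENT conflicts, not the server's actual code.

```python
def auto_increment_settings(cluster_size: int, node_index: int):
    """Sketch: each node gets a distinct offset and a common increment
    equal to the cluster size, so no two nodes generate the same
    AUTO_INCREMENT value."""
    return {"auto_increment_increment": cluster_size,
            "auto_increment_offset": node_index}

# Node 2 of a 3-node cluster would generate 2, 5, 8, ...
s = auto_increment_settings(3, 2)
print(s["auto_increment_offset"], s["auto_increment_increment"])
```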
wsrep_causal_reads
Description: If set to ON (OFF is default), enforces read causality across the cluster. In the case that a primary applies an event more quickly than a replica, the two could briefly be out of sync. With this variable set to ON, the replica will wait for the event to be applied before processing further queries. Setting it to ON also results in larger read latencies. Deprecated by wsrep_sync_wait.
Command line: --wsrep-causal-reads[={0|1}]
wsrep_certificate_expiration_hours_warning
This variable is documented in detail here:
wsrep_certification_rules
Description: Certification rules to use in the cluster. Possible values are:
strict: Stricter rules that could result in more certification failures. For example with foreign keys, certification failure could result if different nodes receive non-conflicting insertions at about the same time that point to the same row in a parent table.
optimized
wsrep_certify_nonPK
Description: When set to ON (the default), Galera still certifies transactions for tables with no primary key. However, this can still cause undefined behavior in some circumstances. It is recommended to define primary keys for every InnoDB table when using Galera.
Command line: --wsrep-certify-nonPK[={0|1}]
Scope: Global
wsrep_cluster_address
Description: The addresses of cluster nodes to connect to when starting up.
Good practice is to specify all possible cluster nodes, in the form gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port>.
Specifying an empty ip (gcomm://) will cause the node to start a new cluster (which should not be done in the my.cnf file, as after each restart the server will not rejoin the current cluster).
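A my.cnf sketch of the good practice described above, using hypothetical addresses for a three-node cluster:

```ini
# Hypothetical three-node cluster; substitute your nodes' addresses.
[mariadb]
wsrep_cluster_address="gcomm://192.168.1.1,192.168.1.2,192.168.1.3"
```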
wsrep_cluster_name
Description: The name of the cluster. Nodes cannot connect to clusters with a different name, so it needs to be identical on all nodes in the same cluster. The variable can be set dynamically, but note that doing so may be unsafe and cause an outage, as the wsrep provider is unloaded and reloaded.
Command line: --wsrep-cluster-name=value
Scope: Global
wsrep_convert_LOCK_to_trx
Description: Converts LOCK TABLES / UNLOCK TABLES statements to BEGIN and COMMIT. Used mainly for getting older applications to work with a multi-primary setup. Use carefully, as it can result in extremely large writesets.
Command line: --wsrep-convert-LOCK-to-trx[={0|1}]
Scope: Global
wsrep_data_home_dir
Description: Directory where wsrep provider will store its internal files.
Command line: --wsrep-data-home-dir=value
Scope: Global
wsrep_dbug_option
Description: Unused. The mechanism to pass the DBUG options to the wsrep provider hasn't been implemented.
Command line: --wsrep-dbug-option=value
Scope: Global
wsrep_debug
Description: WSREP debug level logging.
Before MariaDB 10.6.1, DDL was only logged on the originating node. From MariaDB 10.6.1, it is logged on other nodes as well.
Data type is . Valid values are:
wsrep_desync
Description: When a node receives more write-sets than it can apply, the transactions are placed in a received queue. If the node's received queue has too many write-sets waiting to be applied (as defined by the gcs.fc_limit WSREP provider option), then the node would usually engage Flow Control. However, when this option is set to ON, Flow Control will be disabled for the desynced node. The desynced node works through the received queue until it reaches a more manageable size. The desynced node continues to receive write-sets from the other nodes in the cluster. The other nodes in the cluster do not wait for the desynced node to catch up, so the desynced node can fall even further behind the other nodes in the cluster. You can check if a node is desynced by checking if the status variable is equal to Donor/Desynced.
Command line: --wsrep-desync[={0|1}]
wsrep_dirty_reads
Description: By default, when not synchronized with the group (wsrep_ready=OFF), a node rejects all queries other than SET and SHOW. If wsrep_dirty_reads is set to 1, queries which do not change data, such as SELECT queries (dirty reads), creating prepared statements, and so forth, are accepted by the node.
wsrep_drupal_282555_workaround
Description: If set to ON, a workaround for Drupal bug #282555 is enabled. This is a bug where, in some cases, when inserting a DEFAULT value into an AUTO_INCREMENT column, a duplicate key error may be returned.
wsrep_forced_binlog_format
Description: A binlog format that overrides any session binlog format settings.
Command line: --wsrep-forced-binlog-format=value
Scope: Global
wsrep_gtid_domain_id
Description: This system variable defines the domain ID that is used for .
When wsrep_gtid_mode is set to ON, wsrep_gtid_domain_id is used in place of gtid_domain_id for all Galera Cluster write sets.
wsrep_gtid_mode
Description: Wsrep GTID mode attempts to keep GTIDs consistent for Galera Cluster write sets on all cluster nodes. GTID state is initially copied to a joiner node during a state snapshot transfer (SST). If you are planning to use Galera Cluster together with MariaDB replication, then wsrep GTID mode can be helpful.
When wsrep_gtid_mode is set to ON, wsrep_gtid_domain_id is used in place of gtid_domain_id for all Galera Cluster write sets.
wsrep_gtid_seq_no
Description: Internal server usage, manually set WSREP GTID seqno.
Command line: None
Scope: Session only
wsrep_ignore_apply_errors
Description: Bitmask determining whether errors are ignored, or reported back to the provider.
0: No errors are skipped.
1: Ignore some DDL errors (DROP DATABASE, DROP TABLE, DROP INDEX
wsrep_load_data_splitting
Description: If set to ON, LOAD DATA supports big data files by introducing transaction splitting. The setting has been deprecated in Galera 4 and defaults to OFF.
Command line: --wsrep-load-data-splitting[={0|1}]
wsrep_log_conflicts
Description: If set to ON (OFF is default), details of conflicting MDL as well as InnoDB locks in the cluster will be logged.
Command line: --wsrep-log-conflicts[={0|1}]
Scope: Global
wsrep_max_ws_rows
Description: Maximum permitted number of rows per write set. Support for this variable was retained in order to be backward compatible; the default value has been changed to 0, which essentially allows write sets to be any size.
Command line: --wsrep-max-ws-rows=#
Scope: Global
wsrep_max_ws_size
Description: Maximum permitted size in bytes per write set. Write sets exceeding 2GB are rejected.
Command line: --wsrep-max-ws-size=#
Scope: Global
wsrep_mode
Description: Turns on WSREP features which are not part of the default behavior.
BINLOG_ROW_FORMAT_ONLY: Only ROW is supported.
wsrep_mysql_replication_bundle
Description: Determines the number of replication events that are grouped together. Experimental implementation aimed to assist with bottlenecks when a single replica faces a large commit time delay. If set to 0 (the default), there is no grouping.
Command line: --wsrep-mysql-replication-bundle=#
Scope: Global
wsrep_node_address
Description: Specifies the node's network address, in the format ip address[:port]. It supports IPv6. The default behavior is for the node to pull the address of the first network interface on the system and the default Galera port. This automatic guessing can be unreliable, particularly in the following cases:
Cloud deployments
Container deployments
wsrep_node_incoming_address
Description: This is the address from which the node listens for client connections. If an address is not specified or it's set to AUTO (default), mysqld uses either or , or tries to get one from the list of available network interfaces, in the same order. See also .
Command line: --wsrep-node-incoming-address=value
wsrep_node_name
Description: Name of this node. This name can be used in as a preferred donor. Note that multiple nodes in a cluster can have the same name.
Command line: --wsrep-node-name=value
Scope: Global
wsrep_notify_cmd
Description: Command to be executed each time the node state or the cluster membership changes. Can be used for raising an alarm, configuring load balancers and so on. See the for more details.
Command line: --wsrep-notify-command=value
Scope: Global
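A minimal notification-script sketch is shown below. The --status/--primary flags follow the wsrep notification interface; treat the exact argument set as an assumption and verify it against your Galera version. A real deployment would append the formatted line to a log file (or drive a load balancer) and install the file as an executable script referenced by wsrep_notify_cmd.

```python
import argparse
import datetime

def format_event(argv):
    """Build one log line from the flags the wsrep provider passes in."""
    p = argparse.ArgumentParser()
    for flag in ("--status", "--uuid", "--primary", "--members", "--index"):
        p.add_argument(flag)
    args, _ = p.parse_known_args(argv)
    stamp = datetime.datetime.now().isoformat()
    return f"{stamp} status={args.status} primary={args.primary}"

# Example invocation the provider might make on reaching Synced state:
print(format_event(["--status", "Synced", "--primary", "yes"]))
```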
wsrep_on
Description: Whether or not wsrep replication is enabled. If the global value is set to OFF, it is not possible to load the provider and join the node in the cluster. If only the session value is set to OFF, the operations from that particular session are not replicated in the cluster, but other sessions and applier threads will continue as normal. The session value of the variable does not affect the node's membership and thus, regardless of its value, the node keeps receiving updates from other nodes in the cluster. It is set to OFF by default and must be turned on to enable Galera replication.
Command line: --wsrep-on[={0|1}]
wsrep_OSU_method
Description: Online schema upgrade method. The default is TOI; specifying the setting without a value sets it to RSU.
TOI: Total Order Isolation. In each cluster node, DDL is processed in the same order regarding other transactions, guaranteeing data consistency. However, affected parts of the database will be locked for the whole cluster.
wsrep_patch_version
Description: Wsrep patch version, for example wsrep_25.10.
Command line: None
Scope: Global
wsrep_provider
Description: Location of the wsrep library, usually /usr/lib/libgalera_smm.so on Debian and Ubuntu, and /usr/lib64/libgalera_smm.so on Red Hat/CentOS.
Command line: --wsrep-provider=value
Scope: Global
wsrep_provider_options
Description: Semicolon (;) separated list of wsrep options (see ).
Command line: --wsrep-provider-options=value
Scope: Global
More details can be found on this page:
wsrep_recover
Description: If set to ON when the server starts, the server will recover the sequence number of the most recent write set applied by Galera, and it will be output to stderr, which is usually redirected to the . At that point, the server will exit. This sequence number can be provided to the system variable.
Command line: --wsrep-recover[={0|1}]
wsrep_reject_queries
Description: Variable to set to reject queries from client connections, useful for maintenance. The node continues to apply write-sets, but an Error 1047: Unknown command error is generated by a client query.
NONE - Not set. Queries are processed as normal.
ALL
wsrep_replicate_myisam
Description: Whether or not DML updates for MyISAM tables will be replicated. This functionality is still experimental and should not be relied upon in production systems. Deprecated in , and removed in ; use wsrep_mode=REPLICATE_MYISAM instead.
Command line: --wsrep-replicate-myisam[={0|1}]
Scope: Global
wsrep_restart_slave
Description: If set to ON, the replica is restarted automatically when the node rejoins the cluster.
Command line: --wsrep-restart-slave[={0|1}]
Scope: Global
wsrep_retry_autocommit
Description: Number of times autocommitted queries are retried due to cluster-wide conflicts before returning an error to the client. If set to 0, no retries are attempted, while a value of 1 (the default) or more specifies the number of retries attempted. Can help applications using autocommit to avoid deadlocks.
Reasons for failures include:
Certification failure: If the transaction reached the replication state and observed the conflict by performing a certification test.
wsrep_slave_FK_checks
Description: If set to ON (the default), the applier replica thread performs foreign key constraint checks.
Command line: --wsrep-slave-FK-checks[={0|1}]
Scope: Global
wsrep_slave_threads
Description: Number of replica threads used to apply Galera write sets in parallel. The Galera replica threads are able to determine which write sets are safe to apply in parallel. However, if your cluster nodes seem to have frequent consistency problems, then setting the value to 1 will probably fix the problem. See for more information.
Command line: --wsrep-slave-threads=#
wsrep_slave_UK_checks
Description: If set to ON, the applier replica thread performs secondary index uniqueness checks.
Command line: --wsrep-slave-UK-checks[={0|1}]
Scope: Global
wsrep_sr_store
Description: Storage for streaming replication fragments.
Command line: --wsrep-sr-store=val
Scope: Global
wsrep_ssl_mode
This variable is documented in details on this page:
wsrep_sst_auth
Description: Username and password of the user to use for replication. Unused if is set to rsync, while for other methods it should be in the format <user>:<password>. The contents are masked in logs and when querying the value with . See for more information.
Command line: --wsrep-sst-auth=value
wsrep_sst_donor
Description: Comma-separated list (from 5.5.33) or name (as per ) of the servers as donors, or the source of the state transfer, in order of preference. The donor-selection algorithm, in general, prefers a donor capable of transferring only the missing transactions (IST) to the joiner node, instead of the complete state (SST). Thus, it starts by looking for an IST-capable node in the given donor list, followed by the rest of the nodes in the cluster. In case multiple candidate nodes are found outside the specified donor list, the node in the same segment () as the joiner is preferred. If none of the existing nodes in the cluster can serve the missing transactions through IST, the algorithm moves on to look for a suitable node to transfer the entire state (SST). It first looks at the nodes specified in the donor list (irrespective of their segment). If a suitable donor is still not found, the rest of the donor nodes are checked for suitability only if the donor list has a terminating comma. Note that a stateless node (the Galera arbitrator) can never be a donor. See for more information.
Although the variable is dynamic, the node does not use the new value unless the node requiring SST or IST disconnects from the cluster. To force this, set to an empty string and back to the nodes list. After setting this variable dynamically, on startup the value from the configuration file will be used again.
Command line: --wsrep-sst-donor=value
Scope: Global
Dynamic: Yes (read note above)
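A my.cnf sketch of the donor-list syntax described above, with hypothetical node names:

```ini
# Prefer node1 then node2 as donors; the terminating comma allows any
# other node in the cluster to be chosen if neither is suitable.
[mariadb]
wsrep_sst_donor="node1,node2,"
```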
wsrep_sst_donor_rejects_queries
Description: If set to ON (OFF is default), the donor node will reject incoming queries, returning an UNKNOWN COMMAND error code. Can be used for informing load balancers that a node is unavailable.
wsrep_sst_method
Description: Method used for taking the state snapshot transfer (SST). See for more information.
Command line: --wsrep-sst-method=value
Scope: Global
See this page for more information about this variable:
wsrep_sst_receive_address
Description: This is the address where other nodes (donors) in the cluster connect in order to send the state-transfer updates. If an address is not specified or it's set to AUTO (the default), mysqld uses 's value as the receiving address. However, if is not set, it uses the address from either or tries to get one from the list of available network interfaces, in the same order. Note: setting it to localhost will make it impossible for nodes running on other hosts to reach this node. See for more information.
Command line: --wsrep-sst-receive-address=value
wsrep_start_position
Description: The start position that the node should use in the format: UUID:seq_no. The proper value to use for this position can be recovered with .
Command line: --wsrep-start-position=value
Scope: Global
wsrep_status_file
Description: wsrep status output filename.
Command line: --wsrep-status-file=value
Scope: Global
wsrep_strict_ddl
Description: If set, DDL statements are rejected on affected tables that do not support Galera replication. This is done by checking whether the table is InnoDB, currently the only storage engine fully supporting Galera replication. MyISAM tables will not trigger the error if the experimental setting is ON. If set, it should be set on all nodes in the cluster. Affected DDL statements include: (e.g. CREATE TABLE t1(a int) engine=Aria)
Statements in , , and are permitted, as the affected tables are only known at execution. Furthermore, the various USER, ROLE, SERVER and DATABASE statements are also allowed, as they do not have an affected table. Deprecated in and removed in . Use wsrep_mode=STRICT_REPLICATION instead.
Command line:
wsrep_sync_wait
Description: Setting this variable causes causality checks to take place before executing an operation of the type specified by the value, ensuring that the statement is executed on a fully synced node. While the check takes place, new queries are blocked on the node to allow the server to catch up with all updates made in the cluster up to the point where the check began. Once reached, the original query is executed on the node. This can result in higher latency. Note that when is ON, values of wsrep_sync_wait become irrelevant. Sample usage (for a critical read that must have the most up-to-date data): SET SESSION wsrep_sync_wait=1; SELECT ...; SET SESSION wsrep_sync_wait=0;
0 - Disabled (default)
wsrep_trx_fragment_size
Description: Size of transaction fragments for streaming replication (measured in the units specified by wsrep_trx_fragment_unit).
Command line: --wsrep-trx-fragment-size=#
Scope: Session
wsrep_trx_fragment_unit
Description: Unit for streaming replication transaction fragments' size:
bytes: transaction’s binlog events buffer size in bytes
rows: number of rows affected by the transaction
This page is licensed: CC BY-SA / Gnu FDL
Scope: Global
Dynamic: Yes
Data Type: INT UNSIGNED
Default Value: 0
Range: 0 to 4294967295
Introduced:
Scope: Global
Dynamic: Yes
Data Type: Boolean
Default Value: ON
Scope: Session
Dynamic: Yes
Data Type: Boolean
Default Value: OFF
Removed:
optimized: Relaxed rules that allow more concurrency and cause fewer certification failures.
Command line: --wsrep-certification-rules
Scope: Global
Dynamic: Yes
Data Type: Enumeration
Default Value: strict
Valid Values: strict, optimized
Dynamic: Yes
Data Type: Boolean
Default Value: ON
The variable can be changed at runtime in some configurations, and will result in the node closing the connection to any current cluster, and connecting to the new address.
If specifying a port, note that this is the Galera port, not the MariaDB port.
Valid Values: STATEMENT, ROW, MIXED or NONE (which resets the forced binlog format state).
When is set to OFF, wsrep_gtid_domain_id is simply ignored to allow for backward compatibility.
There are some additional requirements that need to be met in order for this mode to generate consistent . For more information, see .
Command line: --wsrep-gtid-domain-id=#
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 0
Range: 0 to 4294967295
When wsrep_gtid_mode is set to OFF, is simply ignored to allow for backward compatibility.
There are some additional requirements that need to be met in order for this mode to generate consistent . For more information, see .
Command line: --wsrep-gtid-mode[={0|1}]
Scope: Global
Dynamic: Yes
Data Type: boolean
Default Value: OFF
Dynamic: Yes
Data Type: numeric
Range: 0 to 18446744073709551615
Introduced:
,
ALTER TABLE
).
2: Skip DML errors (Only ignores DELETE errors).
4: Ignore all DDL errors.
Command line: --wsrep-ignore-apply-errors
Scope: Global
Dynamic: Yes
Data Type: Numeric
Default Value: 7
Range: 0 to 7
Scope: Global
Dynamic: Yes
Data Type: Boolean
Default Value: OFF
Deprecated: MariaDB 10.4.2
Removed:
Dynamic: Yes
Data Type: Boolean
Default Value: OFF
Dynamic: Yes
Data Type: Numeric
Default Value: 0
Range: 0 to 1048576
Dynamic: Yes
Data Type: Numeric
Default Value: 2147483647 (2GB)
Range: 1024 to 2147483647
DISALLOW_LOCAL_GTID: Nodes can have GTIDs for local transactions in a number of scenarios. If DISALLOW_LOCAL_GTID is set, these operations produce an error ERROR HY000: Galera replication not supported. Scenarios include:
A DDL statement is executed with wsrep_OSU_method=RSU set.
A DML statement writes to a non-InnoDB table.
A DML statement writes to an InnoDB table with wsrep_on=OFF set.
REPLICATE_ARIA: Whether or not DML updates for Aria tables will be replicated. This functionality is experimental and should not be relied upon in production systems.
REPLICATE_MYISAM: Whether or not DML updates for MyISAM tables will be replicated. This functionality is experimental and should not be relied upon in production systems.
REQUIRED_PRIMARY_KEY: Tables must have a PRIMARY KEY defined.
STRICT_REPLICATION: Same as the old wsrep_strict_ddl setting.
APPLIER_SKIP_FK_CHECKS_IN_IST: When this operation mode is set, and the node is processing IST or catch-up, appliers skip FK checking. See .
This flag is available from MariaDB 12.0.
Command line: --wsrep-mode=value
Scope: Global
Dynamic: Yes
Data Type: Enumeration
Default Value: (Empty)
Valid Values: APPLIER_SKIP_FK_CHECKS_IN_IST, BINLOG_ROW_FORMAT_ONLY, DISALLOW_LOCAL_GTID, REQUIRED_PRIMARY_KEY, REPLICATE_ARIA, REPLICATE_MYISAM and STRICT_REPLICATION
Introduced:
Dynamic: No
Data Type: Numeric
Default Value: 0
Range: 0 to 1000
Servers with multiple network interfaces
Servers running multiple nodes
Network address translation (NAT)
Clusters with nodes in more than one region
See also
Command line: --wsrep-node-address=value
Scope: Global
Dynamic: No
Data Type: String
Default Value: Primary network address, usually eth0 with a default port of 4567, or 0.0.0.0 if no IP address.
Scope: Global
Dynamic: No
Data Type: String
Default Value: AUTO
Dynamic: Yes
Data Type: String
Default Value: The server's hostname.
Dynamic: No
Data Type: String
Default Value: Empty
Scope: Global, Session
Dynamic: Yes
Data Type: Boolean
Default Value: OFF
Valid Values: ON, OFF
RSU: Rolling Schema Upgrade. DDL processing is done only locally on the node, and the user needs to perform the changes manually on each node. The node is desynced from the rest of the cluster while the processing takes place, to avoid blocking other nodes. Schema changes must remain compatible with the old schema to avoid breaking replication; when DDL processing is complete on the single node, replication recommences.
Command line: --wsrep-OSU-method[=value]
Scope: Global, Session
Dynamic: Yes
Data Type: Enum
Default Value: TOI
Valid Values: TOI, RSU
Dynamic: No
Data Type: String
Default Value: None
Data Type: String
Default Value: None
Dynamic: No
Data Type: String
Default Value: Empty
Scope: Global
Dynamic: No
Data Type: Boolean
Default Value: OFF
ALL - All queries from client connections will be rejected, but existing client connections are maintained.
ALL_KILL - All queries from client connections will be rejected, and existing client connections, including the current one, are immediately killed.
Command line: --wsrep-reject-queries[=value]
Scope: Global
Dynamic: Yes
Data Type: Enum
Default Value: NONE
Valid Values: NONE, ALL, ALL_KILL
Dynamic: Yes
Default Value: OFF
Data Type: Boolean
Valid Values: ON, OFF
Deprecated:
Removed:
Dynamic: Yes
Default Value: OFF
Data Type: Boolean
High-priority abort: If the execution of the transaction was interrupted by the replication applier before entering the replication state.