Galera Use Cases
MariaDB Galera Cluster provides high availability and disaster recovery through synchronous multi-master replication. It is ideal for active-active setups, offering strong consistency and automatic failover for critical applications that need continuous uptime.
To understand these use cases, it helps to see how Galera's core features relate to one another.
High Availability (HA) for Mission-Critical Applications
Galera's core strength is its synchronous replication, which ensures a transaction is replicated to every node before the client is told it committed. This makes it ideal for applications where data loss is unacceptable and downtime must be minimal.
Examples:
Financial Trading Platforms: These systems demand immediate data consistency across all read and write operations.
E-commerce and Online Retail: Ensures immediate consistency in inventory levels, shopping carts, and order statuses.
Billing and CRM Systems: Applications where customer data must be continuously available and instantly up-to-date, 24/7.
When a node fails, a proxy like MaxScale shields the application from the downtime and automatically reroutes traffic to the healthy nodes.
How It Really Works: The "Synchronous" Nuance
When you hear "synchronous," it doesn't mean every node writes to disk at the exact same millisecond. The process is more elegant:
1. A client sends a `COMMIT` to one node (e.g., Node A).
2. Node A packages the transaction and replicates it to Node B and Node C.
3. Node B and Node C check the transaction for conflicts (a step called certification) and signal "OK" back to Node A.
4. Only after Node A gets an "OK" from all other nodes does it tell the client, "Your transaction is committed."
5. All nodes then apply the write.
As a result, the data is "safe" on all nodes before the application is ever told the write was successful.
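You can observe this machinery through Galera's `wsrep_*` status variables. Below is a minimal monitoring sketch, assuming MariaDB Connector/Python (`pip install mariadb`); the host name and credentials are placeholders:

```python
import mariadb

conn = mariadb.connect(host="node-a.example.com", port=3306,
                       user="monitor", password="secret")
cur = conn.cursor()

# Galera reports its state through wsrep_* status variables.
for var in ("wsrep_cluster_size",          # nodes currently in the cluster
            "wsrep_cluster_status",        # 'Primary' = this node has quorum
            "wsrep_local_state_comment"):  # 'Synced' = safe to serve traffic
    cur.execute(f"SHOW STATUS LIKE '{var}'")
    name, value = cur.fetchone()
    print(f"{name} = {value}")

conn.close()
```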
In-Depth Use Case: E-commerce Inventory Control
You have one "Super-Widget" left in stock. Two customers, accessing different nodes, click "Buy" simultaneously.
Without Galera (Traditional Replication):
You risk selling the widget twice: with asynchronous replication, the node handling the second purchase may not yet have seen the first sale.
With Galera Cluster:
Both "buy" transactions (`UPDATE inventory SET stock=0...`) are sent for cluster certification, and the cluster detects the conflict during that step:
One transaction "wins" certification and commits.
The other transaction fails certification and gets a "deadlock" error.
Result: Data integrity is fully maintained.
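As a sketch, the application side of one "buy" transaction might look like this, assuming a hypothetical `inventory(item, stock)` schema and placeholder connection details. The `WHERE stock > 0` guard is a defensive variant of the `UPDATE` above; even so, when two such transactions race on different nodes, one of them fails certification at `COMMIT`:

```python
import mariadb

conn = mariadb.connect(host="node-a.example.com", user="shop",
                       password="secret", database="store")
cur = conn.cursor()
try:
    # Decrement only if stock remains; rowcount tells us if we "won" locally.
    cur.execute("UPDATE inventory SET stock = stock - 1 "
                "WHERE item = 'Super-Widget' AND stock > 0")
    if cur.rowcount == 0:
        conn.rollback()   # no unit left, as far as this node can see
        print("Out of stock")
    else:
        conn.commit()     # cluster-wide certification happens here
        print("Purchase confirmed")
except mariadb.Error as e:
    conn.rollback()       # a certification conflict surfaces as a deadlock error
    print(f"Purchase failed, please retry: {e}")
```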
Always Use a Proxy. Your application shouldn't know about individual nodes. Place a cluster-aware proxy like MariaDB MaxScale in front of your cluster.
Design for 3 (or 5). A Galera Cluster needs at least three nodes, and in general an odd number, to maintain quorum: the ability to form a "majority vote" and avoid a "split-brain" scenario. With three nodes, the two survivors of a single failure still form a majority; with only two, the lone survivor holds just 50% of the votes and loses quorum.
The Trade-Offs
Latency: The synchronous certification check adds a small amount of latency to every `COMMIT`.
Application Deadlocks: Your application must be built to handle "deadlock" errors by retrying the transaction, as shown in the sketch below.
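A minimal retry wrapper might look like the following. This assumes MariaDB Connector/Python, which exposes the server error number as `e.errno`; certification conflicts surface as MariaDB error 1213 (`ER_LOCK_DEADLOCK`), and all names here are illustrative:

```python
import time
import mariadb

DEADLOCK = 1213  # ER_LOCK_DEADLOCK; also raised on Galera certification failures

def run_with_retry(conn, do_transaction, attempts=3):
    """Execute do_transaction(conn) and commit, retrying on certification conflicts."""
    for attempt in range(1, attempts + 1):
        try:
            do_transaction(conn)
            conn.commit()            # certification happens at COMMIT
            return
        except mariadb.Error as e:
            conn.rollback()
            if e.errno != DEADLOCK or attempt == attempts:
                raise                # not retryable, or out of attempts
            time.sleep(0.05 * attempt)  # brief backoff before retrying
```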
Zero-Downtime Maintenance and Upgrades
Galera allows a rolling restart of the cluster members: take one node down at a time, perform maintenance, and bring it back up, and the cluster as a whole remains operational throughout.
Examples:
Continuous Operations Environments: Organizations with strict SLAs that prohibit maintenance windows.
Database Scaling and Infrastructure Changes: Adding or removing cluster nodes (scaling out or in) without interrupting service.
The "rolling" process for a 3-node cluster takes one node out of service at a time, as described below.
How It Really Works: The "Graceful" Maintenance Process
1. Isolate the Node: Configure your proxy (e.g., MaxScale) to stop routing new connections to the node targeted for maintenance.
2. Perform Maintenance: Safely stop the MariaDB service with `systemctl stop mariadb`, then apply OS patches or upgrade the MariaDB binaries.
3. Restart & Resync: Upon restarting, MariaDB automatically synchronizes with the cluster; an Incremental State Transfer (IST) applies only the changes the node missed.
4. Rejoin: Once the node reports it is synced, enable it again in the proxy (see the sketch after this list).
5. Repeat: Apply these steps to the other nodes, one at a time.
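Step 4 can be automated. Here is a sketch that polls the rejoining node until it reports `Synced`; the host, credentials, and timeout are placeholders:

```python
import time
import mariadb

def wait_until_synced(host, timeout=600):
    """Poll a rejoining node until IST/SST completes and it reports 'Synced'."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        conn = mariadb.connect(host=host, user="monitor", password="secret")
        cur = conn.cursor()
        cur.execute("SHOW STATUS LIKE 'wsrep_local_state_comment'")
        _, state = cur.fetchone()
        conn.close()
        if state == "Synced":   # node has caught up; safe to re-enable in proxy
            return True
        time.sleep(5)           # still joining/donating -- check again shortly
    return False
```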
Reduced Capacity
While one node is down for maintenance, your 3-node cluster is temporarily running as a 2-node cluster. It's wise to perform maintenance during low-traffic periods.
IST vs. SST
The fast, automatic sync is IST (Incremental State Transfer). If a node is down for too long, it may trigger a State Snapshot Transfer (SST)—a full copy of the entire database. SSTs are resource-intensive.
Disaster Recovery and Geo-Redundancy
Galera can be deployed across multiple physical locations, providing a robust solution for disaster recovery by surviving the complete loss of one site.
Examples:
Multi-Data Center Deployment: Deploying a cluster across three or more geographically separated data centers.
Disaster Recovery Setup: Running the primary cluster in one data center with asynchronous replication to a second cluster in a separate data center.
In-Depth Look: Two Different DR Patterns
This use case covers two distinct architectures with different goals:
Pattern 1: The Stretched Cluster (synchronous). This is a single Galera cluster with nodes spread across multiple data centers. A `COMMIT` in New York is not "OK'd" until the data is safely certified by the London node. This gives zero data loss (RPO = 0) but has a major performance impact.
Pattern 2: Cluster-to-Cluster Async Replication. This is the more common setup. A primary cluster in DC-1 runs at full speed and asynchronously replicates its data to a separate node or cluster in DC-2. This is fast, but allows for minimal data loss (RPO > 0) in a disaster.
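For Pattern 2, the DR side is ordinary MariaDB asynchronous replication. A minimal sketch, assuming GTIDs, binary logging, and a replication user are already configured on the primary cluster (all hosts and credentials are placeholders):

```python
import mariadb

# Run on a node of the DR cluster in DC-2, pointing it at a node in DC-1.
conn = mariadb.connect(host="dc2-node.example.com", user="admin",
                       password="secret")
cur = conn.cursor()
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST = 'dc1-node.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'replpass',
        MASTER_USE_GTID = slave_pos
""")
cur.execute("START SLAVE")  # begin pulling changes asynchronously
conn.close()
```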
Choosing the right DR pattern

| | Pattern 1: Stretched Cluster | Pattern 2: Async Replication |
| --- | --- | --- |
| Primary Goal | 100% data consistency | Primary-site performance |
| Data Loss (RPO) | Zero (RPO = 0) | Seconds to minutes (RPO > 0) |
| Performance Impact | Very high. All writes are as slow as the RTT to the farthest data center. | None. The primary cluster runs at local network speed. |
| Best For | Financial and other applications where any data loss is intolerable. | Most businesses that can tolerate a few seconds of data loss in a major disaster. |
Scaling Out Write Workloads (Limited)
While synchronous replication adds some overhead, Galera fundamentally allows any node to accept write queries. This is best combined with a proxy like MaxScale to intelligently distribute traffic.
Examples:
Load Balanced Read/Write Traffic: Using MaxScale's Read/Write Split Router to direct reads to any node and writes to a single "Primary" node.
High-Volume Write Environments: Suitable for applications with a high volume of concurrent, non-conflicting write operations.
Myth vs. Reality: Write Throughput in Distributed Systems
Myth: "With 3 nodes, I achieve 3x the write throughput."
Reality: False. Every write must be processed by all three nodes.
Nuance: You get excellent read scaling, but write scaling is only possible when writes are non-contended (not targeting the same rows).
In-Depth Use Case: The "Read-Write Split" Strategy (Recommended)
This is the most common and recommended architecture. MaxScale's readwritesplit router automatically designates one node as the "Primary" (for writes) and load-balances reads across the others. If the Primary node fails, MaxScale automatically promotes a new one.
| | Write to All Nodes ("Multi-Master") | Read-Write Split (Recommended) |
| --- | --- | --- |
| How it Works | The application (or proxy) sends writes to all nodes in the cluster. | A proxy (MaxScale) designates one node as "Primary" and sends 100% of writes to it. |
| Pros | Fully utilizes all nodes for writes; no single point of failure for write ingress. | No application deadlocks. Zero certification failures. Simple for the application. |
| Cons | High risk of deadlocks. If two clients update the same row on different nodes, one fails. | Write throughput is limited to what a single node can handle. |
| Best For | Very specific applications that are 100% guaranteed to have no write conflicts. | 99% of all applications. You get full read-scaling and automatic HA, without the application complexity. |
Read-Write Split Strategy
For most applications, using readwritesplit is the safest, most reliable, and most effective strategy.
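In this architecture the application only ever talks to MaxScale. A sketch, assuming a readwritesplit listener on port 4006 (a common convention; yours is whatever your MaxScale configuration defines) and the hypothetical inventory table from earlier:

```python
import mariadb

# Connect to MaxScale, never to an individual Galera node.
conn = mariadb.connect(host="maxscale.example.com", port=4006,
                       user="app", password="secret", database="store")
cur = conn.cursor()

cur.execute("SELECT stock FROM inventory WHERE item = 'Super-Widget'")
print(cur.fetchone())   # read: MaxScale load-balances this across synced nodes

cur.execute("UPDATE inventory SET stock = stock - 1 "
            "WHERE item = 'Super-Widget' AND stock > 0")
conn.commit()           # write: MaxScale routes this to the single Primary node
conn.close()
```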
Keep Transactions Small: Large `UPDATE` operations on a single node can stall the entire cluster during the certification/commit phase.
Trade-Off: `readwritesplit` is not sharding. Galera focuses on high availability rather than infinite write-scaling. If your application demands more writes than a single powerful server can handle, consider implementing a sharded solution.
This page is licensed: CC BY-SA / Gnu FDL