Architectural Decision Guide: Choosing a Replication Strategy

Choosing the right replication strategy in MariaDB is less about the underlying technology itself and more about which specific "pain point" your architecture needs to solve: preventing downtime, accelerating slow reads, securing data across distances, or aggregating data for analytics.

This guide helps you determine which replication method and format to choose for your cluster, and what trade-offs you will make.

Choosing a Replication Format

Before choosing a cluster topology, you must establish how the data changes are recorded in the binary log. MariaDB offers three primary replication formats:

  • Row-Based Replication (RBR) [Recommended]: This format replicates the actual row changes rather than the SQL statements. If your goal is "carrier-grade" reliability and absolute data consistency across replicas, RBR is the required choice. While it can consume more memory and network bandwidth, this overhead is negligible for typical OLTP workloads.

  • Mixed Format [Default]: MariaDB takes a best-effort approach, logging statements as in Statement-Based Replication by default but automatically switching to row-based logging for any statement it detects as unsafe to replicate consistently.

  • Statement-Based Replication (SBR) [Fallback Only]: SBR replicates the exact SQL statements executed. While it uses very little CPU and network bandwidth, SBR is generally considered unreliable for modern high-availability setups because non-deterministic statements can produce different results on each replica. Use it only as a fallback in edge cases where RBR is impractical (e.g., a single query that updates a massive number of rows at once, which would otherwise bloat the binary log).
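
As a rough illustration of how the format choice is applied, the statements below check the current format, switch the server to RBR, and fall back to SBR for one bulk statement. This is a minimal sketch, assuming an account with the privileges needed to change global variables; the `orders` table and its columns are hypothetical.

```sql
-- Check which format the server currently uses for new sessions.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';

-- Make row-based logging the server-wide setting.
-- Only sessions opened after this statement pick up the new value.
SET GLOBAL binlog_format = 'ROW';

-- Edge case: a single deterministic statement that touches a very large
-- number of rows. Switch only the current session to statement-based
-- logging, run the statement, then switch back.
SET SESSION binlog_format = 'STATEMENT';
UPDATE orders SET archived = 1 WHERE created_at < '2020-01-01';  -- hypothetical table
SET SESSION binlog_format = 'ROW';
```

A value set with SET GLOBAL does not survive a server restart, so the same setting would normally also go into the server configuration file (binlog_format=ROW).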

Choosing a Strategy by Use Case

Once your foundation is set, choose a topology based on your primary architectural goal.

You need zero data loss and automatic failover. If one server dies, another takes over instantly without impacting the application.

  • Galera Cluster (Virtually Synchronous): The "gold standard" for local high availability. It is a virtually synchronous, multi-primary solution: a transaction is replicated to and certified by every node before it commits, so all nodes converge on the same data.

    • Trade-off: Write latency is dictated by the slowest node, as all active nodes must acknowledge the transaction.

  • MariaDB Advanced Cluster (Quorum/Raft): (Enterprise Technical Preview) Designed for environments that need HA but cannot afford Galera's write latency. It requires acknowledgment from only a majority (quorum) of nodes to commit a write, effectively ignoring network lag from the slowest data centers.

  • Semisynchronous Replication: A middle ground where the primary server waits for at least one replica to acknowledge receipt of the data before confirming a "success" to the client. It protects against data loss during a primary crash without the full performance overhead of Galera, as long as a replica acknowledges within the configured timeout.
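
Of the three options above, semisynchronous replication is the simplest to try on an existing primary-replica pair. The following is a sketch only, assuming MariaDB 10.3 or later (where semisynchronous replication is built into the server; older versions need the plugin installed first) and replication that is already running asynchronously. The timeout value is illustrative.

```sql
-- On the primary: wait for a replica acknowledgment before confirming commits.
SET GLOBAL rpl_semi_sync_master_enabled = ON;
-- Maximum wait in milliseconds. When it expires, the primary falls back to
-- asynchronous replication until a semisynchronous replica catches up.
SET GLOBAL rpl_semi_sync_master_timeout = 10000;

-- On each replica: enable acknowledgments, then restart the I/O thread
-- so the running connection picks up the change.
SET GLOBAL rpl_semi_sync_slave_enabled = ON;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Back on the primary: confirm semisynchronous mode is active and
-- how many replicas are currently acknowledging.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';
```

The timeout is the main tuning decision: a short value favors write availability, while a long value favors durability at the cost of stalled commits when no replica responds.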

The Role of MariaDB MaxScale

Managing replication topologies, especially handling automatic failover and routing in asynchronous or semisynchronous environments, can be highly complex.

This is where MariaDB MaxScale is typically introduced. While not a replication engine itself, MaxScale is an advanced database proxy that sits between your application and your database cluster.

  • Read/Write Splitting: It acts as a traffic cop, automatically routing SELECT queries to your replicas and write statements (INSERT, UPDATE, DELETE) to your primary.

  • Automated Failover: If a primary node fails, MaxScale can automatically promote a replica and reroute traffic without application downtime.

MaxScale is a commercial MariaDB Enterprise product, which should be factored into your architectural decision-making. For technical implementation details, refer to the MaxScale documentation.

Quick Comparison Decision Matrix

| Architectural Goal      | Recommended Solution       | Consistency Type              | Key Trade-off / Benefit                              |
|-------------------------|----------------------------|-------------------------------|------------------------------------------------------|
| No Downtime (Local HA)  | Galera Cluster             | Synchronous                   | Guarantees zero data loss, but slower writes.        |
| No Downtime (Geo HA)    | MariaDB Advanced Cluster   | Quorum (Raft)                 | Faster writes across WAN, but Enterprise-only.       |
| Read Scaling            | Primary-Replica + MaxScale | Asynchronous                  | Maximum performance, but risks replication lag.      |
| Disaster Recovery (WAN) | Hybrid Replication         | Sync (Local) / Async (Remote) | Safely bridges data centers, but setup is complex.   |
| Reporting / BI          | Multi-Source Replication   | Asynchronous                  | Safely aggregates data without impacting production. |
| Human Error Recovery    | Delayed Replication        | Asynchronous (Delayed)        | Saves against accidental DROP TABLE executions.      |
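
The last two rows of the matrix map to replica-side settings that can be sketched briefly. The example assumes a MariaDB version with delayed replication support (10.2.3 or later) and GTID-based replication for the multi-source part. The connection name `sales`, the host name, the user, and the password are all hypothetical placeholders.

```sql
-- Delayed replication: keep this replica one hour behind its primary,
-- leaving a window to stop it before an accidental DROP TABLE is applied.
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;  -- delay in seconds
START SLAVE;
-- SQL_Delay and SQL_Remaining_Delay in the output show the configured delay.
SHOW SLAVE STATUS;

-- Multi-source replication: a reporting replica pulls from several primaries,
-- one named connection per source. Each primary should use its own
-- gtid_domain_id so their GTID streams do not collide.
CHANGE MASTER 'sales' TO
  MASTER_HOST = 'sales-db.example.internal',  -- hypothetical host
  MASTER_USER = 'repl',                       -- hypothetical user
  MASTER_PASSWORD = 'replace-me',             -- placeholder
  MASTER_USE_GTID = slave_pos;
START SLAVE 'sales';
-- One status row per named connection.
SHOW ALL SLAVES STATUS;
```

Both techniques remain asynchronous, so neither adds write latency on the production primaries.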
