HA (Primary/Replica) Availability

Overview

The following availability details pertain to SkySQL's MariaDB Platform for Transactions in a HA (Primary/Replica) topology.

MariaDB Platform for Transactions in a HA (Primary/Replica) topology offers horizontal scaling for transactional workloads.

High Availability with MariaDB Replication

MariaDB Platform for Transactions in a HA (Primary/Replica) topology uses MariaDB Replication with MariaDB Enterprise Server, which provides high availability:

  • By default, uses the InnoDB storage engine to store data

  • Replicates data and schema changes between nodes using asynchronous or semi-synchronous replication

  • When the primary node fails, failover occurs automatically

Automatic Failover with MariaDB MaxScale

MariaDB Platform for Transactions in a HA (Primary/Replica) topology uses MariaDB MaxScale for automatic failover.

MaxScale's MariaDB Monitor (mariadbmon) provides automatic failover:

  • MaxScale's MariaDB Monitor continuously checks the status of all primaries and replicas.

  • If the primary node fails, MaxScale promotes the most up-to-date replica to be the new primary.

  • All existing replicas will be redirected to use the newly promoted primary.
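The promotion and redirection steps above can be pictured with a small sketch. This is an illustrative model only, not MaxScale's actual implementation: the `Node` class, its `gtid_seq` field, and the `fail_over` function are hypothetical names, and a single integer stands in for a node's replication position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical model of a server as seen by a monitor."""
    name: str
    is_up: bool
    gtid_seq: int                          # simplified stand-in for replication position
    replicates_from: Optional[str] = None  # None means the node acts as primary


def fail_over(primary: Node, replicas: list[Node]) -> Optional[Node]:
    """Promote the most up-to-date running replica if the primary is down.

    Returns the new primary, or None if no failover was needed or possible.
    """
    if primary.is_up:
        return None  # primary is healthy; nothing to do

    candidates = [r for r in replicas if r.is_up]
    if not candidates:
        return None  # no running replica is available for promotion

    # Pick the replica that has applied the most changes.
    new_primary = max(candidates, key=lambda r: r.gtid_seq)
    new_primary.replicates_from = None

    # Point every other replica at the newly promoted primary.
    for r in replicas:
        if r is not new_primary:
            r.replicates_from = new_primary.name
    return new_primary
```

In this model, a replica that lags behind (a lower `gtid_seq`) is never chosen while a more current one is running, which mirrors the "most up-to-date replica" rule described above.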

SkySQL also provides failover for MaxScale itself:

  • If the MaxScale instance fails, it is restarted or replaced, depending on the specific issue.

Redundant MaxScale instances are also supported as an optional configuration.

For additional information, see "MaxScale Redundancy".

Amazon Infrastructure

MariaDB SkySQL services on AWS rely on Elastic Kubernetes Service (EKS), which is a component of Amazon Web Services (AWS). MariaDB SkySQL inherits many availability features from EKS and AWS:

  • The resiliency features of EKS.

  • The auto-healing functionality of Kubernetes.

  • Amazon's goal of 99.95% uptime, as stated in the SLA for EKS.

Google Infrastructure

MariaDB SkySQL services on GCP rely on Google Kubernetes Engine (GKE), which is a component of Google Cloud Platform (GCP). MariaDB SkySQL inherits many availability features from GKE and GCP:

  • The resiliency of regional GKE Kubernetes clusters which include multiple zones within a region.

  • The auto-healing functionality of Kubernetes.

  • Google's goal of 99.5% uptime, as stated in the SLA for GKE.
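To put the uptime figures quoted for EKS and GKE in perspective, the following sketch converts an uptime percentage into the downtime it permits. It assumes a 30-day period for simplicity, and the function name is illustrative. The SLA documents themselves define their own measurement periods and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime permitted in `days` days at a given uptime."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Figures quoted in this document for EKS and GKE.
for label, uptime in [("EKS", 99.95), ("GKE", 99.5)]:
    print(f"{label}: {uptime}% uptime allows about "
          f"{allowed_downtime_minutes(uptime):.1f} minutes of downtime per 30 days")
```

Under these assumptions, 99.95% corresponds to about 21.6 minutes of downtime per 30 days, and 99.5% to about 216 minutes.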

Powered by Kubernetes

MariaDB SkySQL services run in containers powered by Kubernetes. MariaDB SkySQL inherits many availability features from Kubernetes' self-healing functionality:

  • Failed containers are automatically restarted.

  • Unhealthy containers are automatically killed.

  • Dead containers are automatically replaced.

  • All of this happens in a way that is transparent to the user.

  • If the MaxScale instance fails, it is restarted or replaced, depending on the specific issue.

  • If one of the MariaDB Enterprise Server instances fails, it is restarted or replaced, depending on the specific issue.

Production Readiness

MariaDB Platform for Transactions in a HA (Primary/Replica) topology is designed for resiliency. It is built on MariaDB Enterprise Server with MariaDB Replication, which provides asynchronous or semi-synchronous replication, High Availability (HA), and automatic failover.

Multi-Zonal Replicas

MariaDB Platform for Transactions in a HA (Primary/Replica) topology provides multi-zonal availability by default. The primary node is automatically deployed in a different zone from the replica nodes, within the same region.

SkySQL takes several precautions to preserve service availability during a zone failure:

  • If the failed zone contains the primary node, the service will temporarily be read-only (for an average of 1 minute) until the service completes auto-failover to the new primary node. The service will run with fewer replica nodes until the failed zone recovers. There is a chance that the most recent transactions applied on the old primary node could be lost, if they were not replicated to the new primary node before the failure occurred.

  • If the failed zone contains one or more replica nodes, the service will not be interrupted. However, the service will run with fewer replica nodes until the failed zone recovers.

  • If the failed zone contains a non-redundant MaxScale instance, the service will temporarily be unavailable (for an average of 15 minutes) until the service rebuilds the MaxScale instance.
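Because a zone failure can leave the service read-only or unreachable for a short period, client applications may want to retry failed writes. The sketch below shows one generic way to do this with exponential backoff. It is not SkySQL-specific: `retry_with_backoff`, `TransientError`, and the delay values are hypothetical, and a real application would catch its database driver's own exception types instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a driver error raised during failover."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failed attempt, capped at `max_delay` seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the default values, the waits add up to 91 seconds before the final attempt, which is longer than the average one-minute read-only window described above. It would not cover the longer outage of a non-redundant MaxScale instance, so an application in that configuration would need a different strategy or longer limits.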

Cross-Region Replicas

MariaDB Platform for Transactions in a HA (Primary/Replica) topology supports cross-region replicas for Disaster Recovery (DR).

For additional information, see "Cross-Region Replicas".