Distributed SQL Availability

Overview

The following availability details pertain to SkySQL's MariaDB Platform for Distributed SQL (Distributed Transactions).

MariaDB Platform for Distributed SQL offers read and write scaling, High Availability (HA), and fault tolerance for transactional workloads.

Distributed SQL with MariaDB Xpand

MariaDB Platform for Distributed SQL uses MariaDB Xpand, which provides high availability and fault tolerance:

  • Uses the Paxos protocol to resolve distributed transactions

  • Uses synchronous replication to provide strong consistency

  • Synchronously writes multiple redundant copies (replicas) of data to multiple nodes for fault tolerance

  • Tolerates a single node or single zone failure

  • Uses a rebalancer process to automatically maintain the intended replica count in the background
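To illustrate the consistency guarantee behind synchronous replication, the following Python sketch models a write that is acknowledged only after every replica holds it, so any replica returns the same value afterward. It is a toy prepare-then-apply model, not Xpand's Paxos-based implementation. The `Replica` class and `synchronous_write` function are hypothetical, and a real cluster would remove a failed node from membership rather than reject writes indefinitely.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one copy of the data on one node."""
    node: str
    healthy: bool = True
    rows: dict = field(default_factory=dict)


def synchronous_write(replicas, key, value):
    """Apply a write to every replica, or to none of them (toy model).

    Returns True only when all replicas hold the new value, which is the
    point at which a synchronously replicated write may be acknowledged.
    """
    # Phase 1: every replica must be able to accept the write.
    if not all(r.healthy for r in replicas):
        return False  # nothing applied, so the copies stay identical
    # Phase 2: apply the write to every copy before acknowledging.
    for r in replicas:
        r.rows[key] = value
    return True


replicas = [Replica("node-1"), Replica("node-2")]
assert synchronous_write(replicas, "balance", 100)
# After acknowledgement, every replica returns the same value.
assert all(r.rows["balance"] == 100 for r in replicas)
```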

Automatic Failover with MariaDB MaxScale

MariaDB Platform for Distributed SQL uses MariaDB MaxScale for automatic failover.

MaxScale's Xpand Monitor (xpandmon) provides automatic failover:

  • MaxScale's Xpand Monitor continuously checks the status of all nodes.

  • If a node fails, MaxScale automatically stops routing queries to it.

  • If a node is added, MaxScale automatically starts routing queries to it.
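The monitor behavior described above can be sketched as a routing table that is refreshed after each monitor pass. This Python model is hypothetical and does not reflect how MaxScale is implemented: `Router`, `update_from_monitor`, the node names, and the round-robin policy are all assumptions chosen for illustration.

```python
import itertools


class Router:
    """Toy model of monitor-driven routing (hypothetical, not MaxScale code)."""

    def __init__(self):
        self.up_nodes = []
        self._cycle = iter(())

    def update_from_monitor(self, status):
        """Refresh the routable set from one monitor pass.

        `status` maps node name -> True (healthy) or False (failed). Failed
        nodes drop out of rotation; newly added healthy nodes join it.
        """
        self.up_nodes = sorted(node for node, ok in status.items() if ok)
        self._cycle = itertools.cycle(self.up_nodes)

    def route(self):
        """Pick the next healthy node, round-robin (an assumed policy)."""
        if not self.up_nodes:
            raise RuntimeError("no healthy nodes available")
        return next(self._cycle)


router = Router()
router.update_from_monitor({"xpand-1": True, "xpand-2": True, "xpand-3": True})
print([router.route() for _ in range(3)])  # ['xpand-1', 'xpand-2', 'xpand-3']
router.update_from_monitor({"xpand-1": True, "xpand-2": False, "xpand-3": True})
print([router.route() for _ in range(2)])  # ['xpand-1', 'xpand-3']
```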

SkySQL also provides failover for MaxScale itself, as well as for the Xpand instances:

  • If the MaxScale instance fails, it is restarted or replaced, depending on the specific issue.

  • If one of the MariaDB Xpand instances fails, it is restarted or replaced, depending on the specific issue.

Redundant MaxScale instances are also supported as an optional configuration. For additional information, see "MaxScale Redundancy".

By default, the Xpand topology deploys MaxScale to handle load balancing and failover. Optionally, MaxScale can be omitted from the service on request. Omitting MaxScale enables direct connections to Xpand nodes, but removes features provided by MaxScale, such as load balancing and failover.
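Without MaxScale, an application that connects directly to Xpand nodes has to handle failover itself. A minimal sketch, assuming the application keeps a list of node hostnames and can pass its driver's connect call in as a function, is to try each node in turn. `connect_with_failover`, the `connect` parameter, `fake_connect`, and the hostnames are hypothetical; substitute your driver's actual connect function and error types.

```python
from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(hosts: Sequence[str], connect: Callable[[str], Conn]) -> Conn:
    """Return a connection to the first reachable Xpand node in `hosts`.

    `connect` is a hypothetical placeholder for the application's real
    driver call: it takes a hostname and returns a connection or raises.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host)
        except Exception as exc:  # narrow this to the driver's error types
            errors.append(f"{host}: {exc}")
    raise ConnectionError("no Xpand node reachable: " + "; ".join(errors))


def fake_connect(host):
    # Stand-in driver for demonstration: the first node is "down".
    if host == "xpand-1.example":
        raise OSError("connection refused")
    return f"connection to {host}"


print(connect_with_failover(["xpand-1.example", "xpand-2.example"], fake_connect))
```

Trying nodes in a fixed order sends every new connection to the first healthy node; shuffling `hosts` before each attempt would spread connections across nodes, roughly approximating the load balancing that MaxScale otherwise provides.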

Amazon Infrastructure

MariaDB SkySQL services on Amazon Web Services (AWS) rely on Amazon Elastic Kubernetes Service (EKS), a component of AWS. MariaDB SkySQL inherits many availability features from EKS and AWS:

  • The resiliency features of EKS.

  • The auto-healing functionality of Kubernetes.

  • Amazon's goal of 99.95% uptime, as stated in the SLA for EKS.

Google Infrastructure

MariaDB SkySQL services on Google Cloud Platform (GCP) rely on Google Kubernetes Engine (GKE), a component of GCP. MariaDB SkySQL inherits many availability features from GKE and GCP:

  • The resiliency of regional GKE clusters, which span multiple zones within a region.

  • The auto-healing functionality of Kubernetes.

  • Google's goal of 99.5% uptime, as stated in the SLA for GKE.
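An uptime percentage is easier to reason about as a downtime budget. The following Python sketch converts the EKS and GKE figures quoted above into allowed downtime, assuming a 30-day measurement period; each provider's SLA defines its own measurement period and what counts as downtime, and `monthly_downtime_minutes` is a hypothetical helper.

```python
def monthly_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Downtime allowed in a period of `days` days before the uptime target is missed."""
    minutes_in_period = days * 24 * 60
    return minutes_in_period * (100.0 - uptime_percent) / 100.0


# Figures taken from the SLA bullets above; the 30-day period is an assumption.
for name, target in [("EKS", 99.95), ("GKE", 99.5)]:
    print(f"{name}: {target}% -> {monthly_downtime_minutes(target):.1f} min per 30 days")
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime per 30 days, while 99.5% allows 216 minutes (3.6 hours).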

Powered by Kubernetes

MariaDB SkySQL services run in containers powered by Kubernetes. MariaDB SkySQL inherits many availability features from Kubernetes' self-healing functionality:

  • Failed containers are automatically restarted.

  • Unhealthy containers are automatically killed.

  • Dead containers are automatically replaced.

  • All of this happens in a way that is transparent to the user.

  • If the MaxScale instance fails, it is restarted or replaced, depending on the specific issue.

  • If one of the MariaDB Xpand instances fails, it is restarted or replaced, depending on the specific issue.
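The self-healing behavior above follows the Kubernetes pattern of comparing observed state with desired state and acting on the difference. The sketch below models one pass of such a loop in Python. The `Container` states, the container names, and the `reconcile` function are hypothetical simplifications, not the Kubernetes API or SkySQL's internal tooling.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    state: str  # "running", "failed", "unhealthy", or "dead"


def reconcile(containers):
    """One pass of a toy self-healing loop (hypothetical, not the Kubernetes API).

    Returns the healed container list and the actions taken, in order.
    """
    healed, actions = [], []
    for c in containers:
        if c.state == "failed":
            # A container that exited is restarted in place.
            actions.append(f"restart {c.name}")
            healed.append(Container(c.name, "running"))
        elif c.state == "unhealthy":
            # A container failing its health check is killed, then restarted.
            actions += [f"kill {c.name}", f"restart {c.name}"]
            healed.append(Container(c.name, "running"))
        elif c.state == "dead":
            # A container that cannot come back is replaced by a new one.
            new = Container(f"{c.name}-replacement", "running")
            actions.append(f"replace {c.name} with {new.name}")
            healed.append(new)
        else:
            healed.append(c)
    return healed, actions


observed = [Container("maxscale-0", "failed"), Container("xpand-0", "running"),
            Container("xpand-1", "unhealthy"), Container("xpand-2", "dead")]
healed, actions = reconcile(observed)
for action in actions:
    print(action)
```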

Production Readiness

MariaDB Platform for Distributed SQL is designed for resiliency. It is built on MariaDB Xpand, which provides distributed SQL with High Availability (HA) and fault tolerance.

Multi-Zonal

MariaDB Platform for Distributed SQL provides multi-zonal availability by default.

MariaDB Xpand supports multi-zonal availability across 3 or more zones. MariaDB Platform for Distributed SQL on SkySQL leverages this capability by automatically distributing the Xpand nodes evenly across 3 zones within the same region.
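Even distribution across zones can be pictured as round-robin placement. This Python sketch assumes a 6-node cluster and hypothetical zone names; it shows only that round-robin assignment keeps zone sizes within one node of each other, not how SkySQL actually schedules Xpand nodes. `assign_zones` is a hypothetical helper.

```python
from collections import Counter


def assign_zones(nodes, zones):
    """Assign nodes to zones round-robin, so zone sizes differ by at most one."""
    return {node: zones[i % len(zones)] for i, node in enumerate(nodes)}


# Hypothetical 6-node cluster spread over three hypothetical zone names.
nodes = [f"xpand-{i}" for i in range(6)]
placement = assign_zones(nodes, ["zone-a", "zone-b", "zone-c"])
print(placement)
print(Counter(placement.values()))  # each zone receives two nodes
```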

Xpand can tolerate a single zone failure by default. If a zone fails, Xpand's rebalancer process automatically creates new redundant copies of data to maintain fault tolerance.
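Tolerating a zone failure depends on keeping the redundant copies of each unit of data (called slices in this sketch) in different zones. The following Python model assumes two replicas and six nodes over three zones. It places replicas in distinct zones, checks that any single zone can fail without losing a slice, and then re-creates missing replicas on surviving nodes. `place_replicas`, `survives_zone_failure`, `rebalance_after_zone_failure`, and all names are hypothetical and do not describe Xpand's actual placement or rebalancing logic.

```python
def place_replicas(slices, node_zone, replicas=2):
    """Place each slice's replicas on nodes in distinct zones (toy layout).

    node_zone maps node name -> zone name. Returns slice -> list of nodes.
    """
    by_zone = {}
    for node, zone in sorted(node_zone.items()):
        by_zone.setdefault(zone, []).append(node)
    zones = sorted(by_zone)
    if replicas > len(zones):
        raise ValueError("need at least one zone per replica")
    placement = {}
    for i, s in enumerate(slices):
        # Rotate the starting zone so replicas spread across all zones.
        chosen = [zones[(i + k) % len(zones)] for k in range(replicas)]
        placement[s] = [by_zone[z][i % len(by_zone[z])] for z in chosen]
    return placement


def survives_zone_failure(placement, node_zone, failed_zone):
    """True if every slice keeps at least one replica outside failed_zone."""
    return all(any(node_zone[n] != failed_zone for n in nodes)
               for nodes in placement.values())


def rebalance_after_zone_failure(placement, node_zone, failed_zone, replicas=2):
    """Re-create lost replicas on surviving nodes to restore the replica count."""
    alive = sorted(n for n, z in node_zone.items() if z != failed_zone)
    healed = {}
    for s, nodes in placement.items():
        kept = [n for n in nodes if node_zone[n] != failed_zone]
        if not kept:
            raise RuntimeError(f"{s} has no surviving replica")
        kept_zones = {node_zone[n] for n in kept}
        # Prefer nodes in a zone that does not already hold a replica.
        spare = sorted((n for n in alive if n not in kept),
                       key=lambda n: node_zone[n] in kept_zones)
        healed[s] = kept + spare[: replicas - len(kept)]
    return healed


node_zone = {"xpand-0": "zone-a", "xpand-1": "zone-b", "xpand-2": "zone-c",
             "xpand-3": "zone-a", "xpand-4": "zone-b", "xpand-5": "zone-c"}
placement = place_replicas([f"slice-{i}" for i in range(6)], node_zone)
print(all(survives_zone_failure(placement, node_zone, z)
          for z in ("zone-a", "zone-b", "zone-c")))  # True
healed = rebalance_after_zone_failure(placement, node_zone, "zone-a")
```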

By default, Xpand is deployed for multi-zonal availability. Optionally, Xpand can be deployed in a single zone on request. This reduces communication latency between Xpand nodes, but limits fault tolerance.

Self-Healing with Data Protection

MariaDB Platform for Distributed SQL inherits the self-healing functionality from Kubernetes and extends these capabilities with fault tolerant data protection:

  • If the MariaDB Platform for Distributed SQL instance fails, it is restarted or replaced, depending on the specific issue.

  • Upon node failure, data is rebalanced from the remaining nodes, automatically restoring data protection without manual intervention.

  • The failed node is replaced by Kubernetes auto-healing, and the rebalancer picks up the replacement node, restoring the cluster to its intended node count.
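When the replacement node joins, the rebalancer moves some replicas onto it so the load is even again. This Python sketch uses a simple greedy rule, moving one replica at a time from the busiest node to the new node while never placing two replicas of the same slice on one node. The rule, the `spread_to_new_node` function, and the node names are hypothetical; Xpand's actual rebalancer is not described here and may use different criteria.

```python
from collections import Counter


def spread_to_new_node(placement, new_node):
    """Greedily move replicas onto new_node until loads differ by at most one.

    placement maps slice -> list of nodes holding a replica. Returns a new
    placement and the list of (slice, from_node, to_node) moves. Toy model only.
    """
    placement = {s: list(nodes) for s, nodes in placement.items()}
    moves = []
    while True:
        load = Counter(n for nodes in placement.values() for n in nodes)
        if not load:
            return placement, moves
        busiest, high = load.most_common(1)[0]
        if high - load[new_node] <= 1:
            return placement, moves
        for s, nodes in placement.items():
            # Never put two replicas of the same slice on one node.
            if busiest in nodes and new_node not in nodes:
                nodes[nodes.index(busiest)] = new_node
                moves.append((s, busiest, new_node))
                break
        else:
            return placement, moves  # no eligible replica to move


# Three surviving nodes, each holding four replicas; "xpand-3" is the replacement.
old = {f"slice-{i}": [f"xpand-{i % 3}", f"xpand-{(i + 1) % 3}"] for i in range(6)}
new, moves = spread_to_new_node(old, "xpand-3")
print(len(moves), Counter(n for nodes in new.values() for n in nodes))
```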

MariaDB Platform for Distributed SQL is fault tolerant by design. By default, a rebalancer process running in the background maintains two replicas of all data. An Xpand deployment in MariaDB SkySQL can therefore suffer a single node failure without data loss.
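The single-node guarantee follows from a counting argument: data is lost only if every replica of some unit of data sits on a failed node, so with R replicas on R distinct nodes, any R - 1 node failures leave at least one copy. The Python sketch below checks this exhaustively for a toy layout; the layout, `place`, and `max_tolerated_failures` are hypothetical and do not reflect Xpand's actual data distribution.

```python
from itertools import combinations


def place(slices, nodes, replicas=2):
    """Put each slice's replicas on consecutive distinct nodes (toy layout)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {s: [nodes[(i + k) % len(nodes)] for k in range(replicas)]
            for i, s in enumerate(slices)}


def max_tolerated_failures(placement, nodes):
    """Largest f such that ANY f failed nodes still leave every slice a copy."""
    for f in range(1, len(nodes) + 1):
        for failed in combinations(nodes, f):
            down = set(failed)
            # A slice is lost only if all of its replicas are on failed nodes.
            if any(set(copies) <= down for copies in placement.values()):
                return f - 1
    return len(nodes)


nodes = [f"xpand-{i}" for i in range(6)]
slices = [f"slice-{i}" for i in range(12)]
print(max_tolerated_failures(place(slices, nodes, replicas=2), nodes))  # 1
print(max_tolerated_failures(place(slices, nodes, replicas=3), nodes))  # 2
```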