Architecture of MariaDB Xpand

Overview

MariaDB Xpand provides ACID-compliant distributed SQL, high availability, fault tolerance, write scaling, and horizontal scale-out for transactional workloads.

Distributed SQL databases consist of multiple database nodes working together with each node storing and querying a subset of the data.

MariaDB Xpand is a distributed SQL database that scales efficiently for modern massive-workload web applications that require strong consistency and data integrity. MariaDB Xpand achieves strong consistency through synchronous writes to replicas. MariaDB Xpand is designed for large data-set transactional workloads that require high performance, such as online transactional processing (OLTP).

This page describes the architecture of MariaDB Xpand and the role of each component.

Benefits

  • ACID-compliant distributed SQL for modern web applications with massive workloads that require high performance and strong consistency, such as online transactional processing (OLTP)

  • Elastic scale-out and scale-in to adapt for changes in workload or capacity requirements

  • High Availability (HA) and fault tolerance

  • Sophisticated data distribution that partitions data horizontally into slices and distributes multiple copies of each slice among the nodes as replicas, which are managed automatically for data protection and load rebalancing

  • Read and write scaling, with a shared-nothing architecture

  • Parallel query evaluation for efficient execution of complex queries such as JOINs and aggregate operations

  • Columnar indexes can be used in Xpand 6:

    • Columnar indexes are best suited for queries that "scan" much or all of a table. This includes queries that perform aggregations on large amounts of data, queries that select a subset of columns from large tables, and queries that perform a full table scan of a fact table. Queries of this kind are typical of analytical, online analytical processing (OLAP), and data warehousing workloads.

    • With composite (multi-column) Columnar indexes, filtration is not dependent on column order. If a Columnar index is defined with COLUMNAR INDEX (a, b, c), and the query filters with WHERE b = 1 AND c = 2, the Columnar index can be used to filter the results.
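The order-independence described above can be illustrated with a toy in-memory model. This is only a sketch and does not describe how Xpand stores Columnar indexes internally: the class ToyColumnarIndex and its methods are hypothetical names. It assumes each indexed column keeps its own value-to-row map, so a filter on any subset of columns is an intersection of row-ID sets.

```python
from collections import defaultdict


class ToyColumnarIndex:
    """Toy model (hypothetical, not Xpand internals): one value -> row-id map per column."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self.postings = {c: defaultdict(set) for c in self.columns}
        self.row_count = 0

    def insert(self, row):
        """Record the row's value for every indexed column and return its row id."""
        row_id = self.row_count
        for c in self.columns:
            self.postings[c][row[c]].add(row_id)
        self.row_count += 1
        return row_id

    def filter(self, **equals):
        """Return sorted row ids that satisfy every `column = value` predicate."""
        matches = set(range(self.row_count))
        for column, value in equals.items():
            # Each column is consulted independently, so declaration order is irrelevant.
            matches &= self.postings[column].get(value, set())
        return sorted(matches)


# Mirrors COLUMNAR INDEX (a, b, c) with WHERE b = 1 AND c = 2.
index = ToyColumnarIndex(["a", "b", "c"])
for row in [
    {"a": 1, "b": 1, "c": 2},
    {"a": 2, "b": 1, "c": 2},
    {"a": 3, "b": 1, "c": 9},
    {"a": 4, "b": 7, "c": 2},
]:
    index.insert(row)

matching_ids = index.filter(b=1, c=2)
```

Because no leading column is required, the filter on b and c works even though a is declared first.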

Topologies

MariaDB Xpand supports multiple topologies. The topology described on this page is representative. MariaDB products can also be deployed to form other topologies, leverage advanced product capabilities, or combine the capabilities of multiple topologies.

Xpand Topology

The Xpand Topology consists of:

  • One or more MaxScale nodes

  • Three or more Xpand nodes

The MaxScale nodes:

  • Monitor the health and availability of each Xpand node using the Xpand Monitor (xpandmon)

  • Accept client and application connections

  • Route queries to Xpand nodes using the Read Connection (readconnroute) or Read/Write Split (readwritesplit) routers

The Xpand nodes:

  • Receive queries from MaxScale

  • Store data in a distributed manner

  • Execute queries using parallel query evaluation

The Xpand Topology can also be deployed in a multi-zone environment:

  • Three or more zones

  • Zones should be connected to each other by low-latency network connections

  • Each zone must have the same number of Xpand nodes
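The countable rules above can be checked mechanically before a deployment. The following is a minimal sketch, assuming a planned layout is written down as a mapping from zone name to node hostnames. The function name, zone names, and hostnames are hypothetical, and the latency guideline is not checked because it cannot be derived from a static layout.

```python
def validate_multizone_layout(zones):
    """Check a {zone: [node, ...]} mapping against the multi-zone rules.

    Returns a list of problems; an empty list means the layout passes.
    """
    problems = []
    if len(zones) < 3:
        problems.append(f"need at least 3 zones, found {len(zones)}")

    sizes = {zone: len(nodes) for zone, nodes in zones.items()}
    if len(set(sizes.values())) > 1:
        problems.append(f"zones have unequal Xpand node counts: {sizes}")

    if sum(sizes.values()) < 3:
        problems.append("need at least 3 Xpand nodes in total")
    return problems


# Hypothetical hostnames, for illustration only.
good_layout = {
    "zone-a": ["xpand1", "xpand2"],
    "zone-b": ["xpand3", "xpand4"],
    "zone-c": ["xpand5", "xpand6"],
}
bad_layout = {
    "zone-a": ["xpand1", "xpand2"],
    "zone-b": ["xpand3"],
}

good_problems = validate_multizone_layout(good_layout)
bad_problems = validate_multizone_layout(bad_layout)
```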

Distributed SQL

MariaDB Xpand provides distributed SQL capabilities that support modern massive-workload web applications which require strong consistency and data integrity.

A distributed SQL database consists of multiple database nodes working together, with each node storing and querying a subset of the data. Strong consistency is achieved by synchronously updating each copy of a row.

MariaDB Xpand uses a shared-nothing architecture. In a shared-nothing distributed computing architecture, nodes do not share memory or storage. Xpand's shared-nothing architecture provides important benefits, including read scaling, write scaling, and storage scalability.
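The ideas in this section (each node holding a subset of the data in private storage, and every copy of a row being updated synchronously) can be sketched with a toy model. Everything here is an assumption made for illustration: the classes ToyNode and ToyCluster, the SHA-256 hash, and the round-robin replica placement are not taken from Xpand, and a real system also needs an atomic commit protocol that this sketch omits.

```python
import hashlib


class ToyNode:
    """A node with private storage only (shared-nothing): slice id -> {key: row}."""

    def __init__(self, name):
        self.name = name
        self.storage = {}

    def apply_write(self, slice_id, key, row):
        self.storage.setdefault(slice_id, {})[key] = row
        return True  # acknowledgement sent back to the coordinator


class ToyCluster:
    def __init__(self, node_names, slice_count=4, replicas=2):
        self.nodes = [ToyNode(n) for n in node_names]
        self.slice_count = slice_count
        # Hypothetical placement: replica r of slice s lives on node (s + r) mod N.
        self.placement = {
            s: [self.nodes[(s + r) % len(self.nodes)] for r in range(replicas)]
            for s in range(slice_count)
        }

    def slice_for(self, key):
        """Map a key to a slice with a stable hash (a stand-in for the real function)."""
        digest = hashlib.sha256(str(key).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.slice_count

    def write(self, key, row):
        """Synchronous write: succeed only after every replica has acknowledged."""
        slice_id = self.slice_for(key)
        acks = [n.apply_write(slice_id, key, row) for n in self.placement[slice_id]]
        if not all(acks):
            raise RuntimeError("write not acknowledged by every replica")
        return slice_id

    def copies(self, key):
        """Return the stored copy of a previously written row from each replica."""
        slice_id = self.slice_for(key)
        return [n.storage[slice_id][key] for n in self.placement[slice_id]]


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.write(42, {"id": 42, "name": "example"})
row_copies = cluster.copies(42)
```

After the write returns, every replica of the row's slice holds the same value, so a read served by any replica sees the same data.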

Data Distribution

Xpand distributes data sets across a large number of independent nodes. The data distribution documentation explains how this works and the reasoning behind some of the architectural decisions.

For additional information, see "Data Distribution with MariaDB Xpand".

Consistency, Fault Tolerance, and Availability

Xpand provides a consistency model that can scale by combining intelligent data distribution, multi-version concurrency control (MVCC), and Paxos. This approach enables Xpand to scale writes, scale reads in the presence of write workloads, and provide strong ACID semantics.

Evaluation Model

Xpand uses parallel query evaluation for simple queries and Massively Parallel Processing (MPP) for analytic queries (similar to columnar stores).
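The general idea behind parallel evaluation of an aggregate can be sketched as scatter-gather: each slice computes a partial result, and the partials are combined at the end. This is a simplified illustration, not Xpand's execution engine. The function names are hypothetical, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(rows):
    """Work done next to the data: reduce one slice to a (sum, count) pair."""
    total = 0
    count = 0
    for amount in rows:
        total += amount
        count += 1
    return total, count


def parallel_average(slices):
    """Compute AVG over all slices by combining per-slice partial aggregates."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(partial_sum_count, slices))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three slices of an "amount" column, as if stored on three different nodes.
slices = [[10, 20], [30], [40, 50, 60]]
average = parallel_average(slices)
```

Only the small (sum, count) pairs cross node boundaries in this model, which is why the aggregation scales with the number of slices.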

Concurrency Control

Xpand uses a combination of Multi-Version Concurrency Control (MVCC) and Two-Phase Locking (2PL) to support mixed read-write workloads. Readers use lock-free snapshot isolation, while writers use 2PL to manage conflicts. As a result, readers never interfere with writers (or vice versa), and writers use explicit locking to order updates.
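The interplay of the two mechanisms can be sketched with a toy single-process store. This is an illustrative model under stated assumptions, not Xpand's implementation: the class ToyVersionedStore is hypothetical, timestamps are a simple counter, and the internal _meta latch only protects the toy's dictionaries and is not a row lock.

```python
import threading


class ToyVersionedStore:
    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), oldest first
        self._row_locks = {}  # key -> lock used only by writers
        self._clock = 0
        self._meta = threading.Lock()

    def snapshot(self):
        """Readers take no row locks; they only remember the current commit timestamp."""
        with self._meta:
            return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def write_transaction(self, updates):
        """2PL: acquire every row lock (growing phase), commit, then release (shrinking phase)."""
        keys = sorted(updates)  # a fixed lock order avoids deadlock between writers
        with self._meta:
            locks = [self._row_locks.setdefault(k, threading.Lock()) for k in keys]
        for lock in locks:
            lock.acquire()
        try:
            with self._meta:
                self._clock += 1
                commit_ts = self._clock
                for k in keys:
                    # Append a new version; older versions stay readable by old snapshots.
                    self._versions.setdefault(k, []).append((commit_ts, updates[k]))
            return commit_ts
        finally:
            for lock in reversed(locks):
                lock.release()


store = ToyVersionedStore()
store.write_transaction({"balance": 100})
snap = store.snapshot()                       # reader starts here
store.write_transaction({"balance": 70})      # a later writer commits
seen_by_old_reader = store.read("balance", snap)
seen_by_new_reader = store.read("balance", store.snapshot())
```

The earlier reader keeps seeing the value from its snapshot even after the later write commits, and it never waits on the writer's row lock.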

Rebalancer

The Rebalancer is an automated system for maintaining a healthy distribution of data in the cluster. It's the Rebalancer's job to respond to an "unhealthy" cluster by modifying the distribution and placement of data. The Rebalancer is an online process that effects changes to the cluster with minimal interruption to user operations. It relieves the database administrator from the burden of manually manipulating data placement.

For additional information, see "MariaDB Xpand Rebalancer".
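One kind of "unhealthy" state is uneven load, for example right after an empty node joins the cluster. The following sketch shows a greedy way to even out replica counts. It is an assumption-laden illustration rather than the Rebalancer's actual algorithm: the function name, the (slice_id, replica_number) representation, and the rule that a node should not hold two replicas of the same slice are all choices made for this example.

```python
def rebalance(placement):
    """Greedy sketch: move replicas from the busiest to the idlest node.

    placement maps node name -> set of (slice_id, replica_number) tuples.
    Returns (new_placement, moves) without modifying the input.
    """
    placement = {node: set(replicas) for node, replicas in placement.items()}
    moves = []
    if not placement:
        return placement, moves
    while True:
        busiest = max(placement, key=lambda n: len(placement[n]))
        idlest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[busiest]) - len(placement[idlest]) <= 1:
            break  # counts are as even as they can get
        slices_on_idlest = {slice_id for slice_id, _ in placement[idlest]}
        # Keep copies of one slice on different nodes so a node loss costs one copy only.
        movable = sorted(r for r in placement[busiest] if r[0] not in slices_on_idlest)
        if not movable:
            break
        replica = movable[0]
        placement[busiest].remove(replica)
        placement[idlest].add(replica)
        moves.append((replica, busiest, idlest))
    return placement, moves


# Six slices with two replicas each on three nodes, plus a newly added empty node.
before = {
    "node1": {(0, 0), (1, 0), (2, 0), (3, 0)},
    "node2": {(0, 1), (1, 1), (4, 0), (5, 0)},
    "node3": {(2, 1), (3, 1), (4, 1), (5, 1)},
    "node4": set(),
}
after, moves = rebalance(before)
```

In this example three replicas move to the new node, leaving three replicas on every node while the input mapping stays unchanged.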

Query Optimizer

At the core of the Xpand Query Optimizer is the ability to execute one query with maximum parallelism and many simultaneous queries with maximum concurrency. This is achieved through a distributed query planner and compiler and a distributed shared-nothing execution engine.

For additional information, see "Query Optimizer for MariaDB Xpand".