MariaDB Enterprise ColumnStore Architecture

Overview

MariaDB Enterprise ColumnStore provides a highly available, fault tolerant, and performant columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore is designed for data warehousing, decision support systems (DSS), online analytical processing (OLAP), and hybrid transactional-analytical processing (HTAP).

Benefits

  • Columnar storage engine that enables MariaDB Enterprise Server to perform new workloads

  • Optimized for online analytical process (OLAP) workloads including data warehousing, decision support systems, and business intelligence

  • Single-stack solution for hybrid transactional-analytical workloads to eliminate barriers and prevent data silos

  • Implements cross-engine JOINs to join Enterprise ColumnStore tables with tables using row-based storage engines, such as InnoDB

  • Smart storage engine that plans and optimizes its own queries using a custom select handler

  • Scalable query execution using massively parallel processing (MPP) strategies, parallel query execution, and distributed function evaluation

  • S3-compatible object storage can be used for highly available, low-cost, multi-regional, resilient, scalable, secure, and virtually limitless data storage

  • High availability and automatic failover by leveraging MariaDB MaxScale

  • REST API for multi-node administration with the Cluster Management API (CMAPI) server

  • Connectors for popular BI platforms such as Microsoft Power BI and Tableau

  • High-speed bulk data loading that bypasses the SQL layer and does not block concurrent read-only queries

Topologies

MariaDB Enterprise ColumnStore supports multiple topologies. Several options are described below. MariaDB Enterprise ColumnStore can be deployed in other topologies. The topologies on this page are representative of basic product capabilities.

MariaDB products can be deployed to form other topologies that leverage advanced product capabilities and combine the capabilities of multiple topologies.

Enterprise ColumnStore with Object Storage

Enterprise ColumnStore Topology with Object Storage

The MariaDB Enterprise ColumnStore topology with Object Storage delivers production analytics with high availability, fault tolerance, and limitless data storage by leveraging S3-compatible storage.

The topology consists of:

  • One or more MaxScale nodes

  • An odd number of ColumnStore nodes (minimum of 3) running ES, Enterprise ColumnStore, and CMAPI

The MaxScale nodes:

  • Monitor the health and availability of each ColumnStore node using the MariaDB Monitor (mariadbmon)

  • Accept client and application connections

  • Route queries to ColumnStore nodes using the Read/Write Split Router (readwritesplit)

The ColumnStore nodes:

Enterprise ColumnStore with Shared Local Storage

Enterprise ColumnStore Topology with Shared Local Storage

The MariaDB Enterprise ColumnStore topology with Shared Local Storage delivers production analytics with high availability and fault tolerance by leveraging shared local storage, such as NFS.

The topology consists of:

  • One or more MaxScale nodes

  • An odd number of ColumnStore nodes (minimum of 3) running ES, Enterprise ColumnStore, and CMAPI

The MaxScale nodes:

  • Monitor the health and availability of each ColumnStore node using the MariaDB Monitor (mariadbmon)

  • Accept client and application connections

  • Route queries to ColumnStore nodes using the Read/Write Split Router (readwritesplit)

The ColumnStore nodes:

HTAP

Enterprise ColumnStore HTAP Topology with Object Storage

The MariaDB Enterprise ColumnStore HTAP topology is designed for hybrid transactional-analytical processing (HTAP) workloads.

The topology consists of:

  • One MaxScale node

  • One ColumnStore node running ES and Enterprise ColumnStore

The MaxScale node:

  • Monitors the health and availability of the ColumnStore node using the MariaDB Monitor (mariadbmon)

  • Accepts client and application connections

  • Routes queries to the ColumnStore node using the Read/Write Split Router (readwritesplit)

The ColumnStore node:

  • Receives queries from MaxScale

  • Executes queries

  • Uses a row-based storage engine, such as InnoDB to handle transactional queries

  • Uses Enterprise ColumnStore as the columnar storage engine to handle analytical queries

  • Uses cross-engine JOINs to join transactional and analytical tables

  • Replicates data between engines using MariaDB Replication

  • Optionally uses S3-compatible object storage for Enterprise ColumnStore data

Software Architecture

Software Component

Role

MariaDB Enterprise ColumnStore

  • Columnar storage engine

  • Query execution

  • Data storage

MariaDB Enterprise Server

  • Enterprise-grade database server

ColumnStore Storage Engine Plugin

  • Storage engine plugin

  • Integrates MariaDB Enterprise ColumnStore into MariaDB Enterprise Server

Cluster Management API (CMAPI)

  • REST API

  • Used for administrative tasks

MariaDB MaxScale

  • Database proxy

  • Accepts connections

  • Routes queries

  • Performs auto-failover

MariaDB Enterprise ColumnStore

MariaDB Enterprise ColumnStore is the columnar storage engine that handles data storage and query optimization/execution.

MariaDB Enterprise ColumnStore is a columnar storage engine that is optimized for analytical or online analytical processing (OLAP) workloads, data warehousing, and DSS. MariaDB Enterprise ColumnStore can be used for hybrid transactional-analytical processing (HTAP) workloads when paired with a row-based storage engine, like InnoDB.

MariaDB Enterprise Server

MariaDB Enterprise ColumnStore is built on top of MariaDB Enterprise Server. MariaDB Enterprise ColumnStore 5 is included with the standard MariaDB Enterprise Server 10.5 releases, while MariaDB Enterprise ColumnStore 6 is included with the standard MariaDB Enterprise Server 10.6 releases.

Enterprise ColumnStore interfaces with the Enterprise Server SQL engine through the ColumnStore storage engine plugin.

MariaDB has been continually improving the integration of MariaDB Enterprise ColumnStore with MariaDB Enterprise Server:

ColumnStore Storage Engine Plugin

MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.

The ColumnStore storage engine plugin is a smart storage engine that implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities, such as:

  • Using a custom query planner

  • Selecting data by column instead of by row

  • Parallel query evaluation

  • Distributed aggregations

  • Distributed functions

  • Extent elimination

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

For additional information, see "ColumnStore Storage Engine".

Cluster Management API (CMAPI) Server

The Cluster Management API (CMAPI) server provides a REST API that can be used to configure and manage Enterprise ColumnStore.

CMAPI must run on every ColumnStore node in a multi-node deployment but is not required in a single-node deployment.

The REST API can be used to perform multiple operations:

  • Add ColumnStore nodes

  • Remove ColumnStore nodes

  • Start Enterprise ColumnStore

  • Shutdown Enterprise ColumnStore

  • Check the status of Enterprise ColumnStore

For additional information, see "MariaDB Enterprise ColumnStore CMAPI".

MariaDB MaxScale

MariaDB Enterprise ColumnStore leverages MariaDB MaxScale as an advanced database proxy and query router.

Multi-node Enterprise ColumnStore deployments must have one or more MaxScale nodes. MaxScale performs many different roles:

  • Routing write queries to the primary server

  • Load balancing read queries on replica servers

  • Monitoring node health

  • Performing automatic failover if a node fails

Storage Architecture

MariaDB Enterprise ColumnStore's storage architecture provides a columnar storage engine with high availability, fault tolerance, compression, and automatic partitioning for production analytics and data warehousing.

For additional information, see "MariaDB Enterprise ColumnStore Storage Architecture".

Columnar Storage Engine

MariaDB Enterprise ColumnStore is a columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore enables ES to perform analytical workloads, including online analytical processing (OLAP), data warehousing, decision support systems (DSS), and hybrid transactional-analytical processing (HTAP) workloads.

Most traditional relational databases use row-based storage engines. In row-based storage engines, all columns for a table are stored contiguously. Row-based storage engines perform very well for transactional workloads, but are less performant for analytical workloads.

Columnar storage engines store each column separately. Columnar storage engines perform very well for analytical workloads. Analytical workloads are characterized by ad hoc queries on very large data sets by relatively few users.

MariaDB Enterprise ColumnStore automatically partitions each column into extents, which helps improve query performance without using indexes.

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore supports S3-compatible object storage.

S3-compatible object storage is optional, but highly recommended. If S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage (such as NFS) for high availability.

S3-compatible object storage is:

  • Compatible: Many object storage services are compatible with the Amazon S3 API.

  • Economical: S3-compatible object storage is often very low cost.

  • Flexible: S3-compatible object storage is available for both cloud and on-premises deployments.

  • Limitless: S3-compatible object storage is often virtually limitless.

  • Resilient: S3-compatible object storage is often low maintenance and highly available, since many services use resilient cloud infrastructure.

  • Scalable: S3-compatible object storage is often highly optimized for read and write scaling.

  • Secure: S3-compatible object storage is often encrypted-at-rest.

Many S3-compatible object storage services exist. MariaDB Corporation cannot make guarantees about all S3-compatible object storage services, because different services provide different functionality.

If you have any questions about using specific S3-compatible object storage with MariaDB Enterprise ColumnStore, contact us.

Shared Local Storage

MariaDB Enterprise ColumnStore can use shared local storage.

Shared local storage is required for high availability. The specific shared local storage requirements depend on whether Enterprise ColumnStore is configured to use S3-compatible object storage:

  • When S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage for high availability.

  • When S3-compatible object storage is not used, Enterprise ColumnStore requires the DB Root directories to use shared local storage for high availability.

The most common shared local storage options for on-premises and cloud deployments are:

  • NFS (Network File System)

  • GlusterFS

The most common shared local storage options for AWS (Amazon Web Services) deployments are:

  • EBS (Elastic Block Store) Multi-Attach

  • EFS (Elastic File System)

The most common shared local storage option for GCP (Google Cloud Platform) deployments is:

  • Filestore

Query Evaluation Architecture

Enterprise ColumnStore Query Execution

MariaDB Enterprise ColumnStore uses distributed query execution and massively parallel processing (MPP) techniques to achieve vertical and horizontal scalability for production analytics and data warehousing.

For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation".

Extent Elimination

Enterprise ColumnStore: Extent Elimination

MariaDB Enterprise ColumnStore uses extent elimination to scale query evaluation as table size increases.

Most databases are row-based databases that use manually-created indexes to achieve high performance on large tables. This works well for transactional workloads. However, analytical queries tend to have very low selectivity, so traditional indexes are not typically effective for analytical queries.

Enterprise ColumnStore uses extent elimination to achieve high performance, without requiring manually created indexes. Enterprise ColumnStore automatically partitions all data into extents. Enterprise ColumnStore stores the minimum and maximum values for each extent in the extent map. Enterprise ColumnStore uses the minimum and maximum values in the extent map to perform extent elimination.

When Enterprise ColumnStore performs extent elimination, it compares the query's join conditions and filter conditions (i.e., WHERE clause) to the minimum and maximum values for each extent in the extent map. If the extent's minimum and maximum values fall outside the bounds of the query's conditions, Enterprise ColumnStore skips that extent for the query.

Extent elimination is automatically performed for every query. It can significantly decrease I/O for columns with clustered values. For example, extent elimination works effectively for series, ordered, patterned, and time-based data.

Custom Select Handler

The ColumnStore storage engine plugin implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities.

All storage engines interact with ES using an internal handler API, which is highly extensible. Storage engines can implement different features by implementing different methods within the handler API.

For select statements, the handler API transforms each query into a SELECT_LEX object, which is provided to the select handler.

Before ES 10.4, the handler API implemented a generic select handler that all storage engines used. Starting with ES 10.4, the handler API allows storage engines to implement a custom select handler. Implementing a custom select handler enables storage engines to take full advantage of unique features.

The generic select handler is not optimal for Enterprise ColumnStore, because:

  • Enterprise ColumnStore selects data by column, but the generic select handler selects data by row

  • Enterprise ColumnStore supports parallel query evaluation, but the generic select handler does not

  • Enterprise ColumnStore supports distributed aggregations, but the generic select handler does not

  • Enterprise ColumnStore supports distributed functions, but the generic select handler does not

  • Enterprise ColumnStore supports extent elimination, but the generic select handler does not

  • Enterprise ColumnStore has its own query planner, but the generic select handler cannot use it

Smart Storage Engine

The ColumnStore storage engine plugin is known as a smart storage engine, because it implements a custom select handler. MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.

If a storage engine implements a custom select handler, it is known as a smart storage engine.

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

Query Planning

The ColumnStore storage engine plugin is a smart storage engine, so MariaDB Enterprise ColumnStore to plan its own queries using the custom select handler.

MariaDB Enterprise ColumnStore's query planning is divided into two steps:

  1. ES provides the query's SELECT_LEX object to the custom select handler. The custom select handler builds a ColumnStore Execution Plan (CSEP).

  2. The custom select handler provides the CSEP to the ExeMgr process on the same node. The ExeMgr process performs extent elimination and creates a job list.

Job Steps

When Enterprise ColumnStore executes a query, the ExeMgr process on the initiator/aggregator node translates the ColumnStore execution plan (CSEP) into a job list. A job list is a sequence of job steps.

Enterprise ColumnStore uses many different types of job steps that provide different scalability benefits:

  • Some types of job steps perform operations in a distributed manner using multiple nodes to operate on different extents. Distributed operations provide horizontal scalability.

  • Some types of job steps perform operations in a multi-threaded manner using a thread pool. Performing multi-threaded operations provides vertical scalability.

As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute job steps.

High Availability and Failover

MariaDB Enterprise ColumnStore leverages common technologies to provide highly available production analytics with automatic failover:

Technology

Role

S3-Compatible Object Storage

  • HA for data

  • Optional, but highly recommended

Shared Storage

MariaDB Replication

  • Schema replication (ColumnStore tables)

  • Schema and data replication (non-ColumnStore tables)

  • Database object replication

MaxScale

  • Monitoring

  • Automatic failover

  • Load balancing

Cluster Management API (CMAPI) Server

  • REST API

  • Administration

  • Add nodes

  • Remove nodes

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore supports S3-compatible object storage.

S3-compatible object storage is optional, but highly recommended. If S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage (such as NFS) for high availability.

S3-compatible object storage is:

  • Compatible: Many object storage services are compatible with the Amazon S3 API.

  • Economical: S3-compatible object storage is often very low cost.

  • Flexible: S3-compatible object storage is available for both cloud and on-premises deployments.

  • Limitless: S3-compatible object storage is often virtually limitless.

  • Resilient: S3-compatible object storage is often low maintenance and highly available, since many services use resilient cloud infrastructure.

  • Scalable: S3-compatible object storage is often highly optimized for read and write scaling.

  • Secure: S3-compatible object storage is often encrypted-at-rest.

Many S3-compatible object storage services exist. MariaDB Corporation cannot make guarantees about all S3-compatible object storage services, because different services provide different functionality.

If you have any questions about using specific S3-compatible object storage with MariaDB Enterprise ColumnStore, contact us.

Shared Local Storage

MariaDB Enterprise ColumnStore can use shared local storage.

Shared local storage is required for high availability. The specific shared local storage requirements depend on whether Enterprise ColumnStore is configured to use S3-compatible object storage:

  • When S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage for high availability.

  • When S3-compatible object storage is not used, Enterprise ColumnStore requires the DB Root directories to use shared local storage for high availability.

The most common shared local storage options for on-premises and cloud deployments are:

  • NFS (Network File System)

  • GlusterFS

The most common shared local storage options for AWS (Amazon Web Services) deployments are:

  • EBS (Elastic Block Store) Multi-Attach

  • EFS (Elastic File System)

The most common shared local storage option for GCP (Google Cloud Platform) deployments is:

  • Filestore

MariaDB Replication

MariaDB Enterprise ColumnStore requires MariaDB Replication to synchronize various database objects on multiple nodes for high availability.

MariaDB replication synchronizes:

  • The schemas for all ColumnStore tables on all nodes

  • The schemas and data for all non-ColumnStore tables on all nodes

  • All other databases objects (stored procedures, stored functions, user accounts, and other objects) on all nodes

MaxScale

MariaDB Enterprise ColumnStore requires MariaDB MaxScale to achieve high availability, automatic failover, and load balancing.

MariaDB Monitor (mariadbmon) in MaxScale monitors the health of each Enterprise ColumnStore node.

MaxScale provides load balancing by routing queries and/or connections to healthy nodes by:

When MaxScale's MariaDB Monitor notices the primary node fail, MariaDB Monitor performs automatic failover by:

  • Promoting a replica node to become the new primary node.

  • Re-configuring all replica nodes to replicate from the new primary node.

Cluster Management API (CMAPI) Server

MariaDB Enterprise ColumnStore requires the Cluster Management API (CMAPI) Server for high availability.

The CMAPI server provides a REST API that can be used to manage and configure Enterprise ColumnStore.

The CMAPI server has a role in automatic failover. After MaxScale performs automatic failover, the CMAPI server detects the topology change and automatically re-configures the roles of each Enterprise ColumnStore node.

Data Loading

Loading Data into an Enterprise ColumnStore Deployment with S3

MariaDB Enterprise ColumnStore performs bulk data loads very efficiently using a variety of mechanisms including the cpimport tool, specialized handling of certain SQL statements, and minimal locking during data import.

For additional information, see "MariaDB Enterprise ColumnStore Data Loading".

cpimport

MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which provides several benefits:

  • Bypasses the SQL layer to decrease overhead

  • Does not block read queries

  • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin

  • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.

  • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files. For additional information, see "Load Ordered Data in Proper Order".

  • Supports parallel distributed bulk loads

  • Imports data from text files

  • Imports data from binary files

  • Imports data from standard input (stdin)

Batch Insert Mode

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore, MariaDB Enterprise ColumnStore has special handling for the following statements:

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

Backup and Restore

Backup an Enterprise ColumnStore Deployment with S3

MariaDB Enterprise ColumnStore supports backup and restore using well-known tools and methods.

Component

Backup Methods

S3-Compatible Object Storage

  • S3 snapshot

Shared Local Storage

  • File system snapshot

  • File copy

Enterprise Server Data Directory

  • MariaDB Enterprise Backup

For additional information, see "MariaDB Enterprise ColumnStore Backup and Restore".

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore can leverage S3 snapshots to backup S3-compatible object storage when it is used for Enterprise ColumnStore's data.

The S3-compatible object storage can be backed up by:

  1. Locking the database on the primary node

  2. Performing an S3 snapshot using the vendor's standard snapshot functionality

Shared Local Storage

MariaDB Enterprise ColumnStore can leverage file system snapshots or file copy tools (such as rsync) to backup shared local storage when it is used for the Storage Manager directory or the DB Root directories.

The shared local storage can be backed up by:

  1. Locking the database on the primary node

  2. Performing a file system snapshot or using a file copy tool (such as rsync) to copy the contents of the Storage Manager directory and/or the DB Root directories

Enterprise Server Data Directory

MariaDB Enterprise ColumnStore can leverage the standard MariaDB Enterprise Backup utility to backup the Enterprise Server data directory.

The backup contains:

  • All ColumnStore schemas

  • All non-ColumnStore schemas and data

  • All other database objects

It does not contain:

  • ColumnStore data