Multi-Node Analytics
This page is part of MariaDB's Documentation.
The parent of this page is: Services
Topics on this page:
Overview
MariaDB SkySQL includes the Multi-Node Analytics topology on GCP and AWS.
This service is designed for OLAP, DSS, and cloud data warehousing workloads.
Component View
In this service:
MariaDB Enterprise Server provides a modern SQL RDBMS with high availability, pluggable storage engines, hot online backups, and audit logging.
MariaDB Enterprise ColumnStore provides a columnar storage engine for ES that delivers real-time analytics at scale to perform interactive, ad hoc queries on hundreds of billions of rows with standard SQL.
MariaDB MaxScale provides load balancing, HA, and automatic failover.
Service Details
Multi-Node Analytics topology delivers the following features:
Topology | Multi-Node Analytics |
Storage Engine | ColumnStore |
Number of ES Instances | 3 |
Number of MaxScale Instances | 1 to 2 |
Columnar Storage | Yes |
Object Storage | Yes (infinite growth) |
Statistical Functions | Yes |
Production-Ready | Yes |
Development/UAT-Ready | Yes |
For additional information on ES instance options, see "Multi-Node Analytics Instance Sizes".
For additional information on MaxScale instance options, see "MaxScale Redundancy".
With this topology, enforce_
For additional information about MariaDB Enterprise ColumnStore on-premises and in the cloud, including use cases and features, see Modern Analytics Made Easy.
Resources
Storage
Multi-Node Analytics topology configures Enterprise ColumnStore to use object storage to store its data in a highly available manner.
"Unlimited" on-demand capacity without downtime for upgrades
Highly available on resilient cloud infrastructure
Cost effective
Additionally, the Multi-Node Analytics topology configures Enterprise ColumnStore to use reliable shared storage in the cloud to store its metadata in a highly available manner, either NFS or EFS depending on the cloud provider selected.
Efficient Data Loading
Multi-Node Analytics topology leverages the data loading architecture of MariaDB Enterprise ColumnStore.
MariaDB Enterprise ColumnStore supports very efficient bulk data loads.
Multi-Node Analytics topology can also benefit from the features of cpimport
, a bulk loading tool which bypasses the SQL layer to decrease the overhead of bulk data loading. cpimport
efficiently performs the bulk data load by appending the new rows to the table in the order in which the rows are read from the source file. This approach reduces the I/O requirements, so that larger data sets can be loaded very efficiently.
Multi-Node Analytics topology configures Enterprise ColumnStore to automatically translate LOAD DATA INFILE statements into bulk data loads. It translates the statement into a bulk data load that uses cpimport.bin
, which is an internal wrapper around the cpimport
tool.
Enterprise ColumnStore only requires a write metadata lock on the table to perform a bulk data load, so bulk data load does not block read-only queries.
OLAP Workloads
MariaDB Enterprise ColumnStore enables Multi-Node Analytics services to perform analytical or online analytical processing (OLAP) workloads.
OLAP workloads are generally characterized by ad hoc queries on very large data sets. Some other typical characteristics are:
Each query typically reads a subset of columns in the table
Most activity typically consists of read-only queries that perform aggregations, window functions, and various calculations
Analytical applications typically require only a few concurrent queries
Analytical applications typically require scalability of large, complex queries
Analytical applications typically require the ability to efficiently bulk load new data
Example Workloads
OLAP workloads are typically required for:
Business intelligence (BI)
Health informatics
Historical data mining
Columnar Benefits
Row-based storage engines face a disadvantage with OLAP workloads. Indexes are not usually very useful for OLAP workloads, because the large size of the data set and the ad hoc nature of the queries preclude the use of indexes to optimize queries.
Columnar storage engines are much better suited for OLAP workloads. MariaDB Enterprise ColumnStore is a columnar storage engine that is designed for OLAP workloads:
If a query reads a subset of columns in the table, Enterprise ColumnStore is able to reduce I/O by reading those columns and ignoring all others, because each column is stored separately
If most activity consists of read-only queries that perform aggregations, window functions, and various calculations, Enterprise ColumnStore is able to efficiently execute those queries using extent elimination, distributed query execution, and massively parallel processing (MPP) techniques
If only a few concurrent queries are required, Enterprise ColumnStore is able to maximize parallelism for each query by allocating more resources to each one
If scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques
If the ability to efficiently bulk load new data is required, Enterprise ColumnStore is able to bulk load new data without affecting existing data using automatic partitioning with the extent map
High Availability and Failover
Multi-Node Analytics topology leverages the high availability features of MariaDB Enterprise ColumnStore and MariaDB MaxScale.
MariaDB Enterprise ColumnStore is designed to provide highly available production analytics with automatic failover.
Multi-Node Analytics topology uses MaxScale to achieve high availability and automatic failover. MaxScale's MariaDB Monitor (mariadbmon
) monitors the Enterprise ColumnStore nodes. When the monitor notices the primary node fail, it promotes a replica node to become the new primary node, and it re-configures all replica nodes to replicate from the new primary node. The MaxScale Redundancy option enables high availability load balancing.
Architecture
MariaDB Enterprise ColumnStore uses a sophisticated storage architecture designed to provide a columnar storage engine for production analytics with high availability, fault tolerance, compression, and automatic partitioning.
Multi-Node Analytics topology configures Enterprise ColumnStore to use MariaDB Replication to synchronize various database objects on multiple nodes for high availability:
Replication synchronizes the schemas for all ColumnStore tables on all nodes.
Replication synchronizes the schemas and data for all non-ColumnStore tables (such as system tables) on all nodes.
Replication synchronizes all other databases objects (stored procedures, stored functions, user accounts, and other objects) on all nodes.
Additionally, if scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques.
For additional information, see "MariaDB Enterprise ColumnStore".