Single Node Analytics

Overview

MariaDB SkySQL includes the Single Node Analytics topology GCP and AWS.

This service is designed for OLAP, DSS, and cloud data warehousing workloads.

Component View

In this service:

  • MariaDB Enterprise Server provides a modern SQL RDBMS with high availability, pluggable storage engines, hot online backups, and audit logging.

  • MariaDB Enterprise ColumnStore provides a columnar storage engine for ES that delivers real-time analytics at scale to perform interactive, ad hoc queries on hundreds of billions of rows with standard SQL.

  • MariaDB MaxScale provides load balancing, HA, and automatic failover.

Service Details

Single Node Analytics topology delivers the following features:

Topology

Single Node Analytics

Storage Engine

ColumnStore

Number of ES Instances

1

Number of MaxScale Instances

1 to 2

Columnar Storage

Yes

Object Storage

Yes (infinite growth)

Statistical Functions

Yes

Production-Ready

No

Development/UAT-Ready

Yes

For additional information on ES instance options, see "Single Node Analytics Instance Sizes".

For additional information on MaxScale instance options, see "MaxScale Redundancy".

With this topology, enforce_storage_engine is set so this service only supports the Enterprise ColumnStore storage engine.

For additional information about MariaDB Enterprise ColumnStore on-premises and in the cloud, including use cases and features, see Modern Analytics Made Easy.

Storage

Single Node Analytics topology configures Enterprise ColumnStore to use object storage to store its data in a highly available manner.

  • "Unlimited" on-demand capacity without downtime for upgrades

  • Highly available on resilient cloud infrastructure

  • Cost effective

Efficient Data Loading

Single Node Analytics topology leverages the data loading architecture of MariaDB Enterprise ColumnStore.

MariaDB Enterprise ColumnStore supports very efficient bulk data loads.

Single Node Analytics topology can also benefit from the features of cpimport, a bulk loading tool which bypasses the SQL layer to decrease the overhead of bulk data loading. cpimport efficiently performs the bulk data load by appending the new rows to the table in the order in which the rows are read from the source file. This approach reduces the I/O requirements, so that larger data sets can be loaded very efficiently.

Single Node Analytics topology configures Enterprise ColumnStore to automatically translate LOAD DATA INFILE statements into bulk data loads. It translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.

Enterprise ColumnStore only requires a write metadata lock on the table to perform a bulk data load, so that the bulk data load does not block read-only queries.

OLAP Workloads

MariaDB Enterprise ColumnStore enables Single Node Analytics services to perform analytical or online analytical processing (OLAP) workloads.

OLAP workloads are generally characterized by ad hoc queries on very large data sets. Some other typical characteristics are:

  • Each query typically reads a subset of columns in the table

  • Most activity typically consists of read-only queries that perform aggregations, window functions, and various calculations

  • Analytical applications typically require only a few concurrent queries

  • Analytical applications typically require scalability of large, complex queries

  • Analytical applications typically require the ability to efficiently bulk load new data

Example Workloads

OLAP workloads are typically required for:

  • Business intelligence (BI)

  • Health informatics

  • Historical data mining

Columnar Benefits

Row-based storage engines face a disadvantage with OLAP workloads. Indexes are not usually very useful for OLAP workloads, because the large size of the data set and the ad hoc nature of the queries preclude the use of indexes to optimize queries.

Columnar storage engines are much better suited for OLAP workloads. MariaDB Enterprise ColumnStore is a columnar storage engine that is designed for OLAP workloads:

  • If a query reads a subset of columns in the table, Enterprise ColumnStore is able to reduce I/O by reading those columns and ignoring all others, because each column is stored separately

  • If most activity consists of read-only queries that perform aggregations, window functions, and various calculations, Enterprise ColumnStore is able to efficiently execute those queries using extent elimination, distributed query execution, and massively parallel processing (MPP) techniques

  • If only a few concurrent queries are required, Enterprise ColumnStore is able to maximize parallelism for each query by allocating more resources to each one

  • If scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques

  • If the ability to efficiently bulk load new data is required, Enterprise ColumnStore is able to bulk load new data without affecting existing data using automatic partitioning with the extent map

High Availability and Failover

Single Node Analytics topology leverages the high availability features of MariaDB Enterprise ColumnStore and MariaDB MaxScale.

MariaDB Enterprise ColumnStore is designed to provide highly available production analytics with automatic failover.

Single Node Analytics topology uses MaxScale to achieve high availability and automatic failover. MaxScale's MariaDB Monitor (mariadbmon) monitors the Enterprise ColumnStore nodes. When the monitor notices the primary node fail, it promotes a replica node to become the new primary node, and it re-configures all replica nodes to replicate from the new primary node. The MaxScale Redundancy option enables high availability load balancing.

Architecture

MariaDB Enterprise ColumnStore uses a sophisticated storage architecture designed to provide a columnar storage engine for production analytics with high availability, fault tolerance, compression, and automatic partitioning.

Single Node Analytics topology configures Enterprise ColumnStore to use MariaDB Replication to synchronize various database objects on multiple nodes for high availability:

  • Replication synchronizes the schemas for all ColumnStore tables on all nodes.

  • Replication synchronizes the schemas and data for all non-ColumnStore tables (such as system tables) on all nodes.

  • Replication synchronizes all other databases objects (stored procedures, stored functions, user accounts, and other objects) on all nodes.

Additionally, if scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques.

For additional information, see "MariaDB Enterprise ColumnStore".