MariaDB Enterprise ColumnStore

MariaDB Enterprise ColumnStore is an enterprise version of the ColumnStore storage engine included with MariaDB Enterprise Server. It provides distributed, columnar storage for scalable analytical processing and smart transactions.

MariaDB Enterprise ColumnStore is a component of MariaDB Platform.

MariaDB Enterprise ColumnStore is the analytical storage engine used by MariaDB's single-stack Hybrid Transactional/Analytical Processing (HTAP) solution.

MariaDB Enterprise Server Convergence

MariaDB Enterprise ColumnStore has been converging with MariaDB Enterprise Server throughout the past few releases:

  • In MariaDB ColumnStore 1.2 and earlier, MariaDB ColumnStore required custom builds of MariaDB Server.

  • Starting with MariaDB Enterprise ColumnStore 1.4, Enterprise ColumnStore is distributed as the ColumnStore storage engine in standard MariaDB Enterprise Server releases, beginning with MariaDB Enterprise Server 10.4.

  • Because no special server build is needed, installation is simpler, which makes enterprise-grade analytics accessible to a wider audience.

Available Versions

The version of MariaDB Enterprise ColumnStore depends on the version of MariaDB Enterprise Server being used:

Enterprise Server          Enterprise ColumnStore
ES10.5.6-4                 MariaDB Enterprise ColumnStore 5.4
ES10.5.5-3 and earlier     MariaDB Enterprise ColumnStore 1.5
ES10.4                     MariaDB Enterprise ColumnStore 1.4

Note

To install the latest version, see Deploy MariaDB Enterprise ColumnStore 5.4.

Available Platforms

MariaDB Enterprise ColumnStore is available on select platforms:

  • CentOS 8

  • CentOS 7

  • Debian 10

  • Debian 9

  • Red Hat Enterprise Linux 8

  • Red Hat Enterprise Linux 7

  • SUSE Linux Enterprise Server 15

  • SUSE Linux Enterprise Server 12

  • Ubuntu 20.04

  • Ubuntu 18.04

  • Ubuntu 16.04

Columnar Database

MariaDB Enterprise ColumnStore is a columnar database:

  • The most commonly used storage engines in MariaDB Enterprise Server are row-based storage engines.

  • Row-based storage engines perform very well for transactional (OLTP) workloads.

  • Transactional workloads are generally characterized by a fixed set of queries against a relatively small data set.

  • Columnar storage engines perform better for analytical (OLAP) workloads.

  • Analytical workloads are characterized by ad hoc queries on very large data sets.

  • Whereas indexes can be used to optimize query performance for transactional workloads, the size of analytical data sets and the ad hoc nature of analytical queries make indexes impractical for optimizing them.

  • ColumnStore is designed specifically to handle analytical workloads.

  • Data is written to disk by column rather than row and is automatically partitioned. No indexes are necessary.
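As a rough illustration of the two ideas above, the Python sketch below stores the same table row-wise and column-wise, then shows how per-block minimum and maximum values let a scan skip whole blocks without any index. The data, the `Extent` class, and the block size are hypothetical; ColumnStore keeps comparable min/max metadata for its extents, but this is not its actual storage format.

```python
from dataclasses import dataclass

# Hypothetical sample table: (order_id, region, amount)
ROWS = [
    (1, "EU", 120.0),
    (2, "US", 75.5),
    (3, "EU", 30.0),
    (4, "APAC", 210.0),
    (5, "US", 99.5),
    (6, "EU", 18.0),
]

# Row-oriented layout: each record's fields are kept together.
row_store = list(ROWS)

# Column-oriented layout: each column is kept as its own array.
column_store = {
    "order_id": [r[0] for r in ROWS],
    "region": [r[1] for r in ROWS],
    "amount": [r[2] for r in ROWS],
}


def total_amount_row_store(rows):
    # Walks every full record even though only one field is needed.
    return sum(row[2] for row in rows)


def total_amount_column_store(columns):
    # Reads only the "amount" array; other columns are never touched.
    return sum(columns["amount"])


@dataclass
class Extent:
    """A fixed-size slice of one column plus its min/max (illustrative)."""
    values: list
    lo: float
    hi: float


def build_extents(values, size):
    # Split a column into fixed-size blocks and record each block's min/max.
    return [
        Extent(chunk, min(chunk), max(chunk))
        for chunk in (values[i:i + size] for i in range(0, len(values), size))
    ]


def count_greater_than(extents, threshold):
    """Return (matching values, extents actually scanned)."""
    matches = scanned = 0
    for ext in extents:
        if ext.hi <= threshold:
            continue  # No value here can match, so the block is skipped.
        scanned += 1
        matches += sum(1 for v in ext.values if v > threshold)
    return matches, scanned


if __name__ == "__main__":
    print(total_amount_row_store(row_store), total_amount_column_store(column_store))
    extents = build_extents(column_store["amount"], size=2)
    print(count_greater_than(extents, 100.0))  # prints (2, 2)
```

With a threshold of 100.0, the third block (max 99.5) is never scanned: the min/max metadata alone rules it out.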

Hybrid Transactional and Analytical Processing

MariaDB Enterprise ColumnStore supports Hybrid Transactional and Analytical Processing (HTAP) workloads:

  • Hybrid Transactional and Analytical Processing (HTAP) workloads require both transactional and analytical queries.

  • HTAP workloads are also known as "Smart Transactions", "Augmented Transactions", "Translytical", or "Hybrid Operational-Analytical Processing (HOAP)".

  • ColumnStore can serve as the analytical storage engine for HTAP.

  • MariaDB Enterprise Server provides row-based storage engines, such as InnoDB, which can serve as the transactional storage engine for HTAP.

  • MariaDB Replication can replicate data between the transactional and analytical engines to maintain data consistency.

  • MariaDB MaxScale is a high-performance database proxy, which can dynamically route transactional queries to the transactional storage engine and analytical queries to ColumnStore.
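To make the routing idea concrete, here is a deliberately crude Python sketch of splitting traffic between the two engines. It does not reproduce MaxScale's routing logic; the backend names and the keyword heuristic are assumptions made only for illustration.

```python
import re

# Hypothetical backend names; a real deployment defines its servers in MaxScale.
TRANSACTIONAL = "innodb_server"
ANALYTICAL = "columnstore_server"

# Crude heuristic: aggregate functions or GROUP BY suggest an analytical query.
_ANALYTICAL_HINTS = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|MIN|MAX)\b", re.IGNORECASE)


def route(sql: str) -> str:
    """Pick a backend for one SQL statement (illustrative only)."""
    words = sql.split()
    first = words[0].upper() if words else ""
    if first != "SELECT":
        # Writes and anything unrecognized go to the transactional engine.
        return TRANSACTIONAL
    if _ANALYTICAL_HINTS.search(sql):
        return ANALYTICAL
    return TRANSACTIONAL


if __name__ == "__main__":
    print(route("SELECT * FROM orders WHERE order_id = 42"))
    print(route("SELECT region, SUM(amount) FROM orders GROUP BY region"))
```

Keyword matching like this would misroute queries that mention `SUM` inside a string literal, and it ignores open transactions; it is only meant to show the transactional/analytical split.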

S3 Storage Manager

MariaDB Enterprise ColumnStore supports S3-compatible storage:

  • Enterprise ColumnStore can use any object store that is compatible with the Amazon S3 API.

  • Using cloud storage for Enterprise ColumnStore data allows for practically limitless data storage while also providing high availability.

  • Enterprise ColumnStore's "Storage Manager" uses a persistent local disk cache for read/write operations so that network latency has minimal performance impact on Enterprise ColumnStore.

  • In some cases, S3-backed storage with this cache can perform better than local disk operations.
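The caching behavior described above can be pictured with a small Python model: a least-recently-used cache sits in front of an object store, so repeated reads are served locally and writes are collected and uploaded later. The in-memory `FakeObjectStore` stands in for an S3 bucket, the cache lives in memory rather than on a persistent local disk, and the write-back policy is an assumption; this is not Storage Manager's implementation.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Stand-in for an S3-compatible bucket that counts remote calls (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.gets = 0
        self.puts = 0

    def get(self, key):
        self.gets += 1
        return self.objects[key]

    def put(self, key, data):
        self.puts += 1
        self.objects[key] = data


class CachedStore:
    """LRU cache with write-back in front of an object store."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> bytes, most recently used last
        self.dirty = set()          # keys written locally but not yet uploaded

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # Cache hit: no network round trip.
            return self.cache[key]
        data = self.backend.get(key)     # Cache miss: fetch from the object store.
        self._insert(key, data)
        return data

    def write(self, key, data):
        # Writes land in the local cache first and are uploaded later.
        self._insert(key, data)
        self.dirty.add(key)

    def flush(self):
        # Upload every pending write.
        for key in sorted(self.dirty):
            self.backend.put(key, self.cache[key])
        self.dirty.clear()

    def _insert(self, key, data):
        self.cache[key] = data
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old_key, old_data = self.cache.popitem(last=False)
            if old_key in self.dirty:
                # Evicting an unsaved object forces an upload first.
                self.backend.put(old_key, old_data)
                self.dirty.discard(old_key)
```

In this model, a second read of the same key costs no remote call, which is the effect the cache is meant to have on network latency.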

Scalability

MariaDB Enterprise ColumnStore is scalable:

  • Enterprise ColumnStore's distributed, massively parallel architecture provides linear scalability and horizontal scaling.

  • In a multi-node Enterprise ColumnStore cluster, additional data nodes can be added as data grows.

  • During bulk data loads, read queries can continue uninterrupted.

  • ColumnStore's design supports targeted scale-out to address increased workload requirements, whether it is a larger query load or increased storage and query processing capacity.
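One reason this kind of architecture scales out is that aggregation can be split into per-node partial results that a coordinator merges. The Python sketch below simulates nodes with threads and round-robin shards (both assumptions, not ColumnStore's actual data distribution) to compute an average per group.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Work done on one simulated node: per-group (sum, count)."""
    acc = defaultdict(lambda: [0.0, 0])
    for region, amount in rows:
        acc[region][0] += amount
        acc[region][1] += 1
    return dict(acc)


def merge(partials):
    """Work done by the coordinator: combine partials into per-group averages."""
    totals = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for region, (s, c) in part.items():
            totals[region][0] += s
            totals[region][1] += c
    return {region: s / c for region, (s, c) in totals.items()}


def distributed_avg(rows, num_nodes):
    # Hypothetical round-robin distribution of rows across nodes.
    shards = [rows[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    return merge(partials)


if __name__ == "__main__":
    rows = [("EU", 120.0), ("US", 75.5), ("EU", 30.0),
            ("APAC", 210.0), ("US", 99.5), ("EU", 18.0)]
    print(distributed_avg(rows, num_nodes=3))
```

Each node returns a (sum, count) pair rather than an average, because averaging per-node averages gives the wrong answer when shards hold different numbers of rows. Adding nodes only changes how many shards there are, not the result.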

High Availability

MariaDB Enterprise ColumnStore provides high availability (HA) in a multi-node Enterprise ColumnStore cluster:

  • Multi-node Enterprise ColumnStore supports data redundancy, providing highly available storage and automated failover.

  • Multi-node Enterprise ColumnStore leverages MaxScale 2.5 to provide automated failover and load balancing.

  • Multi-node Enterprise ColumnStore leverages shared storage for data redundancy.

    • GlusterFS is a great shared storage option. It is an open source, distributed file system that provides continued access to data and can scale to very large data sets. It is maintained by Red Hat.

    • Amazon EFS and NFS are other shared storage options.

    • S3-compatible storage is also supported for data, but shared storage is still required for metadata.

    • To enable data redundancy, you must install and enable shared storage.

  • Failover is configured automatically by MariaDB Enterprise ColumnStore and MariaDB MaxScale, so that if a physical server experiences a service interruption, data is still accessible from another node.
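Because the data lives on shared storage, another node can take over when one fails. The toy Python model below shows only that control-flow idea: a hypothetical monitor keeps the current primary while it is healthy and otherwise promotes the first healthy node. The node names and the promotion rule are assumptions; actual failover is handled by MariaDB Enterprise ColumnStore and MariaDB MaxScale, and this does not reproduce their logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class Cluster:
    """Toy monitor that keeps one node designated as primary (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.primary = nodes[0].name

    def node(self, name):
        return next(n for n in self.nodes if n.name == name)

    def check(self):
        """Keep a healthy primary; otherwise promote the first healthy node."""
        if self.node(self.primary).healthy:
            return self.primary
        for n in self.nodes:
            if n.healthy:
                # Shared storage means the new primary sees the same data.
                self.primary = n.name
                return self.primary
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    cluster = Cluster([Node("mcs1"), Node("mcs2"), Node("mcs3")])
    cluster.node("mcs1").healthy = False  # Simulate a service interruption.
    print(cluster.check())  # mcs2
```

This model never fails back when `mcs1` recovers; that choice is arbitrary here.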

Connectors to BI Tools

MariaDB Enterprise ColumnStore includes support for multiple connectors and data adapters to enable the use of popular data ingestion and business intelligence tools such as:

  • Apache Spark

  • Pentaho

  • Tableau Desktop

  • Power BI

Data Ingestion Tools

MariaDB Enterprise ColumnStore supports data ingestion tools:

  • cpimport: the ColumnStore bulk data ingestion utility, including command-line options for loading CSV files from Amazon S3 and S3-compatible buckets.

  • Apache Spark connector for exporting data directly from Spark DataFrames to MariaDB Enterprise ColumnStore.

  • Kafka data adapter for rapid data ingestion.
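cpimport reads delimited text, and its default field delimiter is the pipe character (`|`). As a minimal sketch, assuming a hypothetical `orders` table whose columns match the tuples below, this Python helper writes rows in that layout. It rejects values containing the delimiter or a newline rather than attempting to escape them, and it does not address NULL handling.

```python
def write_cpimport_file(path, rows, delimiter="|"):
    """Write rows as one delimited line each (hypothetical helper).

    The default '|' matches cpimport's default field delimiter; if you change
    it here, pass the matching delimiter to cpimport with -s.
    """
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            fields = [str(v) for v in row]
            for field in fields:
                if delimiter in field or "\n" in field:
                    raise ValueError(f"field {field!r} needs escaping")
            f.write(delimiter.join(fields) + "\n")


if __name__ == "__main__":
    write_cpimport_file("/tmp/orders.tbl", [(1, "EU", 120.0), (2, "US", 75.5)])
```

The resulting file could then be loaded with a command of the form `cpimport mydb orders /tmp/orders.tbl`, where `mydb`, `orders`, and the path are placeholders.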

UDAF C++ API

MariaDB Enterprise ColumnStore supports a Distributed User Defined Aggregate Functions (UDAF) C++ API:

  • The Distributed UDAF C++ API allows developers to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore storage engine.

  • These functions can also be used as analytic (window) functions, just like any built-in aggregate function.
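The Python sketch below is a loose, hypothetical analog of that model, not the C++ API itself. It assumes a simple sum-of-squares aggregate and shows the phases such an API has to cover: accumulating rows on each node, merging per-node partial results, producing a final value, and, for window use, removing rows that leave the frame. In ColumnStore's C++ SDK these roles correspond roughly to methods such as `nextValue`, `subEvaluate`, `evaluate`, and `dropValue` of the `mcsv1_UDAF` class; the Python names and signatures here are illustrative only.

```python
class SumOfSquares:
    """Hypothetical Python analog of a distributed user-defined aggregate."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Start a fresh group (or a fresh node-local partial result).
        self.total = 0

    def next_value(self, x):
        # Called once per input row on whichever node holds that row.
        self.total += x * x

    def sub_evaluate(self, other):
        # Fold another node's partial state into this one.
        self.total += other.total

    def drop_value(self, x):
        # Remove a row that has slid out of a window frame.
        self.total -= x * x

    def evaluate(self):
        return self.total


def aggregate(shards):
    """Plain aggregate: one partial per shard, then a merge."""
    partials = []
    for shard in shards:
        part = SumOfSquares()
        for x in shard:
            part.next_value(x)
        partials.append(part)
    result = SumOfSquares()
    for part in partials:
        result.sub_evaluate(part)
    return result.evaluate()


def moving(values, width):
    """Window use, like ROWS BETWEEN width-1 PRECEDING AND CURRENT ROW."""
    agg = SumOfSquares()
    out = []
    for i, x in enumerate(values):
        agg.next_value(x)
        if i >= width:
            agg.drop_value(values[i - width])
        out.append(agg.evaluate())
    return out


if __name__ == "__main__":
    print(aggregate([[1, 2], [3]]))  # 14
    print(moving([1, 2, 3, 4], 2))   # [1, 5, 13, 25]
```

Supporting a drop step lets the same object slide along a window in constant time per row instead of recomputing each frame from scratch.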
