MariaDB ColumnStore

MariaDB ColumnStore is a columnar storage engine that provides distributed storage for scalable analytical processing and smart transactions.

Smart transactions are also known as augmented transactions, translytical processing, Hybrid Transactional-Analytical Processing (HTAP), or Hybrid Operational-Analytical Processing (HOAP).

MariaDB ColumnStore is a component of MariaDB Platform, where it serves as the analytical component of MariaDB's single-stack Hybrid Transactional/Analytical Processing (HTAP) solution.

MariaDB Server Convergence

MariaDB ColumnStore 1.2 and earlier required special custom-built releases of MariaDB Server. Starting with MariaDB ColumnStore 1.4, it is distributed with the standard MariaDB Server releases as the ColumnStore storage engine. This simpler installation process makes enterprise-grade analytics accessible to a wider audience.

MariaDB ColumnStore 1.4 is included with MariaDB Enterprise Server 10.4 on select platforms. The first release including MariaDB ColumnStore was MariaDB Enterprise Server 10.4.11-5, which included MariaDB ColumnStore 1.4.2.

MariaDB ColumnStore 1.5 is included with MariaDB Community Server 10.5 on select platforms. The first release including MariaDB ColumnStore was MariaDB Community Server 10.5.4, which included MariaDB ColumnStore 1.5.2.

Benefits of Columnar Storage

Analytical workloads have different demands than transactional workloads. Transactional workloads are generally characterized by a fixed set of queries over a relatively small data set, while analytical workloads are characterized by ad hoc queries over very large data sets. Indexes can optimize query performance for transactional workloads, but the size of analytical data sets and the ad hoc nature of analytical queries make indexes impractical for optimizing them.

ColumnStore is designed specifically for analytical workloads. Data is written to disk by column rather than by row and is automatically partitioned, so a query reads only the columns it references. No indexes are necessary.
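The following Python sketch illustrates why columnar, partitioned storage can replace indexes for scans. It is a simplified model, not ColumnStore's on-disk format: it assumes each column is split into fixed-size partitions that record their minimum and maximum value, so a filter can skip whole partitions without any index. The table, column names, and partition size are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical table: (order_id, customer, amount)
rows = [
    (1, "alice", 120.0),
    (2, "bob", 35.5),
    (3, "carol", 980.0),
    (4, "dave", 12.0),
    (5, "erin", 410.0),
    (6, "frank", 77.0),
]


def to_columns(rows, names):
    """Transpose row tuples into one list per column (columnar layout)."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


@dataclass
class Partition:
    values: list
    min_value: float
    max_value: float


def partition_column(values, size):
    """Split one column into fixed-size partitions, recording min/max for each."""
    parts = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        parts.append(Partition(chunk, min(chunk), max(chunk)))
    return parts


def sum_greater_than(parts, threshold):
    """Sum values above threshold, skipping partitions whose max rules them out."""
    total = 0.0
    scanned = 0
    for p in parts:
        if p.max_value <= threshold:
            continue  # no value in this partition can match, so it is never read
        scanned += 1
        total += sum(v for v in p.values if v > threshold)
    return total, scanned


columns = to_columns(rows, ["order_id", "customer", "amount"])
# Only the "amount" column is touched; order_id and customer are never read.
amount_parts = partition_column(columns["amount"], size=2)
total, scanned = sum_greater_than(amount_parts, 400.0)
print(total, scanned, len(amount_parts))
```

Here the query reads one of three columns and scans two of three partitions; the first partition is eliminated by its metadata alone.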

Hybrid Transactional and Analytical Processing

Hybrid Transactional and Analytical Processing (HTAP) workloads require both transactional and analytical queries. MariaDB ColumnStore is designed with HTAP workloads in mind.

When used together with MariaDB Enterprise Server running a row-based transactional database and MariaDB MaxScale, a high-performance router that routes queries transparently, MariaDB ColumnStore can provide the analytical engine for Hybrid Transactional and Analytical Processing (HTAP).

MariaDB MaxScale dynamically routes transactional queries to replicated, row-based storage on MariaDB Enterprise Server and analytical queries to MariaDB ColumnStore. Through a single interface, applications combine the benefits of a row-based transactional database with those of a columnar analytical database. MaxScale keeps data consistent across the row-based and columnar storage engines, so analytical queries can run against data as it changes.
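To make the routing idea concrete, here is a toy Python classifier. It is not MaxScale's routing algorithm or configuration; it assumes a hypothetical rule in which writes and simple lookups go to the row store, while SELECT statements that aggregate or use window clauses go to ColumnStore. The backend names are placeholders.

```python
import re

# Hypothetical rule for illustration only: a SELECT that aggregates, groups,
# or uses a window (OVER) clause is treated as analytical.
ANALYTICAL_PATTERN = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|OVER)\b", re.IGNORECASE)

ROW_STORE = "row-based server"  # placeholder backend names
COLUMN_STORE = "ColumnStore"


def route(sql: str) -> str:
    """Return the backend this toy router would send the statement to."""
    statement = sql.strip()
    if not statement.upper().startswith("SELECT"):
        return ROW_STORE  # writes and DDL go to the transactional row store
    if ANALYTICAL_PATTERN.search(statement):
        return COLUMN_STORE  # aggregate-heavy reads go to the columnar engine
    return ROW_STORE  # point lookups stay on the row store


print(route("SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
```

A keyword match like this is easy to fool (for example, a column named `count`), which is one reason a real deployment relies on the router's own configuration rather than ad hoc pattern matching.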

S3 Storage Manager

MariaDB ColumnStore can use any object store that is compatible with the Amazon S3 API. ColumnStore's Storage Manager uses a persistent disk cache for read and write operations, which keeps the performance impact of object storage on ColumnStore minimal. In some cases, it can perform better than local disk operations.
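The general technique behind a cache in front of object storage can be sketched in Python as a least-recently-used (LRU) cache with write-back. This does not describe the Storage Manager's internal design; `FakeObjectStore` and `CachedStorage` are hypothetical names, and an in-memory dictionary stands in for both the S3 bucket and the persistent disk cache.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Hypothetical stand-in for an S3-compatible bucket; counts remote calls."""

    def __init__(self):
        self.objects = {}
        self.gets = 0
        self.puts = 0

    def get(self, key):
        self.gets += 1
        return self.objects[key]

    def put(self, key, data):
        self.puts += 1
        self.objects[key] = data


class CachedStorage:
    """LRU read/write cache in front of an object store (write-back on eviction)."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> (data, dirty)

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # cache hit: no remote call
            return self.cache[key][0]
        data = self.store.get(key)  # cache miss: fetch from the object store
        self._insert(key, data, dirty=False)
        return data

    def write(self, key, data):
        self._insert(key, data, dirty=True)  # upload is deferred

    def flush(self):
        """Upload every dirty entry and mark it clean."""
        for key, (data, dirty) in list(self.cache.items()):
            if dirty:
                self.store.put(key, data)
                self.cache[key] = (data, False)

    def _insert(self, key, data, dirty):
        self.cache[key] = (data, dirty)
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old_key, (old_data, old_dirty) = self.cache.popitem(last=False)
            if old_dirty:
                self.store.put(old_key, old_data)  # write back before evicting


store = FakeObjectStore()
store.objects["seg-1"] = b"aaa"
cache = CachedStorage(store, capacity=2)
cache.read("seg-1")
cache.read("seg-1")  # served from the cache
```

Repeated reads and bursts of writes are absorbed locally, which is how a cache like this can hide most object-store latency.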

Scalability

ColumnStore's distributed, massively parallel architecture is designed for linear, horizontal scaling. Data nodes can be added as data grows, and read queries continue uninterrupted while nodes are added.

MariaDB ColumnStore's design supports targeted scale-out to meet increased workload needs, whether the need is a larger query load or more storage and query processing capacity.
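The scatter-gather pattern behind a massively parallel scan can be sketched in Python. Each hypothetical data node filters and pre-aggregates only its own rows, and a coordinator combines the small partial results. Threads stand in for separate servers here; this is an illustration of the pattern, not ColumnStore's query engine.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(values, node_count):
    """Assign values round-robin to hypothetical data nodes."""
    nodes = [[] for _ in range(node_count)]
    for i, v in enumerate(values):
        nodes[i % node_count].append(v)
    return nodes


def node_scan(partition, threshold):
    """Work done locally on one node: filter and pre-aggregate its own data."""
    matching = [v for v in partition if v > threshold]
    return sum(matching), len(matching)


def parallel_query(nodes, threshold):
    """Scatter the scan to every node in parallel, then gather partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda p: node_scan(p, threshold), nodes))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


values = list(range(1, 101))
three_nodes = parallel_query(distribute(values, 3), 50)
four_nodes = parallel_query(distribute(values, 4), 50)  # after adding a node
print(three_nodes, four_nodes)
```

Adding a node changes how the data is spread but not the answer; each node simply has less data to scan.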

High Availability

MariaDB ColumnStore supports data redundancy, which provides highly available storage, and automated failover.

MariaDB ColumnStore data redundancy relies on GlusterFS, an open source distributed file system maintained by Red Hat. GlusterFS provides continued access to data and can scale to very large data sets. To enable data redundancy, you must install and enable GlusterFS before running postConfigure. MariaDB ColumnStore configures failover automatically, so if a physical server experiences a service interruption, its data is still accessible from another node.
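The idea of replicated storage with failover can be modeled in a few lines of Python. This is a conceptual sketch, not how GlusterFS places or replicates data: it assumes each block is written to `replicas` consecutive nodes, and a read falls back to the next online copy when a node is down. The `Node` and `ReplicatedVolume` classes and node names are hypothetical.

```python
class Node:
    """Hypothetical storage node holding a set of blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True


class ReplicatedVolume:
    """Store each block on `replicas` nodes; read from the first online copy."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("not enough nodes for the requested replica count")
        self.nodes = nodes
        self.replicas = replicas

    def _placement(self, block_id):
        # Simple deterministic placement: consecutive nodes starting at block_id mod n.
        start = block_id % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def write(self, block_id, data):
        for node in self._placement(block_id):
            node.blocks[block_id] = data

    def read(self, block_id):
        for node in self._placement(block_id):
            if node.online:
                return node.blocks[block_id], node.name
        raise RuntimeError(f"all replicas of block {block_id} are offline")


nodes = [Node("node1"), Node("node2"), Node("node3")]
volume = ReplicatedVolume(nodes, replicas=2)
volume.write(0, b"extent-data")
before = volume.read(0)
nodes[0].online = False  # simulate a server outage
after = volume.read(0)  # served by the surviving replica
print(before, after)
```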

Connectors to BI Tools

MariaDB ColumnStore includes support for multiple connectors and data adapters to enable the use of popular data ingestion and business intelligence tools such as:

  • Apache Spark

  • Pentaho

  • Tableau Desktop

Data Ingestion Tools

  • cpimport: ColumnStore's bulk data ingestion utility, which includes command-line options for loading a CSV file from Amazon S3 (and S3-compatible) buckets

  • Apache Spark connector: exports data directly from Spark DataFrames to MariaDB ColumnStore

  • Kafka data adapter: provides rapid data ingestion from Kafka

UDAF C++ API

The Distributed User-Defined Aggregate Functions (UDAF) C++ API lets developers create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore engine. These functions can also be used as analytic (window) functions, just like any built-in aggregate.
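A distributed aggregate has to be split into phases: build a partial state on each node, merge the partial states, then compute the final value. Supporting window frames also requires removing values as the frame slides. The Python sketch below shows that shape for an average. The real API is C++, and the method names here (`init`, `next_value`, `drop_value`, `merge`, `evaluate`) are hypothetical and are not the ColumnStore UDAF interface.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    total: float = 0.0
    count: int = 0


class DistributedAvg:
    """Hypothetical average aggregate split into distributable phases."""

    def init(self):
        return AvgState()

    def next_value(self, state, value):
        # Runs on each node for every row in that node's local data.
        state.total += value
        state.count += 1

    def drop_value(self, state, value):
        # Removes a value so the same aggregate can slide a window frame.
        state.total -= value
        state.count -= 1

    def merge(self, into, other):
        # Combines partial states returned by different nodes.
        into.total += other.total
        into.count += other.count

    def evaluate(self, state):
        return state.total / state.count if state.count else None


def aggregate_across_nodes(agg, partitions):
    """Compute partial states per partition, then merge them into one result."""
    final = agg.init()
    for part in partitions:
        partial = agg.init()
        for v in part:
            agg.next_value(partial, v)
        agg.merge(final, partial)
    return agg.evaluate(final)


def moving_average(agg, values, width):
    """Use the same aggregate as a window function over the last `width` rows."""
    state = agg.init()
    out = []
    for i, v in enumerate(values):
        agg.next_value(state, v)
        if i >= width:
            agg.drop_value(state, values[i - width])
        out.append(agg.evaluate(state))
    return out


print(aggregate_across_nodes(DistributedAvg(), [[1, 2, 3], [4, 5], [6]]))
print(moving_average(DistributedAvg(), [1, 2, 3, 4, 5], width=2))
```

Note that the partial state carries both the sum and the count; merging per-node averages directly would give the wrong answer when nodes hold different numbers of rows.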
