MariaDB ColumnStore

MariaDB ColumnStore is a columnar storage engine that provides distributed, columnar storage for scalable analytical processing and smart transactions.

Smart transactions are also known as augmented transactions, translytical processing, Hybrid Transactional-Analytical Processing (HTAP), or Hybrid Operational-Analytical Processing (HOAP).

MariaDB ColumnStore is a component of MariaDB Platform.

MariaDB ColumnStore is the analytical component of MariaDB's single-stack Hybrid Transactional/Analytical Processing (HTAP) solution.

MariaDB ColumnStore is included with MariaDB Enterprise Server 10.4 on select platforms. The first release to include MariaDB ColumnStore was MariaDB Enterprise Server 10.4.11-5, which shipped with MariaDB ColumnStore 1.4.2.

Benefits of Columnar Storage

Analytical workloads have different demands than transactional workloads. Transactional workloads are generally characterized by a fixed set of queries using a relatively small data set. Analytical workloads are characterized by ad hoc queries on very large data sets. Where indexes can be used to optimize query performance for transactional workloads, the size of the data sets and the ad hoc nature of the queries preclude the use of indexes to optimize for analytical queries.

ColumnStore is designed specifically to handle analytical workloads. Data is written to disk by column rather than row and is automatically partitioned. No indexes are necessary.
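
The effect of writing data by column can be illustrated with a small sketch. This is a toy model only: the table, the column names, and the "values read" counter are invented for illustration and do not reflect ColumnStore's on-disk format. It shows why an aggregate over one column touches less data in a column layout than in a row layout.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and data here are illustrative, not ColumnStore internals.

rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 42.0},
    {"order_id": 4, "region": "APAC", "amount": 310.25},
]

def to_columns(row_list):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in row_list[0]}
    for row in row_list:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def sum_amount_row_store(row_list):
    """Row layout: every field of every row is read to reach 'amount'."""
    values_read = 0
    total = 0.0
    for row in row_list:
        values_read += len(row)  # the whole row comes off storage
        total += row["amount"]
    return total, values_read

def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

columns = to_columns(rows)
row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(row_total, row_reads)  # 547.75 12
print(col_total, col_reads)  # 547.75 4
```

Both layouts return the same total, but the column layout reads a third of the values in this three-column example. The gap widens as tables gain columns that a given query does not reference.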

Hybrid Transactional and Analytical Processing

Used in conjunction with MariaDB Enterprise Server running a row-based transactional database instance and the MariaDB MaxScale high-performance router for transparent query routing, MariaDB ColumnStore can serve as the analytical engine for Hybrid Transactional and Analytical Processing (HTAP).

MariaDB MaxScale dynamically routes transactional queries to replicated, row-based storage on MariaDB Enterprise Server and analytical queries to MariaDB ColumnStore. Applications can combine the benefits of row-based transactional databases with columnar analytical databases through a single interface. Data consistency across row-based and columnar storage engines is maintained by MaxScale so queries can be executed operationally at the time of data change.

S3 Storage Manager

MariaDB ColumnStore can use any object store that is Amazon S3 API compatible. ColumnStore's "Storage Manager" uses a persistent disk cache for read and write operations so that using object storage has minimal impact on ColumnStore performance. In some cases it will perform better than local disk operations.
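
The general idea of a persistent disk cache in front of an object store can be sketched as follows. This is a generic read-through and write-through cache, not the Storage Manager's implementation: the class names are hypothetical, an in-memory dictionary stands in for the S3-compatible bucket, and details such as eviction and synchronization are left out.

```python
import os
import tempfile

class FakeObjectStore:
    """Stand-in for an S3-compatible bucket; counts remote reads."""
    def __init__(self):
        self.objects = {}
        self.get_calls = 0

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        self.get_calls += 1
        return self.objects[key]

class DiskCachedStore:
    """Serve reads from a local cache directory, falling back to the store."""
    def __init__(self, store, cache_dir):
        self.store = store
        self.cache_dir = cache_dir

    def _path(self, key):
        # Flatten the object key into a single file name in the cache dir.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def write(self, key, data):
        # Write to the local cache first, then to the object store.
        with open(self._path(key), "wb") as f:
            f.write(data)
        self.store.put(key, data)

    def read(self, key):
        path = self._path(key)
        if os.path.exists(path):      # cache hit: no remote call
            with open(path, "rb") as f:
                return f.read()
        data = self.store.get(key)    # cache miss: fetch and keep a copy
        with open(path, "wb") as f:
            f.write(data)
        return data

remote = FakeObjectStore()
remote.put("extent/0001", b"column data")
with tempfile.TemporaryDirectory() as cache_dir:
    cached = DiskCachedStore(remote, cache_dir)
    first = cached.read("extent/0001")    # miss: one remote read
    second = cached.read("extent/0001")   # hit: served from local disk
    cached.write("extent/0002", b"new data")
    third = cached.read("extent/0002")    # hit: written through the cache
```

In this sketch only the first read of an object reaches the remote store. Repeated reads and reads of recently written objects are served from local disk, which is why a cache of this kind can hide most of the object store's latency.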

Scalability

ColumnStore's distributed, massively parallel architecture provides linear scalability and horizontal scaling. Additional data nodes can be added as data grows, and read queries can continue uninterrupted.

MariaDB ColumnStore's design supports targeted scale-out to address increased workload needs, whether the need is to handle a larger query load or to increase storage and query processing capacity.

High Availability

MariaDB ColumnStore supports data redundancy, providing highly available storage and automated failover.

MariaDB ColumnStore data redundancy leverages GlusterFS, an open source distributed file system maintained by Red Hat. GlusterFS provides continued access to data and is capable of scaling to very large data sets. To enable data redundancy, you must install and enable GlusterFS prior to running postConfigure. Failover is configured automatically by MariaDB ColumnStore, so that if a physical server experiences a service interruption, data is still accessible from another node.

Connectors to BI Tools

MariaDB ColumnStore includes support for multiple connectors and data adapters to enable the use of popular data ingestion and business intelligence tools such as:

  • Apache Spark

  • Pentaho

  • Tableau Desktop

Data Ingestion Tools

  • cpimport, the ColumnStore bulk data ingestion utility, which includes command-line options for loading a CSV file from Amazon S3 (and compatible) buckets

  • Apache Spark connector to directly export data from Spark DataFrames to MariaDB ColumnStore

  • Kafka data adapter for rapid data ingestion

UDAF C++ API

The Distributed User Defined Aggregate Functions (UDAF) C++ API allows anyone to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore engine. These functions can also be used as analytic (window) functions, just like any built-in aggregate.
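
The pattern behind a distributed aggregate can be sketched conceptually: each node folds its own rows into a partial state, the partial states are merged, and a final value is computed from the merged state. The sketch below is written in Python for brevity and is not the UDAF C++ API. The class and method names (PartialAvg, add, merge, result) are hypothetical and chosen only to illustrate the pattern with an average.

```python
class PartialAvg:
    """Partial state for an average: a running sum and a row count."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        """Fold one input row into this node's partial state."""
        self.total += value
        self.count += 1

    def merge(self, other):
        """Combine another node's partial state into this one."""
        self.total += other.total
        self.count += other.count

    def result(self):
        """Produce the final value from the merged state (None if no rows)."""
        return self.total / self.count if self.count else None

def distributed_avg(partitions):
    """Aggregate each partition separately, then merge the partial states."""
    partials = []
    for partition in partitions:      # in a real cluster, one per node
        state = PartialAvg()
        for value in partition:
            state.add(value)
        partials.append(state)
    final = PartialAvg()
    for partial in partials:
        final.merge(partial)
    return final.result()

partitions = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
avg = distributed_avg(partitions)
print(avg)  # 3.5
```

Averaging the per-partition averages would give a wrong answer here because the partitions differ in size. Carrying the sum and count as mergeable state is what makes the aggregate safe to distribute.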

New in ColumnStore 1.4

  • MariaDB Server convergence

    Until now, MariaDB ColumnStore has been maintained as a custom fork of MariaDB Server to accommodate the unique way that queries are handled for distributed processing.

    With this release, a joint project between the MariaDB Server and MariaDB ColumnStore engineering teams, ColumnStore now works as a pluggable storage engine on the standard MariaDB Enterprise Server 10.4 platform.

  • S3-compatible object storage

    MariaDB ColumnStore can now use any object store that is Amazon S3 API compatible. The "Storage Manager" uses a persistent disk cache for read and write operations so that using object storage has minimal impact on ColumnStore performance. In some cases it will perform better than local disk operations.

  • Disk pre-allocation

    To reduce SSD wear and increase write performance for large data sets containing many columns, ColumnStore now allocates disk space as needed, writing only real data plus padding to fill the remainder of an 8KB block. Previously, ColumnStore wrote twice: once to pre-allocate an empty file for each new extent (an 8 million item file for a column), and a second time to fill the file with real data.

  • Faster ORDER BY

    The outer "ORDER BY" of a query is now processed by the ColumnStore engine instead of MariaDB Server. ColumnStore uses a faster sorting algorithm, which gives higher performance with larger result sets.

  • Faster, more efficient hash joins

  • Expanded data type support, including TIMESTAMP (with CURRENT_TIMESTAMP), BOOLEAN, and MEDIUMINT

  • Statement-based replication support

  • InfiniDB alias eliminated

    ColumnStore 1.2 and earlier included the InfiniDB engine as an alias. This alias has now been removed. All ColumnStore tables must now be created with the engine name "columnstore". All MariaDB system variables prefixed with "infinidb_" have now been removed.
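
The write savings from the disk allocation change listed above can be estimated with some simple arithmetic. The sketch below rests on several assumptions that are not stated in this document: "8 million items" is taken to mean 8 × 2^20, values are fixed width and uncompressed, and all data fits in a single extent. The function names are hypothetical.

```python
BLOCK_BYTES = 8 * 1024           # 8KB block, as described above
EXTENT_ITEMS = 8 * 1024 * 1024   # assumption: "8 million items" = 8 * 2**20

def bytes_written_preallocated(item_count, item_width):
    """Old behavior: write an empty extent file, then write the real data."""
    prealloc = EXTENT_ITEMS * item_width   # first pass: empty file
    data = item_count * item_width         # second pass: real data
    return prealloc + data

def bytes_written_as_needed(item_count, item_width):
    """New behavior: write real data, padded up to a whole 8KB block."""
    data = item_count * item_width
    blocks = -(-data // BLOCK_BYTES)       # ceiling division
    return blocks * BLOCK_BYTES

# Example: 1,000 four-byte values in one column.
old = bytes_written_preallocated(1000, 4)
new = bytes_written_as_needed(1000, 4)
print(old)  # 33558432
print(new)  # 8192
```

Under these assumptions, a small load into a new column writes a single 8KB block instead of a full extent-sized file plus the data. The difference is multiplied by the number of columns in the table.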

For a complete list of changes, see the MariaDB ColumnStore 1.4.2 release notes.