ColumnStore Streaming Data Adapters
The ColumnStore Bulk Data API enables the creation of higher-performance adapters for ETL integration and data ingestion. The Streaming Data Adapters are out-of-the-box adapters that use this API for specific data sources and use cases.
The MaxScale CDC Data Adapter integrates MaxScale CDC streams into MariaDB ColumnStore.
The Kafka Data Adapter integrates Kafka streams into MariaDB ColumnStore.
MaxScale CDC Data Adapter
The MaxScale CDC Data Adapter has been deprecated.
The MaxScale CDC Data Adapter streams change data events (binary log events) from a MariaDB master hosting non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from the MariaDB master to MariaDB ColumnStore. It acts as a CDC client for MaxScale and uses the events received from MaxScale as input to the MariaDB ColumnStore Bulk Data API, which pushes the data to MariaDB ColumnStore.
The adapter registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API and receives change data records in JSON format. MaxScale produces these records by converting the binlog events it receives from the master on MariaDB TX. Using the MariaDB ColumnStore bulk write SDK, the adapter then converts the JSON data into API calls and streams it to a MariaDB PM node. The adapter can either insert all events into the same schema as the source database table, or insert each event with its metadata as well as the table data. The event metadata includes the event timestamp, the GTID, the event sequence, and the event type (insert, update, delete).
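The two insert options described above can be illustrated with a small sketch. It is not the adapter's code: the sample record and the metadata field names (timestamp, gtid, sequence, event_type) are assumptions chosen to mirror the metadata listed in the text, and the real JSON produced by MaxScale may use different names.

```python
import json

# Hypothetical change record. The field names are assumptions made for
# illustration and are not taken from the MaxScale or adapter documentation.
SAMPLE_RECORD = (
    '{"timestamp": 1700000000, "gtid": "0-1-42", "sequence": 1, '
    '"event_type": "insert", "id": 7, "name": "widget"}'
)

# Keys treated as event metadata (assumed names).
METADATA_KEYS = ("timestamp", "gtid", "sequence", "event_type")


def split_record(raw):
    """Split one JSON change record into (metadata, table_data)."""
    record = json.loads(raw)
    metadata = {k: record[k] for k in METADATA_KEYS if k in record}
    table_data = {k: v for k, v in record.items() if k not in METADATA_KEYS}
    return metadata, table_data


def to_row(raw, with_metadata=False):
    """Build the row to insert: table columns only, or metadata plus table columns."""
    metadata, table_data = split_record(raw)
    if with_metadata:
        return {**metadata, **table_data}
    return table_data
```

Calling to_row(SAMPLE_RECORD) yields only the source table's columns, which corresponds to inserting into the same schema as the source table. Passing with_metadata=True keeps the event metadata alongside the table data.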
Installation
Prerequisites:
Download and install MaxScale CDC Connector API from connector
Download and install MariaDB ColumnStore bulk write SDK from columnstore-bulk-write-sdk.md
CentOS 7
Debian 9/Ubuntu Xenial:
Debian 8:
Usage
Streaming Multiple Tables
To stream multiple tables, use the -f parameter to define a path to a TSV-formatted file. The file must contain one database name and one table name per line. The database and table names must be separated by a TAB character, and each line must be terminated by a newline (\n).
Here is an example file with two tables, t1 and t2, both in the test database.
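A file in this format can be generated with a short script. The following is a minimal sketch in Python; the file name tables.tsv and the helper names are assumptions used only for illustration.

```python
def render_table_list(tables):
    """Return TSV text: one 'database<TAB>table' pair per line, each ending in a newline."""
    return "".join(f"{database}\t{table}\n" for database, table in tables)


def write_table_list(path, tables):
    """Write the table list to `path` without platform newline translation."""
    with open(path, "w", newline="\n") as handle:
        handle.write(render_table_list(tables))


if __name__ == "__main__":
    # Two tables, t1 and t2, both in the test database.
    # "tables.tsv" is a hypothetical file name.
    write_table_list("tables.tsv", [("test", "t1"), ("test", "t2")])
```

The resulting file contains two lines, test<TAB>t1 and test<TAB>t2, and its path is what the -f parameter expects.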
Automated Table Creation on ColumnStore
You can have the adapter automatically create the tables on the ColumnStore instance with the -a option. In this case, the tables are created by the user configured for cross-engine queries (the values in Columnstore.CrossEngineSupport). This user requires CREATE privileges on all streamed databases and tables.
Data Transformation Mode
The -z option enables the data transformation mode. In this mode, the data is converted from historical, append-only data to the current version of the data. In practice, this replicates changes from a MariaDB master server to ColumnStore via the MaxScale CDC.
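The difference between append-only history and the current version of the data can be shown with a small conceptual sketch. It does not reflect how the adapter implements the -z option; the event structure and the use of an id column as the row key are assumptions made for the example.

```python
def apply_events(events, key="id"):
    """Collapse an ordered, append-only event history into the current rows.

    Each event is a dict with an "event_type" (insert, update or delete) and
    the affected "row". This structure is an assumption for illustration.
    """
    current = {}
    for event in events:
        row = event["row"]
        kind = event["event_type"]
        if kind in ("insert", "update"):
            # The latest version of a row replaces any earlier one.
            current[row[key]] = row
        elif kind == "delete":
            # A deleted row disappears from the current state.
            current.pop(row[key], None)
    return current


history = [
    {"event_type": "insert", "row": {"id": 1, "name": "a"}},
    {"event_type": "insert", "row": {"id": 2, "name": "b"}},
    {"event_type": "update", "row": {"id": 1, "name": "c"}},
    {"event_type": "delete", "row": {"id": 2, "name": "b"}},
]
current_rows = apply_events(history)
```

Without transformation, all four events would be stored as separate rows. After applying them in order, only row 1 remains, with its updated name.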
Quick Start
Download and install both MaxScale and ColumnStore.
Copy the Columnstore.xml file from /usr/local/mariadb/columnstore/etc/Columnstore.xml on one of the ColumnStore UM or PM nodes to the server where the adapter is installed.
Configure MaxScale according to the CDC tutorial.
Create a CDC user by executing the following MaxAdmin command on the MaxScale server. Replace <service> with the name of the avrorouter service, and <user> and <password> with the credentials that are to be created.
Then we can start the adapter by executing the following command.
The <database> and <table> define the table that is streamed to ColumnStore. This table should exist on the master server that MaxScale reads events from. If the table has not been created on ColumnStore, the adapter prints instructions on how to define it correctly.
The <user> and <password> are the credentials created for the CDC user, <host> is the MaxScale address, and <port> is the port on which the CDC service listener is listening.
The -c flag is optional if you are running the adapter on the server where ColumnStore is located.
Kafka to ColumnStore Adapter
The Kafka data adapter automatically and continuously streams all messages published to Apache Kafka topics in Avro format to MariaDB AX, so data from many sources can be streamed and collected for analysis without complex code. The Kafka adapter is built using librdkafka and the MariaDB ColumnStore bulk write SDK.
A tutorial for the Kafka adapter for ingesting Avro formatted data can be found in the kafka-to-columnstore-data-adapter document.
ColumnStore - Pentaho Data Integration - Data Adapter
Starting with MariaDB ColumnStore 1.1.4, a data adapter for Pentaho Data Integration (PDI) / Kettle is available to import data directly into ColumnStore’s WriteEngine. It is built on MariaDB’s rapid-paced Bulk Write SDK.

Compatibility notice
The plugin was designed for the following software composition:
Operating system: Windows 10 / Ubuntu 16.04 / RHEL/CentOS 7+
MariaDB ColumnStore >= 1.1.4
MariaDB Java Database client* >= 2.2.1
Java >= 8
Pentaho Data Integration >= 7
+not officially supported by Pentaho.
*Only needed if you want to execute DDL.
Installation
The following steps are necessary to install the ColumnStore Data adapter (bulk loader plugin):
Extract the archive mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins.
Copy MariaDB's JDBC Client mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.
Install the additional library dependencies
Ubuntu dependencies
CentOS dependencies
Windows 10 dependencies
On Windows the installation of the Visual Studio 2015/2017 C++ Redistributable (x64) is required.
Configuration
Each MariaDB ColumnStore Bulk Loader block needs to be configured. It needs to know how to connect to the underlying Bulk Write SDK to inject data into ColumnStore, and it needs a proper JDBC connection to execute DDL.
Both configurations can be set in each block’s settings tab.

The database connection configuration follows PDI’s default schema.
By default, the plugin tries to use ColumnStore's default configuration /usr/local/mariadb/columnstore/etc/Columnstore.xml to connect to the ColumnStore instance through the Bulk Write SDK. Individual paths or variables can also be used.
Information on how to prepare the Columnstore.xml configuration file can be found here.
Usage

Once a block is configured and all inputs are connected in PDI, the inputs have to be mapped to ColumnStore’s table format.
One can either choose “Map all inputs”, which sets target columns of adequate type, or choose a custom mapping based on the structure of the existing table.
The SQL button can be used to generate DDL based on the defined mapping and to execute it.
Limitations
This plugin is a beta release.
In addition, it can't handle blob data types and only supports multiple inputs to one block if the input field names are equal for all input sources.
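The multiple-input restriction can be checked before wiring several sources into one block. The following is a minimal sketch, not part of the plugin: the function name and the input representation (a mapping from source name to its field names) are assumptions, and it compares field names regardless of their order.

```python
def inputs_compatible(inputs):
    """Return True if every input source exposes the same set of field names.

    `inputs` maps a source name to an iterable of its field names.
    This representation is hypothetical and chosen for illustration.
    """
    field_sets = [set(fields) for fields in inputs.values()]
    return all(fields == field_sets[0] for fields in field_sets[1:])


matching = {"source_a": ["id", "name"], "source_b": ["name", "id"]}
mismatched = {"source_a": ["id", "name"], "source_b": ["id", "title"]}
```

Here inputs_compatible(matching) is True, while inputs_compatible(mismatched) is False because the second source uses title where the first uses name.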
This page is licensed: CC BY-SA / Gnu FDL