The ColumnStore Bulk Data API enables the creation of higher-performance adapters for ETL integration and data ingestion. The Streaming Data Adapters are out-of-the-box adapters that use this API for specific data sources and use cases.

  • The MaxScale CDC Data Adapter integrates MaxScale CDC streams into MariaDB ColumnStore.
  • The Kafka Data Adapter integrates Kafka streams into MariaDB ColumnStore.

MaxScale CDC Data Adapter

The MaxScale CDC Data Adapter streams change data events (binary log events) from a MariaDB Master hosting non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from the MariaDB Master to MariaDB ColumnStore. It acts as a CDC client for MaxScale and uses the events received from MaxScale as input to the MariaDB ColumnStore Bulk Data API, which pushes the data to MariaDB ColumnStore.
The adapter registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API and receives change data records in JSON format. MaxScale converts these records from the binlog events it receives from the Master on MariaDB TX. Using the MariaDB ColumnStore bulk write SDK, the adapter then converts the JSON data into API calls and streams it to a MariaDB PM node. The adapter can either insert all events in the same schema as the source database table, or insert each event with metadata as well as table data. The event metadata includes the event timestamp, the GTID, the event sequence, and the event type (insert, update, delete).

Installation

Pre-requisite:

CentOS 7:

sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm

Debian 9/Ubuntu Xenial:

sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Debian 8:

echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install
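
Running the Debian 8 steps a second time would add the backports line to the sources list again. A minimal sketch of an idempotent append follows. The helper name `ensure_line` is hypothetical and is not part of the adapter packages, and writing to /etc/apt/sources.list assumes the script runs with root privileges.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not there yet.
# ensure_line is a hypothetical helper, not part of the adapter packages.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (needs root for the real file, e.g. when the script is run with sudo):
# ensure_line /etc/apt/sources.list \
#     "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free"
```

Calling the helper repeatedly with the same arguments leaves a single copy of the line in the file.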

Usage

Usage: mxs_adapter [OPTION]... DATABASE TABLE

  DATABASE       Source & Target database
  TABLE          Table to stream

  -h HOST      MaxScale host
  -P PORT      Port number where the CDC service listens
  -u USER      Username for the MaxScale CDC service
  -p PASSWORD  Password of the user
  -c CONFIG    Path to the Columnstore.xml file (installed by MariaDB ColumnStore)
  -r ROWS      Number of events to group for one bulk load (default: 1)
  -t TIMEOUT   Timeout in seconds (default: 10)
  -n           Disable metadata generation (timestamp, GTID, event type)
  -i TIME      Flush data after being idle for this many seconds (default: 5)
  -l FILE      Log output to filename given as argument
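
To show how the options above combine, here is a small sketch that assembles an mxs_adapter command line and prints it for review instead of running it. The function name `build_mxs_adapter_cmd` is hypothetical, and the host name, port 4001, credentials, and table names are placeholder assumptions, not adapter defaults.

```shell
#!/bin/sh
# Sketch: assemble an mxs_adapter command line from individual values.
# build_mxs_adapter_cmd is a hypothetical helper; every value passed to it
# in the example call is a placeholder.
build_mxs_adapter_cmd() {
    # Arguments: host port user password rows database table
    host=$1; port=$2; user=$3; password=$4; rows=$5; database=$6; table=$7
    # -r groups several events into one bulk load (the adapter default is 1)
    printf 'mxs_adapter -h %s -P %s -u %s -p %s -r %s %s %s\n' \
        "$host" "$port" "$user" "$password" "$rows" "$database" "$table"
}

# Print the command so it can be checked before it is executed.
build_mxs_adapter_cmd maxscale.example.com 4001 cdcuser cdcpass 1000 test t1
```

Raising the `-r` value is a trade-off: larger groups mean fewer bulk loads, but events wait longer before being written unless the idle flush (`-i`) triggers first.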

Quick Start

Download and install both MaxScale and ColumnStore.

Copy the Columnstore.xml file from /usr/local/mariadb/columnstore/etc/Columnstore.xml on one of the ColumnStore UM or PM nodes to the server where the adapter is installed.

Configure MaxScale according to the CDC tutorial.

Create a CDC user by executing the following MaxAdmin command on the MaxScale server. Replace the `<service>` with the name of the avrorouter service and `<user>` and `<password>` with the credentials that are to be created.

maxadmin call command cdc add_user <service> <user> <password>

Then we can start the adapter by executing the following command.

mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database> <table>

The `<database>` and `<table>` define the table that is streamed to ColumnStore. This table should exist on the master server from which MaxScale reads events. If the table has not been created on ColumnStore, the adapter prints instructions on how to define it correctly.

The `<user>` and `<password>` are the credentials created for the CDC user, `<host>` is the MaxScale address, and `<port>` is the port on which the CDC service listener is listening.

The `-c` flag is optional if you are running the adapter on the server where ColumnStore is located.
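
The decision whether to pass `-c` can be sketched as a small shell helper. This assumes that, on a ColumnStore server, the adapter finds the configuration at the default path /usr/local/mariadb/columnstore/etc/Columnstore.xml; the text above only says the flag is optional there. The function name `config_args` and the copied-file location are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether mxs_adapter needs an explicit -c argument.
# Assumption: a readable file at the default path means ColumnStore is local.
DEFAULT_CONFIG=/usr/local/mariadb/columnstore/etc/Columnstore.xml

config_args() {
    # $1: path to a copied Columnstore.xml (hypothetical location)
    # $2: default path to probe (optional, defaults to DEFAULT_CONFIG)
    copied=$1
    default=${2:-$DEFAULT_CONFIG}
    if [ -r "$default" ]; then
        # Default configuration is present; print nothing, no flag needed.
        return 0
    fi
    if [ -r "$copied" ]; then
        # Fall back to the copied file and emit the flag for it.
        printf '%s %s\n' -c "$copied"
        return 0
    fi
    echo "No readable Columnstore.xml found" >&2
    return 1
}

# Example: config_args "$HOME/Columnstore.xml"
```

The output, which is either empty or `-c <path>`, can be placed in front of the database and table arguments of the adapter command.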

Kafka to ColumnStore Adapter

The Kafka data adapter streams all messages published to Apache Kafka topics in Avro format to MariaDB AX automatically and continuously, so data from many sources can be streamed and collected for analysis without complex code. The Kafka adapter is built using librdkafka and the MariaDB ColumnStore bulk write SDK.

A tutorial for the Kafka adapter for ingesting Avro formatted data can be found in the kafka-to-columnstore-data-adapter document.

ColumnStore - Pentaho Data Integration - Data Adapter

Starting with MariaDB ColumnStore 1.1.4, a data adapter for Pentaho Data Integration (PDI) / Kettle is available to import data directly into ColumnStore’s WriteEngine. It is built on MariaDB’s fast Bulk Write SDK.


PDI Plugin Block info graphic

Compatibility notice

The plugin was designed for the following software stack:

  • Operating system: Ubuntu 16.04 / RHEL/CentOS+ 7
  • MariaDB ColumnStore >= 1.1.4
  • MariaDB Java Database client* >= 2.2.1
  • Java >= 8
  • Pentaho Data Integration >= 7

+ Not officially supported by Pentaho.

* Only needed if you want to execute DDL.

Installation

The following steps are necessary to install the ColumnStore data adapter (bulk loader plugin):

  1. Build the plugin from source or download it from our website.
  2. Extract the archive kettle-columnstore-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins.
  3. Copy MariaDB's JDBC client mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.
  4. Install the additional library dependencies.

Ubuntu dependencies

sudo apt-get install libuv1

CentOS dependencies

sudo yum install epel-release
sudo yum install libuv

Now the plugin is available for transformations under the bulk loading section.
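
Steps 2 and 3 of the installation can be sketched as a short script. The helper names `run` and `install_pdi_plugin` are hypothetical, the paths in the example are placeholders, and the sketch assumes `unzip` is available. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of installation steps 2 and 3. All paths are placeholders.

# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_pdi_plugin() {
    pdi_dir=$1      # PDI installation directory
    plugin_zip=$2   # kettle-columnstore-bulk-exporter-plugin-*.zip
    jdbc_jar=$3     # mariadb-java-client-2.2.x.jar
    # Step 2: extract the plugin archive into the plugins directory
    run unzip -o "$plugin_zip" -d "$pdi_dir/plugins"
    # Step 3: copy the JDBC client into the lib directory
    run cp "$jdbc_jar" "$pdi_dir/lib/"
}

# Example (dry run):
# install_pdi_plugin /opt/pdi ./plugin.zip ./mariadb-java-client.jar
```

Setting `DRY_RUN=0` before the call executes the commands instead of printing them.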

Configuration

Each MariaDB ColumnStore Bulk Loader block needs to be configured. It needs to know how to connect to the underlying Bulk Write SDK to inject data into ColumnStore, and it needs a proper JDBC connection to execute DDL.

Both configurations can be set in each block’s settings tab.


PDI Plugin Block settings info graphic

The database connection configuration follows PDI’s default schema.

By default, the plugin tries to use ColumnStore's default configuration file, /usr/local/mariadb/columnstore/etc/Columnstore.xml, to connect to the ColumnStore instance through the Bulk Write SDK. Custom paths or variables can also be used.

Information on how to prepare the Columnstore.xml configuration file can be found here.

Usage


PDI Plugin Block mapping info graphic

Once a block is configured and all inputs are connected in PDI, the inputs have to be mapped to ColumnStore’s table format.

One can either choose “Map all inputs”, which sets target columns of adequate type, or choose a custom mapping based on the structure of the existing table.

The SQL button can be used to generate DDL based on the defined mapping and to execute it.

Limitations

This plugin is a beta release.

In addition, it cannot handle BLOB data types, and it supports multiple inputs to one block only if the input field names are identical across all input sources.
