ColumnStore Streaming Data Adapters


The ColumnStore Bulk Data API enables the creation of higher-performance adapters for ETL integration and data ingestion. The Streaming Data Adapters are out-of-the-box adapters that use this API for specific data sources and use cases.

  • MaxScale CDC Data Adapter integrates MaxScale CDC streams into MariaDB ColumnStore.
  • Kafka Data Adapter integrates Kafka streams into MariaDB ColumnStore.

MaxScale CDC Data Adapter

The MaxScale CDC Data Adapter streams change data events (binary log events) from a MariaDB Master that hosts non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from the MariaDB Master to MariaDB ColumnStore. It acts as a CDC client for MaxScale and feeds the events received from MaxScale into the MariaDB ColumnStore Bulk Data API, which pushes the data to MariaDB ColumnStore.
The adapter registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API and receives change data records from MaxScale in JSON format. MaxScale converts these records from the binlog events it receives from the Master (MariaDB TX). The adapter then uses the MariaDB ColumnStore Bulk Data API to convert the JSON data into API calls and stream it to a MariaDB ColumnStore PM node. The adapter can either insert all events using the same schema as the source table, or insert each event with its metadata in addition to the table data. The event metadata includes the event timestamp, the GTID, the event sequence number and the event type (insert, update, delete).
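To make the metadata option more concrete, the following Python sketch splits one JSON change record into its event metadata and its row data. It assumes the metadata field names `domain`, `server_id`, `sequence`, `event_number`, `timestamp` and `event_type` as used in MaxScale avrorouter JSON output; check them against the records your MaxScale version actually emits. `split_cdc_record` is a hypothetical helper for illustration, not part of the adapter.

```python
import json
from typing import Any, Dict, Tuple

# Assumed metadata field names (based on MaxScale avrorouter JSON output);
# verify them against the records your MaxScale version actually sends.
METADATA_FIELDS = ("domain", "server_id", "sequence", "event_number",
                   "timestamp", "event_type")


def split_cdc_record(line: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split one JSON change record into (metadata, row)."""
    record = json.loads(line)
    metadata = {k: record[k] for k in METADATA_FIELDS if k in record}
    row = {k: v for k, v in record.items() if k not in METADATA_FIELDS}
    # A MariaDB GTID is written as domain-server_id-sequence.
    if all(k in metadata for k in ("domain", "server_id", "sequence")):
        metadata["gtid"] = "{domain}-{server_id}-{sequence}".format(**metadata)
    return metadata, row


if __name__ == "__main__":
    # Hypothetical record for a source table with columns id and name.
    sample = ('{"domain": 0, "server_id": 1, "sequence": 42, '
              '"event_number": 1, "timestamp": 1500000000, '
              '"event_type": "insert", "id": 7, "name": "widget"}')
    meta, row = split_cdc_record(sample)
    print(meta["gtid"], meta["event_type"], row)
```

In the same-schema mode only the row part would correspond to the target table's columns; in the metadata mode the metadata fields would be stored alongside it.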

Installation

CentOS 7:

sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm

Debian 9/Ubuntu Xenial:

sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Debian 8:

echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Usage

Usage: mxs_adapter [OPTION]... DATABASE TABLE

  DATABASE       Source & Target database
  TABLE          Table to stream

  -h HOST      MaxScale host
  -P PORT      Port number where the CDC service listens
  -u USER      Username for the MaxScale CDC service
  -p PASSWORD  Password of the user
  -c CONFIG    Path to the Columnstore.xml file (installed by MariaDB ColumnStore)
  -r ROWS      Number of events to group for one bulk load (default: 1)
  -t TIMEOUT   Timeout in seconds (default: 10)
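The `-r ROWS` option groups several events into a single bulk load. The Python sketch below only illustrates what grouping a stream of events into batches of at most `ROWS` events means; `group_events` is a hypothetical helper, it is not the adapter's own code, and it leaves out how the `-t TIMEOUT` setting might interact with a partially filled group.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_events(events: Iterable[T], rows: int = 1) -> Iterator[List[T]]:
    """Hypothetical helper: yield lists of at most `rows` events.

    Each yielded list stands for one bulk load, mirroring the idea
    behind -r ROWS (default 1). Timeout handling is not modelled.
    """
    if rows < 1:
        raise ValueError("rows must be >= 1")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == rows:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled group
        yield batch


if __name__ == "__main__":
    for batch in group_events(range(7), rows=3):
        print(batch)  # one "bulk load" per printed list
```

With the default of 1, every event becomes its own bulk load; larger values mean fewer, bigger loads.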

Quick Start

Download and install both MaxScale and ColumnStore.

Copy the Columnstore.xml file from /usr/local/mariadb/columnstore/etc/Columnstore.xml on one of the ColumnStore UM or PM nodes to the server where the adapter is installed.

Configure MaxScale according to the CDC tutorial.

Create a CDC user by running the following MaxAdmin command on the MaxScale server. Replace `<service>` with the name of the avrorouter service, and `<user>` and `<password>` with the credentials to create.

maxadmin call command cdc add_user <service> <user> <password>

Then start the adapter with the following command:

mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database> <table>

`<database>` and `<table>` identify the table that is streamed to ColumnStore. This table should exist on the master server from which MaxScale reads events. If the table does not yet exist in ColumnStore, the adapter prints instructions on how to define it correctly.

`<user>` and `<password>` are the credentials of the CDC user created above, `<host>` is the MaxScale address and `<port>` is the port on which the CDC service listener is listening.

The `-c` flag is optional if you are running the adapter on the server where ColumnStore is located.
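When the adapter is launched from a script, building the argument list explicitly keeps `<database>` and `<table>` as two separate arguments and makes `-c` easy to omit. The Python sketch below uses only the options listed under Usage; the helper name `mxs_adapter_args`, the credentials, the host, port 4001 and the file path are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def mxs_adapter_args(user: str, password: str, host: str, port: int,
                     database: str, table: str,
                     config: Optional[str] = None,
                     rows: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build an mxs_adapter argument list."""
    args = ["mxs_adapter", "-u", user, "-p", password,
            "-h", host, "-P", str(port)]
    if config is not None:
        # -c may be omitted when running on the ColumnStore server itself.
        args += ["-c", config]
    if rows is not None:
        # -r: number of events grouped into one bulk load (default 1).
        args += ["-r", str(rows)]
    # Options first, then DATABASE and TABLE as two separate arguments.
    args += [database, table]
    return args


if __name__ == "__main__":
    # Placeholder credentials, host, port and path; substitute your own.
    cmd = mxs_adapter_args("cdcuser", "cdcpassword", "maxscale.example.com",
                           4001, "test", "t1",
                           config="/path/to/Columnstore.xml")
    print(shlex.join(cmd))
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems. Keep in mind that a password given with `-p` is visible in the process list.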

Kafka Data Adapter

Installation

CentOS 7:

sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm

Debian 9/Ubuntu Xenial:

sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Debian 8:

echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Usage

Usage: mcskafka [OPTION...] BROKER TOPIC SCHEMA TABLE
mcskafka - A Kafka consumer to write to MariaDB ColumnStore

  -g, --group=GROUP_ID       The Kafka group ID (default 1)
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
  • BROKER: The host/IP of the Kafka broker server
  • TOPIC: The Kafka topic to consume
  • SCHEMA: The target ColumnStore schema name
  • TABLE: The target ColumnStore table name
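The usage line above expects options before the four positional arguments. As a minimal sketch, assuming the adapter is started from a Python script, the argument list can be assembled like this; `mcskafka_args` and the group ID value are hypothetical, and only the documented `--group` option is used.

```python
import shlex
from typing import List, Optional


def mcskafka_args(broker: str, topic: str, schema: str, table: str,
                  group_id: Optional[str] = None) -> List[str]:
    """Hypothetical helper: [OPTION...] BROKER TOPIC SCHEMA TABLE."""
    args = ["mcskafka"]
    if group_id is not None:
        # Documented option; the Kafka group ID defaults to 1 when omitted.
        args.append(f"--group={group_id}")
    # Positional arguments in the documented order.
    args += [broker, topic, schema, table]
    return args


if __name__ == "__main__":
    print(shlex.join(mcskafka_args("localhost", "CDC_DataStream", "test", "t1",
                                   group_id="etl")))
```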

Quick Start

  1. Set up MaxScale for CDC, as for the MaxScale CDC Data Adapter
  2. Set up a Kafka producer as described in this blog post
  3. Create a table on your ColumnStore installation with the same schema as the one used by the Kafka producer
  4. Run mcskafka on a server that has the Columnstore.xml configuration file (such as the UM server), for example:
mcskafka localhost CDC_DataStream test t1

INSERT statements are then streamed into ColumnStore automatically using the bulk write API.
