ColumnStore Streaming Data Adapters
The ColumnStore Bulk Data API enables the creation of higher-performance adapters for ETL integration and data ingestion. The Streaming Data Adapters are out-of-the-box adapters that use this API for specific data sources and use cases.
- The MaxScale CDC Data Adapter integrates MaxScale CDC streams into MariaDB ColumnStore.
- The Kafka Data Adapter integrates Kafka streams into MariaDB ColumnStore.
MaxScale CDC Data Adapter
The MaxScale CDC Data Adapter streams change data events (binary log events) from a MariaDB Master hosting non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from a MariaDB Master to MariaDB ColumnStore.
The adapter registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API and receives change data records in JSON format. MaxScale produces these records by converting the binlog events it receives from the Master on MariaDB TX. Using the MariaDB ColumnStore Bulk Data API, the adapter then converts the JSON data into API calls and streams it to a MariaDB PM node.
The adapter can either insert all events using the same schema as the source database table, or insert each event with its metadata as well as the table data. The event metadata includes the event timestamp, the GTID, the event sequence and the event type (insert, update, delete).
Installation
CentOS 7:
sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm
Debian 9/Ubuntu Xenial:
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install
Debian 8:
echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install
Usage
Usage: mxs_adapter [OPTION]... DATABASE TABLE
  DATABASE       Source & Target database
  TABLE          Table to stream
  -h HOST        MaxScale host
  -P PORT        Port number where the CDC service listens
  -u USER        Username for the MaxScale CDC service
  -p PASSWORD    Password of the user
  -c CONFIG      Path to the Columnstore.xml file (installed by MariaDB ColumnStore)
  -r ROWS        Number of events to group for one bulk load (default: 1)
  -t TIMEOUT     Timeout in seconds (default: 10)
Quick Start
Download and install both MaxScale and ColumnStore.
Copy the Columnstore.xml file, located at
/usr/local/mariadb/columnstore/etc/Columnstore.xml
on one of the ColumnStore UM or PM nodes, to the server where the adapter is installed.
Configure MaxScale according to the CDC tutorial.
Create a CDC user by executing the following MaxAdmin command on the MaxScale server. Replace `<service>` with the name of the avrorouter service, and `<user>` and `<password>` with the credentials to be created.
maxadmin call command cdc add_user <service> <user> <password>
Then start the adapter by executing the following command.
mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database> <table>
The `<database>` and `<table>` define the table that is streamed to ColumnStore. This table should exist on the master server where MaxScale is reading events from. If the table is not created on ColumnStore, the adapter will print instructions on how to define it in the correct way.
The `<user>` and `<password>` are the credentials created for the CDC user, `<host>` is the MaxScale address and `<port>` is the port on which the CDC service listener is listening.
The `-c` flag is optional if you are running the adapter on the server where ColumnStore is located.
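
As an illustration of how the options fit together, the following is a minimal sketch of a shell helper that assembles an `mxs_adapter` command line and prints it for review instead of running it. The function name `mxs_adapter_cmd`, the `COLUMNSTORE_XML` environment variable and all concrete values (user `cdcuser`, password `cdcpasswd`, host `127.0.0.1`, port `4001`, database `test`, table `t1`, batch size `1000`) are hypothetical placeholders, not defaults of the tool. Only the flags themselves come from the usage text above.

```shell
#!/bin/sh
# Sketch: build an mxs_adapter command line from named settings.
# The helper name and every concrete value used here are hypothetical.

mxs_adapter_cmd() {
    # Arguments: user password host port database table [rows] [timeout]
    user=$1; password=$2; host=$3; port=$4; database=$5; table=$6
    rows=${7:-1}       # -r: events grouped per bulk load (documented default: 1)
    timeout=${8:-10}   # -t: timeout in seconds (documented default: 10)

    # -c is only added when COLUMNSTORE_XML is set, since the flag is
    # optional when the adapter runs on the ColumnStore server itself.
    config_opt=""
    if [ -n "${COLUMNSTORE_XML:-}" ]; then
        config_opt=" -c $COLUMNSTORE_XML"
    fi

    echo "mxs_adapter -u $user -p $password -h $host -P $port$config_opt -r $rows -t $timeout $database $table"
}

# Print the command for review; nothing is executed here.
mxs_adapter_cmd cdcuser cdcpasswd 127.0.0.1 4001 test t1 1000
```

With `COLUMNSTORE_XML` unset, the last line prints `mxs_adapter -u cdcuser -p cdcpasswd -h 127.0.0.1 -P 4001 -r 1000 -t 10 test t1`. Raising `-r` above its default of 1 groups more events into each bulk load; a suitable value depends on the workload.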
Kafka Data Adapter
Installation
CentOS 7:
sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm
Debian 9/Ubuntu Xenial:
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install
Debian 8:
echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install
Usage
Usage: mcskafka [OPTION...] BROKER TOPIC SCHEMA TABLE
mcskafka - A Kafka consumer to write to MariaDB ColumnStore
  -g, --group=GROUP_ID       The Kafka group ID (default 1)
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
- BROKER: The host/IP of the Kafka broker server
- TOPIC: The Kafka topic to consume
- SCHEMA: The target ColumnStore schema name
- TABLE: The target ColumnStore table name
Quick Start
- Set up MaxScale to use CDC, as with the CDC adapter
- Set up a Kafka producer as indicated in this blog post
- Create a table on your ColumnStore installation with the same schema as the one used by the Kafka producer
- Run mcskafka on a server that has the Columnstore.xml configuration file (such as the UM server) in a similar way to this:
mcskafka localhost CDC_DataStream test t1
Insert queries will be automatically streamed into ColumnStore using the bulk write API.
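
Because mcskafka must run on a server that has the Columnstore.xml configuration file, a small pre-flight check can catch a missing file before the consumer starts. The sketch below is one way to do this. The function names `columnstore_xml_present` and `run_mcskafka` and the `COLUMNSTORE_XML` variable are hypothetical, and the default path is an assumption carried over from the MaxScale CDC Quick Start rather than something mcskafka is documented to use. The `--group` option is taken from the usage text above.

```shell
#!/bin/sh
# Sketch: refuse to start mcskafka when Columnstore.xml is not readable.
# Helper names and the COLUMNSTORE_XML variable are hypothetical.

columnstore_xml_present() {
    # Assumed default location; pass a different path as $1 to override.
    [ -r "${1:-/usr/local/mariadb/columnstore/etc/Columnstore.xml}" ]
}

run_mcskafka() {
    # Arguments: broker topic schema table [group_id]
    if ! columnstore_xml_present "${COLUMNSTORE_XML:-}"; then
        echo "Columnstore.xml not found; run mcskafka on a ColumnStore node" >&2
        return 1
    fi
    # The group ID falls back to the documented default of 1.
    mcskafka --group="${5:-1}" "$1" "$2" "$3" "$4"
}
```

Calling `run_mcskafka localhost CDC_DataStream test t1` then behaves like the command shown above when the file is readable, and otherwise returns a non-zero status with a message on standard error.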