Streaming Data From MariaDB Server Into MariaDB ColumnStore via MariaDB MaxScale

In this blog post, we look at how to configure Change Data Capture (CDC) from the MariaDB Server to MariaDB ColumnStore via MariaDB MaxScale. Our goal in this blog post is to have our analytical ColumnStore instance reflect the changes that happen on our operational MariaDB Server.

MariaDB MaxScale Configuration

We start by creating a MaxScale configuration with a binlogrouter and avrorouter instances. The
former acts as a replication slave and fetches binary logs and the latter processes the binary logs
into CDC records.

[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin

[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=3306

[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001

Copy the contents of this file into the `maxscale.cnf` file.

The docker-compose.yml File

The next step is to clone the MaxScale repository and to create the docker-compose file.

To clone the MaxScale repository, execute the following command.

git clone https://github.com/mariadb-corporation/MaxScale.git --branch=2.2 --depth=1

After the command completes, create the `docker-compose.yml` file with the following contents in the
same directory where you cloned MaxScale.

version: '2'
services:
    master:
        image: mariadb:10.2
        container_name: master
        environment:
            MYSQL_ALLOW_EMPTY_PASSWORD: Y
        command: mysqld --log-bin=mariadb-bin --binlog-format=ROW --server-id=1
        ports:
            - "3306:3306"

    maxscale:
        build: ./MaxScale/docker/
        container_name: maxscale
        volumes:
            - ./maxscale.cnf:/etc/maxscale.cnf.d/maxscale.cnf
        ports:
            - "3307:3306"
            - "4001:4001"

    mcs:
        image: mariadb/columnstore_singlenode:latest
        container_name: mcs
        ports:
            - "3308:3306"

    adapter:
        image: centos:7
        container_name: adapter
        command: /bin/sleep 0xffffffff

This file contains a MariaDB Server that acts as the master server, a MaxScale instance in a CDC
configuration and a single-node ColumnStore container. We also use a plain CentOS 7 container where
we install the adapter.

To start the cluster, run the following commands.

docker-compose build
docker-compose up -d

Configuring

The next step is to copy the ColumnStore configuration file from the `mcs` container and modify it
to use the container hostname instead of the loopback address. To do this, execute the following
commands.

docker cp mcs:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
sed -i 's/127.0.0.1/mcs/' Columnstore.xml
docker cp Columnstore.xml adapter:/etc/Columnstore.xml

After we have copied the configuration file into the `adapter` container, we are ready to install the adapter.

Installing Adapter

To access the container, execute `docker-compose exec adapter bash`. This will launch a new shell
where the following commands will be executed.

yum -y install epel-release
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-columnstore-api/1.1.3/centos/x86_64/7/mariadb-columnstore-api-1.1.3-1-x86_64-centos7.rpm
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-streaming-data-adapters/cdc-data-adapter/1.1.3/centos-7/mariadb-columnstore-maxscale-cdc-adapters-1.1.3-1-x86_64-centos7.rpm

After the adapter is installed, exit the shell.

Next we can start preparing the data on the master server and configure the replication between it
and MaxScale.

Preparing Data and Configuring Replication

We connect to the MariaDB Server running on the `master` container with the following command.

mysql -uroot -h 127.0.0.1 -P 3306

Once connected, executing the following SQL. This will prepare the server, create a table and insert
some dummy data into the table. It also modified the data to emulate changes in the database.

RESET MASTER;
CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT ALL ON *.* TO 'maxuser'@'%';
CREATE DATABASE test;
USE test;
CREATE TABLE t1(id INT);
INSERT INTO t1 VALUES (1), (2), (3);
UPDATE t1 SET id = 4 WHERE id = 2;
DELETE FROM t1 WHERE id = 3;

Once we have created some data, we configure the replication between MaxScale and the master
server. To do this, execute the following command.

mysql -umaxuser -pmaxpwd -h 127.0.0.1 -P 3307 -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4; START SLAVE"

MaxScale will start to replicate events from the master server and process them into CDC records.

Create CDC User

To use the CDC system in MaxScale, we have to create a user for it. Execute the following command to create a user.

docker-compose exec maxscale maxctrl call command cdc add_user avro-router cdcuser cdcpassword

Starting the Adapter

We again execute the commands inside the adapter container. To access the container, execute
`docker-compose exec adapter bash`.

Once inside the container, we can try to start the adapter. Given that the table `test.t1` does not
exist on ColumnStore, the adapter will give us an error when we try to start it:

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
Table not found, create with:

    CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

To create the table on ColumnStore, we have to exit the container. Once out of the container, we
connect to the ColumnStore container and create the table described in the error message with the
following command.

mysql -uroot -h 127.0.0.1 -P 3308 -e "CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;"

Once the table is created, we go back into the adapter container with `docker-compose exec adapter
bash` and try to start it again.

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
4 rows and 1 transactions inserted in 0.210798 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.164197 seconds. GTID = 0-1-7

This time we see that it processed a total of six rows of data. We can now connect to the
ColumnStore on another terminal and see what the table contains.

[markusjm@localhost blog]$ mysql -uroot -h 127.0.0.1 -P 3308 -e "SELECT * FROM test.t1"
+--------+--------------+---------------+------+----------+-----------+------------+
| domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
+--------+--------------+---------------+------+----------+-----------+------------+
|      0 |            1 | insert        |    1 |        5 |         1 | 1523948280 |
|      0 |            2 | insert        |    2 |        5 |         1 | 1523948280 |
|      0 |            3 | insert        |    3 |        5 |         1 | 1523948280 |
|      0 |            1 | update_before |    2 |        6 |         1 | 1523948280 |
|      0 |            2 | update_after  |    4 |        6 |         1 | 1523948280 |
|      0 |            1 | delete        |    3 |        7 |         1 | 1523948281 |
+--------+--------------+---------------+------+----------+-----------+------------+

The changes we did on the master MariaDB Server have been propagated to ColumnStore. To understand
what the values are, we can map the SQL statements to the rows in the table.

The first SQL statement is `INSERT INTO t1 VALUES (1), (2), (3);` which inserts three values into
the table. We see that the first three rows in the resultset are of type `insert` and the values
match what we inserted.

The next SQL statement is `UPDATE t1 SET id = 4 WHERE id = 2;` which only touches one row. Although
it modifies only one row in the database, it generated two rows in ColumnStore. This happened
because the MaxScale CDC system stores both the before and after images of the modified row. This
allows easy comparisons between new and old values.

The final SQL statement was `DELETE FROM t1 WHERE id = 3;` which deleted one row. This statement was
converted to a delete entry with the data that was deleted (row with `id` of 3). This allows deleted
data to be retained for analytical and auditing purposes without actually storing it on the master
database.