August 26, 2016

Configuring MariaDB Master and MariaDB MaxScale for Data Streaming

In the previous blog, I introduced Data Streaming with MariaDB MaxScale. In this post, I will show you how to configure a MariaDB Master server and MariaDB MaxScale to stream binary log events from the Master server to MaxScale, and convert them to AVRO records.

First of all, some checks, and possibly some modifications, will be needed on the MariaDB 10 database.

Configuring the Master Database

In order to stream the entire content of each row being modified, MariaDB MaxScale requires that the binary log events contain the rows themselves rather than the statements that modified them. To achieve this, replication on the Master database must be configured to use the ‘row’ format with full row images. Simply edit my.cnf, add the two required options and restart the mysqld process.

[mysqld]
....
binlog_format=row
binlog_row_image=full

This way, in row-based replication, each row change event contains two images, a “before” image whose columns are matched against when searching for the row to be updated, and an “after” image containing the changes.

You can find out more about replication formats from the MariaDB Knowledge Base.
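After restarting, you can verify that the settings are active. A quick check (variable names as in MariaDB 10):

-- On the Master: confirm the replication format settings
SHOW GLOBAL VARIABLES LIKE 'binlog_format';     -- expect: ROW
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';  -- expect: FULL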

Configuring the MaxScale Server

MariaDB MaxScale can be configured to run on the same server as MariaDB 10 Master or separately on a dedicated server. In this blog we will show you configurations when MariaDB MaxScale is running on its own dedicated server.

If MariaDB MaxScale is installed on a dedicated server, it needs to register with the Master database as a Slave. To do so, MariaDB MaxScale must be configured as a binlog server so that it can receive binlog events from the MariaDB 10 Master database.

AVRO Service Setup

We start by adding two new services into the configuration file. The first service (replication service) uses the binlogrouter plugin which reads the binary logs from the Master server.

The binlog router supports both old-style binlog events without GTID and MariaDB 10 binlog events with GTID. However, for the purposes of data streaming, MariaDB 10 GTID is required, so we need to explicitly set the router_options parameter mariadb10-compatibility to 1.

# The Replication Proxy service
[replication-service]
type=service
router=binlogrouter
router_options=server-id=4000,
               master-id=3000,
               binlogdir=/var/lib/maxscale/binlog/,
               mariadb10-compatibility=1
user=maxuser
passwd=maxpwd
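The maxuser account referenced above must exist on the Master with replication privileges. A minimal sketch (the host mask '%' is an assumption; restrict it to the MaxScale server's address in production):

-- On the Master: create the user MaxScale connects as, with replication grants
CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'maxuser'@'%';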

The second service (avro-service) reads the binary logs as they are streamed from the Master through the binlog router and converts them into AVRO format files. Please note that the source of the binary log events is set to “replication-service”: this is what enables the avro-service to continuously consume the binary log events received by the replication service and convert them into AVRO records. The router option avrodir is the directory where the converted AVRO records are stored, and the router option filestem is the prefix of the source binlog files.

# The Avro conversion service
[avro-service]
type=service
router=avrorouter
source=replication-service
router_options=avrodir=/var/lib/maxscale/avro/,
               filestem=binlog

Next, we need to set up a listener on MaxScale, so that MaxScale can be administratively configured to register with the Master server as a Slave.

# The listener for the replication-service
[replication-listener]
type=listener
service=replication-service
protocol=MySQLClient
port=4000

Now, we need to set up the listener on MaxScale where the CDC protocol clients can request AVRO change data records.

# The client listener for the avro-service
[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001

AVRO Schema Setup

Before starting the conversion process, AVRO schema files for all the tables to be replicated must be present on the MaxScale server.

Every converted AVRO record corresponds to a single table record and contains the schema of the table, including a schema id that refers to the id of an AVRO schema file. The schema files are in JSON format, per the Avro specification, and are stored in $avrodir. So that MaxScale can generate these schema files before the conversion process starts, either:

  • the binary log events need to include the CREATE TABLE and any ALTER TABLE events, or

  • the schema files need to be created manually on MaxScale using the cdc_schema Go utility.

All AVRO schema files follow the same general structure: they are JSON documents in the following format:

{
    "namespace": "MaxScaleChangeDataSchema.avro",
    "type": "record",
    "name": "ChangeRecord",
    "fields":
    [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "age",
            "type": "int"
        }
    ]
}

The AVRO converter uses the schema file to identify the columns, their names and their types. The name field contains the name of the column and the type field contains the AVRO type. Read the AVRO specification for details on the layout of the schema files.
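As an illustration of how such a schema maps columns to types, here is a small Python sketch. This is not MaxScale code, just the same JSON document being consumed the way the converter conceptually reads it:

```python
import json

# The example schema above, as a JSON string (lowercase keys per the Avro spec).
schema_json = """
{
    "namespace": "MaxScaleChangeDataSchema.avro",
    "type": "record",
    "name": "ChangeRecord",
    "fields": [
        {"name": "name",    "type": "string"},
        {"name": "address", "type": "string"},
        {"name": "age",     "type": "int"}
    ]
}
"""

schema = json.loads(schema_json)

# Map each column name to its AVRO type, as the converter conceptually does.
columns = {field["name"]: field["type"] for field in schema["fields"]}
print(columns)  # {'name': 'string', 'address': 'string', 'age': 'int'}
```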

Starting MariaDB MaxScale

The next step is to start MariaDB MaxScale and set up the binlog server. We do that by connecting to the listener of the replication-service, executing a CHANGE MASTER TO command to set the Master server, and then starting the Slave on MaxScale.

An example of the CHANGE MASTER TO command required to configure the binlog server:

# mysql -h maxscale-host -P 4000
CHANGE MASTER TO MASTER_HOST='172.18.0.1',
       MASTER_PORT=3000,
       MASTER_LOG_FILE='binlog.000001',
       MASTER_LOG_POS=4,
       MASTER_USER='maxuser',
       MASTER_PASSWORD='maxpwd';
START SLAVE;

These commands will start the replication of binary logs from the Master server at 172.18.0.1:3000. After the binary log streaming has started, the AVRO router will automatically start converting the binlogs into AVRO files. Please note that the first binlog file to be converted is the one with the highest sequence number in the $binlogdir with the $filestem prefix.

Now, let us create an example table and insert some data into it to see if our setup works.

First, create a simple test table using the following statement and populate it.

CREATE TABLE test.t1 (id INT); 
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

The table creation and data insertion events will be converted into an AVRO file, which can be inspected using the maxavrocheck utility program.

[maxscale@localhost]$ maxavrocheck /var/lib/maxscale/avro/test.t1.000001.avro
File sync marker: caaed7778bbe58e701eec1f96d7719a
/var/lib/maxscale/avro/test.t1.000001.avro: 1 blocks, 1 records and 12 bytes
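For a quick programmatic sanity check of the generated files: per the Avro specification, every object container file starts with the four magic bytes "Obj" followed by 0x01. A minimal Python sketch (the path shown is just the example file from above):

```python
def is_avro_container(path):
    """Return True if the file starts with the Avro object container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"

# Hypothetical usage, assuming the file from the maxavrocheck example exists:
# is_avro_container("/var/lib/maxscale/avro/test.t1.000001.avro")
```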

Now we can see the generated AVRO record. In the next blog post, we will explore how these AVRO records can be requested in real time with the MariaDB MaxScale CDC API.

Related Blog Posts

Data Streaming with MariaDB MaxScale

MaxScale as a Replication Proxy, aka the Binlog Server

About Massimiliano Pinto

Massimiliano is a Senior Software Solutions Engineer working mainly on MaxScale. Massimiliano has worked for almost 15 years in web companies in the roles of Technical Leader and Software Engineer. Prior to joining MariaDB he worked at Banzai Group and Matrix S.p.A., big players in the Italian web industry. He still likes the terminal window on his Mac a little too much, and his skills also include Apache modules and PHP extensions.

Read all posts by Massimiliano Pinto