Installation and Configuration

This document describes the installation and configuration of MariaDB ColumnStore 1.2, Apache Spark 2.4.0, and mcsapi for Spark in a dockerized lab environment. Production installations follow the same steps, but the installation and configuration commands and paths might differ depending on your operating system, software versions, and network setup.

Lab environment setup

The lab environment consists of:

  • A multi-node MariaDB ColumnStore 1.2 installation with 1 user module (UM) and 2 performance modules (PMs)
  • A multi-node Apache Spark 2.4 installation with 1 Spark driver and 2 Spark workers

It is defined through a docker-compose.yml configuration file.

To start the lab environment, download the docker-compose.yml file and change into the folder containing it. Then execute:

docker-compose up -d

This will spin up the environment with six containers.
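
As a quick sanity check, you can list the running containers. Assuming the downloaded docker-compose.yml uses the container names referenced below (COLUMNSTORE_UM_1, SPARK_MASTER, SPARK_WORKER_1, and SPARK_WORKER_2), all six containers should be listed as up:

docker ps --format "table {{.Names}}\t{{.Status}}"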

Installation of mcsapi for Spark

To utilize mcsapi for Spark’s functions, you have to install it on every Spark worker node as well as on the Spark driver. First, set up the corresponding software repository:

docker exec -it {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2} bash # to get a shell in the Docker container instance
apt-get update
apt-get install -y apt-transport-https dirmngr wget
echo "deb https://downloads.mariadb.com/MariaDB/mariadb-columnstore-api/latest/repo/debian9 stretch main" > /etc/apt/sources.list.d/mariadb-columnstore-api.list

Then add the repository key and update the package index:

wget -qO - https://downloads.mariadb.com/MariaDB/mariadb-columnstore/MariaDB-ColumnStore.gpg.key | apt-key add -
apt-get update

And finally install mcsapi for Spark and its dependencies:

apt-get install -y mariadb-columnstore-api-spark
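
Assuming the packages place the connector libraries at the paths referenced in the Spark configuration below, you can verify the installation inside each container with:

ls -l /usr/lib/javamcsapi.jar /usr/lib/spark-scala-mcsapi-connector.jar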

It is further advised to install the MariaDB JDBC driver (Connector/J) on the Spark driver node to be able to execute DDL statements:

cd ${SPARK_HOME}/jars
wget https://downloads.mariadb.com/Connectors/java/connector-java-2.3.0/mariadb-java-client-2.3.0.jar

For other operating systems, please follow the dedicated installation document in our Knowledge Base.

Spark configuration

To configure Spark to use mcsapi for Spark, two more steps are required.

First, the Spark master’s configuration needs to be adapted so that the newly installed Java libraries for javamcsapi and mcsapi for Spark are on the driver and executor classpaths.

cd ${SPARK_HOME}/conf   # if ${SPARK_CONF_DIR} is set it needs to be used instead
echo "spark.driver.extraClassPath /usr/lib/javamcsapi.jar:/usr/lib/spark-scala-mcsapi-connector.jar" >> spark-defaults.conf
echo "spark.executor.extraClassPath /usr/lib/javamcsapi.jar:/usr/lib/spark-scala-mcsapi-connector.jar" >> spark-defaults.conf

Second, mcsapi for Spark needs information about the ColumnStore cluster it should write data to. This information is provided in the form of a Columnstore.xml configuration file, which needs to be copied from ColumnStore’s um1 node to the Spark master and to each Spark worker node.

docker cp COLUMNSTORE_UM_1:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
docker exec -it {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2} mkdir -p /usr/local/mariadb/columnstore/etc
docker cp Columnstore.xml {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2}:/usr/local/mariadb/columnstore/etc
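
You can verify from the Docker host that the file is now in place on each Spark node, for example:

docker exec -it SPARK_MASTER ls -l /usr/local/mariadb/columnstore/etc/Columnstore.xml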

More information about creating appropriate Columnstore.xml configuration files and Spark configuration changes can be found in our Knowledge Base.

Firewall setup

In production environments with firewalls in place, you have to ensure that the Spark master and worker nodes can reach TCP port 3306 on the ColumnStore user modules, and TCP ports 8616, 8630, and 8800 on the ColumnStore performance modules. The lab environment is already fully configured, so there is nothing to do in this case.
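
As an illustration only (the exact commands depend on your firewall solution and network layout; SPARK_NET is a hypothetical placeholder for the subnet of your Spark nodes), opening the required ports with iptables could look like this:

# on each ColumnStore user module
iptables -A INPUT -p tcp -s ${SPARK_NET} --dport 3306 -j ACCEPT
# on each ColumnStore performance module
iptables -A INPUT -p tcp -s ${SPARK_NET} -m multiport --dports 8616,8630,8800 -j ACCEPT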

Finishing note

Note that the configured Spark containers aren’t persistent. Once the containers are stopped, you have to install and configure mcsapi for Spark again. You could use docker commit to save your changes. Feel free to check out our Interactive test environments if you want to tinker further with mcsapi for Spark.
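
If you want to persist the installation, a minimal sketch using docker commit could look like this; the image names are arbitrary examples, and the resulting images would then have to be referenced in your docker-compose.yml instead of the original Spark images:

docker commit SPARK_MASTER spark_master_mcsapi:configured
docker commit SPARK_WORKER_1 spark_worker_mcsapi:worker1
docker commit SPARK_WORKER_2 spark_worker_mcsapi:worker2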