Installation and Configuration¶
This document describes the installation and configuration of MariaDB ColumnStore 1.2, Apache Spark 2.4.0, and mcsapi for Spark in a dockerized lab environment. Production installations follow the same steps, but the exact commands and paths may differ depending on your operating system, software versions, and network setup.
Lab environment setup¶
The lab environment consists of:
- A multi-node MariaDB ColumnStore 1.2 installation with 1 user module (UM) and 2 performance modules (PMs)
- A multi-node Apache Spark 2.4 installation with 1 Spark driver and 2 Spark workers
It is defined through the following docker-compose.yml configuration.
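The actual docker-compose.yml ships alongside this document; the sketch below only illustrates its overall shape, and all image names and settings in it are assumptions rather than the file's real contents:
# Sketch only, image names below are hypothetical placeholders
version: "3"
services:
  um1:
    image: columnstore-um-image        # hypothetical ColumnStore UM image
    container_name: COLUMNSTORE_UM_1
  pm1:
    image: columnstore-pm-image        # hypothetical ColumnStore PM image
    container_name: COLUMNSTORE_PM_1   # PM container names are assumptions
  pm2:
    image: columnstore-pm-image
    container_name: COLUMNSTORE_PM_2
  spark-master:
    image: spark-2.4.0-image           # hypothetical Spark image
    container_name: SPARK_MASTER
  spark-worker-1:
    image: spark-2.4.0-image
    container_name: SPARK_WORKER_1
  spark-worker-2:
    image: spark-2.4.0-image
    container_name: SPARK_WORKER_2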
To start the lab environment, download the docker-compose.yml file and change into the folder containing it. Then execute:
docker-compose up -d
This will spin up the environment with six containers.
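You can verify that everything came up by listing the containers:
docker-compose ps # should show six containers in state 'Up'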
Installation of mcsapi for Spark¶
To use mcsapi for Spark's functions, you have to install it on the Spark driver as well as on every Spark worker node. First, set up the corresponding software repository in each of the three Spark containers:
docker exec -it {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2} bash # to get a shell in the Docker container instance
apt-get update
apt-get install -y apt-transport-https dirmngr wget
echo "deb https://downloads.mariadb.com/MariaDB/mariadb-columnstore-api/latest/repo/debian9 stretch main" > /etc/apt/sources.list.d/mariadb-columnstore-api.list
Then add the repository key and refresh the repositories via:
wget -qO - https://downloads.mariadb.com/MariaDB/mariadb-columnstore/MariaDB-ColumnStore.gpg.key | apt-key add -
apt-get update
And finally install mcsapi for Spark and its dependencies:
apt-get install -y mariadb-columnstore-api-spark
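As a quick sanity check, you can confirm that the package placed the two jars that the Spark configuration below refers to:
ls -l /usr/lib/javamcsapi.jar /usr/lib/spark-scala-mcsapi-connector.jar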
It is further advised to install the MariaDB JDBC driver (Connector/J) on the Spark driver node to be able to execute DDL; see the sketch below.
cd ${SPARK_HOME}/jars
wget https://downloads.mariadb.com/Connectors/java/connector-java-2.3.0/mariadb-java-client-2.3.0.jar
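With the JDBC driver in place, DDL can be issued straight from the Spark shell. A minimal sketch, assuming the user module is reachable under the hostname um1 and accepts user root with an empty password (adapt hostname and credentials to your setup):
docker exec -it SPARK_MASTER bash
${SPARK_HOME}/bin/spark-shell
scala> import java.sql.DriverManager
scala> // hostname "um1" and root with empty password are assumptions of this sketch
scala> val conn = DriverManager.getConnection("jdbc:mariadb://um1:3306/", "root", "")
scala> // create a ColumnStore table to export data frames into later
scala> conn.createStatement().execute("CREATE DATABASE IF NOT EXISTS test")
scala> conn.createStatement().execute("CREATE TABLE IF NOT EXISTS test.spark_test (id INT, value VARCHAR(32)) ENGINE=columnstore")
scala> conn.close()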
For other operating systems, please follow the dedicated installation document in our Knowledge Base.
Spark configuration¶
Two further actions are required to configure Spark to use mcsapi for Spark.
First, the Spark master's configuration needs to be adapted so that the newly installed Java libraries for javamcsapi and mcsapi for Spark are on the classpath.
cd ${SPARK_HOME}/conf # if ${SPARK_CONF_DIR} is set it needs to be used instead
echo "spark.driver.extraClassPath /usr/lib/javamcsapi.jar:/usr/lib/spark-scala-mcsapi-connector.jar" >> spark-defaults.conf
echo "spark.executor.extraClassPath /usr/lib/javamcsapi.jar:/usr/lib/spark-scala-mcsapi-connector.jar" >> spark-defaults.conf
Second, mcsapi for Spark needs information about the ColumnStore cluster it is going to write data to. This information is provided in the form of a Columnstore.xml configuration file, which needs to be copied from ColumnStore's um1 node to the Spark master and to each Spark worker node.
docker cp COLUMNSTORE_UM_1:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
docker exec -it {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2} mkdir -p /usr/local/mariadb/columnstore/etc
docker cp Columnstore.xml {SPARK_MASTER | SPARK_WORKER_1 | SPARK_WORKER_2}:/usr/local/mariadb/columnstore/etc
More information about creating appropriate Columnstore.xml configuration files and Spark configuration changes can be found in our Knowledge Base.
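Once Columnstore.xml is in place on all Spark nodes, a short smoke test ties the pieces together. A minimal sketch, assuming the table test.spark_test created earlier and the connector's documented export method:
docker exec -it SPARK_MASTER bash
${SPARK_HOME}/bin/spark-shell
scala> import com.mariadb.columnstore.api.connector.ColumnStoreExporter
scala> // two rows matching the (id INT, value VARCHAR) layout of test.spark_test
scala> val df = spark.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")
scala> // bulk write the data frame into ColumnStore through mcsapi
scala> ColumnStoreExporter.export("test", "spark_test", df)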
Firewall setup¶
In production environments with firewalls installed, you have to ensure that the Spark master and worker nodes can reach TCP port 3306 on the ColumnStore user modules, and TCP ports 8616, 8630, and 8800 on the ColumnStore performance modules. The lab environment is already fully configured, so nothing needs to be done here.
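On a production system, the required ports could be opened like this (a sketch assuming firewalld; adapt to your firewall and zones):
# on each user module (UM):
firewall-cmd --permanent --add-port=3306/tcp
# on each performance module (PM):
firewall-cmd --permanent --add-port=8616/tcp --add-port=8630/tcp --add-port=8800/tcp
firewall-cmd --reload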
Finishing note¶
Note that the configured Spark containers aren't persistent. Once the containers are stopped, you have to install and configure mcsapi for Spark again. You could use docker commit to save your changes (see the sketch at the end of this section). Feel free to check out our Interactive test environments if you want to tinker around further with mcsapi for Spark.
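A commit sketch, one commit per configured container (the image names and tags are illustrative):
docker commit SPARK_MASTER spark-master-mcsapi:snapshot
docker commit SPARK_WORKER_1 spark-worker-mcsapi:snapshot
docker commit SPARK_WORKER_2 spark-worker-mcsapi:snapshot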