MariaDB ColumnStore with Spark

You are viewing an old version of this article. View the current version here.

Introduction

Apache Spark (http://spark.apache.org/) is a popular open source data processing engine. It can be integrated with MariaDB ColumnStore utilizing the Spark SQL feature.

Configuration

The following steps outline installing and configuring the MariaDB Java Connector to be available to the spark runtime:

  • The latest version of the MariaDB Java Connector should be downloaded from https://mariadb.com/downloads/connector and copied to the master node, e.g. under /usr/share/java.
  • The configuration file /usr/local/spark/conf/sparks-default.conf should be created or updated to point to the jdbc directory:
spark.driver.extraClassPath /usr/share/java/mariadb-java-client-1.5.7.jar
spark.executor.extraClassPath /usr/share/java/mariadb-java-client-1.5.7.jar

Usage

Currently Spark does not correctly recognize mariadb specific jdbc connect strings and so the jdbc:mysql syntax must be used. The followings shows a simple pyspark script to query the results from ColumnStore UM server columnstore_1 into a spark dataframe:

from pyspark import SparkContext
from pyspark.sql import DataFrameReader, SQLContext
url = 'jdbc:mysql://columnstore_1:3306/test'
properties = {'user': 'root', 'driver': 'org.mariadb.jdbc.Driver'}
sc = SparkContext("local", "ColumnStore Simple Query Demo")
sqlContext = SQLContext(sc)
df = DataFrameReader(sqlContext).jdbc(url='%s' % url, table='results', properties=properties)
df.show()

Spark SQL currently offers very limited push down capabilities, so to take advantage of ColumnStore's ability to perform efficient group by, then an inline table must be used, for example:

df = DataFrameReader(sqlContext).jdbc(url='%s' % url, 
    table='( select year, sum(closed_roll_assess_land_value) sum_land_value from property_tax group by year) pt',
                                      properties=properties)

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.