MariaDB for Analytics + Spark
March 12, 2019
With machine learning analytics now mainstream, Apache Spark has become one of the most popular platforms for machine learning libraries. Spark provides machine learning algorithms such as classification, regression, clustering and collaborative filtering in Scala, Python, R and Java, so data scientists can easily apply them to large datasets with in-memory performance. The challenge is that many data scientists find it difficult to operationalize their insights.
To make valuable insights available faster and more easily, MariaDB introduces the Spark Connector, which streams the insights found in Spark into the MariaDB data warehouse. The MariaDB Spark Connector takes a result set held in a Spark DataFrame and sends it to MariaDB ColumnStore via the bulk write API. By leveraging ColumnStore’s streaming data adapter, data can be ingested at high speed, 100x faster than generic JDBC. Spark ML can also be applied directly to the data in ColumnStore using standard JDBC.
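As a rough illustration of the two paths described above, here is a minimal PySpark sketch. It is a sketch under stated assumptions, not the connector's reference usage: the host, database, table and credential values are hypothetical, the JDBC URL and driver class assume MariaDB Connector/J is on Spark's classpath, and the `columnStoreExporter` module with an `export(database, table, df)` call is how the Python connector is recalled here, so it should be checked against the installed connector version.

```python
def columnstore_jdbc_options(host, database, table, user, password, port=3306):
    """Build the option map for Spark's generic JDBC data source.

    Assumes MariaDB Connector/J; the URL scheme and driver class may
    differ for other driver versions.
    """
    return {
        "url": f"jdbc:mariadb://{host}:{port}/{database}",
        "driver": "org.mariadb.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_from_columnstore(spark, options):
    """Load a ColumnStore table into a Spark DataFrame over standard JDBC.

    `spark` is an existing pyspark.sql.SparkSession.
    """
    return spark.read.format("jdbc").options(**options).load()


def export_to_columnstore(df, database, table):
    """Send a Spark DataFrame to ColumnStore through the bulk write path.

    Assumption: the connector ships a Python module named
    `columnStoreExporter` exposing export(database, table, df).
    Imported lazily so the rest of this sketch runs without it.
    """
    import columnStoreExporter  # assumed module name, verify locally

    columnStoreExporter.export(database, table, df)


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    opts = columnstore_jdbc_options(
        "columnstore.example.com", "analytics", "sales", "spark", "secret"
    )
    print(opts["url"])
```

With a running SparkSession, `read_from_columnstore(spark, opts)` would return a DataFrame that Spark ML can train on, and `export_to_columnstore(result_df, "analytics", "predictions")` would push a result set back, where `result_df` and the target table name are again placeholders.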