ColumnStore Bulk Write SDK
Introduction
Starting with MariaDB ColumnStore 1.1, a C++ SDK is available that supports bulk writes into ColumnStore. Conceptually, this is an API version of cpimport. The SDK is intended to be integrated into custom code and adapters to make publishing data into ColumnStore easier.
The API is licensed under LGPLv3.
Getting Started
Prebuilt binary packages may be downloaded here or you can build from scratch from here. To build from scratch see the appropriate building document below in the Documentation section, latest version here.
Package Installation
RHEL / CentOS 7
The following libraries need to be installed on the system for the package install:
yum install epel-release
yum install libuv libxml2 snappy python34
The API rpm can be installed via:
rpm -ivh mariadb-columnstore-api-*-centos7.rpm
Ubuntu 16 / Debian 9
The following libraries need to be installed on the system for the package install:
apt-get install libuv1 libxml2 libsnappy1v5
The API deb package can be installed via:
dpkg -i mariadb-columnstore-api-*.deb
Debian 8
The following libraries need to be installed on the system for the package install (jessie-backports is needed to install libuv1):
echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" >> /etc/apt/sources.list
apt-get update
apt-get install libuv1 libxml2 libsnappy1
The API deb package can be installed via:
dpkg -i mariadb-columnstore-api-*.deb
In addition, installing OpenJDK 8 for Java support requires the backports repo and the following command:
apt-get install -t jessie-backports openjdk-8-jdk
Environment Configuration
If the SDK is being installed on a server that is not part of a MariaDB ColumnStore cluster, it requires a local copy of the Columnstore.xml file in order to determine how to connect to ColumnStore. The simplest approach is to copy this file from one of the ColumnStore servers to one of the following two locations on the SDK server, ensuring read privileges for the OS user being used:
- /usr/local/mariadb/columnstore/etc/Columnstore.xml
- $COLUMNSTORE_INSTALL_DIR/etc/Columnstore.xml
Alternatively, a custom file location may be passed as an argument to the ColumnStoreDriver constructor. This is also necessary if you plan to write to multiple ColumnStore servers from the same host. The SDK server must be able to communicate with the ColumnStore PM servers over the standard ColumnStore port ranges of 8600 - 8622, 8700, and 8800.
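The two requirements above (a readable Columnstore.xml and reachable ports) can be checked before any SDK call is made. The following is a minimal sketch using only the Python standard library, not part of the SDK. All function names are hypothetical, and the order in which the two file locations are tried is an assumption of this sketch, since the text does not state which one the SDK prefers.

```python
import os
import socket

# Default location named in the text above.
DEFAULT_CONFIG = "/usr/local/mariadb/columnstore/etc/Columnstore.xml"


def candidate_config_paths(environ=None):
    """Return the documented Columnstore.xml locations.

    Trying $COLUMNSTORE_INSTALL_DIR first is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    paths = []
    install_dir = environ.get("COLUMNSTORE_INSTALL_DIR")
    if install_dir:
        paths.append(os.path.join(install_dir, "etc", "Columnstore.xml"))
    paths.append(DEFAULT_CONFIG)
    return paths


def find_readable_config(environ=None):
    """Return the first candidate that exists and is readable, else None."""
    for path in candidate_config_paths(environ):
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None


def columnstore_ports():
    """Ports listed in the text: 8600 - 8622, 8700 and 8800."""
    return list(range(8600, 8623)) + [8700, 8800]


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If find_readable_config() returns None, the file has to be copied into place or its path passed to the ColumnStoreDriver constructor. port_is_open() only reports whether something accepted a connection: a closed port may mean a firewall is in the way, or simply that no ColumnStore process listens on that port on that particular host.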
If the ColumnStore server was configured as a single server deployment, the IP addresses in the Columnstore.xml file must be changed from 127.0.0.1 to the actual IP address or hostname of the ColumnStore server before the file can be used on a remote SDK server. A simple sed statement should suffice for updating:
sed 's/127\.0\.0\.1/172.21.21.8/g' Columnstore.xml > Columnstore_new.xml
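Where sed is not available, or the rewrite is part of a larger deployment script, the same substitution can be sketched in Python. This is an illustrative equivalent rather than an SDK utility: the function names are hypothetical, and the pattern matches 127.0.0.1 literally and only as a whole address, so a value such as 127.0.0.10 is left untouched.

```python
import re

# 127.0.0.1 with literal dots, not preceded or followed by another digit.
LOOPBACK = re.compile(r"(?<!\d)127\.0\.0\.1(?!\d)")


def replace_loopback(xml_text, new_host):
    """Return xml_text with every standalone 127.0.0.1 replaced by new_host."""
    # A function replacement inserts new_host verbatim, so backslashes in
    # new_host are never interpreted as regex group references.
    return LOOPBACK.sub(lambda _match: new_host, xml_text)


def rewrite_config(src_path, dst_path, new_host):
    """Read src_path, replace the loopback address, write dst_path."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(replace_loopback(text, new_host))
```

Calling rewrite_config("Columnstore.xml", "Columnstore_new.xml", "172.21.21.8") writes a new file and leaves the original unchanged, as the sed command does.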
Getting Started with C++
The documentation below is the best place to get started with building and developing against the C++ SDK. Some sample programs are installed to /usr/share/doc/mcsapi/ for review.
Getting Started with Java
The Java version of the SDK provides an API very similar to the C++ one, so the PDF documentation can generally be transposed one-for-one to understand the API calls. Since the Java version is a wrapper on top of the C++ API, the underlying library must be loaded once in your program using a static initializer.
Starting with version 1.1.3 the library is loaded whilst importing the ColumnStoreDriver through:
import com.mariadb.columnstore.api.ColumnStoreDriver;
Versions prior to 1.1.3 need to manually load the system library:
static {
    System.loadLibrary("javamcsapi"); // use _javamcsapi for centos7
}
The corresponding Java jar must also be included in the Java classpath. The packaged install is built and tested with OpenJDK 8.
First a simple table is created with the mcsmysql client:
MariaDB [test]> create table t1(i int, c char(3)) engine=columnstore;
Next create a file MCSAPITest.java with the following contents:
import com.mariadb.columnstore.api.*;

public class MCSAPITest {
    public static void main(String[] args) {
        // No argument: Columnstore.xml is read from a default location.
        ColumnStoreDriver d = new ColumnStoreDriver();
        ColumnStoreBulkInsert b = d.createBulkInsert("test", "t1", (short)0, 0);
        try {
            // Column 0 is i, column 1 is c in table t1.
            b.setColumn(0, 2);
            b.setColumn(1, "XYZ");
            b.writeRow();
            b.commit();
        } catch (ColumnStoreException e) {
            // Undo the pending insert if anything failed.
            b.rollback();
            e.printStackTrace();
        }
    }
}
Now compile and run the program. For RHEL / CentOS 7 (library installed in /usr/lib64):
javac -classpath ".:/usr/lib64/javamcsapi.jar" MCSAPITest.java
java -classpath ".:/usr/lib64/javamcsapi.jar" MCSAPITest
For Ubuntu / Debian:
javac -classpath ".:/usr/lib/javamcsapi.jar" MCSAPITest.java
java -classpath ".:/usr/lib/javamcsapi.jar" MCSAPITest
Now back in mcsmysql verify the data is written:
MariaDB [test]> select * from t1;
+------+------+
| i    | c    |
+------+------+
|    2 | XYZ  |
+------+------+
Getting Started with Python
The current package install supports Python 2.7 and Python 3. Once installed the library is available for immediate use on the system.
First a simple table is created with the mcsmysql client:
MariaDB [test]> create table t1(i int, c char(3)) engine=columnstore;
For this simple test, the Python interactive interpreter is used: run python with no arguments and enter the following:
import pymcsapi
driver = pymcsapi.ColumnStoreDriver()
bulk = driver.createBulkInsert('test', 't1', 0, 0)
bulk.setColumn(0,1)
bulk.setColumn(1, 'ABC')
bulk.writeRow()
bulk.commit()
In interactive command line mode, the bulk.setColumn and bulk.writeRow methods return the bulk object to allow for more concise chained invocation. You may see something like the following as a result which is normal:
>>> bulk.setColumn(0,1)
<pymcsapi.ColumnStoreBulkInsert; proxy of <Swig Object of type 'mcsapi::ColumnStoreBulkInsert *' at 0x7f0d5295bcc0> >
Now back in mcsmysql verify the data is written:
MariaDB [test]> select * from t1;
+------+------+
| i    | c    |
+------+------+
|    1 | ABC  |
+------+------+
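Because setColumn and writeRow return the bulk object, a whole row of t1 can be written in one chained expression. The helper below is a hypothetical sketch of that style for several rows. It uses only the calls shown in this section plus rollback(), which is taken from the Java example; that the Python wrapper exposes rollback() under the same name is an assumption of this sketch.

```python
def load_t1_rows(bulk, rows):
    """Write (i, c) pairs to a bulk insert for table t1, then commit.

    bulk is expected to behave like the object returned by
    driver.createBulkInsert('test', 't1', 0, 0).
    Returns the number of rows written.
    """
    written = 0
    try:
        for i, c in rows:
            # setColumn and writeRow each return the bulk object,
            # which is what allows the calls to be chained.
            bulk.setColumn(0, i).setColumn(1, c).writeRow()
            written += 1
        bulk.commit()
    except Exception:
        # Assumed to mirror the Java example: discard the batch on failure.
        bulk.rollback()
        raise
    return written
```

With the driver created as in the interactive session, load_t1_rows(driver.createBulkInsert('test', 't1', 0, 0), [(1, 'ABC'), (2, 'XYZ')]) would write both rows in a single commit.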
Documentation
The following documents provide SDK documentation: