MariaDB ColumnStore Write API

spacer

There are many great unique features of MariaDB ColumnStore, one of which is the speed in which you can get data from CSV files into MariaDB ColumnStore using the tool ‘cpimport’. But what if your data is in another format? Or you wish to stream data into MariaDB ColumnStore from another application? This is where MariaDB ColumnStore’s new write API comes into play.

The MariaDB ColumnStore write API is a new C++ API which lets you inject data directly into MariaDB ColumnStore’s WriteEngine using a series of simple calls. This allows you to easily write custom data injection tools that are much faster than using the SQL interface. If you are a Python or Java developer, then we also bundle in wrappers for those languages.

We designed the API to be familiar to users of ORM ways of accessing the database, we have the function setColumn() which is used to set a column in a row you are going to write, writeRow() to store the row as well as commit() and rollback() functions.

This is an example of a simple application that will write to a MariaDB ColumnStore that has just two integer columns. The full source code for this can be found in the example/basic_bulk_insert.cpp file in the API’s source code:

int main(void)
{
    mcsapi::ColumnStoreDriver* driver = nullptr;
    mcsapi::ColumnStoreBulkInsert* bulk = nullptr;
    try {
        driver = new mcsapi::ColumnStoreDriver();

The ColumnStoreDriver class will automatically discover the MariaDB ColumnStore cluster by trying to find the Columnstore.xml which is in every module, there is also an optional parameter to specify a location for this xml file. It will throw an error if this cannot be found. You can copy the xml file to another server and use it there as long as that server can talk to all your PM nodes.

        bulk = driver->createBulkInsert("test", "t1", 0, 0);

This creates an instance of a bulk insert class from the driver class and sets everything up as required. You can see that we are writing to the table “test.t1”. The API can create many bulk insert objects from a single driver object. Each bulk insert object should be considered a database transaction.

        for (int i = 0; i < 1000; i++)
        {
            bulk->setColumn(0, (uint32_t)i);
            bulk->setColumn(1, (uint32_t)1000 - i);
            bulk->writeRow();
        }

This is a very simple ‘for’ loop which sets an integer for the first and second column of the table and then stores that row. Note that writeRow() is designed to not immediately send data to ColumnStore for performance. It will instead buffer 100,000 rows or wait for a commit() before it actually sends the data.

            bulk->commit();

After we have set these 1000 rows we ask the MariaDB ColumnStore API to commit the data. In this example the data would be sent to the MariaDB ColumnStore installation along with the commit() since we only have 1000 rows. At this point the ‘bulk’ object cannot be used to send any more data. It can only be used for retrieving summary information.

    } catch (mcsapi::ColumnStoreError &e) {
            std::cout << "Error caught: " << e.what() << std::endl;
    }
    delete bulk;
    delete driver;
}

Finally we have a little bit of error handling and cleanup.

The data is written using the same atomic methods as ‘cpimport’ by appending new blocks of data to the column files and moving an atomic “High Water Mark” block pointer upon commit. This means that select queries are not blocked during the insert process and do not see the data until after it has been committed.

There are several more advanced features of the API which you can find in the documentation. The API is an Open Source project and the source code can be easily obtained via GitHub. The API has been released alongside MariaDB ColumnStore 1.1 and you can get it from our download page along with other components that make up MariaDB AX, our modern data warehousing solution.