What’s Great About MariaDB ColumnStore 1.1


I’m excited that MariaDB ColumnStore 1.1, our second major GA release, is now available for download. In this blog, I review some of the major features that make up this release. Our focus was to enable greater extensibility and to add features that have been requested as we have worked with prospects and customers.

One of the features I’m most excited about is the bulk write SDK. This is a separate new product available here. It has been built to enable data streaming, integration and publishing use cases. Streaming means that you can consume data from queuing systems such as Apache Kafka. Integration means that the SDK will enable the creation of higher performance adapters for ETL tools. Finally, I see it being used as a way to programmatically record and publish results from machine learning platforms, enabling business users to interact with the data in their tool of choice. The SDK is implemented in C++ and currently provides Python and Java wrapper implementations. More details can be found here. We also plan to develop and support a number of streaming data adapters that apply the SDK to specific use cases such as replication and Kafka integration. A blog from my colleague Dipti Joshi provides more details on the streaming data adapters.
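
To make the streaming use case more concrete, here is a minimal Python sketch of the batching pattern such an adapter might follow: rows are consumed from a source, handed to a writer one at a time, and committed in batches. This is not the SDK's API. The `BulkWriter` interface, the `InMemoryWriter` stand-in, the `stream_rows` function and the `batch_size` parameter are all hypothetical names chosen for illustration, and the generator stands in for a real queue consumer.

```python
from typing import Iterable, List, Protocol, Tuple

Row = Tuple[int, str]


class BulkWriter(Protocol):
    """Hypothetical writer interface; NOT the real SDK classes."""

    def write_row(self, row: Row) -> None: ...

    def commit(self) -> None: ...


class InMemoryWriter:
    """Hypothetical stand-in that records each committed batch in memory."""

    def __init__(self) -> None:
        self._pending: List[Row] = []
        self.committed: List[List[Row]] = []

    def write_row(self, row: Row) -> None:
        self._pending.append(row)

    def commit(self) -> None:
        # Only record a batch if something was written since the last commit.
        if self._pending:
            self.committed.append(self._pending)
            self._pending = []


def stream_rows(source: Iterable[Row], writer: BulkWriter, batch_size: int = 3) -> int:
    """Write every row from source, committing after each full batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    in_batch = 0
    for row in source:
        writer.write_row(row)
        total += 1
        in_batch += 1
        if in_batch == batch_size:
            writer.commit()
            in_batch = 0
    if in_batch:
        # Flush the final, partially filled batch.
        writer.commit()
    return total


# Usage: a generator stands in for messages arriving from a queue.
writer = InMemoryWriter()
written = stream_rows(((i, f"event-{i}") for i in range(7)), writer, batch_size=3)
```

In a real adapter, the writer would wrap the SDK's bulk insert calls, and the batch size would be tuned to balance latency against write throughput.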

Continuing the extensibility theme, MariaDB ColumnStore now supports user defined aggregate and window functions. This is provided as a C++ SDK framework that enables the creation of functions that can scale out aggregate calculation across many Performance Modules (PMs). Distributed reference implementations of median and sum of squares are provided for use or extension. You can learn more about this feature here.
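
The idea behind scaling out an aggregate can be sketched in a few lines of Python: each node computes a partial result over its own slice of the data, and the partials are then merged into a final answer. This is a conceptual illustration only, not the C++ SDK framework. The `SumOfSquares` class, its `accumulate`, `merge` and `result` methods, and the `run_distributed` helper are hypothetical names, and the nested lists stand in for data held on separate nodes.

```python
from typing import Iterable, List


class SumOfSquares:
    """Hypothetical aggregate with a per-node step and a merge step."""

    def __init__(self) -> None:
        self.total = 0.0

    def accumulate(self, value: float) -> None:
        # Per-row step, run independently on each node's slice of data.
        self.total += value * value

    def merge(self, other: "SumOfSquares") -> None:
        # Combine a partial result produced by another node.
        self.total += other.total

    def result(self) -> float:
        return self.total


def run_distributed(partitions: Iterable[Iterable[float]]) -> float:
    """Aggregate each partition separately, then merge the partial results."""
    partials: List[SumOfSquares] = []
    for partition in partitions:
        agg = SumOfSquares()
        for value in partition:
            agg.accumulate(value)
        partials.append(agg)

    final = SumOfSquares()
    for partial in partials:
        final.merge(partial)
    return final.result()


# Usage: three partitions stand in for data spread across three nodes.
distributed = run_distributed([[1, 2], [3], [4, 5]])
single_node = run_distributed([[1, 2, 3, 4, 5]])
```

Sum of squares merges trivially because each partial result is itself a plain sum. A median does not decompose this way, so a distributed version has to carry more state than a single number between the two steps.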

The set of data types supported by MariaDB ColumnStore has been extended to include Blob and Text types. I was surprised to see these being in-demand data types for analytics, but many users are looking at MariaDB ColumnStore as an archive database for OLTP data, and this was one of the gaps. In addition, Text columns are a common workaround to allow a greater number of long string columns while keeping within the MariaDB row size limit.

There are a number of improvements for installation and manageability in this release. First, the postConfigure script now offers a ‘Data Redundancy’ storage option that leverages GlusterFS to provide high data availability for on-premises customers that lack a networked storage device. Second, the install now offers an option where you can pre-install the software packages rather than having postConfigure perform remote installs. This will enable us to support package repository installs and will also make integration with orchestration tools simpler. Finally, a backup and restore tool is now provided that automates what was previously a manual procedure.

MariaDB ColumnStore 1.1 is now based on MariaDB Server 10.2. As part of this, the window function implementation was migrated to use the same front end SQL parser introduced in MariaDB Server 10.2.

Finally, we have made some significant investments internally in our processes. We have migrated to Buildbot as our continuous integration tool (which is also used by MariaDB Server). Also, with the significant increase in OS distributions and deployment options, we have invested in parallelizing and fully automating install, upgrade, and system verification for well over 100 permutations of operating system, deployment topology, and configuration. In addition, we have made ongoing improvements to our developer regression tests, other system tests, and performance benchmark tests.

Over the coming weeks, we’ll publish more detailed blogs drilling into each of these features. However, if you can’t wait, feel free to download MariaDB ColumnStore and provide us feedback so we can continue to improve and make this the best open source OLTP and OLAP database out there.

Learn more about MariaDB AX, our modern data warehousing solution for large scale advanced analytics.