December 14, 2016

A Look Inside MariaDB ColumnStore 1.0.6 GA

Today, MariaDB ColumnStore has reached a major milestone: MariaDB ColumnStore 1.0 is now generally available (GA) with the release of MariaDB ColumnStore 1.0.6. The journey began in January 2016, when our team started building ColumnStore. Support from our early alpha and beta adopters and community users has helped us take MariaDB ColumnStore from the first alpha release to GA today.

 

MariaDB ColumnStore is a massively parallel, high-performance, distributed columnar storage engine built on MariaDB Server. It is the first columnar storage engine for big data analytics in the MariaDB ecosystem. It can be deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage.
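
To make "columnar" concrete, here is a toy sketch in Python of why a column-oriented layout suits analytics. The table, column names, and helper functions are hypothetical and chosen only for illustration; this is not how MariaDB ColumnStore lays out data on disk.

```python
# Hypothetical three-row table, stored row by row.
rows = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 5},
    {"order_id": 3, "region": "east", "amount": 7},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    """Row layout: every row is visited even though only one field is needed."""
    return sum(row["amount"] for row in table)


def total_amount_column_store(cols):
    """Column layout: only the 'amount' column is read."""
    return sum(cols["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Both functions return the same total, but the columnar version reads a single column, which is the access pattern most aggregate queries have.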

A Look Inside

In MariaDB ColumnStore’s architecture, three components – a MariaDB SQL front end called User Module (UM), a distributed query engine called Performance Module (PM) and distributed data storage – work together to deliver high-performance, big data analytics.


  • User Module (UM):
    The UM is made up of the front-end MariaDB Server instance and a number of processes specific to MariaDB ColumnStore that handle concurrency scaling. The storage engine plugin for MariaDB ColumnStore hands the query over to one of these processes, which breaks the SQL request down further and distributes the parts to one or more Performance Modules for processing. Finally, the UM assembles the results from all participating Performance Modules into the complete result set that is returned to the user.
     

  • Performance Module (PM):
    The PM is responsible for storing, retrieving and managing data, processing block requests for query operations, and passing the results back to the User Module(s) to finalize the query requests. The PM reads data from disk and caches it in a shared-nothing data cache that is part of the server on which the PM resides. Massively parallel processing (MPP) is achieved by letting the user configure as many Performance Modules as needed; each additional PM adds more cache and more processing power to the overall database.
     

  • Distributed Data Storage:
    MariaDB ColumnStore is extremely flexible with respect to the storage system. When running on premises, it can use either local storage or shared storage (e.g., SAN) to store data. In the Amazon EC2 environment, it can use ephemeral or Elastic Block Store (EBS) volumes.
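
The division of labor between the UM and the PMs can be pictured as a scatter-gather pattern. The Python sketch below is a simplified model under stated assumptions, not ColumnStore's actual implementation: the data, the function names `pm_partial_sum` and `um_execute`, and the use of threads to stand in for separate PM servers are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (region, amount) rows, already spread across three PMs.
PM_PARTITIONS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 4)],
]


def pm_partial_sum(local_rows):
    """PM step: aggregate only the rows this module holds."""
    partial = Counter()
    for region, amount in local_rows:
        partial[region] += amount
    return partial


def um_execute(partitions):
    """UM step: fan the work out to every PM, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(pm_partial_sum, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)


print(um_execute(PM_PARTITIONS))
```

In this model, adding a PM means adding one more partition and one more worker: each module scans less data, while the merge step in the UM stays the same.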

MariaDB ColumnStore 1.0 Features

  • Scale

    • Massively parallel architecture designed for big data scaling

      • Linear scalability as new nodes are added

    • Easy horizontal scaling

      • Add new data nodes as your data grows

      • Continue read queries when adding new nodes

    • Compression

      • Data compression designed to accelerate decompression rate, reducing disk I/O

  • Performance

    • High-performance, real-time and ad-hoc analytics

      • Columnar optimized, massively parallel, distributed query processing on commodity servers

    • High-speed data load and extract

      • Load data while continuing analytics queries

      • Fully parallel high-speed data load and extract

  • Enterprise-Grade Analytics

    • Analytics

      • In-database distributed analytics with complex joins, aggregation and window functions

      • Extensible UDF for custom analytics

    • Cross-engine access

      • Use a single SQL interface for analytics and OLTP

      • Cross join tables between MariaDB and ColumnStore for full insight

    • Security

      • MariaDB security features – SSL, role-based access and auditability

      • Out-of-the-box BI tool connectivity using ODBC/JDBC or standard MariaDB connectors

  • Management and Availability  

    • Easy to install, manage, maintain and use

      • Automatic horizontal partitioning

      • No indexes, views or manual partition tuning needed for performance

      • Online schema changes while read queries continue

    • Deploy anywhere

      • On premises or on AWS

      • On premises using commodity servers

    • High Availability

      • Automatic UM failover

      • Multi-PM distributed data attachment across all PMs in SAN and EBS environments for automatic PM failover

 

The release notes for MariaDB ColumnStore 1.0.6, along with a list of bugs fixed, can be found here. Documentation is available in our Knowledge Base. Binaries for MariaDB ColumnStore 1.0.6 are available for download here. For developers wanting to do a quick install, Docker and Vagrant options are available. You can also find the MariaDB-ColumnStore-1.0.6 AMI in the AWS Marketplace.

 

Reaching GA would not have been possible without the valuable feedback we received from the community and our beta customers. Thanks to everyone who contributed. Special acknowledgment also goes to the MariaDB ColumnStore Engineering team, whose hard work and dedication have made this GA possible.

The journey does not stop here. As the new year unfolds, we will start defining the content of MariaDB ColumnStore 1.1 and planning the release. Based on what we have already learned from our beta users, we will be adding streaming and more manageability features in 1.1. If you have any ideas or suggestions that you would like to see in the next release, please create a request in JIRA. For questions or comments, you can reach me at dipti.joshi@mariadb.com or tweet me @dipti_smg.

 

About Dipti Joshi

Dipti Joshi is Senior Product Manager for MariaDB MaxScale and MariaDB ColumnStore.
