Getting to Know MariaDB ColumnStore

With the recent announcement of MariaDB ColumnStore, we get many questions on the architecture and functionality of MariaDB ColumnStore. This blog post describes the architecture of MariaDB ColumnStore.

MariaDB ColumnStore is a GPLv2 open source columnar database built on MariaDB Server. It is a fork and evolution of the former InfiniDB product. It can deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage.

MariaDB ColumnStore – a massively parallel, distributed database

MariaDB ColumnStore consists of two main component service classes:

  • User Modules :  providing the MariaDB SQL engine front end and query orchestration.
  • Performance Modules : providing distributed query processing.

By utilizing the MariaDB server as the front end, all server capabilities of MariaDB can also be leveraged including secure connections, audit plugin, and other storage engines. For the latter, ColumnStore supports the ability to perform cross engine joins to allow querying a ColumnStore table against say an InnoDB table.

The MariaDB server process incoming connection requests and queries for each user connection as shown in the diagram below. Once a SQL query is received by the User Module, it processes that SQL query and distributes query operations across the Performance Modules. The Performance Modules executes the query operations in distributed manner and reads/writes MariaDB ColumnStore Columnar data files and return intermediate query operation results to User Modules. Any operation that cannot be distributed is performed at the User Module level before returning results through the MariaDB server process back to the client.

Blog - Getting to Know ColumnStore.png

Both User Modules and Performance Modules are horizontally scalable. Scaling Performance Modules first allows for the greatest reduction in individual query performance. Scaling User Modules allows for high availability and increased query concurrency. Both User Modules and Performance Modules are multi threaded further increasing increasing performance on a per node level.

MariaDB ColumnStore – A Columnar Database

As a columnar database, MariaDB ColumnStore stores table data in columns rather than rows. This allows the query optimizer to only read columns necessary to fulfil a given query and its result set. Once a particular column value has been identified, the corresponding row values can easily be determined through a logical offset into those other column files. Data partitioning by columns is also called Vertical Partitioning. Horizontal Partitioning of data is achieved by distributing the data across Performance Modules. Further data elimination within a Performance Module is achieved by maintaining range metadata within a distributed Extent Map allowing elimination of particular column extent files should the value fall outside of the range. By storing data as columns it is also much easier to add and remove columns over time even online.

It is important to recognize that both the Vertical and Horizontal partition are automatically provided and managed by MariaDB ColumnStore. Very little configuration and maintenance is required to maintain high performance of the system. As a result indexes are not required to be defined or maintained either.

MariaDB ColumnStore – The Big Data Platform

If your analytical query workload is up to a hundred thousand rows and your table’s size remains under a million rows, an OLTP engine such as InnoDB or MyISAM will handle this with reasonable performance. Beyond that, performance is much harder to tune for and maintain. MariaDB ColumnStore is designed for such workloads.

It is suitable for reporting or analysis of millions-billions of rows from data sets containing millions-trillions of rows. As the data size grows, MariaDB ColumnStore allows you to add more PM nodes to scale your performance linearly.

My next blog post will continue this topic and go deeper into more specifics of how MariaDB ColumnStore is able to handle such big data workloads. You can write to me at dthompson@mariadb.com or follow me @davidwbt for more insights into MariaDB ColumnStore.