What is ColumnStore and Why is it Important?
MariaDB ColumnStore is the analytical component of MariaDB Platform. It is a columnar storage engine built on a massively parallel, distributed data architecture, designed to scale to petabytes of data with linear scalability and real-time responses to analytical queries. It leverages the I/O benefits of columnar storage, compression, just-in-time projection, and horizontal and vertical partitioning to deliver exceptional performance when analyzing large data sets.
Relational databases store data in rows because a typical SQL query looks for multiple fields within a record. For example, if you ask for the name, zip code and email address of all your customers in New York, the result is presented in rows, with each row containing several fields from a single record. Row structures are also well suited to workloads with many inserts and updates.
Analytic queries, however, are better handled with a column structure: they tend to go deep on a few columns of data, most of them touch only a small subset of the available columns, and they are mostly read-only. For example, retrieving daily sales data for all your stores in California for the past two years is a columnar operation because it cuts across many records to retrieve data from a specific field. A typical ad-hoc aggregation query doesn’t care about most fields, just the trends in one field.
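To make the row-versus-column trade-off concrete, here is a minimal Python sketch (not ColumnStore code) that answers the same "total California sales" question against a small table stored both ways and counts how many stored values each layout has to read. The sample table, its column names, and the functions `to_columns`, `total_sales_row_store` and `total_sales_column_store` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (store_id, state, day, sales)
Record = Tuple[str, str, str, float]
COLUMNS = ("store_id", "state", "day", "sales")

SAMPLE_ROWS: List[Record] = [
    ("s1", "CA", "2023-01-01", 100.0),
    ("s2", "NY", "2023-01-01", 80.0),
    ("s1", "CA", "2023-01-02", 50.0),
    ("s3", "CA", "2023-01-02", 25.0),
]


def to_columns(rows: List[Record]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_sales_row_store(rows: List[Record]) -> Tuple[float, int]:
    """Sum CA sales by reading whole records; returns (total, values_read)."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # a row layout fetches every field of the record
        _store_id, state, _day, sales = row
        if state == "CA":
            total += sales
    return total, values_read


def total_sales_column_store(cols: Dict[str, list]) -> Tuple[float, int]:
    """Same query, but only the 'state' and 'sales' columns are read."""
    states, sales = cols["state"], cols["sales"]
    values_read = len(states) + len(sales)
    total = sum(s for st, s in zip(states, sales) if st == "CA")
    return total, values_read


if __name__ == "__main__":
    print(total_sales_row_store(SAMPLE_ROWS))
    print(total_sales_column_store(to_columns(SAMPLE_ROWS)))
```

With the four sample records, both layouts return the same total of 175.0, but the row layout reads 16 values while the column layout reads only 8. The gap grows with every column the table has that the query ignores.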
MariaDB ColumnStore is not only optimized for columnar operations but also simplifies management. There is no need to create indexes, and metadata is stored in memory, which eliminates a cumbersome tuning process. When paired with MariaDB, ColumnStore supports just about any query you want to throw at it. You can even join a MariaDB ColumnStore table with an InnoDB or remote MySQL table, so data held in different storage engines can be queried together. But there is much more.
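One general way a column store can answer selective queries quickly without indexes is to keep small per-block summaries in memory, such as the minimum and maximum value in each block of a column, and skip any block whose range cannot match the predicate. The Python sketch below illustrates that general idea under stated assumptions; the block size, the `Block` type and the `build_blocks`/`count_in_range` helpers are hypothetical and do not describe ColumnStore's actual storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical; real systems use far larger blocks


@dataclass
class Block:
    values: List[int]
    min_value: int  # summary metadata kept in memory
    max_value: int


def build_blocks(column: List[int], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Split one column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_in_range(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; returns (count, blocks_scanned).

    Blocks whose min/max range cannot overlap [low, high] are skipped
    using the metadata alone, so their values are never read.
    """
    count, scanned = 0, 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # pruned without touching block.values
        scanned += 1
        count += sum(1 for v in block.values if low <= v <= high)
    return count, scanned


if __name__ == "__main__":
    blocks = build_blocks(list(range(1, 17)))
    print(count_in_range(blocks, 6, 9))
```

For the values 1 to 16 split into four blocks, a query for values between 6 and 9 returns a count of 4 after scanning only 2 of the 4 blocks. This kind of pruning pays off most when data arrives roughly in order (for example, by date), because each block then covers a narrow range of values.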
Over the last decade, we kept hearing that SQL is not needed for data processing or analytics, yet in the last few years virtually every OLTP and analytics solution has been building a SQL layer. SQL is the most proven way of processing data, so MariaDB ColumnStore supports standard SQL through the MariaDB interface. Full SQL compliance means MariaDB ColumnStore works out of the box with your existing business intelligence tools and SQL queries. It works with most popular business intelligence tools, such as Tableau and Business Objects, as well as anything that supports ODBC/JDBC. For data scientists, it works with R for advanced statistical analysis.
At the same time, we realize that SQL is not the best choice for machine learning and data discovery use cases. We want to integrate Apache Spark libraries such as MLlib into ColumnStore to complete the picture.
Most importantly, MariaDB ColumnStore is based on an open-source GPLv2 fork of the InfiniDB community project. We believe community-driven software development is the new mandate of our time, and we want to leverage the strength of our community in building MariaDB ColumnStore.
Read the next part of this series on ColumnStore: ColumnStore Architecture & Use-cases.