April 29, 2016

Why is ColumnStore important?

Most relational databases store data in rows because a typical SQL query looks for multiple fields within a single record. For example, if you ask for the name, zip code and email address of all your customers in New York, the result is presented in rows, with each row containing several fields from one record. Row storage is also well suited to workloads with many inserts and updates.
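To make the row layout concrete, here is a small in-memory Python sketch. It keeps each customer as one complete record, so the New York query reads whole rows and returns several fields per record, and an insert is a single append. It only illustrates the idea; it is not how MariaDB lays out pages on disk, and the `Customer` record, field names and sample data are hypothetical.

```python
# A minimal sketch of row-oriented storage: each record's fields are kept together.
# The Customer type, its fields and the sample data are hypothetical.
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    zip_code: str
    email: str
    state: str


# Row store: a list of complete records.
customers = [
    Customer("Ana", "10001", "ana@example.com", "NY"),
    Customer("Ben", "94105", "ben@example.com", "CA"),
    Customer("Cho", "11201", "cho@example.com", "NY"),
]


def customers_in_state(rows, state):
    """Return (name, zip_code, email) for every customer in `state`."""
    # Each matching row already holds all the requested fields side by side.
    return [(r.name, r.zip_code, r.email) for r in rows if r.state == state]


# Inserting a new customer is a single append of one whole record.
customers.append(Customer("Dee", "10002", "dee@example.com", "NY"))

print(customers_in_state(customers, "NY"))
```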


Analytic queries, however, are better served by a column structure. They tend to go deep on a single column of data, they usually touch only a small subset of the available columns, and they are mostly read-only. For example, retrieving daily sales data for all your stores in California over the past two years is a columnar operation: it cuts across many records to retrieve data from a few specific fields. A typical ad-hoc aggregation query doesn’t care about most fields, only the trends in one.
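The same idea in columnar form: the Python sketch below stores each field as its own list, so summing daily sales for California stores walks only the `state`, `day` and `amount` columns and never touches `store_id` or `cashier`. The table, column names and values are hypothetical, and this in-memory layout is only an illustration of the principle, not ColumnStore's storage format.

```python
# A minimal sketch of column-oriented storage: one list per column,
# where position i across all lists is row i. All names and data are hypothetical.
from collections import defaultdict

sales = {
    "store_id": [1, 2, 3, 1, 2, 3],
    "state":    ["CA", "CA", "NY", "CA", "CA", "NY"],
    "day":      ["2016-04-01", "2016-04-01", "2016-04-01",
                 "2016-04-02", "2016-04-02", "2016-04-02"],
    "amount":   [120.0, 80.0, 200.0, 150.0, 90.0, 210.0],
    "cashier":  ["a", "b", "c", "a", "b", "c"],  # never read by the query below
}


def daily_sales(table, state):
    """Sum `amount` per `day` for rows in `state`, reading only three columns."""
    totals = defaultdict(float)
    for st, day, amount in zip(table["state"], table["day"], table["amount"]):
        if st == state:
            totals[day] += amount
    return dict(totals)


print(daily_sales(sales, "CA"))
```

In a row store the same query would have to pass over every field of every record; here the untouched columns cost nothing, which is the effect that grows with wide tables and long date ranges.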

MariaDB ColumnStore is not only optimized for columnar operations, it also simplifies management. There is no need for indexing; metadata is stored in memory, which eliminates a cumbersome tuning process. Paired with MariaDB, ColumnStore supports just about any query you want to throw at it. You can even join a MariaDB ColumnStore table with an InnoDB or remote MySQL table, so data held in different engines can be queried together. But there is much more.
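The post does not spell out what the in-memory metadata contains. One well-known way column stores avoid indexes is to keep a small min/max summary for each block of a column and skip any block whose range cannot match the query; MariaDB ColumnStore's documentation describes a comparable structure called the extent map. The Python sketch below assumes that technique. `BLOCK_SIZE`, the function names and the data are hypothetical, and it is not ColumnStore's actual implementation.

```python
# A minimal sketch of block skipping with per-block min/max metadata,
# a common index-free technique in column stores. All names are hypothetical.
BLOCK_SIZE = 4


def build_block_metadata(column, block_size=BLOCK_SIZE):
    """Return (start, end, block_min, block_max) for each fixed-size block."""
    meta = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        meta.append((start, start + len(block), min(block), max(block)))
    return meta


def scan_range(column, meta, low, high):
    """Return values in [low, high] and how many blocks were actually read."""
    hits, blocks_read = [], 0
    for start, end, block_min, block_max in meta:
        if block_max < low or block_min > high:
            continue  # the metadata proves no value in this block can match
        blocks_read += 1
        hits.extend(v for v in column[start:end] if low <= v <= high)
    return hits, blocks_read


# Data loaded roughly in order (e.g. by date), so block ranges barely overlap.
order_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
meta = build_block_metadata(order_ids)
print(scan_range(order_ids, meta, 6, 7))
```

Because the summary is tiny and is built as data is loaded, there is nothing for an administrator to design or tune; the trade-off is that skipping works best when values arrive in roughly sorted order.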

For the last decade, we kept hearing that SQL is not needed for data processing or analytics, yet in the last few years just about every OLTP and analytics solution has been building a SQL layer. SQL is the most proven way of processing data, so MariaDB ColumnStore supports standard SQL through the MariaDB interface. Full SQL compliance means MariaDB ColumnStore works out of the box with your existing business intelligence tools and SQL queries. It works with popular business intelligence tools such as Tableau and Business Objects, as well as anything that supports ODBC/JDBC. For data scientists, it works with R for advanced statistical analysis.

At the same time, we realize that SQL is not the best choice for machine learning and data discovery use cases. To complete the picture, we want to integrate Apache Spark libraries such as MLlib into ColumnStore.

Most importantly, MariaDB ColumnStore is based on an open source GPLv2 fork of the InfiniDB community project. We believe community-driven software development is the new mandate of our time, and we want to leverage the strength of our community in building MariaDB ColumnStore.

Read the next part in this series: ColumnStore Architecture & Use-cases.

About Nishant Vyas

Nishant joins MariaDB as Head of Product and Strategy from LinkedIn, where he was one of the early employees. During his almost nine-year tenure at LinkedIn, he contributed to building, scaling and operating production data stores using technologies like Oracle, MySQL, NoSQL and more. Nishant has extensive experience as a database engineer, database architect and DBA, and has held various leadership roles. 
