May 2, 2016

ColumnStore Architecture & Use-case

In the previous post, I talked about why ColumnStore is important. Now let's look at the MariaDB ColumnStore architecture and its use cases.


MariaDB ColumnStore is built on a three-tier scalable architecture that supports the kind of growth that MariaDB users have grown accustomed to. Queries are processed by user modules, which assign tasks to parallel performance modules that access the columnar distributed storage layer below. Performance modules scale almost infinitely, providing both performance and capacity growth as you add servers. The performance modules don’t process queries themselves; they just take instructions from the user modules, which organize and deliver the results.

At a very high level, when a query arrives at a MariaDB database server, the server gets the data from the storage engine and applies projection, filtering, sorting, aggregation and so on before returning results to the client. This approach is monolithic and serial, so it performs slowly when processing large amounts of data.

In MariaDB ColumnStore, by contrast, the query interface hands the query over to the user module (UM). The UM runs on the MariaDB server as ExeMgr, which from the MariaDB server's perspective is the storage engine. ExeMgr is multithreaded and pushes query operations down to the performance module (PM) nodes. Because ExeMgr on the UM node is multithreaded, it can handle results arriving from the PMs in any order, without blocking one PM while it waits for intermediate data from another.

The UM pushes query operations down to distributed, multithreaded PM nodes, which access data in parallel and apply filters and projections before sending results back to the query interface. If more than one PM is involved in accessing the data, ExeMgr performs the final aggregation before sending the results back to the client. This greatly improves the performance of large and complex analytical queries. Those familiar with Hadoop can easily relate this massively parallel processing (MPP) approach to MapReduce.
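To make the division of labor concrete, here is a minimal Python sketch of the scatter-gather pattern described above. It is not ColumnStore code: the partitions, column names and the functions pm_scan and um_query are hypothetical, and threads in one process stand in for PM nodes on separate servers. Each simulated PM filters and projects its own slice of the data, and the simulated UM aggregates the partial results in whatever order they arrive.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical data: one partition per performance module (PM).
# Each partition keeps its columns as separate lists (columnar layout).
PARTITIONS = [
    {"country": ["US", "DE", "US"], "clicks": [3, 5, 2]},
    {"country": ["DE", "US", "FR"], "clicks": [7, 1, 4]},
    {"country": ["FR", "US", "DE"], "clicks": [6, 8, 9]},
]


def pm_scan(partition, country):
    """PM side: filter on the country column, project the clicks column."""
    return [
        clicks
        for c, clicks in zip(partition["country"], partition["clicks"])
        if c == country
    ]


def um_query(partitions, country):
    """UM side: fan the scan out to every PM, then do the final aggregation.

    as_completed yields results in arrival order, so a slow PM does not
    hold up the merging of results from the faster ones.
    """
    total_clicks = 0
    matching_rows = 0
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(pm_scan, p, country) for p in partitions]
        for future in as_completed(futures):
            values = future.result()
            total_clicks += sum(values)
            matching_rows += len(values)
    return total_clicks, matching_rows


if __name__ == "__main__":
    # Roughly: SELECT SUM(clicks), COUNT(*) FROM t WHERE country = 'US'
    print(um_query(PARTITIONS, "US"))
```

In MapReduce terms, pm_scan plays the role of the map step and the loop in um_query plays the role of the reduce step.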

Much of our work integrating InfiniDB with MariaDB was adding the robustness that our customers have come to expect. This includes SSL support, audit and authentication plug-ins and role-based access. Our product roadmap includes the integration of additional features, such as support for Apache Spark, a regression window function and full integration with MaxScale, our dynamic data routing platform that provides for minimum downtime, security, scalability and interoperability beyond MariaDB and MySQL.

ColumnStore doesn’t support unstructured data natively. This was a conscious decision. There has been a lot of active development in the Hadoop and Spark ecosystems around analytics and data discovery on unstructured data. We would like to integrate Apache Spark’s in-memory compute ability and its machine learning libraries, such as MLlib, so that our users can take full advantage of them.

We believe ColumnStore is more advanced in traditional OLAP use cases than NoSQL alternatives. For example, Cassandra is a popular open-source database for handling write-intensive operations that involve many data types. It’s a good decision-support database, but it isn’t fully SQL compliant, it lacks in-database analytics and it does not have the ability to perform joins. Also, Cassandra is billed as a “column index” database, but it doesn’t have a true columnar structure. It requires manual partitioning, which restricts performance benefits to one single partition column. Basically, it’s much slower than ColumnStore for analytics operations.

A good example of ColumnStore’s speed and scalability is the ad serving and analytics use case, an application with huge data volumes and the need for split-second speed. A hosted ad serving platform can deliver hundreds of billions of ad impressions per month, a task that involves inserting nearly 100 million rows per day into tables with many columns and an uncompressed database size of hundreds of terabytes, all on just 7-10 nodes.

With MariaDB ColumnStore’s data ingestion rate of up to one million rows per second, the ad impression data can be made available to end users in near-real-time. Using the familiar SQL interface, end users can perform ad hoc and analytics reporting, as well as query the stock of available inventory with three-second response times. That translates into better-targeted ads, higher conversion rates and more sales.
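As a rough back-of-the-envelope check, the two figures quoted above (nearly 100 million rows per day, and an ingestion rate of up to one million rows per second) can be compared directly. The short Python sketch below uses only those two numbers from the text; the variable names are illustrative, and it assumes the peak rate could be sustained, which real workloads may not achieve.

```python
SECONDS_PER_DAY = 24 * 60 * 60

rows_per_day = 100_000_000        # "nearly 100 million rows per day" (from the text)
peak_rows_per_second = 1_000_000  # "up to one million rows per second" (from the text)

# Average ingest rate the workload needs if rows arrive evenly over the day.
average_rows_per_second = rows_per_day / SECONDS_PER_DAY

# Time to load a full day of rows if the peak rate were sustained (an assumption).
seconds_to_load_one_day = rows_per_day / peak_rows_per_second

print(f"average ingest needed: {average_rows_per_second:,.0f} rows/s")
print(f"time to load one day at peak rate: {seconds_to_load_one_day:.0f} s")
```

Under these assumptions the required average rate is roughly a thousand rows per second, which leaves ample headroom below the quoted peak and is consistent with making impression data available in near-real-time.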

We are thrilled to bring MariaDB ColumnStore to you and can’t wait to see what our customers do with it. Please share your own experiences so we can continue to evolve according to the needs of our community.

ColumnStore will be open for beta testing by the end of May. You can sign up for notification of beta availability. You can also talk to our sales team or set up a call with our solution architects to discuss ColumnStore and how it can fit your use case.

About Nishant Vyas

Nishant joins MariaDB as Head of Product and Strategy from LinkedIn, where he was one of the early employees. During his almost nine-year tenure at LinkedIn, he contributed to building, scaling and operating production data stores using technologies like Oracle, MySQL, NoSQL and more. Nishant has extensive experience as a database engineer, database architect and DBA, and has held various leadership roles. 
