MariaDB’s new analytics engine – MariaDB ColumnStore – has been in the works for some time. What is it and how did it come about? This post outlines our thinking in choosing the engine and features we implemented.
Databases are expanding from their roots as systems of record into new analytics applications that some people call “systems of intelligence.” Instead of just storing and querying transactional data, databases are increasingly being used to yield insights, predict the future and prescribe actions users should take. Led by the open-source Hadoop ecosystem, online analytical processing (OLAP) is making its way out of the corporate data center and into the hands of everyone who needs it.
In the last decade, as data analytics became more important, it also became more challenging. Through hype and skewed opinions, everyone was led to believe that scale-out architectures, big clusters and data processing without SQL were the only way to do data analytics, and that the sole alternative was spending big money on proprietary solutions.
On one end are traditional OLAP data warehouses, which are powerful and SQL-rich but costly, proprietary and often delivered as black-box appliances. On the other end is the Hadoop ecosystem, which challenged traditional OLAP providers and paved the way to machine learning and data discovery that were not easy with traditional solutions, but which came with the complexity of scale-out and lacked SQL interfaces.
We know our users choose MariaDB because they value performance, scalability, reliability, security and extensibility through open source, with 100% SQL compatibility. We wanted our OLAP choice to reflect the same values.
One commercial product that caught our eye is SAP’s HANA, a database appliance that supports both transactional and analytical applications on the same platform. HANA has gotten rave reviews, not only for functionality but also for simplicity. But HANA appliances are expensive (price: don’t ask), and they’re really intended for organizations that already use SAP. We knew of an open-source alternative: InfiniDB, a scalable, software-only columnar analytic DBMS.
We thought InfiniDB would be a terrific fit with our OLAP and big data strategy. It provides a columnar, massively parallel storage engine whose performance scales with the addition of database servers, called Performance Modules. Redundancy and high availability are built in, and InfiniDB supports full SQL syntax, including joins, window functions and cross-engine joins. It also works with Hadoop’s HDFS to provide additional scalability.
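To make the "columnar" idea concrete, here is a minimal Python sketch of the difference between a row layout and a column layout. It is only an illustration, not InfiniDB's actual storage format: the sample data and the names `to_columns`, `sum_by_region_rows` and `sum_by_region_columns` are hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data in a row-oriented layout: one dict per record.
rows: List[dict] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_by_region_rows(rows: List[dict]) -> Dict[str, float]:
    """Aggregate over rows: every full record is visited."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def sum_by_region_columns(columns: Dict[str, list]) -> Dict[str, float]:
    """Aggregate over columns: only 'region' and 'amount' are read."""
    totals: Dict[str, float] = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


columns = to_columns(rows)
print(sum_by_region_columns(columns))
```

Both functions return the same totals. The difference is that the columnar version never touches `order_id`, which is the kind of saving an analytic query over a wide table benefits from.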
We saw a unique opportunity to fill the need for a high-performance analytic data engine in the open-source market by binding InfiniDB’s OLAP engine to MariaDB’s OLTP engine, enabling users to run analytic queries on production data in near real time. Much of our development work has gone into enabling that tight integration.
Read the next part on ColumnStore: “Why is ColumnStore important?”