Big Data, Big Opportunities with MariaDB 10

In my previous role as the manager of an enterprise-grade public cloud, I had the opportunity to work with many enterprises across industries and around the world. All of these companies were looking for new ways to better understand and serve their customers, to improve efficiency, and to improve their financials. There were many different approaches, but they all had one thing in common: it is all about data. Every day nearly 4 exabytes of data are created. This data holds great value for business, science and society. However, in order to extract that value, data needs to be stored, secured, managed and analysed. This is why I am excited to join SkySQL, a company that is at the forefront of driving innovation in databases.

Big Data is a big topic – and depending on who you’re talking to, it can mean very different things. Broadly, there are two kinds of big data analytics, depending on whether you have a lot of structure in your collected information and know what questions you’re trying to answer, or whether you’ve just got a huge pile of undifferentiated data and are trying to deduce the correlations and relationships hidden in it.

More mature applications often generate a lot of highly detailed, structured information from everyday transactional activity. You can collect this historical transactional and operational data into large, structured data sets in data warehouses. Traditional Business Intelligence (BI) and Online Analytical Processing (OLAP) applications turn this rich, structured information into valuable business insight. With BI/OLAP, analysts apply statistical methods to derive conclusions that guide decision-making.

Multi-source Replication

BI/OLAP applications are usually built on top of relational databases. MariaDB 10 has some awesome features and partner add-on products that are designed to simplify and speed up this kind of analysis, such as:

  • Multi-Source Replication: Simplifies collecting data from multiple databases and applications into a single location for loading into a data warehouse.
  • CONNECT Storage Engine: A flexible tool to access diverse data sources dynamically from within MariaDB, including unstructured files such as log files in a folder, or any database with an ODBC connector. Great for ETL or even real-time analysis.
  • TokuDB Storage Engine (partner product): Uses an indexing algorithm that permits faster updates to large and very large data sets. Can improve the performance of data warehouses, and of the data loading operations of BI/OLAP applications.
  • InfiniDB Columnar Storage Engine (partner product): Maintains the standard SQL interface and relational model, but stores the values of each column together rather than storing whole rows together, which greatly reduces I/O for analytical queries that touch only a few columns. Makes your data warehouse much faster at processing analytical workloads. Working more closely with partners such as InfiniDB is a potential future direction.
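
To make the I/O argument for columnar storage in the last item concrete, here is a toy sketch in Python. It is not how InfiniDB is implemented; the table, the column names and the "values read" counter are all invented for illustration. It only shows why an aggregate over one column touches far fewer values when each column is stored contiguously.

```python
# Toy comparison of a row-oriented and a column-oriented layout.
# All names and data are hypothetical; this is not InfiniDB's storage format.

COLUMNS = ("order_id", "customer", "country", "amount")

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    (1, "alice", "DE", 120.0),
    (2, "bob", "US", 75.5),
    (3, "carol", "US", 210.0),
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_row_store(rows):
    """Sum 'amount' from the row layout; every field of every row is read."""
    idx = COLUMNS.index("amount")
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to get one field
        total += row[idx]
    return total, values_read


def sum_amount_column_store(columns):
    """Sum 'amount' from the column layout; only that column is read."""
    values = columns["amount"]
    return sum(values), len(values)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(f"row store:    total={row_total}, values read={row_reads}")
print(f"column store: total={col_total}, values read={col_reads}")
```

Both layouts give the same total, but the row layout reads every field of every row while the column layout reads one value per row. On a wide fact table with many columns, that gap is what the I/O savings come from.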

MariaDB 10 is going to simplify DBAs’ work in a lot of other ways, and help them set things up to deliver on both operational and analytic requirements.

But what if your data is unstructured? What if you don’t even know what questions to ask? Those problems are the province of the newer big data technologies. They take a different approach: they accumulate massive data sets of raw, uninterpreted data and then use inductive statistical methods to find the patterns in this data. This is the method employed by Apache Hadoop and similar systems. Hadoop can uncover the latent patterns in this unstructured data and provide insights into populations.

The same features of MariaDB 10 that aid in loading data into data warehouses – Multi-Source Replication and the CONNECT Storage Engine – can assist in gathering data for analysis in a Hadoop cluster. MariaDB can feed raw data – for example, click-streams in e-commerce applications – to Hadoop-based analytical applications. While the data is stored in special-purpose unstructured databases such as HBase, which are tightly integrated with Hadoop, the source of such data remains the online, transactional applications that drive everyday business.
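
As a rough illustration of the collection step, the sketch below uses Python to generate the SQL a DBA might run on a single MariaDB 10 server so that it replicates from several source databases at once. The connection names, hosts and credentials are made up, and the quoting helper is deliberately simplistic; treat it as a sketch under those assumptions rather than a production setup script. It relies on the named-connection form of CHANGE MASTER and on START ALL SLAVES, both introduced with multi-source replication in MariaDB 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One upstream database to replicate from (all values are hypothetical)."""
    name: str  # replication connection name used on the collecting server
    host: str
    port: int = 3306


def quote(value: str) -> str:
    """Naive SQL string quoting: doubles single quotes only.

    Good enough for trusted, hand-written values in this sketch; it does not
    handle backslashes or other escape sequences.
    """
    return "'" + value.replace("'", "''") + "'"


def replication_statements(sources, user, password):
    """Build one CHANGE MASTER per source, then start all connections."""
    statements = []
    for src in sources:
        statements.append(
            f"CHANGE MASTER {quote(src.name)} TO "
            f"MASTER_HOST={quote(src.host)}, MASTER_PORT={src.port}, "
            f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)};"
        )
    statements.append("START ALL SLAVES;")
    return statements


statements = replication_statements(
    [
        Source("orders", "orders-db.example.com"),
        Source("crm", "crm-db.example.com", 3307),
    ],
    user="repl",
    password="secret",
)
for statement in statements:
    print(statement)
```

Each source gets its own named replication connection, so the collecting server can apply changes from all of them side by side. From there the combined data can be exported to a warehouse or a Hadoop cluster with whatever tooling you already use.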

MariaDB includes a unique Dynamic Columns feature, which can store different, labeled data objects in each row of a table in much the same way as NoSQL technologies such as MongoDB. MariaDB can also directly read and write data in an Apache Cassandra cluster, using the Cassandra Storage Engine to permit SQL language statements to execute against Cassandra data. Looking further into the future, SkySQL’s MaxScale proxy engine will be able to filter and transform database operations to efficiently feed data into big data applications based on Hadoop and related technologies.
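
To give a feel for Dynamic Columns, here is a small Python sketch that builds the SQL for storing a per-row set of labeled attributes with COLUMN_CREATE and reading one back with COLUMN_GET. The products table, its attrs blob column and the attribute names are hypothetical, and the `%s` placeholders assume a Python MariaDB/MySQL driver that uses that parameter style. The sketch only builds the statements; it does not connect to a server.

```python
def column_create_expr(attrs):
    """Return (sql_fragment, params) for a COLUMN_CREATE call.

    attrs maps attribute names to values; each pair becomes two placeholders
    so the driver, not this code, handles quoting.
    """
    if not attrs:
        raise ValueError("COLUMN_CREATE needs at least one name/value pair")
    placeholders = ", ".join(["%s, %s"] * len(attrs))
    params = []
    for name, value in attrs.items():
        params.extend([name, value])
    return f"COLUMN_CREATE({placeholders})", params


# Hypothetical table: products(id INT, name VARCHAR(100), attrs BLOB)
fragment, attr_params = column_create_expr({"color": "blue", "size": "XL"})
insert_sql = f"INSERT INTO products (id, name, attrs) VALUES (%s, %s, {fragment})"
insert_params = [1, "T-shirt"] + attr_params

# Reading a single attribute back requires naming its type.
select_sql = "SELECT name, COLUMN_GET(attrs, 'color' AS CHAR) AS color FROM products"

print(insert_sql)
print(insert_params)
print(select_sql)
```

Another row in the same table could carry a completely different set of attributes, which is what gives the feature its document-store flavor while the rest of the table stays relational.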

There is a persistent myth circulating in the database world that RDBMS technology is ill-suited to big data analytics. MariaDB combines the best of both worlds – high-performance relational and structured data with unstructured text or binary objects – both as a source of data and as the foundation for high-performance BI/OLAP applications. Big data is a big deal – and for that you need a powerful, mature database to help you mine for those golden nuggets of insight. With revolutionary new features like Multi-Source Replication, MariaDB 10 is ready to deliver!