March 24, 2014

The Best of Both Worlds

Image is a mix of 2 photos, one from stock.xchng ( and the other CC-BY from flickr, user massmatt.In my previous blog post, I described some of the key new replication features in MariaDB 10 that let this powerful open-source database shine in web-scale and analytical applications. Now I’d like to continue my tour of MariaDB 10’s new capabilities, focusing in on the NoSQL integration features that let you integrate RDBMS transactional performance, reliability and security with NoSQL features built into MariaDB, and enabling easy interoperability with NoSQL technologies. Now with MariaDB 10, it is easy to build sophisticated hybrid applications that tie together diverse systems and data into a cohesive whole.

The CONNECT Storage Engine

Today’s applications run on a startlingly diverse set of databases and platforms, and handle data in a wide range of formats. There are a wide range of RDBMS and NoSQL databases with differing features and performance characteristics, and applications built at different times running on very different infrastructure stacks. And not all data can be neatly extracted from databases with queries. Fixed format flat files, delimited files, files whose content is defined by a markup language - a bewildering scope of data. What if your application needs to process log files from your web server and JOIN individual log records with transactional records from your online store? What if your online store sells and fulfills orders from a broad range of different suppliers, and your systems must access and combine their inventory records - stored in different SQL databases from Oracle to MySQL to PostgreSQL to SQL Server? Combining such different data sources together in real time has required a lot of hand-coded logic - a costly and time-consuming proposition.

MariaDB CONNECT Storage Engine Diagram Slide

The CONNECT storage engine - included in MariaDB 10 - can map many kinds of data to tables that can be queried and combined with regular tables stored in other storage engines such as XtraDB, InnoDB, or Spider. The external data isn’t imported - it is just accessed as though it were in database tables. CONNECT can give you access to many data sources, including:

  • Flat text or binary files with fixed-offset or delimited columns, including CSV.
  • Popular configuration file formats(INI), XML and HTML files.
  • Any tables in external databases accessible through ODBC or using MySQL connectors (like the Federated engine) - including other SQL databases and Microsoft Excel files.
  • Tables generated from other tables, including merging tables (like the MERGE engine), occurrence tables, or even pivot tables like those available in spreadsheet applications.
  • Tables generated by your own methods - through a plug-in API.

CONNECT was developed and contributed to MariaDB by Olivier Bertrand, an ex-IBM database researcher who has long had a dream of a more versatile method to access external data sources. He’s realized his dream - CONNECT is a powerful and flexible tool for simplifying access to a wide range of data. Bringing loosely structured data into the RDBMS world has always been a programming headache. With CONNECT, MariaDB 10 can simplify this common problem. This new storage engine is particularly well-suited to gathering data for Business Intelligence and analytics - an increasingly important tool in the enterprise data management arsenal. CONNECT is a young technology, and may be a little rough around the edges, but we believe it has great potential. Its use cases are limited only by your imagination!

The Cassandra Storage Engine

The Cassandra project originated at Facebook, and now is an Apache project. Cassandra is widely adopted for its linear insert/update performance, simple and powerful replication, scalability and availability characteristics. It is is great for handling non-transactional simple structured data in huge volumes, with no single point of failure. It is widely adopted by web-scale applications such as Hulu, GoDaddy, and Netflix. But Cassandra can’t do it all, and when you need to integrate your Cassandra and OLTP applications together, MariaDB has you covered.

MariaDB Cassandra Storage Engine

The Cassandra storage engine in MariaDB 10 provides a “window” into a Cassandra ring, allowing applications to read and write data to a Cassandra Column Family as though it were a relational table, using standard SQL queries. You can JOIN data from Cassandra with tables in other storage engines, and run a cluster of MariaDB 10 servers together to provide highly available access to Cassandra data. Imagine you have a web-scale video application in which Cassandra stores your users’ current watchlists and history, but you need to join that with user accounts and payment information in your commerce system. With MariaDB 10 you can build applications that fully utilize Cassandra’s NoSQL data schemas, and process that data with relational tables in other OLTP applications like that payment system. Rely on MariaDB’s transactional power to process critical operations reliably, then combine the results with Cassandra data. Truly the best of both worlds!

MariaDB Dynamic Columns

As powerful as NoSQL databases can be, they typically do not deliver the ACID characteristics of a mature RDBMS like MariaDB. Imagine what you could do if you could combine MariaDB’s powerful transactional storage engines like XtraDB and InnoDB with the ability to store unstructured data and retrieve it easily using a simple API? Lets say most of what you’re doing is OLTP and transactions, but you need to store user profile data or item attributes alongside the usual structured columns? Once again, you could have the best of both worlds - updates that satisfy ACID constraints, but storing dynamically defined and structured information.

Cust ID Account Balance Dyn_Col_BLOBs
2035 $154.04 NAME: John Smith|LOC: 45.35243, -74.98348|IMAGE: x27A8B8C ...
2036 $929.10 NAME: Jane Doe|LOC: 45.35243, -74.98348|AGE: 32| GENDER: F...
2037 $377.53 NAME: Carol Jones|AGE: 43|GENDER: F||IMAGE: xA9674DE678 ...
INSERT INTO customers VALUES (2037, 377.53, COLUMN_CREATE(“NAME”, “Carol Jones”, “AGE”, 43, “GENDER”, “F”,  . . . ));

SELECT COLUMN_JSON(Dyn_Col_BLOBs) FROM customers WHERE Cust_ID = 2037;
{“NAME”:“Carol Jones”, “AGE”:”43”, “GENDER”:”F”, “IMAGE”:”A9674DE678…”}

WIth Dynamic Columns, an exclusive MariaDB feature first introduced in MariaDB 5.3 and extensively improved in MariaDB 10, you can store the various named fields of a data object together, and retrieve them as a unit using a provided API, nest objects within other objects, and include those objects in transactional updates. Think of a set of dynamic columns within a particular row as like a document in a document-store NoSQL database like MongoDB. A MariaDB table then can be thought of as being the equivalent of a MongoDB document collection. But with MariaDB’s Dynamic Columns, you can update multiple rows (documents) together in an atomic transaction! Once again - MariaDB 10 brings you the best of both worlds.

HandlerSocket Interface

As useful as SQL is for structured data manipulations, there is a good bit of overhead in parsing, evaluating, and optimizing SQL statements. What if you have an application that doesn’t need all of that, but needs the most basic operations, delivered in a super-simple API that performs like lightning? Great, but perhaps you have other applications that need the full power of SQL to query and manipulate the data that this simple application inserts and updates. Once again, you need the best of both worlds - NoSQL simplicity, and RDBMS sophistication.

Handler Socket Plug-In

The HandlerSocket Plug-In provides low-level, high-performance access to tables in MariaDB storage engines like InnoDB, XtraDB, and Spider. The plug-in doesn’t process SQL. It exposes just the basic operations like find(), insert(), update(), delete() in a CRUD API that bypasses the SQL statement processing components of MariaDB and directly accesses the storage engine. Without the parsing and optimizing steps, the plug-in needs much less CPU, memory, and network resources - the secret to its performance. HandlerSocket batches together operations if it can, reducing the number of requests to the storage engine, further improving performance. The HandlerSocket Plug-In is a powerful building block that lets you build sophisticated non-relational processing in your application and use MariaDB’s advanced storage engines to persistently store the data. When you need the best of both worlds, look to MariaDB 10 to deliver, with innovations like the CONNECT and Cassandra storage engines, Dynamic Columns, and the HandlerSocket Plug-in.

About Rich Sands

Rich Sands has been a software product manager and community developer for over 20 years. As Product Manager he helps SkySQL drive business success through developer engagement and community building. Prior to joining SkySQL, Rich worked for Black Duck and Sun Microsystems in various product marketing and community marketing roles.

Read all posts by Rich Sands