At the Intersection of Mission Critical and Engagement

Road sign image altered from a CC-BY image on flickr.com by user Hawaii County. http://creativecommons.org/licenses/by/2.0/deed.en

I’ve been a part of the software and open source world for two decades, but I’m a relative newcomer to the database world. Well, not exactly: enterprise software has always needed a place to store, retrieve, and process all that data, and so I’ve seen how database technology has evolved in the context of the broader software industry. Now that I’ve been a part of SkySQL’s ongoing efforts to deliver the world’s most capable open source database technology along with the world-class engineering and support services that customers need and expect, I’ve come to appreciate how the most challenging use cases demand unprecedented scale, continuous availability, operational agility and sheer performance all together, at the same time. A very tall order! But MariaDB 10 brings some powerful new features and advanced technology to bear in fulfilling these requirements.

MariaDB 10 is the culmination of almost 2 years and 10 development releases of coding, testing, and iteration by the MariaDB team and its fast-growing user community. The database is built on MySQL’s decades of battle-tested code and millions of installations, but it strikes out in some new directions. The MariaDB core team has been hard at work bridging the gap between the mission-critical performance, availability and security of a mature RDBMS and the scale and development simplicity of NoSQL. MariaDB 10 combines the power and reliability of MySQL with capabilities that let today’s web-scale applications engage with millions of customers, and let DevOps teams shrink the development cycle down to hours. I’d like to take you on a quick tour of MariaDB 10’s new capabilities and explain how they bring RDBMS and NoSQL together into a new kind of database.

Replication: The Magic Behind Massive Scale and Continuous Availability

As the price of commodity hardware drops, and the price of cloud infrastructure drops even faster, scalability comes from effectively harnessing all these inexpensive resources into massively parallel computing architectures that can ramp up to handle even web-scale loads. Databases scale this way too, but replicating data and allowing for parallelism while keeping it all consistent is a huge algorithmic challenge. MariaDB 10’s most exciting innovations bring new ways to replicate data that enable wholly new use cases previously out of reach for the MySQL ecosystem.

Parallel Slave Replication

If you want many replicas of your database in order to scale access to the data, you have to somehow keep all those replicas up to date. MariaDB 10 includes a new implementation of an old idea: parallel slave replication. What if each of those slave replicas could apply updates from your master database instance in parallel, rather than one at a time? The slave databases could handle much higher update loads while still serving up enormous read scalability. The trick to allowing those parallel slave updates is to tag each and every replication event with a unique id that lets the slave systems process updates in the correct order. MySQL 5.6 includes a Global Transaction ID (GTID), the unique tag I just mentioned. But MariaDB 10’s new GTID implementation lets MariaDB 10 replicate updates to slave databases with much more parallelism. That’s a fancy way to say that MariaDB can handle a whole lot more updates per second: nearly 10x as many updates per second compared to a single-threaded replication process (see note 1).
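To make the idea of an ordering tag concrete, here is a toy sketch in Python. It is not how the server is implemented. It relies on one fact about MariaDB 10, that a GTID is written as three numbers, domain-server-sequence (for example 0-1-100), and on the assumption that events in different domains are independent and can be applied side by side while events within one domain keep their sequence order. The function names and the in-memory "apply" step are hypothetical stand-ins.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int   # independent stream of events
    server_id: int   # server that originally wrote the event
    seq_no: int      # increasing counter within a domain


def parse_gtid(text: str) -> Gtid:
    """Parse a MariaDB-style GTID string such as '0-1-100'."""
    domain, server, seq = text.split("-")
    return Gtid(int(domain), int(server), int(seq))


def group_by_domain(events):
    """Split (gtid_text, statement) pairs into one ordered stream per domain."""
    streams = defaultdict(list)
    for gtid_text, statement in events:
        gtid = parse_gtid(gtid_text)
        streams[gtid.domain_id].append((gtid, statement))
    for stream in streams.values():
        # Within a domain, the sequence number fixes the apply order.
        stream.sort(key=lambda item: item[0].seq_no)
    return dict(streams)


def apply_stream(stream):
    """Hypothetical stand-in for executing statements on a replica."""
    return [statement for _gtid, statement in stream]


def apply_in_parallel(events, workers=4):
    """Apply each domain's stream on its own worker thread."""
    streams = group_by_domain(events)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            domain: pool.submit(apply_stream, stream)
            for domain, stream in streams.items()
        }
        return {domain: future.result() for domain, future in futures.items()}
```

Calling apply_in_parallel with events from two domains returns one ordered list of statements per domain, regardless of how the events were interleaved on arrival.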

So what? Well, that’s fewer replicas, fewer resources, higher throughput and more connections. Yup, that’s right: you save money! And MariaDB can handle even more enormous workloads. Advanced technology that delivers some easy-to-understand benefits.

Multi-source Replication

So now with MariaDB 10 you’ve got all those databases and applications and they’re crunching huge numbers of transactions per second. Maybe you’re a global retailer with hundreds or thousands of brick-and-mortar stores with point-of-sale (POS) systems, a huge e-commerce web presence and a sophisticated content management system. You’re collecting an incredibly rich bounty of data every second of every day: how do you get all that data into your data warehouse or Big Data analytics engine to mine it for insights into your business?

What if MariaDB 10 could just replicate that data right into your analytics applications? Well, it can! Using Multi-source Replication, you can set up a slave database to collect data from many applications and many locations. Everything from transactional data to site click-streams to crowd-sourced reviews and ratings can be collected together into database tables, transformed and normalized, and poured into your analytics processing. MariaDB 10’s new GTID is part of the magic here too: it sorts out the incoming data from all those different systems and applications, keeping it all straight.
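A rough sketch of that bookkeeping, again as a toy model in Python rather than anything taken from the server: it assumes each source writes under its own GTID domain (the first number in a domain-server-sequence GTID) and that the collecting replica remembers the highest sequence number it has applied per domain. The class and method names are hypothetical.

```python
class CollectingReplica:
    """Toy model of one replica gathering rows from several sources."""

    def __init__(self):
        self.last_seq = {}   # domain_id -> highest sequence number applied
        self.rows = []       # (domain_id, row) pairs in the order applied

    def apply(self, gtid_text, row):
        """Apply one event; return False if it was already applied."""
        domain_id, _server_id, seq_no = (int(p) for p in gtid_text.split("-"))
        if seq_no <= self.last_seq.get(domain_id, 0):
            return False  # duplicate or stale event for this source
        self.last_seq[domain_id] = seq_no
        self.rows.append((domain_id, row))
        return True
```

Because the position is tracked per domain, events from a store's POS system and from the web site can arrive interleaved without the replica confusing one stream's progress with the other's.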

Multi-source Replication simplifies administration of all those applications too, saving your operations staff a lot of time and trouble. More analytics, more insight, fewer headaches – more simple benefits from the advanced technology baked into MariaDB 10.

Sharding with the Spider Storage Engine

So you’ve got a web-scale application with millions of customers, millions of profile records. And a lot of processing to do: your store, your inventory system, your recommendation engine, your customer service portal, your ... well, you get the idea. Those tables with millions of rows are becoming a bottleneck. What if you could partition those tables across multiple servers and your applications could run many operations against those tables in parallel, using the same scale-out approach we’ve already seen work with replication? This is a technique known in the database world as sharding, and while it is very effective in handling massive tables, it often requires your developers to hand-craft the logic in your application, and wow, does that get complex!

What if MariaDB 10 could partition those huge tables behind the scenes, so that applications could keep on accessing them using simple logic, but queries on those tables could be doled out to many servers in parallel, eliminating those bottlenecks? That’s what the Spider Storage Engine, an option built into MariaDB 10, allows you to do. With Spider, you define how the tables are partitioned inside the database, not inside the application. Your developers remain blissfully unaware of all the magic, but your application can handle millions of rows and terabytes of data in individual tables with much better performance.
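For a feel of the routing logic that sharding involves, the kind your developers would otherwise hand-craft, here is a minimal Python sketch. It is only an illustration of the concept: the ShardedTable class is hypothetical, plain dictionaries stand in for backend servers, and the modulo rule is an assumed partitioning scheme, not a description of how Spider itself divides data.

```python
from concurrent.futures import ThreadPoolExecutor


class ShardedTable:
    """Toy table whose rows are spread across several backends by key."""

    def __init__(self, shard_count):
        # Each dict stands in for one backend server holding part of the table.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, customer_id):
        # Assumed partitioning rule: integer key modulo the number of shards.
        return self.shards[customer_id % len(self.shards)]

    def insert(self, customer_id, profile):
        self._shard_for(customer_id)[customer_id] = profile

    def get(self, customer_id):
        # A point lookup touches exactly one shard.
        return self._shard_for(customer_id).get(customer_id)

    def count(self):
        # A whole-table query fans out to every shard in parallel.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            return sum(pool.map(len, self.shards))
```

The caller only ever sees insert, get and count; which backend holds a given row is decided in one place, which is the part that moves from the application into the database when the partitioning is defined there.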

Web-scale, without the development hassle. And Spider is a feature built by a third-party developer and integrated into MariaDB 10 by the team. Faster innovation, developed in parallel too!

Notes

1 Run with sysbench 0.5 oltp.lua. In-order parallel replication. Single table, 10M rows, crash-safe binlog, medium-sized transactions, 1 primary key update/transaction, 16GB buffer pool. See http://kristiannielsen.livejournal.com/18435.html for more details.