December 7, 2016

Facebook MyRocks at MariaDB

Recently my colleague Rasmus Johansson announced that MariaDB is adding support for the Facebook MyRocks storage engine. Today I’m going to share a bit more on what that means for MariaDB users. Members of the Facebook Database Engineering team helped us answer some questions we think our community will have about MyRocks.

Benefits of MariaDB Server’s Extensible Architecture
Before discussing specifics of MyRocks, new readers may benefit from a description of MariaDB Server architecture, which is extensible at every layer including the storage layer. This means users and the community can add functionality to meet unique needs. Community contributions are one of MariaDB’s greatest advantages over other databases, and a big reason for us becoming the fastest growing open source database in the marketplace.

Openness in the storage layer is especially important because being able to use the right storage engine for the right use case ensures better performance optimization. Both MySQL and MariaDB support InnoDB - a well known, general purpose storage engine. But InnoDB is not suited to every use case, so the MariaDB engineering team is extending support for additional storage engines, including Facebook’s MyRocks for workloads requiring greater compression and IO efficiency, and MariaDB ColumnStore (currently in beta), which will provide faster time-to-insight with Massively Parallel Execution (MPP).

Facebook MyRocks for MariaDB
When searching for a storage engine that could give greater performance for web scale type applications, MyRocks was an obvious choice because of its superior handling of data compression and IO efficiency. Besides that, its LSM architecture allows for very efficient data ingestion, like read-free replication slaves, or fast bulk data loading.

As we add support for new storage engines, many of our current users may ask, “What happens to MariaDB’s support for InnoDB? Do I have to migrate?” Of course not! We have no plans to abandon InnoDB. InnoDB is a proven storage engine and we expect it to continue to be used by MariaDB users. But we do expect that deployments that need highest possible efficiency will opt for MyRocks because of its performance gains and IO efficiency. Over time, as MyRocks matures we expect it will become appropriate for even more use cases.

The first MariaDB version of MyRocks will be available in a release candidate of MariaDB Server 10.2 coming this winter. Our goal is for MyRocks to work with all MariaDB features, but some of them, like optimistic parallel replication, may not work in the first release. MariaDB is an open source project that follows the "release often, release early" approach, so our goal is to first make a release that meets core requirements, and then add support for special cases in subsequent releases.

Now let’s move onto my discussion with Facebook’s Database Engineering team!

Can you tell us a bit about the history of RocksDB at Facebook?

In 2012, we started to build an embedded storage engine optimized for flash-based SSD, by forking LevelDB. The fork became RocksDB, which was open-sourced on November 2013 [1] . After RocksDB proved to be an effective persistent key-value store for SSD, we enhanced RocksDB for other platforms. We improved its performance on DRAM in 2014 and on hard drives in 2015, two platforms with production use cases now.

Over the past few years, we've introduced numerous features and improvements. To name a few, we built compaction filter and merge operator in 2013, backup and column families in 2014, transactions and bulk loading in 2015, and persistent cache in 2016. See the list of features that are not in LevelDB: https://github.com/facebook/rocksdb/wiki/Features-Not-in-LevelDB .

Early RocksDB adopters at Facebook such as the distributed key-value store ZippyDB [2], Laser [2] and Dragon [3] went into production in early 2013. Since then, many more new or existing services at Facebook started to use RocksDB every year. Now RocksDB is used in a number of services across multiple hardware platforms at Facebook.

[1] https://code.facebook.com/posts/666746063357648/under-the-hood-building-and-open-sourcing-rocksdb/ and http://rocksdb.blogspot.com/2013/11/the-history-of-rocksdb.html
[2] https://research.facebook.com/publications/realtime-data-processing-at-facebook/
[3] https://code.facebook.com/posts/1737605303120405/dragon-a-distributed-graph-query-engine/

Why did FB go down the RocksDB path for MySQL?

MySQL is a popular storage solution at Facebook because we have a great team dedicated to running MySQL at scale that provides a high quality of service. The MySQL tiers store many petabytes of data that have been compressed with InnoDB table compression. We are always looking for ways to improve compression and the LSM algorithm used by RocksDB has several advantages over the B-Tree used by InnoDB. This led us to MyRocks: RocksDB is a key-value storage engine. MyRocks implements that MySQL storage engine API to make RocksDB work with MySQL and provide SQL functionality. Our initial goal was to get 2x more compression from MyRocks than from compressed InnoDB without affecting read performance. We exceeded our goal. In addition to getting 2x better compression, we also got much lower write rates to storage, faster database loads, and better performance.

Lower write rates enable the use of lower endurance flash, and faster loads simplify the migration from MySQL on InnoDB to MySQL on RocksDB. While we don't expect better performance for all workloads, the way in which we operate the database tier for the initial MyRocks deployment favors RocksDB more than InnoDB. Finally, there are features unique to an LSM that we expect to support in the future, including the merge operator and compaction filters. MyRocks can be helpful to the MySQL community because of efficiency and innovation.

We considered multiple write-optimized database engines. We chose RocksDB because it has excellent performance and efficiency and because we work directly with the team. The MyRocks effort has benefited greatly from being able to collaborate on a daily basis with the RocksDB team. We appreciate that the RocksDB team treats us like a very important customer. They move fast to make RocksDB better for MyRocks.

How was MyRocks developed?

MyRocks is developed by engineers from several locations across the globe. The team had the privilege to work with Sergey Petrunia right from the beginning, and he is based in Russia. At Facebook's Menlo Park campus, Siying Dong leads RocksDB development and Yoshinori Matsunobu leads the collaboration with MySQL infrastructure and data performance teams. From the Seattle office, Herman Lee worked on the initial validation of MyRocks that gave the team the confidence to proceed with MyRocks for our user databases as well as led the MyRocks feature development. In Oregon, Mark Callaghan has been benchmarking all aspects of MyRocks and RocksDB, which has helped developers prioritize performance improvements and feature work. Since the rollout began, the entire database engineering team has been helping to make MyRocks successful by developing high-confidence testing, improving MySQL rollout speed, and addressing other issues. At the same time, the MySQL infrastructure and data performance teams worked to adapt our automation around MyRocks.

What gave Facebook the confidence to move to MyRocks in production?

Much of our early testing with the new storage engine was running the Linkbench benchmark used to simulate Facebook's social graph workload. While these results were promising, we could not rely completely on them to make a decision. In order for MyRocks to be compelling for our infrastructure, MyRocks needed to reduce space and write rates by 50% compared with InnoDB on production workloads.

Once we supported enough features in MyRocks, we created a MyRocks test replica from a large production InnoDB server. We built a tool to duplicate the read and write traffic from the production InnoDB server to the MyRocks test replica. Compared with compressed InnoDB, we confirmed that MyRocks used half the space and reduced the storage write rate by more than half while providing similar response times for read and write operations.

We ran tests where we consolidated two InnoDB production servers onto a single MyRocks server and showed that our hardware can handle the double workload. This was the final result we needed to show that MyRocks is capable of reducing our server requirements by half and gave us the confidence that we should switch from InnoDB to MyRocks.

What approach did Facebook take for deploying MyRocks in production?

Moving to a new storage engine for MySQL comes with some risk and requires extensive testing and careful planning. Starting the RocksDB deployment with our user databases that store the social graph data may seem counterintuitive. However, the team chose to go this route because of two mutually reinforcing reasons:

  1. Based on benchmark and production experiments, the efficiency gains were significant enough and proportional to the scale of the deployment
  2. The workload on our user database tier is relatively simple, well known, and something our engineering team could easily reason about as most of it comes from our TAO Cache.

The benefits we expect as well as further details on the MyRocks project can be found in Yoshinori's post.

Both MyRocks and MariaDB are open source projects that are made stronger with community involvement. How will it help MyRocks when MariaDB releases a supported version? How would you like to see the community get more involved?

We expect MyRocks to get better faster when it is used beyond Facebook. But for that to happen it needs to be in a distribution like MariaDB Server that has great documentation, expert support, a community, and many power users. The community brings more skills, more use cases, and more energy to the MyRocks effort. We look forward to getting bug reports when MyRocks doesn't perform as expected, feature requests that we might have missed, and pull requests for bug fixes and new features.

I am most excited about attending conference talks about MyRocks presented by people who don't work at Facebook. While I think it is a great storage engine, the real test is whether other people find it useful — and hopefully useful enough that they want to talk and write about it.