Institute for Health Metrics and Evaluation (IHME)

The Challenge

Commissioned by the World Bank, the Global Burden of Disease (GBD) project, led by the Institute for Health Metrics and Evaluation (IHME), serves as the most comprehensive effort to date to analyze disability and death from a multitude of causes worldwide. GBD results are developed through many internal processes and pipelines that rely on a MySQL-compliant infrastructure. Unfortunately, the choice to adopt MySQL was made at a time when the scope of work was much smaller, and the growth in data to today’s size was unimaginable:

In 2010, there were approximately 2 billion data points
By 2016, results exceeded 10 billion per table

When IHME was faced with the task of processing petabytes of health data and analyzing it, they needed a cost-effective, scalable and flexible data warehouse and a Big Data analytical engine. That challenge? Taking over 10 billion data points and processing the information so it can be submitted to the GBD project and analyzed through multiple lenses.

Deciding Factors: Multi-Host Sharding, MySQL Compatibility, Columnar Storage Engine

In the beginning, IHME invested significant time and resources into performance-tuning the legacy database environment to handle the query and data-loading traffic from their 16,000-CPU-core high-performance computing cluster. But eventually, issues developed surrounding the platform’s ability to handle continued, exponential growth of data.

When evaluating solutions, key criteria included cost-effectiveness, a MySQL-compatible interface and, ideally, an open source offering with community-driven input and contributions. What’s more, IHME did not want a solution that would require its team to rewrite vast amounts of research application code.

A couple of top contenders were MemSQL and MySQL 5.7, but in the end, neither one made the cut. While MemSQL performed extremely well at scale, it had a number of non-standard constructs for database design and implementation, and lacked definable security mechanisms for authentication and authorization with their non-commercial product. MySQL 5.7 offered higher benchmark speeds for ingest and query optimization, but the platform could not support transformational improvements to scalability and workload efficiency.

In particular, IHME needed a platform that could leverage multi-host sharding with multi-threaded software. Its database team is small, and while building a sharding infrastructure is reasonably straightforward, IHME knew it wouldn’t be realistic to ask the development staff to make the applications shard-aware. For IHME, the decision to select MariaDB for analytics – with its powerful open source storage engine – was an easy one.

MariaDB Platform and ColumnStore Solve Volume and Scale Problems

MariaDB Platform, a hybrid transactional/analytical database, delivers the following key benefits to IHME:

Higher performance

MariaDB Platform’s columnar storage engine reduces disk I/O, making it much faster than row-based storage for read-intensive analytic workloads on large datasets. Having quick access to results of aggregate functions allowed for better analysis of the data.

Improved scalability and efficiency

The MariaDB ColumnStore storage engine provides advanced data compression, which enabled IHME to compress data to approximately 40% of the original – thereby reducing the hardware needed for its massive data deployments.

Better security

MariaDB ColumnStore accesses all the same security capabilities delivered in MariaDB Server, including encryption of data in motion, role-based access control, and audit features.

Faster analysis

MariaDB ColumnStore lets IHME leverage SQL to bring transactional and analytic workloads into a single enterprise-grade system. It simplifies enterprise administration and execution with a standard SQL front end for OLTP and analytics. Therefore, IHME didn’t have to change any existing application code that leverages standard SQL.

MariaDB AX, with the ColumnStore storage engine, solved both a volume and scale problem in our environment that allows us to seamlessly handle both current and planned increases in workload.

Andrew Ernst, Assistant Director, Infrastructure, IHME

Global Health Initiative Relies on MariaDB to Manage and Analyze Multi-Billion-Row Tables

The Challenge

Deciding Factors: Multi-Host Sharding, MySQL Compatibility, Columnar Storage Engine

MariaDB Platform and ColumnStore Solve Volume and Scale Problems

Additional Resources

Why choose a columnar database for modern analytics

IHME Believes Open Source MariaDB ColumnStore Is The Future of Data Warehousing

MariaDB Analytics Tutorial: 5 Steps to Get Started in 10 Minutes