Commissioned by the World Bank, the Global Burden of Disease (GBD) project, led by the Institute for Health Metrics and Evaluation (IHME), serves as the most comprehensive effort to date to analyze disability and death from a multitude of causes worldwide. GBD results are developed through many internal processes and pipelines that rely on a MySQL-compliant infrastructure. Unfortunately, the choice to adopt MySQL was made at a time when the scope of work was much smaller, and the growth in data to today’s size was unimaginable:
When IHME was faced with the task of processing petabytes of health data and analyzing it, they needed a cost-effective, scalable and flexible data warehouse and a Big Data analytical engine. That challenge? Taking over 10 billion data points and processing the information so it can be submitted to the GBD project and analyzed through multiple lenses.
In the beginning, IHME invested significant time and resources into performance-tuning the legacy database environment to handle the query and data-loading traffic from their 16,000-CPU-core high-performance computing cluster. But eventually, issues developed surrounding the platform’s ability to handle continued, exponential growth of data.
When evaluating solutions, key criteria included cost-effectiveness, a MySQL-compatible interface and, ideally, an open source offering with community-driven input and contributions. What’s more, IHME did not want a solution that would require its team to rewrite vast amounts of research application code.
A couple of top contenders were MemSQL and MySQL 5.7, but in the end, neither one made the cut. While MemSQL performed extremely well at scale, it had a number of non-standard constructs for database design and implementation, and lacked definable security mechanisms for authentication and authorization with their non-commercial product. MySQL 5.7 offered higher benchmark speeds for ingest and query optimization, but the platform could not support transformational improvements to scalability and workload efficiency.
In particular, IHME needed a platform that could leverage multi-host sharding with multi-threaded software. Its database team is small, and while building a sharding infrastructure is reasonably straightforward, IHME knew it wouldn’t be realistic to ask the development staff to make the applications shard-aware. For IHME, the decision to select MariaDB for analytics – with its powerful open source storage engine – was an easy one.
MariaDB Platform, a hybrid transactional/analytical database, delivers the following key benefits to IHME:
Higher performance
MariaDB Platform’s columnar storage engine reduces disk I/O, making it much faster than row-based storage for read-intensive analytic workloads on large datasets. Having quick access to results of aggregate functions allowed for better analysis of the data.
Improved scalability and efficiency
The MariaDB ColumnStore storage engine provides advanced data compression, which enabled IHME to compress data to approximately 40% of the original – thereby reducing the hardware needed for its massive data deployments.
Better security
MariaDB ColumnStore accesses all the same security capabilities delivered in MariaDB Server, including encryption of data in motion, role-based access control, and audit features.
Faster analysis
MariaDB ColumnStore lets IHME leverage SQL to bring transactional and analytic workloads into a single enterprise-grade system. It simplifies enterprise administration and execution with a standard SQL front end for OLTP and analytics. Therefore, IHME didn’t have to change any existing application code that leverages standard SQL.