Key takeaways

  • Building a resilient database architecture is more than a technical “checkbox”; during a regional disaster, it can make the difference between a major service failure and continuity your users never notice.
  • True mission-critical reliability is achieved when your systems can automatically handle outages across different datacenters without requiring any immediate human intervention.
  • Strategic disaster recovery lets your technical team handle the “heavy lifting” of data migration and stabilization, freeing your leadership to focus on the organization’s most important asset: its people.

On March 1, several AWS datacenters went down, creating havoc in the Middle East. As a result, services across the region went offline, from taxi services to banking applications. We also hosted multiple systems in those AWS datacenters, and we faced a wall of red alerts from all the affected systems and services. Still, our customer continued to run their web and mobile applications without any interruption. Let me explain why.

Understanding What Is Important

Our customer, a well-known brand in the Middle East, depends on web and mobile operations for 40% of their business. In other words, those operations are mission critical to them. That is why they designed their web and mobile applications, including the databases, to be fully resilient. We helped them design their database hosting architecture, covering both global delivery with sufficient performance for their usual business activities and the ability to withstand outages.

Our relational database software replicates data across multiple nodes, providing both performance and resilience. We designed the deployment to be available in other regions as well rather than relying on a single datacenter, and we ensured that backups were in place to restore their environments if needed.
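To illustrate why a multi-region layout removes the need for immediate human intervention, here is a minimal Python sketch of region-aware failover selection. It is not MariaDB's implementation; in a real deployment this decision is made by the database cluster or proxy layer. The `Node` class, the `pick_primary` function, and all node and region names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A database replica. Names and regions here are hypothetical."""
    name: str
    region: str
    healthy: bool


def pick_primary(nodes: Sequence[Node], region_priority: Sequence[str]) -> Optional[Node]:
    """Return the first healthy node, trying regions in priority order.

    Returns None when no healthy node exists in any listed region,
    which is the point where a new replica (or a human) is needed.
    """
    for region in region_priority:
        for node in nodes:
            if node.region == region and node.healthy:
                return node
    return None


if __name__ == "__main__":
    # Simulate losing every node in the home region.
    cluster = [
        Node("db-1", "region-a", healthy=False),
        Node("db-2", "region-a", healthy=False),
        Node("db-3", "region-b", healthy=True),
    ]
    primary = pick_primary(cluster, ["region-a", "region-b"])
    print(primary.name if primary else "no healthy node")  # db-3
```

Because the home region is listed first, traffic stays local while that region is healthy and moves to the other region only when every local node is down.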

Supporting the Human Element

In any crisis, a company’s first priority is its people. We recognized that while our customer’s database was our focus, their internal teams were rightfully focused on the safety and well-being of their staff.

When the outage occurred, no human intervention was required to keep operations running: our software handled all of that. However, it soon became clear that the outage would last for an extended period, so the decision was made to set up replication in another region.

Our role was to handle the heavy lifting so our customer didn’t have to. Our support and professional services teams stepped in immediately to:

  • Monitor and Stabilize: ensure the remaining environments handled the shifted load without issues.
  • Rapid Provisioning: stand up an entirely new replica environment in a safe region, migrating 1.9 TB of critical data and restoring full redundancy within 48 hours.
  • Operational Peace of Mind: provide the technical assurance that allowed our customer to focus on their humanitarian priorities.
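The 48-hour figure also gives a rough sense of the transfer rate involved. This short Python calculation assumes the 1.9 figure means decimal terabytes (10^12 bytes) and treats the whole 48-hour window as transfer time; if part of that window went to provisioning, the actual transfer rate was higher.

```python
# Average rate needed to move the data within the stated window.
# Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
data_bytes = 1.9e12           # 1.9 TB of data
window_seconds = 48 * 3600    # 48 hours

avg_bytes_per_s = data_bytes / window_seconds
avg_mb_per_s = avg_bytes_per_s / 1e6          # megabytes per second
avg_mbit_per_s = avg_bytes_per_s * 8 / 1e6    # megabits per second

print(f"{avg_mb_per_s:.1f} MB/s, about {avg_mbit_per_s:.0f} Mbit/s")
# prints: 11.0 MB/s, about 88 Mbit/s
```

Estimating this rate ahead of time helps when planning how quickly a lost replica can be rebuilt from a surviving region.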

The Value of Being Prepared

The AWS outage is a reminder that “Disaster Recovery” isn’t just a checkbox in a contract; it is a promise of continuity to your end users. Whether it’s a regional infrastructure failure or a technical glitch, the goal is the same: to make the underlying technology invisible so the service remains invincible.

At MariaDB, we are proud to stand behind the brands that keep the world moving, ensuring that even in the most challenging times, their service never falters. 

You may not be able to prevent disaster, but you can certainly prepare for it. It is up to you: do you want to be remembered for your service or for your failure to deliver?