
AWS Outage Reminder: Business Continuity & Disaster Recovery

December 15, 2025 By Al Kao

The digital world came to a halt early on Monday, October 20, 2025, when a major technical failure at Amazon Web Services (AWS) triggered a significant outage. Fortunately, the incident was not the result of a sophisticated cyberattack but of a technical failure. Even so, the service disruption caused problems for companies worldwide.

Services such as Uber, Lyft, Coinbase, Fortnite, and Robinhood all went down during the outage, as did the UK’s tax payment systems. Some analysts are already forecasting that the cost of the service disruption could be “hundreds of billions.”

The initial cybersecurity reminder is that companies are built on fragile infrastructure: they rely too heavily on a small number of systems and tools that can themselves break and that spread contagion when they are disrupted.

However, that is more philosophical than practical. The reality for corporations and many SMBs is that AWS is a critical technical partner. For many SMBs, AWS underpins not only their own systems but also those of their third-party vendors.

For an SMB, a few hours of downtime can be catastrophic, leading to lost revenue, missed deadlines, damaged customer trust, and idle, frustrated staff.

The critical question this AWS outage exposes is not how to mitigate service disruption, but how to ensure business continuity.

When you migrate to AWS, you enter into a Shared Responsibility Model. AWS is responsible for the security of the cloud: the physical hardware, data centers, and global network. Your company is responsible for security in the cloud: your data, your configurations, your identity and access management, and, crucially, your application’s ability to survive a service degradation. A technical failure at AWS, whether localized or regional, becomes your business problem unless you have planned for it.

This is where Business Continuity (BC) and Disaster Recovery (DR) planning shifts from a compliance requirement to a survival necessity. DR focuses on the technical restoration of your systems, like getting your website back online, while BC ensures the entire business can continue functioning, such as processing sales or communicating with customers, during the downtime. The goal is simple: ensure your entire business infrastructure does not have a single point of failure tied to a massive cloud provider.

Resiliency Checklist: 7 Steps to Safeguard Your AWS Environment

Here is a practical checklist of IT and cybersecurity measures that an SMB can implement today to mitigate the impact of future AWS outages.

1. Embrace Multi-Availability Zone (Multi-AZ) Redundancy

Every critical resource, from your virtual servers (EC2) to your managed databases (RDS), should be deployed across at least two Availability Zones (AZs) within your primary AWS Region.

Mitigation: If a hardware or network issue takes down one AZ, traffic can fail over automatically to the healthy AZs, often with little or no application disruption.
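
As a rough illustration of how this rule could be audited, the sketch below scans a resource inventory and reports any tier that lives in fewer than two AZs. The inventory format (a list of dictionaries with "tier", "name", and "az" keys) and the function name are assumptions made for this example; they are not an AWS API, and real data would have to come from your own inventory export.

```python
from collections import defaultdict


def find_single_az_tiers(inventory, min_zones=2):
    """Return the tiers whose resources span fewer than `min_zones` AZs.

    `inventory` is a hypothetical list of dicts with "tier" and "az" keys.
    """
    zones_by_tier = defaultdict(set)
    for resource in inventory:
        zones_by_tier[resource["tier"]].add(resource["az"])
    return sorted(
        tier for tier, zones in zones_by_tier.items() if len(zones) < min_zones
    )


# Illustrative data only: the web tier spans two AZs, the database does not.
inventory = [
    {"tier": "web", "name": "web-1", "az": "us-east-1a"},
    {"tier": "web", "name": "web-2", "az": "us-east-1b"},
    {"tier": "database", "name": "db-1", "az": "us-east-1a"},
]

print(find_single_az_tiers(inventory))  # ['database']
```

A check like this only highlights gaps; closing them still means redeploying the flagged tier across a second AZ.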

2. Implement Cross-Region Backups for Disaster Recovery

If an entire AWS Region were to suffer a major failure, relying on backups stored in that same region is pointless.

Mitigation: Use services like AWS Backup or Amazon S3 replication to automatically copy all critical data and configuration snapshots to a secondary, geographically distant AWS Region or even an entirely separate cloud provider (a multi-cloud strategy).
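
A simple way to reason about this is to list where each backup copy lives and flag anything that exists only in the primary Region. The sketch below assumes a hypothetical mapping of dataset names to the Regions holding a copy; the dataset names and the function are illustrative, not part of AWS Backup or S3.

```python
def datasets_without_offsite_copy(backups, primary_region):
    """Return datasets that have no backup copy outside `primary_region`.

    `backups` is a hypothetical dict: dataset name -> list of Regions
    (or other locations) that hold a copy.
    """
    at_risk = []
    for dataset, regions in backups.items():
        has_offsite_copy = any(region != primary_region for region in regions)
        if not has_offsite_copy:
            at_risk.append(dataset)
    return sorted(at_risk)


# Illustrative data only.
backups = {
    "orders-db": ["us-east-1", "us-west-2"],
    "user-uploads": ["us-east-1"],
}

print(datasets_without_offsite_copy(backups, "us-east-1"))  # ['user-uploads']
```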

3. Define and Meet RTO and RPO Targets

A recovery strategy is only as good as its objectives. You must clearly define:

  • Recovery Time Objective (RTO): The maximum amount of time your business can tolerate being down (e.g., 2 hours).
  • Recovery Point Objective (RPO): The maximum amount of data loss your business can tolerate (e.g., 15 minutes of transactional data).

Mitigation: These metrics dictate your chosen DR architecture (Pilot Light, Warm Standby, etc.), ensuring you invest appropriately to meet your business’s true continuity needs.
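
The arithmetic behind these targets is straightforward: with periodic backups, the worst-case data loss is roughly one backup interval, and the recovery time is whatever you actually measure in a drill. The sketch below compares both against the example targets above. The function names and the sample backup interval and recovery time are assumptions for illustration.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst-case data loss is about one backup interval, so it must fit the RPO."""
    return backup_interval <= rpo


def meets_rto(measured_recovery_time, rto):
    """The recovery time measured in a drill must fit within the RTO."""
    return measured_recovery_time <= rto


# Targets taken from the examples above.
rto = timedelta(hours=2)
rpo = timedelta(minutes=15)

# Hypothetical current state: hourly backups, 90-minute restore in the last drill.
print(meets_rpo(timedelta(hours=1), rpo))     # False
print(meets_rto(timedelta(minutes=90), rto))  # True
```

In this hypothetical case the restore is fast enough, but hourly backups cannot satisfy a 15-minute RPO, so the backup frequency (or replication approach) would need to change.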

4. Test Your Disaster Recovery Plan Regularly

An untested plan is just a theory. SMBs must schedule and execute non-disruptive DR drills at least once a year, and ideally twice.

Mitigation: Testing validates that your failover mechanisms (like DNS updates or scaling rules) work as intended, and ensures your staff knows the procedures when panic sets in.
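
One lightweight way to make drills repeatable is to script the checklist so that each step is timed and recorded. The sketch below is a minimal runner under that assumption; the step names and check functions are placeholders that a real drill would replace with actual verifications (for example, confirming that a DNS record points at the standby).

```python
import time


def run_drill(steps):
    """Run each (name, check) pair and record pass/fail and elapsed seconds.

    A check passes if it returns a truthy value; an exception counts as a failure.
    """
    results = []
    for name, check in steps:
        started = time.monotonic()
        try:
            passed = bool(check())
        except Exception:
            passed = False
        elapsed = time.monotonic() - started
        results.append({"step": name, "passed": passed, "seconds": elapsed})
    return results


def restore_check():
    # Placeholder for a real restore verification that fails.
    raise RuntimeError("restore did not complete")


# Hypothetical drill steps.
steps = [
    ("DNS failover", lambda: True),
    ("Scaling rule fired", lambda: False),
    ("Database restore", restore_check),
]

for result in run_drill(steps):
    status = "PASS" if result["passed"] else "FAIL"
    print(f'{status} {result["step"]} ({result["seconds"]:.2f}s)')
```

Keeping the results from each drill gives you a record of whether recovery is getting faster or slower over time.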

5. Automate Scaling and Load Balancing

Human intervention during a massive outage is slow and prone to error. Automation is your best defense.

Mitigation: Use Auto Scaling Groups to automatically replace failed instances and Elastic Load Balancing (ELB) to intelligently distribute incoming customer traffic only to healthy application instances across all available zones.
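
To make the idea concrete, the sketch below simulates the two behaviors in plain Python: routing requests only to healthy instances (round-robin) and counting how many replacements are needed to get back to the desired capacity. It is a toy model of the concept, not the ELB or Auto Scaling API; the instance records and function names are assumptions for illustration.

```python
from itertools import cycle


def healthy_targets(instances):
    """Return the IDs of instances currently passing health checks."""
    return [instance["id"] for instance in instances if instance["healthy"]]


def assign_requests(request_ids, instances):
    """Distribute requests round-robin across healthy instances only."""
    targets = healthy_targets(instances)
    if not targets:
        raise RuntimeError("no healthy instances available")
    rotation = cycle(targets)
    return {request_id: next(rotation) for request_id in request_ids}


def replacements_needed(desired_capacity, instances):
    """How many instances must be launched to restore the desired capacity."""
    return max(0, desired_capacity - len(healthy_targets(instances)))


# Illustrative fleet: one of three instances has failed its health check.
instances = [
    {"id": "i-a", "healthy": True},
    {"id": "i-b", "healthy": False},
    {"id": "i-c", "healthy": True},
]

print(assign_requests(["r1", "r2", "r3"], instances))
# {'r1': 'i-a', 'r2': 'i-c', 'r3': 'i-a'}
print(replacements_needed(3, instances))  # 1
```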

6. Enforce Strong Identity and Access Management (IAM)

While a technical failure isn’t a cyberattack, poor security posture can still amplify the outage impact. Grant users and services only the permissions they absolutely need (the Principle of Least Privilege).

Mitigation: Restricting powerful permissions reduces the chance that a single, unauthorized user or a technical bug in one service can accidentally trigger a cascading configuration change across your entire environment. Multi-Factor Authentication (MFA) is mandatory for all administrative access.
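
As a starting point for a least-privilege review, the sketch below scans an IAM policy document (in the standard JSON structure with "Statement", "Effect", "Action", and "Resource") for Allow statements that combine a wildcard action with a wildcard resource. This is a simple heuristic written for illustration, with a hypothetical function name and example policy; it will not catch every overly broad grant.

```python
def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overbroad_statements(policy):
    """Return indexes of Allow statements with wildcard actions on all resources."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action and wildcard_resource:
            findings.append(index)
    return findings


# Illustrative policy: the second statement grants everything on everything.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

print(find_overbroad_statements(policy))  # [1]
```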

7. Create a Non-Technical Business Continuity Communication Plan

The true test of business continuity often lies outside the tech stack.

Mitigation: Prepare a non-technical plan detailing:

  • External Communication: Pre-written social media posts and emails to inform customers immediately.
  • Internal Communication: Instructions on how staff will communicate with each other (e.g., via a separate, non-cloud-dependent messaging app).
  • Manual Processes: Identify essential, revenue-generating functions that can temporarily revert to manual, paper, or local-system processes until cloud services are restored.

Conclusion: Resilience is a Strategy, Not a Feature

The recent AWS outage served as a crucial wake-up call. It shows that no cloud infrastructure, regardless of its scale, is immune to failure. For SMBs, adopting a resilient architecture and establishing a documented, tested Business Continuity and Disaster Recovery plan is not an optional add-on; it is a core business strategy.

By implementing this checklist, you stop relying solely on a vendor’s promise of uptime and take full control over your business’s ability to withstand and quickly recover from the inevitable next disruption.
