On Monday, October 20, 2025, the internet felt like it was stuck in slow motion. Apps froze, websites failed to load, and even a Premier League soccer game was disrupted. The culprit? A massive Amazon Web Services (AWS) outage that rippled across global industries, impacting everything from social media platforms like Snapchat to British banking systems and ticket sales for Tottenham Hotspur’s home games.
This AWS outage wasn’t just another “brief glitch.” Downdetector logged over 17 million user reports across 60 countries, a 970% surge over baseline traffic. It was the largest internet outage since Meta’s 2021 BGP failure. But what exactly went wrong? The answer reveals both the complexity and fragility of cloud dependency today.
When the Cloud’s Beating Heart Faltered
At approximately 07:11 GMT, AWS’s US-EAST-1 region in Northern Virginia, its oldest and most critical, faltered. The region handles billions of API calls daily, powering services from Alexa to global banking apps. During a scheduled update to DynamoDB’s API, a misconfiguration triggered cascading Domain Name System (DNS) errors.
DNS acts as the internet’s “phonebook,” translating human-readable domain names into the numeric IP addresses computers use to reach each other. When AWS’s DNS resolution buckled, servers could no longer locate their destinations. That single failure immobilized interconnected services across sectors, from Spotify streams to smart home systems.
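To make the failure mode concrete, here is a minimal Python sketch of how an application might degrade gracefully when DNS resolution fails. The hostname and cached address are hypothetical placeholders; this illustrates the general pattern, not how any of the affected services actually handle lookups.

```python
import socket

# Hypothetical last-known-good address cache. In practice this might be
# populated from a previous successful lookup or a local config file.
FALLBACK_CACHE = {"api.example-service.com": "93.184.216.34"}

def resolve_with_fallback(hostname: str, port: int = 443) -> str:
    """Resolve a hostname, falling back to a cached IP if DNS is down."""
    try:
        # Normal path: ask DNS (the internet's "phonebook") for the address.
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        return infos[0][4][0]
    except socket.gaierror:
        # DNS resolution failed -- the situation services faced during the outage.
        cached = FALLBACK_CACHE.get(hostname)
        if cached:
            return cached  # degrade gracefully with a stale but usable address
        raise  # no cached address: the service is effectively unreachable

if __name__ == "__main__":
    print(resolve_with_fallback("api.example-service.com"))
```

Most applications never build even this much of a safety net, which is why a DNS failure upstream translated so directly into blank screens downstream.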
Within two hours, 113 AWS services were impacted. The knock-on effects crippled essential apps like Venmo, Reddit, and Duolingo, as well as core retail and logistics systems under Amazon’s own umbrella. By 10:11 GMT, most services were restored, but the internet’s backbone had just been exposed as dangerously centralized.
Why One Region Brought Down the World
The AWS outage began in US-EAST-1, but its effects were global. This region functions as AWS’s nerve center, where numerous customer apps—by design or oversight—host their default workloads. Despite widespread recommendations for multi-region redundancy, many businesses never implemented it, primarily due to cost and architectural inertia.
This left organizations vulnerable to single points of logical failure, as highlighted in Ookla’s post-outage report. Even the most resilient infrastructures can “topple under concentration risk” when overdependent on one provider—or worse, one region.
Experts compare the event to a financial-system collapse: the cloud is now systemic infrastructure. As Cornell’s Ken Birman told the BBC, “Companies using Amazon haven’t been taking enough care to build protection systems into their applications.” In simpler terms, convenience won out over design caution.
AWS Outage: Real-World Impacts Across Industries
The outage’s impact was staggering, cutting across critical verticals and even spilling into daily life:
- Finance: Trading apps such as Robinhood and Coinbase went offline. UK banks like Lloyds and Halifax saw mobile banking failures, impacting transactions and customer support wait times.
- Aviation and Transport: Airlines like United and Delta experienced check-in system slowdowns, delaying flights and disrupting travel management systems reliant on real-time API calls.
- Sports and Entertainment: The Premier League’s VAR systems and Tottenham’s ticketing operations were disrupted, while North American fans faced access issues for MLB and NFL events. Betting marketplaces such as FanDuel and DraftKings went dark—locking users out mid-wager.
- Consumer Apps: Fitness app Strava was offline for hours, prompting the quip, “If it’s not on Strava, did it even happen?” Meanwhile, smart bed maker Eight Sleep reported that some users’ beds froze mid-adjustment after losing connectivity to the company’s cloud backend.
Each of these cases reflects the intricate dependencies hidden beneath every digital interaction today.
A Deep Dive Into the Root Cause
AWS later confirmed that a faulty automation process compounded the DNS issue. This automation glitch prevented correct synchronization between internal monitoring subsystems and external API endpoints. When the automation began deploying its update to DynamoDB’s API, an error created a recursive feedback loop—essentially directing servers to “nowhere.”
Compounding the chaos, the network health monitor—a subsystem responsible for checking service stability—malfunctioned simultaneously. This led AWS systems to interpret healthy services as failing ones, triggering unnecessary shutdowns and throttles.
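As a purely illustrative sketch, and not AWS’s actual monitoring code, the snippet below shows how a probe that misreads healthy services can push a control loop into throttling things that were never broken. The service names and logic are hypothetical.

```python
# Illustrative only: a faulty health probe inverts the real state of a service,
# so the control loop throttles services that are actually healthy.

def broken_health_probe(service: str, truly_healthy: bool) -> bool:
    """A malfunctioning probe that reports the opposite of the real state."""
    return not truly_healthy

def control_plane_decision(service: str, truly_healthy: bool) -> str:
    """Act only on the probe's verdict, never on the real state."""
    if broken_health_probe(service, truly_healthy):
        return "keep serving"
    # A healthy service flagged as failing gets throttled or shut down --
    # the kind of amplification that turns a DNS error into a wider outage.
    return "throttle or shut down"

# Hypothetical service names for illustration.
for svc in ("dynamodb-api", "internal-dns", "metrics-pipeline"):
    print(f"{svc}: {control_plane_decision(svc, truly_healthy=True)}")
```

The lesson is less about any single bug and more about coupling: when the component that decides what is healthy shares a failure mode with the thing it monitors, errors reinforce each other instead of cancelling out.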
While AWS restored operations within hours, recovery for customers took much longer. Large enterprises with clustered workloads faced latency spikes for up to 10 hours, while smaller SaaS platforms dependent on DynamoDB replication experienced data synchronization issues for over a day.
Lessons in Cloud Vulnerability
The October 2025 AWS outage underscored one uncomfortable truth: resilience isn’t evenly distributed across the internet. For all its sophistication, cloud architecture remains prone to cascading failure.
Experts say “the cloud’s convenience culture” feeds complacency. Many organizations migrate workloads without redesigning for distributed resilience. The assumption that AWS—or any major hyperscaler—offers infinite uptime is now outdated.
Wired’s analysis summed it up bluntly: outages like these are “nearly unavoidable given the scale,” but the duration of this incident “serves as a warning.” The takeaway isn’t just that outages happen—it’s about how long they hurt.
The Customer Experience Fallout
From a customer experience (CX) lens, this outage was an acid test of trust. When users can’t access daily essentials—banking, communication, travel—the emotional impact far outweighs technical downtime metrics.
For affected companies, the CX failure wasn’t just about service interruption. It shattered perceived reliability, a key component of customer loyalty. Consider Duolingo promising to preserve “learning streaks” to maintain user continuity, or Lloyds Bank deploying heavy frontline resources to handle complaints. In both cases, perception management was as critical as restoration.
This event also exposed the experience silos within cloud-dependent companies. Many support teams lacked real-time visibility into upstream issues, leading to misinformation or silence—two deadly CX sins during a crisis.

What CX and EX Leaders Can Learn
Outages like this one aren’t mere operational footnotes—they’re defining moments for brand trust. For CX and EX leaders, the AWS incident offers several actionable takeaways:
- Map Your Cloud Dependencies: Document exactly which AWS services your apps depend on. Visibility is the first step toward resilience.
- Design for Multi-Region Continuity: Use geographically distributed backups and “failover” regions, and consider a hybrid-cloud or multi-cloud approach to mitigate concentration risk (see the failover sketch after this list).
- Empower Internal Communication: Train frontline experience teams to handle outage crises with clarity and empathy. Silence or scripted apologies deepen frustration.
- Build Transparent Status Ecosystems: Automatic, real-time public dashboards restore customer confidence faster than PR statements.
- Partner Beyond Cloud Reliance: Diversify critical workloads. The future CX strategy must include resilience at the infrastructure level.
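As a concrete starting point for the multi-region item above, here is a minimal Python sketch of a read path that fails over from a primary region to a replica. The table name and region pair are hypothetical, and it assumes the data is already replicated across regions (for example, via a DynamoDB global table); treat it as a pattern to adapt, not a drop-in solution.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

# Hypothetical table name and region pair -- adapt to your own topology.
TABLE_NAME = "customer-profiles"
REGIONS = ["us-east-1", "us-west-2"]  # primary first, failover second

def get_item_with_failover(key: dict) -> dict | None:
    """Read from the primary region, falling back to a replica region.

    Assumes the table is replicated (e.g., a DynamoDB global table), so the
    same data is readable from both regions.
    """
    last_error = None
    for region in REGIONS:
        try:
            table = boto3.resource("dynamodb", region_name=region).Table(TABLE_NAME)
            response = table.get_item(Key=key)
            return response.get("Item")
        except (ClientError, EndpointConnectionError) as exc:
            last_error = exc  # regional failure: try the next region
    raise RuntimeError("All configured regions failed") from last_error

# Example usage: get_item_with_failover({"customer_id": "12345"})
```

Even a simple wrapper like this changes the conversation during an outage: the question becomes “which region are we serving from?” rather than “are we down?”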
Turning Outage Chaos Into Strategic Insight
The 2025 AWS outage wasn’t just a technical meltdown—it was a mirror reflecting organizational overdependence, fragile continuity design, and reactive communication.
Brands that learn from it will reimagine experience ecosystems not just for uptime but for graceful degradation—the ability to fail smartly. As digital-first experiences increasingly define brand equity, designing for “contained failure” becomes a competitive advantage.
Resilience, after all, isn’t about preventing the next outage. It’s about building systems—and customer relationships—that can bend without breaking.
For every CX and EX professional, that’s the real wake-up call.
