AWS Introduces DNS Failover for US East Region—Acknowledging Its Reliability Problem
Amazon Web Services has launched a new feature that lets customers make DNS changes within 60 minutes of a service disruption in its US East (N. Virginia) region. The announcement tacitly acknowledges what many have long observed: AWS's largest and most critical region has a reliability problem.

The New Feature
Amazon Route 53 Accelerated Recovery for managing public DNS records provides a 60-minute recovery time objective (RTO) during service disruptions specifically in the US East region. The feature maintains access to essential Route 53 API operations during regional outages, including ChangeResourceRecordSets, GetChange, ListHostedZones, and ListResourceRecordSets.
AWS frames this as responding to customer needs, particularly from regulated industries like banking, FinTech, and SaaS organisations requiring confidence they can make DNS changes during unexpected regional disruptions. This allows them to quickly provision standby cloud resources or redirect traffic when needed.
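To make the "redirect traffic" scenario concrete, here is a minimal sketch of a DNS change built around two of the operations named above, ChangeResourceRecordSets and GetChange. It assumes a client object shaped like the boto3 Route 53 client (for example `boto3.client("route53")`); the record name, standby address, hosted zone ID, and helper function names are all hypothetical and chosen only for illustration. Nothing here is specific to accelerated recovery, since AWS says existing scripts work unmodified.

```python
import time


def build_failover_change(record_name, standby_ip, ttl=60):
    """Build a ChangeBatch that points an A record at a standby address.

    record_name and standby_ip are illustrative placeholders.
    """
    return {
        "Comment": "Redirect traffic to standby during a regional disruption",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record or overwrite it
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_ip}],
                },
            }
        ],
    }


def redirect_and_wait(client, hosted_zone_id, change_batch,
                      poll_seconds=10, max_polls=30):
    """Submit the change, then poll GetChange until the status is INSYNC.

    `client` is assumed to expose change_resource_record_sets and get_change
    with the same keyword arguments as the boto3 Route 53 client.
    Returns (change_id, last_seen_status); the status may still be PENDING
    if max_polls is exhausted.
    """
    response = client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )
    change_id = response["ChangeInfo"]["Id"]
    status = response["ChangeInfo"]["Status"]
    polls = 0
    while status != "INSYNC" and polls < max_polls:
        time.sleep(poll_seconds)
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        polls += 1
    return change_id, status
```

Keeping the payload builder separate from the API call means the change can be prepared, reviewed, and stored in a runbook ahead of time, so that during an incident the only step left is submitting it.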
The implementation is straightforward. Customers can enable accelerated recovery through the AWS Management Console, CLI, SDKs, or infrastructure-as-code tools like CloudFormation and CDK. There's no additional cost, and it works with existing Route 53 setups without requiring application or script modifications.
What the Feature Actually Means
The 60-minute RTO reveals an uncomfortable truth: AWS expects US East disruptions severe enough to prevent DNS changes for up to an hour. During that time, customers cannot provision new infrastructure or redirect traffic flows—leaving applications vulnerable and businesses unable to respond to crises.
Sixty minutes is a long time during an outage. For organisations running mission-critical applications, an hour without the ability to modify DNS records or provision failover infrastructure can translate into significant financial losses, reputational damage, and regulatory compliance failures.
The feature targets "DNS changes that customers can make within 60 minutes of a service disruption" rather than guaranteeing immediate availability during problems. This language suggests AWS anticipates scenarios where even this backstop capability takes time to activate.
US East's Troubled History
The mere existence of this feature speaks volumes about US East's reliability track record. Recent problems include the DynamoDB DNS failure on 20th October that brought down services globally, followed by VM problems days later. Significant outages occurred in 2021 and 2023 as well.
As far back as 2022, analyst firm Gartner warned customers that US East represents a weak point in AWS that impairs its ability to handle crises. Despite this warning and repeated incidents, problems have continued.
AWS has previously told The Register that US East isn't less reliable than other regions, but that it operates at such colossal scale that it stresses cloud services more severely than smaller installations do. This explanation essentially admits that size creates reliability challenges AWS hasn't fully solved.
The Timing
Less than six weeks after an especially severe US East outage earned AWS substantial criticism, the cloud giant has found a way to increase resilience. The timing suggests the October DynamoDB failure—which affected services worldwide despite many running in other regions—prompted AWS to prioritise this capability.
That incident exposed how US East's role as home to AWS's common control plane creates systemic vulnerability. Even organisations running workloads exclusively in European regions experienced failures because critical management functions depend on US East infrastructure.
What This Doesn't Solve
Accelerated recovery addresses one specific problem: maintaining the ability to make DNS changes during US East disruptions. It doesn't prevent those disruptions from occurring, doesn't reduce their blast radius, and doesn't address the fundamental architectural decisions that make US East so critical to AWS's global operations.
Customers still face exposure to US East failures affecting services beyond DNS management. The control plane functions, global services, and interdependencies that caused October's cascade of failures remain unchanged. This feature provides a limited backstop for one specific capability rather than comprehensive resilience.
The 60-minute RTO also means organisations must still plan for substantial periods without DNS management capability during severe incidents. Business continuity planning cannot assume immediate failover—there's still a window where critical changes cannot be made.
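For continuity planning, it can help to put a number on that window. The sketch below adds the 60-minute RTO from the announcement to two values that are assumptions for illustration only: the time a submitted change takes to reach Route 53's name servers, and the record's TTL, which governs how long resolvers may keep serving the old answer. The class and field names are hypothetical, and the defaults should be replaced with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class DnsRecoveryPlan:
    # From the announcement: DNS API access is restored within 60 minutes.
    api_rto_minutes: float = 60.0
    # Assumption: time for a submitted change to reach the name servers.
    change_propagation_minutes: float = 1.0
    # Assumption: TTL on the record being changed, in seconds.
    record_ttl_seconds: int = 300

    def worst_case_minutes(self) -> float:
        """Upper bound on time until clients see the new record."""
        return (
            self.api_rto_minutes
            + self.change_propagation_minutes
            + self.record_ttl_seconds / 60.0
        )


def meets_target(plan: DnsRecoveryPlan, target_minutes: float) -> bool:
    """True if the worst-case window fits within the business target."""
    return plan.worst_case_minutes() <= target_minutes


plan = DnsRecoveryPlan()
print(plan.worst_case_minutes())  # 66.0
print(meets_target(plan, 60))     # False
```

Under these assumed values the worst case exceeds an hour, which is why a recovery target tighter than the RTO needs a strategy that doesn't wait on a DNS change.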
The Broader Context
This announcement fits a pattern where cloud providers add resilience features reactively after high-profile failures rather than proactively architecting systems to prevent such failures. The October DynamoDB incident demonstrated that architectural decisions made years ago create systemic vulnerabilities that bolt-on features cannot fully address.
For customers, accelerated recovery represents welcome additional protection but doesn't eliminate US East risk. Organisations running mission-critical applications must still assume US East failures will occur and plan accordingly—including potential 60-minute windows without DNS management capability.
What Customers Should Do
Enable the feature. There's no cost and no downside to activating accelerated recovery for Route 53 hosted zones. It provides additional protection during US East disruptions even if it doesn't eliminate all risk.
Don't rely on it exclusively. Accelerated recovery is a backstop, not a comprehensive solution. Business continuity planning should still account for US East failures and include strategies that don't depend solely on rapid DNS changes.
Understand the limitations. A 60-minute RTO means substantial potential for service disruption before recovery capabilities activate. Plan for scenarios where you cannot make DNS changes for up to an hour during severe incidents.
Consider multi-region and multi-cloud strategies. Whilst accelerated recovery helps, true resilience may require architectures that don't depend so heavily on any single region or cloud provider.
Monitor AWS's US East investments. This feature acknowledges reliability concerns but doesn't address root causes. Watch for announcements about architectural improvements that might reduce US East's systemic importance.
The Uncomfortable Reality
AWS's launch of DNS failover capabilities specifically for US East represents an implicit admission that the region's reliability falls short of customer needs. The feature is welcome and provides genuine value, but its existence highlights the ongoing challenge of operating cloud infrastructure at unprecedented scale.
For organisations dependent on AWS, this announcement serves as a reminder that even the largest cloud providers face reliability challenges they haven't fully solved. Planning for cloud provider failures—not just individual service outages—remains essential for true business resilience.
Navigate Multi-Cloud Resilience
At Altiatech, we help organisations design cloud strategies that account for provider-level failures and regional vulnerabilities. Our cloud services expertise spans AWS, Azure, and Google Cloud Platform, enabling architectures that maintain operations even when individual regions or providers experience disruptions.
From multi-region deployment strategies to genuine multi-cloud architectures, we provide the expertise needed to build resilience commensurate with your business requirements.
Get in touch:
📧 Email: innovate@altiatech.com
📞 Phone (UK): +44 (0)330 332 5482
Build resilience beyond single regions. Plan for reality.












