AWS Introduces DNS Failover for US East Region—Acknowledging Its Reliability Problem

November 28, 2025

Amazon Web Services has launched a new feature that lets customers make DNS changes within 60 minutes of a service disruption in its US East (N. Virginia) region. The announcement tacitly acknowledges what many have long observed: AWS's largest and most critical region has a reliability problem.

The New Feature

Amazon Route 53 Accelerated Recovery for managing public DNS records provides a 60-minute recovery time objective (RTO) for service disruptions in the US East region specifically. During a regional outage, the feature keeps essential Route 53 API operations available, including ChangeResourceRecordSets, GetChange, ListHostedZones, and ListResourceRecordSets.


AWS frames this as a response to customer needs, particularly from banking, FinTech, and SaaS organisations in regulated industries that need confidence they can make DNS changes during unexpected regional disruptions. This allows them to quickly provision standby cloud resources or redirect traffic when needed.
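
To make "redirect traffic" concrete, here is a minimal Python sketch built on two of the API operations named above: ChangeResourceRecordSets to repoint a record at a standby endpoint, then GetChange polled until Route 53 reports the change as INSYNC. It assumes a boto3-style Route 53 client (for example boto3.client("route53")) is passed in by the caller; the record name, IP address, hosted zone ID and polling intervals are hypothetical placeholders, and this is an illustration rather than an AWS reference implementation.

```python
import time
from typing import Any, Dict


def build_upsert_batch(record_name: str, record_type: str, value: str,
                       ttl: int = 60) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch that creates or overwrites one record."""
    return {
        "Comment": f"Redirect {record_name} to standby endpoint",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or replace it if present
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": record_type,
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": value}],
                },
            }
        ],
    }


def redirect_and_wait(client: Any, zone_id: str, batch: Dict[str, Any],
                      poll_seconds: float = 10.0,
                      timeout_seconds: float = 600.0) -> str:
    """Submit a change via ChangeResourceRecordSets, then poll GetChange.

    `client` is assumed to be a boto3-style Route 53 client. Returns the
    final status ("INSYNC") or raises TimeoutError if it never arrives.
    """
    resp = client.change_resource_record_sets(HostedZoneId=zone_id,
                                              ChangeBatch=batch)
    change_id = resp["ChangeInfo"]["Id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        if status == "INSYNC":  # change has propagated to Route 53 DNS servers
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"change {change_id} still {status}")
        time.sleep(poll_seconds)
```

With a real client, a call might look like redirect_and_wait(boto3.client("route53"), "Z_EXAMPLE", build_upsert_batch("app.example.com.", "A", "203.0.113.10")). Keeping the batch builder separate from the API calls means the change can be prepared and reviewed before an incident, leaving only the submission for the disruption itself.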


The implementation is straightforward. Customers can enable accelerated recovery through the AWS Management Console, CLI, SDKs, or infrastructure-as-code tools like CloudFormation and CDK. There's no additional cost, and it works with existing Route 53 setups without requiring application or script modifications.



What the Feature Actually Means

The 60-minute RTO reveals an uncomfortable truth: AWS expects US East disruptions severe enough to prevent DNS changes for up to an hour. During that time, customers cannot make the DNS changes needed to bring new infrastructure into service or redirect traffic flows, leaving applications vulnerable and businesses unable to respond to the crisis.


Sixty minutes leaves ample room for widespread outages and service interruptions. For organisations running mission-critical applications, an hour without the ability to modify DNS records or provision failover infrastructure can translate into significant financial losses, reputational damage, and regulatory compliance failures.
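
One way to put the hour in context is to compare it with the monthly downtime budget implied by an availability target. The Python sketch below does that arithmetic; the availability targets and the 30-day period are illustrative assumptions, not AWS SLA terms, and the 60 minutes is treated as a window without DNS control rather than guaranteed total downtime.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability: float, period_days: float = 30.0) -> float:
    """Downtime permitted in a period at a given availability target (e.g. 0.999)."""
    return period_days * MINUTES_PER_DAY * (1.0 - availability)


def budget_multiple(window_minutes: float, availability: float,
                    period_days: float = 30.0) -> float:
    """How many times a single window uses up that period's downtime budget."""
    return window_minutes / allowed_downtime_minutes(availability, period_days)


if __name__ == "__main__":
    for target in (0.999, 0.9995, 0.9999):  # illustrative targets, not AWS SLAs
        print(f"{target:.2%}: budget {allowed_downtime_minutes(target):.2f} min, "
              f"60-min window = {budget_multiple(60, target):.1f}x budget")
```

At 99.9% over a 30-day month (43,200 minutes) the budget is 43.2 minutes, so a single 60-minute window exceeds it on its own; at 99.99% the budget is about 4.3 minutes.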


The feature targets "DNS changes that customers can make within 60 minutes of a service disruption" rather than guaranteeing immediate availability during problems. This language suggests AWS anticipates scenarios where even this backstop capability takes time to activate.



US East's Troubled History

The mere existence of this feature speaks volumes about US East's reliability track record. Recent problems include the DynamoDB DNS failure on 20th October that brought down services globally, followed by VM problems days later. Significant outages occurred in 2021 and 2023 as well.


As far back as 2022, analyst firm Gartner warned customers that US East represents a weak point in AWS that impairs its ability to handle crises. Despite this warning and repeated incidents, problems have continued.


AWS has previously told The Register that US East is not less reliable than other regions, but that it operates at such colossal scale that it stresses cloud services more severely than smaller installations do. This explanation effectively admits that size creates reliability challenges AWS hasn't fully solved.



The Timing

Less than six weeks after an especially severe US East outage earned AWS substantial criticism, the cloud giant has shipped a feature intended to increase resilience. The timing suggests that the October DynamoDB failure, which affected services worldwide even though many of them ran in other regions, prompted AWS to prioritise this capability.


That incident exposed how US East's role as home to AWS's common control plane creates systemic vulnerability. Even organisations running workloads exclusively in European regions experienced failures because critical management functions depend on US East infrastructure.



What This Doesn't Solve

Accelerated recovery addresses one specific problem: maintaining ability to make DNS changes during US East disruptions. It doesn't prevent those disruptions from occurring, doesn't reduce their blast radius, and doesn't address the fundamental architectural decisions that make US East so critical to AWS's global operations.


Customers still face exposure to US East failures affecting services beyond DNS management. The control plane functions, global services, and interdependencies that caused October's cascade of failures remain unchanged. This feature provides a limited backstop for one specific capability rather than comprehensive resilience.


The 60-minute RTO also means organisations must still plan for substantial periods without DNS management capability during severe incidents. Business continuity planning cannot assume immediate failover—there's still a window where critical changes cannot be made.



The Broader Context

This announcement fits a pattern where cloud providers add resilience features reactively after high-profile failures rather than proactively architecting systems to prevent such failures. The October DynamoDB incident demonstrated that architectural decisions made years ago create systemic vulnerabilities that bolt-on features cannot fully address.


For customers, accelerated recovery represents welcome additional protection but doesn't eliminate US East risk. Organisations running mission-critical applications must still assume US East failures will occur and plan accordingly—including potential 60-minute windows without DNS management capability.



What Customers Should Do

Enable the feature. There's no cost and no downside to activating accelerated recovery for Route 53 hosted zones. It provides additional protection during US East disruptions even if it doesn't eliminate all risk.


Don't rely on it exclusively. Accelerated recovery is a backstop, not a comprehensive solution. Business continuity planning should still account for US East failures and include strategies that don't depend solely on rapid DNS changes.
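
One commonly recommended strategy that does not depend on making DNS changes mid-incident is to configure Route 53 failover routing in advance, so that health checks, rather than an operator's API call, shift traffic to a standby endpoint. The Python sketch below only builds the ChangeBatch for a PRIMARY/SECONDARY failover pair, to be submitted with ChangeResourceRecordSets during normal operations; the record name, addresses and health check ID are hypothetical, and whether this pattern suits a given workload is an assumption to validate.

```python
from typing import Any, Dict, Optional


def failover_record(name: str, role: str, ip: str,
                    health_check_id: Optional[str] = None,
                    ttl: int = 60) -> Dict[str, Any]:
    """One half of a Route 53 PRIMARY/SECONDARY failover record pair."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-endpoint",  # must differ per record
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # Health check Route 53 evaluates when deciding which record to answer with.
        record["HealthCheckId"] = health_check_id
    return record


def failover_pair_batch(name: str, primary_ip: str, primary_health_check_id: str,
                        secondary_ip: str,
                        secondary_health_check_id: Optional[str] = None) -> Dict[str, Any]:
    """ChangeBatch to submit ahead of time via ChangeResourceRecordSets."""
    return {
        "Comment": f"Pre-provisioned failover pair for {name}",
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, "PRIMARY", primary_ip,
                                                  primary_health_check_id)},
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, "SECONDARY", secondary_ip,
                                                  secondary_health_check_id)},
        ],
    }
```

In this design the pair exists before any incident, so a failing health check can move traffic without a ChangeResourceRecordSets call at that moment; the control plane is needed only to create or alter the configuration.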

Understand the limitations. A 60-minute RTO leaves room for substantial service disruption before recovery capabilities activate. Plan for scenarios where you cannot make DNS changes for up to an hour during severe incidents.
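
A rough way to plan for this is to add up the stages before clients actually follow a redirect: waiting until DNS changes can be made (up to the 60-minute RTO), waiting for the change to propagate, and waiting for resolvers to expire cached answers, which can take up to the record's TTL. The Python sketch below totals those stages; RedirectTimeline is a hypothetical planning helper, the propagation time and TTL are assumed inputs, and it assumes resolvers honour the TTL, which not all do.

```python
from dataclasses import dataclass


@dataclass
class RedirectTimeline:
    """Hypothetical planning model for how long a DNS redirect takes to bite."""
    control_plane_wait_min: float  # time until DNS changes can be made (up to 60 min RTO)
    propagation_min: float         # assumed time for the change to reach INSYNC
    record_ttl_seconds: int        # TTL of the record being changed

    def worst_case_minutes(self) -> float:
        # Resolvers may keep serving the old answer for up to one TTL
        # after the change propagates (assuming they honour the TTL).
        return (self.control_plane_wait_min + self.propagation_min
                + self.record_ttl_seconds / 60)

    def meets_target(self, business_rto_min: float) -> bool:
        """True if the worst case fits within the organisation's own RTO."""
        return self.worst_case_minutes() <= business_rto_min


if __name__ == "__main__":
    plan = RedirectTimeline(control_plane_wait_min=60, propagation_min=1,
                            record_ttl_seconds=300)
    print(plan.worst_case_minutes(), plan.meets_target(90))
```

With a 300-second TTL and an assumed one minute of propagation, the worst case comes to 66 minutes. Lowering TTLs ahead of time shrinks only the last term; it does nothing about the wait for DNS changes to become possible.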

Consider multi-region and multi-cloud strategies. Whilst accelerated recovery helps, true resilience may require architectures that don't depend so heavily on any single region or cloud provider.

Monitor AWS's US East investments. This feature acknowledges reliability concerns but doesn't address root causes. Watch for announcements about architectural improvements that might reduce US East's systemic importance.



The Uncomfortable Reality

AWS's launch of DNS failover capabilities specifically for US East represents an implicit admission that the region's reliability falls short of customer needs. The feature is welcome and provides genuine value, but its existence highlights the ongoing challenge of operating cloud infrastructure at unprecedented scale.


For organisations dependent on AWS, this announcement serves as a reminder that even the largest cloud providers face reliability challenges they haven't fully solved. Planning for cloud provider failures—not just individual service outages—remains essential for true business resilience.



Navigate Multi-Cloud Resilience

At Altiatech, we help organisations design cloud strategies that account for provider-level failures and regional vulnerabilities. Our cloud services expertise spans AWS, Azure, and Google Cloud Platform, enabling architectures that maintain operations even when individual regions or providers experience disruptions.


From multi-region deployment strategies to genuine multi-cloud architectures, we provide the expertise needed to build resilience commensurate with your business requirements.


Get in touch:

📧 Email: innovate@altiatech.com
📞 Phone (UK): +44 (0)330 332 5482




Build resilience beyond single regions. Plan for reality.
