AWS Introduces DNS Failover for US East Region—Acknowledging Its Reliability Problem

November 28, 2025

Amazon Web Services has launched a new feature allowing customers to make DNS changes within 60 minutes during service disruptions in its US East (N. Virginia) region. The announcement tacitly acknowledges what many have long observed: AWS's largest and most critical region has a reliability problem.

The New Feature

Amazon Route 53 Accelerated Recovery for managing public DNS records provides a 60-minute recovery time objective (RTO) during service disruptions specifically in the US East region. The feature maintains access to essential Route 53 API operations during regional outages, including ChangeResourceRecordSets, GetChange, ListHostedZones, and ListResourceRecordSets.
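As a minimal sketch of how a traffic redirect might use two of the operations named above, ChangeResourceRecordSets and GetChange, the following Python builds an UPSERT change and polls until Route 53 reports it as INSYNC. It assumes the boto3 Route 53 client, where these operations are exposed as change_resource_record_sets and get_change. The function names, hosted zone ID, record name, and standby address are hypothetical, and the client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import time


def build_redirect_batch(record_name, standby_ip, ttl=60):
    """Build a ChangeBatch that points an A record at a standby address."""
    return {
        "Comment": "Redirect traffic to standby during a regional disruption",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if present
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_ip}],
                },
            }
        ],
    }


def redirect_and_wait(client, hosted_zone_id, batch, poll_seconds=10, max_polls=30):
    """Submit the change, then poll GetChange until it reports INSYNC.

    Returns True once the change is INSYNC, False if polling runs out first.
    """
    response = client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch
    )
    change_id = response["ChangeInfo"]["Id"]
    for _ in range(max_polls):
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        if status == "INSYNC":
            return True
        time.sleep(poll_seconds)
    return False


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("route53")
#   batch = build_redirect_batch("app.example.com.", "203.0.113.10")
#   redirect_and_wait(client, "Z0000000EXAMPLE", batch)
```

Keeping the batch construction separate from the API call means a runbook can prepare and review the exact change in advance, leaving only the submission for the incident itself.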


AWS frames this as a response to customer needs, particularly from regulated industries such as banking and FinTech, and from SaaS organisations, that require confidence they can make DNS changes during unexpected regional disruptions. The capability lets them quickly provision standby cloud resources or redirect traffic when needed.


The implementation is straightforward. Customers can enable accelerated recovery through the AWS Management Console, CLI, SDKs, or infrastructure-as-code tools like CloudFormation and CDK. There's no additional cost, and it works with existing Route 53 setups without requiring application or script modifications.



What the Feature Actually Means

The 60-minute RTO reveals an uncomfortable truth: AWS expects US East disruptions severe enough to prevent DNS changes for up to an hour. During that time, customers cannot provision new infrastructure or redirect traffic flows—leaving applications vulnerable and businesses unable to respond to crises.


Sixty minutes is a long time during an outage. For organisations running mission-critical applications, an hour without the ability to modify DNS records or provision failover infrastructure can translate into significant financial losses, reputational damage, and regulatory compliance failures.


The feature targets "DNS changes that customers can make within 60 minutes of a service disruption" rather than guaranteeing immediate availability during problems. This language suggests AWS anticipates scenarios where even this backstop capability takes time to activate.



US East's Troubled History

The mere existence of this feature speaks volumes about US East's reliability track record. Recent problems include the DynamoDB DNS failure on 20th October that brought down services globally, followed by VM problems days later. Significant outages occurred in 2021 and 2023 as well.


As far back as 2022, analyst firm Gartner warned customers that US East represents a weak point in AWS that impairs its ability to handle crises. Despite this warning and repeated incidents, problems have continued.


AWS has previously told The Register that US East isn't less reliable than other regions, but that it operates at such colossal scale that it stresses cloud services more severely than smaller installations do. This explanation essentially admits that size creates reliability challenges AWS hasn't fully solved.



The Timing

Less than six weeks after an especially severe US East outage earned AWS substantial criticism, the cloud giant has found a way to increase resilience. The timing suggests the October DynamoDB failure—which affected services worldwide despite many running in other regions—prompted AWS to prioritise this capability.


That incident exposed how US East's role as home to AWS's common control plane creates systemic vulnerability. Even organisations running workloads exclusively in European regions experienced failures because critical management functions depend on US East infrastructure.



What This Doesn't Solve

Accelerated recovery addresses one specific problem: maintaining the ability to make DNS changes during US East disruptions. It doesn't prevent those disruptions from occurring, doesn't reduce their blast radius, and doesn't address the fundamental architectural decisions that make US East so critical to AWS's global operations.


Customers still face exposure to US East failures affecting services beyond DNS management. The control plane functions, global services, and interdependencies that caused October's cascade of failures remain unchanged. This feature provides a limited backstop for one specific capability rather than comprehensive resilience.


The 60-minute RTO also means organisations must still plan for substantial periods without DNS management capability during severe incidents. Business continuity planning cannot assume immediate failover—there's still a window where critical changes cannot be made.
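To make that planning window concrete, here is a rough back-of-the-envelope sketch in Python. It adds the 60-minute RTO to two further delays that apply after a change is finally submitted: propagation to Route 53's authoritative servers and the record's TTL in resolver caches. The 60-second propagation figure and the TTL values are illustrative assumptions, the function name is hypothetical, and the model ignores incident detection time and resolvers that do not honour TTLs.

```python
def worst_case_redirect_minutes(rto_minutes=60, propagation_seconds=60, ttl_seconds=300):
    """Rough upper bound, in minutes, on how long clients may keep resolving the old address.

    rto_minutes: time before the DNS change can be made at all (the 60-minute RTO).
    propagation_seconds: assumed time for the change to reach authoritative servers.
    ttl_seconds: assumed TTL on the record, i.e. how long resolvers may cache the old answer.
    """
    return rto_minutes + (propagation_seconds + ttl_seconds) / 60


for ttl in (60, 300, 3600):
    print(f"TTL {ttl:>4}s -> up to {worst_case_redirect_minutes(ttl_seconds=ttl):.0f} minutes")
# TTL   60s -> up to 62 minutes
# TTL  300s -> up to 66 minutes
# TTL 3600s -> up to 121 minutes
```

Under these assumptions, a one-hour TTL roughly doubles the worst-case window, which is one reason to keep TTLs short on records that may need to move during an incident.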



The Broader Context

This announcement fits a pattern where cloud providers add resilience features reactively after high-profile failures rather than proactively architecting systems to prevent such failures. The October DynamoDB incident demonstrated that architectural decisions made years ago create systemic vulnerabilities that bolt-on features cannot fully address.


For customers, accelerated recovery represents welcome additional protection but doesn't eliminate US East risk. Organisations running mission-critical applications must still assume US East failures will occur and plan accordingly—including potential 60-minute windows without DNS management capability.



What Customers Should Do

Enable the feature. There's no cost and no downside to activating accelerated recovery for Route 53 hosted zones. It provides additional protection during US East disruptions even if it doesn't eliminate all risk.


Don't rely on it exclusively. Accelerated recovery is a backstop, not a comprehensive solution. Business continuity planning should still account for US East failures and include strategies that don't depend solely on rapid DNS changes.

Understand the limitations. A 60-minute RTO means substantial potential for service disruption before recovery capabilities activate. Plan for scenarios where you cannot make DNS changes for up to an hour during severe incidents.

Consider multi-region and multi-cloud strategies. Whilst accelerated recovery helps, true resilience may require architectures that don't depend so heavily on any single region or cloud provider.

Monitor AWS's US East investments. This feature acknowledges reliability concerns but doesn't address root causes. Watch for announcements about architectural improvements that might reduce US East's systemic importance.
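One illustration of a strategy that doesn't depend solely on rapid DNS changes is to create failover records ahead of time, so that Route 53 health checks rather than an API call decide which endpoint is served. The Python sketch below only builds the change batch for such a primary/secondary pair. The function name, record name, addresses, and health check ID are hypothetical, and this is one possible approach under those assumptions, not something the AWS announcement prescribes.

```python
def build_failover_pair(record_name, primary_ip, standby_ip, health_check_id, ttl=60):
    """Build a ChangeBatch holding a PRIMARY and a SECONDARY failover A record."""

    def record(role, ip, extra=None):
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-endpoint",  # must be unique per record in the set
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra or {})
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Pre-provisioned failover pair (illustrative)",
        "Changes": [
            # The primary is tied to a health check; if the check fails,
            # Route 53 answers with the secondary record instead.
            record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
            record("SECONDARY", standby_ip),
        ],
    }


batch = build_failover_pair(
    "app.example.com.", "198.51.100.10", "203.0.113.10", "hypothetical-health-check-id"
)
print([c["ResourceRecordSet"]["Failover"] for c in batch["Changes"]])
# ['PRIMARY', 'SECONDARY']
```

A batch like this would be submitted once, during normal operations, through the same ChangeResourceRecordSets operation used for any other record change. The design choice is to move the failover decision out of the incident itself.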



The Uncomfortable Reality

AWS's launch of DNS failover capabilities specifically for US East represents an implicit admission that the region's reliability falls short of customer needs. The feature is welcome and provides genuine value, but its existence highlights the ongoing challenge of operating cloud infrastructure at unprecedented scale.


For organisations dependent on AWS, this announcement serves as a reminder that even the largest cloud providers face reliability challenges they haven't fully solved. Planning for cloud provider failures—not just individual service outages—remains essential for true business resilience.



Navigate Multi-Cloud Resilience

At Altiatech, we help organisations design cloud strategies that account for provider-level failures and regional vulnerabilities. Our cloud services expertise spans AWS, Azure, and Google Cloud Platform, enabling architectures that maintain operations even when individual regions or providers experience disruptions.


From multi-region deployment strategies to genuine multi-cloud architectures, we provide the expertise needed to build resilience commensurate with your business requirements.


Get in touch:

📧 Email: innovate@altiatech.com
📞 Phone (UK): +44 (0)330 332 5482




Build resilience beyond single regions. Plan for reality.
