
Building Resilient Architecture Patterns on AWS Cloud: Strategies and Use Case

26 September 2025

Sonali Patil

Cloud Solution Architect

TCS

Introduction

This blog explores patterns for building resilient architectures on AWS Cloud. In the banking and insurance domain, challenges often surface during the design phase of application migration, when applications need an active-passive DR setup, an active-active setup, a phase-wise migration to active-active, an active-standby solution in a single data center, and so on. Business expectations for application availability, scalability, and fault tolerance vary by use case, and that is where resilient architecture patterns play a vital role during the design phase.

Resilient architecture is the practice of designing applications that keep operating without impacting end users: they fail over automatically or manually when a component fails, have recovery solutions prepared in advance, detect faults, are built as distributed systems, and scale in or out when needed. AWS offers a broad set of infrastructure and managed services that support building resilient architectures in the cloud.

In this blog, we’ll explore key resilient architecture patterns, how they are implemented on AWS, and a real-life use case demonstrating these concepts in action.

Patterns

Let’s look at the effective patterns you can adopt for resilient design on AWS.

1. Application using single AZ deployment

If your application requires a single-AZ deployment that can still recover from a failure within an RTO/RPO of hours, you can use the services below.

AWS Services:

A standby EC2 instance to take over when an instance fails
AMIs for quick redeployment
EBS snapshots for volume backup
Amazon S3 with lifecycle policies
A standby database on EC2, or Amazon RDS backups and cluster configuration
Route 53 failover routing together with ALB load balancing

Benefit: If an instance fails, the standby can become active and the load balancer can redirect traffic to it automatically. Where there is no load balancer, automating the start of the standby instance keeps the environment available within the RTO/RPO window.
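The failover decision described above can be illustrated with a small sketch. This is plain Python, not a Route 53 or ALB API call; the `Endpoint` type, the `pick_endpoint` function, and the endpoint names are hypothetical and exist only to show the "healthy primary first, otherwise healthy standby" rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str      # hypothetical label for an instance or target
    role: str      # "primary" or "standby"
    healthy: bool  # result of the most recent health check


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return a healthy primary if one exists, else the first healthy standby."""
    for role in ("primary", "standby"):
        for ep in endpoints:
            if ep.role == role and ep.healthy:
                return ep
    # Nothing is healthy: recovery from an AMI or snapshot would be needed.
    return None


fleet = [
    Endpoint("app-primary", "primary", healthy=False),
    Endpoint("app-standby", "standby", healthy=True),
]
chosen = pick_endpoint(fleet)
print(chosen.name if chosen else "no healthy endpoint")  # prints "app-standby"
```

In a real deployment the health flag would come from Route 53 or load balancer health checks rather than being set by hand.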

2. Application using Multi-AZ and multi-Region deployment

If your application requires an active-active setup across two data centers (AZs) in a single Region with an RTO/RPO of 15 minutes, or a multi-Region active-active setup with an RTO/RPO close to zero, you can use the services below.

AWS Services:

Amazon RDS Multi-AZ for automatic failover, with its built-in cross-Region data replication feature for Region-level recovery
Amazon S3, a regional service that (in its standard storage classes) stores objects across multiple AZs and so stays available if a single AZ fails; Region failure is covered by S3 Cross-Region Replication (CRR)
Route 53 with latency-based or failover routing
ELB for load balancing and routing requests to another AZ
Auto Scaling for automatically scaling instances in and out
ECS/EKS with Auto Scaling groups to replace failed instances and maintain performance and availability
AWS Backup, snapshots, and AMIs for data and instance recovery
Amazon SQS (message queues) for distributed architecture
Amazon SNS (pub/sub) for notifications and alerts
Amazon EventBridge for event notification and for triggering recovery services
AWS Lambda with retry strategies
Amazon API Gateway with throttling, so that API requests are routed without overwhelming backends at peak traffic
ElastiCache (Redis/Memcached) for serving cached data when the real-time data service is down
AWS CodeDeploy and API Gateway blue/green deployments for running a new version alongside the existing one, testing it, and switching over
Amazon CloudWatch (metrics, logs, alarms) for monitoring systems, detecting faults, and automating recovery with minimal downtime

Benefit: If one AZ or region fails, traffic can be redirected automatically to a healthy environment.
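The list above mentions retry strategies. One common strategy is capped exponential backoff with jitter, sketched below as plain Python. The function name `call_with_retries`, its parameters and defaults, and the `flaky_transfer` stand-in are all assumptions made for illustration; this is not the Lambda retry configuration itself.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying failures with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt (base, 2*base, 4*base, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random time below the cap so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


def flaky_transfer() -> str:
    """Hypothetical downstream call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("downstream unavailable")
    return "ok"


# sleep is replaced with a no-op here so the example finishes instantly.
result = call_with_retries(flaky_transfer, sleep=lambda _seconds: None)
print(result, calls["count"])  # prints "ok 3"
```

Retries are only safe when the operation is idempotent, which matters for money transfers: a retried request should carry the same transaction identifier so it cannot be applied twice.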

Use case: Real-life example of a money transfer application

A money transfer company with global customers wants to ensure its platform is highly available, scalable, and resilient, using a multi-Region deployment.

Architectural Components & Patterns Used:

Component	Pattern	AWS Service	Resilience Role
Web Layer	Auto Scaling, Multi-AZ	EC2 + ALB + Auto Scaling	Handles traffic surges and AZ failures
API Layer	Circuit Breaker + Graceful Degradation	API Gateway + Lambda + EventBridge + RDS	Reduces pressure on downstream services; distributed architecture
Batch Processing	Queue-based decoupling	S3 + Amazon SQS + Lambda + RDS	Ensures files are not lost even if a downstream service fails
Database	Multi-AZ + Multi-Region + cross-Region data replication	Amazon RDS PostgreSQL	Provides automated failover and cross-Region data replication
Traffic Routing	Automated failover	Route 53 + ALB	Applies failover policies
Monitoring	Observability + Auto Recovery	CloudWatch + SNS + Lambda + Systems Manager	Detects and recovers from anomalies
Application Migration	Phase-wise migration	Route 53	Percentage-based routing
Change Requests	Code Deployment + Testing	API Gateway + EC2 + Auto Scaling in another subnet	Routes 1% of traffic to test a new deployment
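The API layer row names the circuit breaker and graceful degradation patterns. A minimal sketch of how the two fit together is given below, assuming a single-threaded caller. The `CircuitBreaker` class, its thresholds, and the `live_rate` and `cached_rate` functions are hypothetical illustrations, not part of any AWS SDK or of the original system.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after consecutive failures; lets calls through again after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, the next call is allowed as a trial.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # fail fast: leave the struggling service alone
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()  # graceful degradation instead of an error
        self.failures = 0
        self.opened_at = None
        return result


attempts = []


def live_rate() -> float:
    """Hypothetical real-time service that is currently down."""
    attempts.append(1)
    raise TimeoutError("rate service down")


def cached_rate() -> float:
    """Hypothetical degraded answer, for example a value held in a cache."""
    return 83.1


breaker = CircuitBreaker(failure_threshold=2, reset_after=60.0)
rates = [breaker.call(live_rate, cached_rate) for _ in range(5)]
print(rates, len(attempts))
```

Here the failing service is attempted only twice; the remaining three requests are answered from the fallback without touching it, which is the "reduces pressure on downstream services" role in the table.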

Outcomes:

During a peak event, EC2 instances were scaled from 5 to 7 within minutes using Auto Scaling.
When one AZ went offline, the ALB automatically rerouted traffic to healthy AZs.
API Gateway directed 1% of traffic to production instances in another subnet to test new changes, without disturbing the 99% of traffic routed to the current deployment.
AWS's built-in data replication features kept data available.
Code deployment was automated using AWS CloudFormation, AWS catalog, and AWS CI/CD pipeline tools.
The distributed architecture for batch processing aided system availability.
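The 1% / 99% traffic split in the outcomes can be pictured with a short sketch of percentage-based routing. The code below uses a hash of a request ID so the same request always lands on the same side. This is an assumption chosen to keep the example deterministic, not a description of how API Gateway or Route 53 make the decision internally, and the `route` function and request IDs are hypothetical.

```python
import hashlib


def route(request_id: str, canary_percent: float = 1.0) -> str:
    """Map a request ID to 'canary' or 'stable' deterministically."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Reduce the hash to one of 10,000 buckets, so 1% corresponds to 100 buckets.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


counts = {"canary": 0, "stable": 0}
for i in range(100_000):
    counts[route(f"req-{i}")] += 1
print(counts)  # roughly 1% of requests should fall in "canary"
```

Raising `canary_percent` step by step (for example 1, 10, 50, 100) gives the phase-wise cutover described for application migration, and setting it back to 0 is the rollback.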

Conclusion

Resilient architecture is achieved by following best practices and designing with the broad set of AWS services. Adopting resilient architecture patterns helps ensure your applications stay available, responsive, and scalable.
