Join the Community

23,986
Expert opinions
40,655
Total members
365
New members (last 30 days)
205
New opinions (last 30 days)
29,266
Total comments

Building Resilient Architecture Patterns on AWS Cloud: Strategies and Use Case


Introduction

This blog explores patterns for building resilient architecture on the AWS Cloud. In the banking and insurance domain, challenges arise during the design phase of application migration, where applications may need an active-passive DR setup, an active-active setup, a phased migration toward active-active, an active-standby solution within a single data center (DC), and so on. Business expectations for availability, scalability, and fault tolerance vary by use case, and that is where resilient architecture patterns play a vital role during the design phase.

Resilient architecture is the practice of designing applications that:
- keep operating without impacting end users;
- fail over automatically or manually when failures occur;
- recover using recovery solutions built in advance;
- detect faults and run as distributed systems;
- scale in or out when needed.

AWS offers a broad set of infrastructure and managed services that support building resilient architecture in the cloud.

In this blog, we’ll explore key resilient architecture patterns, how they are implemented on AWS, and a real-life use case demonstrating these concepts in action.

Patterns

Let’s look at effective patterns you can adopt for resilient design on AWS.

1. Application using single-AZ deployment

If your application requires a single-AZ deployment that can still recover from failure within an RTO/RPO measured in hours, you can use the following:

AWS Services:

  • An EC2 standby instance that takes over when the primary instance fails
  • AMIs for quick redeployment
  • EBS snapshots for backup
  • Amazon S3 with lifecycle policies
  • A standby database on EC2, or Amazon RDS backups and cluster configuration
  • Route 53 failover routing together with ALB load balancing

Benefit: If an instance fails, the standby can become active and the load balancer can automatically redirect traffic to the healthy instance. Without a load balancer, automating the start of the standby instance keeps the environment available within the RTO/RPO window.
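Where there is no load balancer, the automated standby start described above could be scripted roughly as follows. This is a minimal sketch. It assumes:
- a boto3 EC2 client is passed in;
- the primary and standby instance IDs are already known (the IDs shown are hypothetical placeholders);
- the script runs from something like a scheduled Lambda function or a CloudWatch alarm action.

The sketch treats the primary as failed when it is not running or when either EC2 status check is not `ok`. A real deployment would also need to tolerate the `initializing` state and guard against flapping.

```python
from typing import Any


def is_healthy(ec2: Any, instance_id: str) -> bool:
    """Return True if the instance is running and both EC2 status checks report 'ok'."""
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    statuses = resp.get("InstanceStatuses", [])
    if not statuses:
        return False
    status = statuses[0]
    return (
        status["InstanceState"]["Name"] == "running"
        and status["InstanceStatus"]["Status"] == "ok"
        and status["SystemStatus"]["Status"] == "ok"
    )


def ensure_available(ec2: Any, primary_id: str, standby_id: str) -> str:
    """Start the standby if the primary looks unhealthy; return the instance that should serve traffic."""
    if is_healthy(ec2, primary_id):
        return primary_id
    # Primary failed its checks: ask EC2 to start the standby instance.
    ec2.start_instances(InstanceIds=[standby_id])
    return standby_id


# Usage (hypothetical instance IDs):
# import boto3
# active = ensure_available(boto3.client("ec2"), "i-PRIMARY", "i-STANDBY")
```

Passing the client in, rather than creating it inside the functions, keeps the failover logic testable with a stub.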

2. Application using Multi-AZ and multi-region deployment

If your application requires an active-active setup across two data centers (Availability Zones) in a single region with an RTO/RPO of 15 minutes, or a multi-region active-active setup with near-zero RTO/RPO, you can use the following:

AWS Services:

  • Amazon RDS Multi-AZ for automatic failover, plus built-in cross-region replication (cross-region read replicas) for region-level recovery
  • Amazon S3, which by default stores data across multiple AZs within a region and so remains available during a single-AZ failure; region failure is covered by S3 Cross-Region Replication (CRR)
  • Route 53 with latency-based or failover routing
  • ELB for load balancing and routing requests to another AZ
  • Auto Scaling to automatically scale instances in and out
  • ECS/EKS with Auto Scaling groups to replace failed instances and maintain performance and availability
  • AWS Backup, snapshots, and AMIs for data and instance recovery
  • Amazon SQS (message queues) for distributed architecture
  • Amazon SNS (pub/sub) for notifications and alerts
  • Amazon EventBridge for notifications and for triggering recovery workflows
  • AWS Lambda with retry strategies
  • Amazon API Gateway with throttling, so API requests are routed without overwhelming backends during peak traffic
  • Amazon ElastiCache (Redis/Memcached) to serve cached data when a real-time data service is down
  • AWS CodeDeploy and API Gateway for blue/green deployments, running a new version alongside the existing one so it can be tested before traffic is switched
  • Amazon CloudWatch (metrics, logs, alarms) for monitoring systems, detecting faults, and automating recovery with minimal downtime
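The "retry strategies" mentioned for Lambda usually mean retrying transient failures with exponential backoff and jitter, so that many clients do not retry in lockstep and overload a recovering service.

The sketch below shows that general technique in plain Python. It is not the Lambda service's own built-in retry behavior for asynchronous invocations, and the attempt count and delays are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on exceptions with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # in practice, catch only errors known to be transient
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential ceiling: base, 2*base, 4*base, ... capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the ceiling
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the backoff can be exercised without real waiting.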

Benefit: If one AZ or region fails, traffic can be redirected automatically to a healthy environment.
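Route 53 failover routing answers queries with the primary record while its health check passes and with the secondary record otherwise. A health check changes status only after a configurable number of consecutive failed (or passed) checks, which avoids failing over on a single blip.

The plain-Python simulation below models that decision logic. It is not the Route 53 API, and the endpoint names and the threshold of 3 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HealthCheck:
    """Endpoint health whose status flips only after `threshold` consecutive opposite results."""
    threshold: int = 3
    healthy: bool = True
    _streak: int = field(default=0, init=False, repr=False)  # results disagreeing with status

    def record(self, passed: bool) -> None:
        if passed == self.healthy:
            self._streak = 0  # result agrees with the current status
            return
        self._streak += 1
        if self._streak >= self.threshold:
            self.healthy = passed  # enough consecutive evidence: change status
            self._streak = 0


def resolve(primary: str, secondary: str, primary_check: HealthCheck) -> str:
    """Failover routing: answer with the primary while it is healthy, else the secondary."""
    return primary if primary_check.healthy else secondary


# Usage: two failed checks keep traffic on the primary; the third fails it over.
# hc = HealthCheck(threshold=3)
# for passed in (False, False, False):
#     hc.record(passed)
# resolve("primary.example.com", "secondary.example.com", hc)  # secondary
```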

Use case: a real-life example from a money transfer application

A money transfer company with global customers wants to ensure its platform is highly available, scalable, and resilient through a multi-region deployment.

Architectural Components & Patterns Used:

| Component | Pattern | AWS Service | Resilience Role |
|---|---|---|---|
| Web layer | Auto Scaling, Multi-AZ | EC2 + ALB + Auto Scaling | Handles traffic surges and AZ failures |
| API layer | Circuit breaker + graceful degradation | API Gateway + Lambda + EventBridge + RDS | Reduces pressure on downstream services; distributed architecture |
| Batch processing | Queue-based decoupling | S3 + Amazon SQS + Lambda + RDS | Ensures files are not lost even if downstream services fail |
| Database | Multi-AZ + multi-region + cross-region data replication | Amazon RDS for PostgreSQL | Provides automated failover and cross-region data replication |
| Traffic routing | Automated failover | Route 53 + ALB | Failover policies |
| Monitoring | Observability + auto recovery | CloudWatch + SNS + Lambda + Systems Manager | Detects and recovers from anomalies |
| Application migration | Phased migration | Route 53 | Percentage-based routing |
| Change requests | Code deployment + testing | API Gateway + EC2 + Auto Scaling in another subnet | Routes 1% of traffic to test a new deployment |
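The API layer row pairs a circuit breaker with graceful degradation. Below is a minimal plain-Python sketch of that idea. It assumes a hypothetical downstream call, plus a fallback that returns cached data (the role ElastiCache plays in the service list earlier).

The breaker works in three states:
- **Closed:** calls pass through to the downstream service.
- **Open:** after a set number of consecutive failures, the breaker serves the fallback immediately instead of calling downstream.
- **Half-open:** once a cooldown has elapsed, the breaker lets one trial call through; success closes the circuit and failure reopens it.

The thresholds are illustrative assumptions. Lambda functions do not share memory across instances, so a Lambda-based version would need to keep the breaker state in a shared store rather than in process memory.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures; half-open after `reset_timeout` seconds."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast and degrade gracefully
            # Cooldown elapsed: half-open, so let this one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open after a failed trial)
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Serving a cached fallback keeps users getting a (possibly stale) answer while the downstream service recovers, instead of piling more requests onto it.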

Outcomes:

  • During a peak event, Auto Scaling scaled EC2 instances from 5 to 7 within minutes.
  • When one AZ went offline, the ALB automatically rerouted traffic to the healthy AZs.
  • API Gateway directed 1% of traffic to production instances in another subnet to test new changes, without disturbing the 99% of traffic routed to the current deployment.
  • Built-in AWS data replication features supported data availability.
  • Code deployment was automated using AWS CloudFormation, AWS Service Catalog, and AWS CI/CD pipeline tools.
  • A distributed architecture for batch processing improved system availability.
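The 1%/99% split in the outcomes above is a weighted (canary) routing decision. Route 53 weighted records express it as each record's weight divided by the sum of weights, and API Gateway canary releases express it as a percentage of traffic.

Below is a minimal sketch of that arithmetic in plain Python. It assumes each request carries a stable ID (hypothetical here), which is hashed into 10,000 buckets so that the same caller is routed consistently. The managed services do this for you; the sketch only illustrates how a weight maps to a traffic share.

```python
import hashlib


def canary_share(weights: dict[str, int]) -> dict[str, float]:
    """Weighted routing: each target's share is its weight divided by the sum of all weights."""
    total = sum(weights.values())
    return {target: w / total for target, w in weights.items()}


def route(request_id: str, canary_percent: float = 1.0) -> str:
    """Deterministically send about `canary_percent`% of request IDs to the new deployment."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    # Each bucket is 0.01% of traffic, so canary_percent * 100 buckets go to the canary.
    return "canary" if bucket < canary_percent * 100 else "current"


# Usage: weights of 99 and 1 give a 99% / 1% split.
# canary_share({"current": 99, "canary": 1})  # {'current': 0.99, 'canary': 0.01}
# route("req-42")                             # same answer every time for this ID
```

Hashing a stable ID, rather than picking randomly per request, keeps a given caller on one version for the duration of the test.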

Conclusion

Resilient architecture is achieved by following best practices and designing with the broad set of services AWS provides. Adopting resilient architecture patterns helps ensure your applications stay available, responsive, and scalable.

Resources

  • AWS Well-Architected Framework – Reliability Pillar

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
