
Building for resilience: Runpod’s response to the AWS us-east-1 outage
An AWS us-east-1 outage degraded Runpod’s control plane, but Pods kept running and no data was lost. Within 72 hours we added multi-region failover, cached Serverless configs, corrected charges, and started a partitioned multi-region migration on Runpod’s provider network.
AI Infrastructure

