Runpod Serverless Pricing Update

Runpod introduces new Serverless pricing with Flex and Active worker types, offering better scalability and up to 40% lower costs for consistent workloads.

We have some good news! We're revamping Serverless pricing to improve our user experience for individuals, startups, and enterprises. The bad news is that if you haven't moved your cloud compute workloads to Runpod yet, that decision might keep you up at night!

With the new pricing, we are introducing two types of Serverless workers to cover different use cases. Each worker adds concurrency to your endpoint and can handle one request at a time or several, depending on your use case.

  • Flex Workers - These handle spikes in your workload and let you support higher throughput without impacting your users. The sum of your Flex and Active workers is the maximum throughput your Serverless endpoint can support. You can allow your endpoint to scale down to 0 by using only Flex workers.
  • Active Workers - These handle consistent workloads and run 24/7 at much lower cost. Existing minimum workers will be relabeled as Active workers.
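
As a rough illustration of how the two worker types combine, here is a minimal sketch. The `EndpointScaling` class and its field names are hypothetical and are not part of any Runpod API; it only encodes the two rules stated above, assuming every worker handles the same number of requests at once.

```python
from dataclasses import dataclass


@dataclass
class EndpointScaling:
    """Hypothetical model of an endpoint's worker settings (not a Runpod API object)."""

    active_workers: int           # always-on workers
    flex_workers: int             # workers that start only when load requires them
    requests_per_worker: int = 1  # assumed identical for every worker

    @property
    def max_workers(self) -> int:
        # Maximum throughput is bounded by the sum of Active and Flex workers.
        return self.active_workers + self.flex_workers

    @property
    def max_concurrent_requests(self) -> int:
        return self.max_workers * self.requests_per_worker

    @property
    def scales_to_zero(self) -> bool:
        # Only an endpoint with no Active workers can scale down to 0.
        return self.active_workers == 0


if __name__ == "__main__":
    steady = EndpointScaling(active_workers=2, flex_workers=3)
    bursty = EndpointScaling(active_workers=0, flex_workers=4, requests_per_worker=2)
    print(steady.max_concurrent_requests, steady.scales_to_zero)
    print(bursty.max_concurrent_requests, bursty.scales_to_zero)
```
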

Pricing Per Second

GPU Size    GPU Type   Flex       Active (-40%)
16 GB       A4000      $0.0002    $0.00012
24 GB       A5000      $0.00026   $0.00016
24 GB Pro   4090       $0.00044   $0.00026
48 GB       A6000      $0.00048   $0.00029
80 GB       A100       $0.0013    $0.00078
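
To turn the per-second prices into something easier to budget with, here is a minimal sketch that computes the cost of a worker over a period and estimates when an Active worker becomes cheaper than a Flex worker. The prices are copied from the table above. The function names are illustrative. The break-even figure rests on an assumption not spelled out in this post: that a Flex worker is billed only for the seconds it runs, while an Active worker is billed continuously. It also ignores cold starts and idle timeouts.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the table above.
PRICES = {
    "A4000": {"flex": Decimal("0.0002"), "active": Decimal("0.00012")},
    "A5000": {"flex": Decimal("0.00026"), "active": Decimal("0.00016")},
    "4090": {"flex": Decimal("0.00044"), "active": Decimal("0.00026")},
    "A6000": {"flex": Decimal("0.00048"), "active": Decimal("0.00029")},
    "A100": {"flex": Decimal("0.0013"), "active": Decimal("0.00078")},
}

SECONDS_PER_DAY = 24 * 60 * 60


def worker_cost(gpu: str, worker_type: str, seconds: int) -> Decimal:
    """Cost in USD of running one worker for the given number of seconds."""
    return PRICES[gpu][worker_type] * seconds


def break_even_utilization(gpu: str) -> Decimal:
    """Busy fraction above which an always-on Active worker costs less than Flex.

    Assumption: Flex is billed only while running; Active is billed continuously.
    """
    price = PRICES[gpu]
    return price["active"] / price["flex"]


if __name__ == "__main__":
    for gpu in PRICES:
        daily = worker_cost(gpu, "active", SECONDS_PER_DAY)
        print(f"{gpu}: Active 24/7 costs ${daily:.2f}/day, "
              f"break-even at {break_even_utilization(gpu):.0%} utilization")
```

Under that billing assumption, the break-even point sits at roughly 60% utilization for every GPU type, which follows directly from the 40% discount: a worker that would be busy more than about 60% of the time is cheaper as an Active worker.
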

New vs Old Price (only Flex)

GPU Size    GPU Type   Old        New
16 GB       A4000      $0.00024   $0.0002
24 GB       A5000      $0.00030   $0.00026
24 GB Pro   4090       $0.00050   $0.00044
48 GB       A6000      $0.00055   $0.00048
80 GB       A100       $0.00140   $0.0013
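
The size of the Flex price cut differs by GPU type. The short sketch below works out the percentage reduction for each row, using only the old and new prices listed above. `percent_reduction` is an illustrative helper name.

```python
# Flex prices per second in USD as (old, new), copied from the table above.
FLEX_PRICES = {
    "A4000": (0.00024, 0.0002),
    "A5000": (0.00030, 0.00026),
    "4090": (0.00050, 0.00044),
    "A6000": (0.00055, 0.00048),
    "A100": (0.00140, 0.0013),
}


def percent_reduction(old: float, new: float) -> float:
    """Percentage by which the new price is lower than the old one."""
    return (old - new) / old * 100


if __name__ == "__main__":
    for gpu, (old, new) in FLEX_PRICES.items():
        print(f"{gpu}: {percent_reduction(old, new):.1f}% lower")
```
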

This change to our Serverless worker pricing (including the transition to Active and Flex workers) will go live towards the end of this month. Please reach out to us for any inquiries about Serverless at help@runpod.io.

Update:

The 40% discount on Active Workers is now live. Enjoy!

Author profile: Pardeep Singh
