
Runpod Serverless Pricing Update

Runpod introduces new Serverless pricing with Flex and Active worker types, offering better scalability and up to 40% lower costs for consistent workloads.

We have some good news! We're revamping Serverless pricing to improve our user experience for individuals, startups, and enterprises. The bad news is that if you haven't moved your cloud compute workloads to Runpod yet, that decision might keep you up at night!

With the new pricing, we are introducing two types of Serverless workers to cover different use cases. Each worker can handle one request at a time or several concurrently, depending on your use case.

Flex Workers - These handle spikes in your workload and let you support higher throughput without impacting your users. The sum of your Flex and Active workers determines the maximum throughput your Serverless endpoint can support. You can let your endpoint scale down to 0 by using only Flex workers.
Active Workers - These handle consistent workloads and run 24/7 at a much lower cost (40% below the Flex rate). Existing minimum workers will be relabeled as Active workers.
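
As a rough illustration of how the two worker types combine, the sketch below computes the ceiling on requests an endpoint can process at once. The function name and the `concurrency_per_worker` parameter are hypothetical and not part of any Runpod API; the sketch assumes every worker, Active or Flex, handles the same number of requests at a time.

```python
def max_concurrent_requests(active_workers: int, flex_workers: int,
                            concurrency_per_worker: int = 1) -> int:
    """Upper bound on requests an endpoint can process at the same time.

    Hypothetical helper: assumes each worker (Active or Flex) handles
    `concurrency_per_worker` requests at once.
    """
    if active_workers < 0 or flex_workers < 0 or concurrency_per_worker < 1:
        raise ValueError("worker counts must be >= 0 and concurrency >= 1")
    return (active_workers + flex_workers) * concurrency_per_worker


# 2 Active workers for the baseline plus 3 Flex workers for spikes,
# each handling one request at a time.
print(max_concurrent_requests(active_workers=2, flex_workers=3))  # 5
```

Setting `active_workers=0` models an endpoint that can scale down to zero and relies only on Flex workers.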

Pricing Per Second

GPU Size	GPU Type	Flex	Active (-40%)
16 GB	A4000	$0.0002	$0.00012
24 GB	A5000	$0.00026	$0.00016
24 GB Pro	4090	$0.00044	$0.00026
48 GB	A6000	$0.00048	$0.00029
80 GB	A100	$0.0013	$0.00078
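
To make the per-second figures easier to compare, here is a minimal sketch that converts them to an hourly cost and estimates when an Active worker becomes cheaper than a Flex worker. It assumes an Active worker is billed for every second (it runs 24/7) while a Flex worker is billed only for the fraction of time it is running; `hourly_cost`, `break_even_utilization`, and `busy_fraction` are illustrative names, and the sketch ignores any other charges.

```python
SECONDS_PER_HOUR = 3600

# Per-second prices in USD, copied from the table above.
PRICES = {
    "A4000": {"flex": 0.0002, "active": 0.00012},
    "A5000": {"flex": 0.00026, "active": 0.00016},
    "4090": {"flex": 0.00044, "active": 0.00026},
    "A6000": {"flex": 0.00048, "active": 0.00029},
    "A100": {"flex": 0.0013, "active": 0.00078},
}


def hourly_cost(gpu: str, worker_type: str, busy_fraction: float = 1.0) -> float:
    """Cost in USD of one worker over one hour.

    Assumption: Flex workers are billed only while running
    (`busy_fraction` of the hour); Active workers are billed for
    the whole hour regardless of load.
    """
    price = PRICES[gpu][worker_type]
    if worker_type == "active":
        busy_fraction = 1.0
    return price * SECONDS_PER_HOUR * busy_fraction


def break_even_utilization(gpu: str) -> float:
    """Busy fraction above which Active is cheaper than Flex."""
    return PRICES[gpu]["active"] / PRICES[gpu]["flex"]


print(f"A100 Active, full hour: ${hourly_cost('A100', 'active'):.3f}")
print(f"A100 Flex, busy 50%:    ${hourly_cost('A100', 'flex', 0.5):.3f}")
print(f"A100 break-even:        {break_even_utilization('A100'):.0%}")
```

Under these assumptions, a worker that would be busy more than roughly 60% of the time costs less as an Active worker, and a burstier workload costs less on Flex.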

Old vs New Price (Flex only)

GPU Size	GPU Type	Old	New
16 GB	A4000	$0.00024	$0.0002
24 GB	A5000	$0.00030	$0.00026
24 GB Pro	4090	$0.00050	$0.00044
48 GB	A6000	$0.00055	$0.00048
80 GB	A100	$0.00140	$0.0013
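
The size of each Flex price cut can be worked out directly from the table above. The snippet below is a small sketch of that arithmetic; `percent_reduction` is an illustrative helper name.

```python
# (old, new) Flex prices per second in USD, copied from the table above.
FLEX_PRICES = {
    "A4000": (0.00024, 0.0002),
    "A5000": (0.00030, 0.00026),
    "4090": (0.00050, 0.00044),
    "A6000": (0.00055, 0.00048),
    "A100": (0.00140, 0.0013),
}


def percent_reduction(old: float, new: float) -> float:
    """Price cut as a percentage of the old price."""
    return (old - new) / old * 100


for gpu, (old, new) in FLEX_PRICES.items():
    print(f"{gpu}: {percent_reduction(old, new):.1f}% lower")
```

The reductions range from about 7% on the A100 to about 17% on the A4000.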

This change to our Serverless worker pricing (including the transition to Active and Flex workers) will go live towards the end of this month. Please reach out to us for any inquiries about Serverless at help@runpod.io.

Update:

The 40% discount on Active Workers is now live. Enjoy!
