Pods
Thousands of GPUs across 30+ regions. Simple pricing plans for teams of all sizes,
designed to scale with you.
Serverless
Cost-effective for every inference workload. Save 25% over other Serverless cloud providers on flex workers alone.
Flex: Workers that scale up during traffic spikes and return to idle after completing jobs. Cost-efficient and ideal for bursty workloads.
Active: Always-on workers that eliminate cold starts. Billed continuously, but with a discount of up to 30%.
| GPU | VRAM | Flex | Active | Notes |
| --- | --- | --- | --- | --- |
| B200 | 180GB | $8.64/s | $6.84/s | Maximum throughput for big models. |
| H200 | 141GB | $5.58/s | $4.46/s | Extreme throughput for big models. |
| H100 (PRO) | 80GB | $4.18/s | $3.35/s | Extreme throughput for big models. |
| A100 | 80GB | $2.72/s | $2.17/s | High throughput GPU, yet still very cost-effective. |
| L40, L40S, 6000 Ada (PRO) | 48GB | $1.90/s | $1.33/s | Extreme inference throughput on LLMs like Llama 3 7B. |
| A6000, A40 | 48GB | $1.22/s | $0.85/s | A cost-effective option for running big models. |
| 5090 (PRO) | 32GB | $1.58/s | $1.11/s | Extreme throughput for small-to-medium models. |
| 4090 (PRO) | 24GB | $1.10/s | $0.77/s | Extreme throughput for small-to-medium models. |
| L4, A5000, 3090 | 24GB | $0.69/s | $0.48/s | Great for small-to-medium sized inference workloads. |
| A4000, A4500, RTX 4000, RTX 2000 | 16GB | $0.58/s | $0.40/s | The most cost-effective for small models. |
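Because Flex workers are billed only while processing jobs and Active workers are billed continuously at a lower rate, the cheaper choice depends on utilization. A minimal sketch of that break-even arithmetic, using the A100 prices from the table above (the helper functions are illustrative, not part of any SDK):

```python
# Hypothetical cost comparison between Flex and Active workers.
# Flex: billed at flex_rate only while busy. Active: billed at
# active_rate for every unit of time, busy or not.

def breakeven_utilization(flex_rate: float, active_rate: float) -> float:
    """Fraction of time a worker must be busy before an always-on
    Active worker becomes cheaper than a scale-to-zero Flex worker."""
    return active_rate / flex_rate

def cheaper_option(flex_rate: float, active_rate: float, utilization: float) -> str:
    """Pick the cheaper worker type for a given busy fraction (0.0-1.0)."""
    flex_cost = flex_rate * utilization   # pay only while processing
    active_cost = active_rate             # pay for all elapsed time
    return "flex" if flex_cost < active_cost else "active"

# A100 prices from the table: 2.72 (Flex) vs 2.17 (Active)
print(round(breakeven_utilization(2.72, 2.17), 2))  # -> 0.8
print(cheaper_option(2.72, 2.17, 0.5))              # -> flex
print(cheaper_option(2.72, 2.17, 0.9))              # -> active
```

In other words, at roughly 80% utilization and above, the always-on Active A100 costs less than a Flex A100 despite being billed around the clock.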
Instant Clusters
Launch multi-GPU clusters in minutes with no commitments: scale up to 64 GPUs, attach shared storage, and pay only for what you use.
| GPU | Per second | Per hour |
| --- | --- | --- |
| H200 SXM | | |
| A100 SXM | | |
| H100 SXM | | |
| L40S | | |
| B200 | | |
Storage Pricing
Flexible, cost-effective storage for every workload.
No fees for ingress/egress. Persistent and temporary storage available.
Pod Pricing

| Storage Type | Running Pods | Idle Pods |
| --- | --- | --- |
| Volume | $0.10/GB/mo | $0.20/GB/mo |
| Container Disk | $0.10/GB/mo | N/A |

Persistent Network Storage

| Storage Type | Under 1TB | Over 1TB |
| --- | --- | --- |
| Network Volume | $0.07/GB/mo | $0.05/GB/mo |
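The storage rates above can be turned into a monthly bill with simple per-GB arithmetic. A sketch, with two stated assumptions: the discounted network-volume rate is taken to apply to the entire volume once it exceeds 1TB (the table doesn't say whether the discount is whole-volume or marginal), and 1TB is treated as 1000GB:

```python
# Hypothetical cost helpers based on the storage pricing tables (not an API).
# Assumption: for network volumes, the $0.05/GB/mo rate applies to the
# whole volume once it exceeds 1TB (taken here as 1000GB).

def pod_volume_cost(size_gb: float, running: bool) -> float:
    """Monthly cost of a pod volume: $0.10/GB running, $0.20/GB idle."""
    rate = 0.10 if running else 0.20
    return size_gb * rate

def network_volume_cost(size_gb: float) -> float:
    """Monthly cost of a persistent network volume, tiered by size."""
    rate = 0.07 if size_gb <= 1000 else 0.05
    return size_gb * rate

print(pod_volume_cost(500, running=True))  # -> 50.0
print(network_volume_cost(2000))           # -> 100.0
```

Note that under this whole-volume reading, a 2TB network volume ($100/mo at $0.05/GB) costs only slightly more than a 1TB one ($70/mo at $0.07/GB).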

