

30-second GPU deploys
Spin up any GPU instance in under 30 seconds — no provisioning queues, no sales calls, no wait.


31 global regions
Rent GPU instances in 31 regions across the US, Europe, Asia, and Australia. Deploy where your users are, not where inventory is.


Per-second GPU billing
GPU rental billed by the second. No egress fees, no minimums, no surprises. Run a job for 3 minutes — pay for 3 minutes.

Use Cases
GPU cloud instances for every workload

LLM Inference
Rent H100 or L40S instances for low-latency inference. Deploy vLLM, TGI, or a custom container in seconds.

Model Training & Fine-Tuning
A100 and H100 SXM GPU rentals built for long training runs. Persistent storage, multi-GPU support, no babysitting required.

AI Agents & Automation
On-demand GPU instances that spin up fast and scale with your workload. No cold starts, no idle costs.

Image & Video Generation
RTX 4090 and RTX A6000 rentals for image gen, video synthesis, and diffusion workflows.
"The Runpod team has clearly prioritized the developer experience to create an elegant solution that enables individuals to rapidly develop custom AI apps or integrations while also paving the way for organizations to truly deliver on the promise of AI."
Amjad Masad
"Runpod is the only place I can deploy high-end GPU models instantly—no sales calls, no rate limits, no nonsense."
Daniel Chang
“The main value proposition for us was the flexibility Runpod offered. We were able to scale up effortlessly to meet the demand at launch.”
Josh Payne
“Runpod helped us scale the part of our platform that drives creation. That’s what fuels the rest—image generation, sharing, remixing. It starts with training.”
Matty Shimura
Storage Pricing
Persistent and temporary storage for your GPU cloud instances, with no ingress or egress fees.
FAQs
Questions? Answers.
Curious about unlocking GPU power in the cloud? Get clear answers to accelerate your projects with on-demand high-performance compute.
GPU Pods are dedicated GPU instances you can spin up on Runpod. Unlike abstracted serverless GPUs, Pods give you full control over the underlying VM, drivers, and environment. You get a persistent instance (or ephemeral, if you prefer) with direct access to powerful GPUs, letting you run training, inference, or other workloads exactly how you want.
We offer 30+ GPU models, from entry-level inference cards to top-tier training accelerators, including the A100, H100, RTX 6000 Ada, and the L4/L40 series. You can pick any supported GPU when you launch a Pod, and new models are added as they become available. For the latest availability, check the dashboard or query the API.
Pricing is shown as an hourly rate but billed by the second. You only pay for the time your Pod actually runs: if you start a Pod and stop it a minute later, you're charged for that minute, not a full hour. Attached storage volumes may incur a small additional fee, but compute is metered by the second.
Yes. GPU Pods support custom Docker images. You can build an image with your preferred libraries and push it to a registry (Docker Hub, ECR, etc.), then reference it when you launch the Pod. That way you control the OS, drivers, and dependencies.
Any framework that runs on Linux and supports GPUs: PyTorch, TensorFlow, JAX, ONNX, CUDA toolkits, etc. Since you control the container, you can install whatever versions or additional tools you need (e.g., NCCL, Horovod). We provide base images with common ML stacks to speed up setup.
We offer spot instances where GPU capacity is available at a discount, but with the risk of eviction when demand spikes. You can use them for fault-tolerant or batch workloads. The UI/API will indicate current spot availability and pricing.
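For a sense of what "fault-tolerant" can mean in practice, here is a minimal sketch of a batch job that checkpoints its progress and resumes after an eviction. It is illustrative only: the file name `checkpoint.json`, the helper names, and the summing "work" are all hypothetical, and it assumes the checkpoint lives on storage that survives the instance.

```python
import json
import os
import tempfile

# Hypothetical path; in practice this would sit on persistent storage.
CHECKPOINT_PATH = "checkpoint.json"


def load_checkpoint(path: str = CHECKPOINT_PATH) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def save_checkpoint(state: dict, path: str = CHECKPOINT_PATH) -> None:
    """Write to a temp file, then rename, so a mid-write eviction
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic on the same filesystem


def run_batch(items: list, path: str = CHECKPOINT_PATH, every: int = 10) -> int:
    """Process items, picking up where the last checkpoint left off."""
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        state["total"] += items[i]  # stand-in for real GPU work
        state["next_item"] = i + 1
        if state["next_item"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

If the instance is evicted partway through, rerunning `run_batch` with the same checkpoint path repeats at most the last `every` items rather than the whole job.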
Sign up, go to [Pods → Deploy], select your GPU model (H100, A100, RTX 4090, and 30+ others), choose a region, attach a container image, and click Deploy. Your GPU instance is live in under 30 seconds. Billing starts the moment it's running and stops the moment you terminate it — billed by the second, never by the hour.
RTX A5000s start at $0.27/hr and L4s at $0.39/hr, both solid choices for inference, testing, and development. If you need more VRAM, A40s are $0.44/hr; RTX 3090s are also available at $0.46/hr. All GPU rentals are billed by the second, so a short job costs a fraction of an hour's rate.
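To see what per-second billing means for a short job, the sketch below turns the hourly rates quoted above into a per-job estimate. The function name `job_cost` is hypothetical, and rounding partial seconds up to a whole second is an assumption for illustration, not a description of how metering actually rounds.

```python
import math

# Hourly on-demand rates quoted in this FAQ (USD per hour).
HOURLY_RATES = {
    "RTX A5000": 0.27,
    "L4": 0.39,
    "A40": 0.44,
    "RTX 3090": 0.46,
}


def job_cost(gpu: str, runtime_seconds: float) -> float:
    """Estimate compute cost under per-second billing.

    Assumption: partial seconds are rounded up to a whole second.
    """
    billed_seconds = math.ceil(runtime_seconds)
    return HOURLY_RATES[gpu] / 3600 * billed_seconds


if __name__ == "__main__":
    for gpu in HOURLY_RATES:
        print(f"{gpu}: 3-minute job ~ ${job_cost(gpu, 180):.4f}")
```

At these rates, a 3-minute job on an RTX A5000 comes to roughly $0.0135 of compute, compared with $0.27 for a full hour.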
Yes. H100 PCIe instances are available on-demand from $2.89/hr and A100 PCIe from $1.39/hr, with no contracts or minimums. If you need guaranteed capacity at lower rates, ask about our reserved pricing. Either way, you're billed by the second — not by the hour or day.
Clients
750,000 developers chose Runpod without a sales call.
Engineered for teams building the future.
