Rent NVIDIA RTX 4090 GPUs from $0.69/hr

High-end consumer GPU based on Ada Lovelace architecture with 24GB GDDR6X memory and 16,384 CUDA cores for AI workloads, machine learning, and image generation tasks.

RTX 4090

Powering the next generation of AI & high-performance computing.

Engineered for large-scale AI training, deep learning, and high-performance workloads, delivering unprecedented compute power and efficiency.

NVIDIA Ada Lovelace Architecture

Next-generation consumer architecture delivering exceptional AI performance with improved power efficiency and advanced compute capabilities.

Fourth-Generation Tensor Cores

Enhanced AI acceleration with 512 Tensor Cores providing significant performance gains for machine learning workloads.

24GB GDDR6X Memory

Massive memory capacity with 1,008GB/s bandwidth enables training and inference on large AI models.
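As a rough illustration of why bandwidth matters for LLM inference, the sketch below estimates an upper bound on single-stream decode speed. It assumes each generated token reads every weight from VRAM once and ignores KV-cache traffic, compute time, and kernel overhead, so real throughput will be lower. The function name and the example model sizes are hypothetical, chosen only for illustration.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for batch-size-1 decoding.

    Assumption: every token requires one full pass over the model weights.
    """
    return bandwidth_gb_s / weights_gb


RTX_4090_BANDWIDTH_GB_S = 1008.0  # memory bandwidth quoted on this page

# Hypothetical weight footprints in GB, for illustration only.
for weights_gb in (7.0, 13.0):
    ceiling = max_decode_tokens_per_s(RTX_4090_BANDWIDTH_GB_S, weights_gb)
    print(f"{weights_gb:.0f} GB of weights: at most ~{ceiling:.0f} tokens/s")
```

Batching several requests together amortizes each weight read across multiple tokens, which is why served throughput can exceed this single-stream ceiling.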

Third-Generation RT Cores

Advanced ray tracing acceleration with 128 RT Cores, suited to AI rendering applications and computer vision tasks.

Why rent the RTX 4090 instead of buying?

Consumer price, professional capability

The RTX 4090 delivers 82.6 TFLOPS FP32 and 24 GB of GDDR6X, more raw compute than many data center cards from the previous generation, at a fraction of the cost of an H100. Runpod's on-demand pricing starts at $0.34/hr on Community Cloud and $0.69/hr on Secure Cloud, with no hardware purchase, no depreciation, and no idle costs between projects.

FP8 inference on Ada Lovelace

Ada Lovelace introduced native FP8 Tensor Core support, giving the 4090 up to 660.6 TFLOPS with sparsity (330.3 TFLOPS dense) for quantized inference workloads. That means production-speed inference on models up to ~13B parameters at consumer GPU pricing. For teams running high-throughput inference rather than heavy training, the 4090 delivers exceptional value per dollar.
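The ~13B figure follows from simple memory arithmetic. The sketch below counts weight storage only, at one byte per parameter for FP8 and two for FP16. It ignores the KV cache, activations, and framework overhead, all of which need additional headroom, and the function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


VRAM_GB = 24  # RTX 4090 memory capacity quoted on this page

for label, bytes_per_param in (("FP16", 2), ("FP8", 1)):
    needed = weight_memory_gb(13e9, bytes_per_param)
    verdict = "fits" if needed < VRAM_GB else "does not fit"
    print(f"13B parameters at {label}: {needed:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

At FP16 the weights alone exceed a single card, while FP8 leaves roughly 11 GB for the cache and runtime.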

Deploy in seconds, scale without limits

Provision an RTX 4090 pod in seconds. Run multi-card configurations, switch GPU types, or shut everything down when a project wraps. Runpod handles power, cooling, and maintenance so you don't have to.

Key specs at a glance.

Performance benchmarks that push AI, ML, and HPC workloads further.

Memory Bandwidth: 1,008 GB/s

FP16 Tensor Performance: 165.2 TFLOPS

PCIe Gen4 ×16 Bandwidth: 63 GB/s (bidirectional, about 31.5 GB/s in each direction)
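For reference, the link bandwidth arithmetic can be sketched as follows. It assumes the standard per-lane rates of 16 GT/s for PCIe 4.0 and 32 GT/s for PCIe 5.0 with 128b/130b encoding, and it ignores packet and protocol overhead, so measured transfer rates come in lower. The RTX 4090 has a PCIe 4.0 x16 interface, so 63 GB/s corresponds to both directions combined; a PCIe 5.0 x16 link would reach that figure in a single direction.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Per-direction bandwidth in GB/s for a 128b/130b-encoded PCIe link."""
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 bits on the wire
    return gt_per_s * lanes * encoding_efficiency / 8  # bits to bytes


gen4_x16 = pcie_gb_per_s(16.0)  # PCIe 4.0 per-lane rate
gen5_x16 = pcie_gb_per_s(32.0)  # PCIe 5.0 per-lane rate

print(f"PCIe 4.0 x16: {gen4_x16:.1f} GB/s per direction, {2 * gen4_x16:.1f} GB/s bidirectional")
print(f"PCIe 5.0 x16: {gen5_x16:.1f} GB/s per direction")
```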

Popular use cases.

Designed for demanding workloads. Learn if this GPU fits your needs.

Inference

Serve inference for image, text, and audio generation at any scale.

Fine-tuning

Train custom models on your specific datasets.

Agents

Build intelligent agent-based systems and workflows.

Compute-heavy tasks

Run compute-heavy workloads like rendering and simulations.

Ready for your most demanding workloads.

Essential technical specifications to help you choose the right GPU for your workload.

| Specification | Details | Great for... |
| --- | --- | --- |
| Memory Bandwidth | 1,008 GB/s | Feeding large image batches and high-resolution textures into VRAM without stalls for rendering, LLM inference, and real-time simulations. |
| FP16 Tensor Performance | 165.2 TFLOPS | Speeding mixed-precision transformer training and inference, boosting token throughput in generative AI and deep learning workloads. |
| PCIe Gen4 ×16 Bandwidth | 63 GB/s (bidirectional) | Enabling high-speed GPU-to-GPU and host-to-device transfers when NVLink isn't available, supporting multi-GPU scaling for large models. |
| Architecture | NVIDIA Ada Lovelace (AD102) | Workloads requiring 4th-gen Tensor Cores, 3rd-gen RT Cores, and native FP8 support |
| Manufacturing Process | TSMC 4N | |
| Transistors | 76.3 billion | |
| Die Size | 608 mm² | |
| Form Factor | FHFL, dual-slot PCIe | Deploying in standard PCIe workstation and server slots |
| CUDA Cores | 16,384 | Parallelizing large AI training, rendering, and simulation workloads |
| Tensor Cores | 512 (4th generation) | Mixed-precision training and inference with TF32, BF16, FP16, FP8, and INT8 support |
| RT Cores | 128 (3rd generation) | Real-time ray tracing for rendering, VFX, and interactive visualization |
| GPU Memory | 24 GB GDDR6X | Running mid-size LLMs, large batch sizes, and high-resolution datasets without CPU offloading |
| Clock Speeds | Base 2,235 MHz / Boost 2,520 MHz | Sustained high-frequency compute across long training and inference runs |
| Power Consumption | ~450 W TDP | High-throughput workloads where absolute performance outweighs power efficiency |
| FP64 Performance | ~1.3 TFLOPS | |
| FP32 Performance | 82.6 TFLOPS | Standard-precision training, simulation, and rendering compute |
| TF32 Tensor Core | 82.6 TFLOPS (165.2 sparse) | Accelerated training with near-FP32 accuracy, at up to 2× the FP32 throughput with sparsity |
| BF16 Tensor Core | 165.2 TFLOPS (330.3 sparse) | Large model training with the dynamic range of FP32 |
| FP8 Tensor Core | 330.3 TFLOPS (660.6 sparse) | Maximum inference throughput with quantized models (native on Ada Lovelace) |
| INT8 Tensor Core | 660.6 TOPS (1,321.2 sparse) | Production inference at scale with quantized models |

"The Runpod team has clearly prioritized the developer experience to create an elegant solution that enables individuals to rapidly develop custom AI apps or integrations while also paving the way for organizations to truly deliver on the promise of AI."

Amjad Masad

"Runpod is the only place I can deploy high-end GPU models instantly—no sales calls, no rate limits, no nonsense."

Daniel Chang

“The main value proposition for us was the flexibility Runpod offered. We were able to scale up effortlessly to meet the demand at launch.”

Josh Payne

“Runpod helped us scale the part of our platform that drives creation. That’s what fuels the rest—image generation, sharing, remixing. It starts with training.”

Matty Shimura

Powerful GPUs. Globally available.
Reliability you can trust.

30+ GPUs, 31 regions, instant scale. Fine-tune or go full Skynet—we’ve got you.

| | Community Cloud | Secure Cloud |
| --- | --- | --- |
| Starting price | $0.34/hr | $0.69/hr |
| Unique GPU Models | 25 | 19 |
| Global Regions | 17 | 14 |

Also compared across both tiers: Network Storage, Enterprise-Grade Reliability, Savings Plans, 24/7 Support, Delightful Dev Experience.

Questions? Answers.

What are the current rental rates for an RTX 4090 on Runpod?

Rates vary by instance type and availability. For the most current pricing, see the Runpod pricing page.

How is billing handled for RTX 4090 rentals?

Runpod bills by the second — you pay only for active compute time, with no minimum commitment. On-demand and spot instance pricing are both available. For a full breakdown of pricing options, see the Runpod pricing page.
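To make per-second billing concrete, here is a minimal sketch that prorates an hourly rate down to seconds. It assumes simple linear proration with no rounding rules, storage charges, or taxes, which may not match how an actual invoice is computed. The function name and the 25-minute job length are hypothetical; the two rates are the ones listed on this page.

```python
def pod_cost_usd(hourly_rate_usd: float, active_seconds: int) -> float:
    """Cost of a run billed per second at a given hourly rate (linear proration)."""
    return hourly_rate_usd / 3600 * active_seconds


job_seconds = 25 * 60  # a hypothetical 25-minute fine-tuning run

for tier, rate in (("Community Cloud", 0.34), ("Secure Cloud", 0.69)):
    print(f"{tier}: ${pod_cost_usd(rate, job_seconds):.4f} for {job_seconds} s")
```

Under hourly billing the same run would be charged as a full hour, so short or bursty jobs benefit most from per-second granularity.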

How does the RTX 4090 perform for AI and deep learning?

The RTX 4090 delivers strong performance for AI training and inference: 16,384 CUDA cores, 24 GB GDDR6X, and 4th-generation Tensor Cores with native FP8 support. It excels at fine-tuning mid-size LLMs, running diffusion models, and rapid experimentation where iteration speed matters more than maximum VRAM. For context on how it compares to a data center GPU, see the RTX 4090 vs H100 SXM comparison.

Can I rent multiple RTX 4090s in a single instance?

Yes — Runpod supports multi-GPU pod configurations. Note that the RTX 4090 does not support NVLink, so GPUs in a multi-card setup do not share a unified memory pool; each card operates with its own 24 GB. Check real-time availability on the Runpod pricing page for current multi-GPU configurations.
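Because each card keeps its own 24 GB, a model larger than one card has to be split across GPUs by the framework (for example with tensor or pipeline parallelism), with inter-GPU traffic going over PCIe. The sketch below gives a rough card count for a given weight footprint. The 80% usable-memory headroom is an assumption to leave room for activations and cache, and the function name is hypothetical.

```python
import math


def cards_needed(weights_gb: float, per_card_gb: float = 24.0, usable_fraction: float = 0.8) -> int:
    """Minimum number of cards whose usable memory covers the model weights.

    usable_fraction is an assumed headroom factor, not a measured value.
    """
    usable_gb = per_card_gb * usable_fraction
    return math.ceil(weights_gb / usable_gb)


# Hypothetical weight footprints in GB, for illustration only.
for weights_gb in (13.0, 26.0, 140.0):
    print(f"{weights_gb:.0f} GB of weights: {cards_needed(weights_gb)} x RTX 4090")
```

This counts capacity only; it says nothing about throughput, which depends on how well the chosen parallelism strategy tolerates PCIe transfer speeds.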

How is data security handled on rented RTX 4090 instances?

Runpod implements isolated environments, data wiping between users, and encryption for data at rest and in transit. For compliance requirements (GDPR, HIPAA, SOC 2), see Runpod's security and compliance documentation and contact the team about Secure Cloud deployment options.

10,100,100,100

Requests since launch & 400k developers worldwide

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.