Cloud GPUs

Rent NVIDIA A100 GPUs from $1.39/hr

Name: A100
Brand: NVIDIA

High-performance data center GPU based on Ampere architecture with up to 80 GB HBM2e memory and third-generation Tensor Cores, purpose-built for large-scale AI training, LLM fine-tuning, and multi-tenant inference workloads.


Powering the next generation of AI & high-performance computing.

Engineered for large-scale AI training, deep learning, and high-performance workloads, delivering up to 312 TFLOPS of dense FP16/BF16 Tensor Core throughput per GPU.

NVIDIA Ampere Architecture

Advanced AI acceleration with third-generation Tensor Cores delivering up to 312 TFLOPS of TF32 performance with structural sparsity (156 TFLOPS dense).

Third-Generation Tensor Cores

Enhanced AI acceleration with 432 Tensor Cores supporting TF32, BF16, FP16, and INT8 precision modes.
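TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits. That is how it preserves FP32's numeric range while running on Tensor Cores. As a rough illustration, the plain-Python sketch below emulates the conversion by rounding an FP32 value's 23-bit mantissa to 10 bits. The function name `to_tf32` is hypothetical, round-to-nearest (ties away from zero) is an assumption, the hardware's exact rounding behavior may differ, and NaN and infinity inputs are not handled.

```python
import struct


def to_tf32(x: float) -> float:
    """Hypothetical helper: emulate TF32 precision in software.

    Keeps FP32's 8-bit exponent and rounds the 23-bit mantissa to 10 bits.
    Assumes round-to-nearest, ties away from zero; NaN/infinity not handled.
    """
    # Reinterpret the value as its 32-bit IEEE-754 single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # TF32 keeps the top 10 of FP32's 23 mantissa bits, so 13 low bits are dropped.
    # Adding half of the dropped range (1 << 12) before masking rounds to nearest;
    # a carry into the exponent correctly rounds up to the next power of two.
    bits = (bits + (1 << 12)) & ~((1 << 13) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1e-3, 12345.678):
        approx = to_tf32(value)
        rel_err = abs(approx - value) / abs(value)
        print(f"{value!r:>12} -> {approx!r:<22} relative error {rel_err:.2e}")
```

Each conversion's relative error stays below about 2^-11 (roughly 0.05%), compared with about 2^-24 for FP32 itself. That gap is the precision TF32 trades for its higher Tensor Core throughput.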

80 GB HBM2e Memory

High-capacity memory with up to 2 TB/s bandwidth lets many large models train or serve on a single GPU without offloading to CPU memory.

Multi-Instance GPU (MIG)

Partition a single A100 into up to 7 isolated compute instances, ideal for multi-tenant inference and cost-efficient serving of multiple models.

Why rent the A100 instead of buying?

A proven workhorse for AI training and inference

The A100's Ampere architecture delivers up to 312 TFLOPS of dense FP16/BF16 Tensor Core throughput, significantly faster than its predecessor, the V100. With MIG support for up to 7 isolated instances, it handles both multi-tenant inference at scale and large-model training runs. The 80 GB variant holds the FP16 weights of models up to roughly 30B parameters on a single card (a 30B model needs about 60 GB for weights alone); larger open-source models, such as 70B-class LLMs, need quantization or sharding across multiple GPUs.
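The 80 GB figure translates into model size with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The helpers below are a hypothetical back-of-the-envelope estimator. They count weights only and ignore activations, KV cache, optimizer state, and framework overhead, all of which add to the real footprint. The 80% headroom threshold is an illustrative assumption, not a Runpod or NVIDIA rule.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Hypothetical helper: memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_on_gpu(params_billions: float, dtype: str = "fp16",
                gpu_memory_gb: float = 80.0, headroom: float = 0.8) -> bool:
    """Hypothetical rule of thumb: weights may use at most `headroom` of GPU memory,
    leaving the rest for activations and KV cache."""
    return weight_memory_gb(params_billions, dtype) <= gpu_memory_gb * headroom


if __name__ == "__main__":
    for size, dtype in [(7, "fp16"), (30, "fp16"), (70, "fp16"), (70, "int4")]:
        gb = weight_memory_gb(size, dtype)
        print(f"{size}B {dtype}: {gb:.0f} GB of weights, "
              f"fits on one 80 GB A100: {fits_on_gpu(size, dtype)}")
```

With these assumptions, a 70B model in FP16 (140 GB of weights) needs at least two 80 GB cards, while a 4-bit quantized copy (35 GB) fits on one.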

Pay only for what you use

A100 hardware costs $10,000-$20,000 per card depending on variant and availability. Runpod's on-demand pricing gives you access to the same hardware without a capital commitment. There’s no upfront cost, no maintenance, no idle hardware.
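To put the rent-versus-buy trade-off in numbers, the break-even point is the purchase price divided by the hourly rental rate. The sketch below uses the $10,000-$20,000 card range and the $1.39/hr rate quoted on this page. The function name is hypothetical, and the model deliberately ignores power, cooling, hosting, networking, and resale value, so treat it as a rough bound rather than a total-cost-of-ownership calculation.

```python
HOURS_PER_DAY = 24


def break_even_hours(card_price_usd: float, hourly_rate_usd: float) -> float:
    """Hypothetical helper: rental hours that cost the same as buying one card outright."""
    return card_price_usd / hourly_rate_usd


if __name__ == "__main__":
    rate = 1.39  # on-demand rate quoted on this page
    for price in (10_000, 20_000):
        hours = break_even_hours(price, rate)
        # Convert to days of continuous 24/7 use.
        print(f"${price:,} card: {hours:,.0f} hours "
              f"(~{hours / HOURS_PER_DAY:,.0f} days at 24/7)")
```

This works out to roughly 7,200 to 14,400 GPU-hours, or about 300 to 600 days of continuous use. Workloads that run only part of the time push the break-even point out proportionally.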

Deploy in seconds, scale without limits

Provision an A100 instance in seconds. Scale from a single GPU for development and fine-tuning to a multi-GPU cluster for full pre-training runs. When the job is done, scale back down or switch GPUs entirely. Runpod handles the infrastructure so you stay focused on your model.

Performance

Key specs at a glance.

Headline specifications for AI, ML, and HPC workloads.

Memory Bandwidth: 2.0 TB/s

FP16 Tensor Performance: 624 TFLOPS (with sparsity; 312 TFLOPS dense)

NVLink Bandwidth: 600 GB/s
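Memory bandwidth matters for LLM serving because, at small batch sizes, generating each token requires streaming roughly all model weights from HBM. That gives a simple upper bound: tokens per second per sequence is at most bandwidth divided by weight bytes. The function below is a hypothetical roofline-style estimate. It ignores KV-cache reads and kernel overheads, which lower real single-stream speed, and it ignores batching, which amortizes weight reads across sequences and raises aggregate throughput.

```python
def max_decode_tokens_per_sec(params_billions: float, bytes_per_param: float,
                              bandwidth_tb_s: float = 2.0) -> float:
    """Hypothetical helper: bandwidth-bound ceiling on single-stream decode speed.

    Assumes every generated token reads all weights once from GPU memory.
    """
    weight_bytes = params_billions * 1e9 * bytes_per_param
    bandwidth_bytes_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_s / weight_bytes


if __name__ == "__main__":
    # 7B and 13B models in FP16 (2 bytes per parameter) on ~2.0 TB/s of HBM2e bandwidth.
    for size in (7, 13):
        ceiling = max_decode_tokens_per_sec(size, 2.0)
        print(f"{size}B FP16: <= {ceiling:.0f} tokens/s per sequence")
```

For a 7B FP16 model (about 14 GB of weights), the ceiling is roughly 140 tokens/s per sequence at 2.0 TB/s.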


Use Cases

Popular use cases.

Designed for demanding workloads. Learn whether this GPU fits your needs.

Inference

Serve inference for image, text, and audio generation at any scale.

Fine-tuning

Adapt pretrained models to your own datasets.

Agents

Build intelligent agent-based systems and workflows.

Compute-heavy tasks

Run compute-heavy workloads like rendering and simulations.

Technical Specs

Ready for your most demanding workloads.

Essential technical specifications to help you choose the right GPU for your workload.

Specification	Details	Great for...
Architecture	NVIDIA Ampere (GA100)	Large-scale AI training, inference, and HPC workloads requiring MIG support and NVLink scalability
Manufacturing Process	7nm, 54.2 billion transistors	n/a
CUDA Cores	6,912	Parallel workloads including large-batch inference, scientific simulation, and model training
Tensor Cores	432 (3rd generation)	Mixed-precision training with TF32, BF16, FP16, and INT8 support for LLMs and vision models
GPU Memory	40 GB HBM2 or 80 GB HBM2e	The 80 GB variant handles large-scale NLP and scientific simulations; 40 GB is cost-effective for standard training and inference
Memory Bandwidth	Up to 2.0 TB/s	Large model inference, LLM serving, high-throughput batch inference
TDP	PCIe: 250–300W / SXM4: 400W	Predictable power budgeting across both form factors
FP64 Performance	9.7 TFLOPS	Scientific computing, HPC simulations, and high-precision workloads
FP32 Performance	19.5 TFLOPS	Standard-precision AI training and inference
TF32 Tensor Core	156 TFLOPS (312 with sparsity)	Accelerated training with near-FP32 accuracy at 8× the dense FP32 throughput (156 vs 19.5 TFLOPS)
FP16 Tensor Performance	312 TFLOPS (624 with sparsity)	Model training, fine-tuning, compute-intensive deep learning workloads
NVLink Bandwidth	600 GB/s	Multi-GPU distributed training, large model parallelism (70B+ models), tensor parallelism across GPUs
Multi-Instance GPU (MIG)	Up to 7 isolated instances	Multi-tenant inference and cost-efficient serving of multiple models from a single GPU
Structural Sparsity	Up to 2× speedup	Production inference with pruned or sparsified models for higher throughput
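The "up to 2×" sparsity figures (for example, TF32 at 156 vs 312 TFLOPS) rely on NVIDIA's 2:4 fine-grained structured sparsity: in every group of four consecutive weights, at most two are non-zero. The pure-Python sketch below only illustrates that pruning pattern by keeping the two largest-magnitude values in each group. The function names are hypothetical, it does not produce the compressed format that Sparse Tensor Cores consume, and in practice pruned models are typically retrained to recover accuracy.

```python
from typing import List


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Hypothetical helper: zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Keep the indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def satisfies_2_of_4(weights: List[float]) -> bool:
    """Hypothetical helper: check that every group of four has at most two non-zeros."""
    return all(sum(1 for w in weights[s:s + 4] if w != 0.0) <= 2
               for s in range(0, len(weights), 4))


if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
    p = prune_2_of_4(w)
    print(p)
    print(satisfies_2_of_4(p))
```

Because the zeros fall in a fixed, predictable pattern, the hardware can skip them without the irregular memory access that unstructured sparsity would require.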

"The Runpod team has clearly prioritized the developer experience to create an elegant solution that enables individuals to rapidly develop custom AI apps or integrations while also paving the way for organizations to truly deliver on the promise of AI."

Amjad Masad

"Runpod is the only place I can deploy high-end GPU models instantly—no sales calls, no rate limits, no nonsense."

Daniel Chang

“The main value proposition for us was the flexibility Runpod offered. We were able to scale up effortlessly to meet the demand at launch.”

Josh Payne

“Runpod helped us scale the part of our platform that drives creation. That’s what fuels the rest—image generation, sharing, remixing. It starts with training.”

Matty Shimura

Comparison

Powerful GPUs. Globally available. Reliability you can trust.

30+ GPUs, 31 regions, instant scale. Fine-tune or go full Skynet—we’ve got you.

Community Cloud: $1.19/hr
Secure Cloud: $1.39/hr

Community Cloud and Secure Cloud are compared on unique GPU models, global regions, network storage, enterprise-grade reliability, savings plans, 24/7 support, and developer experience.

FAQs

Questions? Answers.

What are the current hourly rates for renting an A100 on Runpod?

For current A100 rental rates including on-demand and reserved options for both the PCIe and the SXM variants, refer to the Runpod pricing page.

How does the A100 compare to the H100 for AI workloads?

The A100 delivers strong performance for training and inference, with 156 TFLOPS of dense TF32 throughput (312 TFLOPS with sparsity) and up to 2 TB/s of memory bandwidth. The H100 surpasses it in raw throughput, particularly for large-scale LLM training, but the A100 offers a compelling cost-performance ratio for fine-tuning, inference, and development workloads. For a detailed comparison, see our GPU benchmarks.

Can I run multiple workloads on a single A100 using MIG?

Yes. The A100's Multi-Instance GPU (MIG) feature partitions a single GPU into up to 7 isolated instances, each with dedicated memory, compute cores, and cache. This makes it a good fit for multi-tenant environments or for serving multiple models without interference.
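Under MIG, the A100 is divided into seven compute slices and eight memory slices, and each GPU instance profile consumes a fixed number of each. The sketch below encodes the standard A100 80 GB profiles as listed in NVIDIA's MIG user guide (an assumption worth confirming against `nvidia-smi mig -lgip` on your driver). It checks a necessary condition for a requested mix: at most 7 compute slices and 8 memory slices in total. It is a simplification: the driver also enforces placement rules, so a mix that passes here is not guaranteed to be creatable, and newer drivers expose extra profiles not listed. The function name `may_fit` is hypothetical.

```python
from typing import Dict, List, Tuple

# (compute slices, memory slices) per GPU instance profile on an A100 80 GB.
# Values assumed from NVIDIA's MIG user guide; confirm with `nvidia-smi mig -lgip`.
A100_80GB_PROFILES: Dict[str, Tuple[int, int]] = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def may_fit(requested: List[str]) -> bool:
    """Hypothetical check: necessary (not sufficient); ignores MIG placement rules."""
    compute = sum(A100_80GB_PROFILES[p][0] for p in requested)
    memory = sum(A100_80GB_PROFILES[p][1] for p in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


if __name__ == "__main__":
    print(may_fit(["1g.10gb"] * 7))                      # seven small instances
    print(may_fit(["3g.40gb", "3g.40gb"]))               # two half-GPU instances
    print(may_fit(["4g.40gb", "3g.40gb", "1g.10gb"]))    # needs 8 compute slices
```

Memory slices are the constraint people tend to miss: two `3g.40gb` instances use only 6 of 7 compute slices but all 8 memory slices, so no third instance can be added.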

What's the difference between the A100 PCIe and A100 SXM?

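The PCIe variant connects to the system through standard PCIe lanes and is the more cost-efficient option for most training and inference workloads. The SXM variant mounts directly on the server board, runs at a higher power limit (400W vs 250–300W for PCIe), and connects to other GPUs over NVLink at up to 600 GB/s, while PCIe cards can only be joined in pairs with an NVLink bridge. The 80 GB SXM also offers higher memory bandwidth than the 40 GB PCIe card (about 2.0 TB/s vs about 1.6 TB/s), though the 80 GB PCIe card narrows the gap (about 1.9 TB/s). These differences make SXM better suited for multi-GPU distributed training, large model parallelism, and memory-intensive workloads where inter-GPU communication is a bottleneck. For current pricing on both variants, see the Runpod pricing page.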
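The practical impact of interconnect bandwidth shows up in gradient synchronization. In a ring all-reduce across N GPUs, each GPU sends and receives about 2(N-1)/N times the gradient size, so the bandwidth term of the communication time is roughly that volume divided by the per-direction link bandwidth. The sketch below applies this textbook cost model. The per-direction figures (300 GB/s for NVLink, half of its 600 GB/s total, and 32 GB/s for PCIe Gen4 x16) are assumptions for illustration, the function name is hypothetical, and the model ignores latency, overlap with compute, and topology details such as PCIe switches.

```python
def ring_allreduce_seconds(grad_bytes: float, num_gpus: int, link_gb_s: float) -> float:
    """Hypothetical helper: bandwidth-only estimate of one ring all-reduce.

    Each GPU sends 2 * (N - 1) / N * grad_bytes over a link with the given
    per-direction bandwidth (GB/s, 1 GB = 1e9 bytes). Latency is ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    volume = 2 * (num_gpus - 1) / num_gpus * grad_bytes
    return volume / (link_gb_s * 1e9)


if __name__ == "__main__":
    grads = 7e9 * 2  # 7B parameters with FP16/BF16 gradients (2 bytes each)
    for name, bw in [("NVLink (assumed 300 GB/s per direction)", 300.0),
                     ("PCIe Gen4 x16 (assumed 32 GB/s per direction)", 32.0)]:
        t = ring_allreduce_seconds(grads, num_gpus=8, link_gb_s=bw)
        print(f"{name}: ~{t:.3f} s per full gradient all-reduce")
```

Under these assumptions, synchronizing 14 GB of gradients across 8 GPUs takes about 0.08 s over NVLink versus about 0.77 s over PCIe, a cost paid on every optimizer step unless it is overlapped with compute.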

Is the A100 good for both training and inference?

The A100 is ideal for both training and inference. For inference, it handles high-throughput workloads efficiently, and MIG allows concurrent serving of multiple models from a single GPU. For training, it scales across multiple GPUs via NVLink and is fully compatible with PyTorch, TensorFlow, and JAX.
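For sizing a training run, a widely used approximation is that training a dense transformer takes about 6 × parameters × tokens floating-point operations. Dividing by the cluster's sustained throughput gives a rough wall-clock estimate. In the sketch below, the 312 TFLOPS default is the A100's dense BF16/FP16 Tensor Core peak. The 40% model FLOPs utilization (MFU) is an assumed, illustrative value, because real utilization varies widely with model, parallelism strategy, and software stack. The function name `training_days` and the example run are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, num_gpus: int,
                  peak_tflops: float = 312.0, mfu: float = 0.4) -> float:
    """Hypothetical helper: wall-clock days using the ~6 * N * D FLOPs rule.

    peak_tflops: per-GPU dense BF16/FP16 Tensor Core peak.
    mfu: assumed fraction of peak actually achieved (model FLOPs utilization).
    """
    total_flops = 6.0 * params * tokens
    sustained_flops_per_s = num_gpus * peak_tflops * 1e12 * mfu
    return total_flops / sustained_flops_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: a 7B-parameter model on 100B tokens across 8 A100s.
    print(f"~{training_days(7e9, 100e9, num_gpus=8):.1f} days")
```

Doubling the GPU count halves this estimate only if MFU holds steady; communication overhead usually erodes utilization as clusters grow.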

400k developers worldwide

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.
