3,200 Gbps InfiniBand GPU Clusters

Launch high-performance multi-node GPU clusters for AI, ML, LLMs, and HPC workloads—fully optimized, rapidly deployed, and cost-effective.

Clusters

Spin up instantly, pay as you go

  • Launch multi-GPU clusters in minutes
  • Scale up to 64 GPUs per cluster
  • No commitments - stop anytime
  • Pay only for what you use
  • Attach shared storage, run jobs, spin down

Reserved Clusters

Dedicated capacity with guaranteed availability

  • 3+ month reservations
  • Scale up to 10,000+ GPUs
  • Dedicated support & onboarding
  • Guaranteed SLA for uptime
  • Custom configurations (RAM, storage, networking)
  • Discounted rates for longer commitments

Talk to our team about your requirements

Reserved Clusters: Dedicated capacity for large-scale workloads.

Built for scale, secured for trust, and designed to meet your most demanding needs.

Uptime guarantee

Run critical workloads with confidence, backed by industry-leading reliability.

Secure by default

Independently audited SOC 2 Type II compliance for end-to-end data protection.

Scale to thousands of GPUs

Adapt instantly to demand with infrastructure that grows with you.

"The Runpod team has clearly prioritized the developer experience to create an elegant solution that enables individuals to rapidly develop custom AI apps or integrations while also paving the way for organizations to truly deliver on the promise of AI."

Amjad Masad

"Runpod is the only place I can deploy high-end GPU models instantly—no sales calls, no rate limits, no nonsense."

Daniel Chang

“The main value proposition for us was the flexibility Runpod offered. We were able to scale up effortlessly to meet the demand at launch.”

Josh Payne

“Runpod helped us scale the part of our platform that drives creation. That’s what fuels the rest—image generation, sharing, remixing. It starts with training.”

Matty Shimura

Trusted by today's leaders, built for tomorrow's pioneers.

Engineered for teams building the future.

Wix, Otovo, Scatter Lab, Abzu, Aneta, Perplexity, Replit, Civitai

Questions? Answers.

Clusters, simplified. Clear answers on running multi-node workloads without the fuss.

A GPU pod is a single instance with one or more GPUs within the same node. A Cluster consists of multiple nodes interconnected with high-speed networking, allowing for workloads that span across multiple machines. Clusters are ideal for large model inference and distributed training that exceeds the capacity of a single node.

Any account can launch up to 2 nodes (16 GPUs) on demand. Larger clusters of up to 8 nodes (64 GPUs) require a spend limit increase, which you can request.
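The node and GPU counts above work out to 8 GPUs per node (16 / 2 and 64 / 8). As a rough illustration of how a distributed job commonly numbers its workers across such a cluster, the Python sketch below maps each (node, local GPU) pair to a global rank in node-major order. The function name and the numbering convention are illustrative assumptions, not a Runpod API.

```python
def global_ranks(num_nodes: int, gpus_per_node: int = 8) -> dict[tuple[int, int], int]:
    """Map (node_index, local_gpu_index) to a global rank, node by node.

    gpus_per_node defaults to 8, inferred from the 2-node/16-GPU and
    8-node/64-GPU figures; the function itself is hypothetical.
    """
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must be positive")
    return {
        (node, gpu): node * gpus_per_node + gpu
        for node in range(num_nodes)
        for gpu in range(gpus_per_node)
    }


on_demand = global_ranks(num_nodes=2)  # on-demand limit
largest = global_ranks(num_nodes=8)    # after a spend limit increase
print(len(on_demand), len(largest))
```

The total number of entries is the job's world size, and the last GPU on the last node always holds the highest rank.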

Clusters are billed by the second, just like our regular GPU pods. You're only charged for the compute time you actually use, with no minimum commitments or upfront costs. When you're done with your work, simply terminate the cluster to stop billing.
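To make per-second billing concrete, here is a minimal Python sketch of the arithmetic. The per-GPU hourly rate is a made-up placeholder, and both the per-GPU pricing model and the rounding of partial seconds up to a whole second are assumptions for illustration; check current pricing for real figures.

```python
import math


def cluster_cost(seconds_used: float, num_gpus: int, hourly_rate_per_gpu: float) -> float:
    """Estimate the cost of a cluster run under per-second billing.

    Assumes pricing is quoted per GPU per hour and that a partial
    second is billed as a full second (both are assumptions).
    """
    billed_seconds = math.ceil(seconds_used)
    return billed_seconds * num_gpus * hourly_rate_per_gpu / 3600


# Hypothetical example: 16 GPUs for 90 minutes at a placeholder $2.00 per GPU-hour.
print(f"${cluster_cost(90 * 60, num_gpus=16, hourly_rate_per_gpu=2.00):.2f}")
```

Because billing stops when the cluster is terminated, the only inputs that matter are how long the cluster ran and how many GPUs it held.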

Clusters deliver 1,600–3,200 Gbps east-west bandwidth via InfiniBand or RoCE v2, depending on configuration.
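For a sense of scale, the sketch below converts the quoted bandwidth from gigabits to gigabytes per second and computes a best-case time to move a payload between nodes. It treats the quoted figure as one aggregate line rate and ignores protocol overhead and contention, so real transfers will be slower. The 100 GB payload is an illustrative value, not a measurement.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Best-case seconds to move size_gb gigabytes over a link_gbps link.

    Divides by 8 to convert gigabits per second to gigabytes per second.
    Ignores protocol overhead, so this is a lower bound, not a prediction.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    gigabytes_per_second = link_gbps / 8
    return size_gb / gigabytes_per_second


payload_gb = 100  # illustrative payload size
for gbps in (1600, 3200):
    seconds = transfer_seconds(payload_gb, gbps)
    print(f"{gbps} Gbps = {gbps / 8:.0f} GB/s, {payload_gb} GB in at least {seconds:.2f} s")
```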

Each cluster’s east-west fabric is tenant-isolated. We enforce L2/L3 segmentation and RDMA fabric partitioning (e.g., InfiniBand P_Keys or RoCE v2 VLAN/VXLAN, depending on site), so there is no routable path between tenants. The 1.6–3.2 Tbps of inter-node bandwidth is dedicated to your Runpod cluster, with no cross-tenant visibility or traffic bleed.

Where available, Runpod's native Network Storage integration provides a shared filesystem layer that every node in your cluster can use. This keeps large models, from tens to hundreds of gigabytes, close to your compute.

You can connect your Runpod cluster to your AWS environment using application-layer mTLS, which lets you securely bridge workloads between the two platforms.

Currently, Clusters are not compatible with Kubernetes. The cluster environment is managed by Runpod's native orchestration system, eliminating the need for additional container orchestration tools or CNI configuration.

Clusters fully support Slurm for workload management.

For Clusters, there are no minimum lease terms. You have complete flexibility to deploy and terminate clusters as needed to support your workloads, with no long-term commitments or contract obligations.

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.