Introducing Clusters: On-Demand Multi-Node AI Compute

Runpod's Clusters let you spin up multi-node GPU environments instantly, making them ideal for scaling LLM training or distributed inference workloads without manual configuration.

Until now, Runpod users could generally scale up to 8 GPUs in a single pod. For most use cases—like running inference on Llama 70B or fine-tuning FLUX—that was plenty. But some workloads need more compute than a single server. They need to scale across multiple machines.

Today, we’re excited to launch Clusters: a fast, on-demand way to deploy networked multi-node GPU clusters on Runpod’s platform.

With Clusters, your GPUs aren’t limited to a single node anymore. You can now connect up to 8 nodes for up to 64 H100s, with high-speed interconnects that enable private node-to-node communication right out of the box. No sales calls, no waiting on integration, and no long-term commitments: launch large GPU clusters instantly.

Why This Matters

The rise of large-scale models like DeepSeek R1 (671B parameters) and Llama 3.1 405B is pushing infrastructure to its limits. Even with 8x H100s and 640 GB of VRAM, you're nowhere near the 1600 GB+ needed to run these models efficiently.
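To make the arithmetic concrete, here is a minimal sketch that works out how many 8-GPU nodes it takes to reach the 1600 GB figure above. It assumes 80 GB of VRAM per H100 (implied by 640 GB across 8 GPUs) and counts raw VRAM only; the function name is illustrative, and real sizing also depends on precision, KV cache, and parallelism strategy.

```python
import math

GPU_VRAM_GB = 80     # assumption: 80 GB per H100 (640 GB / 8 GPUs)
GPUS_PER_NODE = 8    # one node holds 8 GPUs
MAX_NODES = 8        # cluster limit stated in this post


def nodes_needed(required_vram_gb: float) -> int:
    """Smallest number of 8-GPU nodes whose combined VRAM covers the requirement."""
    vram_per_node = GPU_VRAM_GB * GPUS_PER_NODE
    return math.ceil(required_vram_gb / vram_per_node)


required = 1600
nodes = nodes_needed(required)

print(f"VRAM per node: {GPU_VRAM_GB * GPUS_PER_NODE} GB")             # 640 GB
print(f"Nodes for {required} GB: {nodes}")                            # 3
print(f"GPUs in that cluster: {nodes * GPUS_PER_NODE}")               # 24
print(f"Largest cluster: {MAX_NODES * GPUS_PER_NODE * GPU_VRAM_GB} GB")  # 5120 GB
```

Under these assumptions, a 3-node, 24-GPU cluster clears the 1600 GB bar with 1920 GB of total VRAM, which sits comfortably inside the 2 to 8 node range.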

To meet these demands, you need more than just powerful GPUs—you need infrastructure that can scale across machines. Clusters make that possible.

Here are just a few things Clusters make possible:

Inference on massive models with 671B+ parameters
Fine-tuning foundation models like Llama 3.1 405B
Training smaller foundation models from scratch (250M to 7B)
Accelerating simulations and research in fields like computational biology, physics, and finance

With support for 16 to 64 GPUs, and no long-term contracts, Clusters give researchers and engineers the flexibility they’ve been waiting for.

How It Works

With Clusters, you can spin up multi-node GPU clusters in minutes—no bare metal setup, no SSH juggling. Once your cluster is live, you can run distributed jobs using the frameworks you already know and love, like Slurm, Ray, or PyTorch’s torchrun utility.

Here’s how it looks in practice:

Example: Multi-Node Job with PyTorch

Make sure main.py exists on every node, then launch it with torchrun on each node.
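As a rough sketch of what that launch could look like, the snippet below assembles a torchrun command line for each node. The flags (--nnodes, --nproc_per_node, --node_rank, --master_addr, --master_port) are standard torchrun options; the helper name, the 2-node layout, the address 10.0.0.1, and port 29500 are illustrative assumptions, not values taken from Runpod's documentation.

```python
import shlex


def build_torchrun_command(node_rank, nnodes=2, nproc_per_node=8,
                           master_addr="10.0.0.1", master_port=29500,
                           script="main.py"):
    """Return the torchrun argument list for one node (hypothetical helper)."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",                  # total nodes in the cluster
        f"--nproc_per_node={nproc_per_node}",  # one process per GPU on this node
        f"--node_rank={node_rank}",            # unique per node: 0, 1, ...
        f"--master_addr={master_addr}",        # assumed address of the rank-0 node
        f"--master_port={master_port}",        # assumed free port on that node
        script,
    ]


# Print the command each node would run; only --node_rank differs.
for rank in range(2):
    print(shlex.join(build_torchrun_command(node_rank=rank)))
```

Every node gets the same command apart from --node_rank, and the master address should point at whichever node is assigned rank 0.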

This example assumes:

You’re using PyTorch with torchrun for multi-node orchestration
main.py is your training script (e.g., training something like Mistral-7B)
Each node in your cluster runs the command with its own values for per-node settings such as node_rank
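Inside main.py, each worker can find out where it sits in the cluster from the environment variables torchrun sets (RANK, LOCAL_RANK, WORLD_SIZE, and GROUP_RANK). The sketch below is one hedged way to read them; WorkerInfo and read_worker_info are hypothetical names, and the sample values simulate a worker on the second node of a 2-node, 16-GPU cluster rather than a real run.

```python
import os
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    rank: int        # global rank across all nodes
    local_rank: int  # GPU index on this node
    world_size: int  # total number of worker processes
    node_rank: int   # which node this worker is on


def read_worker_info(env=None) -> WorkerInfo:
    """Read the variables torchrun exports for each worker process."""
    env = os.environ if env is None else env
    return WorkerInfo(
        rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
        node_rank=int(env["GROUP_RANK"]),
    )


# Simulated environment for the worker on node 1, GPU 3 (assumed values).
sample_env = {"RANK": "11", "LOCAL_RANK": "3", "WORLD_SIZE": "16", "GROUP_RANK": "1"}
info = read_worker_info(sample_env)
print(info)

# With 8 processes per node, the global rank typically works out to
# node_rank * 8 + local_rank.
print(info.node_rank * 8 + info.local_rank == info.rank)
```

In a real script you would call read_worker_info() with no arguments and use local_rank to pick the GPU for that process.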

You can find more detailed implementations of these examples in our documentation, but this shows how simple it is to get started with multi-node training on Runpod.

Clusters vs. Bare Metal: What’s the Difference?

Some teams turn to long-term bare metal contracts when they need full system access or specialized configurations. That makes sense for many production environments—but it comes with tradeoffs: setup time, long-term commitments, and more manual overhead.

Clusters offer a different approach:

Deploy clusters in minutes, not days
Pay only for what you use, billed by the second
Manage your cluster through Runpod’s intuitive UI, with templates, billing insights, and team-level controls

Technical Details

GPU Type: NVIDIA H100 (more GPU types coming soon)
Cluster Size: 16 to 64 GPUs (2 to 8 nodes)
Containerized: Runs on Docker
Interconnect: High-performance networking
End-to-end onboarding time: Just a few minutes

Ideal For

ML Engineers fine-tuning large models
Research labs training from scratch
Startups iterating quickly with flexible infrastructure
Open-source projects needing temporary access to high-end hardware

Clusters bring the power of a full-scale training cluster to anyone—with no commitments, no setup headaches, and no overpriced contracts.

For full system-level access, Bare Metal is still your go-to. But for fast, flexible scaling without any commitments or contracts, Clusters are a game changer.

Try It Now

Ready to deploy your first Instant Cluster? Head to your Runpod console and choose "Clusters" to get started.

Questions? Feedback? Reach out at clusters@runpod.io or join us on Discord.

Clusters are now live. Let the multi-node era begin.
