Blog

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

Run MoonshotAI's Kimi-K2-Instruct on Runpod Clusters using H200 SXM GPUs and a 2TB shared network volume for seamless multi-node training. This guide.

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

1. Create Network Storage (2TB), Use the CA-MTL-4 region (recommended for now).

2. Spin Up a Pod using Runpod official Pytorch template and mount the network volume you just created, once the pod is running, connect to Jupyter Lab

3. Download the Model

In the Jupyter terminal, install the Hugging Face CLI and download the model:

‍4. Launch the Instant Cluster

Choose Standard Cluster
Select the network volume you created
Set Pod Count to 2 (default is 2)
Pick H200 SXM GPU type
Select Runpod Pytorch official template
Click Edit Template and set container disk size to be 2000 GB
Click Deploy Cluster to launch it.

‍

1. installation on Node 0 with a shared volume

2. Node 1 Instructions

3. You should see the following:

‍‍4. Run on node with ip as host.

Runpod console Clusters view showing two 8x H200 pods with GPU utilization and memory metrics

Jupyter notebook cell testing the Kimi K2 chat completions API with a successful response

Open AI Implementation

Expected Output:

`‍`Known Issues

Currently as of July 21st, vllm library is not up to date so need to build from nightly builds.

https://github.com/MoonshotAI/Kimi-K2/issues/19

Gotchas

uv environment on a Network Volume is slow to initialize ray, recommend any python environments be ran on the machine itself instead of on the Network Volume.

What's new in Runpod Serverless: Faster cold starts, batch inference, and no-Docker deploys

Whether you're already running production endpoints on Runpod or you're sizing us up for the first time, here's a plain-language tour of what Runpod Serverless does today, why it's faster and cheaper than it was six months ago, and how to deploy your first endpoint in minutes.

Beyond the Notebook: The Engineering Realities of Production AI Agents

Shift from stateless inference to stateful architectures to resolve infrastructure bottlenecks like memory management, concurrency limits, and runaway jobs in production AI agents.

One Million Developers on Runpod, and the Cloud We’re Building Next

We raised a $100 million Series A. Here's what it means for you.

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.

Get started

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

Open AI Implementation

Expected Output:

`‍`Known Issues

Gotchas

Related posts

Related articles

What's new in Runpod Serverless: Faster cold starts, batch inference, and no-Docker deploys

Beyond the Notebook: The Engineering Realities of Production AI Agents

One Million Developers on Runpod, and the Cloud We’re Building Next

Build what’s next.

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

Open AI Implementation

Expected Output:

‍Known Issues

Gotchas

Related posts

Related articles

What's new in Runpod Serverless: Faster cold starts, batch inference, and no-Docker deploys

Beyond the Notebook: The Engineering Realities of Production AI Agents

One Million Developers on Runpod, and the Cloud We’re Building Next

Build what’s next.

`‍`Known Issues