We raised a Series A! Read a post from our CEO, Zhen Lu: 1M devs and the cloud we're building next.

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

Run MoonshotAI's Kimi-K2-Instruct on Runpod Clusters using H200 SXM GPUs and a 2TB shared network volume for seamless multi-node training. This guide.

How to Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Cluster

1. Create Network Storage (2TB), Use the CA-MTL-4 region (recommended for now).

2. Spin Up a Pod using Runpod official Pytorch template and mount the network volume you just created, once the pod is running, connect to Jupyter Lab

3. Download the Model

  1. In the Jupyter terminal, install the Hugging Face CLI and download the model:


4. Launch the Instant Cluster

  • Choose Standard Cluster
  • Select the network volume you created
  • Set Pod Count to 2 (default is 2)
  • Pick H200 SXM GPU type
  • Select Runpod Pytorch official template
  • Click Edit Template and set container disk size to be 2000 GB
  • Click Deploy Cluster to launch it.

1. installation on Node 0 with a shared volume

2. Node 1 Instructions

3. You should see the following:


4. Run on node with ip as host.

Runpod console Clusters view showing two 8x H200 pods with GPU utilization and memory metrics
Jupyter notebook cell testing the Kimi K2 chat completions API with a successful response

Open AI Implementation

Expected Output:

Known Issues

Currently as of July 21st, vllm library is not up to date so need to build from nightly builds.

https://github.com/MoonshotAI/Kimi-K2/issues/19

Gotchas

uv environment on a Network Volume is slow to initialize ray, recommend any python environments be ran on the machine itself instead of on the Network Volume.

Author profile: Brendan McKeag

Related articles

View All
What's new in Runpod Serverless: Faster cold starts, batch inference, and no-Docker deploys

What's new in Runpod Serverless: Faster cold starts, batch inference, and no-Docker deploys

Whether you're already running production endpoints on Runpod or you're sizing us up for the first time, here's a plain-language tour of what Runpod Serverless does today, why it's faster and cheaper than it was six months ago, and how to deploy your first endpoint in minutes.

All

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.