```python
import requests

# Quick smoke test against the vLLM OpenAI-compatible endpoint
def test_api():
    url = "http://localhost:8000/v1/chat/completions"
    payload = {
        "model": "kimi-k2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    }
    response = requests.post(url, json=payload, timeout=60)
    if response.status_code == 200:
        print("✅ API working!")
        print(response.json())
    else:
        print(f"❌ Error: {response.status_code}")

# Run after the server starts
test_api()
```
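Kimi-K2 is large, so the server can take several minutes to load its weights, and a request sent too early will fail with a connection error. Below is a minimal readiness check, assuming the standard `/v1/models` listing route that vLLM's OpenAI-compatible server exposes; the helper name `wait_for_server`, the timeout, and the polling interval are illustrative choices:

```python
import time
import requests

def wait_for_server(base_url="http://localhost:8000", timeout=1800):
    """Poll /v1/models until the server responds (hypothetical helper)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/v1/models", timeout=5).status_code == 200:
                print("Server is ready.")
                return True
        except requests.exceptions.ConnectionError:
            pass  # server still loading; keep polling
        time.sleep(5)
    return False

# Gate the smoke test on server readiness
if wait_for_server():
    test_api()
```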
OpenAI Implementation
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",  # vLLM doesn't require a real API key
    base_url="http://localhost:8000/v1"  # Your vLLM server URL
)

def simple_chat(client: OpenAI, model_name: str, messages):
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        temperature=0.6,
        max_tokens=524
    )
    print(response.choices[0].message.content)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
    {"role": "user", "content": [{"type": "text", "text": "What is RunPod?"}]},
]
simple_chat(client, "kimi-k2", messages)
```
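For longer generations you will usually want tokens streamed back as they are produced rather than waiting for the full completion. Here is a minimal streaming variant of the same call, using the OpenAI client's standard `stream=True` chunk interface (the `stream_chat` name is illustrative):

```python
def stream_chat(client: OpenAI, model_name: str, messages):
    # stream=True returns an iterator of incremental chunks
    stream = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=True,
        temperature=0.6,
        max_tokens=524
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g., role-only deltas) carry no text
            print(delta, end="", flush=True)
    print()

stream_chat(client, "kimi-k2", messages)
```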
Expected Output:
RunPod is a cloud-computing platform designed specifically for **AI and machine-learning workloads**. It provides on-demand access to **GPU instances** (like NVIDIA A100s, H100s, RTX 4090s, etc.) at competitive prices, making it popular for training, fine-tuning, and deploying AI models without the hassle of managing physical hardware.
### **Key Features of RunPod:**
1. **GPU Rental** – Rent high-end GPUs by the hour or second, with options for **secure cloud** or **community GPUs** (cheaper, but shared).
2. **Serverless GPUs** – Deploy AI models as **serverless endpoints**, scaling automatically when needed.
3. **Prebuilt Templates** – One-click deployments for popular AI frameworks like **PyTorch**, **TensorFlow**, **Stable Diffusion**, and **LLMs**.
4. **Persistent Storage** – Attach scalable storage volumes for datasets and models.
5. **Global Availability** – Data centers in **US, EU, and Asia** for low-latency access.
6. **Pay-as-you-go** – No long-term contracts; pay only for what you use.
### **Who Uses RunPod?**
- **AI Researchers & Startups** – Train/fine-tune models without high upfront costs.
- **Developers** – Deploy **LLMs**, **image generation APIs**, or **inference endpoints**.
- **Students & Hobbyists** – Affordable access to GPUs for learning and experimentation.
- **Enterprises** – Spin up temporary GPU clusters for heavy workloads.
### **RunPod vs. Competitors**
- **vs. AWS/GCP/Azure**: Cheaper GPU pricing, simpler setup.
- **vs. Lambda Labs**: More flexible GPU choices and serverless options.
- **vs. Colab Pro**: More powerful GPUs and persistent environments.
### **Pricing Example (as of 2024)**
- **RTX 4090**: ~$0.44/hr
- **A100 80GB**: ~$1.89/hr
- **H100 80GB**: ~$2.79/hr
RunPod is **ideal** for anyone needing **GPU compute on demand** without long-term commitments.
Known Issues
As of July 21st, the vLLM release on PyPI is not yet up to date for this model, so vLLM needs to be installed from the nightly builds, as sketched below.
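One way to do this, assuming the pre-release wheel index documented by the vLLM project (verify the current URL against the vLLM installation docs):

```bash
# Install the latest nightly vLLM wheel instead of the PyPI release
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```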
A uv environment on a Network Volume is slow to initialize Ray; we recommend creating any Python environments on the machine's local disk rather than on the Network Volume.