Brendan McKeag

DeepSeek V3.1: A Technical Analysis of Key Changes from V3-0324

DeepSeek V3.1 introduces a hybrid reasoning architecture that switches between fast inference and deep chain-of-thought reasoning using token-controlled templates, improving performance, flexibility, and hardware efficiency over its predecessor, V3-0324. The update brings benchmark gains across math, code, and agent tasks, and the model is fully deployable on Runpod Instant Clusters.
AI Workloads

Wan 2.2 Releases With a Plethora Of New Features

Deploy Wan 2.2 on Runpod for next-generation video generation with a Mixture-of-Experts architecture, TI2V-5B support, and 83% more training data. Run text-to-video and image-to-video models at scale using A100–H200 GPUs and customizable ComfyUI workflows.
AI Infrastructure

Deep Cogito Releases Suite of LLMs Trained with Iterative Policy Improvement

Deploy Deep Cogito’s Cogito v2 models on Runpod to get frontier-level reasoning at lower inference costs. Choose from variants ranging from 70B to 671B parameters, and use Runpod’s optimized templates and Instant Clusters for scalable, efficient deployment.
AI Infrastructure

Comparing the 5090 to the 4090 and B200: How Does It Stack Up?

Benchmark Qwen2.5-Coder-7B-Instruct across NVIDIA’s B200, RTX 5090, and RTX 4090 to identify the best GPU for LLM inference. Compare token throughput, cost per token, and memory efficiency to match your workload with the right performance tier.
Hardware & Trends

How to Run MoonshotAI’s Kimi-K2-Instruct on RunPod Instant Cluster

Run MoonshotAI’s Kimi-K2-Instruct on Runpod Instant Clusters using H200 SXM GPUs and a 2 TB shared network volume for seamless multi-node training. This guide shows how to deploy with PyTorch templates, optimize Docker environments, and accelerate LLM inference on scalable, low-latency infrastructure.
AI Workloads

Iterative Refinement Chains with Small Language Models: Breaking the Monolithic Prompt Paradigm

As prompt complexity increases, large language models (LLMs) hit a “cognitive wall,” with performance dropping by up to 40% due to task interference and overload. By decomposing workflows into iterative refinement chains (e.g., the Self-Refine framework) and deploying each stage on a serverless platform like Runpod, you can maintain high accuracy, scalability, and cost efficiency.
AI Workloads

Running a 1-Trillion Parameter AI Model In a Single Pod: A Guide to MoonshotAI’s Kimi-K2 on Runpod

Moonshot AI’s Kimi-K2-Instruct is a trillion-parameter, open-source mixture-of-experts LLM optimized for autonomous agentic tasks. It has 32 billion active parameters, offers Muon-trained performance rivaling proprietary models (89.5% MMLU, 97.4% MATH-500, 65.8% pass@1), and can run inference on as little as 1 TB of VRAM using 8-bit quantization.
AI Workloads
