Runpod Blog

Our team’s insights on building better and scaling smarter.

Benchmarking LLMs: A Deep Dive into Local Deployment & Optimization

Curious how local LLM deployment stacks up against managed services? This post explores benchmarking strategies, optimization tips, and what DevOps teams need to know about performance tuning; a minimal throughput sketch follows this card.
Read article
AI Infrastructure
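
The post above covers methodology in depth; as a flavor of the simplest possible measurement, here is a hedged throughput sketch. It assumes a locally deployed, OpenAI-compatible server (for example, one started with `vllm serve`); the URL, model name, and prompt are placeholders, not values from the post.

```python
import time
import requests

# Assumption: a local OpenAI-compatible server (e.g. started with `vllm serve`)
# is listening here. URL and model name are illustrative placeholders.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

def measure_throughput(prompt: str, max_tokens: int = 256) -> float:
    """Return generated tokens per second for a single request."""
    start = time.perf_counter()
    resp = requests.post(BASE_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # OpenAI-compatible servers report token counts in the "usage" field.
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

if __name__ == "__main__":
    tps = measure_throughput("Explain PagedAttention in one paragraph.")
    print(f"~{tps:.1f} tokens/sec")
```

Single-request latency is only one axis; the post discusses why concurrency and batch behavior matter just as much.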

AMD MI300X vs. NVIDIA H100 SXM: Performance Comparison on Mixtral 8x7B Inference

Runpod benchmarks AMD’s MI300X against NVIDIA’s H100 SXM using Mistral’s Mixtral 8x7B model. The results highlight performance and cost trade-offs across batch sizes, showing where AMD’s larger VRAM shines.
Read article
Hardware & Trends

AMD MI300X vs. NVIDIA H100: Mixtral 8x7B Inference Benchmark

We benchmarked AMD’s MI300X against NVIDIA’s H100 on Mixtral 8x7B. Discover which GPU delivers faster inference and better performance per dollar; see the batch-size sweep sketch below.
Read article
Hardware & Trends
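
Both posts above sweep batch size to compare the two cards. As an illustration of the shape of such a sweep (not Runpod’s actual benchmark harness), here is a minimal vLLM sketch; the model name is Mixtral’s public checkpoint, and the batch sizes and token counts are arbitrary.

```python
import time
from vllm import LLM, SamplingParams

# Sketch of a batch-size sweep, not Runpod's actual benchmark harness.
# Mixtral 8x7B needs substantial VRAM; on smaller GPUs, substitute any model.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
params = SamplingParams(max_tokens=128)

for batch_size in (1, 8, 32, 128):
    prompts = ["Summarize the history of GPUs."] * batch_size
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"batch={batch_size:4d}  {tokens / elapsed:8.1f} tokens/sec")
```

Throughput typically scales with batch size until memory or compute saturates, which is where the MI300X’s larger VRAM becomes relevant.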

Partnering with Defined AI to Bridge the Data Wealth Gap

Runpod and Defined.ai launch a pilot program to provide startups with access to high-quality training data and compute, enabling sector-specific fine-tuning and closing the data wealth gap.
Read article
Product Updates

Run Larger LLMs on Runpod Serverless Than Ever Before – Llama-3 70B (and beyond!)

Runpod Serverless now supports multi-GPU workers, enabling full-precision deployment of large models like Llama-3 70B. With optimized vLLM support, FlashBoot, and network volumes, it’s never been easier to run massive LLMs at scale; a tensor-parallel sketch appears below.
Read article
Product Updates
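
For context on what multi-GPU workers make possible, the sketch below shows vLLM’s tensor parallelism, which shards a model like Llama-3 70B across GPUs. This is a generic vLLM example, not the Runpod worker’s internal code, and the two-GPU count is an assumption for illustration.

```python
from vllm import LLM, SamplingParams

# Generic vLLM tensor-parallel sketch, not the Runpod serverless worker itself.
# tensor_parallel_size=2 assumes two GPUs visible to the process; full-precision
# Llama-3 70B realistically needs large aggregate VRAM (e.g. 2x 80 GB or more).
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=2,
)
outputs = llm.generate(
    ["What changes when a model no longer fits on one GPU?"],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```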

Introduction to vLLM and PagedAttention

Learn how vLLM achieves up to 24x higher throughput than Hugging Face Transformers by using PagedAttention to eliminate memory waste, boost inference performance, and enable efficient GPU usage; a quick usage sketch follows this card.
Read article
AI Workloads
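
To make the post above concrete, here is the smallest possible vLLM usage sketch. PagedAttention is the engine’s default KV-cache management, so nothing special needs to be enabled; the model name and memory fraction are illustrative.

```python
from vllm import LLM, SamplingParams

# Minimal offline-inference sketch. PagedAttention is vLLM's default KV-cache
# management, so it is in effect here without any extra configuration.
# gpu_memory_utilization caps how much VRAM the engine pre-allocates.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.90)

outputs = llm.generate(
    ["The key idea behind PagedAttention is"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```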

How to Run vLLM on Runpod Serverless (Beginner-Friendly Guide)

Learn how to run vLLM on Runpod’s serverless GPU platform. This guide walks you through fast, efficient LLM inference without complex setup; see the endpoint-call sketch below.
Read article
AI Infrastructure
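
As a companion to the guide above, here is a hedged sketch of invoking a deployed vLLM endpoint over Runpod’s serverless REST API. The endpoint ID and API key are placeholders you get from the Runpod console, and the exact input schema depends on the worker image, so treat the payload shape as an assumption.

```python
import os
import requests

# Placeholders: set these from your Runpod console after deploying a worker.
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
API_KEY = os.environ["RUNPOD_API_KEY"]

# /runsync blocks until the job finishes; the payload below assumes the vLLM
# worker's prompt-style input and may differ across worker versions.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Give me one reason to benchmark before scaling.",
                    "sampling_params": {"max_tokens": 64}}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```

For long-running jobs, the asynchronous `/run` endpoint with status polling is the usual alternative to `/runsync`.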

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.