
Why NVIDIA's Llama 3.1 Nemotron 70B Might Be the Most Reasonable LLM Yet
NVIDIA's Llama 3.1 Nemotron 70B is outperforming larger and closed models on key reasoning tasks. In this post, Brendan tests it against a long-unsolved challenge: consistent, in-character roleplay with no internal monologue or user coercion, and finds it finally up to the task.
AI Workloads

Mastering Serverless Scaling on Runpod: Optimize Performance and Reduce Costs
Learn how to optimize your serverless GPU deployment on Runpod to balance latency, performance, and cost. From active and flex workers to Flashboot and scaling strategy, this guide helps you build an efficient AI backend that won’t break the bank.
AI Infrastructure

Run Larger LLMs on Runpod Serverless Than Ever Before – Llama-3 70B (and beyond!)
Runpod Serverless now supports multi-GPU workers, enabling full-precision deployment of large models like Llama-3 70B. With optimized vLLM support, Flashboot, and network volumes, it has never been easier to run massive LLMs at scale.
Product Updates