Runpod Blog.

Our team’s insights on building better and scaling smarter.

RAG vs. Fine-Tuning: Which Is Best for Your LLM?
Shaamil Karim
July 11, 2024

Retrieval-Augmented Generation (RAG) and fine-tuning are powerful ways to adapt large language models. Learn the key differences, trade-offs, and when to use each.

AI Workloads
How to Benchmark Local LLM Inference for Speed and Cost Efficiency
Jonmichael Hands
July 4, 2024

Explore how to deploy and benchmark LLMs locally using tools like Ollama and NVIDIA NIMs. This deep dive covers performance, cost, and scaling insights across GPUs including RTX 4090 and H100 NVL.

AI Workloads
Benchmarking LLMs: A Deep Dive into Local Deployment & Optimization
Jonmichael Hands
July 4, 2024

Curious how local LLM deployment stacks up? This post explores benchmarking strategies, optimization tips, and what DevOps teams need to know about performance tuning.

AI Infrastructure
AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference
Marut Pandya
July 1, 2024

Runpod benchmarks AMD’s MI300X against Nvidia’s H100 SXM using Mistral’s Mixtral 8x7B model. The results highlight performance and cost trade-offs across batch sizes, showing where AMD’s larger VRAM shines.

Partnering with Defined AI to Bridge the Data Wealth Gap
Shaamil Karim
June 17, 2024

Runpod and Defined.ai launch a pilot program to provide startups with access to high-quality training data and compute, enabling sector-specific fine-tuning and closing the data wealth gap.

Product Updates
Run Larger LLMs on Runpod Serverless Than Ever Before – Llama-3 70B (and beyond!)
Brendan McKeag
June 6, 2024

Runpod Serverless now supports multi-GPU workers, enabling full-precision deployment of large models like Llama-3 70B. With optimized vLLM support, FlashBoot, and network volumes, it's never been easier to run massive LLMs at scale.

Product Updates