Runpod Blog.

Our team’s insights on building better and scaling smarter.

Run GGUF Quantized Models Easily with KoboldCPP on Runpod
Brendan McKeag
September 25, 2024

Lower VRAM usage and improve inference speed using GGUF quantized models in KoboldCPP with just a few environment variables.

AI Workloads
How to Work with GGUF Quantizations in KoboldCPP
Brendan McKeag
September 25, 2024

GGUF quantizations make large language models faster and more efficient. This guide walks you through using KoboldCPP to load, run, and manage quantized LLMs on Runpod.

Learn AI
Introducing Better Forge: Spin Up Stable Diffusion Pods Faster
Brendan McKeag
September 20, 2024

Better Forge is a new Runpod template that lets you launch Stable Diffusion pods in less time and with less hassle. Here's how it improves your workflow.

AI Infrastructure
Run Very Large LLMs Securely with Runpod Serverless
Brendan McKeag
September 18, 2024

Deploy large language models like LLaMA or Mixtral on Runpod Serverless with strong privacy controls and no infrastructure headaches. Here’s how.

AI Infrastructure
Evaluate Multiple LLMs Simultaneously Using Ollama on Runpod
Brendan McKeag
September 13, 2024

Use Ollama to compare multiple LLMs side-by-side on a single GPU pod—perfect for fast, realistic model evaluation with shared prompts.

AI Workloads
Boost vLLM Performance on Runpod with GuideLLM
Marut Pandya
September 10, 2024

Learn how to use GuideLLM to simulate real-world inference loads, fine-tune performance, and optimize cost for vLLM deployments on Runpod.

AI Workloads
Deploy Google Gemma 7B with vLLM on Runpod Serverless
Shaamil Karim
August 22, 2024

Deploy Google’s Gemma 7B model using vLLM on Runpod Serverless in just minutes. Learn how to optimize for speed, scalability, and cost-effective AI inference.

AI Workloads
