How to Work with GGUF Quantizations in KoboldCPP

Brendan McKeag

September 25, 2024

GGUF quantizations make large language models faster and more efficient. This guide walks you through using KoboldCPP to load, run, and manage quantized LLMs on Runpod.

Learn AI

Introducing Better Forge: Spin Up Stable Diffusion Pods Faster

Brendan McKeag

September 20, 2024

Better Forge is a new Runpod template that lets you launch Stable Diffusion pods in less time and with less hassle. Here's how it improves your workflow.

AI Infrastructure

Run Very Large LLMs Securely with Runpod Serverless

Brendan McKeag

September 18, 2024

Deploy large language models like LLaMA or Mixtral on Runpod Serverless with strong privacy controls and no infrastructure headaches. Here’s how.

AI Infrastructure

Evaluate Multiple LLMs Simultaneously Using Ollama on Runpod

Brendan McKeag

September 13, 2024

Use Ollama to compare multiple LLMs side-by-side on a single GPU pod, perfect for fast, realistic model evaluation with shared prompts.

AI Workloads

Supercharge Your LLMs with SGLang: Boost Performance and Customization

Brendan McKeag

August 15, 2024

Discover how to boost your LLM inference performance and customize responses using SGLang, an innovative framework for structured LLM workflows.

AI Workloads

Mastering Serverless Scaling on Runpod: Optimize Performance and Reduce Costs

Brendan McKeag

July 25, 2024

Learn how to optimize your serverless GPU deployment on Runpod to balance latency, performance, and cost. From active and flex workers to FlashBoot and scaling strategy, this guide helps you build an efficient AI backend that won’t break the bank.

AI Infrastructure

Run Larger LLMs on Runpod Serverless Than Ever Before – Llama-3 70B (and beyond!)

Brendan McKeag

June 6, 2024

Runpod Serverless now supports multi-GPU workers, enabling full-precision deployment of large models like Llama-3 70B. With optimized vLLM support, FlashBoot, and network volumes, it's never been easier to run massive LLMs at scale.

Product Updates
