
Mastering Serverless Scaling on Runpod: Optimize Performance and Reduce Costs
Learn how to optimize your serverless GPU deployment on Runpod to balance latency, performance, and cost. From active and flex workers to FlashBoot and scaling strategies, this guide helps you build an efficient AI backend that won't break the bank. A minimal scaling-config sketch follows below.
AI Infrastructure
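
As a taste of the knobs that guide covers, here is a minimal sketch of setting worker scaling through Runpod's GraphQL API. The `saveEndpoint` mutation and the `workersMin` / `workersMax` / `idleTimeout` / `scalerType` fields follow Runpod's public API docs, but the endpoint name, template ID, and GPU pool below are placeholder assumptions, so treat this as an illustration rather than a copy-paste recipe.

```python
import os
import requests

# Runpod's GraphQL endpoint; the API key comes from your account settings.
RUNPOD_API_URL = "https://api.runpod.io/graphql"
API_KEY = os.environ["RUNPOD_API_KEY"]

# saveEndpoint mutation: workersMin pins always-on "active" workers,
# workersMax caps the pool of flex workers that scale with traffic.
mutation = """
mutation {
  saveEndpoint(input: {
    name: "llm-backend",            # placeholder endpoint name
    templateId: "your-template-id", # placeholder template ID
    gpuIds: "AMPERE_80",            # placeholder GPU pool
    workersMin: 1,                  # one active worker: no cold start for steady traffic
    workersMax: 5,                  # up to four extra flex workers under load
    idleTimeout: 5,                 # seconds a flex worker idles before scaling down
    scalerType: "QUEUE_DELAY",      # add workers when requests wait too long in queue
    scalerValue: 4
  }) { id }
}
"""

resp = requests.post(
    RUNPOD_API_URL,
    params={"api_key": API_KEY},
    json={"query": mutation},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

The core cost trade-off lives in those two numbers: every unit of `workersMin` buys lower latency at an always-on price, while `workersMax` bounds how far your bill can grow during a traffic spike.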

Run Larger LLMs on Runpod Serverless Than Ever Before – Llama-3 70B (and beyond!)
Runpod Serverless now supports multi-GPU workers, enabling full-precision deployment of large models like Llama-3 70B. With optimized vLLM support, FlashBoot, and network volumes, it's never been easier to run massive LLMs at scale. A minimal worker sketch follows below.
Product Updates
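
To make the multi-GPU announcement concrete, here is a minimal sketch of a serverless worker that shards Llama-3 70B across four GPUs using vLLM's tensor parallelism. The `runpod.serverless.start` handler pattern and vLLM's `LLM` / `SamplingParams` API are real, but the model path, GPU count, and input schema here are illustrative assumptions, not Runpod's official worker code.

```python
import runpod
from vllm import LLM, SamplingParams

# Load the model once at worker start; FlashBoot can then reuse the warmed
# worker so later cold starts skip this expensive step.
# tensor_parallel_size=4 shards the 70B weights across four GPUs.
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed model path
    tensor_parallel_size=4,                        # assumed GPU count per worker
)

def handler(job):
    """Handle one job; assumes an input schema of {'prompt', 'max_tokens', 'temperature'}."""
    job_input = job["input"]
    params = SamplingParams(
        temperature=job_input.get("temperature", 0.7),
        max_tokens=job_input.get("max_tokens", 512),
    )
    outputs = llm.generate([job_input["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```

A network volume can hold the downloaded weights so that every new worker mounts them instead of re-fetching roughly 140 GB of FP16 parameters on each cold start.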
