
Emmett Fear

Emmett runs Growth at Runpod. He lives in Utah with his wife and dog, and loves to spend time hiking and paddleboarding. He has worked in many different facets of tech, spanning marketing, operations, product, and most recently, growth.

How to Serve Phi-2 on a Cloud GPU with vLLM and FastAPI

Provides step-by-step instructions to serve the Phi-2 language model on a cloud GPU using vLLM and FastAPI. Covers setting up vLLM for efficient inference and deploying a FastAPI server to expose the model via a REST API.
Guides
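A server like the one in the guide above is typically queried over plain HTTP. Here is a stdlib-only client sketch; the `/generate` path and the payload fields are assumptions about how a FastAPI app in front of vLLM might be wired, not a documented API, so adjust them to match your deployment.

```python
import json
from urllib import request

def build_generate_request(base_url: str, prompt: str, max_tokens: int = 128):
    """Build a POST request for a hypothetical /generate endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("http://localhost:8000", "Explain GPUs in one line.")
print(req.full_url)  # http://localhost:8000/generate
# To actually send it (requires the server to be running):
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test before pointing it at a live GPU endpoint.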

How to Run OpenChat on a Cloud GPU Using Docker

Offers a guide on running the OpenChat model on a cloud GPU using Docker. Explains how to configure the Docker environment for OpenChat and deploy it for inference, so you can interact with the model without local installation.
Guides
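The Docker setup the guide above describes comes down to one `docker run` invocation with GPU access and a published port. A small helper that composes that command as an argv list (the image name and port below are placeholders, not the guide's exact values):

```python
def docker_run_command(image: str, port: int, gpus: str = "all") -> list[str]:
    """Compose a `docker run` argv list for a GPU inference container."""
    return [
        "docker", "run", "--rm",
        "--gpus", gpus,           # expose host GPUs to the container
        "-p", f"{port}:{port}",   # publish the model server's port
        image,
    ]

cmd = docker_run_command("openchat/openchat:latest", 18888)
print(" ".join(cmd))
# The list form can be passed straight to subprocess.run(cmd) on a GPU host.
```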

How to Run StarCoder2 as a REST API in the Cloud

Shows how to deploy StarCoder2 as a REST API on a cloud GPU. Walks through containerizing the code-generation model and setting up an API service, enabling you to query the model remotely with GPU-accelerated performance.
Guides
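Once an API like the one above is running, callers mostly need to unwrap its JSON responses. A sketch of that step; the `generated_text` key is an assumption about the response shape, so match it to how your endpoint is actually defined:

```python
import json

def extract_completion(raw_response: bytes) -> str:
    """Pull the generated code out of the service's JSON response.

    Assumes the API wraps output as {"generated_text": "..."};
    adjust the key to your endpoint's actual schema.
    """
    body = json.loads(raw_response.decode("utf-8"))
    return body["generated_text"]

# Simulated response, as the remote service is not available here
sample = json.dumps({"generated_text": "def add(a, b):\n    return a + b\n"}).encode()
print(extract_completion(sample))
```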

Train Any AI Model Fast with PyTorch 2.1 + CUDA 11.8 on Runpod: The Ultimate Guide

Demonstrates how to train any AI model quickly using PyTorch 2.1 with CUDA 11.8 on Runpod. Covers preparing the environment and using Runpod’s GPUs to accelerate training, with tips for optimizing training speed in the cloud.
Guides
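Whatever the framework, the loop the guide above accelerates has the same shape: forward pass, loss, gradient, parameter update. A dependency-free sketch of that loop on a one-parameter linear model (plain Python standing in for PyTorch tensors, so no GPU is needed to follow it):

```python
def train_linear(xs, ys, lr=0.01, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        # forward pass + gradient of MSE w.r.t. w, averaged over the batch
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # parameter update (the optimizer step)
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x
print(round(train_linear(xs, ys), 3))  # 2.0
```

On a real model, PyTorch's autograd computes `grad` for millions of parameters per step, which is exactly the work a cloud GPU parallelizes.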

Using Ollama to Serve Quantized Models from a GPU Container

Shows how to use Ollama to serve quantized AI models from a GPU-accelerated Docker container. Details how model quantization improves efficiency and how to set up Ollama in the container for faster, lighter-weight inference.
Guides
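The efficiency gain from quantization that the guide above mentions can be estimated with back-of-envelope arithmetic: weight memory is roughly parameter count times bits per weight. This ignores activations, KV cache, and runtime overhead, so treat it as a floor, not a sizing guarantee:

```python
def quantized_size_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Rough VRAM needed for model weights alone (excludes KV cache and overhead)."""
    return n_params_billion * bits_per_weight / 8

# A 7B model: ~14 GB of weights at 16-bit, ~3.5 GB at 4-bit,
# which is why 4-bit builds fit on much smaller GPUs.
print(quantized_size_gb(7, 16))  # 14.0
print(quantized_size_gb(7, 4))   # 3.5
```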

LLM Training with Runpod GPU Pods: Scale Performance, Reduce Overhead

Describes how to scale large language model (LLM) training using Runpod GPU pods. Highlights performance tuning and cost optimization strategies to maximize training efficiency and reduce overhead in cloud environments.
Guides

Instant Clusters for AI Research: Deploy and Scale in Minutes

Highlights how Runpod’s Instant Clusters can accelerate AI research. Discusses deploying GPU clusters within minutes and how this capability allows rapid scaling for experiments and collaborative projects without lengthy setup.
Guides

Automate AI Image Workflows with ComfyUI + Flux on Runpod: Ultimate Creative Stack

Shows how to automate AI image generation workflows by integrating ComfyUI with Flux on Runpod. Details setting up an automated pipeline using cloud GPUs and workflow tools to streamline the creation of AI-generated art.
Guides
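Automating a ComfyUI pipeline usually means taking one exported workflow and re-running it with different inputs. A sketch of that patching step; the workflow below is a made-up minimal fragment, and the assumption that the prompt lives under `inputs["text"]` (common for a CLIP text-encode node) should be checked against your own export:

```python
import json

def set_prompt_text(workflow: dict, node_id: str, text: str) -> dict:
    """Return a copy of an API-format workflow with one node's text input
    replaced -- the basic move behind automated batch generation."""
    patched = json.loads(json.dumps(workflow))  # cheap deep copy
    patched[node_id]["inputs"]["text"] = text
    return patched

workflow = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat"}}}
batch = [set_prompt_text(workflow, "6", p) for p in ("a fox", "a barn owl")]
print(batch[0]["6"]["inputs"]["text"])  # a fox
```

Copying before patching keeps the original workflow reusable across the whole batch.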

Finding the Best Docker Image for vLLM Inference on CUDA 12.4 GPUs

Guides you in choosing the optimal Docker image for vLLM inference on CUDA 12.4–compatible GPUs. Compares available images and configurations to ensure you select one that maximizes performance for serving large language models.
Guides
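A first-pass filter for the image comparison above is version arithmetic: an image's CUDA runtime should not exceed what the host driver supports. The tag format below is invented for illustration and the rule is simplified (it ignores driver forward-compatibility packages), so adapt both to the registry and hosts you actually use:

```python
def cuda_version(tag: str) -> tuple[int, int]:
    """Extract (major, minor) from a hypothetical tag like 'vllm:cuda12.1'."""
    ver = tag.split("cuda")[-1]
    major, minor = ver.split(".")[:2]
    return int(major), int(minor)

def compatible(tags: list[str], host_cuda: tuple[int, int]) -> list[str]:
    """Keep images whose CUDA runtime does not exceed the host's
    supported CUDA version (a simplified compatibility rule)."""
    return [t for t in tags if cuda_version(t) <= host_cuda]

tags = ["vllm:cuda11.8", "vllm:cuda12.1", "vllm:cuda12.4", "vllm:cuda12.6"]
print(compatible(tags, (12, 4)))
# ['vllm:cuda11.8', 'vllm:cuda12.1', 'vllm:cuda12.4']
```

Tuple comparison handles the major/minor ordering correctly, e.g. (11, 8) sorts below (12, 4).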

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.
