
Introduction to vLLM and PagedAttention
Learn how vLLM achieves up to 24x higher throughput than Hugging Face Transformers by using PagedAttention to eliminate KV-cache memory waste, boost inference performance, and use GPUs more efficiently.
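The memory savings come from how PagedAttention stores the KV cache. Instead of reserving one contiguous region per request sized for the maximum sequence length, it splits the cache into fixed-size blocks and maps each sequence's logical blocks to physical ones through a block table. The following is a minimal conceptual sketch, not vLLM's implementation. It assumes a block size of 16 tokens and a simple free-list allocator. The names `BlockAllocator`, `Sequence`, `append_token`, and `MAX_SEQ_LEN` are hypothetical and chosen only for illustration. A new block is allocated only when the last one fills up, so each sequence wastes at most `BLOCK_SIZE - 1` slots.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed value for this sketch)


class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a shared free pool."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)  # logical -> physical block

    def append_token(self, allocator: BlockAllocator) -> None:
        # Allocate a new physical block only when the current last block is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def wasted_slots(self) -> int:
        # Unused slots can exist only in the final, partially filled block.
        return len(self.block_table) * BLOCK_SIZE - self.num_tokens

    def release_all(self, allocator: BlockAllocator) -> None:
        # Return every block to the pool when the request finishes.
        for block_id in self.block_table:
            allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=64)
seq = Sequence()
for _ in range(50):  # generate 50 tokens
    seq.append_token(allocator)

print(len(seq.block_table))  # 4 blocks (ceil(50 / 16))
print(seq.wasted_slots())    # 14 idle slots, all in the last block

# Hypothetical contrast: contiguous preallocation for a 2048-token maximum.
MAX_SEQ_LEN = 2048
print(MAX_SEQ_LEN - seq.num_tokens)  # 1998 idle slots
```

Because freed blocks go straight back to a shared pool, the memory a finished request held becomes available to other requests right away. This is what allows a server to batch more sequences on the same GPU.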



