
The Chips Got Faster. The Stack Didn't.
The bottleneck has moved.
Runpod's State of AI report pulls from real production data across 500,000+ developers to reveal what's actually running, not what people say they're running.

Most AI reports are built on benchmarks, surveys, or press cycles. This one isn’t.
The Runpod State of AI report is based on the infrastructure powering more than 500,000 developers and companies globally.
Turning that volume of raw infrastructure exhaust into structured insight required something most AI infrastructure companies simply don’t have: a mature, purpose-built data foundation. We built internal pipelines to classify model usage at scale, ran LLM-based analysis across production logs, mapped workloads to GPU selection patterns, and used IP intelligence to understand geographic distribution.
This isn’t a survey of what people say they’re using. It’s a record of what’s actually running. And what’s running contradicts much of the public narrative.
More than 70% of image workflows on Runpod run through ComfyUI. At that level, ComfyUI isn’t just a leading tool, it’s infrastructure. The market has already decided that node-based, modular pipelines are the default for serious image generation.
For two years, Llama has dominated open-source conversation. Benchmarks, Twitter threads, conference slides, all centered on Meta’s ecosystem. But in production on our platform, Qwen is now the most deployed self-hosted LLM.
Even more striking: Llama 4 has near-zero adoption. Despite launch coverage and attention, the ecosystem hasn’t meaningfully migrated. The market is pragmatic. It optimizes for performance per dollar, latency, compatibility, and fine-tuning ecosystems. But we’ll be watching to see if the next Llama release pushes the frontier again and reshapes the landscape.
The public story around video AI focuses on generation: text-to-video breakthroughs, cinematic demos, model launches. Production behavior shows something different: upscaling workloads outnumber generation roughly two to one. Teams are not betting everything on a single expensive render. They generate fast, low-resolution drafts, select winners, and then allocate compute to enhancement. The model is “draft, then refine.”
The capital allocation pattern here matters: optimization is absorbing more GPU time than raw creation.
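To make the "draft, then refine" pattern concrete, here is a minimal Python sketch. It is an illustration only, not Runpod's pipeline: the `Draft` class, the `select_winners` and `gpu_seconds` helpers, the quality scores, and the per-step costs are all hypothetical and do not come from the report.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    prompt: str
    seed: int
    score: float  # hypothetical quality score, higher is better


def select_winners(drafts, keep):
    """Return the `keep` highest-scoring drafts."""
    return sorted(drafts, key=lambda d: d.score, reverse=True)[:keep]


def gpu_seconds(n_drafts, keep, draft_cost, upscale_cost):
    """Total GPU time when every draft is cheap and only winners are upscaled."""
    return n_drafts * draft_cost + keep * upscale_cost


# Four cheap low-resolution drafts with made-up scores.
drafts = [
    Draft("city at dusk", seed, score)
    for seed, score in [(1, 0.42), (2, 0.91), (3, 0.67), (4, 0.88)]
]

# Only the two best drafts move on to the expensive enhancement step.
winners = select_winners(drafts, keep=2)

# Illustrative costs: 10 GPU-seconds per draft, 60 per upscale.
total = gpu_seconds(n_drafts=len(drafts), keep=len(winners),
                    draft_cost=10.0, upscale_cost=60.0)
upscale_share = (len(winners) * 60.0) / total
```

Under these assumed costs, the enhancement step takes most of the GPU time even though it touches only half the drafts, which is the shape of the allocation pattern described above.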
Zooming out, nearly two-thirds of Runpod users are in industries outside of AI. HealthTech and FinTech lead enterprise verticals. Usage spans every inhabited continent. Hopper GPUs remain foundational, while Blackwell is scaling faster than any previous architecture.
Collectively, the data points to something bigger: AI has transitioned from experimental technology to global, production-grade infrastructure. And usage patterns are consolidating around performance, efficiency, and workflow control.
The full State of AI report goes deeper into these infrastructure patterns and includes forward-looking predictions grounded in the same production data.
If you want to understand where capital and compute are actually flowing, read the full report.
Author profile: Charlotte Daniels