Our team’s insights on building better and scaling smarter.
Alyssa Mazzina
15 April 2025
The RTX 5090 Is Here: Serve 65,000+ Tokens Per Second on RunPod
The new NVIDIA RTX 5090 is now live on RunPod. With blazing-fast inference speeds and large memory capacity, it’s ideal for real-time LLM workloads and AI scaling.
Learn how RunPod autoscaling helps teams cut costs and improve performance for both training and inference. Includes best practices and real-world efficiency gains.
GPUs still dominate AI training in 2025, but emerging hardware and hybrid infrastructure are reshaping what's possible. Here’s what GTC revealed—and what it means for you.
Llama 4 Scout and Maverick Are Here—How Do They Shape Up?
Meta’s Llama 4 models, Scout and Maverick, are the next evolution in open LLMs. This post explores their strengths, performance, and deployment on RunPod.
Built on RunPod: How Cogito Trained Models Toward ASI
San Francisco-based Deep Cogito used RunPod infrastructure to train Cogito v1, a high-performance open model family aimed at artificial superintelligence. Here’s how they did it.
Bare Metal vs. Instant Clusters: What’s Best for Your AI Workload?
RunPod now offers Instant Clusters alongside Bare Metal. This post compares the two deployment options and explains when to choose one over the other for your compute needs.