Observability, top-tier GPUs, and commitment-based savings.
Serverless Metrics page
Time-series charts of percentile (pXX) latencies, queue delay, throughput, and worker states, for faster debugging and tuning.
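To make the pXX notation concrete, here is a minimal sketch of how a percentile latency can be derived from raw samples. It uses the nearest-rank definition, which is an assumption: the Metrics page may aggregate or interpolate differently. The function name and the sample values are illustrative only.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Illustrative request latencies in milliseconds (made-up data).
latencies_ms = [120, 95, 210, 130, 101, 99, 1500, 115, 140, 125]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)
```

A single slow request (1500 ms here) barely moves p50 but dominates p99, which is why tail percentiles are usually charted next to the median.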
H100 on RunPod
Add NVIDIA H100 instances for higher throughput and larger model footprints.
Savings Plans
Commitment-based discounts for predictable workloads to lower effective hourly rates.
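As a rough illustration of how a commitment affects the effective hourly rate, the sketch below compares a prepaid commitment against on-demand pricing. All prices and the prepaid model itself are hypothetical assumptions, not published rates or actual plan terms.

```python
def effective_hourly_rate(commitment_cost, hours_used):
    """Cost per hour actually used, given a fixed prepaid commitment."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return commitment_cost / hours_used

def break_even_hours(commitment_cost, on_demand_rate):
    """Hours of use at which the commitment costs the same as on-demand."""
    return commitment_cost / on_demand_rate

# Hypothetical numbers: $2.00/h on demand, $1152 prepaid for a 720-hour term.
ON_DEMAND_RATE = 2.00
COMMITMENT_COST = 1152.0

print(effective_hourly_rate(COMMITMENT_COST, 720))  # full utilization
print(effective_hourly_rate(COMMITMENT_COST, 400))  # partial utilization
print(break_even_hours(COMMITMENT_COST, ON_DEMAND_RATE))
```

Under these assumed numbers the commitment only pays off above 576 hours of use, which is why such plans suit predictable workloads.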
Smoother auth and multi-region serverless with persistent storage.
The New and Improved RunPod Login Experience
Streamlined sign-in and team access for faster, more consistent auth flows.
Network Volumes added to serverless
Attach persistent storage to serverless workers to retain models and artifacts across restarts and to speed up cold starts by caching.
Serverless Region Support
Pin or allow specific regions for endpoints to reduce latency and meet data-residency needs.
Deeper autoscaling controls, richer metrics, persistent storage, and job cancellation.
Serverless Scaling Strategies
Scale by queue delay and/or concurrency with min/max worker bounds to balance latency and cost.
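The interaction between the two signals and the worker bounds can be sketched as a small decision function. This is an illustrative model, not the platform's actual autoscaler: the `ScalingPolicy` fields, the "add one worker when the delay threshold is breached" rule, and all numbers are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    min_workers: int
    max_workers: int
    max_queue_delay_s: float   # scale up when requests wait longer than this
    requests_per_worker: int   # concurrency target per worker

def desired_workers(policy, current_workers, queue_delay_s, in_flight_requests):
    """Pick a worker count from concurrency and queue delay, then clamp it."""
    # Concurrency signal: enough workers to hold all in-flight requests.
    target = math.ceil(in_flight_requests / policy.requests_per_worker)
    # Queue-delay signal: if requests wait too long, grow by at least one.
    if queue_delay_s > policy.max_queue_delay_s:
        target = max(target, current_workers + 1)
    # Bounds: never go below min_workers or above max_workers.
    return max(policy.min_workers, min(policy.max_workers, target))

policy = ScalingPolicy(min_workers=1, max_workers=5,
                       max_queue_delay_s=4.0, requests_per_worker=2)
print(desired_workers(policy, current_workers=2, queue_delay_s=6.0,
                      in_flight_requests=3))
```

A higher `min_workers` buys lower latency at the cost of idle capacity, while `max_workers` caps spend during traffic spikes.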
Queue Delay
Expose time-in-queue as a first-class metric to drive autoscaling and SLO monitoring.
Request Count
Track success/failure totals over windows for quick health checks and alerting.
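A windowed success/failure total can be approximated client-side with a sliding window over recorded outcomes. The class below is a hypothetical sketch to show the idea; it is not how the platform computes the metric, and it assumes events are recorded in non-decreasing timestamp order.

```python
from collections import deque

class WindowedRequestCounter:
    """Counts successes and failures seen within the last `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._events = deque()  # (timestamp, ok) pairs, oldest first

    def record(self, timestamp, ok):
        self._events.append((timestamp, ok))

    def totals(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_s
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        success = sum(1 for _, ok in self._events if ok)
        return {"success": success, "failure": len(self._events) - success}

counter = WindowedRequestCounter(window_s=60)
counter.record(0, True)
counter.record(10, False)
counter.record(50, True)
print(counter.totals(now=55))
```

An alert could then fire when the failure share within the window crosses a chosen threshold.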
runsync
Synchronous invocation path that returns results in the same HTTP call for short-running jobs.
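A synchronous call might be built as in the sketch below, which only constructs the HTTP request and does not send it. The URL pattern `https://api.runpod.ai/v2/<endpoint_id>/runsync`, the bearer-token header, and the `{"input": ...}` body shape are assumptions here and should be checked against the current API reference; the endpoint ID and key are placeholders.

```python
import json
import urllib.request

# Assumed base URL; verify against the current API reference.
API_BASE = "https://api.runpod.ai/v2"

def build_runsync_request(endpoint_id, api_key, payload):
    """Build (but do not send) a POST request for a synchronous job."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_id}/runsync",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Placeholder endpoint ID and key, for illustration only.
request = build_runsync_request("YOUR_ENDPOINT_ID", "YOUR_API_KEY",
                                {"prompt": "hello"})
print(request.get_method(), request.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(request, timeout=...)`; because the result comes back in the same HTTP call, the client timeout should exceed the expected job duration.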
Network Storage beta
Region-scoped, attachable volumes shareable across pods/endpoints for model caches and datasets.
Job cancel API
Programmatically terminate queued or running jobs to free capacity and enforce client timeouts.
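Enforcing a client timeout with a cancel call can follow the pattern sketched here. The helper is hypothetical: `get_status` and `cancel` stand in for whatever status and cancel calls the client uses, and the status strings are assumed values rather than a confirmed list.

```python
import time

# Assumed terminal status names; adjust to what the API actually returns.
TERMINAL_STATUSES = ("COMPLETED", "FAILED", "CANCELLED")

def wait_or_cancel(get_status, cancel, timeout_s, poll_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it finishes; cancel it once the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            cancel()  # free the worker instead of letting the job run on
            return "CANCELLED"
        sleep(poll_s)

# Demo with stand-in callables: the job finishes on the third poll.
statuses = iter(["IN_QUEUE", "IN_PROGRESS", "COMPLETED"])
result = wait_or_cancel(lambda: next(statuses), lambda: None,
                        timeout_s=5, poll_s=0.01)
print(result)
```

The `clock` and `sleep` parameters are injectable so the timeout path can be tested without real waiting.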
The serverless platform matures with a cleaner, more capable API.
Serverless API v2
Revised request/response schema and error semantics with improved endpoints for job lifecycle and observability.
Better control over notifications and GPU allocation during contention.
Notification preferences
Configure which platform events trigger alerts to reduce noise for teams and CI systems.
GPU Priorities
Influence scheduling by marking workloads as higher priority to reduce queue time for critical jobs.
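How a priority marker can shorten queue time is easiest to see in a generic priority queue. The sketch below illustrates priority scheduling in general and makes no claim about how the platform's scheduler works; the class and job names are hypothetical.

```python
import heapq
import itertools

class PriorityJobQueue:
    """Serves higher-priority jobs first; equal priorities stay FIFO."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker to keep FIFO order

    def submit(self, job_id, priority=0):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), job_id))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

queue = PriorityJobQueue()
queue.submit("batch-a", priority=0)
queue.submit("critical-b", priority=10)
queue.submit("batch-c", priority=0)
queue.submit("critical-d", priority=10)
served = [queue.next_job() for _ in range(len(queue))]
print(served)
```

Critical jobs submitted later still run before earlier batch jobs, while jobs of the same priority keep their submission order.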
Security-first release enabling encryption for persistent data.
RunPod Now Offers Encrypted Volumes
Enable at-rest encryption for persistent volumes with no app changes. Keys are platform-managed, and encrypted volumes mount like standard volumes.