Serverless Scaling Strategies
Scale workers on queue delay, concurrency, or both, within minimum and maximum worker bounds, to balance latency against cost.
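The sketch below shows one way the two signals and the worker bounds might combine into a single worker target. The names `ScalingPolicy` and `desired_workers`, and the step-up-by-one rule for queue delay, are illustrative assumptions. They are not the platform's actual scaling algorithm.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    min_workers: int
    max_workers: int
    # Hypothetical knobs: either, both, or neither may be set.
    target_queue_delay_s: Optional[float] = None  # add a worker while jobs wait longer than this
    target_concurrency: Optional[int] = None      # in-flight requests each worker should carry


def desired_workers(policy: ScalingPolicy, current: int,
                    queue_delay_s: float, in_flight: int) -> int:
    proposals = []
    if policy.target_queue_delay_s is not None:
        # Delay-based: step up by one worker while the queue is too slow, else hold.
        over = queue_delay_s > policy.target_queue_delay_s
        proposals.append(current + 1 if over else current)
    if policy.target_concurrency is not None:
        # Concurrency-based: enough workers to keep per-worker load at the target.
        proposals.append(math.ceil(in_flight / policy.target_concurrency))
    # With both signals enabled, the more demanding one wins.
    desired = max(proposals) if proposals else current
    # Clamp to the bounds: the minimum keeps warm capacity, the maximum caps spend.
    return max(policy.min_workers, min(policy.max_workers, desired))
```

Taking the maximum of the two proposals means one signal cannot mask the other. A production scaler would also need cooldowns to avoid flapping, which this sketch omits.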
Queue Delay
Expose time-in-queue as a first-class metric to drive autoscaling and SLO monitoring.
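One way to derive a time-in-queue metric is to timestamp each job when it is enqueued and measure the gap when a worker picks it up. The class and method names below are hypothetical. Timestamps are passed in explicitly, in seconds (for example from `time.monotonic()`), which keeps the logic easy to test.

```python
import math
from typing import Dict, List


class QueueDelayTracker:
    def __init__(self) -> None:
        self._enqueued_at: Dict[str, float] = {}  # job_id -> enqueue timestamp (s)
        self._delays: List[float] = []            # time-in-queue of started jobs (s)

    def enqueue(self, job_id: str, now: float) -> None:
        self._enqueued_at[job_id] = now

    def start(self, job_id: str, now: float) -> float:
        # A worker picked the job up: record how long it sat in the queue.
        delay = now - self._enqueued_at.pop(job_id)
        self._delays.append(delay)
        return delay

    def oldest_waiting(self, now: float) -> float:
        # Live signal: age of the oldest job still waiting (0 if the queue is empty).
        if not self._enqueued_at:
            return 0.0
        return now - min(self._enqueued_at.values())

    def percentile(self, p: float) -> float:
        # Nearest-rank percentile over recorded delays, for SLO reporting.
        if not self._delays:
            return 0.0
        ordered = sorted(self._delays)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

`oldest_waiting` suits autoscaling because it keeps rising while jobs are stuck, even before any of them starts. Percentiles of completed delays suit SLO dashboards.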
Request Count
Track success and failure totals over time windows for quick health checks and alerting.
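A sliding-window counter is one simple way to keep windowed success and failure totals. In this hypothetical sketch, each outcome is stored with its timestamp, and events older than the window are evicted. The window covers the interval (now - window_s, now], and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedRequestCounter:
    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)

    def record(self, now: float, succeeded: bool) -> None:
        self._events.append((now, succeeded))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events that fell out of the window.
        cutoff = now - self.window_s
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def counts(self, now: float) -> Tuple[int, int]:
        # Returns (successes, failures) inside the current window.
        self._evict(now)
        ok = sum(1 for _, s in self._events if s)
        return ok, len(self._events) - ok

    def failure_rate(self, now: float) -> float:
        ok, failed = self.counts(now)
        total = ok + failed
        return failed / total if total else 0.0
```

An alert could then fire when `failure_rate(now)` exceeds a chosen threshold. The window length and threshold are deployment choices, not values from this document.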
runsync
Synchronous invocation path that returns the result in the same HTTP request; intended for short-running jobs.
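Some jobs may not finish within the synchronous wait, so a client might call the sync path first and fall back to polling by job id if the response is not yet final. This sketch rests on several assumptions. `submit_sync` and `get_status` are hypothetical callables wrapping the real HTTP calls. The response is assumed to be a dict with `id` and `status` fields, and the status names in `TERMINAL_STATUSES` are assumed. Check the actual API reference before relying on any of them.

```python
import time
from typing import Any, Callable, Dict

Job = Dict[str, Any]

# Assumed terminal status names; verify against the real API reference.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def run_sync_with_fallback(
    submit_sync: Callable[[Dict[str, Any]], Job],  # hypothetical: POSTs to the sync path
    get_status: Callable[[str], Job],              # hypothetical: fetches status by job id
    payload: Dict[str, Any],
    deadline_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Job:
    start = clock()
    job = submit_sync(payload)
    # Fast path: a short job comes back finished in the same response.
    while job["status"] not in TERMINAL_STATUSES:
        # Slow path: the job outlived the synchronous wait, so poll by id.
        if clock() - start > deadline_s:
            raise TimeoutError(f"job {job['id']} still {job['status']} after {deadline_s}s")
        sleep(poll_interval_s)
        job = get_status(job["id"])
    return job
```

When the sync call already returns a finished job, the loop body never runs and the whole interaction is a single request.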
Network Storage (beta)
Region-scoped, attachable volumes shareable across pods/endpoints for model caches and datasets.
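A common use of a shared volume as a model cache is "download once, reuse everywhere". Each worker checks the volume first, and only on a miss does it fetch the artifact into a private temp file and rename it into place, so other workers never see a half-written file. In this hypothetical sketch, `cache_dir` is whatever path the volume is mounted at, and `fetch` is any function that writes the artifact to a given path. Note that rename atomicity is guaranteed on POSIX local filesystems but depends on the implementation for network filesystems.

```python
import os
import tempfile
from typing import Callable


def cached_path(cache_dir: str, name: str, fetch: Callable[[str], None]) -> str:
    """Return the path of `name` under cache_dir, calling fetch(dest) only on a miss."""
    final = os.path.join(cache_dir, name)
    if os.path.exists(final):
        return final  # cache hit: another worker already populated the volume
    os.makedirs(cache_dir, exist_ok=True)
    # Write into a unique temp file in the same directory, so the rename stays
    # on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, prefix=f".{name}.", suffix=".tmp")
    os.close(fd)
    try:
        fetch(tmp)
        os.replace(tmp, final)  # publish the finished file in one step
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # don't leave partial downloads on the shared volume
        raise
    return final
```

If two workers miss at the same time, both download and the last rename wins. That costs bandwidth but never exposes a partial file. A file lock would avoid the duplicate download at the cost of extra coordination.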
Job cancel API
Programmatically terminate queued or running jobs to free capacity and enforce client timeouts.
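To enforce a client timeout, a caller can poll the job until it reaches a terminal state and cancel it once the deadline passes. That way the job stops occupying a worker or queue slot after the caller has given up. `get_status` and `cancel` are hypothetical callables wrapping the real status and cancel endpoints, and the status names are assumed.

```python
import time
from typing import Callable

# Assumed terminal status names; verify against the real API reference.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_or_cancel(
    job_id: str,
    get_status: Callable[[str], str],   # hypothetical: returns the job's status string
    cancel: Callable[[str], None],      # hypothetical: wraps the job cancel API
    timeout_s: float,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    deadline = clock() + timeout_s
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            # Mirror the client-side timeout on the server, so the job releases
            # its capacity instead of running on unobserved.
            cancel(job_id)
            raise TimeoutError(f"job {job_id} cancelled after {timeout_s}s (was {status})")
        sleep(poll_interval_s)
```

Raising after the cancel call lets the caller treat the timeout like any other failure. The `clock` and `sleep` parameters exist only so the deadline logic can be tested without real waiting.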