

Orchestration is the automation layer that makes compute, data, and runs dependable and cost-effective. In practice that means you declare what you want (which GPU, which image, what data) and an automation plane provisions resources, mounts volumes, runs jobs, collects metrics, and tears things down when work is done.
Good orchestration reduces cognitive overhead, speeds iteration, and directly lowers cloud spend by avoiding manual “just-in-case” provisioning and by enabling policies that prevent waste.
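As a purely illustrative sketch of that declarative style (the name, image, and volume below are placeholders, not taken from this article), a single run configuration can pin which GPU to use, which container image to run in, and what data to mount:
type: task
name: preprocess
# Which image: any Docker image the job should run in
image: nvcr.io/nvidia/pytorch:24.07-py3
# What data: mount a previously created network volume at /data
volumes:
  - name: data
    path: /data
commands:
  - python prepare.py --out /data/processed
resources:
  # Which GPU: a single 24GB GPU for this example
  gpu: 24GB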
ML teams benefit from orchestration that is GPU-aware, developer-friendly, and policy-driven, so they can iterate quickly without paying for avoidable waste. Compared to familiar tools, dstack is an open-source, lightweight alternative to Kubernetes and Slurm: easier to operate day-to-day and built with a GPU-native design. It natively integrates with modern neo-clouds, so you can manage infrastructure on Runpod, other providers, or on-prem clusters from a single control plane.
It exposes three first-class primitives you declare in .dstack.yml: dev environments, tasks, and services. dstack is declarative and CLI-first: you describe the desired state in YAML, and dstack apply makes it real (create, update, monitor). It functions both as an orchestrator (scheduling and provisioning) and as a team control plane (policies, metrics, and reusable project defaults), optimized for dev workflows while supporting production tasks and services.
Poor automation, forgotten sessions, and low GPU utilization multiply your effective cost per useful GPU-hour. Two compact scenarios show how this happens: in Scenario A the effective cost lands at roughly 3.5x the list price, and in Scenario B at 7x.
Even small idle padding plus mediocre utilization compounds quickly. The fix is simple in principle: reduce paid hours (auto-shutdown), increase utilization (better pipelines / profiling), and use policies (utilization-based termination, spot strategies, team defaults).
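As a rough illustration with invented numbers (not the details of the scenarios above): if a job needs 4 hours of GPU work but the instance stays up for 7 billed hours and the GPUs average 50% utilization, the effective cost per useful GPU-hour is 7 / (4 × 0.5) = 3.5x the list price; leave the same instance running overnight for 14 billed hours and the multiplier doubles to 7x.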
EA case study: Electronic Arts uses dstack to streamline provisioning, improve utilization, and cut GPU costs.
"dstack provisions compute on demand and automatically shuts it down when no longer needed. That alone saves you over three times in cost."
-- Wah Loon Keng, Sr. AI Engineer, Electronic Arts
dstack provides general automation — declarative provisioning, unified logs/metrics, and managed lifecycle — and layered on top are focused policy primitives you can apply per-resource or as project defaults to cut waste.
For interactive work, set inactivity_duration so dev environments stop after a period with no attached user. Example:
type: dev-environment
name: vscode
ide: vscode
# Stop if inactive for 2 hours
inactivity_duration: 2h
resources:
  gpu: H100:8
For dev environments and tasks, use utilization_policy to terminate runs when GPUs stay under a utilization threshold for a given time window:
type: task
name: train
repos:
  - .
python: 3.12
commands:
  - uv pip install -r requirements.txt
  - python train.py
resources:
  gpu: H100:8
utilization_policy:
  min_gpu_utilization: 10
  time_window: 1h
To reduce hourly spend, prefer spot instances with spot_policy and a price cap; combine with checkpointing and retries for resilience:
type: service
name: llama-2-7b-service
python: 3.12
env:
  - HF_TOKEN
  - MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
  - uv pip install vllm
  - |
    python -m vllm.entrypoints.openai.api_server \
      --model $MODEL \
      --port 8000
port: 8000
resources:
  gpu: 24GB
# Use spot instances if available
spot_policy: auto
Last but not least, dstack’s multi-cloud and hybrid support lets you route jobs to the cheapest or closest backend without changing your run definitions.
Runpod is natively supported as a dstack backend. That means your dstack server can request Runpod pods directly, letting you combine dstack ergonomics with Runpod’s GPU portfolio.
Quick steps: add the Runpod backend to your dstack server config, describe your run in .dstack.yml, and run dstack apply, then monitor startup with dstack logs. Example backend snippet (~/.dstack/server/config.yml):
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: US9XTPDIV8AR42MMINY8TCKRB8S4E7LNRQ6CAUQ9
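Because backends is a list, the same project can point at more than one provider, and the scheduler picks an offer that satisfies the run's requirements, which is how the cheapest-backend routing mentioned above works in practice. The snippet below is only a sketch: the AWS entry and its default-credentials setting are illustrative assumptions, not part of the Runpod setup.
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: <YOUR_RUNPOD_API_KEY>
      - type: aws
        creds:
          type: default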
Set policies such as spot_policy, utilization_policy, and inactivity_duration in your run configurations to balance cost and resilience on Runpod.
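As a rough sketch of how the pieces from earlier sections combine (the task name, GPU size, price cap, and retry window are illustrative values, not taken from this article), a checkpointed training task might prefer spot capacity, cap its hourly price, and retry on interruption:
type: task
name: train-spot
python: 3.12
commands:
  - python train.py --resume-from-checkpoint latest
resources:
  gpu: 24GB
# Prefer spot capacity when available, and cap the hourly price
spot_policy: auto
max_price: 2.0
# Re-provision and restart automatically if the spot instance is reclaimed
retry:
  on_events: [no-capacity, interruption]
  duration: 2h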
Other useful options can be configured once and baked into team defaults (see the sketch below). Find even more tips in dstack's Protips.
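One possible way to share such defaults is a repo-level .dstack/profiles.yml; the profile name and values below are placeholders, and the exact fields should be checked against the dstack documentation:
profiles:
  - name: cost-saver
    # Applied to every run in this repo unless a run config overrides it
    default: true
    spot_policy: auto
    max_price: 2.0
    max_duration: 8h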
dstack provides a GPU-native control plane that covers the full ML lifecycle — from development to training to inference. Its orchestration combines automation, policy-driven resource management, and utilization monitoring, helping teams eliminate idle and under-used GPUs while speeding iteration.
With Runpod’s flexible GPU options, dstack lets teams focus on building and deploying models, turning orchestration into both efficiency and real cost savings.