
We've just released a way to run Serverless code without needing to build a Docker image: check it out.
One of the most common pieces of feedback we hear from developers building on Serverless is that the experience isn't as frictionless as working in a pod. Between writing Dockerfiles, building images, pushing to a registry, and wiring everything up to an endpoint, the overhead can turn a quick experiment into a multi-hour chore. Today, we're changing that with Flash: an application framework that lets you deploy GPU-accelerated Python functions to Runpod Serverless with a single decorator.
No Docker. No container orchestration. Just Python.
Flash is a Python SDK for distributed inference and orchestration on Runpod's serverless infrastructure. You write functions locally, decorate them with @remote, and Flash takes care of everything else: provisioning the serverless endpoint, selecting the GPU hardware you specify, installing your dependencies, and returning the results. Docker still runs under the hood to execute your code, but you never need to touch it.
The only things you need to get started are a Runpod account with a balance, an API key, and the runpod-flash package installed on your local machine.
The workflow is straightforward. You define a Python function with the @endpoint decorator, specifying the GPU type, worker count, and any pip dependencies your code needs. When you run the script, Flash silently creates and manages a serverless endpoint behind the scenes — you'll even see it appear in the Runpod console if you want to inspect it. Your function executes on the remote hardware and the results come back to your terminal.
Here's a minimal example that runs a matrix multiplication on a remote GPU:
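The sketch below illustrates only the shape of that pattern, and it is not the Flash API. It does not import runpod-flash: `fake_remote` is a local stand-in defined here, its parameter names (`gpu`, `workers`, `dependencies`) and the `"ANY"` value are assumptions chosen to mirror the description above, and the function body runs in-process in pure Python instead of on a GPU. Check the Flash documentation for the real decorator and its signature.

```python
import functools
from typing import Callable, Optional, Sequence


def fake_remote(gpu: str, workers: int = 1,
                dependencies: Optional[Sequence[str]] = None) -> Callable:
    """Hypothetical stand-in for a Flash-style decorator (NOT runpod-flash)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real framework would ship the call to a serverless worker and
            # wait for the result. This stand-in simply runs it locally.
            return func(*args, **kwargs)

        # Keep the requested resources on the function so they can be inspected.
        wrapper.config = {
            "gpu": gpu,
            "workers": workers,
            "dependencies": list(dependencies or []),
        }
        return wrapper

    return decorator


@fake_remote(gpu="ANY", workers=1, dependencies=[])
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns_of_b = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
        for row in a
    ]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(matmul.config)
```

The point of the pattern is that the call site stays ordinary Python: the caller invokes `matmul(a, b)` as usual, and the hardware and dependency requirements live in the decorator arguments instead of in a Dockerfile.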
Flash isn't just for standalone scripts. You can pair it with FastAPI to build full production APIs in under 50 lines of code. The pattern is clean: define your Flash-decorated worker functions in one file, mount a FastAPI router in another, and use flash run to serve the whole thing.
Then just run:

flash run
In a second terminal, send a request:
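As a rough sketch of the client side, the snippet below builds a JSON POST request using only the Python standard library. The host, port, route (`/matmul`), and payload field (`size`) are all hypothetical placeholders, not values taken from Flash; substitute whatever address `flash run` reports and whatever route your FastAPI router defines.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the address and route of your app.
BASE_URL = "http://localhost:8000"
ROUTE = "/matmul"


def build_request(payload: dict, base_url: str = BASE_URL,
                  route: str = ROUTE) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request({"size": 512})
    # Only prints the target; call send(req) once the server is running.
    print(req.get_method(), req.full_url)
```

A generous timeout is a deliberate choice here: the first request to a serverless endpoint may have to wait for a worker to start.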
And you'll get a JSON response back from the GPU:
That's a full production API — decorator, router, and live GPU inference — in under 50 lines.
We built Flash because we believe the barrier to entry for serverless GPU computing should be as low as possible. Docker is powerful, but for many developers, especially those prototyping, experimenting, or iterating quickly, it adds friction that slows down the creative loop.
Flash changes the economics of development too. Instead of keeping a pod running while you code and test, you can send requests to a serverless endpoint only when you're ready. You pay for compute when you use it, not while you're thinking.
And because Flash endpoints are real Runpod serverless endpoints, you get all the production benefits: autoscaling, cold start management, and GPU availability across our full fleet.
Flash is open source and ready to use today.
pip install runpod-flash

The examples repository includes walkthroughs covering everything from hello-world GPU scripts to CPU workers, mixed worker configurations, dependency management, and load-balanced endpoints with custom HTTP routes.
We're excited to make serverless development feel as natural as writing local Python. Give Flash a try, and let us know what you build.
Author profile: Brendan McKeag