Announcing Runpod Flash

Introducing Flash: Run GPU workloads on Runpod Serverless, no Docker required

We've just released a way to run Serverless code without building a Docker image. Check it out.

One of the most common pieces of feedback we hear from developers building on Serverless is that the experience isn't as frictionless as working in a pod. Between writing Dockerfiles, building images, pushing to a registry, and wiring everything up to an endpoint, the overhead can turn a quick experiment into a multi-hour chore. Today, we're changing that with Flash: an application framework that lets you deploy GPU-accelerated Python functions to Runpod Serverless with a single decorator.

No Docker. No container orchestration. Just Python.

What is Flash?

Flash is a Python SDK for distributed inference and orchestration on Runpod's serverless infrastructure. You write functions locally, decorate them with @remote, and Flash takes care of everything else: provisioning the serverless endpoint, selecting the GPU hardware you specify, installing your dependencies, and returning the results. Docker still runs under the hood to execute your code, but you never need to touch it.

The only things you need to get started are a Runpod account with a balance, an API key, and the runpod-flash package installed on your local machine.

How It Works

The workflow is straightforward. You define a Python function with the @endpoint decorator, specifying the GPU type, worker count, and any pip dependencies your code needs. When you run the script, Flash creates and manages a serverless endpoint behind the scenes; it appears in the Runpod console if you want to inspect it. Your function executes on the remote hardware and the results come back to your terminal.

Here's a minimal example that runs a matrix multiplication on a remote GPU:
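The same flow can be tried without a Runpod account using the sketch below. It is not the runpod-flash API: `fake_endpoint`, `EndpointConfig`, their parameter names, and the `"ANY_GPU"` label are all assumptions made for illustration. The decorator only records the requested resources and then runs the function locally in plain Python, so nothing here touches a GPU.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class EndpointConfig:
    """Hypothetical record of the resources the decorator was asked for."""
    gpu: str
    workers: int = 1
    dependencies: list = field(default_factory=list)


def fake_endpoint(gpu, workers=1, dependencies=None):
    """Hypothetical stand-in for a Flash-style decorator (not the real API)."""
    config = EndpointConfig(gpu, workers, list(dependencies or []))

    def decorate(func):
        @functools.wraps(func)
        def call(*args, **kwargs):
            # A real framework would send the function and its arguments to a
            # remote worker here; this stand-in simply calls it locally.
            return func(*args, **kwargs)

        call.config = config  # keep the requested resources inspectable
        return call

    return decorate


@fake_endpoint(gpu="ANY_GPU", workers=1, dependencies=[])
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


if __name__ == "__main__":
    print(matmul.config)
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The point of the sketch is the shape of the code: the resource request lives in the decorator arguments, and the function body stays ordinary Python.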

Build Production APIs with Flash and FastAPI

Flash isn't just for standalone scripts. You can pair it with FastAPI to build full production APIs in under 50 lines of code. The pattern is clean: define your Flash-decorated worker functions in one file, mount a FastAPI router in another, and use flash run to serve the whole thing.

Then just run:

In a second terminal, send a request:

And you'll get a JSON response back from the GPU:

That's a full production API — decorator, router, and live GPU inference — in under 50 lines.
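The split between worker functions and routes can be sketched with the standard library alone. The code below uses neither Flash nor FastAPI: `post`, `dispatch`, `ROUTES`, `scale_vector`, and the `/scale` path are hypothetical names, the two "files" are shown as sections of one script, and a small dictionary-based router stands in for FastAPI's router purely to show how a JSON request reaches a worker and comes back as JSON.

```python
import json

# --- workers.py (hypothetical): functions that would carry the Flash decorator ---
def scale_vector(values, factor):
    """Stand-in worker; a real one would execute on remote GPU hardware."""
    return [v * factor for v in values]


# --- main.py (hypothetical): routing kept separate from the workers ---
ROUTES = {}


def post(path):
    """Register a handler for POST requests on the given path."""
    def register(handler):
        ROUTES[("POST", path)] = handler
        return handler
    return register


@post("/scale")
def scale_route(payload):
    # The route only unpacks the request and delegates to the worker.
    result = scale_vector(payload["values"], payload.get("factor", 2))
    return {"result": result}


def dispatch(method, path, body):
    """Decode a JSON body, call the matching route, and encode the JSON reply."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(handler(json.loads(body)))


if __name__ == "__main__":
    print(dispatch("POST", "/scale", '{"values": [1, 2, 3], "factor": 10}'))
```

Keeping the route thin means the worker can be tested, or moved to different hardware, without touching the HTTP layer.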

Why Flash Matters

We built Flash because we believe the barrier to entry for serverless GPU computing should be as low as possible. Docker is powerful, but for many developers, especially those prototyping, experimenting, or iterating quickly, it adds friction that slows down the creative loop.

Flash changes the economics of development too. Instead of keeping a pod running while you code and test, you can send requests to a serverless endpoint only when you're ready. You pay for compute when you use it, not while you're thinking.

And because Flash endpoints are real Runpod serverless endpoints, you get all the production benefits: autoscaling, cold start management, and GPU availability across our full fleet.

Get Started

Flash is open source and ready to use today.

The examples repository includes walkthroughs covering everything from hello-world GPU scripts to CPU workers, mixed worker configurations, dependency management, and load-balanced endpoints with custom HTTP routes.

We're excited to make serverless development feel as natural as writing local Python. Give Flash a try, and let us know what you build.

Author profile: Brendan McKeag
