Intro to WebSocket Streaming with Runpod Serverless
This follow-up to our “Hello World” tutorial walks through streaming output from a Runpod Serverless endpoint using WebSocket and base64 files.
In this follow-up to our "Hello World" tutorial, we'll create a serverless endpoint that processes base64-encoded files and streams the results back. It demonstrates how to handle file input and output in our serverless environment by encoding the file as data within a JSON payload.
As before, this tutorial is aimed at macOS developers.
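To make the encoding step concrete, here is a minimal sketch of wrapping raw file bytes in a JSON payload and recovering them on the other side. The `input` wrapper mirrors the shape of a Runpod Serverless job request; the `filename` and `data` field names are hypothetical and chosen only for illustration.

```python
import base64
import json


def build_payload(filename: str, raw: bytes) -> str:
    """Wrap raw file bytes in a JSON request body."""
    # base64 output is plain ASCII, so it can be embedded in a JSON string.
    encoded = base64.b64encode(raw).decode("ascii")
    # "filename" and "data" are hypothetical field names for this sketch.
    return json.dumps({"input": {"filename": filename, "data": encoded}})


def read_payload(body: str) -> bytes:
    """Recover the original bytes from a JSON request body."""
    job_input = json.loads(body)["input"]
    return base64.b64decode(job_input["data"])


sample = b"\x89PNG\r\n\x1a\n example bytes"
body = build_payload("example.png", sample)
restored = read_payload(body)
print(len(sample), "bytes in,", len(restored), "bytes out")
```

Keep in mind that base64 makes the data roughly a third larger than the original file, which matters for large payloads.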
Step 1: Creating the Project
Create your development environment.
Step 2: Creating the Handler
Create a new file called handler.py. Remember that the handler is how code gets executed when a worker is active. In this example, the handler simulates image processing. Because this tutorial is meant to demonstrate the serverless environment rather than real image processing, the handler simply creates a static, blank image and returns it as the payload.
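A handler along these lines might look like the sketch below. It is not the tutorial's original code: it assumes the Runpod Python SDK's generator-handler support, where each yielded value becomes one streamed chunk, and it builds the blank PNG with the standard library only so the example has no image dependency. The `width`, `height`, `status`, `progress`, and `image_base64` fields are hypothetical names.

```python
import base64
import struct
import zlib


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag and data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def blank_png(width: int, height: int, rgb=(255, 255, 255)) -> bytes:
    """Create a solid-color 8-bit RGB PNG without external libraries."""
    row = b"\x00" + bytes(rgb) * width  # filter byte 0, then the pixels
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", header)
        + _png_chunk(b"IDAT", zlib.compress(row * height))
        + _png_chunk(b"IEND", b"")
    )


def handler(job):
    """Generator handler: every yield is sent back as a streamed update."""
    job_input = job.get("input", {})
    width = int(job_input.get("width", 64))    # hypothetical input field
    height = int(job_input.get("height", 64))  # hypothetical input field

    yield {"status": "processing", "progress": 0}
    image = blank_png(width, height)  # stand-in for real image processing
    yield {"status": "processing", "progress": 50}
    yield {
        "status": "completed",
        "progress": 100,
        "image_base64": base64.b64encode(image).decode("ascii"),
    }


def start_worker():
    """Start the serverless worker; call this at the bottom of handler.py."""
    import runpod  # Runpod Python SDK, installed in the container image

    # return_aggregate_stream is assumed here so the collected chunks are
    # also available to non-streaming requests.
    runpod.serverless.start(
        {"handler": handler, "return_aggregate_stream": True}
    )
```

In the deployed handler.py you would finish the file with a call to `start_worker()`. It is left uncalled here so the handler can be exercised locally, for example with `list(handler({"input": {"width": 8, "height": 8}}))`.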
Step 3: Creating the Dockerfile and requirements.txt
As with the previous tutorial, we'll need to provide the Dockerfile and requirements.txt to build and push the image.
Step 4: Build and Push to Docker Hub
As before, build and push your image to Docker Hub, then configure your endpoint to pull it.
Step 5: Running the Endpoint in Code
Here is an example of how to interact with the endpoint in code. Set your Runpod API key and Endpoint ID in the variables at the top of the script. Let's call this file test_endpoint.py.
The script sends a request to the endpoint you've created, waits while the endpoint processes the job and returns base64 data in a JSON payload, and then receives that payload for further local processing and saving.
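A client of this kind could be sketched as follows, using only the standard library. This is an illustration under assumptions rather than the tutorial's original script: it assumes the `/run` and `/stream/{job_id}` routes of the Runpod Serverless HTTP API, a response shape in which streamed chunks arrive as a `stream` list of `{"output": ...}` items, and hypothetical `width`, `height`, and `image_base64` fields. The output file name `output.json` is also an assumption.

```python
import json
import time
import urllib.request

API_KEY = "YOUR_API_KEY"          # replace with your Runpod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # replace with your Endpoint ID
BASE_URL = "https://api.runpod.ai/v2"


def endpoint_url(path: str) -> str:
    """Build a URL for one route of the endpoint."""
    return f"{BASE_URL}/{ENDPOINT_ID}/{path}"


def call_api(url: str, body=None) -> dict:
    """POST a JSON body if one is given, otherwise GET; return parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_chunks(stream_response: dict) -> list:
    """Pull handler outputs from a stream response (assumed shape)."""
    items = stream_response.get("stream", [])
    return [item["output"] for item in items if "output" in item]


def main():
    if "YOUR_" in API_KEY or "YOUR_" in ENDPOINT_ID:
        print("Set API_KEY and ENDPOINT_ID before running.")
        return

    job = call_api(endpoint_url("run"), {"input": {"width": 64, "height": 64}})
    final_chunk = None
    while True:
        response = call_api(endpoint_url(f"stream/{job['id']}"))
        for chunk in extract_chunks(response):
            # Print progress fields but not the long base64 string.
            print({k: v for k, v in chunk.items() if k != "image_base64"})
            if "image_base64" in chunk:
                final_chunk = chunk
        if response.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            break
        time.sleep(1)  # poll again for further chunks

    if final_chunk is not None:
        with open("output.json", "w") as handle:
            json.dump(final_chunk, handle)


if __name__ == "__main__":
    main()
```

Polling the stream route in a loop keeps the client simple; each call returns whatever chunks have been produced since the previous one, so progress updates can be shown as they arrive.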
Run the test:
You should see output like this, along with a base64 JSON payload saved in the folder where you ran the script.
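For the further local processing step, the saved payload can be turned back into a regular file. The sketch below assumes the JSON file stores the image under a hypothetical `image_base64` key; adjust the key and file names to match what your script actually writes.

```python
import base64
import json
from pathlib import Path


def save_image_from_payload(payload_path: str, image_path: str) -> int:
    """Decode the base64 image in a saved JSON payload and write it to disk.

    Returns the number of bytes written.
    """
    payload = json.loads(Path(payload_path).read_text())
    # "image_base64" is a hypothetical key name for this sketch.
    image_bytes = base64.b64decode(payload["image_base64"])
    Path(image_path).write_bytes(image_bytes)
    return len(image_bytes)
```

Calling `save_image_from_payload("output.json", "output.png")` would then produce an image file you can open normally, assuming those file names.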
Conclusion
You've now learned how to create a Runpod Serverless endpoint that processes base64-encoded files and streams results back to the client. This pattern can be extended to handle many kinds of file processing tasks while providing real-time feedback to users.