
Guides
June 6, 2025

Running Whisper with a UI in Docker: A Beginner’s Guide

Emmett Fear
Solutions Engineer

OpenAI’s Whisper is one of the most powerful open-source tools for automatic speech recognition (ASR). Whether you're looking to transcribe podcasts, meetings, or voice memos, Whisper delivers fast, multilingual, and highly accurate transcription. But deploying it efficiently, especially with a user interface and GPU acceleration, requires the right stack.

This article will walk you through setting up Whisper with a UI in Docker, deploying it on RunPod’s cloud GPU containers, and enhancing it for real-world applications. No DevOps team or expensive servers needed.

What Is Whisper?

Whisper is an open-source ASR system by OpenAI. It's trained on 680,000+ hours of multilingual audio data and is capable of:

  • Speech-to-text transcription
  • Language detection
  • Translation to English
  • Multilingual audio support
  • Handling diverse accents and noisy audio

Unlike typical ASR tools, Whisper generalizes well without fine-tuning, making it ideal for developers, researchers, and product teams.

Why Run Whisper in Docker?

Running Whisper in a Docker container provides several key benefits:

Portability

Your setup works the same across any machine—whether it’s your laptop, cloud GPU, or server.

Isolation

Dependencies don’t conflict with other projects or system libraries.

Easy Deployment

Once containerized, the app can be deployed across cloud platforms like RunPod, AWS, or Google Cloud in minutes.

Scalable UI Support

By integrating Gradio, we can create a web interface for easier interaction.

Why Use RunPod?

RunPod offers GPU-accelerated container hosting that’s beginner-friendly and powerful. It provides:

  • Access to powerful GPUs (A100, H100, 3090, etc.)
  • On-demand containers (no long-term commitment)
  • Pre-configured environments for AI and ML workloads
  • Usage-based pricing
  • Simple UI for launching containers

Perfect for Whisper, which performs best with GPU acceleration.

Step-by-Step: Create Whisper UI in Docker

Step 1: Write the Dockerfile

Dockerfile
FROM python:3.10-slim
# ffmpeg handles audio decoding; git lets pip install Whisper straight from GitHub
RUN apt-get update && apt-get install -y ffmpeg git
RUN pip install --upgrade pip
RUN pip install git+https://github.com/openai/whisper.git gradio
WORKDIR /app
COPY app.py /app/app.py
EXPOSE 7860
CMD ["python", "app.py"]

ffmpeg is critical for audio format support.
gradio provides the user-friendly UI.

Create the Gradio UI Script (app.py)

python
import whisper
import gradio as gr

# Load the Whisper checkpoint once at startup
model = whisper.load_model("base")

def transcribe(audio):
    # Gradio hands the function a path to the uploaded file; Whisper decodes it via ffmpeg
    result = model.transcribe(audio)
    return result["text"]

# type="filepath" makes Gradio pass a file path instead of raw audio arrays
interface = gr.Interface(fn=transcribe, inputs=gr.Audio(type="filepath"), outputs="text")
interface.launch(server_name="0.0.0.0", server_port=7860, share=True)

You can replace "base" with "tiny", "small", "medium", or "large" based on GPU and performance needs.
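
If you want the container to adapt on its own, a minimal sketch like the one below (PyTorch is already present as a Whisper dependency; the 10 GB threshold is an arbitrary example, not an official requirement) picks a larger checkpoint only when a GPU with enough memory is detected:

python
import torch
import whisper

# Choose a checkpoint based on available hardware; the 10 GB cutoff is just an example
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    model_name = "medium" if vram_gb >= 10 else "small"
else:
    model_name = "base"  # CPU fallback: smaller models keep latency tolerable

model = whisper.load_model(model_name)
print(f"Loaded Whisper '{model_name}' (CUDA available: {torch.cuda.is_available()})")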

Step 2: Run and Test Locally

  1. Build the container:

bash
docker build -t whisper-ui .

  2. Run it:

bash
docker run -p 7860:7860 whisper-ui

  3. Visit http://localhost:7860 in your browser to test the UI.

Step 3: Push to Docker Hub

bash
# log in to Docker Hub first if you haven't already
docker login
docker tag whisper-ui yourusername/whisper-ui
docker push yourusername/whisper-ui

This step prepares your container image for deployment on RunPod.

Step 4: Deploy on RunPod

  1. Go to RunPod Custom Containers
  2. Click Launch Custom Image
  3. Paste your container’s image name (e.g., yourusername/whisper-ui)
  4. Choose a GPU (A100 or 3090 is a good start)
  5. Set Port 7860 (default for Gradio)
  6. Click Deploy

RunPod will assign a public URL (e.g., https://container-id.runpod.io) where your Whisper UI will be live.

RunPod GPU Pricing Overview

GPU Model | Specs | Hourly Cost (approx.)
A4000 | 16 GB | ~$0.15/hr
RTX 3090 | 24 GB | ~$0.24/hr
A100 | 40 GB | ~$1.85/hr
H100 | 80 GB | ~$2.40/hr

Pay only for what you use. Set up auto-shutdown to reduce costs.

Add Authentication to Your App

Don’t expose your transcription app to the whole internet without protection.

Option 1: Gradio Basic Auth

python
interface.launch(server_name="0.0.0.0", auth=("admin", "securepass"))

Option 2: NGINX Reverse Proxy + HTTPS

Install NGINX, point it to port 7860, and secure it with Let’s Encrypt SSL.

Whisper Model Sizes & Performance

Model | Size | Speed (GPU) | Accuracy
tiny | ~39 MB | Very Fast | Lower
base | ~74 MB | Fast | Decent
small | ~244 MB | Balanced | Good
medium | ~769 MB | Slower | Better
large | ~1.5 GB | Slowest | Best

Choose based on your accuracy needs and GPU memory.

Use Cases for Whisper + UI

Podcast & Video Creators

Transcribe episodes automatically and export to blogs or subtitles.

Business Teams

Turn Zoom/Meet recordings into searchable meeting notes.

Researchers & Students

Quickly transcribe lectures or interviews.

Mobile App Backend

Build voice-to-text features for accessibility or productivity apps.

Automation: Whisper + RunPod API

Use RunPod’s API to launch containers, send audio, and retrieve results without UI.

Example: Automate with Python

python
import requests

# Placeholder: with RunPod's API you can launch a pod, upload the audio, and retrieve
# the transcript without touching the web UI; see the RunPod Container Docs for endpoints

Useful for SaaS, batch processing, and multi-user platforms.
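
While the stub above gestures at RunPod's own management API, the "send audio and retrieve results" part can also be automated by calling the deployed Gradio app directly. The sketch below is a hedged example rather than RunPod's API: the URL is a placeholder for your pod's public address, and the /predict endpoint name assumes the default gr.Interface setup with a recent gradio_client release.

python
from gradio_client import Client, handle_file

# Placeholder URL: replace with the public address RunPod assigns to your pod
client = Client("https://your-container-id.runpod.io")

# /predict is the default endpoint for a gr.Interface app; handle_file uploads the local audio
transcript = client.predict(handle_file("meeting.mp3"), api_name="/predict")
print(transcript)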

Developer Tips & Best Practices

  • Always test locally before deploying.
  • Add GPU checks to ensure Whisper uses CUDA (see the sketch after this list).
  • Use loggers instead of print statements for better debugging.
  • Mount persistent storage for large audio files.
  • Use ngrok or share=True in dev but avoid in production.
  • Use Docker volumes if you need to cache model weights.
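
For the GPU-check and logging tips above, here is a minimal sketch (building on the app.py from Step 1, with nothing RunPod-specific assumed) that swaps print statements for the standard logging module and records which device Whisper actually loaded onto:

python
import logging

import torch
import whisper

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("whisper-ui")

model = whisper.load_model("base")

# Whisper picks the GPU automatically when CUDA is available; log the result instead of printing
device = next(model.parameters()).device
logger.info("CUDA available: %s; model loaded on %s", torch.cuda.is_available(), device)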

Troubleshooting Common Issues

UI Not Loading?

  • Ensure Docker exposes port 7860
  • Use server_name="0.0.0.0"

Audio Not Transcribed?

  • Use supported formats: .mp3, .wav, .flac
  • Ensure ffmpeg is installed in container

GPU Not Detected?

  • Whisper defaults to CPU if no GPU is available
  • Run nvidia-smi inside the container to verify GPU

High Memory Usage?

  • Try base or small model
  • Offload files after processing

Documentation & Links

  • Whisper GitHub
  • RunPod Container Docs
  • RunPod Pricing
  • Gradio Interface Docs
  • Docker Hub

FAQ – Whisper UI + Docker + RunPod

Q1: How long does a transcription take?
On a GPU, 5-minute audio can be transcribed in under 15 seconds with the base model.

Q2: Can I run Whisper without a GPU?
Yes, but it will be significantly slower—use a tiny or base model for better speed.

Q3: Is this setup suitable for production apps?
Yes, with added auth, logging, HTTPS, and error handling, it’s production-ready.

Q4: Can I customize the UI?
Yes! Gradio supports advanced layouts, inputs (video, files), and outputs (JSON, summaries).

Q5: Is RunPod the only option for GPU containers?
No, but it’s among the simplest. Alternatives include Lambda Labs, Paperspace, and NVIDIA NGC.

Q6: How do I scale the app?
Use multiple container instances, load balancers, and background task queues (e.g., Celery).
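
As one hedged illustration of the task-queue idea (Celery plus a Redis broker are assumptions for this example, not the only option), transcription can run in a background worker so web requests return immediately:

python
from celery import Celery
import whisper

# Assumed broker/backend: a Redis instance reachable from the worker container
app = Celery("transcriber", broker="redis://localhost:6379/0", backend="redis://localhost:6379/0")
model = whisper.load_model("base")

@app.task
def transcribe_task(audio_path: str) -> str:
    # Runs inside a Celery worker; the web layer just enqueues the job
    return model.transcribe(audio_path)["text"]

A worker started with celery -A <module_name> worker would then pull these jobs off the queue while the UI stays responsive.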

Q7: How to save transcripts automatically?
You can modify the transcribe() function to write output to .txt or send it to a database.
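
For example, a minimal version of that modification (the output filename scheme is just an illustration) could look like this:

python
import os
import whisper

model = whisper.load_model("base")

def transcribe(audio):
    result = model.transcribe(audio)
    text = result["text"]
    # Example naming scheme: write the transcript next to the uploaded audio file
    out_path = os.path.splitext(audio)[0] + ".txt"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text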

Conclusion

You now have a powerful transcription app using OpenAI’s Whisper, running in a Docker container, deployed with GPU acceleration on RunPod, and accessed through a friendly web UI built with Gradio.

Whether you're building internal tools, launching a SaaS, or simply automating your own audio workflows, this setup is robust, scalable, and surprisingly easy to maintain.

Ready to Get Started?

Launch Your Whisper UI on RunPod Now – Harness GPU power for real-time, multilingual speech transcription.

Need help customizing your app? Want to integrate translation, keyword extraction, or summaries? Just ask!
