Running Whisper with a UI in Docker: A Beginner’s Guide
OpenAI’s Whisper is one of the most powerful open-source tools for automatic speech recognition (ASR). Whether you're looking to transcribe podcasts, meetings, or voice memos, Whisper delivers fast, multilingual, and highly accurate transcription. But deploying it efficiently, especially with a user interface and GPU acceleration, requires the right stack.
This article will walk you through setting up Whisper with a UI in Docker, deploying it on RunPod’s cloud GPU containers, and enhancing it for real-world applications. No DevOps team or expensive servers needed.
What Is Whisper?
Whisper is an open-source ASR system by OpenAI. It's trained on 680,000+ hours of multilingual audio data and is capable of:
- Speech-to-text transcription
- Language detection
- Translation to English
- Multilingual audio support
- Handling diverse accents and noisy audio
Unlike typical ASR tools, Whisper generalizes well without fine-tuning, making it ideal for developers, researchers, and product teams.
Why Run Whisper in Docker?
Running Whisper in a Docker container provides several key benefits:
- Portability: your setup works the same across any machine, whether it’s your laptop, a cloud GPU, or a server.
- Isolation: dependencies don’t conflict with other projects or system libraries.
- Easy deployment: once containerized, the app can be deployed across cloud platforms like RunPod, AWS, or Google Cloud in minutes.
- Built-in UI: by integrating Gradio, we can create a web interface for easier interaction.
Why Use RunPod?
RunPod offers GPU-accelerated container hosting that’s beginner-friendly and powerful. It provides:
- Access to powerful GPUs (A100, H100, 3090, etc.)
- On-demand containers (no long-term commitment)
- Pre-configured environments for AI and ML workloads
- Usage-based pricing
- Simple UI for launching containers
Perfect for Whisper, which performs best with GPU acceleration.
Step-by-Step: Create Whisper UI in Docker
Step 1: Create the Dockerfile and App

Dockerfile:

```dockerfile
FROM python:3.10-slim

# ffmpeg for audio decoding, git for the pip install from GitHub
RUN apt-get update && apt-get install -y ffmpeg git
RUN pip install --upgrade pip
RUN pip install git+https://github.com/openai/whisper.git gradio

COPY app.py /app/app.py
WORKDIR /app

# Gradio serves on port 7860 by default
EXPOSE 7860
CMD ["python", "app.py"]
```

`ffmpeg` is critical for audio format support. `gradio` provides the user-friendly UI.
app.py:

```python
import whisper
import gradio as gr

# Load the model once at startup; Whisper uses CUDA automatically if available
model = whisper.load_model("base")

def transcribe(audio):
    result = model.transcribe(audio)
    return result["text"]

# type="filepath" hands Whisper a path to the uploaded file
interface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
)
interface.launch(server_name="0.0.0.0", share=True)
```
You can replace `"base"` with `"tiny"`, `"small"`, `"medium"`, or `"large"` based on GPU and performance needs.
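Beyond plain text, `model.transcribe()` also returns per-segment timestamps under `result["segments"]`. As a minimal sketch (using mock segment dicts shaped like Whisper's output, not a real model run), you can turn those into an SRT subtitle file:

```python
def srt_timestamp(seconds):
    """Convert float seconds to an SRT timestamp like 00:00:03,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper-style segment dicts as an SRT subtitle string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

# Mock data shaped like model.transcribe(audio)["segments"]
mock = [
    {"start": 0.0, "end": 2.4, "text": " Hello and welcome."},
    {"start": 2.4, "end": 5.1, "text": " Today we talk about Whisper."},
]
print(segments_to_srt(mock))
```

Returning this string from a second Gradio output is an easy way to offer subtitle downloads alongside the plain transcript.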
Step 2: Run and Test Locally
- Build the container:

```bash
docker build -t whisper-ui .
```

- Run it:

```bash
docker run -p 7860:7860 whisper-ui
```
- Visit `http://localhost:7860` in your browser to test the UI.
Step 3: Push to Docker Hub
```bash
docker tag whisper-ui yourusername/whisper-ui
docker push yourusername/whisper-ui
```
This step prepares your container image for deployment on RunPod.
Step 4: Deploy on RunPod
- Go to RunPod Custom Containers
- Click Launch Custom Image
- Paste your container’s image name (e.g., `yourusername/whisper-ui`)
- Choose a GPU (A100 or 3090 is a good start)
- Set Port 7860 (default for Gradio)
- Click Deploy
RunPod will assign a public URL (e.g., `https://container-id.runpod.io`) where your Whisper UI will be live.
RunPod GPU Pricing Overview
| GPU Model | VRAM | Hourly Cost (approx.) |
|---|---|---|
| A4000 | 16GB | ~$0.15/hr |
| RTX 3090 | 24GB | ~$0.24/hr |
| A100 | 40GB | ~$1.85/hr |
| H100 | 80GB | ~$2.40/hr |
Pay only for what you use. Set up auto-shutdown to reduce costs.
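With usage-based pricing, a quick back-of-the-envelope estimate helps you pick a GPU. A small sketch using the approximate rates from the table above (real prices change, so check RunPod before committing):

```python
# Approximate hourly rates (USD) from the table above; assumptions, not live pricing
RATES = {"A4000": 0.15, "RTX 3090": 0.24, "A100": 1.85, "H100": 2.40}

def monthly_cost(gpu, hours_per_day, days=30):
    """Estimate monthly spend for an on-demand pod used hours_per_day."""
    return round(RATES[gpu] * hours_per_day * days, 2)

print(monthly_cost("RTX 3090", 4))  # 4 hours/day on a 3090
```

For light transcription workloads, a 3090 used a few hours a day costs far less than a dedicated server.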
Add Authentication to Your App
Don’t expose your transcription app to the whole internet without protection.
```python
interface.launch(server_name="0.0.0.0", auth=("admin", "securepass"))
```
For HTTPS in production, install NGINX as a reverse proxy, point it to port `7860`, and secure it with Let’s Encrypt SSL.
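Hard-coding credentials in source is risky once the image is pushed to a public registry. A small sketch that reads them from environment variables instead (the `GRADIO_USER`/`GRADIO_PASS` names are my own choice for this example, not a Gradio convention):

```python
import os

def auth_from_env():
    """Build a Gradio basic-auth tuple from environment variables.

    GRADIO_USER / GRADIO_PASS are made-up variable names for this sketch.
    Returns None (auth disabled) unless both are set.
    """
    user = os.environ.get("GRADIO_USER")
    password = os.environ.get("GRADIO_PASS")
    if user and password:
        return (user, password)
    return None

# In app.py:
# interface.launch(server_name="0.0.0.0", auth=auth_from_env())
```

You can then pass the credentials at runtime with `docker run -e GRADIO_USER=... -e GRADIO_PASS=...` instead of baking them into the image.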
Whisper Model Sizes & Performance
| Model | Size | Speed (GPU) | Accuracy |
|---|---|---|---|
| tiny | ~39 MB | Very Fast | Lower |
| base | ~74 MB | Fast | Decent |
| small | ~244 MB | Balanced | Good |
| medium | ~769 MB | Slower | Better |
| large | ~1.5 GB | Slowest | Best |
Choose based on your accuracy needs and GPU memory.
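That choice can also be automated. A sketch that picks the largest model likely to fit in free VRAM, using rough per-model memory figures (these are approximations; measure on your own GPU before relying on them):

```python
# Approximate VRAM needed at inference (GB); rough assumptions, verify on your hardware
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
ORDER = ["large", "medium", "small", "base", "tiny"]  # most accurate first

def pick_model(free_vram_gb):
    """Return the most accurate Whisper model expected to fit in free_vram_gb."""
    for name in ORDER:
        if VRAM_GB[name] <= free_vram_gb:
            return name
    return "tiny"  # fall back to the smallest model

print(pick_model(6))  # a 6GB card fits medium, but not large
```

In app.py you could then call `whisper.load_model(pick_model(free_gb))` after querying free memory.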
Use Cases for Whisper + UI
- Podcasts: transcribe episodes automatically and export to blogs or subtitles.
- Meetings: turn Zoom/Meet recordings into searchable meeting notes.
- Education: quickly transcribe lectures or interviews.
- Apps: build voice-to-text features for accessibility or productivity apps.
Automation: Whisper + RunPod API
Use RunPod’s API to launch containers, send audio, and retrieve results without UI.
```python
import requests

# Use RunPod's API to launch a container or send a file
# Automate audio upload & receive the transcript
```
Useful for SaaS, batch processing, and multi-user platforms.
Developer Tips & Best Practices
- Always test locally before deploying.
- Add GPU checks to ensure Whisper uses CUDA.
- Use loggers instead of print statements for better debugging.
- Mount persistent storage for large audio files.
- Use ngrok or `share=True` in development, but avoid them in production.
- Use Docker volumes if you need to cache model weights.
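The GPU-check tip above can be a small helper. This sketch guards the `torch` import so it also runs on machines without PyTorch installed:

```python
def detect_device():
    """Return "cuda" if PyTorch sees a GPU, otherwise "cpu".

    The import is guarded so the check works even where torch is not installed.
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

print(f"Whisper will run on: {detect_device()}")
```

Logging this at startup makes it obvious when a container silently fell back to CPU, which is the most common cause of "why is transcription so slow?" reports.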
Troubleshooting Common Issues
UI Not Loading?
- Ensure Docker exposes port 7860
- Use `server_name="0.0.0.0"` in `interface.launch()`

Audio Not Transcribed?
- Use supported formats: `.mp3`, `.wav`, `.flac`
- Ensure ffmpeg is installed in the container

GPU Not Detected?
- Whisper defaults to CPU if no GPU is available
- Run `nvidia-smi` inside the container to verify the GPU

High Memory Usage?
- Try the `base` or `small` model
- Offload files after processing
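Rather than letting an unsupported upload fail deep inside ffmpeg, you can validate the extension up front and show a friendly error in the UI. A minimal sketch, restricted to the formats listed above:

```python
from pathlib import Path

# Formats mentioned in the troubleshooting list above; ffmpeg handles many more
SUPPORTED = {".mp3", ".wav", ".flac"}

def check_audio_file(path):
    """Return (ok, message) so the UI can show a helpful error instead of a traceback."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        return False, f"Unsupported format {suffix!r}; try one of {sorted(SUPPORTED)}"
    return True, "ok"

print(check_audio_file("talk.MP3"))
```

Calling this at the top of `transcribe()` and returning the message on failure keeps bad uploads from surfacing as opaque server errors.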
Documentation & Links
- Whisper GitHub
- RunPod Container Docs
- RunPod Pricing
- Gradio Interface Docs
- Docker Hub
FAQ – Whisper UI + Docker + RunPod
Q1: How long does a transcription take?
On a GPU, 5-minute audio can be transcribed in under 15 seconds with the `base` model.
Q2: Can I run Whisper without a GPU?
Yes, but it will be significantly slower; use the `tiny` or `base` model for better speed.
Q3: Is this setup suitable for production apps?
Yes, with added auth, logging, HTTPS, and error handling, it’s production-ready.
Q4: Can I customize the UI?
Yes! Gradio supports advanced layouts, inputs (video, files), and outputs (JSON, summaries).
Q5: Is RunPod the only option for GPU containers?
No, but it’s among the simplest. Alternatives include Lambda Labs, Paperspace, and NVIDIA NGC.
Q6: How do I scale the app?
Use multiple container instances, load balancers, and background task queues (e.g., Celery).
Q7: How to save transcripts automatically?
You can modify the `transcribe()` function to write output to `.txt` or send it to a database.
Conclusion
You now have a powerful transcription app using OpenAI’s Whisper, running in a Docker container, deployed with GPU acceleration on RunPod, and accessed through a friendly web UI built with Gradio.
Whether you're building internal tools, launching a SaaS, or simply automating your own audio workflows, this setup is robust, scalable, and surprisingly easy to maintain.
Launch Your Whisper UI on RunPod Now – Harness GPU power for real-time, multilingual speech transcription.
Need help customizing your app? Want to integrate translation, keyword extraction, or summaries? Just ask!