
Run Hugging Face Spaces on Runpod!

Learn how to deploy any Hugging Face Space on Runpod using Docker, including an example with Kokoro TTS and Gradio.

Hugging Face Spaces are interactive demos that showcase AI models directly on the Hugging Face platform. They're great for experimenting with AI capabilities, but what if you want more computing power, need to run these models in your own environment, or want to use them as much as you like without being rate limited?

Good news! Every Hugging Face Space can now be run using Docker, which means you can deploy them on platforms like Runpod to leverage powerful GPUs. In this guide, we'll walk through deploying Kokoro TTS (a Text-to-Speech model) via Gradio from Hugging Face to Runpod.

What is Gradio?

Gradio is a popular Python library that creates user-friendly interfaces for machine learning models. Many Hugging Face Spaces, including Kokoro TTS, use Gradio to provide an interactive web interface where you can test the model's capabilities through your browser. By the end of this tutorial, you'll have this same interface running on your Runpod instance.
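
For a sense of what a Gradio app looks like, here is a minimal sketch. It is not the Kokoro TTS app: `shout` and `launch_demo` are made-up names for illustration, and the toy function simply upper-cases its input. Gradio serves on port 7860 by default, which is why that port shows up again when we configure Runpod.

```python
def shout(text: str) -> str:
    """Toy stand-in for a model: upper-case the input text."""
    return text.upper()


def launch_demo() -> None:
    """Wrap `shout` in a web UI and serve it (requires `pip install gradio`)."""
    import gradio as gr  # imported here so the rest of the file works without gradio

    demo = gr.Interface(fn=shout, inputs="text", outputs="text")
    # 0.0.0.0 makes the app reachable from outside a container;
    # 7860 is Gradio's default port, set explicitly here for clarity.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```

Calling `launch_demo()` starts the server and blocks until you stop it. A real Space swaps the toy function for model inference, but the overall shape is the same.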

Why Kokoro TTS?

We've chosen Kokoro TTS for this example because it's a powerful text-to-speech model that benefits from GPU acceleration. This makes it a perfect candidate to demonstrate how to move from Hugging Face Spaces to Runpod's more flexible, accessible computing environment.

Prerequisites

A Hugging Face account (to generate an access token)
A Runpod account with payment method set up

Setup

First, go to the Kokoro TTS Space on Hugging Face. In the upper right, open the three-dot menu and click Run locally to see the Docker command that runs the Space.

Hugging Face Space options menu with Run locally, Clone repository, and Duplicate this Space

Hugging Face Run locally dialog with a docker run command for the Kokoro TTS Space image

Copy this Docker command down, as it contains variables that we will need later.
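
To make it clear which parts of the command matter later, the sketch below pulls them out with Python's standard library. The command string is an assumption modeled on what the Run locally dialog typically shows, so copy your own rather than relying on it. `parse_docker_run` is a hypothetical helper, not part of any Hugging Face or Runpod tooling, and it only understands the handful of flags listed in the code.

```python
import shlex

# Assumed example of the command from the "Run locally" dialog; yours may differ.
DOCKER_CMD = (
    "docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all "
    "registry.hf.space/hexgrad-kokoro-tts:latest python app.py"
)

# Flags that consume the following token as their value.
FLAGS_WITH_VALUE = {"-p", "-e", "--gpus"}


def parse_docker_run(command: str) -> dict:
    """Extract image, container ports, env vars, and start command."""
    tokens = shlex.split(command)[2:]  # drop "docker run"
    ports, env = [], {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in FLAGS_WITH_VALUE:
            value = tokens[i + 1]
            if tok == "-p":
                ports.append(value.split(":")[-1])  # container-side port
            elif tok == "-e":
                key, _, val = value.partition("=")
                env[key] = val
            i += 2
        elif tok.startswith("-"):
            i += 1  # standalone flag such as -it or --platform=linux/amd64
        else:
            break  # first non-flag token is the image
    return {
        "image": tokens[i],
        "ports": ports,
        "env": env,
        "start_command": " ".join(tokens[i + 1:]),
    }


settings = parse_docker_run(DOCKER_CMD)
print(settings)
```

The three values to carry forward are the image name, the container port, and the start command. Any `-e` variables in your own command end up in `env`.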

In addition, you will need an access token, which you can get from your Settings page.

Log in to your Hugging Face account.
Go to https://huggingface.co/settings/tokens
Click New Token
Name your token (e.g., "Runpod Access") and select appropriate permissions
Click Generate Token and copy the generated token to a secure location
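
Before pasting the token into Runpod, it can be worth confirming that it works. The sketch below is one way to do that with only the standard library. It assumes the `whoami-v2` endpoint that, to our knowledge, the `huggingface_hub` client uses for its own `whoami()` call, and that the JSON response carries a `name` field. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the endpoint the huggingface_hub client calls for whoami().
WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build an authenticated request without sending it."""
    return urllib.request.Request(
        WHOAMI_URL, headers={"Authorization": f"Bearer {token}"}
    )


def check_token(token: str) -> str:
    """Send the request; raises urllib.error.HTTPError if the token is rejected."""
    with urllib.request.urlopen(build_whoami_request(token), timeout=10) as resp:
        # Assumption: the response JSON includes the account name under "name".
        return json.load(resp).get("name", "<unknown>")
```

Calling `check_token("<your token>")` from a local Python session should return your account name if the token is valid. Avoid hard-coding the token in a file you might commit.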

Configure your Template

Now, go to Templates in the nav bar on Runpod, and click New Template.

Runpod console sidebar with Templates selected under the Manage menu

Runpod Deploy GPU Pod page showing a custom template card and a New Template button

Set the Container image to the image named in the docker run command. In our case this is registry.hf.space/hexgrad-kokoro-tts:latest, which comes from the command we copied during Setup.
Set "Expose HTTP Ports" to 7860, since that's the port published in the Docker command above.
Enter bash -c "python app.py" as your container start command.
Open "Environment variables" at the bottom of the template, and add:
- key : HUGGING_FACE_HUB_TOKEN
- value : (your huggingface hub token)

Runpod template editor with the Kokoro TTS container image and HTTP port 7860 configured

Template environment variables section with HUGGING_FACE_HUB_TOKEN set to a placeholder key
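
Inside the container, the value set in the template is an ordinary environment variable. The sketch below shows how code could look it up. It assumes, as we understand the `huggingface_hub` library to do in recent versions, that `HF_TOKEN` is checked first and `HUGGING_FACE_HUB_TOKEN` is accepted as an older name. Check the library docs for the version your Space uses. `resolve_hf_token` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed lookup order: newer name first, then the older one used in this guide.
TOKEN_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def resolve_hf_token(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token found, or None if neither is set."""
    for name in TOKEN_VARS:
        value = environ.get(name, "").strip()
        if value:
            return value
    return None
```

If a Space fails at startup with an authentication error, a missing or misspelled variable name in the template is a good first thing to check.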

Deploy a Pod with Your Template

Navigate to the Pods section in the left navigation bar
Click the Deploy button
Select a GPU type (H100 works well due to high VRAM, but you can experiment with less expensive options)
Scroll down and click Change Template
Select the template you just created ("Kokoro TTS")
Review your settings and click Deploy at the bottom

Runpod GPU selection grid listing NVIDIA GPUs like H200 SXM and RTX 4090 with hourly prices and availability

Runpod pod deployment configuration with a template selected and the GPU count slider set to 1

Connect to your Pod

After deploying, you'll be taken to the Pods screen
Click on your newly created pod to view details
Check the Logs tab to monitor startup progress
Wait until you see a message indicating the service is running on port 7860
Once ready, click the Connect button on your pod
Select the HTTP Service option (usually has port 7860)

The first start can take a few minutes. In the logs you'll see the Kokoro TTS voice files downloading before the app reports that it is running on port 7860.

Pod logs showing Kokoro TTS voice files downloading and the app starting on port 7860

Runpod pod details panel showing disk size, utilization bars, and a Connect button

Runpod connection options with an HTTP service ready on port 7860 and a stopped web terminal

And ta-da! You should see the Kokoro TTS Gradio interface, now running on your Runpod pod.

Kokoro TTS demo on Hugging Face Spaces with text input, voice and speed controls, and a Generate button
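
If you'd rather check readiness from a script than keep refreshing the browser, the sketch below polls the pod's HTTP endpoint. It assumes Runpod's proxy URL pattern of `https://<pod-id>-<port>.proxy.runpod.net`. Confirm the actual address via the Connect button. The function names are hypothetical.

```python
import time
import urllib.request


def proxy_url(pod_id: str, port: int = 7860) -> str:
    """Build the assumed Runpod proxy address for an exposed HTTP port."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def wait_until_ready(url: str, timeout_s: float = 300, interval_s: float = 5) -> bool:
    """Poll `url` until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # covers URLError, HTTPError, and timeouts: not up yet
        time.sleep(interval_s)
    return False
```

For example, `wait_until_ready(proxy_url("<your pod id>"))` returns `True` once the Gradio app responds, or `False` if it does not come up within five minutes.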

Conclusion

You've successfully deployed a Hugging Face Space on Runpod! This approach works for virtually any Hugging Face Space - just repeat these steps with the appropriate Docker image and port. Runpod gives you the flexibility to choose more powerful hardware when needed, allowing you to run more demanding models than what's possible directly on Hugging Face.

