Run Hugging Face Spaces on Runpod!

Learn how to deploy any Hugging Face Space on Runpod using Docker, including an example with Kokoro TTS and Gradio.

Hugging Face Spaces are interactive demos that showcase AI models directly on the Hugging Face platform. They're great for experimenting with AI capabilities, but what if you want more computing power, need to run these models in your own environment, or want to use them as often as you like without being rate limited?

Good news! Every Hugging Face Space can now be run using Docker, which means you can deploy them on platforms like Runpod to leverage powerful GPUs. In this guide, we'll walk through deploying Kokoro TTS (a Text-to-Speech model) via Gradio from Hugging Face to Runpod.

What is Gradio?

Gradio is a popular Python library that creates user-friendly interfaces for machine learning models. Many Hugging Face Spaces, including Kokoro TTS, use Gradio to provide an interactive web interface where you can test the model's capabilities through your browser. By the end of this tutorial, you'll have this same interface running on your Runpod instance.
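
To make that concrete, here is a minimal sketch of a Gradio app. It is not the Kokoro TTS code: fake_tts is a hypothetical stand-in that returns a text description instead of audio, and the slider range is made up for illustration. It assumes the gradio package is installed.

```python
def fake_tts(text, speed):
    """Hypothetical placeholder for a real text-to-speech call."""
    words = len(text.split())
    return f"Would synthesize {words} word(s) at {speed}x speed"


def build_demo():
    """Wrap fake_tts in a browser UI with a text box and a speed slider."""
    import gradio as gr  # imported here so fake_tts works without gradio

    return gr.Interface(
        fn=fake_tts,
        inputs=[
            gr.Textbox(label="Text"),
            gr.Slider(minimum=0.5, maximum=2.0, value=1.0, label="Speed"),
        ],
        outputs=gr.Textbox(label="Result"),
    )


# To serve the UI from inside a container, listen on all interfaces
# on port 7860 (the port used throughout this guide):
# build_demo().launch(server_name="0.0.0.0", server_port=7860)
```

Gradio builds the web page from the declared inputs and outputs, which is why a Space's app.py can stay short even when the model behind it is large.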

Why Kokoro TTS?

We've chosen Kokoro TTS for this example because it's a powerful text-to-speech model that benefits from GPU acceleration. This makes it a perfect candidate to demonstrate how to move from Hugging Face Spaces to Runpod's more flexible, accessible computing environment.

Prerequisites

  • A Hugging Face account (to generate an access token)
  • A Runpod account with payment method set up

Setup

First, go to the Kokoro TTS Space on Hugging Face. In the upper right you'll see a menu marked with three dots. Open it and click Run locally to see the Docker command for the image that drives the Space.

[Image: Hugging Face Space options menu with Run locally, Clone repository, and Duplicate this Space]
[Image: Hugging Face Run locally dialog with a docker run command for the Kokoro TTS Space image]

Copy this Docker command, as it contains the values we will need later: the image name, the exposed port, and the start command.
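
If you would rather extract those values than read them off by eye, here is a rough sketch. EXAMPLE_COMMAND is assembled from the values used in this guide; the exact flags in the Run locally dialog may differ from Space to Space. The parser is a simplification for this one command shape, not a full Docker CLI parser.

```python
import shlex

# Example assembled from the values used in this guide (an assumption about
# the overall shape; copy the real command from the Run locally dialog).
EXAMPLE_COMMAND = (
    "docker run -it -p 7860:7860 --platform=linux/amd64 "
    '-e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" '
    "registry.hf.space/hexgrad-kokoro-tts:latest python app.py"
)

# Flags that take no value. Any other flag without "=" is assumed to be
# followed by its value, which is a simplification.
VALUELESS_FLAGS = {"-it", "-i", "-t", "-d", "--rm"}


def parse_docker_run(command):
    """Pull the image, container ports, env vars, and start command out."""
    tokens = shlex.split(command)
    if tokens[:2] != ["docker", "run"]:
        raise ValueError("expected a command starting with 'docker run'")

    settings = {"image": None, "ports": [], "env": {}, "start_command": ""}
    i = 2
    while i < len(tokens):
        token = tokens[i]
        if token in ("-p", "--publish"):
            # "host:container" -> keep the container side
            settings["ports"].append(tokens[i + 1].split(":")[-1])
            i += 2
        elif token in ("-e", "--env"):
            key, _, value = tokens[i + 1].partition("=")
            settings["env"][key] = value
            i += 2
        elif token.startswith("-"):
            i += 1 if (token in VALUELESS_FLAGS or "=" in token) else 2
        else:
            # First non-flag token is the image; the rest is the command.
            settings["image"] = token
            settings["start_command"] = " ".join(tokens[i + 1:])
            break
    return settings
```

Calling parse_docker_run(EXAMPLE_COMMAND) returns the image, the port list, the environment variable names, and the start command as separate fields.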

In addition, you will need an access token, which you can get from your Settings page.

  1. Log in to your Hugging Face account.
  2. Go to https://huggingface.co/settings/tokens
  3. Click New Token
  4. Name your token (e.g., "Runpod Access") and select appropriate permissions
  5. Click Generate Token and copy the generated token to a secure location
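
Before pasting the token into a template, you may want to confirm that it works. The sketch below assumes the Hub's whoami-v2 endpoint, which accepts a bearer token and returns the account it belongs to; the function names are illustrative, and only the standard library is used.

```python
import json
import urllib.request

WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def build_whoami_request(token):
    """Build (but do not send) an authenticated request to the Hub."""
    return urllib.request.Request(
        WHOAMI_URL, headers={"Authorization": f"Bearer {token}"}
    )


def token_owner(token, timeout=10):
    """Return the account name the token belongs to.

    Raises urllib.error.HTTPError (for example 401) if the token is rejected.
    """
    request = build_whoami_request(token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["name"]
```

Run token_owner with your token on your own machine; a rejected token raises an error instead of returning a name.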

Configure your Template

Now, go to Templates in the nav bar on Runpod, and click New Template.

[Image: Runpod console sidebar with Templates selected under the Manage menu]
[Image: Runpod Deploy GPU Pod page showing a custom template card and a New Template button]

  1. Set the Container image to the image named in the Docker command. In our case this is registry.hf.space/hexgrad-kokoro-tts:latest, taken from the command you copied during Setup.
  2. Set "Expose HTTP Ports" to 7860, since that's the port exposed in the Docker command.
  3. Enter bash -c "python app.py" as your container start command.
  4. Open "Environment variables" at the bottom of the template and add:
    - key : HUGGING_FACE_HUB_TOKEN
    - value : (your Hugging Face access token)

[Image: Runpod template editor with the Kokoro TTS container image and HTTP port 7860 configured]
[Image: Template environment variables section with HUGGING_FACE_HUB_TOKEN set to a placeholder key]
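
The environment variable set in the template becomes visible to every process in the container, which is how the Space's code can authenticate. The sketch below illustrates that lookup; it is not the Hugging Face library's implementation. Checking HF_TOKEN as well is an assumption based on it being the newer name that Hugging Face libraries also recognize.

```python
import os


def read_hub_token(environ=None):
    """Return the Hub token from the environment, or None if it is unset."""
    environ = os.environ if environ is None else environ
    # HUGGING_FACE_HUB_TOKEN is the key used in this guide; HF_TOKEN is
    # checked too as an assumption about newer library versions.
    for key in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = environ.get(key, "").strip()
        if value:
            return value
    return None


def describe_token(environ=None):
    """Report whether a token is present without printing the secret."""
    token = read_hub_token(environ)
    if token is None:
        return "no Hugging Face token set"
    return f"token set ({len(token)} characters)"
```

Running describe_token() from a terminal inside the pod is a quick way to confirm that the template variable reached the container without echoing the token into the logs.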

Deploy a Pod with Your Template

  1. Navigate to the Pods section in the left navigation bar
  2. Click the Deploy button
  3. Select a GPU type (H100 works well due to high VRAM, but you can experiment with less expensive options)
  4. Scroll down and click Change Template
  5. Select the template you just created (for example, one named "Kokoro TTS")
  6. Review your settings and click Deploy at the bottom

[Image: Runpod GPU selection grid listing NVIDIA GPUs like H200 SXM and RTX 4090 with hourly prices and availability]
[Image: Runpod pod deployment configuration with a template selected and the GPU count slider set to 1]

Connect to your Pod

  • After deploying, you'll be taken to the Pods screen
  • Click on your newly created pod to view details
  • Check the Logs tab to monitor startup progress
  • Wait until you see a message indicating the service is running on port 7860
  • Once ready, click the Connect button on your pod
  • Select the HTTP Service option (the one listed for port 7860)

The first start can take a while, because the pod has to pull the image and the app downloads its voice files before it begins listening on port 7860.

[Image: Pod logs showing Kokoro TTS voice files downloading and the app starting on port 7860]
[Image: Runpod pod details panel showing disk size, utilization bars, and a Connect button]
[Image: Runpod connection options with an HTTP service ready on port 7860 and a stopped web terminal]
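
If you want to script the wait instead of watching the logs, the sketch below polls the pod's HTTP address until it answers. The URL pattern in proxy_url is an assumption about how Runpod's HTTP proxy addresses are formed; check it against the link behind the Connect button. The pod ID in the usage comment is a made-up placeholder.

```python
import time
import urllib.error
import urllib.request


def proxy_url(pod_id, port=7860):
    """Build the HTTP proxy address for a pod (assumed URL pattern)."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def wait_until_ready(url, attempts=30, delay=10.0,
                     opener=urllib.request.urlopen):
    """Poll url until it answers with HTTP 200; return True on success."""
    for _ in range(attempts):
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            pass  # the app is probably still starting
        time.sleep(delay)
    return False


# Usage with a hypothetical pod ID:
# wait_until_ready(proxy_url("abc123xyz"))
```

The opener parameter exists only so the polling loop can be exercised without a network connection.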

And ta-da! You should see the Kokoro TTS Gradio interface, now served from your Runpod pod.

[Image: Kokoro TTS demo on Hugging Face Spaces with text input, voice and speed controls, and a Generate button]

Conclusion

You've successfully deployed a Hugging Face Space on Runpod! This approach works for virtually any Hugging Face Space - just repeat these steps with the appropriate Docker image and port. Runpod gives you the flexibility to choose more powerful hardware when needed, allowing you to run more demanding models than what's possible directly on Hugging Face.

Author profile: River Snow
