Emmett Fear

How to Use WAN 2.5 on Runpod

WAN 2.5 is an AI video generation model from Alibaba that introduced something the WAN series hadn't offered before: native synchronized audio. Where WAN 2.2 and earlier versions generated silent video clips, WAN 2.5 produces audio and video together in a single pass, with voices, ambient sound, and background music aligned to the on-screen action.

Runpod offers a public serverless endpoint for WAN 2.5 image-to-video generation, so you can start generating clips without managing infrastructure, containers, or GPU environments. This guide covers what WAN 2.5 is, how it differs from WAN 2.2, and how to use it on Runpod today.

What is WAN 2.5?

WAN 2.5 is an AI video generation model from Alibaba, released in September 2025. It builds on WAN 2.2's foundation and adds several capabilities that expand what the model can produce:

  • Native audio-visual generation: WAN 2.5 produces synchronized audio and video in a single pass, including vocals, ambient sound, music, and lip-synced dialogue. No separate audio recording or manual alignment required.
  • 1080p output at 24fps: The model supports full HD video output, a step up from WAN 2.2's 720p ceiling.
  • Up to 10-second clips: WAN 2.5 doubles the maximum clip duration compared to earlier WAN models, giving more room for narrative and motion development.
  • Text-to-video and image-to-video: Both generation modes are supported, allowing you to start from a text description or animate a reference image.
  • Multilingual support: The model handles multilingual prompts and can generate lip-synced content in multiple languages, including strong support for Chinese.

Unlike WAN 2.1 and WAN 2.2, which shipped with publicly downloadable model weights for local deployment, WAN 2.5 is accessed via API and managed endpoints. Runpod's serverless infrastructure hosts the model, so you get the full capability without needing to manage the underlying hardware or environment.

WAN 2.5 vs WAN 2.2: what changed?

The most significant addition in WAN 2.5 is native audio. WAN 2.2 generates high-quality silent video. WAN 2.5 generates video with synchronized sound in the same generation pass. For content that needs voiceover, ambient audio, or background music baked in, this removes an entire post-production step.

Resolution also increased. WAN 2.2 tops out at 720p. WAN 2.5 supports 1080p output at 24fps, which brings it closer to professional content standards for short-form video.

Maximum clip duration also doubled, from roughly 5 seconds to 10 seconds, giving you more flexibility for storytelling and motion-heavy scenes.

The trade-off is that WAN 2.5 is not locally deployable in the same way as WAN 2.2. There are no publicly available model weights to load into a ComfyUI template or a GPU pod. Access is through managed endpoints, which is exactly what Runpod provides.

Using WAN 2.5 on Runpod

Runpod's WAN 2.5 image-to-video endpoint lets you animate a reference image into a video clip with synchronized audio, without any setup or infrastructure management.

To get started:

  1. Go to runpod.io and sign in or create a free account.
  2. Navigate to the WAN 2.5 endpoint in the Runpod Hub.
  3. Upload a reference image in the image input field. This anchors the first frame of your clip.
  4. Write a prompt describing the motion, mood, or audio you want. For example: "A woman walks through a rain-soaked street at night, ambient city sounds, soft jazz in the background."
  5. Click Run to generate your clip. The endpoint handles GPU allocation and inference automatically.

The image-to-video mode animates your reference image forward based on the prompt, producing a clip with motion and synchronized audio generated together. The more specific your prompt is about movement and sound, the more control you have over the result.
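If you'd rather script this workflow than click through the playground, the same generation maps to a small JSON request body. The sketch below is a hypothetical payload: the field names `image_url` and `prompt` are assumptions, and the API tab on the endpoint page documents the actual input schema.

```python
# Hypothetical input payload for the WAN 2.5 image-to-video endpoint.
# Field names are assumptions; check the endpoint's API tab for the real schema.
payload = {
    "input": {
        "image_url": "https://example.com/reference.jpg",  # anchors the first frame
        "prompt": (
            "A woman walks through a rain-soaked street at night, "
            "ambient city sounds, soft jazz in the background"
        ),
    }
}
```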

Use cases for WAN 2.5

The native audio capability makes WAN 2.5 particularly suited for content that previously required multiple tools and post-production effort:

  • Social media content: Generate short-form clips with background music and ambient sound ready for direct upload.
  • Product visualization: Animate product images with cinematic camera movement and sound design.
  • Concept prototyping: Quickly visualize motion and audio for a scene before committing to production.
  • Multilingual content: Generate lip-synced video in different languages from a reference image and a localized prompt.
  • Educational content: Create illustrated explainers with synchronized narration and ambient audio in a single generation pass.

Using WAN 2.5 as an API

Beyond the playground interface, Runpod's WAN 2.5 endpoint is accessible via REST API for integration into applications and pipelines. Navigate to the API tab on the endpoint page for request format, authentication, and parameter documentation. You pay per generation with no idle costs, making it practical to build WAN 2.5 into production workflows without over-provisioning compute.
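As a concrete starting point, here is a minimal Python sketch that submits a job and polls for the result using Runpod's standard serverless API pattern (POST to /run, then GET /status/{id}). The endpoint ID and input field names below are placeholders, not confirmed values; copy the real ones from the API tab on the endpoint page.

```python
import os
import time

import requests

API_KEY = os.environ["RUNPOD_API_KEY"]  # your Runpod API key
ENDPOINT_ID = "your-wan-2-5-endpoint-id"  # placeholder: copy the real ID from the endpoint page
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit an asynchronous generation job. The input schema is an assumption;
# the API tab on the endpoint page documents the real field names.
payload = {
    "input": {
        "image_url": "https://example.com/reference.jpg",
        "prompt": "A woman walks through a rain-soaked street at night, "
                  "ambient city sounds, soft jazz in the background",
    }
}
job = requests.post(f"{BASE_URL}/run", json=payload, headers=HEADERS).json()

# Poll until the job finishes, then inspect the output for the video URL.
while True:
    status = requests.get(f"{BASE_URL}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(5)

print(status)  # on success, the output includes a link to the generated clip
```

Because the endpoint is serverless, the request only incurs cost while the job is running; there is nothing to shut down afterward.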

Wrapping up

WAN 2.5 is the first WAN model to produce synchronized audio-visual output in a single generation pass, and Runpod's public endpoint gives you access to it immediately without hardware setup or container management.

Ready to try it? Open the WAN 2.5 endpoint on Runpod and generate your first clip.

FAQ

What is WAN 2.5?

WAN 2.5 is an AI video generation model from Alibaba released in September 2025. It supports image-to-video generation with native synchronized audio output, 1080p resolution at 24fps, and clips up to 10 seconds long.

What's new in WAN 2.5 compared to WAN 2.2?

WAN 2.5 adds native audio-visual generation, producing synchronized sound and video in a single pass. It also supports 1080p output (vs 720p in WAN 2.2) and clips up to 10 seconds (vs approximately 5 seconds). WAN 2.2 remains the most recent WAN model with publicly downloadable weights for local deployment.

Can I run WAN 2.5 locally?

WAN 2.5 model weights are not publicly available for local deployment in the same way as WAN 2.1 and WAN 2.2. Access is via managed API endpoints. Runpod's public endpoint gives you full WAN 2.5 image-to-video capability without needing to manage the underlying model or infrastructure.

What is WAN 2.5 image-to-video?

WAN 2.5 image-to-video animates a reference image into a short video clip based on a text prompt, with synchronized audio generated in the same pass. You upload an image, describe the motion and sound you want, and the model produces a 1080p clip with audio included.

How does WAN 2.5 compare to other AI video generators?

WAN 2.5's main differentiator is native audio-visual generation, which earlier WAN models and many competing video generators do not offer. It produces 1080p output with up to 10 seconds of content and multilingual audio support, positioning it alongside commercial tools like Kling and Google Veo for audio-synchronized short-form video.
