Emmett Fear

How to Use WAN 2.6 on Runpod

WAN 2.6 is an AI video and image generation model from Alibaba, released in December 2025. Its headline capability is reference-to-video generation: you upload a character reference video with appearance and voice, and the model generates new scenes starring that same character. For creators, that means putting yourself, a client, or any consistent subject into AI-generated scenes with matching visuals and audio.

Runpod offers three public endpoints for WAN 2.6, covering text-to-video, image-to-video, and standalone text-to-image generation. This guide covers what WAN 2.6 is, what's new, and how to use all three endpoints on Runpod.

What is WAN 2.6?

WAN 2.6 is Alibaba's fifth-generation video generation model in the WAN series, announced December 16, 2025. It introduces a new reference-to-video model alongside comprehensive upgrades to the existing text-to-video, image-to-video, and image generation models.

Key capabilities in WAN 2.6:

  • Reference-to-video (R2V): Upload a character reference video containing both appearance and voice. WAN 2.6 generates new scenes starring that character with consistent visuals and audio. Supports single subjects, multiple subjects, people, animals, and objects.
  • Multi-shot storytelling: The model supports multi-shot video sequences that keep characters, environments, and lighting consistent across scenes for richer narratives.
  • Up to 15-second clips: Extended duration supports more complex storytelling compared to earlier WAN models.
  • Text-to-image generation: WAN 2.6 includes a dedicated text-to-image model that generates high-quality images from natural-language prompts with strong prompt adherence and clean composition.
  • Improved temporal consistency: Sharper detail retention on text, faces, and fine elements frame-to-frame with less flickering compared to WAN 2.5.
  • Audio-visual synchronization: Enhanced audio-to-video generation and sound effect alignment for more realistic scenes.

Like WAN 2.5, WAN 2.6 is not available as publicly downloadable model weights. Access is through managed endpoints, which Runpod provides across three separate capabilities.

WAN 2.6 vs WAN 2.5: what changed?

WAN 2.5 introduced native audio and 1080p output. WAN 2.6 focuses on identity consistency and storytelling depth. The reference-to-video capability is entirely new, allowing a consistent character to appear across multiple generated scenes, something WAN 2.5 could not do reliably.

Maximum clip duration extended from 10 to 15 seconds, giving more room for multi-shot sequences. Temporal consistency also improved, with noticeably sharper detail retention in faces, text on signs, and fine textures frame-to-frame.

The addition of a dedicated text-to-image model is also new to the WAN 2.6 family, expanding it beyond video-only output for the first time.

The three WAN 2.6 endpoints on Runpod

Runpod provides public serverless endpoints for all three WAN 2.6 capabilities. Each operates independently, so you can use the one that fits your workflow.

Text-to-video

The WAN 2.6 T2V endpoint generates video clips directly from a text prompt. Describe the scene, characters, camera movement, and mood, and the model produces a clip up to 15 seconds long with audio-visual synchronization.

Good for: generating video from scratch when you don't have a reference image, prototyping scenes, or producing short-form content from text alone.
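Runpod serverless endpoints accept a JSON body with an `input` object posted to the endpoint's `/run` route. The field names below (`prompt`, `duration`) and the endpoint ID placeholder are assumptions for illustration — check the endpoint's API tab for the actual schema. A minimal request-building sketch in Python:

```python
import json

# Hypothetical endpoint ID -- replace with the real T2V endpoint ID
# from the endpoint's API tab on Runpod.
RUN_URL = "https://api.runpod.ai/v2/<your-t2v-endpoint-id>/run"

def build_t2v_request(prompt: str, duration_seconds: int = 10) -> dict:
    """Build the JSON body for a text-to-video request (field names assumed)."""
    if not 1 <= duration_seconds <= 15:  # WAN 2.6 clips top out at 15 seconds
        raise ValueError("WAN 2.6 supports clips up to 15 seconds")
    return {
        "input": {
            "prompt": prompt,
            "duration": duration_seconds,
        }
    }

# To submit, POST the body with your Runpod API key, e.g.:
# requests.post(RUN_URL, json=body,
#               headers={"Authorization": "Bearer <RUNPOD_API_KEY>"})
body = build_t2v_request(
    "A lighthouse at dusk, slow aerial push-in, waves crashing", 12
)
print(json.dumps(body))
```

Describing camera movement and sound in the prompt, as in the example above, gives the model more to work with for audio-visual synchronization.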

Image-to-video

The WAN 2.6 I2V endpoint animates a reference image into a video clip. Upload an image to anchor the first frame, add a prompt to describe the motion and audio, and the model generates a clip from it. The reference image provides visual grounding, making this mode particularly effective for product shots, character art, and scenes where you need a specific starting composition.

Good for: animating existing images, product visualization, bringing concept art to life, and any scenario where the opening frame matters.
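For I2V, the reference image typically travels in the request body either as a URL or as a base64 string. The `image` and `prompt` field names below are assumptions — confirm them against the endpoint's API tab. A sketch of building the request with a base64-encoded local file:

```python
import base64

def build_i2v_request(image_path: str, prompt: str) -> dict:
    """Build an image-to-video request body (field names assumed).

    The image anchors the first frame; the prompt describes the
    motion and audio the clip should contain.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "input": {
            "image": image_b64,   # first-frame reference image, base64-encoded
            "prompt": prompt,     # motion and audio description
        }
    }

# Example usage (assumes product.png exists locally):
# body = build_i2v_request("product.png", "slow 360-degree rotation, soft studio hum")
```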

Text-to-image

The WAN 2.6 T2I endpoint generates high-quality images from natural-language prompts. It's purpose-built for strong prompt adherence and clean composition, making it useful for generating reference images to feed into the I2V endpoint, or as a standalone image generation tool. Generation costs $0.03 per image.

Good for: generating reference images for I2V workflows, concept visualization, marketing assets, and any task requiring high-quality image output from a text description.
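Since T2I bills a flat $0.03 per image, batch costs are easy to estimate up front. A tiny helper (hypothetical, not part of any Runpod SDK):

```python
PRICE_PER_IMAGE_USD = 0.03  # WAN 2.6 T2I per-image price on Runpod

def t2i_batch_cost(num_images: int) -> float:
    """Estimated USD cost for a batch of T2I generations."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return round(num_images * PRICE_PER_IMAGE_USD, 2)

print(t2i_batch_cost(40))  # 40 reference frames for an I2V storyboard
```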

Using WAN 2.6 on Runpod

All three endpoints follow the same basic flow:

  1. Go to runpod.io and sign in or create a free account.
  2. Navigate to the relevant WAN 2.6 endpoint in the Runpod Hub.
  3. For T2V and T2I: enter your text prompt in the input field and click Run.
  4. For I2V: upload a reference image, add a text prompt describing the motion and audio, and click Run.
  5. The endpoint handles GPU allocation and inference automatically. Results appear in the preview panel when complete.

Each endpoint also exposes a REST API for integration into applications and production pipelines. Navigate to the API tab on any endpoint page for request format, parameters, and authentication details.
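Runpod serverless jobs are asynchronous: a `/run` call returns a job ID, and you poll the endpoint's `/status/<job_id>` route until the job reports `COMPLETED` (other states include `IN_QUEUE`, `IN_PROGRESS`, and `FAILED`). A transport-agnostic polling sketch — `get_status` stands in for whatever HTTP wrapper your application uses:

```python
import time

def wait_for_result(job_id, get_status, poll_seconds=2.0, timeout=600.0):
    """Poll a Runpod serverless job until it finishes or times out.

    `get_status` is any callable returning the job's status dict, e.g.
    a wrapper around GET https://api.runpod.ai/v2/<endpoint>/status/<job_id>.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        state = status.get("status")
        if state == "COMPLETED":
            return status.get("output")   # e.g. a URL to the generated clip
        if state in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"job {job_id} ended in state {state}: {status}")
        time.sleep(poll_seconds)          # still IN_QUEUE or IN_PROGRESS
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Video generation can take minutes, so a generous timeout and a multi-second poll interval are sensible defaults for production pipelines.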

Use cases for WAN 2.6

  • Creator content: Put yourself or a client character into AI-generated scenes using the reference-to-video capability, with consistent appearance and voice across multiple shots.
  • Short-form video production: Generate multi-shot sequences with consistent environments and characters for social media content.
  • Product marketing: Animate product images with the I2V endpoint, or generate clean product visuals with T2I before animating them.
  • Storyboarding: Use T2I to generate scene reference images, then animate them with I2V to build out a visual narrative quickly.
  • Brand content: The improved text rendering on signs and labels makes WAN 2.6 more reliable for branded content than earlier WAN versions.

Wrapping up

WAN 2.6 expands the WAN family into reference-to-video generation, multi-shot storytelling, and dedicated text-to-image output. Runpod's three public endpoints give you access to all three capabilities today without any infrastructure setup.

Start with the endpoint that fits your workflow: T2V for video from text, I2V for animating an image, or T2I for generating a high-quality image from a prompt.

FAQ

What is WAN 2.6?

WAN 2.6 is Alibaba's AI video and image generation model released in December 2025. It introduces reference-to-video generation, multi-shot storytelling with consistent characters across scenes, clips up to 15 seconds, and a dedicated text-to-image model.

What is WAN 2.6 reference-to-video?

Reference-to-video (R2V) is WAN 2.6's flagship feature. You upload a character reference video containing both appearance and voice, and the model generates new scenes starring that character with consistent visuals and audio. It supports single subjects, multiple subjects, people, animals, and objects.

What's the difference between WAN 2.6 T2V, I2V, and T2I?

T2V (text-to-video) generates video clips from a text prompt. I2V (image-to-video) animates a reference image into a video clip based on a prompt. T2I (text-to-image) generates standalone high-quality images from a text prompt. All three are available as separate Runpod endpoints.

Can I run WAN 2.6 locally?

WAN 2.6 model weights are not publicly available for local deployment. Access is through managed API endpoints. Runpod provides three public serverless endpoints covering T2V, I2V, and T2I, accessible immediately without infrastructure setup.

What's new in WAN 2.6 compared to WAN 2.5?

WAN 2.6 adds reference-to-video generation for consistent character identity across scenes, extends maximum clip duration from 10 to 15 seconds, adds a dedicated text-to-image model, and improves temporal consistency with sharper detail retention in faces, text, and fine textures compared to WAN 2.5.

How much does WAN 2.6 cost on Runpod?

Runpod's WAN 2.6 T2I endpoint costs $0.03 per image generation. Video endpoint pricing depends on generation length and settings. All endpoints use per-request billing with no idle costs.
