WAN 2.2 is an open-source AI video generation model developed by Alibaba that supports text-to-video and image-to-video generation on consumer GPUs. It's the second generation of the WAN series and one of the most capable open video models available today.
This guide walks you through running WAN 2.2 on Runpod's GPU cloud using ComfyUI. No local environment setup, no driver headaches, no expensive hardware required. By the end, you'll have generated your first AI video from a text prompt or image.
What is WAN 2.2?
WAN 2.2 is Alibaba's second-generation open-source AI video model, building directly on WAN 2.1's foundation. Where WAN 2.1 proved that open-source video generation was viable, WAN 2.2 makes it practical, with sharper temporal consistency, better motion quality, and improved image-to-video fidelity.
The model uses a Mixture-of-Experts (MoE) architecture with two 14B-parameter experts (roughly 27B parameters in total) that activate selectively based on the signal-to-noise ratio during generation: a high-noise expert handles the early denoising steps and a low-noise expert the later ones, so each step runs only one expert's weights rather than the full compute of both.
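The routing idea can be pictured with a toy sketch. Everything below is illustrative, not WAN 2.2's actual internals: the names, the boundary value, and the schedule are assumptions used only to show how noise-level routing keeps a single expert active per step.

```python
def select_expert(noise_level, boundary=0.9):
    """Route a denoising step to one of two experts by noise level.

    Hypothetical sketch of WAN 2.2's MoE routing: early, high-noise
    steps go to one 14B expert, later low-noise steps to the other,
    so only one expert's weights run at any given step.
    """
    return "high_noise_expert" if noise_level >= boundary else "low_noise_expert"

# Example: a denoising schedule running from pure noise (1.0) toward 0.
schedule = [1.0, 0.95, 0.9, 0.6, 0.3, 0.05]
routing = [select_expert(s) for s in schedule]
print(routing)  # first three steps hit the high-noise expert, the rest the low-noise one
```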
The model comes in several variants:
- T2V (text-to-video): Generate video clips from a text description
- I2V (image-to-video): Animate a starting image into a moving scene
- I2V-A14B: The 14B parameter image-to-video variant, optimized for high-fidelity output
- TI2V-5B: A 5B hybrid model supporting both text and image input at 720p, runnable on a single RTX 4090
Like its predecessor, WAN 2.2 is compatible with ComfyUI workflows and runs on consumer-grade hardware. The lightweight TI2V-5B variant requires around 8 GB of VRAM. The 14B models deliver substantially better results but need 24 GB or more.
What is ComfyUI?
ComfyUI is an open-source, node-based interface for building generative AI pipelines. Instead of writing code, you connect visual nodes (each representing a model, process, or output) into a workflow graph. It's the tool of choice for power users who want granular control over their generation pipeline without giving up accessibility.
For running WAN 2.2, ComfyUI handles loading the model checkpoints, conditioning on your text or image input, running the diffusion sampler, and stitching frames into a video, all wired up in a pre-built workflow you don't need to configure from scratch.
Why run WAN 2.2 on Runpod?
Running a 14B video diffusion model locally means either owning a high-end GPU or waiting a very long time per generation. Runpod removes that constraint.
With Runpod's GPU cloud, you select the GPU you need, deploy a pre-configured environment in seconds, and pay only for the time you use. There are no idle costs, no hardware to maintain, and no ceiling on the GPU tier you can access, from RTX 3080s for quick tests to A100s for high-res production runs.
Runpod also offers public serverless endpoints for WAN 2.2 models you can use immediately without any setup:
- WAN 2.2 I2V 720p - Image-to-video generation at 720p resolution. Upload a reference image, add a prompt, and get a high-quality animated clip.
- WAN 2.2 T2V 720p LoRA - Text-to-video at 720p with LoRA support for fine-tuned style control. Useful if you want to steer generation toward a specific visual style or trained subject.
If you want full control over your workflow (sampler settings, step count, resolution, and frame count), the ComfyUI template approach below gives you that. The endpoints are the fastest way to generate a clip right now.
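If you'd rather call an endpoint from code than a browser, Runpod serverless endpoints accept a POST to `https://api.runpod.ai/v2/<endpoint_id>/run` with a Bearer token and a JSON body of `{"input": {...}}`. The sketch below only builds that request; the endpoint ID and the input field names (such as `prompt`) are placeholders here, so check the endpoint's own schema before sending.

```python
def build_run_request(endpoint_id, api_key, payload):
    """Assemble the pieces of a Runpod serverless /run call.

    The returned dict maps directly onto requests.post(...) arguments.
    The WAN 2.2 input fields used by the caller below are assumptions;
    consult the specific endpoint's documentation for the real schema.
    """
    return {
        "url": f"https://api.runpod.ai/v2/{endpoint_id}/run",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"input": payload},
    }

# Hypothetical endpoint ID and input schema, for illustration only.
req = build_run_request("wan22-t2v", "YOUR_RUNPOD_API_KEY",
                        {"prompt": "A misty mountain range at dawn"})
# Send with: requests.post(req["url"], headers=req["headers"], json=req["json"])
```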
GPU requirements for WAN 2.2
Choose your GPU based on which model variant you want to run and your target resolution:
- 8-12 GB VRAM (RTX 3070, RTX 3060 12GB): TI2V-5B only, 480p
- 16-24 GB VRAM (RTX 3090, RTX 4080): 14B models at 480p
- 24 GB VRAM (RTX 4090, A5000): 14B models at 480p, TI2V-5B at 720p
- 40-80 GB VRAM (A100, H100): 14B models at 720p, batched generation, production workloads
Templates for running WAN 2.2 on Runpod
These community templates are purpose-built for WAN workloads and get you into ComfyUI without any manual setup.
One Click ComfyUI - Wan 2.1 / Wan 2.2 (CUDA 12.8) by HearmemanAI
The most comprehensive WAN template available. Supports text-to-video, image-to-video, video-to-video, Wan VACE, and Wan Fun workflows with pre-configured ComfyUI setups. Models are downloaded on first boot based on environment variables you set before deploying. Choose between 480p and 720p native model sets, or add VACE and Wan Fun models separately. Also supports CivitAI token integration for auto-downloading LoRAs and checkpoints. Note: requires CUDA 12.8, so select the appropriate GPU filter before deploying.
ComfyUI (Official Runpod template)
Runpod's first-party ComfyUI template. Ships with the latest ComfyUI build and CUDA 12.8, pre-installed with ComfyUI-Manager, KJNodes, Civicomfy, and ComfyUI-RunpodDirect. On first boot, ComfyUI copies to your workspace automatically. Wait for the "All startup tasks have been completed" log message before opening the interface. You'll add WAN 2.2 model files manually, but the environment is clean and stable.
WAN 2.2 AI Influencer by aiorbust
A specialized template targeting character and portrait animation with WAN 2.2. Well-suited for animating consistent characters or generating influencer-style video content.
Step 1: Sign up for Runpod
Go to runpod.io and create a free account. You can sign up with email or via GitHub or Google. Once you're in, add credits or a payment method so you can deploy a GPU pod.
Step 2: Deploy a template
Head to the Pods section in the left-hand nav. This is where you select your GPU and configure your environment.
Select a GPU. For most WAN 2.2 use cases, the RTX 4090 (24 GB VRAM) is the recommended starting point. It handles the 14B model at 480p comfortably and the TI2V-5B at 720p, with generation times that stay practical for iteration. If you're targeting 720p with the 14B model or running batch jobs, step up to an A100.
Choose your template. Once you've selected a GPU, scroll down to the Template section and click Change Template. Search for or select one of the WAN 2.2 templates listed above. The HearmemanAI template is the best starting point for most users, covering the most workflow types and handling model downloads automatically.
Configure environment variables. Before deploying the HearmemanAI template, set your environment variables to specify which models to download. At minimum, set download_480p_native_models to get the 1.3B T2V and 14B T2V/I2V 480p models. For 720p output, set download_720p_native_models instead.
Set your disk size to at least 20 GB, then click Deploy. Initial setup takes 5-20 minutes depending on which models you've selected.
Step 3: Open ComfyUI
Once the pod shows a running state, click Connect, then select Open port 8188. This opens the ComfyUI interface in a new browser tab.
If you're using the official Runpod ComfyUI template, wait until you see [ComfyUI-Manager] All startup tasks have been completed. in the pod logs before opening port 8188.
You'll see a node graph canvas with the pre-built WAN 2.2 workflow already connected: model loader, text or image conditioning, sampler, VAE decoder, and video output. You don't need to wire anything yourself.
Step 4: Generate your first video
Enter a prompt. Find the text conditioning node, usually labeled "CLIP Text Encode" or similar. Click it and type a description of the scene you want. Keep it specific and visual. For example:
A slow aerial shot over a misty mountain range at dawn, soft golden light breaking through the clouds
Check your settings. A good starting point:
- Frames: 16-24 (roughly 1-1.5 seconds at 16 FPS)
- Resolution: 856x480 (16:9, 480p)
- Steps: 20-30 (more steps means slower but higher quality)
Queue the prompt. Click Queue Prompt. ComfyUI will begin executing the graph. On a 4090, a 24-frame 480p generation takes roughly 1-2 minutes. For a full 5-second clip (81 frames) at 480p, expect around 4 minutes without quantization. You can monitor GPU utilization from the Runpod dashboard while you wait.
Retrieve your output. When complete, the output video will appear in the ComfyUI interface as a preview, or be saved to /workspace/ComfyUI/output. Download files via the FileBrowser on port 8080 (login: admin / adminadmin12 on the official Runpod template), or through the Runpod file manager.
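ComfyUI also exposes an HTTP API on the same port 8188, which is handy once you've dialed in settings and want to queue generations from a script. A workflow exported via "Save (API Format)" is a JSON object of nodes keyed by ID; the sketch below patches the prompt text in such an export before posting it. The node title and the stand-in workflow are assumptions, so inspect your own export for the actual node IDs and fields.

```python
import json

def set_prompt_text(workflow, node_title, text):
    """Patch the prompt text in a ComfyUI API-format workflow.

    Assumes the workflow was exported with ComfyUI's "Save (API Format)"
    option and that the target CLIP Text Encode node carries the given
    title under _meta. Node IDs and titles vary per workflow.
    """
    for node in workflow.values():
        if node.get("_meta", {}).get("title") == node_title:
            node["inputs"]["text"] = text
            return workflow
    raise KeyError(f"no node titled {node_title!r}")

# Minimal stand-in for an exported workflow (real exports have many nodes).
wf = {"6": {"class_type": "CLIPTextEncode",
            "_meta": {"title": "Positive Prompt"},
            "inputs": {"text": "", "clip": ["4", 1]}}}
wf = set_prompt_text(wf, "Positive Prompt", "A misty mountain range at dawn")
payload = json.dumps({"prompt": wf})
# POST payload to http://<your-pod>:8188/prompt to queue the generation.
```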
Step 5: Try image-to-video (I2V)
WAN 2.2's image-to-video capability is one of its strongest features. Switch to the I2V workflow tab in ComfyUI and upload a reference image via the image loader node. This image anchors the first frame, and the model animates it forward based on your prompt.
A static photo of a city at night becomes a rain-slicked street scene. A character illustration becomes a character in motion. The I2V-A14B variant produces noticeably better results for character animation and fine detail. Use it if your GPU can support it. You can also try the WAN 2.2 I2V 720p endpoint on Runpod to run image-to-video directly without a ComfyUI setup.
Using LoRAs with WAN 2.2
WAN 2.2 supports LoRA fine-tuned weights, which let you steer video generation toward specific visual styles, characters, or trained subjects. The HearmemanAI template supports CivitAI LoRA downloads via environment variable before deployment. Once loaded, LoRA weights appear as nodes in the ComfyUI workflow and can be applied at varying strengths.
Runpod's WAN 2.2 T2V 720p LoRA endpoint also lets you test LoRA-guided text-to-video generation at 720p directly, without building a local workflow.
Pro tips
Reduce frames first, then scale up. If you're hitting memory limits or long generation times, drop to 16 frames before reducing resolution. Frames multiply memory requirements faster than resolution does.
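A rough back-of-envelope shows why cutting frames pays off so quickly. The downsampling factors below (8x spatial, 4x temporal) are typical for WAN-style video VAEs but are an assumption here; the point is only the relative scaling.

```python
def latent_cells(frames, width, height, spatial_ds=8, temporal_ds=4):
    """Rough latent-tensor cell count for a video diffusion step.

    Back-of-envelope only: assumes a VAE that compresses 8x spatially
    and 4x temporally (an assumption, not a confirmed WAN 2.2 spec).
    Activation memory scales roughly with this count.
    """
    return (frames // temporal_ds + 1) * (width // spatial_ds) * (height // spatial_ds)

base = latent_cells(81, 856, 480)    # full 5-second 480p clip
short = latent_cells(16, 856, 480)   # 16 frames at the same resolution
print(short / base)                  # dropping frames shrinks the latent near-linearly
```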
Use frame interpolation to smooth output. Many WAN 2.2 templates include a frame interpolation workflow. Running your output through it doubles the frame rate, making even short clips feel substantially smoother without regenerating.
Save your workflows. When you dial in settings you like, export the workflow JSON from ComfyUI's menu. Re-importing it in a future session restores your exact configuration.
Attach a network volume if you're working regularly. By default, pod storage is ephemeral. A Runpod network volume keeps your outputs and custom workflows persistent across sessions.
Stop your pod when you're done. Runpod charges by the minute. Stop your pod when finished to avoid unexpected charges.
Wrapping up
WAN 2.2 is one of the most capable open-source AI video generation models available today, and Runpod is the fastest path to running it, whether through a serverless endpoint for quick generation or a full ComfyUI template for production workflows. The templates handle everything from model loading to workflow configuration, so you're generating video within minutes of deploying a pod.
Ready to try it? Start with the WAN 2.2 I2V endpoint for an instant result, or deploy a ComfyUI template for full workflow control.
FAQ
What's the difference between WAN 2.1 and WAN 2.2?
WAN 2.1 was Alibaba's initial open-source video generation release, demonstrating that high-quality text-to-video was achievable with consumer GPUs. WAN 2.2 builds on that with improved motion quality, better temporal consistency, and a more capable image-to-video pipeline. It also introduces a Mixture-of-Experts architecture that reduces computational load while maintaining output quality. Switching from WAN 2.1 workflows is straightforward: the same ComfyUI structure applies, and you're just loading different model checkpoints.
What GPU do I need to run WAN 2.2 on Runpod?
The TI2V-5B model runs on 8 GB VRAM. For the 14B models at 480p, you need 16-24 GB. For 720p output with the 14B model, 40-80 GB is recommended. The RTX 4090 is the recommended starting point for most users, handling the 14B model at 480p and TI2V-5B at 720p. On Runpod you can rent exactly the GPU you need, when you need it, with no long-term commitment.
Can I run WAN 2.2 locally instead?
If you have a 24 GB GPU, yes. But running locally means managing your own Python environment, ComfyUI installation, model downloads, and dependency conflicts. The Runpod templates handle all of that automatically, making the cloud the faster starting point for most users.
What is WAN 2.2 animate?
WAN 2.2 Animate (Wan2.2-Animate-14B) is a model variant released by Alibaba in September 2025, designed specifically for character animation and replacement. It replicates holistic movement and facial expressions from a reference video or image, making it well-suited for animating portraits, character art, and illustrations with realistic motion.
What is WAN 2.2 LoRA?
LoRA (Low-Rank Adaptation) weights for WAN 2.2 are fine-tuned model additions that steer video generation toward specific styles, characters, or visual subjects without retraining the full model. You load them alongside the base WAN 2.2 checkpoint in ComfyUI, or use Runpod's WAN 2.2 T2V 720p LoRA endpoint to test LoRA-guided generation directly.
Can I use WAN 2.2 as an API instead of a UI?
Yes. Runpod's serverless GPU endpoints let you deploy any model as a REST API that autoscales in the cloud. Once you've built and tested a workflow in ComfyUI, you can containerize that setup and deploy it to Runpod Serverless. Your application sends generation requests via API, and Runpod handles scaling and billing per-request with no idle costs.
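Serverless jobs run asynchronously: you submit to `/run`, get a job ID back, and poll `/status/<job_id>` until the job reaches a terminal state. The sketch below captures that polling loop; for testability it takes any status-fetching callable, and the simulated responses stand in for real GET requests made with your API key.

```python
import time

def wait_for_job(fetch_status, poll_s=1.0, max_polls=300):
    """Poll a Runpod serverless job until it reaches a terminal state.

    fetch_status is any callable returning the job's status dict; in real
    use it would GET https://api.runpod.ai/v2/<endpoint>/status/<job_id>
    with an Authorization: Bearer header.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("job did not finish within the polling budget")

# Simulated endpoint: queued, then in progress, then complete with output.
responses = iter([{"status": "IN_QUEUE"},
                  {"status": "IN_PROGRESS"},
                  {"status": "COMPLETED", "output": {"video_url": "..."}}])
result = wait_for_job(lambda: next(responses), poll_s=0.0)
print(result["status"])  # prints "COMPLETED"
```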

