Wan 2.6 — AI Video Generation with Multi-Shot Storytelling

Alibaba's latest visual generation model series. Generate cinematic videos up to 15 seconds with intelligent multi-shot narratives, audio-visual synchronization, and professional-grade results — from text, image, or video reference.

Available on the SeedDance platform

Alibaba's Most Advanced Visual Generation Model Series

Wan 2.6 is the latest evolution of Alibaba's Wan series, unveiled in December 2025. It introduces China's first reference-to-video model (Wan2.6-R2V), comprehensive upgrades to text-to-video and image-to-video capabilities, and intelligent multi-shot storytelling, enabling professional-grade content production for creators worldwide.

Reference-to-Video Generation

China's first R2V model. Upload a character reference video to preserve both appearance and voice, then use text prompts to generate entirely new scenes starring the same character — with consistent visuals and audio.

Intelligent Multi-Shot Storytelling

Generate multi-scene narratives with visual consistency throughout. Wan 2.6 understands scene continuity, character movement, and narrative flow to produce cinematic stories — not just isolated clips.

Audio-Visual Synchronization

Improved audio-visual sync and audio-to-video generation deliver more realistic scenes with richer, more immersive sound effects that naturally emerge from the visual content.

Extended 15-Second Duration

Generate videos up to 15 seconds long with more precise instruction following, giving creators more room to develop stories, transitions, and dramatic moments.

Why Choose Wan 2.6 for AI Video Creation

Wan 2.6 pushes the boundaries of what AI video generation can achieve, offering a complete suite of generation modes and cinematic-quality output for every type of creator.

Wan 2.6 also generates more realistic multi-person interactions with stable character identities. Consistent faces, voices, and body language across scenes make complex narrative sequences possible.

Complete Feature Set of Wan 2.6

A comprehensive suite of AI generation capabilities covering video, audio, and image creation for professional content production.

Text-to-Video (Wan2.6-T2V)

Generate cinematic video clips from natural language descriptions. Enhanced instruction-following and improved visual quality deliver professional results from even complex narrative prompts.

Image-to-Video (Wan2.6-I2V)

Animate any still image into a fluid, coherent video. Wan 2.6 preserves visual consistency with the source image while adding natural motion, camera movement, and synchronized audio.

Reference-to-Video (Wan2.6-R2V)

Upload a character reference video and generate entirely new scenes with the same character's appearance and voice. Supports people, animals, objects, and even multiple subjects together.

Up to 15-Second Video Output

Extended video duration of up to 15 seconds allows for richer storytelling, smoother transitions, and more developed narrative arcs compared to shorter-clip models.

Audio-Visual Sync Generation

Generates realistic sound effects, ambient audio, and dialogue that are naturally synchronized with the visual content — no separate audio processing required.

Multi-Shot Scene Composition

Compose videos with multiple distinct shots that maintain narrative and visual continuity. Ideal for short films, product demos, social content, and brand storytelling.

Bilingual Prompt Support

Wan 2.6 understands and accurately follows lengthy prompts in both Chinese and English, making it highly accessible for global creators with diverse content needs.

1080p HD Video Output

Generate high-definition videos at up to 1080p resolution with sharp detail, accurate colors, and cinematic-grade visual quality ready for professional use.

Frequently Asked Questions

Everything you need to know about Wan 2.6 and how to use it on SeedDance.

Start Creating with Wan 2.6 Today

Experience Alibaba's most advanced AI video generation model on SeedDance. Multi-shot storytelling, audio sync, reference-to-video, and cinematic quality — all in one platform.