ByteDance's next-generation audio model goes far beyond text-to-speech. Seed Audio 1.0 orchestrates multi-character dialogue, emotional tone, background music, and environmental sound effects from a single prompt, producing up to about two minutes of finished audio in one pass.
Powered by ByteDance Seed Speech & Seed Music

Seed Audio 1.0 (also known as Doubao-Seed-Audio 1.0 in ByteDance's Doubao ecosystem) is a multimodal audio generation model from the ByteDance Seed team. Unlike conventional text-to-speech systems that convert written words into a single voice track, Seed Audio 1.0 is designed to produce complete sound scenes: the spoken line plus the world around it. Public descriptions position it as an end-to-end creative system that can synchronously arrange character dialogue, emotional delivery, dialect or accent, background music, and foley-style environmental effects in one generation pass. The model accepts text prompts and optional reference audio inputs, supports zero-shot multimodal generation, and can output up to approximately two minutes of audio while preserving timbre consistency when extending existing clips. Built on ByteDance's Seed Speech research lineage (including Seed-TTS) and the Seed-Music generation stack, Seed Audio 1.0 represents a strategic shift from isolated voice synthesis toward unified audio direction for podcasts, radio drama, short-form video, games, and interactive media.
Traditional TTS turns text into one voice. Seed Audio 1.0 targets the entire soundscape: dialogue, music, ambience, and effects layered together as a finished mix. Creators describe a scene in natural language and receive production-ready audio instead of stitching together the output of multiple tools by hand.
Combine descriptive prompts with up to three reference audio clips for voice style, rhythm, or mood anchoring. Reference tags like @Audio1, @Audio2, and @Audio3 let you point the model to specific uploaded samples. Optional image references can guide tone when audio references are not used.
Generate conversations with distinct speakers, each with its own timbre and emotional arc. Seed Audio 1.0 handles turn-taking, pacing, and expressive delivery, which is useful for audiobooks, scripted podcasts, training scenarios, and character-driven storytelling without recording multiple voice actors.
Background music that follows narrative mood, environmental ambience such as rain or crowd noise, and action-matched sound effects can be generated alongside speech. For many prototype and content workflows, this removes the need for separate music libraries, SFX packs, and manual mixing.
Seed Audio 1.0 compresses what used to require a voice booth, a composer, and a sound designer into a single AI generation step, while keeping creative control through prompts and references.

Core capabilities of Seed Audio 1.0, available directly on SeedDance.
Describe characters, setting, mood, and pacing in natural language. The model renders a complete audio scene rather than a flat narration track.
Upload up to three reference clips (WAV, MP3, PCM, OGG Opus; typically up to 30 seconds and 10 MB each) and reference them in prompts with @Audio1, @Audio2, @Audio3 for voice cloning, style transfer, or rhythmic guidance.
Supply a single reference image (JPEG, PNG, WebP) to influence mood when audio references are not provided. Image and audio references cannot be used in the same generation.
Assign distinct voices to multiple speakers within one generation, supporting scripted conversations, interviews, and narrative exchanges with emotional variation.
Generate underscore music and ambient sound design synchronized with dialogue: rain, footsteps, city noise, mechanical hum, and other foley-style layers.
Produce extended audio segments of up to approximately two minutes in a single run, suitable for podcast intros, ad spots, game cutscenes, and short dramatic scenes without chaining dozens of micro-clips.
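The reference-input rules listed above (at most three audio clips, the accepted formats, the typical 30-second and 10 MB limits, the @Audio1 to @Audio3 tags, and the rule that image and audio references cannot be mixed) can be checked before a request is submitted. The following is a minimal sketch in Python, not an official client: the names `AudioRef`, `SceneRequest`, and `validate` are hypothetical, the limits are treated as defaults because the page describes them as typical, and 10 MB is assumed to mean 10 × 1024 × 1024 bytes.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted on this page; described there as "typically", so treat as defaults.
MAX_AUDIO_REFS = 3
MAX_AUDIO_SECONDS = 30.0
MAX_AUDIO_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
AUDIO_FORMATS = {"wav", "mp3", "pcm", "ogg"}  # "ogg" stands in for OGG Opus
IMAGE_FORMATS = {"jpeg", "jpg", "png", "webp"}

# Matches prompt tags such as @Audio1, @Audio2, @Audio3.
TAG_PATTERN = re.compile(r"@Audio(\d+)")


@dataclass
class AudioRef:
    """Hypothetical description of one uploaded reference clip."""
    fmt: str
    seconds: float
    size_bytes: int


@dataclass
class SceneRequest:
    """Hypothetical container for a prompt plus its optional references."""
    prompt: str
    audio_refs: List[AudioRef] = field(default_factory=list)
    image_fmt: Optional[str] = None


def validate(req: SceneRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means the request looks valid."""
    problems: List[str] = []

    # Image and audio references are mutually exclusive in one generation.
    if req.audio_refs and req.image_fmt is not None:
        problems.append("image and audio references cannot be combined")

    if len(req.audio_refs) > MAX_AUDIO_REFS:
        problems.append(f"at most {MAX_AUDIO_REFS} audio references are allowed")

    for index, ref in enumerate(req.audio_refs, start=1):
        if ref.fmt.lower() not in AUDIO_FORMATS:
            problems.append(f"clip {index}: unsupported format '{ref.fmt}'")
        if ref.seconds > MAX_AUDIO_SECONDS:
            problems.append(f"clip {index}: longer than {MAX_AUDIO_SECONDS:g} seconds")
        if ref.size_bytes > MAX_AUDIO_BYTES:
            problems.append(f"clip {index}: larger than {MAX_AUDIO_BYTES} bytes")

    if req.image_fmt is not None and req.image_fmt.lower() not in IMAGE_FORMATS:
        problems.append(f"unsupported image format '{req.image_fmt}'")

    # Every @AudioN tag in the prompt must point at an uploaded clip.
    tag_numbers = sorted({int(n) for n in TAG_PATTERN.findall(req.prompt)})
    for n in tag_numbers:
        if n < 1 or n > len(req.audio_refs):
            problems.append(f"@Audio{n} has no matching uploaded clip")

    return problems
```

Calling `validate(SceneRequest(prompt="Narrator in the voice of @Audio1, rain outside", audio_refs=[AudioRef("wav", 12.0, 2_000_000)]))` returns an empty list, while a request that mixes an image with audio clips, or tags a clip that was never uploaded, returns one message per problem. Returning a list rather than raising on the first failure lets a creator fix every issue in one pass.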
Common questions about Seed Audio 1.0, how it differs from TTS, and how creators can use it.
Seed Audio 1.0 redefines what AI audio generation can do, from a single voice to a complete cinematic sound scene. Start creating on SeedDance today.