PlayAI has open-sourced PlayDiffusion, an AI speech-editing model, and released it on Hugging Face under the Apache 2.0 license. PlayAI describes it as a diffusion-based large language model (a "diffusion-LLM") for speech. It supports fine-grained edits to recorded speech without regenerating the entire audio segment, preserves context at edit boundaries, maintains prosody and speaker consistency, and supports zero-shot voice cloning. According to PlayAI, the model generates audio in just 20-30 tokens, compared with 800-1000 for traditional autoregressive models, which makes it well suited to precise in-painting edits. In the editing workflow, a user uploads audio content, the model transcribes the speech, the user edits the text, and the audio is updated to match in the same voice. PlayAI bills the release as the first diffusion-LLM for speech.
big new release today by @PlayAIOfficial: an open-source AI speech editor model using a diffusion-LLM architecture! 🔥🔥🔥 https://t.co/weoiwPoMVF
big new release today by @PlayAI: an open-source AI speech editor using a diffusion-LLM! 🔥🔥🔥 https://t.co/weoiwPof67
BIG DAY FOR @PlayAIOfficial 🚀 We’ve just open-sourced the first diffusion-LLM for speech! ⚡️ Generates audio in just 20-30 tokens (vs. 800-1000 for autoregressive) 🖌️ Perfect for super-fine in-painting edits & 🎙️ zero-shot voice cloning. Give it a try ⬇️ https://t.co/2nHCaz7U8E https://t.co/cob0lwUPkV