Jun 2, 05:10 PM

PlayAI Open-Sources PlayDiffusion AI Speech Editor on Hugging Face With Diffusion-LLM, 20-30 Token Generation, and Voice Cloning

PlayAI has open-sourced PlayDiffusion, an AI speech editing model, on Hugging Face under the Apache 2.0 license. The diffusion-based large language model (LLM) makes fine-grained edits to recorded speech without regenerating the entire audio segment. It preserves context at edit boundaries, keeps prosody and speaker identity consistent, and supports zero-shot voice cloning. The model is also efficient: it generates an edit with only 20-30 tokens, compared with the 800-1000 tokens a traditional autoregressive model requires, which makes it well suited to precise in-painting edits. In practice, a user uploads audio, the model transcribes the speech, the user edits the transcript, and the audio is updated in the same voice. The release marks a notable advance in AI-driven audio editing.
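The transcribe, edit, and update workflow implies that the editor must first work out which stretch of audio a text edit touches, so that only that stretch is regenerated. The sketch below is a minimal illustration under stated assumptions, not PlayDiffusion's code or API. It assumes a word-level transcript with timestamps (the hypothetical `Word` type), diffs it against the edited text with Python's standard `difflib`, and returns the time span to mask and regenerate. The hypothetical `context` margin pads that span so some untouched audio at each edit boundary remains available for the model to condition on.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical: one transcribed word and its position in the audio (seconds)."""
    text: str
    start: float
    end: float


def edit_region(
    words: List[Word], edited_text: str, context: float = 0.1
) -> Optional[Tuple[float, float]]:
    """Return the (start, end) audio span to regenerate, or None if the text is unchanged.

    Hypothetical helper for illustration only; not part of PlayDiffusion's API.
    """
    if not words:
        raise ValueError("need a non-empty word-level transcript")
    original = [w.text for w in words]
    edited = edited_text.split()
    ops = SequenceMatcher(a=original, b=edited, autojunk=False).get_opcodes()
    changed = [op for op in ops if op[0] != "equal"]
    if not changed:
        return None

    # Range of original word indices touched by any replace, delete, or insert.
    first, last = changed[0][1], changed[-1][2]
    if first < last:
        start, end = words[first].start, words[last - 1].end
    else:
        # Pure insertion: the new words go into the gap before words[first].
        start = words[first - 1].end if first > 0 else 0.0
        end = words[first].start if first < len(words) else words[-1].end

    # Pad with a little untouched audio on each side so the regenerated span
    # can blend with the surrounding speech at the edit boundaries.
    total = words[-1].end
    return max(0.0, start - context), min(total, end + context)


# Usage: change "cat" to "dog"; only the span around "cat" (0.2-0.5 s),
# plus 0.1 s of context on each side, would be regenerated.
words = [Word("the", 0.0, 0.2), Word("cat", 0.2, 0.5),
         Word("sat", 0.5, 0.8), Word("down", 0.8, 1.1)]
print(edit_region(words, "the dog sat down"))
```

In this sketch, everything outside the returned span would stay exactly as recorded. Word-level diffing is just one simple choice; an actual editor might align at the phoneme or audio-token level instead.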

#PlayDiffusion #HuggingFace #Apache

Written with ChatGPT (GPT-4).
