Microsoft has released VibeVoice-1.5B, an open-source text-to-speech (TTS) model designed to generate long-form, multi-speaker conversational audio, making it particularly well suited to podcast automation. Despite the name, the full model has about 2.7 billion parameters; the "1.5B" refers to its Qwen2.5-1.5B language-model backbone, which is paired with ultra-low-rate acoustic and semantic tokenizers and a diffusion head. The model can produce up to 90 minutes of audio with up to four speakers while maintaining stable voices and natural conversational turn-taking, and it supports features such as cross-lingual speech, background music, spontaneous singing, and emotional expression. Developers have already built a working chat application on top of VibeVoice-1.5B and deployed it on platforms such as Hugging Face. The release aligns with broader trends in AI development, including low-code AI frameworks and hands-on training at upcoming conferences such as ODSC West 2025, which will cover AI engineering, agent operations, retrieval-augmented generation, and related technologies.
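On the practical side, podcast automation with a model like this mostly comes down to assembling a structured multi-speaker script before synthesis. The sketch below shows one way such input might be prepared; the Turn and build_script helpers and the "Speaker N:" line format are assumptions for illustration, not the documented VibeVoice interface (see the microsoft/VibeVoice-1.5B model card on Hugging Face for the actual inference entry points).

```python
# A minimal sketch of preparing input for a long-form, multi-speaker TTS model
# such as VibeVoice-1.5B. The Turn/build_script helpers and the "Speaker N:"
# line format are illustrative assumptions, not the documented VibeVoice API.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # e.g. "Speaker 1" through "Speaker 4" (the model supports up to four)
    text: str     # what this speaker says

def build_script(turns: list[Turn]) -> str:
    """Flatten conversational turns into a plain-text script, one line per turn."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)

if __name__ == "__main__":
    episode = [
        Turn("Speaker 1", "Welcome back to the show."),
        Turn("Speaker 2", "Thanks! Today we're looking at open-source TTS for podcasts."),
    ]
    print(build_script(episode))
    # The script (plus per-speaker reference voice samples) would then be passed
    # to the model's own inference code to synthesize the audio.
```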
Want to build a coding agent in just 50 lines of code? Check out my live-coded demo repo — loops, tooling, editing—all in 50 lines of code. Repo + source ➡ https://t.co/rllVCjawJr #PowerShell #AI https://t.co/pyOlW86SXx
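For a sense of what fits in 50 lines: a minimal coding agent is essentially a loop that sends the conversation to a model, dispatches whatever tool call comes back (read a file, edit it, run a command), and appends the result before looping again. The linked repo does this in PowerShell; the Python sketch below illustrates the same pattern, with a hypothetical call_model helper standing in for an actual LLM API.

```python
# Illustrative agent loop: model call -> tool dispatch -> feed result back.
# call_model is a hypothetical stand-in for a real LLM API; the linked demo
# repo implements the same loop-plus-tools pattern in PowerShell.
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return "ok"

def run(cmd: str) -> str:
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

TOOLS = {"read_file": read_file, "write_file": write_file, "run": run}

def call_model(messages: list[dict]) -> dict:
    """Hypothetical LLM call. Expected to return either
    {"tool": name, "args": {...}} or {"answer": text}."""
    raise NotImplementedError("Wire this up to your LLM provider of choice.")

def agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                          # the loop
        reply = call_model(messages)
        if "answer" in reply:                           # model says it is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # tooling / editing
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."
```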
How @PiCoreTeam's AI App Builder is changing the game even if you don't know how to code! 🕹️ https://t.co/xSwWS5srWZ
Watch @_mayurc use @zaara_ai as his AI cofounder to scope, code, and deploy a livestream clipping agent 👇 https://t.co/DHOTkjuTlC