
Meta's Fundamental AI Research (FAIR) group has unveiled four new open-source AI models and techniques, including advances in multimodal language tasks, text-to-music and audio generation, and detection of synthetic speech. Meanwhile, the open-source AI community has seen a significant release of its own with AuraFlow v0.1. Part of the Aura series, it is a 1024x1024 text-to-image model with 6.8 billion parameters and DiT Encoder blocks, achieves a GenEval score of 0.703 with prompt enhancement, and is released under the Apache-2.0 license; it is touted as delivering SoTA-level performance.

Open-source AI strikes back with AuraFlow v0.1 🤩
- 6.8B parameters model
- Largest open-sourced flow-based T2I model
- DiT Encoder blocks
- Better instruction following
- GenEval score 0.703 with prompt enhancement
Congratulations Simo, Batuhan, & Fal AI team on a great release
https://t.co/17gSNuXkuA
🖼 Ideogram vs AuraFlow v0.1 (Model License: Apache-2.0)
Thanks to @cloneofsimo ❤ and @Fal Team ❤
🌐 page: https://t.co/Bm5LFuQ30T
🧬 code: https://t.co/9w9Hbvi9vq
🔋 demo: https://t.co/2F6Na9ZLxN
📦 model: https://t.co/fx67Yk8Fj8
https://t.co/r3aQitASgX https://t.co/5OWNgraNyO
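For readers who want to try the model locally, the sketch below shows one way to generate a 1024x1024 image with AuraFlow v0.1. It assumes the weights are published on the Hugging Face Hub under a repo id such as "fal/AuraFlow" and that the installed diffusers release includes the AuraFlowPipeline class; the repo id, sampler settings, and guidance scale here are assumptions, so check the model card linked above for the recommended values.

```python
# Minimal sketch: text-to-image generation with AuraFlow v0.1 via diffusers.
# The repo id and generation settings below are assumptions; verify them
# against the official model card before use.
import torch
from diffusers import AuraFlowPipeline

# Load the pipeline in half precision so it fits on a single consumer GPU.
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",             # assumed Hub repo id
    torch_dtype=torch.float16,
).to("cuda")

# Generate a single 1024x1024 image from a text prompt.
image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    height=1024,
    width=1024,
    num_inference_steps=50,     # assumed step count
    guidance_scale=3.5,         # assumed guidance value; tune per model card
).images[0]

image.save("auraflow_sample.png")
```

As with other flow-based text-to-image models, prompt enhancement (rewriting short prompts into more detailed ones) is reported to improve GenEval scores, so longer, more descriptive prompts are a reasonable starting point.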
FunAudioLLM: A Multi-Model Framework for Natural, Multilingual, and Emotionally Expressive Voice Interactions
Researchers from Alibaba Group introduced FunAudioLLM, comprising two core models: SenseVoice and CosyVoice. SenseVoice excels in multilingual speech recognition,… https://t.co/bnOePN9qfZ