Researchers from UCSD and Adobe have introduced Presto!, an inference acceleration method for score-based diffusion transformers that reduces both the number of sampling steps and the cost per step. Separately, researchers from Shanghai Jiao Tong University, the University of Cambridge, and Geely Automobile Research Institute have developed F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with a diffusion transformer. F5-TTS is trained on 100,000 hours of data and supports zero-shot voice cloning, emotion-based synthesis, long-form synthesis, speed control, and code-switching. It is available under a CC-BY license, which permits commercial use.
MLX: F5 TTS in MLX💡
% pip install f5-tts-mlx (used Python 3.12)
% python3 examples/generate.py --text "any text to speech" --duration 10
* If RuntimeError(f"Couldn't find appropriate backend...") occurs: % pip install soundfile
GitHub: https://t.co/8zIWAI98oI
🔊 Sound ON https://t.co/9yWuO1SPZC https://t.co/AafScz6iD4 https://t.co/gexxqDg2U8
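The backend error mentioned above can be caught before running the generation script. The following is a minimal sketch, not part of f5-tts-mlx: the helper name `missing_packages` is hypothetical, and it assumes the missing audio backend is the `soundfile` package, as the note suggests.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "soundfile" is taken from the note above; add other names as needed.
    missing = missing_packages(["soundfile"])
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("Audio backend dependency found.")
```

Running this once in the same environment as the script shows whether `pip install soundfile` is still needed.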
MLX: Meta's MusicGen model in MLX💡
% pip install -r requirements.txt
% python3 https://t.co/7xvP7Y0NLM (with the default prompt, 'happy rock'; used Python 3.11)
% open 0.wav (to play)
% python3 https://t.co/7xvP7Y0NLM --help (to see args)
Let's try it. It's fun to generate music. https://t.co/sqUJb9RNAF https://t.co/XwCnPsFkXJ
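Besides playing the result, the length of the generated clip can be checked from Python. This is a rough sketch using only the standard library; the helper name `wav_duration_seconds` is hypothetical, and it assumes `0.wav` is an uncompressed PCM WAV file, which the post does not confirm (the `wave` module raises `wave.Error` for other encodings).

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration is the number of frames divided by frames per second.
        return wf.getnframes() / wf.getframerate()


if __name__ == "__main__":
    try:
        # "0.wav" is the output file name mentioned in the post above.
        print(f"0.wav lasts {wav_duration_seconds('0.wav'):.2f} s")
    except (FileNotFoundError, wave.Error) as exc:
        print(f"Could not read 0.wav: {exc}")
```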