
Apple's MLX framework has seen significant performance improvements: the latest version, 0.7.0, reaches 30.72 tokens per second, up from 21.19 tokens per second in version 0.2.0, in a test run with the command `python -m mlx_lm.generate --prompt "write the real story of Albert Einstein." --model mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit --max-tokens 100`. Separately, the newly released Pico MLX Server provides a graphical frontend for downloading and running multiple AI models locally on a Mac. The server, which is now live, works with any chat client that follows the OpenAI API standard, including PicoGPT. The community is encouraged to explore MLX for AI and large language models, noting its Python API for inference, including plain and chat completion, while staying mindful of some known pitfalls.
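As a rough illustration of the Python inference path mentioned above, here is a minimal sketch using the `mlx_lm` package, covering both plain and chat completion. The model name is taken from the benchmark command; the chat-template call assumes the loaded tokenizer exposes the standard Hugging Face `apply_chat_template` method, which may vary by model.

```python
# Minimal sketch of plain and chat completion with mlx_lm.
# Assumes `pip install mlx-lm` on an Apple Silicon Mac; the model
# name is taken from the benchmark command above.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit")

# Plain completion: feed the raw prompt straight to the model.
text = generate(
    model,
    tokenizer,
    prompt="write the real story of Albert Einstein.",
    max_tokens=100,
)
print(text)

# Chat completion: wrap the prompt in the model's chat template first
# (assumes the tokenizer provides the Hugging Face apply_chat_template API).
messages = [{"role": "user", "content": "write the real story of Albert Einstein."}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(generate(model, tokenizer, prompt=chat_prompt, max_tokens=100))
```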
Quick Start Guide to Large Language Models — Strategies and Best Practices for Using #LLMs: https://t.co/UEgcGVDN5X ————— #BigData #DataScience #AI #NLProc #NeuralNetworks #DeepLearning #MachineLearning #Algorithms https://t.co/f7YVMNU4kK
Pico MLX server is live! https://t.co/U88EPFsx5W
Run Apple MLX from your menu bar. Introducing Pico MLX Server, a graphical frontend to download and start multiple(!) AI models locally on your Mac. You can use it with any chat client you like (e.g. @PicoGPT) that uses the OpenAI API standard. https://t.co/zY7JdayzXV
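Because Pico MLX Server speaks the OpenAI API standard, any OpenAI-compatible client should be able to talk to it. Below is a hedged sketch using the official `openai` Python package; the localhost port and the model name are assumptions for illustration, not documented values, so check the server's own settings before running it.

```python
# Sketch of querying a local OpenAI-compatible server such as Pico MLX Server.
# The base_url port (8080) and the model name are assumptions; substitute the
# values your server actually reports. The api_key is unused by a local server
# but is required by the client library.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit",
    messages=[{"role": "user", "content": "write the real story of Albert Einstein."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

The same pattern applies to any chat client built on the OpenAI API standard, such as PicoGPT: point it at the local server's base URL instead of api.openai.com.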
