
Apple's MLX framework has seen significant performance improvements: the latest version, 0.7.0, reaches 30.72 tokens per second, up from 21.19 tokens per second in version 0.2.0, in a test run with the command `python -m mlx_lm.generate --prompt "write the real story of Albert Einstein." --model mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit --max-tokens 100`. Separately, the newly released Pico MLX Server provides a graphical frontend for downloading and running multiple AI models locally on a Mac. The server, which is now live, works with any chat client that follows the OpenAI API standard, including PicoGPT. The community is encouraged to explore MLX for AI and large language models, noting its Python API for inference, including plain and chat completion, while staying mindful of some known pitfalls.
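As a rough illustration of the Python inference path mentioned above, here is a minimal sketch using the `mlx_lm` package, covering both plain and chat completion. The model name is taken from the benchmark command; the chat-template call assumes the loaded tokenizer exposes the standard Hugging Face `apply_chat_template` method, which may vary by model.

```python
# Minimal sketch of plain and chat completion with mlx_lm.
# Assumes `pip install mlx-lm` on an Apple Silicon Mac; the model
# name is taken from the benchmark command above.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit")

# Plain completion: feed the raw prompt straight to the model.
text = generate(
    model,
    tokenizer,
    prompt="write the real story of Albert Einstein.",
    max_tokens=100,
)
print(text)

# Chat completion: wrap the prompt in the model's chat template first
# (assumes the tokenizer provides the Hugging Face apply_chat_template API).
messages = [{"role": "user", "content": "write the real story of Albert Einstein."}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(generate(model, tokenizer, prompt=chat_prompt, max_tokens=100))
```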
Quick Start Guide to Large Language Models — Strategies and Best Practices for Using #LLMs: https://t.co/UEgcGVDN5X ————— #BigData #DataScience #AI #NLProc #NeuralNetworks #DeepLearning #MachineLearning #Algorithms https://t.co/f7YVMNU4kK
Pico MLX server is live! https://t.co/U88EPFsx5W
Run Apple MLX from your menu bar. Introducing Pico MLX Server, a graphical frontend to download and start multiple(!) AI models locally on your Mac. You can use it with any chat client you like (e.g. @PicoGPT) that uses the OpenAI API standard. https://t.co/zY7JdayzXV
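Because Pico MLX Server speaks the OpenAI API standard, any OpenAI-compatible client should be able to talk to it. Below is a hedged sketch using the official `openai` Python package; the localhost port and the model name are assumptions for illustration, not documented values, so check the server's own settings before running it.

```python
# Sketch of querying a local OpenAI-compatible server such as Pico MLX Server.
# The base_url port (8080) and the model name are assumptions; substitute the
# values your server actually reports. The api_key is unused by a local server
# but is required by the client library.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Nous-Hermes-2-Mixtral-8x7B-DPO-4bit",
    messages=[{"role": "user", "content": "write the real story of Albert Einstein."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

The same pattern applies to any chat client built on the OpenAI API standard, such as PicoGPT: point it at the local server's base URL instead of api.openai.com.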
