Hugging Face has integrated MLX LM, enabling users to run more than 5,000 large language models (LLMs) locally on Apple Silicon devices at full hardware speed, without relying on cloud services. Mac users can access these models directly from the Hugging Face Hub by clicking "Use this model," and can also spin up OpenAI-compatible servers straight from the model page. Separately, WebLLM offers a browser-based alternative with full GPU acceleration and OpenAI API compatibility, running LLMs directly in the browser with support for streaming, JSON mode, and function calling.
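To illustrate what OpenAI API compatibility means in practice, the sketch below builds a standard `/v1/chat/completions` request body that either a local MLX LM server or WebLLM could accept. The base URL, port, and model identifier are assumptions for illustration, not values confirmed by the announcement; any locally started server would advertise its own.

```python
import json

# Assumed local endpoint for an OpenAI-compatible server; the actual
# host/port depends on how the server was launched.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# Hypothetical model id for illustration only.
payload = {
    "model": "mlx-community/example-model-4bit",
    "messages": [
        {"role": "user", "content": "Say hello in five words."}
    ],
    # The OpenAI chat schema supports token streaming via this flag.
    "stream": False,
}

# Serialize to the JSON body that would be POSTed to BASE_URL.
body = json.dumps(payload)
print(body)
```

Because the request shape matches the OpenAI chat-completions schema, existing OpenAI client libraries can be pointed at the local base URL unchanged, with no cloud API key required.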