
Apple and its collaborators are making significant strides in machine learning with the introduction of MLXServer and updates to MLX Swift and MLX LM. MLXServer, a new project announced by Mustafa (@maxaljadery) and Siddharth, offers developers an easy way to work with large language models (LLMs) locally, providing HTTP endpoints for text generation, chat, model conversion, and more. It is designed to be set up with a single 'pip install mlxserver' and is optimized for Apple's Metal, indicating a focus on performance.

Concurrently, MLX Swift has been updated with fused attention (from @argmaxinc) and fast quantized kernels, while (Q)LoRA support in MLX LM brings greater flexibility and efficiency to model fine-tuning. These updates suggest Apple's ambition to position MLX as a serious competitor to TensorFlow and PyTorch, especially with its unified memory model, which supports parallel operations and automatic dependency insertion. The MLX Swift LLM example runs a 4-bit Mistral 7B model efficiently on M1 chips, highlighting Apple's commitment to optimizing machine learning workloads on its own hardware. On the fine-tuning side, improvements such as compilation, better data packing, and gradient checkpointing make fine-tuning a 4-bit Mistral 7B model on an 8GB M1 quite feasible.
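MLXServer's exact request schema isn't reproduced in the announcement, but a local client might look like the following minimal sketch. The port, endpoint path, and parameter names here are assumptions for illustration, not the project's documented API:

```python
# Hypothetical client for a locally running MLXServer instance.
# The host/port, '/generate' route, and parameter names are assumptions;
# consult the MLXServer docs for the real endpoints.
import requests

resp = requests.get(
    "http://localhost:5000/generate",  # assumed address and route
    params={
        "prompt": "Write a haiku about unified memory.",
        "max_tokens": 64,  # assumed parameter name
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.text)
```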



(Q)LoRA in MLX LM is also faster and more memory efficient thanks to:
- compilation
- better data packing
- gradient checkpointing

pip install -U mlx-lm

Fine-tuning 4-bit Mistral 7B on an 8GB (!) M1 is actually quite doable: https://t.co/WhFBDQLDHi
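For readers who want to try this, the run described above goes through mlx-lm's LoRA entry point. A minimal sketch follows; the model id is an assumed mlx-community conversion, and flag names may differ across mlx-lm versions:

```python
# Minimal sketch: launch (Q)LoRA fine-tuning of a 4-bit Mistral 7B through
# mlx-lm's command-line entry point. The model id is an assumed
# mlx-community conversion, and flag names may vary by mlx-lm version.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "mlx_lm.lora",
        "--model", "mlx-community/Mistral-7B-v0.1-4bit",  # assumed model id
        "--train",
        "--data", "./data",    # directory containing train.jsonl / valid.jsonl
        "--batch-size", "1",   # small batches keep peak memory near 8GB
        "--lora-layers", "4",  # adapting fewer layers further reduces memory
        "--grad-checkpoint",   # gradient checkpointing trades compute for memory
    ],
    check=True,
)
```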
(Q)LoRA in MLX LM is a lot more flexible now: tune layers, rank, scale, and more.

pip install -U mlx-lm

Example config: https://t.co/0SzXyddDdb

Thanks to Chimezie https://t.co/3EPLGxAys9 for the addition! https://t.co/XH0wVQgmiN
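The flexibility mentioned here is driven by a config file handed to the LoRA tuner. The sketch below writes one and passes it via --config; the key names mirror the example config linked above but should be treated as assumptions, since they may change between versions:

```python
# Hypothetical sketch: write a LoRA config exposing per-layer keys, rank,
# and scale, then hand it to mlx-lm's tuner. Key names follow the example
# config referenced in the tweet and may differ across mlx-lm versions.
import pathlib
import subprocess
import sys

config = """\
model: mlx-community/Mistral-7B-v0.1-4bit  # assumed model id
train: true
data: ./data
lora_parameters:
  keys: ["self_attn.q_proj", "self_attn.v_proj"]  # projections that get adapters
  rank: 8       # adapter rank
  scale: 20.0   # adapter scaling factor
  dropout: 0.0
"""
pathlib.Path("lora_config.yaml").write_text(config)

subprocess.run(
    [sys.executable, "-m", "mlx_lm.lora", "--config", "lora_config.yaml"],
    check=True,
)
```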
MLX Swift is updated with fused attention (from @argmaxinc) and fast quantized kernels. LLM example here: https://t.co/Qjo1DWwqfI A 4-bit Mistral 7B runs quite fast for thousands of tokens on my M1: https://t.co/zi3uxJxE3C
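The Swift example is linked above; the equivalent 4-bit generation flow on the Python side goes through mlx-lm's load/generate helpers. A minimal sketch, assuming an mlx-community 4-bit conversion of Mistral 7B:

```python
# Minimal sketch of 4-bit LLM generation with mlx-lm, the Python analogue
# of the MLX Swift LLM example. The model id is an assumed mlx-community
# 4-bit conversion of Mistral 7B.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Explain Apple's unified memory model in two sentences.",
    max_tokens=128,
)
print(text)
```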