Hugging Face has released SmolLM3, a 3-billion-parameter multilingual language model that supports long-context processing up to 128,000 tokens and features dual-mode reasoning capabilities. Trained on 11.2 trillion tokens, SmolLM3 matches the performance of larger 4-billion-parameter models such as Google's Gemma3 and surpasses models like Llama-3.2-3B and Qwen2.5-3B. The model supports six languages and is designed for efficient inference, demonstrated by its fast performance on Apple's M4 Max chip. Hugging Face has also open-sourced the training methodology, which relies on public datasets and frameworks. Additionally, SmolLM3 has day-zero support in the mlx-lm library, including performance improvements and support for LoRA fine-tuning with low learning rates. The release is part of a broader update to mlx-lm, which now includes several new models from Baidu, Microsoft, TII, Google, OpenBMB, and Apple.
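The day-zero mlx-lm support means the model can be run locally on Apple Silicon with a few lines of Python. Below is a minimal sketch, assuming the HuggingFaceTB/SmolLM3-3B checkpoint and a recent mlx-lm release; the prompt text is only an illustration.

```python
from mlx_lm import load, generate

# Download and load the 3B checkpoint from the Hugging Face Hub.
model, tokenizer = load("HuggingFaceTB/SmolLM3-3B")

# Build a chat-formatted prompt using the model's own chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me a two-sentence summary of LoRA."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate up to 256 new tokens and print the result.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```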
SmolLM3 LoRA fine-tuning on MLX unlocked! Use a small LR!

```yaml
lr_schedule:
  name: cosine_decay
  warmup: 0                     # 0 for no warmup
  warmup_init: 2e-5             # 0 if not specified
  arguments: [2e-5, 500, 1e-5]  # passed to the scheduler
```

Learning after a few iters! These new small models https://t.co/S1NTN4fSsq https://t.co/VaKsTgq9Fo
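For context, that schedule fragment slots into an mlx-lm LoRA training config. The following is a hypothetical sketch of a complete YAML file: only the lr_schedule block comes from the tweet above, while the model path, data directory, layer count, and iteration count are placeholder assumptions.

```yaml
# lora_config.yaml -- hypothetical sketch; only lr_schedule is taken from the
# tweet above, all other values are placeholders to adjust for your setup.
model: "HuggingFaceTB/SmolLM3-3B"
train: true
data: "path/to/data"        # directory containing train.jsonl / valid.jsonl
fine_tune_type: lora
num_layers: 16              # apply LoRA adapters to the last 16 layers
batch_size: 2
iters: 500
learning_rate: 2e-5         # the "small LR" the tweet recommends

lr_schedule:
  name: cosine_decay
  warmup: 0                     # 0 for no warmup
  warmup_init: 2e-5             # 0 if not specified
  arguments: [2e-5, 500, 1e-5]  # decay from 2e-5 toward 1e-5 over 500 steps
```

Such a file would be passed to the trainer with something like `mlx_lm.lora --config lora_config.yaml`; check the mlx-lm LoRA documentation for the exact option names supported by your version.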
With this approach, a 14B-parameter model holds >76% accuracy even on inputs that balloon to 3.5M tokens, all while costing only O(N) in compute. 🤯 LLMs usually freeze or slow down as soon as a prompt spills past their context window. MemAgent turns that long prompt into https://t.co/4EKWbCQVE2
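The O(N) claim follows from how such a memory agent reads: the model only ever attends to one fixed-size chunk plus a bounded memory, so each step costs roughly the same and the number of steps grows linearly with input length. The following is a rough, hypothetical Python sketch of that reading loop; the `chat` callable, prompts, and chunk size are all placeholders, not MemAgent's actual implementation.

```python
def answer_long_input(chat, question: str, tokens: list[str],
                      chunk_size: int = 4096) -> str:
    """Hypothetical MemAgent-style loop: read a long input chunk by chunk,
    keeping only a bounded natural-language memory between steps."""
    memory = ""  # fixed-budget memory the model overwrites at each step
    for start in range(0, len(tokens), chunk_size):
        chunk = " ".join(tokens[start:start + chunk_size])
        # Each call sees O(chunk_size + |memory|) tokens, independent of the
        # total length N, so N / chunk_size calls give O(N) total compute.
        memory = chat(
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New evidence: {chunk}\n"
            "Rewrite the memory, keeping only what is needed to answer."
        )
    return chat(f"Question: {question}\nMemory: {memory}\nAnswer the question.")
```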
Hugging Face releases SmolLM3, a 3B LLM that outperforms Llama-3.2-3B and Qwen2.5-3B while matching the larger 4B model Gemma3. In addition to open-sourcing the model itself, they open-sourced the method for training it with public datasets and training frameworks:
- 3B model trained on 11T tokens
- Instruct model with dual-mode reasoning
- Multilingual, supporting 6 languages
- Context length up to 128K
https://t.co/lYCaGEcJjT
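The dual-mode reasoning mentioned above is toggled through the chat template. Here is a hedged sketch with mlx-lm, assuming the `/think` and `/no_think` system-prompt flags SmolLM3 uses to switch extended reasoning on and off; the question is purely illustrative.

```python
from mlx_lm import load, generate

model, tokenizer = load("HuggingFaceTB/SmolLM3-3B")

def ask(question: str, think: bool) -> str:
    messages = [
        # "/think" requests an extended reasoning trace, "/no_think" skips it.
        {"role": "system", "content": "/think" if think else "/no_think"},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=512)

# Same question, with and without the extended reasoning mode.
print(ask("How many 0.25 L glasses fill a 2 L bottle?", think=True))
print(ask("How many 0.25 L glasses fill a 2 L bottle?", think=False))
```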