🏷️:Activation-Informed Merging of Large Language Models 🔗:https://t.co/OxtfCbW41F https://t.co/W4UdWY573A
🏷️:On Teacher Hacking in Language Model Distillation 🔗:https://t.co/bqybXFt6Fv https://t.co/mCiDjLc8Bh
🏷️:SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model 🔗:https://t.co/QfVCxia5Gd https://t.co/N2uqk5ueYE
Hugging Face has released the SmolLM2 paper, detailing the development of a state-of-the-art small language model (LM) with 1.7 billion parameters trained on 11 trillion tokens. The model outperforms other recent small LMs, including Qwen2.5 (1.5 billion parameters) and Llama3.2 (1 billion parameters). The paper emphasizes the central role of data in training small models, showing how the right data mixtures and training strategies can meaningfully improve capabilities. Key insights include the creation of custom datasets and adaptive, multi-stage training that adjusts the data mixture over the course of training, both of which contribute to SmolLM2's effectiveness. A rough illustration of weighted mixture sampling follows below.
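Since the paper's core argument is about how data sources are weighted in the pretraining mix, here is a minimal Python sketch of weighted mixture sampling. The source names and weights are illustrative placeholders, not SmolLM2's actual mixture or training code.

```python
# Minimal sketch of weighted data-mixture sampling for LM pretraining.
# The source names and weights below are hypothetical, not the SmolLM2 mix.
import random

# Hypothetical per-source sampling weights for one training stage.
mixture = {
    "web_text": 0.60,
    "code": 0.25,
    "math": 0.15,
}

def sample_source(weights: dict[str, float], rng: random.Random) -> str:
    """Pick a data source with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in mixture}
for _ in range(10_000):
    counts[sample_source(mixture, rng)] += 1

print(counts)  # counts come out roughly proportional to the mixture weights
```

In a multi-stage setup like the one the paper describes, the weights would be revised between stages based on evaluation results rather than fixed up front.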