Meta has introduced a new pretraining framework for large language models (LLMs) called CoCoMix, which improves next token prediction by integrating continuous concepts. The method combines discrete next token prediction with continuous concepts learned by a pretrained sparse autoencoder. According to the research, CoCoMix is more sample-efficient and consistently outperforms standard next token prediction, knowledge distillation, and the insertion of pause tokens. By incorporating predicted concepts directly during training, the framework also makes models more interpretable while improving performance across a range of benchmarks.
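At a high level, the mechanism can be illustrated with a short sketch: the model predicts concept activations from its hidden state, compresses them into a continuous concept vector, and mixes that vector back into the hidden state used for next token prediction, with a combined token-plus-concept loss. The PyTorch code below is a minimal sketch under assumed names and dimensions (`CoCoMixBlock`, `concept_head`, the loss weight `lam` are all illustrative); it is not the paper's implementation.

```python
# Minimal sketch of a CoCoMix-style training step (illustrative only; module
# names, dimensions, and the loss weighting are assumptions, not Meta's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoCoMixBlock(nn.Module):
    def __init__(self, d_model=512, n_concepts=2048, vocab_size=32000):
        super().__init__()
        self.backbone = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.concept_head = nn.Linear(d_model, n_concepts)   # predicts SAE concept activations
        self.concept_mixer = nn.Linear(n_concepts, d_model)  # compresses them into a continuous concept vector
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, x_embeds):
        h = self.backbone(x_embeds)                  # hidden states of the model being pretrained
        concept_logits = self.concept_head(h)        # predict the concepts extracted by a pretrained SAE
        concept_vec = self.concept_mixer(torch.sigmoid(concept_logits))
        h_mixed = h + concept_vec                    # mix the continuous concept back into the hidden state
        return self.lm_head(h_mixed), concept_logits

def cocomix_loss(token_logits, targets, concept_logits, sae_concepts, lam=0.1):
    # Combined objective: next-token cross-entropy plus a concept-prediction term
    # supervised by concept activations from a pretrained sparse autoencoder.
    ntp = F.cross_entropy(token_logits.view(-1, token_logits.size(-1)), targets.view(-1))
    concept = F.binary_cross_entropy_with_logits(concept_logits, sae_concepts)
    return ntp + lam * concept
```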
Meta AI Introduces CoCoMix: A Pretraining Framework Integrating Token Prediction with Continuous Concepts. CoCoMix integrates token prediction with the modeling of continuous concepts derived from the hidden states of a pretrained model. The method employs a Sparse Autoencoder (SAE)… https://t.co/AER1HGuMBU
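The continuous concepts referenced above come from a sparse autoencoder fit to the hidden states of a pretrained model. The sketch below shows one common SAE formulation; the layer sizes, ReLU activation, and L1 sparsity penalty are illustrative assumptions rather than the exact setup used in the paper.

```python
# Illustrative sparse autoencoder for extracting concept activations from a
# pretrained model's hidden states (sizes and sparsity penalty are assumptions).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, n_concepts=2048):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_concepts)
        self.decoder = nn.Linear(n_concepts, d_model)

    def forward(self, hidden_states):
        # ReLU encourages sparse, non-negative concept activations.
        concepts = torch.relu(self.encoder(hidden_states))
        reconstruction = self.decoder(concepts)
        return concepts, reconstruction

def sae_loss(hidden_states, reconstruction, concepts, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty that keeps the concepts sparse.
    recon = (reconstruction - hidden_states).pow(2).mean()
    sparsity = concepts.abs().mean()
    return recon + l1_coef * sparsity
```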
LLM Pretraining with Continuous Concepts. Proposes a modification to next token prediction by mixing in continuous concepts from a sparse autoencoder during LLM training. More sample-efficient & performs better on benchmarks, while making models more interpretable via direct concept… https://t.co/MBfnX2xEuC
🥥🌪️ Introducing CoCoMix - an LLM pretraining framework that predicts concepts and mixes them into its hidden state to improve next token prediction. 📈 More sample-efficient and outperforms next token prediction, knowledge distillation, and inserting pause tokens. 🔬Boosts… https://t.co/EpITll3VcH