Check out our new work GemFilter: "Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction." Paper: https://t.co/93GgFvuWrm Code: https://t.co/5gkfrpCalq 🚀 2.4x faster inference for long-context LLMs 💾 30% lower GPU memory consumption https://t.co/lmGKq4ueg8
Introducing 💎GemFilter💎, a simple, training-free, and broadly applicable approach to the long-context bottleneck, accelerating LLM inference and reducing GPU memory consumption with 1000x token reduction. 📘 Paper: https://t.co/qGiU9VyypG 🧠 Code: https://t.co/tUtCshkXgy https://t.co/ciArQjGhnl
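To make the token-reduction idea concrete (the title suggests an early layer of the model already "discovers" which input tokens matter), here is a minimal PyTorch sketch of selecting a small set of tokens from an early layer's attention so the full model only has to process those. The function names, the choice of layer, and the value of k are illustrative assumptions, not GemFilter's actual implementation:

```python
import torch

def select_gem_tokens(attn_weights: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k most-attended context positions from one early layer.

    attn_weights: (num_heads, seq_len, seq_len) attention probabilities,
    e.g. captured with output_attentions=True in a Hugging Face forward pass.
    Positions are scored by how much the final query token attends to them,
    summed over heads; the top-k are kept in their original order.
    """
    last_query_row = attn_weights[:, -1, :].sum(dim=0)                 # (seq_len,)
    top_k = torch.topk(last_query_row, k=min(k, last_query_row.numel())).indices
    return torch.sort(top_k).values                                    # preserve token order

def compress_prompt(input_ids: torch.Tensor, attn_weights: torch.Tensor, k: int) -> torch.Tensor:
    """Return a reduced prompt containing only the selected tokens; the full
    model would then be run on this much shorter sequence."""
    keep = select_gem_tokens(attn_weights, k)
    return input_ids[keep]

if __name__ == "__main__":
    seq_len, num_heads, k = 1024, 8, 32                                # hypothetical sizes
    input_ids = torch.randint(0, 32000, (seq_len,))
    # Random stand-in for attention weights captured from an early decoder layer.
    attn = torch.softmax(torch.randn(num_heads, seq_len, seq_len), dim=-1)
    reduced = compress_prompt(input_ids, attn, k)
    print(f"{seq_len} tokens -> {reduced.numel()} tokens")             # 32x reduction here
```

Under this reading, the quoted speedup and memory savings would come from running only a few early filtering layers over the full-length prompt and then the complete model over the much shorter selection.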
🏷️: Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale 🔗: https://t.co/7gA5Z07tu8 https://t.co/yuOs9Sq8LE
The Chinese research lab GAIR has introduced ProX (Programming Every Example), a framework for refining the pre-training data of large language models (LLMs). ProX treats data refinement as a programming task: a language model generates small programs that perform fine-grained operations on each example, such as string normalization. The approach is efficient, enabling even small models with 0.3 billion parameters to refine data with expert-like quality, and it reports large efficiency gains, up to 20x faster training in the general and math domains. Separately, the Pluto and Charon (PAC) framework targets collaborative fine-tuning on edge devices, reporting up to 8.64x end-to-end speedup and up to an 88.16% reduction in memory footprint by employing parallel adapters.
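To illustrate the "data refinement as programming" idea, here is a small, self-contained Python sketch in which a refining program, expressed as a list of operation calls, is executed against one document. The Document structure, the operation names (drop_doc, remove_lines, normalize), and the hard-coded program are illustrative assumptions rather than ProX's actual interface; in ProX the program would be emitted by the small refining model itself:

```python
import re
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    kept: bool = True  # flipped to False when the whole example is dropped

# A tiny, illustrative refinement "API" that a generated program may call.
def drop_doc(doc: Document) -> None:
    """Document-level filtering: discard the whole example."""
    doc.kept = False

def remove_lines(doc: Document, line_numbers: list[int]) -> None:
    """Chunk-level editing: delete noisy lines by 0-based index."""
    lines = doc.text.splitlines()
    doc.text = "\n".join(l for i, l in enumerate(lines) if i not in set(line_numbers))

def normalize(doc: Document, pattern: str, replacement: str) -> None:
    """Chunk-level editing: string normalization via regex replacement."""
    doc.text = re.sub(pattern, replacement, doc.text)

# Dispatch table so the generated program is executed as data, not via eval().
OPS = {"drop_doc": drop_doc, "remove_lines": remove_lines, "normalize": normalize}

def execute_program(doc: Document, program: list[tuple]) -> Document:
    """Apply a refining program: a list of (op_name, *args) tuples."""
    for op_name, *args in program:
        OPS[op_name](doc, *args)
        if not doc.kept:  # stop early once the document is discarded
            break
    return doc

if __name__ == "__main__":
    raw = "Click here to subscribe!!!\nThe derivative of  x^2  is  2x.\nsite footer | cookies"
    # Hard-coded here purely for illustration; a small (~0.3B) refining model
    # would emit a program like this for every example.
    program = [
        ("remove_lines", [0, 2]),        # strip boilerplate lines
        ("normalize", r" {2,}", " "),    # collapse repeated whitespace
    ]
    print(execute_program(Document(raw), program).text)
```

Executing generated programs as data (a dispatch table over a fixed set of operations) keeps the refinement step cheap and auditable, which is what makes it plausible to run over an entire pre-training corpus.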