Sources
Shafiq Joty: Introducing 💎GemFilter💎, a simple, training-free, and broadly applicable approach to addressing the long-context bottleneck, accelerating #LLM inference and reducing GPU memory consumption with a 1000x token reduction. 📘 Paper: https://t.co/qGiU9VyypG 🧠 Code: https://t.co/tUtCshkXgy https://t.co/GjB3z0yeps
Sumit: Multilingual Evaluation of Long Context Retrieval and Reasoning — evaluates long-context LLMs across five languages, revealing significant performance drops with increased context length, multiple target sentences, and lower-resource languages. 📝 https://t.co/oRQkt27hYt https://t.co/W2RfB01AjJ
Pavlo Molchanov: 🚀 @NeurIPSConf Spotlight! 🥳 Imagine fine-tuning an LLM with just a sparsity mask! In our latest work, we freeze the LLM and use 2:4 structured sparsity to learn binary masks for each linear layer. Thanks to NVIDIA Ampere’s 2:4 sparsity, we can achieve up to 2x compute… https://t.co/la6fxUpxOM
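For context on the last item: "2:4 structured sparsity" means that in every contiguous group of four weights, at most two are nonzero — the pattern NVIDIA Ampere tensor cores can exploit for speedups. Below is a minimal NumPy sketch of applying such a pattern by magnitude; it illustrates the sparsity format only, not the paper's learned-mask method or NVIDIA's actual kernels (the function name and shapes are illustrative assumptions).

```python
import numpy as np

def mask_2_4(weights):
    """Illustrative 2:4 structured sparsity: in each group of 4 weights
    (along the flattened last axis), keep the 2 largest-magnitude entries
    and zero the other 2. Returns (sparse_weights, binary_mask)."""
    w = np.asarray(weights, dtype=float)
    assert w.size % 4 == 0, "weight count must be divisible by 4"
    groups = w.reshape(-1, 4)
    # indices of the 2 largest-magnitude entries per group of 4
    keep = np.argsort(np.abs(groups), axis=1)[:, 2:]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (groups * mask).reshape(w.shape), mask.reshape(w.shape)

w = np.array([1.0, -3.0, 0.5, 2.0, 4.0, 0.1, -0.2, 5.0])
sparse, mask = mask_2_4(w)
# each group of 4 keeps exactly 2 nonzeros: [0, -3, 0, 2] and [4, 0, 0, 5]
```

The paper's contribution, as the post describes it, is to *learn* these binary masks per linear layer while the underlying LLM weights stay frozen, rather than picking them by magnitude as in this sketch.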