Recent advancements in large language models (LLMs) have focused on addressing the 'lost-in-the-middle' challenge, where LLMs struggle to fully utilize information in long contexts. Microsoft researchers, in collaboration with academics from Xi'an Jiaotong University and Peking University, have developed a new training method, dubbed IN2, which enhances the ability of LLMs to process and utilize extensive contextual data effectively. This method has been applied to the Mistral-7B model, demonstrating significant improvements in handling long textual inputs. Additionally, Google's research on 'many-shot' in-context learning with LLMs shows promising results in improving model performance across various tasks without the need for extensive human-generated data. The Llama-3 8B model, developed by Gradient_AI_ and sponsored by CrusoeEnergy, now extends its context length from 8K to over 160K, with fewer than 200M tokens needed for effective training.
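As a rough illustration of what "many-shot" in-context learning means in practice, here is a minimal sketch of packing a large number of demonstrations into a long context window. The task, labels, and `build_many_shot_prompt` helper are hypothetical placeholders, not code from Google's paper.

```python
# Sketch: "many-shot" in-context learning simply scales up the number of demonstrations
# placed in the prompt, which long-context models can now accommodate.
def build_many_shot_prompt(examples, query,
                           instruction="Classify the sentiment of each review."):
    """examples: list of (text, label) pairs; query: the unlabeled input."""
    shots = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return f"{instruction}\n\n{shots}\n\nReview: {query}\nSentiment:"

# With a 100K+ token context, `examples` can hold hundreds of shots instead of 3-5.
demo = [("Great battery life, would buy again.", "positive"),
        ("Stopped working after two days.", "negative")]
print(build_many_shot_prompt(demo, "The screen is sharp but the speakers are tinny."))
```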
From paper - "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" 🔥 Perplexity (PPL) of Llama-2-7b-chat using grouped attention on PG19 with different group sizes. The red dotted line indicates the PPL of the original Llama-2-7b-chat on a 4k sequence. The purple… https://t.co/5xlnVLHwK2 https://t.co/u4m5f5YCB9
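For context, perplexity on a long corpus like PG19 is usually computed by chunking the text and averaging next-token loss. A generic sketch follows; it is not the paper's evaluation script, and the model name and chunk length are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic long-text perplexity loop for a causal LM; illustrative, not the paper's code.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16,
                                             device_map="auto")

def perplexity(text, max_len=4096):
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    nlls, n_tokens = [], 0
    for start in range(0, ids.size(1), max_len):            # non-overlapping chunks
        chunk = ids[:, start:start + max_len]
        with torch.no_grad():
            out = model(chunk, labels=chunk)                 # mean shifted next-token loss
        nlls.append(out.loss * (chunk.size(1) - 1))          # token-count-weighted sum
        n_tokens += chunk.size(1) - 1
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()
```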
The self-extend paper is really becoming important - "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" 🔥 📌 Extend existing LLMs’ context window without any fine-tuning 📌 One feasible way to avoid the O.O.D. (out-of-distribution) problems caused by unseen… https://t.co/XOvttXNEQN
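The core trick, as I read it, is a remapping of relative positions: exact positions inside a local neighbor window, floor-divided "grouped" positions beyond it, so the model never sees relative distances larger than those encountered in pretraining. A minimal NumPy sketch of that mapping (the group size and window values here are illustrative, not the paper's defaults):

```python
import numpy as np

def self_extend_rel_positions(seq_len, group_size=4, neighbor_window=512):
    """Remap relative positions so they stay within the pretrained range."""
    q = np.arange(seq_len)[:, None]     # query indices
    k = np.arange(seq_len)[None, :]     # key indices
    rel = q - k                         # standard relative positions

    # Distant tokens share grouped positions (floor division), shifted so the grouped
    # range starts right where the neighbor window ends.
    grouped = rel // group_size + neighbor_window - neighbor_window // group_size

    # Exact positions within the neighbor window, grouped positions beyond it.
    return np.where(rel < neighbor_window, rel, grouped)

pos = self_extend_rel_positions(seq_len=8, group_size=2, neighbor_window=4)
print(pos)   # the largest remapped distance stays bounded as seq_len grows
```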
Paper - "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models" 📌 Investigates whether LLMs can leverage additional tokens purely for computational benefits, independent of the specific content or meaning of those tokens. This challenges the assumption that… https://t.co/oaFAKvhLWu