Recent developments in fine-tuning large language models (LLMs) focus on improving their performance in multi-turn conversations and long document summarization. A new blog post details fine-tuning the Llama 3.1 8B model on conversation data, specifically the CoQA dataset, using instruction loss masking, which roughly doubles exact match scores on model outputs. A technical deep dive outlines how to fine-tune LLMs for long-context tasks, addressing the challenges of scaling fine-tuning to longer contexts. A separate cookbook covers long document summarization, showing that the fine-tuned Llama 3.1 8B model outperforms larger 70B models on documents of up to 32,000 tokens. Together, these contributions from several researchers illustrate the ongoing evolution of LLM capabilities.
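As a rough illustration of the instruction loss masking idea mentioned above (a minimal sketch, not the blog post's actual code): the training loss is computed only on the answer tokens, while the instruction/question tokens are assigned the ignore index so they contribute no gradient. The helper name and the toy token ids below are invented for the example.

```python
import torch

def mask_instruction_loss(prompt_ids, response_ids, ignore_index=-100):
    """Concatenate prompt and response token ids, masking the prompt positions
    in the labels so cross-entropy loss is only computed on the response."""
    input_ids = list(prompt_ids) + list(response_ids)
    # -100 is the default ignore index of PyTorch's cross-entropy loss, so
    # these positions produce no gradient during fine-tuning.
    labels = [ignore_index] * len(prompt_ids) + list(response_ids)
    return torch.tensor([input_ids]), torch.tensor([labels])

# Toy token ids standing in for a tokenized CoQA-style question and answer.
prompt_ids = [101, 2054, 3609, 2003, 1996, 3712, 102]
response_ids = [2630, 1012]
input_ids, labels = mask_instruction_loss(prompt_ids, response_ids)
print(labels)  # tensor([[-100, -100, -100, -100, -100, -100, -100, 2630, 1012]])
```

In practice the same masking is applied per turn of a multi-turn conversation, so the model is only trained to produce the assistant's responses rather than to reproduce the user's questions.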
Improving summarization for long documents using long-context fine-tuning! New Cookbook: Long Document Summarization + Evaluation. We fine-tune Llama 3.1 8B to improve summarization of documents 32k tokens long and show it outperforms 70B models! https://t.co/mqrqlg7cbv
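For a sense of what the "Evaluation" part of such a summarization setup can look like, here is a minimal hedged sketch using ROUGE-L from the `rouge_score` package; the cookbook's actual metrics, prompts, and texts may differ, and the strings below are invented.

```python
# Score a generated summary against a reference summary with ROUGE-L,
# a standard lexical-overlap metric for summarization.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

reference = "The report describes long-context fine-tuning of an 8B model."
generated = "The report covers fine-tuning an 8B model for long contexts."

scores = scorer.score(reference, generated)
print(scores["rougeL"].fmeasure)  # F1 of the longest common subsequence overlap
```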
New stuff we've been working on! LLMs suck at long context tasks - we use fine-tuning to solve this! We show the code + experiments we used to improve the performance of LLMs on long document summarization and repetition. Joint work with @iamgrigorev and @m_ryabinin! https://t.co/5xlTp3FAj5
How do you get LLMs to use the full context length accurately? New Blogpost: Long Context Fine-Tuning! In this deep dive we cover: 🎛️ How to fine-tune with long context data. ⛔️ Problems with scaling fine-tuning to longer contexts. 🐍 Two full code examples! Read below! https://t.co/znrJ9yPuB2
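As a hedged sketch of what "fine-tuning with long context data" typically involves (not the blog post's two code examples): make sure the model's positional range covers the target sequence length, and keep memory under control with bf16 weights and gradient checkpointing. The checkpoint name and the 32k target below are assumptions for illustration.

```python
# Minimal sketch of common levers for long-context fine-tuning with
# Hugging Face Transformers; memory is usually the first thing that breaks
# when scaling fine-tuning to longer contexts.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "meta-llama/Llama-3.1-8B"   # assumed checkpoint; requires access
target_context = 32_768                  # e.g. 32k-token documents

config = AutoConfig.from_pretrained(model_name)
# Ensure the positional range covers the target length. For base models whose
# trained window is shorter than the target, RoPE scaling would also be set
# here (e.g. config.rope_scaling = {"type": "linear", "factor": 8.0}).
config.max_position_embeddings = max(config.max_position_embeddings, target_context)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype=torch.bfloat16,          # halve memory vs. fp32 weights
)
model.gradient_checkpointing_enable()    # trade compute for memory at 32k tokens
```

Beyond this setup, activation memory still grows with sequence length, which is why long-context fine-tuning usually also relies on small batch sizes, sequence packing, or parameter-efficient methods.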