Meta AI has introduced a significant advancement for coding large language models (LLMs) with Reinforcement Learning with Execution Feedback (RLEF), a technique that incorporates execution feedback during training to improve performance at inference time. The approach has been applied to fine-tune Llama 3.1 models, with the 8B model surpassing GPT-4 on DeepMind's CodeContests and the 70B model achieving state-of-the-art results. Separately, a new framework for fast, parallelized evaluation of LLMs as agents was announced alongside results for state-of-the-art models on SWE-bench; its cloud-based infrastructure is reported to speed up evaluations by roughly 30x.
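The summary doesn't spell out how execution feedback becomes a training signal, so here is a minimal, hypothetical sketch (not the paper's actual reward design or training loop): run a sampled candidate solution against its tests in a subprocess and map the outcome to a scalar reward that a policy-gradient update could consume. The `execution_reward` helper, the toy `add` task, and the binary pass/fail reward are all illustrative assumptions.

```python
import os
import subprocess
import tempfile
import textwrap

def execution_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    """Execute a candidate solution plus its tests in a subprocess and
    return a scalar reward: 1.0 if every test passes, 0.0 otherwise."""
    program = candidate_code + "\n\n" + tests
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # treat hangs as failures
    finally:
        os.unlink(path)

# Toy task: two sampled candidates for the same problem.
tests = textwrap.dedent("""
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")
print(execution_reward("def add(a, b):\n    return a + b", tests))  # 1.0
print(execution_reward("def add(a, b):\n    return a - b", tests))  # 0.0

# In RLEF-style training, rewards like these would drive a policy-gradient
# update (e.g., PPO) on the code LLM; that optimization loop is omitted here.
```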
How to train long-context LMs? (and beat Llama-3.1 🏆) Many takeaways from our new paper!
- Focus on diverse & reliable evaluations (not just perplexity)
- Find good sources of long data and high-quality short data
- ...
A 🧵 on how we produced ProLong, a SoTA 8B 512K model https://t.co/xsRDCQpNUE
Meta with another solid looking RLHF paper: RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning This is how big labs improve math etc. Funny because I wrote about "RLCF" in April of 2023. We're slowly plodding along in open RLHF. https://t.co/mA0xVf8laa
I'm really excited both about our new evaluation framework for fast parallelized evaluation of LLMs as agents, and our new results evaluating SOTA LLMs on SWE-bench. Check this post out for both of them. https://t.co/tcTBRzGw9P
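The linked post describes the framework itself; as a rough, hypothetical illustration of the parallelization idea only (not that framework's actual API), the sketch below fans independent benchmark instances out across concurrent workers so wall-clock time falls roughly with the worker count. `evaluate_instance` is a stub standing in for a full agent rollout plus test run.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate_instance(instance_id: str) -> dict:
    """Stub for one agent rollout on one benchmark task. A real harness
    would prompt the model, apply its patch in an isolated sandbox, and
    run that repository's test suite."""
    time.sleep(random.uniform(0.1, 0.3))  # simulate sandbox + test runtime
    return {"instance_id": instance_id, "resolved": random.random() < 0.2}

instances = [f"task-{i:03d}" for i in range(20)]

# Fan instances out across workers; the speed-up over sequential
# evaluation comes from running independent tasks concurrently.
results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(evaluate_instance, iid): iid for iid in instances}
    for fut in as_completed(futures):
        results.append(fut.result())

resolved = sum(r["resolved"] for r in results)
print(f"resolved {resolved}/{len(results)} instances")
```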