Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation • Self-evaluation outperforms traditional reward modeling approaches • Adaptive sampling allocates compute based on query difficulty • Early pruning saves computation by discarding… https://t.co/RxUfvJcKjC https://t.co/kP57l4vMQs
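The adaptive-sampling idea above can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the paper's implementation: sample_response and self_eval_score are hypothetical stubs standing in for real model calls, and the thresholds are arbitrary.

```python
import random

def sample_response(prompt: str) -> str:
    """Stand-in for one LLM sample (here: a random canned answer)."""
    return random.choice(["draft A", "draft B", "draft C"])

def self_eval_score(prompt: str, response: str) -> float:
    """Stand-in for the model's self-evaluation: its estimated
    probability that this response is good enough to keep."""
    return random.random()

def adaptive_best_of_n(prompt: str, max_samples: int = 8,
                       stop_threshold: float = 0.9,
                       prune_threshold: float = 0.2) -> str:
    """Keep drawing samples only while self-evaluation suggests that
    more compute is likely to help; prune weak drafts immediately."""
    best, best_score = "", -1.0
    for _ in range(max_samples):
        draft = sample_response(prompt)
        score = self_eval_score(prompt, draft)
        if score < prune_threshold:
            continue  # early pruning: discard unpromising drafts cheaply
        if score > best_score:
            best, best_score = draft, score
        if best_score >= stop_threshold:
            break  # the model judges further sampling unlikely to do better
    return best

print(adaptive_best_of_n("What is 2 + 2?"))
```

The design point this sketch tries to capture is that the same model both generates and scores its candidates, which is why no separate reward model is required.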
LLMs Are In-Context Reinforcement Learners Monea et al.: https://t.co/hbHU6F2fDY #AIAgent #LLM #ReinforcementLearning https://t.co/D34F4gRC3v
[CL] LLMs Are In-Context Reinforcement Learners G Monea, A Bosselut, K Brantley, Y Artzi [Cornell University & EPFL & Harvard University] (2024) https://t.co/4or22DGnZ1 https://t.co/UtOGnIJO6M
Recent research highlights two complementary advances in Large Language Models (LLMs). A 2024 study by Monea et al. (Cornell University, EPFL, and Harvard University) shows that LLMs can act as in-context reinforcement learners: given observations and reward feedback directly in the prompt, a frozen model can improve its behavior across episodes. These models struggle with exploration, however, which the researchers address through supervised fine-tuning on full exploration trajectories. Separately, work on adaptive inference-time compute shows that LLMs can predict, even mid-generation, whether further sampling is likely to yield a better response; this self-evaluation outperforms traditional reward modeling approaches. The resulting adaptive sampling allocates compute according to query difficulty and saves computation by pruning unpromising generations early. Microsoft Research has also published work on steering LLMs between code execution and textual reasoning.
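As a rough illustration of the in-context RL loop described above, the Python sketch below feeds past (action, reward) pairs back into the prompt so the model can adapt from context alone. It is a toy under stated assumptions, not the authors' setup: llm_choose_action is a hypothetical stand-in for a real model call, and the single-correct-answer reward is an invented toy environment.

```python
import random

def llm_choose_action(prompt: str, actions: list[str]) -> str:
    """Stand-in for querying the LLM for an action (here: uniform random)."""
    return random.choice(actions)

def icrl_loop(task: str, actions: list[str],
              n_episodes: int = 10) -> list[tuple[str, float]]:
    """In-context RL loop: no weights change; the 'learning' lives
    entirely in the growing history inside the prompt."""
    history: list[tuple[str, float]] = []
    for _ in range(n_episodes):
        prompt = task + "\n" + "\n".join(
            f"action: {a} -> reward: {r}" for a, r in history
        )
        action = llm_choose_action(prompt, actions)
        reward = 1.0 if action == "B" else 0.0  # toy environment (assumption)
        history.append((action, reward))
    return history

print(icrl_loop("Pick the best label.", ["A", "B", "C"]))
```

The exploration challenge the summary mentions shows up even in this toy: if the model's action choice collapses onto an early low-reward action in its context, nothing in the loop forces it to try alternatives.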