Recent research from Tsinghua University and Google has examined the effects of reinforcement learning (RL) and its variants, including reinforcement learning from human feedback (RLHF) and reinforcement learning with verifiable rewards (RLVR), on the reasoning capabilities of large language models (LLMs). The studies suggest that these RL techniques do not elicit reasoning capacity beyond what is already present in the base models; instead, RL fine-tuning appears to narrow the diversity of reasoning paths the models explore. Google's research team described LLMs as "greedy agents," highlighting their tendency to settle on the first adequate option, over-rely on familiar answers (a frequency bias), and fail to turn knowledge into action (a "knowing-doing gap"). The Google paper reports that RL fine-tuning on self-generated chain-of-thought rationales can partially mitigate these biases by increasing exploration, though whether RL adds genuinely new reasoning capacity remains contested. Some experts argue that further progress in reasoning may require neuro-symbolic and memory-based methods rather than RL alone. Meanwhile, some critics contend that companies like DeepMind should focus on practical applications such as AlphaFold, which models biological structures, rather than on projects they see as less impactful.
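To make the "no reasoning beyond the base model" claim concrete: work in this line, including the Tsinghua study, typically compares base and RL-tuned models using the pass@k metric, the probability that at least one of k sampled answers is correct. Below is a minimal sketch of the standard unbiased pass@k estimator (due to Chen et al., 2021); the function name and the example numbers are my own illustrative assumptions, not figures from either paper.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k completions, drawn without replacement from n
    generated samples of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # too few wrong samples to fill all k draws
    prob_all_wrong = 1.0
    for i in range(k):
        # C(n - c, k) / C(n, k) expanded as a numerically stable product
        prob_all_wrong *= (n - c - i) / (n - i)
    return 1.0 - prob_all_wrong

# Hypothetical numbers for illustration only: a base model correct on
# 12 of 256 samples vs. an RL-tuned model correct on 3 of 4.
print(pass_at_k(n=256, c=12, k=1))    # base model, pass@1   ~ 0.047
print(pass_at_k(n=256, c=12, k=128))  # base model, pass@128 ~ 1.0
print(pass_at_k(n=4, c=3, k=1))       # RL-tuned model, pass@1 = 0.75
```

The pattern these studies report is visible in such numbers: an RL-tuned model often wins at k = 1, while the base model catches up or overtakes at large k, which is the sense in which RL is said to sharpen sampling toward known solutions rather than add new reasoning capacity.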
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Schmied et al.: https://t.co/XAiW0HzO8q #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/tlGYYsc69G
Google’s latest paper just threw shade at its own AI: “LLMs are Greedy Agents.” The research team reveals that even giant models chase the first decent option, over-use familiar answers, and freeze when it’s time to turn knowledge into action. The twist? A dose of RL https://t.co/r3iHq9477u
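A toy illustration of the "greedy agent" failure mode the paper names: in a multi-armed bandit (one of the settings Schmied et al. evaluate LLM agents in), a purely greedy policy can lock onto the first arm that looks decent and never discover the best one, while even mild exploration recovers it. The simulation below is my own sketch, not the paper's experimental setup; the arm means and agent functions are illustrative assumptions.

```python
import random

def run_bandit(agent, true_means, steps=1000, seed=0):
    """Simulate a Gaussian multi-armed bandit; return total reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        arm = agent(values, counts, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total

def greedy(values, counts, rng):
    """'Greedy agent': try each arm once, then always exploit the
    current best estimate -- it can lock onto the first decent arm."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    return max(range(len(values)), key=values.__getitem__)

def epsilon_greedy(values, counts, rng, eps=0.1):
    """Explores with probability eps, so an unlucky early estimate
    of the truly best arm can still be corrected later."""
    if rng.random() < eps:
        return rng.randrange(len(values))
    return greedy(values, counts, rng)

if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # arm 2 is best; arm 1 can look "good enough"
    print("greedy        :", run_bandit(greedy, means))
    print("epsilon-greedy:", run_bandit(epsilon_greedy, means))
```

The paper's point is that LLM agents behave like the first policy by default, and that RL fine-tuning on self-generated rationales pushes them toward the second, more exploratory behavior.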
DeepMind should stick to meaningful products like AlphaFold instead of focusing on hobby projects like this one. We need to model mitochondria and peptides, not upgrade CGI from Richie Rich. https://t.co/HS5M8aLbHZ https://t.co/fLsaSJdtO2