Recent research from Tsinghua University and Google has examined the effects of reinforcement learning (RL) and its variants, including reinforcement learning from human feedback (RLHF) and reinforcement learning with verifiable rewards (RLVR), on the reasoning capabilities of large language models (LLMs). The studies suggest that these RL techniques do not elicit reasoning capacity beyond what is already present in the base models; instead, RL fine-tuning appears to narrow the diversity of reasoning paths the models explore. Google's research team described LLMs as "greedy agents," highlighting their tendency to settle on the first adequate option, over-rely on familiar answers (a frequency bias), and fail to turn knowledge into action (a "knowing-doing gap"). The Google paper reports that RL fine-tuning on self-generated chain-of-thought rationales can partially mitigate these biases by increasing exploration, though whether RL adds genuinely new reasoning capacity remains contested. Some experts argue that further progress in reasoning may require neuro-symbolic and memory-based methods rather than RL alone. Meanwhile, some critics contend that companies like DeepMind should focus on practical applications such as AlphaFold, which models biological structures, rather than on projects they see as less impactful.
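To make the "no reasoning beyond the base model" claim concrete: work in this line, including the Tsinghua study, typically compares base and RL-tuned models using the pass@k metric, the probability that at least one of k sampled answers is correct. Below is a minimal sketch of the standard unbiased pass@k estimator (due to Chen et al., 2021); the function name and the example numbers are my own illustrative assumptions, not figures from either paper.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k completions, drawn without replacement from n
    generated samples of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # too few wrong samples to fill all k draws
    prob_all_wrong = 1.0
    for i in range(k):
        # C(n - c, k) / C(n, k) expanded as a numerically stable product
        prob_all_wrong *= (n - c - i) / (n - i)
    return 1.0 - prob_all_wrong

# Hypothetical numbers for illustration only: a base model correct on
# 12 of 256 samples vs. an RL-tuned model correct on 3 of 4.
print(pass_at_k(n=256, c=12, k=1))    # base model, pass@1   ~ 0.047
print(pass_at_k(n=256, c=12, k=128))  # base model, pass@128 ~ 1.0
print(pass_at_k(n=4, c=3, k=1))       # RL-tuned model, pass@1 = 0.75
```

The pattern these studies report is visible in such numbers: an RL-tuned model often wins at k = 1, while the base model catches up or overtakes at large k, which is the sense in which RL is said to sharpen sampling toward known solutions rather than add new reasoning capacity.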
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Schmied et al.: https://t.co/XAiW0HzO8q #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/tlGYYsc69G
Google’s latest paper just threw shade at its own AI: “LLMs are Greedy Agents.” The research team reveals that even giant models chase the first decent option, over-use familiar answers, and freeze when it’s time to turn knowledge into action. The twist? A dose of RL https://t.co/r3iHq9477u
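A toy illustration of the "greedy agent" failure mode the paper names: in a multi-armed bandit (one of the settings Schmied et al. evaluate LLM agents in), a purely greedy policy can lock onto the first arm that looks decent and never discover the best one, while even mild exploration recovers it. The simulation below is my own sketch, not the paper's experimental setup; the arm means and agent functions are illustrative assumptions.

```python
import random

def run_bandit(agent, true_means, steps=1000, seed=0):
    """Simulate a Gaussian multi-armed bandit; return total reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        arm = agent(values, counts, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total

def greedy(values, counts, rng):
    """'Greedy agent': try each arm once, then always exploit the
    current best estimate -- it can lock onto the first decent arm."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    return max(range(len(values)), key=values.__getitem__)

def epsilon_greedy(values, counts, rng, eps=0.1):
    """Explores with probability eps, so an unlucky early estimate
    of the truly best arm can still be corrected later."""
    if rng.random() < eps:
        return rng.randrange(len(values))
    return greedy(values, counts, rng)

if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # arm 2 is best; arm 1 can look "good enough"
    print("greedy        :", run_bandit(greedy, means))
    print("epsilon-greedy:", run_bandit(epsilon_greedy, means))
```

The paper's point is that LLM agents behave like the first policy by default, and that RL fine-tuning on self-generated rationales pushes them toward the second, more exploratory behavior.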
DeepMind should stick to meaningful products like AlphaFold instead of focusing on hobby projects like this one. We need to model mitochondria and peptides, not upgrade CGI from Richie Rich. https://t.co/HS5M8aLbHZ https://t.co/fLsaSJdtO2