DeepMind has introduced a new approach to language model training based on Scalable Inverse Reinforcement Learning (IRL). The method offers an effective alternative to standard supervised Maximum Likelihood Estimation (MLE) in the fine-tuning pipeline, yielding more robust reward functions along with improved performance and greater diversity of model generations. The approach is grounded in imitation learning, which can be framed as a reinforcement learning problem. Compared to supervised learning, IRL better exploits sequential structure and online data, and additionally extracts reward functions as a byproduct of training. The insights were shared in a recent paper by Google DeepMind.
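The contrast with MLE can be made concrete. Below is a minimal, illustrative PyTorch sketch, not the paper's exact objective: it trains on the same expert tokens as supervised fine-tuning, but reinterprets the model's logits as soft Q-values in the style of inverse soft Q-learning (IQ-Learn). The function names, the discount factor, the chi-squared-style regularizer coefficient, and the omission of an initial-state value term are all simplifications assumed for the example.

```python
import torch
import torch.nn.functional as F


def mle_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Standard supervised fine-tuning: per-token cross-entropy on expert data.
    # logits: (batch, seq, vocab); targets: (batch, seq) expert token ids.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))


def irl_style_loss(logits: torch.Tensor, targets: torch.Tensor,
                   gamma: float = 0.99) -> torch.Tensor:
    # Illustrative inverse-RL view (simplified, not the paper's exact loss):
    # read the LM logits as Q(s, a) and define the soft state value
    # V(s) = logsumexp_a Q(s, a). The implicit per-token reward is then
    # r(s, a) = Q(s, a) - gamma * V(s'), which couples each step to the value
    # of the *next* state -- the sequential structure that per-token MLE ignores.
    q = logits                                   # (B, T, V): Q-values per token
    v = torch.logsumexp(q, dim=-1)               # (B, T): soft state values
    q_expert = q.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # Q at expert tokens
    reward = q_expert[:, :-1] - gamma * v[:, 1:]  # drop last step (no successor state)
    # Maximize the implicit reward on expert tokens; the quadratic term is a
    # chi^2-style regularizer (assumed coefficient) that keeps rewards bounded.
    return -(reward - 0.25 * reward.pow(2)).mean()
```

In this reading, the fine-tuned model does more than match next-token frequencies: its logits double as a learned Q-function from which a reward can be recovered, which is the sense in which IRL "extracts rewards" where supervised MLE does not.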