A new paper from Google DeepMind and collaborators at the University of Hong Kong, UC Berkeley, and NYU compares the effectiveness of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) in training large language models (LLMs) and vision-language models (VLMs). The research indicates that SFT is primarily a form of memorization, akin to a student learning by rote from examples, while RL promotes generalization across tasks, allowing models to adapt better to novel challenges. The study highlights that while SFT stabilizes model outputs, RL is crucial for improving adaptability and performance on tasks with unique correct solutions, such as mathematics and logic. This research underscores the importance of both methods in developing advanced AI systems.
Aligning large language models (LLMs) with a given set of values will become a key visible priority this year for AI labs. Without alignment, systems act in ways that clash with societal, financial, or communal norms. Alignment ensures they stay within the boundaries we expect.… https://t.co/njTqrJ60XS
.@OpenAI Deep Research might be the beginning of the end for Wikipedia and I think that's fine. We talk a lot about the AI alignment problem, but aligning people is hard too. Wikipedia is a great example of this.
Reinforcement learning applied to LMs will push SOTA performance on benchmarks whose problems have only one correct solution (math, logic, coding), which are huge use cases on their own. However, this will result in us not having even more of a clue why they…
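What makes those domains RL-friendly is that correctness is checkable. A minimal sketch of such a "verifiable reward" (the name and signature are illustrative, not from any particular library): the reward is 1.0 only when the model's final answer exactly matches the reference, so no learned reward model is needed.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """Exact-match reward for single-correct-answer tasks (math, logic).

    Returns 1.0 iff the model's answer, ignoring surrounding whitespace,
    equals the reference answer; 0.0 otherwise.
    """
    return 1.0 if model_answer.strip() == reference.strip() else 0.0
```

Real pipelines typically normalize further (e.g. parse numbers, run unit tests for code), but the principle is the same: a cheap, programmatic check replaces human preference labels.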