OpenAI has announced a new Reinforcement Learning Fine-Tuning (RLFT) API, aimed at enhancing model customization for developers. Commentators note that a comparable capability already exists for open models through Open Instruct, the repository used to train the Tulu 3 model, which expands the Reinforcement Learning with Verifiable Rewards (RLVR) framework to a broader range of domains with improved answer extraction (what OpenAI calls a grader). Additionally, the company has launched an expanded Reinforcement Fine-Tuning Research Program, which lets developers tailor AI models to specific tasks by training on datasets of anywhere from dozens to thousands of high-quality tasks and grading model responses against reference answers. The move signals OpenAI's strategic focus on specialization in AI model training.
OpenAI announced a new RL finetuning API. You can do this on your own models with Open Instruct -- the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a… https://t.co/VEBdH8AR28
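To make the grading idea concrete, below is a minimal sketch of what an RLVR-style "grader" can look like: extract a final answer from a model completion and return a binary reward based on a match against a reference answer. The function names, the `Answer:` extraction convention, and the exact-match rule are illustrative assumptions, not the actual Open Instruct or OpenAI implementations.

```python
import re


def extract_answer(completion: str) -> str | None:
    """Pull the final answer out of a completion, e.g. the text after 'Answer:'."""
    match = re.search(r"Answer:\s*(.+)", completion, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def grade(completion: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted answer matches the reference."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == reference.strip().lower() else 0.0


if __name__ == "__main__":
    completion = "The capital of France is Paris.\nAnswer: Paris"
    print(grade(completion, "Paris"))  # 1.0
```

In RLVR-style training, a reward like this is computed per sampled completion and fed to the RL objective; real graders typically use more robust extraction and domain-specific checking (numeric tolerance, unit normalization, code execution) rather than plain string equality.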
OpenAI is finally playing the specialization game now. Depending on how they do it, could be a very wise (certainly non-AGI-flavored!) decision. Late alignment (to a task and domain) is all you need? https://t.co/XRbON890QQ
OpenAI announced an expanded Reinforcement Fine-Tuning Research Program, allowing developers to customize AI models for domain-specific tasks by training them on datasets ranging from dozens to thousands of high-quality tasks and evaluating responses against reference answers -… https://t.co/zhtKPAtjzE
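As a rough illustration of the kind of data such a program trains on, the snippet below builds a tiny set of tasks, each pairing a prompt with a reference answer that a grader can check completions against. The field names and JSONL layout are assumptions for illustration only, not OpenAI's actual dataset format.

```python
import json

# Hypothetical task records: each prompt has a verifiable reference answer.
tasks = [
    {"prompt": "What is 17 * 24? End with 'Answer: <value>'.", "reference": "408"},
    {"prompt": "What is the chemical symbol for sodium? End with 'Answer: <symbol>'.", "reference": "Na"},
]

# Write one JSON object per line (JSONL), a common layout for fine-tuning data.
with open("rft_tasks.jsonl", "w") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```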