OpenAI has launched Reinforcement Fine-Tuning (RFT) for its o4-mini reasoning model, letting users optimize model behavior through custom reward functions and task-specific grading. Because RFT builds on chain-of-thought reasoning, it is particularly effective in complex technical domains such as tax and accounting. The feature aims to make reinforcement learning more flexible and accessible, allowing enterprises to fine-tune their own versions of o4-mini.

Alongside RFT, OpenAI has introduced supervised fine-tuning for GPT-4.1 nano, further expanding customization options. The company has also published an official guide on when to use each of its models: GPT-4o for everyday tasks, GPT-4.5 for creative and emotional writing, o4-mini for fast reasoning, and o4-mini-high for advanced technical reasoning.

However, recent tests show that the o3 and o4-mini models exhibit hallucination rates of 33% and 48%, respectively, meaning a high share of responses are inaccurate or fabricated. Despite this, RFT is seen as a powerful tool for pushing model performance on narrow tasks toward expert levels.
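To make the "task-specific grading" idea concrete, here is a minimal sketch of what a single RFT training example could look like: a prompt plus a reference answer that a grader can later compare the model's output against, drawn from the tax domain mentioned above. The field names ("messages", "reference_answer") are illustrative assumptions, not OpenAI's exact schema.

```python
import json

# Illustrative only: one RFT training item consisting of a prompt and a
# reference answer that a task-specific grader can score the model's
# sampled output against. Field names here are assumptions, not a
# verified schema.
item = {
    "messages": [
        {
            "role": "user",
            "content": "Is a home office deduction allowed for a W-2 employee in 2024? Answer Yes or No.",
        }
    ],
    "reference_answer": "No",
}

# RFT training data is typically uploaded as a JSONL file, one item per line.
with open("rft_train.jsonl", "w") as f:
    f.write(json.dumps(item) + "\n")
```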
OpenAI guide, when to use each model. GPT-4o – Everyday genius: brainstorming, emails, voice, files, images, more. GPT-4.5 – Creative + emotional: posts, stories, empathetic writing. o4-mini – Fast + STEM-y: quick code + data tasks. o4-mini-high – Slower but smarter: advanced https://t.co/KzIqb0kqe9
Reinforcement fine-tuning for o4-mini is a very powerful unlock to get models to be even more capable. The tricky part is understanding how to format your data and create a grader for training. Like early prompt design, you have to think a lot about how the model sees
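For illustration, a grader for a case like the training item above might be a simple string comparison between the model's sampled answer and the per-item reference answer. The sketch below assumes the OpenAI Python SDK's fine_tuning.jobs.create call with a reinforcement method and a string-check style grader; the exact grader fields and template variables (sample.output_text, item.reference_answer) are assumptions about the general shape of the feature, not a verified schema.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative sketch: launch a reinforcement fine-tuning job on o4-mini
# with a simple exact-match grader. The grader field names and template
# variables below are assumptions for illustration only.
grader = {
    "type": "string_check",
    "name": "exact_match",
    "operation": "eq",
    "input": "{{sample.output_text}}",         # the model's sampled answer
    "reference": "{{item.reference_answer}}",  # the reference stored with each item
}

job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",
    training_file="file-abc123",  # ID returned after uploading rft_train.jsonl
    method={
        "type": "reinforcement",
        "reinforcement": {"grader": grader},
    },
)
print(job.id, job.status)
```

For answers that cannot be checked with an exact string match, a model-based or custom Python grader would follow the same pattern; designing that grader well is the hard part the tweet above is pointing at.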
OpenAI Launches Reinforcement Fine-Tuning on o4-mini for Custom Model Optimization #ReinforcementLearning #AIModels #OpenAI #CustomAI #MachineLearning https://t.co/rtVzRgsWcw https://t.co/qJhLRSJWQz