Recent discussion among researchers centers on scaling reinforcement learning (RL) for reasoning models. An OpenAI researcher indicated that o3 is a scaled-up version of the o1 paradigm: RL plus chain-of-thought plus test-time compute, with no single new breakthrough behind the performance gains. The conversation also touched on computational costs, with one researcher suggesting that the o1 paradigm is what made the Orion base model worth its computational cost for OpenAI. While experts still debate whether any tree search is involved at inference time, there is broad agreement that scaling this recipe works.
o3 is just a scaling-up of o1's RL approach, according to Nat from OpenAI. Nat also says o1 is "just" an LLM, & roon confirms it's plain autoregression. This is also how the open-source QwQ & R1 work. Can we stop debating inference-time MCTS now? And appreciate how cool this is? https://t.co/k1JcYpWW3p https://t.co/NPJ0lZ6BWP
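To make the "plain autoregression" claim concrete, here is a minimal sketch of token-by-token sampling. Everything in it (the `model` interface, `generate`, the tensor shapes) is a hypothetical illustration of ordinary autoregressive decoding, not anything confirmed about how o1 or o3 are implemented.

```python
import torch

def generate(model, tokens: list[int], max_new: int, eos_id: int) -> list[int]:
    """Plain autoregressive sampling: one token at a time, no search.

    Assumes `model` is a callable returning next-token logits of shape
    (batch, seq_len, vocab). "Thinking longer" in this regime just means
    emitting a longer chain of thought before the final answer.
    """
    for _ in range(max_new):
        logits = model(torch.tensor([tokens]))[0, -1]      # logits at last position
        probs = torch.softmax(logits, dim=-1)              # next-token distribution
        next_id = torch.multinomial(probs, num_samples=1).item()
        tokens.append(next_id)                             # commit to one token
        if next_id == eos_id:                              # stop at end-of-sequence
            break
    return tokens
```

The point of the sketch is what is absent: there is no branching, backtracking, or value function anywhere in the loop, which is what distinguishes this from MCTS-style decoding.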
The o1 paradigm made Orion actually worth its computational cost for OpenAI. o3 “feels” like the o1 paradigm (RL + CoT + test-time compute) applied to Orion as the base model. Still debatable if tree search is involved. Francois thinks yes. But can we really know without…
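For readers wondering how test-time compute could scale without tree search, one commonly discussed alternative is best-of-n sampling: draw several independent chains of thought and keep the highest-scoring one. The sketch below is purely illustrative, reuses the hypothetical `generate` from above, and assumes a `scorer` callable; nothing here is a confirmed description of o1 or o3.

```python
def best_of_n(model, prompt_tokens, scorer, n=16, max_new=2048, eos_id=0):
    """Spend more test-time compute by sampling n candidates and picking one.

    `scorer` is an assumed callable mapping a token list to a float,
    e.g. a reward model or verifier. Larger n = more compute = a better
    best candidate, with no tree search anywhere in the loop.
    """
    candidates = []
    for _ in range(n):
        # Each candidate is plain autoregressive sampling, as sketched earlier.
        completion = generate(model, list(prompt_tokens), max_new, eos_id)
        candidates.append(completion)
    return max(candidates, key=scorer)
```

Whether labs actually do this, something MCTS-like, or single-sample long decoding is exactly the open question the tweets are debating; sketches like this only show that long-horizon quality gains do not, by themselves, imply search.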
This statement by OpenAI researcher @__nmca__ is the most shocking. How did o3 become good? They just kept scaling RL! No big new breakthrough, just more of the same. That's crazy! In a way, scale is all you need! (a little exaggerated, for sure). How can we say that we have… https://t.co/POqL6sJfaD