OpenAI's o1-preview model has demonstrated superior performance in clinical reasoning tasks, achieving roughly 80% accuracy, versus about 30% for physicians, on a set of 143 challenging diagnoses from the New England Journal of Medicine. The model substantially improves differential diagnosis generation and diagnostic management reasoning, outperforming traditional prompting techniques and surpassing earlier systems such as GPT-4 with Medprompt. Beyond this result, OpenAI has launched its advanced reasoning model o1, designed to be faster and more powerful than o1-preview, and has opened API access to o1 for third-party developers. OpenAI is also preparing to unveil a new reasoning model, referred to as o3, expected to push reasoning capabilities further still. The anticipation surrounding o3 has sparked discussion about its potential impact on AI in healthcare and other fields.
o3 is probably the 'merge' of the o line and the GPT line and is the frontier model they started training earlier this year. Wild guess.
Grok was asked to guess how o3 will score on benchmarks: """ Given the speculative nature of "o3" and the pattern of improvement seen in previous OpenAI models, here's a wild guess on potential benchmark scores: MMLU (Massive Multitask Language Understanding): If "o1" has shown…
o3, my gut says, is several months away. It is probably what GPT-5 was originally going to be.