OpenAI's o1-preview model has been evaluated extensively in a recent study, which reveals strong performance across a variety of tasks. The model achieves an 83.3% success rate on complex competitive programming problems, and it also excels at generating coherent and accurate radiology reports, high school-level mathematical reasoning, and chip design tasks. Despite this high performance, o1-preview is noted to be slow and expensive. The evaluation, detailed in DevQualityEval v0.6 and a 280-page PDF, offers insights into the opportunities and challenges of using OpenAI's o1-preview and o1-mini models for generating quality code.
Time distribution of OpenAI workloads and the minimum target latency per model. https://t.co/SkOUeA3OnF
Good evaluation of o1-preview for a variety of tasks! https://t.co/4Nnxi20W2p
Nice study providing a comprehensive evaluation of OpenAI's o1-preview LLM. Shows strong performance across many tasks:
- competitive programming
- generating coherent and accurate radiology reports
- high school-level mathematical reasoning tasks
- chip design tasks
- … https://t.co/ASNxyJxKp2