Recent user evaluations suggest that OpenAI's o1-pro is outperforming Anthropic's Claude 3.5 Sonnet in coding tasks, while Sonnet remains competitive in other areas. Users note that o1-pro has improved significantly over its predecessor, o1-preview, and several describe it as the best model they have used for coding, highlighting its ability to handle complex codebases effectively. At the same time, feedback indicates that Claude 3.5 Sonnet still holds an edge for general discussion and idea generation. Overall, the evaluations suggest each model has its strengths: Sonnet excels at open-ended conversation and ideation, while o1-pro shows marked improvements in coding capabilities.
A crude summary of everything I've understood so far after playing with o1/o1 pro: we've come close to the limits of arbitrary questions that can properly test the quality of models. So far in my tests, these are the areas where o1 pro does better. #1: consider it pass@N level, high…
o1-pro is probably the best model I've used for coding, hands down. I gave it a pretty complicated codebase and asked it to refactor while referencing docs. The difference between Claude/Gemini/o1 and o1-pro is night and day. First time in a while I've been this impressed.… https://t.co/eAQlEMvFN8
OK, new o1 (not pro) passes my vibe test. Actually decent, tbh. I still think Sonnet is better for general discussion of ideas, but o1 can turn them into mostly working code better than anything else so far.