Recent evaluations of large language models (LLMs) highlight a competitive landscape among the leading systems. Sonnet 3.5 is widely regarded as the best coding LLM, ahead of rivals such as Qwen and o1-mini, which are competitive but not at the same level. On one benchmark, however, o1-preview scored 71.4%, o1-mini scored 52.6%, and Sonnet 3.5 trailed at 33.4%. Meanwhile, QwQ, a new open-source 32B model, has shown impressive reasoning capability, ranking second overall behind only the o1 line. On pricing, Google's offerings, including the Gemini-Exp models, stand out in price/performance comparisons. Overall, the evaluations point to sizable performance gaps among the top models, with Chinese open-weight LLMs also making notable strides in mathematics.
These are the BEST LLMs for Mathematics according to LiveBench! Chinese open-weights LLMs are dominating in mathematics! Qwen 2.5 32B Coder, QwQ, and Qwen 2.5 72B are all at the forefront of efficiency! Gemini Exp 1206, however, dominates them in terms of raw performance and even… https://t.co/7iiQWAP40L
So LMSYS is COOKED - Let's look at LiveBench price to performance instead! I just pulled all the LiveBench data and stacked top LLMs on semi-log cost vs. performance plots! Google's Gemma-2, Gemini 1.5 Flash (& 8B), and the new Gemini-EXP-1206 (possibly Gemini 2.0 Pro) are ALL… https://t.co/tUpsBfuaze
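The semi-log cost vs. performance comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual analysis: the model names, per-million-token costs, and scores below are hypothetical placeholders, not real LiveBench or pricing data. The idea is simply to put cost on a log10 axis (so cheap and expensive models fit on one readable scale) and rank models by a crude score-per-dollar ratio.

```python
import math

# Hypothetical placeholder numbers -- NOT actual LiveBench scores or API prices.
models = {
    "model-a": {"cost_per_mtok": 15.00, "score": 71.4},
    "model-b": {"cost_per_mtok": 3.00,  "score": 52.6},
    "model-c": {"cost_per_mtok": 0.15,  "score": 33.4},
}

# Semi-log layout: x = log10(cost), y = raw benchmark score.
points = {
    name: (math.log10(m["cost_per_mtok"]), m["score"])
    for name, m in models.items()
}

# Rank by score per dollar -- a crude price/performance proxy.
ranked = sorted(
    models,
    key=lambda n: models[n]["score"] / models[n]["cost_per_mtok"],
    reverse=True,
)
print(ranked)  # cheapest-per-point model first
```

With these placeholder numbers the cheap low-scoring model wins on score-per-dollar while the expensive frontier model wins on raw score, which is exactly the trade-off a semi-log cost/performance plot makes visible.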
I still think Sonnet 3.5 and the o1 models are a step above the Gemini-Exp models, because LMSYS honestly sucks as a benchmark - it's mostly a benchmark of formatting and people-pleasing skills. I mean, Anthropic has basically completely abandoned this benchmark. They only provide the… https://t.co/BSjG3qOlGB