A new leaderboard for large language models (LLMs) has been released, showcasing models and their performance metrics across several size categories. Notably, the overall rank-1 model sits in the 70+B category, with an average score of 52.02 and an IFEval score of 80.63. Other high-ranking entries include a 35B model in 7th place with an average score of 36.2, and a 13B model ranked 1st in its category with an average of 39.43. The leaderboard also lists models in the 1.5B and 7B categories, with overall ranks ranging from 271 to 642. Each entry reports performance on benchmarks such as IFEval, BBH, and MMLU-PRO, demonstrating capabilities across a range of tasks. Overall, the leaderboard reflects the ongoing advances in LLMs and their competitive landscape.
LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs https://t.co/9iXSI4BQFJ
New model added to the leaderboard! Model Name https://t.co/OBY9cHHtPL
Overall rank: 380 | Rank in 7B category: 101
Benchmarks: Average: 28.13 | IFEval: 42.1 | BBH: 36.86 | MATH Lvl 5: 24.62 | GPQA: 9.4 | MUSR: 18.44 | MMLU-PRO: 37.37
New model added to the leaderboard! Model Name https://t.co/4mmyWZGTx3
Overall rank: 344 | Rank in 7B category: 77
Benchmarks: Average: 28.79 | IFEval: 60.36 | BBH: 33.99 | MATH Lvl 5: 23.56 | GPQA: 5.82 | MUSR: 12.14 | MMLU-PRO: 36.84
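For both posts above, the reported "Benchmarks Average" appears to be the plain arithmetic mean of the six per-benchmark scores. A minimal Python sketch checks this against the posted numbers; the helper name and score dictionaries are illustrative, not taken from the leaderboard's own code.

```python
# Sketch: verify that "Benchmarks Average" is the arithmetic mean of the
# six per-benchmark scores. Names here are illustrative assumptions.

def benchmark_average(scores: dict) -> float:
    """Arithmetic mean of the per-benchmark scores."""
    return sum(scores.values()) / len(scores)

# Scores copied from the two "new model" posts above.
model_a = {"IFEval": 42.1, "BBH": 36.86, "MATH Lvl 5": 24.62,
           "GPQA": 9.4, "MUSR": 18.44, "MMLU-PRO": 37.37}
model_b = {"IFEval": 60.36, "BBH": 33.99, "MATH Lvl 5": 23.56,
           "GPQA": 5.82, "MUSR": 12.14, "MMLU-PRO": 36.84}

# Both means land within rounding distance of the posted averages
# (28.13 and 28.79 respectively).
print(benchmark_average(model_a))
print(benchmark_average(model_b))
```

The means come out to roughly 28.13 and 28.79, matching the posted averages to two decimal places, which supports reading the leaderboard's average as an unweighted mean over the six benchmarks.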