New model added to the leaderboard!
Model Name https://t.co/2nCGgNTX2Y
Overall rank: 1853
Rank in 1.5B category: 238
Benchmarks:
Average: 3.8
IFEval: 12.14
BBH: 4.44
MATH Lvl 5: 0.91
GPQA: 0.0
MUSR: 3.76
MMLU-PRO: 1.58
New model added to the leaderboard!
Model Name https://t.co/tQKAo9vAfO
Overall rank: 1771
Rank in 3B category: 184
Benchmarks:
Average: 5.17
IFEval: 22.54
BBH: 1.58
MATH Lvl 5: 0.0
GPQA: 1.01
MUSR: 4.52
MMLU-PRO: 1.36
New model added to the leaderboard!
Model Name https://t.co/TND24WPMNb
Overall rank: 1754
Rank in ?B category: 49
Benchmarks:
Average: 5.46
IFEval: 21.07
BBH: 3.15
MATH Lvl 5: 0.3
GPQA: 1.34
MUSR: 5.11
MMLU-PRO: 1.82
A team from MIT has developed a model that achieves a score of 61.9% on the ARC-AGI-PUB benchmark using an 8-billion-parameter language model and Test-Time Training (TTT), via LocalLLaMA. This is a significant jump over the previous record of 42%. In TTT, the model is briefly fine-tuned on each test task's own demonstration examples before it produces an answer, which substantially improves LLM performance on complex reasoning tasks.
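The core idea of Test-Time Training can be sketched in miniature. This is only an illustrative toy, not the MIT pipeline: it stands in for the 8B LLM with a 2x2 linear model, and the "task" (a hidden coordinate-swap rule) is a hypothetical example. The real system fine-tunes a large model (typically via lightweight adapters) on augmented copies of each ARC task's demonstration grids; here we just run a few gradient steps on a copied weight matrix before answering the query.

```python
# Toy sketch of Test-Time Training (TTT), assuming a linear model
# y = W @ x in place of a real LLM. Before answering a test task,
# we fine-tune a *copy* of the base weights on that task's own
# demonstration pairs, then predict for the query input.

def ttt_predict(base_W, demos, query, lr=0.1, steps=200):
    """Adapt a copy of base_W on (x, y) demos, then predict for query."""
    W = [row[:] for row in base_W]  # copy: the base model is untouched
    n = len(W)
    for _ in range(steps):
        for x, y in demos:
            # squared-error gradient step: dL/dW = (W @ x - y) x^T
            pred = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
            for i in range(n):
                err = pred[i] - y[i]
                for j in range(n):
                    W[i][j] -= lr * err * x[j]
    return [sum(W[i][j] * query[j] for j in range(n)) for i in range(n)]

# Hypothetical task: the hidden rule swaps the two coordinates.
demos = [([1.0, 0.0], [0.0, 1.0]),
         ([0.0, 1.0], [1.0, 0.0]),
         ([1.0, 1.0], [1.0, 1.0])]
base_W = [[1.0, 0.0], [0.0, 1.0]]  # "pre-trained" identity weights
pred = ttt_predict(base_W, demos, [2.0, 3.0])
print([round(p, 3) for p in pred])  # close to [3.0, 2.0]
```

The key design point, mirrored from the paper's setup, is that adaptation happens per task on a fresh copy of the weights, so learning the swap rule for one task cannot contaminate the base model used for the next.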