I’ve wanted a top-tier benchmark ever since LMSYS died slowly and painfully. Is this it? https://t.co/1ahlo49QHs credit: @AIExplainedYT
I’d seen this somewhere but hadn’t realised it was @AIExplainedYT. This will likely become the best LLM benchmark in all of Texas. Nice KOL. https://t.co/OkVMNHwv5e
Philip (@AIExplainedYT) got fed up with all these poor-quality benchmarks and made one himself. If you watch even a handful of his videos, you'll know AI Explained is not impressed with the popular LLM benchmarks, particularly MMLU and HellaSwag. So Philip has produced his own… https://t.co/0jsY6RXjoC

A new benchmark for evaluating large language models (LLMs) has been developed by Philip, the creator of the AI Explained channel. It has drawn attention for its potential to improve on existing standards, particularly MMLU and HellaSwag, which Philip has criticized for their poor quality. Users say the benchmark's results align well with their real-world experience of model performance and that it fills the gap left by the decline of LMSYS, and they expect it to gain traction as a more reliable assessment of LLM capability.