
A team of AI researchers, including Yann LeCun, has announced the launch of LiveBench, a general-purpose live LLM benchmark. LiveBench aims to address the limitations of existing LLM benchmarks by using contamination-free test data and objective scoring: questions are refreshed regularly and drawn from recently released sources, so models cannot simply have memorized the answers during training, and responses are graded against ground-truth answers rather than by an LLM judge. Developed in collaboration with Abacus AI, the benchmark is designed to be lightweight and easy to run, with around 200 questions per category, making it a more robust tool for evaluating AI models.
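LiveBench's actual grading code ships with its repository; as a rough illustration of what "objective scoring" means in practice, the hypothetical Python sketch below grades a model's answer against a known ground-truth value using a deterministic comparison, so rerunning the grader always produces the same score and no judge model is involved. All names in the sketch are illustrative, not LiveBench's API.

```python
# Illustrative sketch only: shows the idea of "objective scoring",
# i.e. grading against a known ground-truth answer with deterministic
# rules instead of an LLM judge. Function names are hypothetical.
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial formatting
    differences do not affect the score."""
    return re.sub(r"\s+", " ", text.strip().lower())


def score_answer(model_output: str, ground_truth: str) -> float:
    """Return 1.0 for a normalized exact match, else 0.0.

    Deterministic comparison against ground truth is what makes the
    score objective and reproducible across runs."""
    return 1.0 if normalize(model_output) == normalize(ground_truth) else 0.0


if __name__ == "__main__":
    # Hypothetical question/answer pair for demonstration.
    print(score_answer("  The answer is 42 ", "the answer is 42"))  # 1.0
    print(score_answer("43", "42"))                                 # 0.0
```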
