Scale AI is bringing high-quality private datasets and paid expert annotators to LLM rankings. Its new public leaderboard targets two chronic problems in evaluation: contaminated test sets and inconsistent rater quality. Because the underlying eval sets are fully private and each model appears only once, developers cannot train on or repeatedly tune against the benchmark. Investment in high-quality evaluations and benchmarks like this is crucial for understanding what models are actually useful for, and Scale's private evaluations and leaderboards have been praised as a step toward a more trustworthy picture of the AI frontier.
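Scale's announcement doesn't detail how contamination is detected, but the core idea behind the concern is easy to sketch. Below is a minimal, hypothetical n-gram-overlap check in the style commonly used to audit public benchmarks; the function names `ngrams` and `contamination_rate` are illustrative and are not Scale's actual pipeline. An eval item is flagged if it shares any long word-level n-gram with the training corpus.

```python
from typing import Iterable, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(eval_items: Iterable[str],
                       corpus_texts: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of eval items sharing at least one n-gram with the corpus."""
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus_texts:
        corpus_grams |= ngrams(doc, n)
    items = list(eval_items)
    hits = sum(1 for item in items if ngrams(item, n) & corpus_grams)
    return hits / len(items) if items else 0.0
```

Real audits operate on normalized text and web-scale corpora (via Bloom filters or suffix arrays rather than in-memory sets); a fully private eval set, as Scale is offering, sidesteps the problem by never letting benchmark items leak into training data in the first place.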
These days models are evolving faster than ever, and to advance the field we definitely need more high-quality evals. Check out the new leaderboards from @scale_AI! https://t.co/uvSqPeOxHw
Another way to evaluate LLMs, from @alexandr_wang and team. Anyone checked this out yet? https://t.co/f877Ho54hK
As predicted, Scale is entering the LLM eval game, but with private (read: non-trainable-on) evals for frontier models! This is great: a very trusted resource, in addition to LMSys, Reddit vibes, X shitposting, and broken open evals. Will cover tomorrow on @thursdai_pod! https://t.co/cp2a4FrHTL