
Scale AI has introduced a new private leaderboard for evaluating large language models (LLMs), a notable development for the field. By relying on clean, third-party assessments of public models, the leaderboard aims to provide more accurate and trusted evaluations, addressing concerns about overfitting and contamination in existing benchmarks such as MMLU. Separately, K2-65B, a fully open-source LLM with 65 billion parameters, has been released. K2-65B is noted for its transparency, reproducibility, and superior performance compared to Llama 2 70B; it includes all the components needed for open-source AGI (model checkpoints, code, logs, and data) and is released under the Apache 2.0 license. LLM360 and SnowflakeDB have been active in supporting this open-source AI initiative, and the LMSys Arena is seen as a complementary benchmark.

🎉 Congratulations to an awesome fully open source model, by the m-a-p team! Paper: 📎https://t.co/BEjwEZlqJA
Includes great info on:
- Data Curation
- Infra details
- Intermediate checkpoints
- Scaling law
LLM360 is happy to work with this thriving community on open source AI. https://t.co/swpqDMp54E
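For context on the "Scaling law" item: such analyses typically fit a parametric loss curve in model size N and training tokens D. A minimal sketch of the standard Chinchilla-style form, using the Hoffmann et al. (2022) constants purely as illustrative placeholders (the fit reported in the MAP-Neo paper will differ):

```python
# Standard Chinchilla-style parametric scaling law: L(N, D) = E + A/N^alpha + B/D^beta.
# Constants below are the published Hoffmann et al. (2022) fit, used here only as
# an illustration; they are not taken from the MAP-Neo paper.
def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a 65B-parameter model trained on 1.4T tokens.
print(predicted_loss(65e9, 1.4e12))
```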
Two new fully open-source models today🔥
Open:
- Code to train models
- Data for pretraining
- Checkpoints
- Intermediate results
MAP-Neo https://t.co/96hwshYu3q
K2 - https://t.co/hSEYIrCbWw
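Because these releases ship the actual checkpoints, trying one is a standard transformers workflow. A minimal sketch, assuming the K2 weights are published under an `LLM360/K2` repo id on Hugging Face (the repo id, and the hardware needed for a 65B model, are assumptions rather than details from the tweets):

```python
# Minimal sketch: load a fully open checkpoint (e.g. K2-65B) with Hugging Face transformers.
# "LLM360/K2" is an assumed repo id; substitute the actual checkpoint path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/K2"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # shard across available GPUs (requires accelerate); 65B needs substantial memory
    torch_dtype="auto",   # use the dtype stored in the checkpoint
)

prompt = "A fully open-source LLM release should include"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Where intermediate checkpoints are published as separate repo revisions, they can be selected with the `revision` argument of `from_pretrained`.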
Really enjoyed reading "What We Learned from a Year of Building with LLMs" https://t.co/81inhIy1Ml
⭐️ One key part of it about evals that stuck out at me: the importance of pairwise comparisons.
This doesn't mean scoring two models individually and then comparing the scores.… https://t.co/GDAvjzmER2
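The distinction the tweet draws is that in a pairwise setup the judge sees both answers side by side and names a winner, rather than assigning each answer an absolute score. A minimal sketch of that loop; the `judge` callable here is a toy stand-in for an LLM-as-judge prompt, and asking twice with the order swapped is one common way to control for position bias:

```python
# Sketch of pairwise-comparison evaluation: a judge sees two candidate answers
# side by side and names a winner, instead of scoring each answer in isolation.
from collections import Counter
from typing import Callable

def pairwise_eval(
    prompts: list[str],
    answers_a: list[str],
    answers_b: list[str],
    judge: Callable[[str, str, str], str],  # returns "first", "second", or "tie"
) -> Counter:
    """Tally wins for model A vs. model B, asking the judge twice per prompt
    with the answer order swapped to reduce position bias."""
    tally = Counter()
    for prompt, a, b in zip(prompts, answers_a, answers_b):
        for first, second, first_is_a in ((a, b, True), (b, a, False)):
            verdict = judge(prompt, first, second)
            if verdict == "tie":
                tally["tie"] += 1
            elif (verdict == "first") == first_is_a:
                tally["A"] += 1
            else:
                tally["B"] += 1
    return tally

# Toy judge standing in for an LLM-as-judge call: prefers the longer answer.
def toy_judge(prompt: str, first: str, second: str) -> str:
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"

print(pairwise_eval(
    ["Explain overfitting."],
    ["Overfitting is when a model memorizes training data instead of generalizing."],
    ["Overfitting = bad."],
    toy_judge,
))
```

Win counts from such comparisons can then be aggregated into Elo or Bradley-Terry ratings, which is essentially how the LMSys Arena mentioned above ranks models.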