Handy feature in the Open LLM Leaderboard: you can check each model's CO₂ consumption during evaluation. TL;DR: not all costly models are big, and not all models are worth the extra cost of running them. https://t.co/TPAJn0bUY6
Check out how much CO₂ we spent doing model evaluations on the Open LLM Leaderboard 🔍 -> It's an easy way to see which models have the best CO₂ inference cost to performance ratio! (not all big models are worth it, but @Alibaba_Qwen models are imo in a sweet spot 👏) https://t.co/M1vswayK8p
🌱 CO₂ calculations on the Open LLM Leaderboard! You can now check CO₂ emissions for each model evaluation! Track which models are greener and make sustainable choices🌍 🔗 Leaderboard: https://t.co/ecrYahipwt 📄 Docs: https://t.co/5DEiNomCnr https://t.co/25otuvMavX
Portkey AI is currently processing over 12 billion large language model (LLM) tokens daily through its AI Gateway. The company has analyzed the performance of its top models by mapping LMSys scores against output token costs. Additionally, new insights into model efficiency have emerged from the Open LLM Leaderboard, which now includes CO₂ emissions data for each model evaluation. This feature lets users assess the environmental impact of models and identify which ones offer the best CO₂ inference cost to performance ratio. Notably, models from Alibaba Qwen have been highlighted as performing well in this regard. The leaderboard also shows that higher inference cost does not necessarily correlate with better performance: not all costly models are large, and some models are not worth the extra cost of running them.
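The CO₂-cost-to-performance ratio mentioned above can be sketched as a simple score-per-kilogram calculation. This is a minimal illustration only: the model names, benchmark scores, and CO₂ figures below are made-up placeholders, not real leaderboard data, and the leaderboard itself may weigh these quantities differently.

```python
# Hypothetical sketch: rank models by benchmark score per kg of CO2
# emitted during evaluation. All names and numbers are placeholders.

models = {
    # name: (benchmark score, CO2 emitted during evaluation, in kg)
    "model-a-70b": (78.0, 40.0),
    "model-b-7b": (65.0, 3.0),
    "model-c-32b": (74.0, 12.0),
}

def score_per_kg_co2(entry):
    """Higher is better: more benchmark points per kg of CO2 emitted."""
    score, co2_kg = entry
    return score / co2_kg

# Sort from best to worst score-to-CO2 ratio.
ranked = sorted(models.items(), key=lambda kv: score_per_kg_co2(kv[1]), reverse=True)

for name, (score, co2) in ranked:
    print(f"{name}: {score / co2:.2f} score points per kg CO2")
```

With these placeholder numbers, the small 7B model tops the ranking despite its lower raw score, which mirrors the posts' point that the biggest or most expensive model is not always the most efficient choice.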