Global-MMLU: A World-class Benchmark Redefining Multilingual AI by Bridging Cultural and Linguistic Gaps for Equitable Evaluation Across 42 Languages and Diverse Contexts Global-MMLU🌍 seeks to correct these imbalances by introducing a dataset spanning 42 languages, encompassing… https://t.co/rVoLjOQLsA
🏷️:Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement 🔗:https://t.co/ScFnXNnpAr https://t.co/DXRGkb0Odw
Recent advancements in multilingual language models have been highlighted by several research initiatives. Cohere has released Global-MMLU, a multilingual evaluation dataset spanning 42 languages that combines machine-translated MMLU questions with professional translations and crowd-sourced post-edits, and includes cultural sensitivity annotations covering 2,850 questions per language. Cohere has also released Aya Expanse, a model family with open weights in 8-billion- and 32-billion-parameter variants that supports 23 languages and has demonstrated significant gains in machine translation, outperforming Gemma, the next-strongest model in its comparisons. Other notable contributions include Marco-LLM, which pursues cross-lingual enhancement through massive multilingual training, and ALMA, a technique for aligning large language models (LLMs) with minimal human annotation by generating synthetic training data. These developments underscore the ongoing efforts to improve the capabilities and cultural sensitivity of AI systems in multilingual contexts.
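To make the evaluation setup concrete, below is a minimal sketch of how one might load a Global-MMLU language split and select its culturally sensitive subset with the Hugging Face datasets library. The repository id CohereForAI/Global-MMLU, the per-language config names, and the cultural_sensitivity_label column with a "CS" value are assumptions inferred from the description above, not confirmed details of the release.

```python
# Minimal sketch: load one language split of Global-MMLU and filter the
# culturally sensitive questions. The repo id, config name, and the
# "cultural_sensitivity_label" column (with "CS" marking culture-specific
# items) are assumptions, not confirmed details of the dataset.
from datasets import load_dataset

# Each of the 42 languages is assumed to be a separate config (e.g. "en", "ar", "sw").
ds = load_dataset("CohereForAI/Global-MMLU", "en", split="test")

# Keep only questions flagged as culturally sensitive.
culturally_sensitive = ds.filter(
    lambda row: row.get("cultural_sensitivity_label") == "CS"
)

print(f"{len(culturally_sensitive)} of {len(ds)} questions flagged as culturally sensitive")
```

Splitting a language's questions this way would let an evaluation report model accuracy separately on culturally sensitive and culture-agnostic subsets, which is the kind of disaggregated comparison the annotations are meant to enable.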