Jul 16, 07:02 AM

Qwen2 Language Models Released with Comprehensive Technical Report, Including 72B Parameters and Wukong 7B Scoring 41.6 on MMLU-Pro

Qwen2, a suite of foundational and instruction-tuned language models, has been released with a comprehensive technical report. The models range from 0.5 to 72 billion parameters and include dense models and a Mixture-of-Experts model. Notably, the Qwen2-0.5B (500M-parameter) model was trained on 12 trillion tokens, the most for a model of its size. The Qwen2 Wukong 7B model scored 41.6 on MMLU-Pro, using a custom FlashAttention-2 (FA2) implementation on an 8x AMD MI300X node. The report covers pretraining, post-training (including RLHF), and experimental results, with a focus on multilingual capabilities in 30 languages, a tokenizer with a 151k vocabulary, and Dual Chunk Attention with YaRN for long context. The data annotation pipelines emphasize human-in-the-loop annotation, ontology, and diversity. Although the high-quality data remains private, the Magpie method has been used to extract a large collection of instruction data for further research. The Qwen2 72B model can be deployed on the Inferless serverless platform, with an average generation speed of 17.83 tokens/sec, a latency of 24.79 seconds for 512 tokens, and an average cold start time of 35.59 seconds.
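To put the Inferless deployment figures above in perspective, here is a minimal back-of-the-envelope sketch. It assumes the 512-token latency covers generation only and that a cold start simply adds to the request time; neither assumption is confirmed by the summary, and the function names are hypothetical.

```python
# Figures quoted in the summary for Qwen2 72B on Inferless.
AVG_TOKENS_PER_SEC = 17.83   # average generation speed
LATENCY_512_SEC = 24.79      # latency for 512 tokens
COLD_START_SEC = 35.59       # average cold start time


def implied_throughput(tokens: int, latency_sec: float) -> float:
    """Tokens per second implied by a single latency measurement."""
    return tokens / latency_sec


def estimated_request_time(tokens: int, tokens_per_sec: float,
                           cold_start_sec: float = 0.0) -> float:
    """Rough request time, assuming cold start adds linearly (an assumption)."""
    return cold_start_sec + tokens / tokens_per_sec


# Throughput implied by the 512-token latency figure (about 20.65 tokens/sec).
warm_throughput = implied_throughput(512, LATENCY_512_SEC)

# Estimated time for a 512-token response that also pays the average cold start.
cold_request_sec = estimated_request_time(512, AVG_TOKENS_PER_SEC, COLD_START_SEC)

print(f"implied throughput: {warm_throughput:.2f} tokens/sec")
print(f"cold 512-token request: {cold_request_sec:.2f} sec")
```

The implied throughput from the latency figure is somewhat higher than the quoted average speed, which suggests the two numbers were measured under different conditions; the summary does not say how.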

#Qwen2 #Qwen #Qwen2 Wukong #FA2 #AMD #Magpie #Inferless

Written with ChatGPT (GPT-4o).