
Qwen2, a suite of foundation and instruction-tuned language models, has been released with a comprehensive technical report. The models range from 0.5B to 72B parameters and include both dense models and a Mixture-of-Experts model. Notably, the Qwen2-0.5B model was trained on 12 trillion tokens, the most for a model of its size. The Qwen2 Wukong 7B model scored 41.6 on MMLU-Pro, using a custom FlashAttention-2 implementation on an AMD 8x MI300X node. The report covers pretraining, post-training/RLHF notes, and experimental results, with a focus on multilingual capabilities across 30 languages, a tokenizer with a 151k vocabulary, and Dual Chunk Attention with YaRN for long context. The data annotation pipelines emphasize human-in-the-loop review, ontology, and diversity. Although the high-quality training data remains private, the Magpie method has extracted a large collection of instruction data from the models for further research. The Qwen2-72B model can be deployed on the Inferless serverless platform, with an average generation speed of 17.83 tokens/sec, 24.79 sec latency for 512 tokens, and a 35.59 sec average cold start time.
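For orientation, here is a minimal sketch of running one of the instruction-tuned checkpoints with Hugging Face `transformers` (assuming the `Qwen/Qwen2-7B-Instruct` repo name and a CUDA GPU with enough memory); this is illustrative and not the report's own evaluation setup.

```python
# Minimal sketch: chat with Qwen2-7B-Instruct via Hugging Face transformers.
# Assumes the Qwen/Qwen2-7B-Instruct checkpoint and a CUDA GPU; not the report's eval harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen2 technical report in one sentence."},
]
# apply_chat_template builds the ChatML-style prompt the model was tuned on.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```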
Very cool technical report on Qwen2!
> Focus on multilingual capabilities in 30 languages including pretraining data, tokenizer (151k vocab) and evaluations
> Dual Chunk Attention with YARN for long context
> Ablations showing no significant gain after 7 trillion pretraining…
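To make the tokenizer claim concrete, a small sketch (assuming the public `Qwen/Qwen2-7B` tokenizer on the Hugging Face Hub) that inspects the ~151k vocabulary:

```python
# Sketch: inspect the Qwen2 tokenizer's vocabulary size (~151k tokens).
# Assumes the public Qwen/Qwen2-7B tokenizer on the Hugging Face Hub.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
print(tokenizer.vocab_size)  # base vocabulary reported by the tokenizer
print(len(tokenizer))        # base vocabulary plus added special tokens
print(tokenizer.tokenize("¿Qué es la atención de doble bloque?"))  # multilingual text tokenizes directly
```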
Qwen 2’s tech report is out! However, their high-quality data remains private. Want to use Qwen 2’s data for post-training research and build other cool projects? 📢 Good news! Using our 🐦⬛ Magpie method, we have extracted a large collection of instruction data from Qwen 2 and…
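For context, Magpie works by prompting an aligned model with only the pre-query part of its chat template so the model itself completes a plausible user instruction, then answering that instruction in a second pass. A minimal sketch of the idea, assuming the `Qwen/Qwen2-7B-Instruct` checkpoint and its ChatML-style template (an illustration of the method, not the Magpie authors' pipeline):

```python
# Sketch of the Magpie idea: sample an instruction from the aligned model by
# giving it only the pre-query template, then generate a response to it.
# Assumes Qwen/Qwen2-7B-Instruct and its ChatML-style template; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 1) Pre-query template: everything up to and including the start of the user turn.
pre_query = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
)
ids = tokenizer(pre_query, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=128, do_sample=True, temperature=1.0, top_p=0.99)
instruction = tokenizer.decode(
    out[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True
).strip()

# 2) Feed the sampled instruction back through the normal chat template for a response.
messages = [{"role": "user", "content": instruction}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(prompt, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(out[0][prompt.shape[-1]:], skip_special_tokens=True).strip()

print({"instruction": instruction, "response": response})
```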
Elevate your text generation with Qwen-2 72B and deploy on our Inferless serverless platform🚀
⚡ Experience superb efficiency:
🔹 17.83 tokens/sec average generation speed
🔹 24.79 sec latency for 512 tokens
🔹 35.59 seconds average cold start time
🔗 Link: https://t.co/YG1x4yYBtm https://t.co/4Mb2uJKTsK
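A quick back-of-the-envelope check on how those figures relate, purely arithmetic on the numbers quoted above (whether the 24.79 s latency covers the full 512-token generation is an assumption):

```python
# Back-of-the-envelope check on the quoted Inferless figures.
# Assumes the 24.79 s latency covers generating all 512 tokens end to end.
tokens = 512
latency_s = 24.79      # quoted latency for 512 tokens
avg_speed = 17.83      # quoted average generation speed (tokens/sec)
cold_start_s = 35.59   # quoted average cold start time

implied_speed = tokens / latency_s
print(f"implied throughput: {implied_speed:.2f} tokens/sec vs quoted average {avg_speed}")
print(f"worst-case first request (cold start + generation): {cold_start_s + latency_s:.2f} s")
```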










