DeepSeek AI, the Chinese AI unicorn, has officially launched DeepSeek-V2.5, an advanced model that merges the capabilities of DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724. The new model uses a Mixture of Experts (MoE) architecture with 238 billion parameters, 160 experts, and 21 billion active parameters per token. It is optimized for coding and offers improved writing, instruction following, and alignment with human preferences. DeepSeek-V2.5 supports a 128K context length and includes native function calling and a JSON output mode. The model shows significant gains across benchmarks, scoring 76.3% on Arena Hard and 50.52% on AlpacaEval 2.0, and posts notable win rates against GPT-4o and GPT-4o-mini in DeepSeek's internal evaluations. BF16 inference requires eight 80 GB GPUs, and the weights are available as an open-source download on Hugging Face.
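For readers who want to try the open weights locally, here is a minimal sketch of loading the model for BF16 inference with Hugging Face transformers. It assumes the repository id "deepseek-ai/DeepSeek-V2.5", that the repo ships custom model code (hence `trust_remote_code=True`), and that enough GPU memory is available to shard the weights; the exact chat template, prompt format, and hardware requirements should be checked against the official model card.

```python
# Hedged sketch: load DeepSeek-V2.5 from Hugging Face and run BF16 generation.
# The repo id and chat-template usage below are assumptions based on the release notes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-V2.5"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # BF16 inference, as described above
    device_map="auto",            # shard the MoE weights across available GPUs
    trust_remote_code=True,       # the repo provides custom model code
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

In practice, a serving stack such as vLLM is typically preferred over raw transformers for a model of this size, but the sketch above illustrates the basic flow under the stated assumptions.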
DeepSeek-V2.5 Released by DeepSeek-AI: A Cutting-Edge 238B Parameter Model Featuring Mixture of Experts (MoE) with 160 Experts, Advanced Chat, Coding, and 128k Context Length Capabilities Read our full take on this: https://t.co/zMqzwARE5y DeepSeek-AI has released… https://t.co/sBik4L4gDQ
Wake up, DeepSeek v2.5 is officially released. 🔥 🧠 238B params, 21B active (MoE architecture) 📏 128K context window 🚀 Arena Hard 76.3%, AlpacaEval 50.52% In their internal Chinese evaluations, DeepSeek-V2.5 shows a significant improvement in win rates against GPT-4o mini and… https://t.co/98xoq1ktqr