DeepSeek AI, the Chinese AI unicorn, has officially launched DeepSeek-V2.5, an advanced model that merges the capabilities of DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724. The new model uses a Mixture of Experts (MoE) architecture with 238 billion parameters, 160 experts, and 21 billion active parameters per token. It is optimized for coding and offers improved writing, instruction following, and alignment with human preferences. DeepSeek-V2.5 supports a 128K context length and includes native function calling and a JSON output mode. The model shows significant gains across benchmarks, scoring 76.3% on Arena Hard and 50.52% on AlpacaEval 2.0, and posts notable win rates against GPT-4o and GPT-4o-mini in DeepSeek's internal evaluations. BF16 inference requires eight 80 GB GPUs, and the weights are available as an open-source download on Hugging Face.
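For readers who want to try the open weights locally, here is a minimal sketch of loading the model for BF16 inference with Hugging Face transformers. It assumes the repository id "deepseek-ai/DeepSeek-V2.5", that the repo ships custom model code (hence `trust_remote_code=True`), and that enough GPU memory is available to shard the weights; the exact chat template, prompt format, and hardware requirements should be checked against the official model card.

```python
# Hedged sketch: load DeepSeek-V2.5 from Hugging Face and run BF16 generation.
# The repo id and chat-template usage below are assumptions based on the release notes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-V2.5"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # BF16 inference, as described above
    device_map="auto",            # shard the MoE weights across available GPUs
    trust_remote_code=True,       # the repo provides custom model code
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

In practice, a serving stack such as vLLM is typically preferred over raw transformers for a model of this size, but the sketch above illustrates the basic flow under the stated assumptions.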
DeepSeek-V2.5 Released by DeepSeek-AI: A Cutting-Edge 238B Parameter Model Featuring Mixture of Experts (MoE) with 160 Experts, Advanced Chat, Coding, and 128k Context Length Capabilities Read our full take on this: https://t.co/zMqzwARE5y DeepSeek-AI has released… https://t.co/sBik4L4gDQ
Wake up, DeepSeek v2.5 is officially released. 🔥 🧠 238B params, 21B active (MoE architecture) 📏 128K context window 🚀 Arena Hard 76.3%, AlpacaEval 50.52% In their internal Chinese evaluations, DeepSeek-V2.5 shows a significant improvement in win rates against GPT-4o mini and… https://t.co/98xoq1ktqr