The popularity of @deepseek_ai V3/R1 has helped SGLang reach nearly 10,000 stars. Growing adoption has pushed SGLang to mature into a production-level LLM serving engine! https://t.co/OMvkfBrnnv
What is SGLang and why does it matter? SGLang is an open-source LLM inference engine that achieves 2-5x higher throughput than competing solutions! It was also the first to implement multi-token prediction for DeepSeek R1, for a 1.76x speedup! 👀 TL;DR: 💡Leverages… https://t.co/KWPmUrmcJQ
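For context on what "multi-token prediction" looks like in practice: DeepSeek R1 ships extra MTP ("NextN") heads that SGLang can use as the draft model for speculative decoding. Below is a minimal sketch using SGLang's offline engine; it assumes the `Engine` accepts the same speculative-decoding arguments as the server CLI, so the exact argument names and values are illustrative rather than authoritative:

```python
# Sketch: using DeepSeek R1's multi-token-prediction (NextN) heads as a
# speculative-decoding draft in SGLang's offline engine. Argument names
# mirror SGLang's server flags and may differ by version.
import sglang as sgl

llm = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-R1",
    speculative_algorithm="NEXTN",    # use the MTP head as the draft model
    speculative_num_steps=2,          # draft steps per verification pass
    speculative_eagle_topk=4,         # candidates expanded per draft step
    speculative_num_draft_tokens=4,   # draft tokens verified per forward pass
    tp_size=8,                        # assumed tensor-parallel degree
    trust_remote_code=True,
)

out = llm.generate(
    "Explain speculative decoding in one sentence.",
    {"temperature": 0.6, "max_new_tokens": 128},
)
print(out["text"])
```

The draft head cheaply proposes a few tokens ahead, and the full model verifies them in a single pass, which is where a decode speedup like the quoted 1.76x comes from.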
SGLang is a beast! That's the feedback from the lm-eval-harness team. The SGLang team is working on upstreaming SGLang support into lm-eval-harness, with an immediate speedup and equal results! https://t.co/ccvAVFmeVs
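Here is a hedged sketch of what that integration could look like through lm-eval-harness's Python API, assuming the upstreamed backend is exposed as a `sglang` model type (the backend name, `model_args` string, and task choice are all assumptions, since the work is still in progress):

```python
# Sketch of the in-progress lm-eval-harness integration. Assumes it lands
# as a "sglang" model type mirroring the existing backends; the backend
# name and model_args below are assumptions, not the merged API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="sglang",  # assumed backend name once upstreamed
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["gsm8k"],
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```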
DeepSeek R1 became the most-liked model on Hugging Face shortly after its release, with over 10 million downloads across its variants. The DeepSeek team recommends specific settings for optimal performance, namely no system prompt and a temperature of 0.6 (a request following them is sketched below). Additionally, SGLang, an open-source LLM inference engine, has implemented multi-token prediction for DeepSeek R1, resulting in a 1.76x speedup and bringing its decoding rate to 77 tokens per second. DeepSeek's popularity has also helped push SGLang toward 10,000 GitHub stars, reflecting its growing adoption and a production-level LLM serving engine.

On the small-model side, speeds have improved as well: Qwen 0.5B generates 1,000 tokens at 510 tokens per second on an M4 Max, and at over 150 tokens per second on an iPhone 16 Pro.
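To make DeepSeek's recommended settings concrete, here is a minimal sketch of a chat request that follows them: no system message and temperature 0.6, sent to an OpenAI-compatible endpoint such as the one SGLang serves (the local base URL, port, and model name are assumptions about the deployment):

```python
# Sketch: querying DeepSeek R1 with the team's recommended settings,
# i.e. no system prompt and temperature 0.6. Assumes a local
# OpenAI-compatible server (SGLang's default port is 30000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        # Per DeepSeek's guidance, put all instructions in the user
        # turn rather than in a system message.
        {"role": "user", "content": "Summarize multi-token prediction."},
    ],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```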