
Groq has announced new state-of-the-art tool use models in 8B and 70B parameter sizes, outperforming Claude 3.5 Sonnet at function calling. The 8B model achieves a processing speed of 1,050 tokens per second, while the 70B model reaches 330 tokens per second. Both models are available on the Groq Console and can be downloaded from Hugging Face. The 8B model has reached the #1 position on BFCL (the Berkeley Function Calling Leaderboard), beating all other models, including proprietary ones. Separately, DeepSeek has released a new checkpoint of DeepSeek-V2-Chat, a Mixture-of-Experts model with 236B total parameters and 21B active parameters, which can run at FP16 on 8x80GB GPUs. The updated checkpoint shows significant performance improvements, excelling on both the Arena-Hard and BigBench-Hard benchmarks.
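For context on what "function calling" means here: tool-use models emit structured calls against tool definitions supplied in the request, using the OpenAI-compatible JSON-schema format that Groq's API accepts. A minimal sketch of such a request payload follows; the model id and the `get_weather` tool are illustrative assumptions, not taken from the announcement.

```python
import json

# One tool definition in the JSON-schema "function" format used for tool use.
# get_weather is a hypothetical tool, shown only to illustrate the shape.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Request payload in the OpenAI-compatible chat-completions shape.
# The model id below is an assumption for illustration; check the Groq
# Console for the actual identifiers of the 8B/70B tool use models.
payload = {
    "model": "llama3-groq-70b-tool-use",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call a tool
}

print(json.dumps(payload, indent=2))
```

A response to this request would contain a `tool_calls` entry naming the function and its JSON arguments, which the caller executes and feeds back as a `tool` message.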
DeepSeek just dropped a new leading model on LMSYS! Same model as DeepSeek-V2-Chat but different checkpoint (236B total params, 21B active params). You should be able to run @ FP16 on 8x80GB GPUs. Not a "home setup" but still ~half of what LLaMA-3-405B would need. https://t.co/ZKHBrXmkvN https://t.co/0vv6Mha03S
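The "FP16 on 8x80GB GPUs" claim above checks out with back-of-envelope arithmetic: at 2 bytes per parameter, the full 236B weights need roughly 472 GB, which fits in 640 GB of aggregate VRAM with headroom left for activations and KV cache. A quick sketch:

```python
# Back-of-envelope: do 236B parameters at FP16 fit on 8x80GB GPUs?
params = 236e9           # total parameters (MoE: only 21B active per token)
bytes_per_param = 2      # FP16 = 2 bytes per parameter

weights_gb = params * bytes_per_param / 1e9   # weight memory in GB
total_vram_gb = 8 * 80                        # aggregate VRAM across 8 GPUs

print(f"weights: {weights_gb:.0f} GB, VRAM: {total_vram_gb} GB")
# weights: 472 GB, VRAM: 640 GB -> fits, with ~168 GB for KV cache etc.
```

Note the weights figure scales with total parameters, not active ones; the 21B active parameters matter for per-token compute, not for fitting the checkpoint in memory.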
deepseek having such improvements in their general model so quickly is mind-blowing. brilliant performance on both arena hard and bigbench hard. truly OSS king until llama3.1 405B replaces it (?) https://t.co/fW0geK4ein https://t.co/wqZ87p6r6u
New Deepseek model and they did it again! https://t.co/bUOkwe5EY7
