Groq has launched a new Llama 3.1 70B endpoint that delivers more than 6x the output tokens per second of its current endpoint and more than 20x the median of other providers. The new endpoint reaches 1,665 output tokens per second by leveraging speculative decoding. Groq's 14nm V1 LPU currently delivers strong performance and pricing for inference, with a 4nm V2 expected next year. GroqCloud has also made llama-3.1-70B-specdec available for everyone on Dev Tier.
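The intuition behind speculative decoding: a small, fast "draft" model proposes several tokens ahead, and the large "target" model verifies the whole proposal in a single parallel pass, accepting the longest prefix it agrees with. When the two models usually agree, the target model effectively emits several tokens per forward pass. The sketch below is a toy, hypothetical illustration of that accept/reject loop (with trivial stand-in "models"), not Groq's actual implementation:

```python
def draft_next(seq):
    # Stand-in for a small, fast draft model: next token = last + 1 (mod 10).
    return (seq[-1] + 1) % 10

def target_next(seq):
    # Stand-in for the large target model: agrees with the draft except
    # that after a 4 it emits 0 instead of 5.
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], seq[:]
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target model checks all k positions (in practice, one parallel
        #    forward pass), accepting the longest agreeing prefix.
        ctx = seq[:]
        for t in proposal:
            expected = target_next(ctx)
            ctx.append(expected)
            if expected != t:
                # First disagreement: keep the target's token and re-draft.
                break
        seq = ctx
    return seq[len(prompt):len(prompt) + n_tokens]

print(speculative_decode([0], 8))  # → [1, 2, 3, 4, 0, 1, 2, 3]
```

In the best case each loop iteration yields k+1 tokens for one target-model pass; in the worst case it still yields one, so output is identical to plain decoding, just faster.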
250 T/s —> 1665 T/s 😎 Any app powered by Groq with 70b spec decoding would now run ~6x faster⚡️, the power of Groq speed! https://t.co/Z2PtRahvMH
Groq has launched a new Llama 3.1 70B endpoint with >6X faster output tokens/s than their current endpoint and >20X the median of other providers. @GroqInc's new endpoint achieves 1,665 output tokens/s through leveraging speculative decoding. Speculative decoding is an inference… https://t.co/eAs9OHIK6Z
Friday Fun: You wanted even faster inference for your apps, so we just dropped llama-3.1-70B-specdec on GroqCloud - now available to play with for everyone on Dev Tier. 🏁 https://t.co/KAYt2LXnrE