Groq Inc. has made significant strides in AI inference speed, reporting roughly 1,150 tokens per second on the Llama 3 8B model with the full 8k context window. That is nearly 10 times faster than GPT-4o, which has recently slowed. Groq attributes the gain to its 14nm chip design. SambaNova AI reports similarly impressive speeds with its Reconfigurable Dataflow AI chips, reaching 1,000 tokens per second across 16 sockets, and another competitor has been cited at 1,084 tokens per second. The rivalry between Groq and SambaNova in AI inference is intensifying, with both companies pushing the boundaries of serving performance.
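To put the quoted rates in perspective, a short back-of-the-envelope sketch converts tokens per second into per-token latency and wall-clock generation time. The numbers below are the reported marketing figures from the coverage above, not independent measurements, and the GPT-4o rate is only an implied order of magnitude from the "nearly 10 times faster" claim.

```python
# Back-of-the-envelope arithmetic on the reported throughput figures.

def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second

def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second

groq_tps = 1150            # reported Llama 3 8B rate on Groq
gpt4o_tps = groq_tps / 10  # rough figure implied by the "nearly 10x" claim

print(f"Groq per-token latency: {per_token_latency_ms(groq_tps):.2f} ms")
print(f"Implied GPT-4o per-token latency: {per_token_latency_ms(gpt4o_tps):.2f} ms")
print(f"Time to stream an 8k (8192-token) window at 1,150 t/s: "
      f"{generation_time_s(8192, groq_tps):.1f} s")
```

At 1,150 tokens per second, each token arrives in under a millisecond, so a full 8,192-token window streams in about seven seconds.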
Groq is now reporting ~1,150 tokens/s on Llama 3 8B in its chat interface! We look forward to confirming these results on @GroqInc's API over the coming days and seeing the tokens/s over time line chart go up and to the right ↗️. If so, this would represent the fastest language… https://t.co/ZHmMS2On82
Powered by Groq 😀 https://t.co/f0wihTd49u
L3 8B in production running at 1157t/s/u with the full 8k context window. Only at @GroqInc. 🫡 https://t.co/Lb8yYFXvr0