Groq Inc. is making headlines by running Mistral AI's Mixtral model on its custom LPU (Language Processing Unit) chips, delivering an impressive 483 to 500 tokens per second (tok/s). This breakthrough in speed significantly improves the user experience (UX) for Large Language Models (LLMs), offering near-instantaneous responses and enabling new use cases. Pricing is also noteworthy: costs are reported at $0.27 per 1 million tokens, cheaper than GPT-3.5, with another cited figure of $0.80 per 1 million tokens.

Founded by former members of Google's TPU team, Groq is seen as a potential game-changer in AI development, challenging the dominance of traditional GPUs and opening up possibilities for real-time conversations with AI models. The company's public demo showcased an AI Answers Engine capable of generating factual, cited answers in less than a second. Industry observers are excited about the implications for performance and UX, highlighting Groq's role in overcoming previous cost and latency bottlenecks in LLM serving. Separately, Google's closed-source Gemini Ultra is noted as handling 500k tokens, in contrast with Groq's more open approach.
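To make the UX claim concrete, here is a quick back-of-the-envelope latency comparison. Only the 500 tok/s figure comes from the summary above; the answer length and the GPU baseline throughput are illustrative assumptions.

```python
# Time to generate a complete answer at the speeds discussed above.
# Only groq_tok_s comes from the summary; the other two values are
# illustrative assumptions, not measured figures.

answer_tokens = 250        # assumed length of a typical cited answer
groq_tok_s = 500           # Groq running Mixtral, per the summary
gpu_baseline_tok_s = 40    # assumed per-stream throughput on a GPU stack

print(f"Groq:         {answer_tokens / groq_tok_s:.2f} s")          # 0.50 s
print(f"GPU baseline: {answer_tokens / gpu_baseline_tok_s:.2f} s")  # 6.25 s
```

Under these assumptions, a full answer arrives in half a second on Groq versus several seconds on a conventional GPU stack, which is the difference between a real-time conversation and a visible wait.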
So I did some back-calculation. Running Mixtral on Groq at 400 tokens/second would require 400 LPUs. With 400 A100s (same cost for now) you get… up to 13,000 tokens/second. Obviously not all in one instance, but that's even better in practice: serve different models for different users https://t.co/3rzaBsQYpZ
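The tweet's back-calculation can be reproduced directly from the numbers it quotes. A minimal sketch, using only the figures stated above (400 LPUs vs. 400 A100s at assumed-equal cost):

```python
# Back-of-the-envelope per-chip throughput comparison, using only the
# figures quoted in the tweet above. These are the tweet's assumptions,
# not measured benchmarks.

groq_chips = 400           # LPUs needed to serve one Mixtral instance
groq_tok_s = 400           # tokens/second for that single instance

a100_count = 400           # same number of A100s, assumed similar cost
a100_total_tok_s = 13_000  # aggregate tokens/second across many instances

per_chip_groq = groq_tok_s / groq_chips        # 1.0 tok/s per LPU
per_chip_a100 = a100_total_tok_s / a100_count  # 32.5 tok/s per A100

print(f"Groq: {per_chip_groq:.1f} tok/s per chip (one very fast stream)")
print(f"A100: {per_chip_a100:.1f} tok/s per chip (many slower streams)")
```

The trade-off the tweet is pointing at: Groq spends its entire fleet on making one stream extremely fast, while the GPU fleet delivers far higher aggregate throughput spread across many separate instances and users.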
Probably the first operating-cost analysis of owning @GroqInc hardware to run Llama2-70b. First of all, let me say I am a big fan of Groq. Great performance, great potential. The below is just a showcase of how challenging things might be when rivaling the industry leaders, but given…
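A minimal sketch of the kind of ownership-cost calculation this analysis refers to. Every parameter below is a hypothetical placeholder, not a figure from the tweet or from Groq; the point is only the structure of the calculation (amortized capex plus power, divided by throughput).

```python
# Hypothetical ownership-cost model for self-hosting an LLM on dedicated
# hardware. All parameters are illustrative assumptions, not real prices.

card_count = 576             # hypothetical: cards needed to host the model
card_price_usd = 20_000      # hypothetical list price per card
amortization_years = 3       # hypothetical depreciation window
power_kw = card_count * 0.2  # hypothetical: ~200 W per card
electricity_usd_kwh = 0.10   # hypothetical energy price
tokens_per_second = 300      # hypothetical sustained throughput

capex = card_count * card_price_usd
capex_per_hour = capex / (amortization_years * 365 * 24)
power_cost_per_hour = power_kw * electricity_usd_kwh
tokens_per_hour = tokens_per_second * 3600

usd_per_million_tokens = (
    (capex_per_hour + power_cost_per_hour) / tokens_per_hour * 1e6
)
print(f"~${usd_per_million_tokens:.2f} per 1M tokens under these assumptions")
```

The structure makes the challenge visible: with a large fixed fleet serving a single fast stream, amortized hardware cost per token stays high unless utilization and throughput are pushed hard.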
Companies that can buy 576 of these GroqCards can achieve impossibly fast tokens/second (Mixtral 8x7B-32k at 500 T/s) https://t.co/BILKl2WSVh
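As a consistency check, the per-card throughput implied by this tweet's figures lines up with the back-calculation earlier in this section:

```python
# Per-card throughput implied by the tweet's figures (576 GroqCards,
# one 500 tok/s Mixtral 8x7B-32k stream).

cards = 576
stream_tok_s = 500

print(f"{stream_tok_s / cards:.2f} tok/s per card")  # ~0.87, close to the
# ~1 tok/s per LPU implied by the 400-LPU back-calculation above
```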