
The Gemma fine-tune based on openchat-3.5-0106 data and the C-RLFT method has shown nearly the same performance as the Mistral-based version. Groq has made Gemma 7B Instruct available on GroqChat with API access, claiming the world's fastest inference. Groq's Gemma 7B API has set a new speed record among LLM inference APIs at 814 tokens/s throughput, competitively priced at $0.1/M tokens. Groq's speed is impressive, but the service is heavily rate-limited, with different limits across plans. And while Llama2-7B, Mistral-7B, and Gemma-7B appear saturated at their scale, it remains uncertain whether 13B models could still be pushed further.
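For reference, here is a minimal sketch of what querying the endpoint might look like, assuming the official groq Python SDK and the gemma-7b-it model id Groq used for Gemma 7B Instruct; the throughput figure it prints is only a rough client-side estimate, not Groq's benchmark number.

```python
import os
import time

from groq import Groq  # pip install groq; assumes the official Groq Python SDK

# Assumes a GROQ_API_KEY environment variable; the API is OpenAI-compatible.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
completion = client.chat.completions.create(
    model="gemma-7b-it",  # Groq's model id for Gemma 7B Instruct at the time
    messages=[{"role": "user", "content": "Summarize C-RLFT in two sentences."}],
)
elapsed = time.perf_counter() - start

print(completion.choices[0].message.content)

# Rough client-side throughput estimate; this includes network and queueing
# time, so it will understate the ~814 tokens/s reported by server-side
# benchmarks.
tokens = completion.usage.completion_tokens
print(f"~{tokens / elapsed:.0f} tokens/s over {elapsed:.2f}s")
```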

"Even compared to other cloud installs of Gemma the Groq installation is impressively fast. It beats out ChatGPT, Claude 3 or Gemini in response time" -- @RyanMorrisonJer https://t.co/UBPHJwzpE1
500 Tokens a Second: Google SWOT in 0.45 Seconds vs 30 Seconds. We tested Groq, comparing its speed against our own SWOT Summary Prompts (DM me if you want the Prompt). Groq vs. Anthropic's New Claude 3 Opus 🤺 @Flowise_AI has just integrated @GroqInc into their platform, and…
So, with Llama2-7B, Mistral-7B, and Gemma-7B, we are quite confident we cannot squeeze more general-domain performance by feeding them more pretraining tokens, for example. But, if I'm not mistaken, we don't have such certainty about 13B models. If OpenChat, which is a finetune of a…