
Fireworks AI announced its Spring 2024 platform updates, focused on production usage at scale. The updates bring faster, more production-ready serverless models, notably Mixtral Instruct and Llama 70B, with speeds reaching up to 300 tokens per second. The company has also optimized its Mixtral 8x7B offering to up to 200 tokens/second, second only to Groq in speed. The optimization comes with a significant price cut: token pricing is now $0.5 per million for both input and output, roughly one-third of the previous price. The updates have been well received, with benchmarks showing improved Mixtral speeds and consistency at the newly reduced price. Groq's own hardware, particularly its GroqRacks, has been highlighted for serving Mixtral 8x7B at approximately 500 tokens/second; it was also noted that even without Groq chips, Mixtral can run at nearly 300 tokens/second on 8xH100 with gpt-fast, on hardware provided by Kurumuz.
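For a rough sense of what the new rate means, here is a minimal cost calculation. It assumes the flat $0.5 per million tokens from the announcement and, as the one-third claim implies, a prior rate of $1.5/M; the helper name and token counts are illustrative, not from the announcement:

```python
# Illustrative cost math for the quoted Fireworks pricing.
# Assumptions: a flat $0.50 per million tokens for input and output,
# and (implied by the "one-third" claim) a prior rate of $1.50/M.

def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million_usd

# A request consuming 1M input + 1M output tokens:
new_price = cost_usd(2_000_000, 0.50)  # $1.00 at the new rate
old_price = cost_usd(2_000_000, 1.50)  # $3.00 at the implied old rate
print(new_price, old_price, new_price / old_price)
```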
Weekend experiment: @GroqInc mixtral-8x7b-32768 @MistralAI looks really good for post-hoc YouTube transcript correction, especially if you pass the video description as context: Mixtral properly corrects "Rock" and "Grock" into "Groq". Shipping soon at https://t.co/RAol3CVVmp https://t.co/VODQrhMD5X
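The tweet's recipe — feed the video description alongside the raw transcript — can be sketched against Groq's OpenAI-compatible chat endpoint. Only the model name comes from the tweet; everything else is an assumption of this sketch: `build_messages` and `correct_transcript` are hypothetical helpers, and the endpoint URL and payload shape follow the OpenAI chat-completions convention that Groq mirrors.

```python
import json
import urllib.request

# Sketch of the tweet's approach: pass the video description as context
# so the model can fix misheard proper nouns ("Rock"/"Grock" -> "Groq").
# The endpoint/payload follow Groq's OpenAI-compatible chat API
# (an assumption of this sketch, not something stated in the tweet).
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_messages(transcript: str, description: str) -> list:
    """Hypothetical helper: system prompt carries the video description."""
    system = (
        "Correct transcription errors in the user's YouTube transcript. "
        "Use the video description below to resolve names and jargon.\n\n"
        f"Video description:\n{description}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def correct_transcript(transcript: str, description: str, api_key: str) -> str:
    payload = {
        "model": "mixtral-8x7b-32768",  # model name from the tweet
        "messages": build_messages(transcript, description),
    }
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Putting the description in the system message keeps the user turn as the raw transcript, so the same call works for any video by swapping the two strings.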
Amazing speed from @GroqInc now integrated into @graphlit. Comparing Mixtral 8x7B between the Groq and Mistral APIs. 🤯 From the same MP3 transcript, I ran two prompts: #1: 1.85x faster #2: 4.34x faster https://t.co/DZsPMjMAr1
Faster Mixtral speeds from our spring update are starting to register in benchmarks! Check out Mixtral on Fireworks for the fastest widely available speed, the best consistency and newly reduced pricing! https://t.co/u8DLVLglKc https://t.co/99Yhro8wAP


