
Fireworks AI announced its Spring 2024 platform updates, focused on production usage at scale. The updates bring faster, more production-ready serverless models, notably Mixtral Instruct and Llama 70B, with speeds reaching up to 300 tokens per second. The company has also optimized its Mixtral 8x7B offering to up to 200 tokens/second, second only to Groq in speed. The optimization comes with a significant price cut: token pricing is now $0.5 per million for both input and output, roughly one-third of the previous price. The updates have been well received, with benchmarks showing improved Mixtral speeds and consistency at the newly reduced price. Groq's own hardware, particularly its GroqRacks, has been highlighted for serving Mixtral 8x7B at approximately 500 tokens/second; it was also noted that even without Groq chips, Mixtral can run at nearly 300 tokens/second on 8xH100 with gpt-fast, on hardware provided by Kurumuz.
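For a rough sense of what the new rate means, here is a minimal cost calculation. It assumes the flat $0.5 per million tokens from the announcement and, as the one-third claim implies, a prior rate of $1.5/M; the helper name and token counts are illustrative, not from the announcement:

```python
# Illustrative cost math for the quoted Fireworks pricing.
# Assumptions: a flat $0.50 per million tokens for input and output,
# and (implied by the "one-third" claim) a prior rate of $1.50/M.

def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million_usd

# A request consuming 1M input + 1M output tokens:
new_price = cost_usd(2_000_000, 0.50)  # $1.00 at the new rate
old_price = cost_usd(2_000_000, 1.50)  # $3.00 at the implied old rate
print(new_price, old_price, new_price / old_price)
```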
Weekend experiment: @GroqInc mixtral-8x7b-32768 @MistralAI looks really good for post-hoc YouTube transcript correction, especially if you pass the video description as context: Mixtral properly corrects "Rock" and "Grock" into "Groq". Shipping soon at https://t.co/RAol3CVVmp https://t.co/VODQrhMD5X
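The tweet's recipe — feed the video description alongside the raw transcript — can be sketched against Groq's OpenAI-compatible chat endpoint. Only the model name comes from the tweet; everything else is an assumption of this sketch: `build_messages` and `correct_transcript` are hypothetical helpers, and the endpoint URL and payload shape follow the OpenAI chat-completions convention that Groq mirrors.

```python
import json
import urllib.request

# Sketch of the tweet's approach: pass the video description as context
# so the model can fix misheard proper nouns ("Rock"/"Grock" -> "Groq").
# The endpoint/payload follow Groq's OpenAI-compatible chat API
# (an assumption of this sketch, not something stated in the tweet).
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_messages(transcript: str, description: str) -> list:
    """Hypothetical helper: system prompt carries the video description."""
    system = (
        "Correct transcription errors in the user's YouTube transcript. "
        "Use the video description below to resolve names and jargon.\n\n"
        f"Video description:\n{description}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def correct_transcript(transcript: str, description: str, api_key: str) -> str:
    payload = {
        "model": "mixtral-8x7b-32768",  # model name from the tweet
        "messages": build_messages(transcript, description),
    }
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Putting the description in the system message keeps the user turn as the raw transcript, so the same call works for any video by swapping the two strings.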
Amazing speed from @GroqInc now integrated into @graphlit. Comparing Mixtral 8x7B between the Groq and Mistral APIs. 🤯 From the same MP3 transcript, I ran two prompts: #1: 1.85x faster #2: 4.34x faster https://t.co/DZsPMjMAr1
Faster Mixtral speeds from our spring update are starting to register in benchmarks! Check out Mixtral on Fireworks for the fastest widely available speed, the best consistency and newly reduced pricing! https://t.co/u8DLVLglKc https://t.co/99Yhro8wAP


