Several new AI models have been announced recently, showcasing advances in performance and capabilities. Together AI has released the Llama 3.1 Nemotron 70B Instruct model, optimized by NVIDIA, which has achieved notable alignment-benchmark scores: 85.0 on Arena Hard, 57.6 on AlpacaEval 2 LC, and 8.98 on MT-Bench. LocalAI introduced multiple models, including Rombos-Coder-V2.5-Qwen-32b, which promises improved performance, and a finetuned model named 'mistral-nemo-prism-12b' aimed at reducing archaic language. LocalAI has also launched 'Cobalt', a math-instruct model based on Llama 3.1 8B. SambaNovaAI has reported significant advancements with its Llama 3.1 405B model, reaching speeds of 200 tokens per second, more than double the throughput of other providers. Groq Inc. highlighted its Llama 3 70B model running at over 2,200 tokens per second, signaling strong competition in the AI inference market. These developments reflect a growing trend toward optimizing AI models for faster, more efficient inference.
🚀🎉 New model alert! Introducing "Cobalt", a math-instruct model built on Llama 3.1 8B. Install it with `local-ai run llama3.1-8b-cobalt` #LocalAI #AIModel #NewRelease
New model alert! Check out "celestial-harmony-14b-v1.0-experimental-1016-i1"! 🚀 This experimental model is a merge of pre-trained language models. Learn more and install via `local-ai run celestial-harmony-14b-v1.0-experimental-1016-i1` #LocalAI #machinelearning #NLPmodels
In the words of @sundeep "Ok, fine we'll do spec decode too 😉" Check out the video: Llama 3 70B running at >2200 tokens per second. Great performance and price for inference with our 14nm V1 LPU. ...and 4nm V2 is coming next year! https://t.co/dKU2CWv03c