Recent advancements in AI research have demonstrated that training large language models (LLMs) can be significantly more cost-effective than previously believed. A collaborative effort between CSAIL, myshell_ai, and other groups has introduced JetMoE, an open-source Llama-2-level model trained for under $0.1 million. This development challenges the conventional approach taken by companies like OpenAI and Meta, which spend billions of dollars training their models. JetMoE-8B, trained on a 96×H100 GPU cluster for two weeks using only public datasets, outperforms Meta AI's LLaMA2-7B. With 8 billion total parameters but only 2.2 billion active per token, the model represents a significant step toward making LLMs more accessible and affordable for a broader range of users and researchers.
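The gap between 8 billion total and 2.2 billion active parameters comes from sparse Mixture-of-Experts routing: a router sends each token to only a few expert sub-networks, so most weights sit idle on any given forward pass. The sketch below illustrates the general top-k routing idea only; the class name, dimensions, and expert counts are illustrative assumptions, not JetMoE's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparse Mixture-of-Experts layer (illustrative, not JetMoE's config).

    Only top_k of num_experts expert MLPs run per token, so the
    'active' parameter count is far smaller than the total."""

    def __init__(self, dim=512, num_experts=8, top_k=2, hidden=2048):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep each token's top_k experts
        weights = F.softmax(weights, dim=-1)             # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routing slot-th choice to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(4, 512))  # each token activates only 2 of the 8 experts
```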
JetMoE-8B achieves performance comparable to Meta AI's LLaMA2-7B despite being trained for less than $0.1 million, a fraction of LLaMA2's multi-billion-dollar training resources. The model is open and academia-friendly, utilizing only… https://t.co/8aCIcfDstD https://t.co/6LZ0QlPMba
It will get super interesting once more people and companies can afford to train LLMs from scratch, or even easily and cost-effectively fine-tune the large existing ones. "JetMoE-8B is trained with less than $0.1 million cost but outperforms LLaMA2-7B from Meta AI, who has… https://t.co/lBHYQOAaIz
Looks to be super interesting if it can be implemented for all cases. ✨ "JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars" 📌 trained with less than $0.1 million (a 96×H100 GPU cluster for 2 weeks) but outperforms LLaMA2-7B 📌 only uses public datasets for training, 📌… https://t.co/tcHxObEAiI
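For context, the sub-$0.1M figure repeated in these posts checks out as a back-of-the-envelope estimate. A 96×H100 cluster running for two weeks comes to about 32,000 GPU-hours; the hourly rate below is an assumed cloud price for illustration, not a figure from the JetMoE team.

```python
gpus, days = 96, 14
gpu_hours = gpus * days * 24                 # 32,256 H100 GPU-hours
hourly_rate = 2.50                           # assumed $/H100-hour; actual cloud rates vary
print(f"~${gpu_hours * hourly_rate:,.0f}")   # ~$80,640, consistent with "< $0.1 million"
```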