
Recent advances in machine learning are making it possible to train large language models (LLMs) more efficiently and cheaply. One notable development is GaLore, a memory-efficient training method based on gradient low-rank projection, a pre-release version of which is available; its authors report training a 7-billion-parameter LLaMA model on a single consumer-grade GPU (an RTX 4090 with 24GB of memory), cutting the memory required for optimizer states by up to 82.5%.

Alongside this, the FSDP/QLoRA project, a collaboration between Answer.AI, Tim Dettmers, Hugging Face, and Mobius Labs, enables training of models of up to 70 billion parameters on home computers using consumer gaming GPUs such as two Nvidia 4090s. At 16-bit precision a 70B model's weights alone occupy roughly 140GB, far beyond the 48GB the two cards provide, which is why the project's 4-bit QLoRA quantization is essential. FSDP/QLoRA has already been integrated into a popular LLM fine-tuning library, enabling Mixtral fine-tuning on gaming GPUs. Against this backdrop, the cost of training LLMs is projected to fall, with estimates ranging from $2 million to $50 million.
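As a rough sanity check on those figures (this arithmetic is my own sketch, not from the source posts): a parameter stored at 16-bit precision costs 2 bytes, so a 70B model's weights alone come to about 140GB, while 4-bit quantization brings them down to roughly 35GB.

```python
# Back-of-envelope memory math for the numbers quoted above (illustrative only).
params = 70e9  # 70 billion parameters

fp16_gb = params * 2 / 1e9    # 2 bytes per parameter at 16-bit precision
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per parameter at 4-bit (QLoRA-style)

print(f"70B weights @ 16-bit: {fp16_gb:.0f} GB")  # ~140 GB, too big for 2x24GB GPUs
print(f"70B weights @ 4-bit:  {int4_gb:.0f} GB")  # ~35 GB, shardable across two cards
```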
Train 7B model with a single GPU with 24GB memory. This repo contains the pre-release version of the GaLore algorithm, proposed in GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. https://t.co/9W6o3GDXh9
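The gist of gradient low-rank projection, as I read the paper (a toy sketch under my own naming, with plain momentum standing in for GaLore's actual Adam-based optimizer): gradients of large weight matrices are approximately low-rank, so optimizer state can live in a small r-dimensional subspace, and full-size updates are only materialized briefly.

```python
import torch

def galore_style_step(weight, grad, state, rank=4, lr=1e-3, update_proj_every=200):
    """One illustrative low-rank-projected update for a single 2-D weight.

    Only the r x n momentum buffer persists between steps, which is where
    the optimizer-state memory savings come from (r << m).
    """
    # Periodically refresh the projector from an SVD of the current gradient.
    if state["step"] % update_proj_every == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                 # m x r projection matrix
    P = state["P"]

    low_rank_grad = P.T @ grad                   # r x n, much smaller than m x n
    state["momentum"] = 0.9 * state["momentum"] + low_rank_grad
    weight -= lr * (P @ state["momentum"])       # project the update back to full size
    state["step"] += 1

# Usage on a single toy weight matrix:
m, n, rank = 1024, 1024, 4
w = torch.randn(m, n)
g = torch.randn(m, n)  # stand-in for a real backward-pass gradient
state = {"step": 0, "momentum": torch.zeros(rank, n), "P": None}
galore_style_step(w, g, state, rank=rank)
```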
This is amazing - our fave LLM fine tuning library has integrated FSDP/QLoRA already! Mixtral training on gaming GPUs - that's so cool... https://t.co/M1KnuICxtF
You can now train a 70b language model at home An #opensource system, based on FSDP and QLoRA, that can train a 70b model on two 24GB GPUs. https://t.co/UAlG6wEPlD
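For a sense of what the QLoRA half of that system looks like in the Hugging Face stack (a minimal sketch: the model id and LoRA hyperparameters are placeholders, and the FSDP sharding that Answer.AI's system adds is only indicated in a comment, since making FSDP compose with quantized weights was precisely the hard part the project solved):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Freeze the base model in 4-bit NF4 quantization (the "QLoRA" half).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder id; any large causal LM works
    quantization_config=bnb_config,
)

# Train only small LoRA adapters on top of the frozen 4-bit weights.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of 70B is trainable

# The FSDP half then shards these quantized parameters across the two
# 24GB GPUs; stock FSDP could not previously shard 4-bit weights, which
# is the gap the FSDP/QLoRA project closes.
```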


