
Recent advances in machine learning are making it possible to train large language models (LLMs) more efficiently and cheaply. One notable development is GaLore, a memory-efficient training method based on gradient low-rank projection, a pre-release version of which is available; its authors report training a 7-billion-parameter LLaMA model on a single consumer-grade GPU (an RTX 4090 with 24GB of memory), cutting the memory required for optimizer states by up to 82.5%.

Alongside this, the FSDP/QLoRA project, a collaboration between Answer.AI, Tim Dettmers, Hugging Face, and Mobius Labs, enables training of models of up to 70 billion parameters on home computers using consumer gaming GPUs such as two Nvidia 4090s. At 16-bit precision a 70B model's weights alone occupy roughly 140GB, far beyond the 48GB the two cards provide, which is why the project's 4-bit QLoRA quantization is essential. FSDP/QLoRA has already been integrated into a popular LLM fine-tuning library, enabling Mixtral fine-tuning on gaming GPUs. Against this backdrop, the cost of training LLMs is projected to fall, with estimates ranging from $2 million to $50 million.
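As a rough sanity check on those figures (this arithmetic is my own sketch, not from the source posts): a parameter stored at 16-bit precision costs 2 bytes, so a 70B model's weights alone come to about 140GB, while 4-bit quantization brings them down to roughly 35GB.

```python
# Back-of-envelope memory math for the numbers quoted above (illustrative only).
params = 70e9  # 70 billion parameters

fp16_gb = params * 2 / 1e9    # 2 bytes per parameter at 16-bit precision
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per parameter at 4-bit (QLoRA-style)

print(f"70B weights @ 16-bit: {fp16_gb:.0f} GB")  # ~140 GB, too big for 2x24GB GPUs
print(f"70B weights @ 4-bit:  {int4_gb:.0f} GB")  # ~35 GB, shardable across two cards
```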
Train 7B model with a single GPU with 24GB memory. This repo contains the pre-release version of the GaLore algorithm, proposed in GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. https://t.co/9W6o3GDXh9
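The gist of gradient low-rank projection, as I read the paper (a toy sketch under my own naming, with plain momentum standing in for GaLore's actual Adam-based optimizer): gradients of large weight matrices are approximately low-rank, so optimizer state can live in a small r-dimensional subspace, and full-size updates are only materialized briefly.

```python
import torch

def galore_style_step(weight, grad, state, rank=4, lr=1e-3, update_proj_every=200):
    """One illustrative low-rank-projected update for a single 2-D weight.

    Only the r x n momentum buffer persists between steps, which is where
    the optimizer-state memory savings come from (r << m).
    """
    # Periodically refresh the projector from an SVD of the current gradient.
    if state["step"] % update_proj_every == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                 # m x r projection matrix
    P = state["P"]

    low_rank_grad = P.T @ grad                   # r x n, much smaller than m x n
    state["momentum"] = 0.9 * state["momentum"] + low_rank_grad
    weight -= lr * (P @ state["momentum"])       # project the update back to full size
    state["step"] += 1

# Usage on a single toy weight matrix:
m, n, rank = 1024, 1024, 4
w = torch.randn(m, n)
g = torch.randn(m, n)  # stand-in for a real backward-pass gradient
state = {"step": 0, "momentum": torch.zeros(rank, n), "P": None}
galore_style_step(w, g, state, rank=rank)
```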
This is amazing - our fave LLM fine tuning library has integrated FSDP/QLoRA already! Mixtral training on gaming GPUs - that's so cool... https://t.co/M1KnuICxtF
You can now train a 70b language model at home An #opensource system, based on FSDP and QLoRA, that can train a 70b model on two 24GB GPUs. https://t.co/UAlG6wEPlD
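For a sense of what the QLoRA half of that system looks like in the Hugging Face stack (a minimal sketch: the model id and LoRA hyperparameters are placeholders, and the FSDP sharding that Answer.AI's system adds is only indicated in a comment, since making FSDP compose with quantized weights was precisely the hard part the project solved):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Freeze the base model in 4-bit NF4 quantization (the "QLoRA" half).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder id; any large causal LM works
    quantization_config=bnb_config,
)

# Train only small LoRA adapters on top of the frozen 4-bit weights.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of 70B is trainable

# The FSDP half then shards these quantized parameters across the two
# 24GB GPUs; stock FSDP could not previously shard 4-bit weights, which
# is the gap the FSDP/QLoRA project closes.
```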


