A breakthrough in Large Language Model (LLM) training, dubbed GaLore, allows a 7B model to be trained on a single consumer-grade GPU, specifically an NVIDIA RTX 4090 with 24GB of memory. The technique reduces the memory needed to store optimizer states during training by up to 82.5%. GaLore diverges from approaches like LoRA by observing that the weight gradient is naturally low-rank and projecting it into a low-rank subspace, rather than assuming a low-rank structure on the weights themselves. This not only makes LLM training more accessible by reducing the hardware required, but also promises lower training costs, with projections suggesting that training LLMs could cost between $2 million and $50 million.
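To make the idea concrete, here is a minimal sketch of a gradient low-rank projection step in the spirit of GaLore: the full-rank gradient is projected into a rank-r subspace, Adam-style moments are kept only in that smaller space (which is where the optimizer-state savings come from), and the update is projected back before being applied to the weights. The function and variable names below are hypothetical for illustration; the authors' actual implementation is released separately (e.g. as the `galore-torch` package) and differs in details such as per-layer handling and how often the projector is refreshed.

```python
import torch

def galore_style_update(weight, grad, proj, opt_state,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with the gradient projected into a low-rank subspace.

    `proj` is an (m, r) orthonormal matrix, e.g. the top-r left singular
    vectors of a recent gradient. This is an illustrative sketch, not the
    authors' reference implementation.
    """
    # Project the full-rank gradient (m x n) down to (r x n).
    low_rank_grad = proj.T @ grad

    # Keep Adam moments only in the low-rank space -- this is the source
    # of the optimizer-state memory reduction.
    m, v, step = opt_state
    step += 1
    m = beta1 * m + (1 - beta1) * low_rank_grad
    v = beta2 * v + (1 - beta2) * low_rank_grad ** 2
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Project the update back to full rank and apply it to the weight.
    update = proj @ (m_hat / (v_hat.sqrt() + eps))
    weight -= lr * update
    return weight, (m, v, step)

# Toy usage: a 4096 x 4096 layer with a rank-128 projection.
m_dim, n_dim, rank = 4096, 4096, 128
weight = torch.randn(m_dim, n_dim)
grad = torch.randn(m_dim, n_dim)

# Periodically refresh the projector from the SVD of a current gradient.
U, _, _ = torch.linalg.svd(grad, full_matrices=False)
proj = U[:, :rank]

opt_state = (torch.zeros(rank, n_dim), torch.zeros(rank, n_dim), 0)
weight, opt_state = galore_style_update(weight, grad, proj, opt_state)
```

The key contrast with LoRA is visible here: nothing constrains the weight matrix itself to be low-rank; only the gradient statistics held by the optimizer live in the smaller subspace.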
LLMs Will Be Cheaper And Cheaper To Train. Applying some brand new techniques, you can literally train a 7B model with one GPU! 7B models will likely hit GPT 3.5 performance in the next couple of months! All said and done, you can train LLMs with just $2-50M. That's it!…
Thanks @_akhaliq for promoting our work! With GaLore, now it is possible to pre-train a 7B model in NVidia RTX 4090 with 24G memory! How? Instead of assuming low-rank weight structure like LoRA, we show that the weight gradient is naturally low-rank and thus can be projected… https://t.co/DaR5L23Rc3