Recent advances in Large Language Model (LLM) training have focused on memory efficiency, shrinking both the weights and the optimizer states. GaLore, a new approach, makes it possible to pre-train a 7B model within the 24 GB of a single NVIDIA RTX 4090 by projecting weight gradients into a low-rank subspace. The method delivers substantial memory reduction without the performance loss of earlier low-rank techniques. Alongside it, collaborative open-source efforts have made training and fine-tuning large models increasingly practical on consumer GPUs like the RTX 4090.
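To make the idea concrete, here is a minimal, illustrative sketch of low-rank gradient projection in plain PyTorch. It is not the GaLore implementation: the function name is made up, a plain scaled step stands in for the real Adam-style state, and GaLore refreshes its projector only periodically rather than on every step.

```python
import torch

def low_rank_project_step(weight, grad, lr=1e-3, rank=4):
    """One illustrative update: project the full gradient into a rank-r
    subspace, form the (optimizer-sized) update there, project it back."""
    # Projector from the gradient's top-r left singular vectors.
    # GaLore recomputes this only every few hundred steps; here we do it
    # on every call purely for clarity.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                      # (m, r)
    low_rank_grad = P.T @ grad           # (r, n): optimizer state lives at this size
    update = lr * low_rank_grad          # stand-in for Adam moments kept at rank r
    weight -= P @ update                 # project the update back to (m, n)
    return weight

# Toy usage on a random layer
W = torch.randn(512, 512)
G = torch.randn(512, 512)
W = low_rank_project_step(W, G)
```

The memory win comes from keeping the optimizer state at the (r, n) size rather than the full (m, n) weight shape.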
Model weight sharding 🤝 QLoRA = ❤️ Incredible technical achievement with huge practical impact because now anyone with 2 gaming GPUs can fine-tune 70B param models at home. Congrats to everyone involved! https://t.co/x0A1L8QHQR
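For readers curious what the ingredients look like in code, the sketch below wires together public transformers, peft, bitsandbytes, and PyTorch FSDP APIs. It is a rough outline, not the actual integration: the checkpoint name, LoRA settings, and 2-GPU torchrun launch are assumptions, and the real FSDP+QLoRA work required additional patches so that 4-bit quantized weights shard correctly.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Assumes a launch like: torchrun --nproc_per_node=2 train.py
dist.init_process_group("nccl")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # QLoRA-style 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",         # illustrative checkpoint
    quantization_config=bnb_config,
)

# Small trainable low-rank adapters on top of the frozen, quantized base
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)

# FSDP shards the parameters across the two GPUs so neither holds the whole model
model = FSDP(model, device_id=dist.get_rank())
```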
Superb 🔥 82.5% reduction in memory even for pretraining. ------ "instead of expressing the weight matrix as low rank, which leads to a big performance degradation during pretraining, we instead express the gradient weight matrix as low rank without performance degradation,… https://t.co/4nubVT6HyP
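A quick back-of-envelope shows where savings of that order come from: with Adam, the optimizer keeps two full-sized moment tensors per weight matrix, whereas a rank-r projection keeps them at rank-r size plus one projection matrix. The shapes below are illustrative (roughly a 7B-class MLP weight), not the paper's exact accounting.

```python
# Illustrative optimizer-state counting for one weight matrix of shape (m, n).
m, n, r = 4096, 11008, 128
adam_full = 2 * m * n                  # Adam: two moments per weight entry
adam_lowrank = 2 * r * n + m * r       # moments at rank r, plus the (m, r) projector
print(f"full:     {adam_full:,} values")
print(f"low-rank: {adam_lowrank:,} values "
      f"({100 * (1 - adam_lowrank / adam_full):.1f}% smaller)")
```

The headline 82.5% figure accounts for the whole model, including parts that are not projected, so a single-matrix estimate like this is naturally more aggressive than the end-to-end number.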
This is so cool! Making HQQ and QLoRA work with FSDP opens so many new doors for people to finetune large models like Llama 70b on consumer GPUs like RTX 4090 :) And HQQ combines the best of both worlds - high acc like activation-aware quant, yet also super fast like bitsandbytes! https://t.co/f5zG1fTl20
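To ground the quantization side, here is a deliberately naive, data-free 4-bit group-wise quantizer in plain PyTorch. It is not HQQ's algorithm or API (the names here are made up); HQQ goes further by optimizing the zero-point/scale with a half-quadratic solver, which is where the accuracy gain comes from while remaining calibration-free like bitsandbytes.

```python
import torch

def quantize_4bit_groupwise(w, group_size=64):
    """Toy data-free 4-bit quantizer with per-group scale and zero-point."""
    flat = w.reshape(-1, group_size)
    w_min = flat.min(dim=1, keepdim=True).values
    w_max = flat.max(dim=1, keepdim=True).values
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels
    q = torch.clamp(torch.round((flat - w_min) / scale), 0, 15)
    return q.to(torch.uint8), scale, w_min

def dequantize(q, scale, zero, shape):
    return (q.float() * scale + zero).reshape(shape)

# Quick check of the round-trip error on a random weight matrix
W = torch.randn(4096, 4096)
q, s, z = quantize_4bit_groupwise(W)
W_hat = dequantize(q, s, z, W.shape)
print("mean abs error:", (W - W_hat).abs().mean().item())
```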