
Huggingface Transformers has officially integrated GaLore (Gradient Low-Rank Projection), a memory-efficient training technique that lets users finetune models with roughly 20% less VRAM while matching the loss curve of full-parameter tuning. The integration, contributed by the Huggingface team and community members, also makes it possible to combine GaLore with ORPO using Axolotl, promising more efficient model training. GaLore reduces the optimizer state memory footprint by 82.5% without compromising performance by expressing the gradient of each weight matrix as low rank, and further enhancements and integrations are planned. This is expected to bring state-of-the-art (SOTA) results within reach of consumer-grade hardware, making advanced model training more accessible.
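For readers who want to try it, below is a minimal sketch of how the integration is typically invoked: GaLore is selected through the `optim` and `optim_target_modules` arguments of `TrainingArguments`, assuming the `galore-torch` package is installed. The checkpoint, dataset, and hyperparameter values shown are illustrative placeholders, not the settings from the linked blog post.

```python
# Minimal sketch: finetuning with the GaLore optimizers integrated into
# transformers. Requires `pip install transformers datasets galore-torch`.
# The checkpoint, dataset, and hyperparameters are illustrative placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Small placeholder dataset, tokenized for causal language modeling.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="galore-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=100,
    # Pick a GaLore optimizer and the linear layers whose gradients are
    # projected to low rank (matched against module names).
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],
    # GaLore-specific settings: projection rank, how often the projector is
    # refreshed, and the update scale.
    optim_args="rank=128, update_proj_gap=200, scale=0.8",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The integration also exposes layer-wise variants (e.g. `galore_adamw_layerwise`), which apply each layer's update as soon as its gradient is ready to further reduce peak memory.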

GaLore officially supported in @huggingface transformers! Check out the blog post below for more details! https://t.co/oJfF6wTBSD
Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware! 82.5% less optimizer state memory footprint without performance degradation by expressing the gradient weight matrix as low rank. Read the blog: https://t.co/Qh6SWiu1AJ https://t.co/3SuHsXWuSH
Now Huggingface Transformers incorporates GaLore! 20% less memory with exactly the same loss curve as full-parameter tuning. https://t.co/AGUeJ8nwiW
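The memory savings described in these posts come from GaLore keeping the optimizer state in a low-rank space. A minimal sketch of the per-matrix update, in notation close to the GaLore paper and slightly simplified (the projector $P_t$, rank $r$, and scale $\alpha$ follow the paper; projector-refresh details are omitted):

```latex
% Sketch of GaLore's low-rank gradient projection for one weight matrix
% W_t of size m x n, with projection rank r << min(m, n).
\begin{align*}
  G_t &= -\nabla_W \varphi(W_t) \in \mathbb{R}^{m \times n}
      && \text{full-rank gradient} \\
  P_t &\in \mathbb{R}^{m \times r}
      && \text{projector, e.g.\ top-$r$ left singular vectors of } G_t \\
  R_t &= P_t^{\top} G_t \in \mathbb{R}^{r \times n}
      && \text{low-rank gradient; optimizer state lives here} \\
  \tilde{R}_t &= \rho_t(R_t)
      && \text{entrywise optimizer update, e.g.\ Adam} \\
  W_{t+1} &= W_t + \eta \, \alpha \, P_t \tilde{R}_t
      && \text{project back and apply the step}
\end{align*}
```

Because the Adam moments are stored for $R_t$ (size $r \times n$) rather than for the full gradient (size $m \times n$), the optimizer state shrinks roughly in proportion to $r/m$, which is where the quoted 82.5% reduction comes from at the rank settings reported in the paper.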