UnslothAI has announced vision fine-tuning support for Llama-3.2-Vision-11B on Google Colab, letting users fine-tune vision language models (VLMs) 2x faster with 50% less VRAM, a 6x longer context, and no loss of accuracy. The update also covers Pixtral, Qwen2 VL, and several Llava variants, with speedups of 1.3x to 2x depending on the model. Separately, the latest OmniVision-968M preview incorporates user feedback, with notable improvements in art descriptions and complex-image handling. These releases reflect a broader push to make large language models (LLMs) cheaper and faster to run, also seen in the recent LLM Compressor update, which targets lower inference times and costs with minimal accuracy trade-offs.
You can finetune Llama-3.2-Vision-11B for free on Colab now! Unsloth finetunes VLMs 2x faster, with 50% less VRAM, 6x longer context - with no accuracy loss. Documentation: https://t.co/wGCYz6HKaW GitHub: https://t.co/pHcwaR5VTa
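To make the workflow concrete, here is a minimal sketch of what a vision fine-tuning setup with Unsloth can look like. The FastVisionModel entry point, the checkpoint name, and the LoRA settings are assumptions drawn from Unsloth's public docs rather than from the tweet itself; the documentation and Colab notebook linked above are authoritative.

```python
# Minimal sketch of Unsloth vision fine-tuning (API names assumed from Unsloth docs;
# verify against the documentation linked above).
from unsloth import FastVisionModel

# Load the 11B vision model in 4-bit so it fits on a free Colab GPU.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",  # assumed checkpoint name
    load_in_4bit=True,
)

# Attach LoRA adapters to both the vision and language layers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,          # LoRA rank (example value)
    lora_alpha=16, # LoRA scaling (example value)
)

# Training then proceeds with an image-text dataset and a standard SFT trainer,
# as walked through in the linked Colab notebook.
```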
LLM Compressor optimizes LLMs for faster inference and lower costs with minimal accuracy trade-offs. GitHub: https://t.co/FMZqtkRe9T Here’s @mgoin_ on what’s new in v0.3.0: https://t.co/gATwkzjcdO
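For context on what LLM Compressor does, the sketch below applies a one-shot FP8 dynamic-quantization recipe to a small model. The QuantizationModifier and oneshot names, the example checkpoint, and the output path follow the project's README rather than anything stated in the tweet, so treat them as assumptions and check the v0.3.0 release notes for the current API.

```python
# Sketch of one-shot quantization with LLM Compressor (names assumed from the
# project README; confirm against the v0.3.0 docs).
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# FP8 dynamic quantization of all Linear layers, skipping the output head.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply the recipe in one shot; the saved checkpoint can then be served by vLLM.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # small example model
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-FP8-Dynamic", # example output path
)
```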
We’ve just improved OmniVision-968M based on your feedback! 🚀 The latest updates are now live as a preview in our @huggingface Space powered by @Gradio: https://t.co/qTwYacK7ov Here’s what’s improved (examples in the thread): 1️⃣ Art Descriptions 2️⃣ Complex Images 3️⃣ Anime 4️⃣…