Sources
- Zain
How do you teach an LLM to carry long-form conversations and not get confused by all the details? To learn how we can improve fine-tuning over long-form conversational data, I fine-tuned a bunch of models on the CoQA dataset and 2x'd performance! Full code notebook below 🔽 https://t.co/MHlbEZQXHd
- Vaibhav (VB) Srivastav
yo! @NVIDIAAIDev finally released the weights for Hymba-1.5B - outperforms Llama, Qwen, and SmolLM2 with 6-12x less training, trained ONLY on 1.5T tokens > massive reductions in KV cache size and improved throughput > combines Mamba and Attention in a hybrid parallel… https://t.co/H5qxTpUX16
- Together AI
New Cookbook: Fine-tuning Llama 3.1 8B on Conversation Data. In this notebook we fine-tune Llama 3.1 8B on the CoQA multi-turn conversation dataset with loss masking and show a significant improvement in performance! Read below: https://t.co/aB3gQp2JsX
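The loss masking mentioned in the Together AI cookbook can be sketched roughly as follows: only the assistant's answer tokens receive labels, while the question/user tokens are masked out so the model is not penalized on them. This is a minimal illustration under stated assumptions, not the cookbook's actual code; the tokenizer checkpoint, chat formatting, and CoQA preprocessing here are stand-ins.

```python
# Minimal sketch of per-turn loss masking for multi-turn conversational
# fine-tuning. Assumes a Hugging Face tokenizer; the prompt template and
# CoQA preprocessing used in the cookbook are not reproduced here.
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def build_masked_example(tokenizer, turns):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}.
    Returns input_ids and labels where only assistant tokens contribute
    to the loss; user/question tokens are masked with IGNORE_INDEX."""
    input_ids, labels = [], []
    for role, text in turns:
        piece = tokenizer.encode(text + tokenizer.eos_token, add_special_tokens=False)
        input_ids.extend(piece)
        if role == "assistant":
            labels.extend(piece)                        # learn to produce answers
        else:
            labels.extend([IGNORE_INDEX] * len(piece))  # don't learn to echo questions
    return {"input_ids": input_ids, "labels": labels}

if __name__ == "__main__":
    # "gpt2" is only a stand-in tokenizer so the sketch runs anywhere;
    # the cookbook targets Llama 3.1 8B.
    tok = AutoTokenizer.from_pretrained("gpt2")
    conversation = [
        ("user", "Who wrote the novel mentioned in the passage?"),
        ("assistant", "Charlotte Bronte."),
        ("user", "When was it published?"),
        ("assistant", "In 1847."),
    ]
    ex = build_masked_example(tok, conversation)
    print(sum(l != IGNORE_INDEX for l in ex["labels"]), "of", len(ex["labels"]),
          "tokens contribute to the loss")
```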