Sources
elvis: Arcee-SuperNova is another cool example of applying model distillation on Llama-3.1-405B. Key takeaways:
- They use a level logit compression technique to overcome the hardware requirements needed to distill such a big model.
- Took about 5 days to distill into… https://t.co/zdOeHGBQmG
Philipp Schmid: First distilled Llama 3.1 released by @arcee_ai! 🦙 SuperNova is a distilled reasoning Llama 3.1 70B & 8B! 👀 Arcee distilled @AIatMeta Llama 3.1 405B using offline knowledge distillation and combined it with RLHF and model merging to create new #1 open LLMs. SuperNova 70B is… https://t.co/ZoWmZoMR3Q



