Sources
xjdr: MoD + Modality specific MoE + early fusion?! I would be so happy if L4 (or at least a large model variant) ends up adopting this architecture (i know, i know, one thing at a time, etc, but we could still stand to get a bit more creative in the attention layer ... ) https://t.co/di0mSa3jfV
Akshat Shrivastava: With Chameleon we showed that early fusion mixed-modal LLMs can deliver strong improvements over unimodal and late-fusion alternatives. With this paradigm shift, however, how do we rethink our core model architecture to optimize for native multimodality and efficiency? We… https://t.co/CV5CzoxZ11
Armen Aghajanyan: If you were interested in my cryptic posts on how to train Chameleon-like models up to 4x faster, check out our MoMa paper, which covers a detailed overview of most of our architectural improvements. tl;dr adaptive compute in 3 dims: modality, width, depth. https://t.co/fvffkIjs2I
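The tweets above describe modality-aware sparsity: tokens are partitioned by modality, and each modality routes within its own expert pool. As a rough illustration (not the MoMa implementation — all names, shapes, and the top-1 routing choice here are assumptions for the sketch), a minimal NumPy version of modality-specific MoE routing might look like:

```python
# Hedged sketch of modality-specific MoE routing: each modality owns a
# disjoint expert group, and a per-modality router picks the top-1 expert
# for each token. Illustrative only; names and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ModalityMoE:
    def __init__(self, d_model, n_experts_per_modality,
                 modalities=("text", "image")):
        # One expert pool and one router per modality.
        self.experts = {
            m: [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                for _ in range(n_experts_per_modality)]
            for m in modalities
        }
        self.routers = {
            m: rng.standard_normal((d_model, n_experts_per_modality)) / np.sqrt(d_model)
            for m in modalities
        }

    def forward(self, tokens, token_modalities):
        # tokens: (n_tokens, d_model); token_modalities: list of modality tags.
        out = np.zeros_like(tokens)
        for m in self.experts:
            idx = [i for i, tm in enumerate(token_modalities) if tm == m]
            if not idx:
                continue
            x = tokens[idx]                     # tokens of this modality
            logits = x @ self.routers[m]        # (n_m, n_experts)
            probs = softmax(logits)
            choice = logits.argmax(axis=-1)     # top-1 expert per token
            for j, tok_i in enumerate(idx):
                expert = self.experts[m][choice[j]]
                # Gate the chosen expert's output by its routing probability.
                out[tok_i] = probs[j, choice[j]] * (x[j] @ expert)
        return out

moe = ModalityMoE(d_model=8, n_experts_per_modality=2)
toks = rng.standard_normal((5, 8))
mods = ["text", "image", "text", "text", "image"]
y = moe.forward(toks, mods)
print(y.shape)  # (5, 8)
```

Note the key design point from the thread: routing never mixes experts across modalities, so capacity is allocated per modality rather than competed for globally.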

