
Two new techniques, LLoCO and Megalodon, have been introduced to address challenges in large language models (LLMs). LLoCO learns long contexts offline to reduce computational and memory overhead, yielding speed-ups and cost reductions in document QA. Megalodon, on the other hand, offers efficient LLM pretraining and inference with unlimited context length, aiming to scale to long sequences and improve long-context modeling.
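To make the LLoCO idea concrete, here is a minimal sketch of offline context compression: a long document is condensed once, offline, into a small set of summary embeddings, and at QA time the model consumes those embeddings plus the question instead of the raw document tokens. The `OfflineContextCompressor` class, its dimensions, and the cross-attention pooling used here are illustrative assumptions, not the authors' pipeline (which pairs context compression with parameter-efficient fine-tuning).

```python
import torch
import torch.nn as nn

class OfflineContextCompressor(nn.Module):
    """Compress a long token sequence into a few summary embeddings, offline."""
    def __init__(self, d_model: int = 512, num_summary: int = 32):
        super().__init__()
        # Learned query vectors that pool the document via cross-attention.
        self.summary_queries = nn.Parameter(torch.randn(num_summary, d_model))
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, doc_states: torch.Tensor) -> torch.Tensor:
        # doc_states: (1, doc_len, d_model) hidden states of the long document.
        q = self.summary_queries.unsqueeze(0)        # (1, num_summary, d_model)
        summary, _ = self.pool(q, doc_states, doc_states)
        return summary                               # (1, num_summary, d_model)

# Offline phase: run once per document and cache the result.
compressor = OfflineContextCompressor()
doc_states = torch.randn(1, 4096, 512)               # stand-in for a 4k-token document
with torch.no_grad():
    cached_summary = compressor(doc_states)          # stored for later reuse

# Online phase: at QA time the LLM is fed the 32 cached summary embeddings plus
# the question, instead of re-processing the full document context. That token
# reduction is where the speed-up and cost reduction come from.
print(cached_summary.shape)                          # torch.Size([1, 32, 512])
```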
Check out Megalodon, a new alternative architecture to Transformers: - head-to-head comparison at the scale of 7B parameters and 2T tokens showing lower perplexity - unlimited context length - constant KV cache at inference. Exciting work by @MaxMa1987 @violet_zct @_xiaomengy_ Checkpoints available soon! https://t.co/ANBvICY2ta
Thrilled to be part of the #Megalodon team, which built an effective model for handling unlimited context lengths efficiently in LLMs, both in training and inference. The model was rigorously compared to LLaMA with a *7B* model and *2T* tokens, de-risked for large-scale training. Congrats to the team! https://t.co/jwNTToxntz
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head… https://t.co/0rgjJ9qDea
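The recurring claims above (unlimited context length, constant KV cache at inference) follow from chunk-wise processing: attention is confined to fixed-size chunks, so the inference-time cache never grows with total sequence length. The toy sketch below illustrates only that constant-memory property and is not the Megalodon architecture itself, which additionally relies on gated attention with a complex exponential moving average to carry information across chunks; the chunk size, dimensions, and `step` function here are hypothetical.

```python
import torch

CHUNK = 4          # fixed chunk size: the KV cache holds at most CHUNK entries
D = 8              # toy hidden size

k_cache, v_cache = [], []

def step(q, k, v):
    """Process one new token, attending only within the current chunk."""
    k_cache.append(k)
    v_cache.append(v)
    K = torch.stack(k_cache)                      # (<=CHUNK, D)
    V = torch.stack(v_cache)
    attn = torch.softmax(q @ K.T / D ** 0.5, dim=-1)
    out = attn @ V                                # (D,)
    if len(k_cache) == CHUNK:                     # chunk boundary: cache resets,
        k_cache.clear()                           # so memory stays constant no
        v_cache.clear()                           # matter how long the stream is
    return out

# Stream 10 tokens; peak KV memory is O(CHUNK), independent of stream length.
for _ in range(10):
    q, k, v = torch.randn(D), torch.randn(D), torch.randn(D)
    _ = step(q, k, v)
```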




