[CL] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? J Lee, A Chen, Z Dai, D Dua... [Google DeepMind] (2024) https://t.co/31IR2MfoBl - Long-context language models (LCLMs) have shown promise in revolutionizing AI by eliminating the need for specialized retrieval and database systems. https://t.co/Zi9gZUUTEt
Can long-context models replace retrievers, RAG & SQL? We evaluate them on smaller-scale versions of these tasks and compare them to specialized models in the same settings. We found that *prompting* LLMs performs surprisingly well, generalizing across text, multimodal & other settings! https://t.co/1c15QaztDs
Ever wondered if long-context language models can also master image, video, and multimodal retrieval? 🌟 Dive into our latest work LOFT! We benchmarked various long-context language models on million-token-scale retrieval, RAG, and SQL tasks across text, vision, and audio 🚀 #AI… https://t.co/SSMI2csiCf

Google DeepMind has introduced a new benchmark called LOFT to evaluate long-context language models (LCLMs) on tasks that have traditionally required specialized systems. LOFT consists of six long-context task categories, including retrieval, retrieval-augmented generation (RAG), multi-hop reasoning, and SQL, with real-world tasks requiring up to a million tokens of context and multimodal retrieval spanning text, vision, and audio. The study finds that frontier LCLMs such as Gemini 1.5 Pro, GPT-4o, and Claude 3 Opus can rival state-of-the-art retrieval and RAG systems, but they still struggle with the complex compositional reasoning needed for SQL-like tasks. On smaller-scale versions of these tasks, simply prompting LCLMs performs surprisingly well and generalizes across settings, suggesting that long-context models may eventually subsume the need for specialized retrieval pipelines.
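To make the "just prompt the LCLM" idea concrete, here is a minimal sketch of corpus-in-context-style prompting as described above: the entire corpus is packed into a single prompt with explicit document IDs, and the model is asked to answer while citing the supporting ID. The function name, prompt wording, and toy documents are illustrative assumptions, not LOFT's actual prompt template.

```python
# Sketch of corpus-in-context prompting: put the whole corpus in the prompt
# and ask the model to answer the query citing a document ID.
# The prompt wording and documents below are hypothetical, not LOFT's template.

def build_cic_prompt(corpus: dict[str, str], query: str) -> str:
    """Assemble one long-context prompt: instructions, the full corpus
    with explicit IDs, then the query."""
    parts = [
        "You are given a corpus of documents. Answer the query and cite "
        "the ID of the document that supports your answer.",
        "",
    ]
    for doc_id, text in corpus.items():
        parts.append(f"[{doc_id}] {text}")  # one line per document, ID first
    parts.append("")
    parts.append(f"Query: {query}")
    return "\n".join(parts)


# Toy two-document corpus; a real LOFT task would pack up to ~1M tokens here.
corpus = {
    "doc_001": "The Eiffel Tower is located in Paris.",
    "doc_002": "Mount Fuji is the tallest mountain in Japan.",
}
prompt = build_cic_prompt(corpus, "Where is the Eiffel Tower?")
```

The resulting string would be sent to a long-context model in a single call; retrieval then happens implicitly inside the model's attention rather than in an external retriever.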