NVIDIA has introduced the UltraLong-8B series of language models, which extend Llama-3.1-8B to context windows of 1 million, 2 million, and 4 million tokens. Developed in collaboration with researchers from the University of Illinois Urbana-Champaign (UIUC), the models target long-context language tasks while preserving strong performance on standard benchmarks such as MMLU, GSM8K, and HumanEval. The UltraLong-8B models are open-sourced and can be run locally. The release fits a broader trend in artificial intelligence: enterprises want longer context windows for improved usability and performance, even as larger contexts bring increased latency and cost.
A look at the limits of large-context LLMs, as their increased latency, higher costs, and reduced usability result in diminishing returns for enterprises (VentureBeat) https://t.co/cRYgTJkqT7 https://t.co/HiawNsi5i4 https://t.co/ZOzeer2dpR
LLMs are still limited by short context windows. @NVIDIAAI just open-sourced models that extend Llama-3.1-8B to 4M tokens while maintaining strong scores on MMLU, GSM8K, and HumanEval. Efficient, reproducible, and strong across both short and long contexts. 🧵 https://t.co/n1Df1wmeMh
Llama-3.1-8B-UltraLong: a 4M-token context window, roughly 30× the 128K of the base model. It handles full-document, multi-document, and long-reasoning tasks with no chunking. To run the @NVIDIAAI model locally, click "Use this model" on @huggingface and select Jan: https://t.co/ovK0PYWZ5t
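For those who prefer a script over the Jan app, here is a minimal sketch of loading the 4M variant with Hugging Face transformers. The repo id nvidia/Llama-3.1-8B-UltraLong-4M-Instruct and the input file long_report.txt are assumptions, not confirmed by the source, and fully using a multi-million-token window requires far more GPU memory than the toy setup shown here.

```python
# Minimal sketch: load an UltraLong model and summarize a whole document in one pass.
# Assumes the Hugging Face repo id below; adjust to the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-8B-UltraLong-4M-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 8B weights on a GPU
    device_map="auto",           # let accelerate place layers across available devices
)

# Feed an entire document as one prompt; no chunking is needed with a 4M-token window.
with open("long_report.txt") as f:  # hypothetical input file
    document = f.read()

messages = [
    {"role": "user", "content": f"Summarize the key findings:\n\n{document}"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern works for the 1M and 2M variants by swapping the repo id; the chat-template call formats the prompt in the Llama-3.1 instruction style the fine-tuned checkpoints expect.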