On October 1, 2024, NVIDIA announced the release of NVLM-1.0-D-72B, an open-source frontier-class multimodal large language model (LLM) with a decoder-only architecture. The model achieves state-of-the-art results on vision-language tasks, rivaling leading proprietary models such as GPT-4o as well as open-access models such as Llama 3-V 405B and InternVL 2. NVLM-D-72B also performs strongly on text-only tasks, with math and coding results comparable to Llama 3.1 405B. The model and inference scripts are available on Hugging Face, and inference can be run with the latest version of transformers. A research paper detailing the model is also available. NVIDIA plans to release training code and two additional models, NVLM-1.0-X and NVLM-1.0-H, in the near future.
Nvidia NVLM is a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models and open-access models (e.g., Llama 3-V 405B and InternVL 2). https://t.co/iInwudtUw8
Nvidia details NVLM 1.0, a family of LLMs that is led by the 72B parameter NVLM-D-72B and can handle vision and language tasks while enhancing text-only tasks (@michaelfnunez / VentureBeat) https://t.co/kWziB4T3WZ https://t.co/hTCCeE1zrK
Nvidia just dropped a bombshell: Its new AI model is open, massive, and ready to rival GPT-4: Nvidia has released NVLM 1.0, a powerful open-source AI model that rivals GPT-4 and Google’s systems, marking a major breakthrough in… https://t.co/PxWbIsgyRL #AI #Automation