Oct 1, 08:03 PM

NVIDIA Releases Open-Source NVLM-D-72B Multimodal LLM with SOTA Performance, Available on Hugging Face

On October 1, 2024, NVIDIA announced the release of NVLM-1.0-D-72B, an open-source, frontier-class multimodal large language model (LLM) with a decoder-only architecture. The model achieves state-of-the-art results on vision-language tasks, rivaling leading proprietary models such as GPT-4o and open-access models such as Llama 3-V 405B and InternVL 2, while remaining strong on text-only tasks. In math and coding evaluations, NVLM-D-72B performs comparably to Llama 3.1 405B. The model and inference scripts are available on Hugging Face, and inference can be run with the latest version of the transformers library. A research paper detailing the model is also available. NVIDIA plans to release training code and two additional models, NVLM-1.0-X and NVLM-1.0-H, in the near future.

#NVIDIA #Llama #InternVL #NVLM #Hugging Face

Written with ChatGPT.
