NVIDIA has launched Llama Nemotron Nano VL, an 8-billion-parameter vision-language model optimized for advanced document understanding tasks. The model holds the top position on the OCRBench V2 (English) leaderboard, a comprehensive bilingual benchmark featuring four times more tasks than its predecessor. Llama Nemotron Nano VL excels at extracting diverse information from complex documents, including tables, charts, diagrams, and video frames, while running efficiently on a single GPU. In addition, NVIDIA's Parakeet-TDT-0.6B-v2 speech AI model currently ranks first on the Hugging Face ASR leaderboard, underscoring the company's advances in both document processing and speech recognition.
Our NVIDIA Parakeet-TDT-0.6B-v2 is currently #1 on the @huggingface ASR leaderboard 🏆, alongside four other top-ranking Parakeet models. 🦜 Explore how these #opensource speech AI models are setting new benchmarks for accuracy, speed, and versatility. Tech blog with details https://t.co/A8uEjGxFNL
NVIDIA just released Llama-Nemotron-Nano-VL-8B-V1, an 8B vision model that reads dense documents, charts, and video frames. It's #1 on OCRBench V2 (English), with layout and OCR fused end-to-end. https://t.co/Hg5mYYYgu6
NVIDIA AI Releases Llama Nemotron Nano VL: A Compact Vision-Language Model Optimized for Document Understanding NVIDIA has introduced Llama Nemotron Nano VL, a vision-language model (VLM) designed to address document-level understanding tasks with efficiency and precision. Built https://t.co/RZwjFuzYdC