Cohere has launched Aya Vision, a multimodal AI model that understands text and images in 23 languages. Available on Hugging Face in 8B- and 32B-parameter versions, the model stands out for its efficiency, outperforming much larger competitors on vision-language tasks while using fewer resources. It uses dynamic image resizing and pixel-shuffle downsampling to make image processing more efficient. Aya Vision handles tasks such as image captioning, answering questions about images, and multilingual translation. The release aims to make vision-language breakthroughs accessible to the research community, and just two days after launch the model was already trending on Hugging Face. Cohere also partnered with Kaggle to distribute the open weights, further underscoring its commitment to research accessibility.
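For context, here is a minimal sketch of how an open-weights vision-language model like this is typically queried through the Hugging Face transformers `image-text-to-text` pipeline. The checkpoint id (`CohereForAI/aya-vision-8b`), the example image URL, and the generation parameters are assumptions, not confirmed details from this announcement; check the model card on Hugging Face for exact usage.

```python
# Minimal sketch: multilingual visual question answering with a
# Hugging Face vision-language checkpoint. Assumptions: the repo id
# "CohereForAI/aya-vision-8b" and the standard "image-text-to-text"
# chat-message format; verify both against the official model card.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="CohereForAI/aya-vision-8b",  # assumed checkpoint id
)

# Chat-style input: one user turn containing an image plus a prompt.
# Aya Vision is multilingual, so the prompt can be in any of its
# 23 supported languages.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
            {"type": "text", "text": "Décris cette image en une phrase."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=200)
print(outputs)
```

The same call pattern covers the captioning and image-question-answering use cases mentioned above; only the text portion of the message changes.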
Just 2 days after launch, Aya Vision is trending on @huggingface 🔥🔥 We launched open-weights with the goal of making VLM breakthroughs accessible to the research community - so exciting to see such a positive response. https://t.co/GQLAlTyrov https://t.co/0ZuEzNuHmv
We are very excited to partner with the team at @kaggle in releasing Aya Vision as open-weights for the research community. 🎉 It’s been a pleasure working with the Kaggle team to make this happen. 🌍 Available here: https://t.co/9heiphlHFZ https://t.co/dGG6BZzWv4