
OpenVLA, a new open-source vision-language-action (VLA) model, has been released. Built on a Llama 2 backbone with fused DINOv2 and SigLIP visual features, OpenVLA has 7 billion parameters and is trained on 970,000 robot episodes from the Open X-Embodiment dataset. It outperforms existing models like RT-2-X and Octo in zero-shot evaluations while being nearly 10 times smaller than RT-2-X. The model supports efficient inference and fine-tuning on a single GPU via quantization and LoRA. OpenVLA's code, data, and weights are fully available online, including a PyTorch codebase and models on HuggingFace, making it a significant step forward in accessible large-scale robotic learning. The project is expected to drive advancements in both academic and industry settings.
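Since the weights are published on HuggingFace, single-GPU inference looks roughly like the sketch below. This follows the pattern of the public model card rather than anything quoted above: the hub ID `openvla/openvla-7b`, the custom `predict_action` method, the prompt template, and the `bridge_orig` un-normalization key come from the repository's `trust_remote_code` extensions and should be checked against the current release; the image path and instruction are placeholders.

```python
# Minimal sketch: load the OpenVLA-7B checkpoint and query it for a robot action.
# Assumes the HuggingFace model card API; verify against the released repo.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "openvla/openvla-7b"

# Processor handles both image preprocessing and prompt tokenization.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# Optional 4-bit quantization (bitsandbytes) so the 7B model fits on a single consumer GPU.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,  # omit for full bfloat16 inference
    device_map="auto",
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

# A single third-person camera frame and a language instruction (placeholder values).
image = Image.open("observation.png").convert("RGB")
instruction = "pick up the red block"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# predict_action decodes a 7-DoF end-effector action, un-normalized with the action
# statistics of the named training mix ("bridge_orig" = BridgeData V2 in this example).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
```

Fine-tuning on a new robot setup follows the same single-GPU theme: the released codebase provides LoRA-based adaptation so the full 7B model never needs to be updated in full precision.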
[RO] OpenVLA: An Open-Source Vision-Language-Action Model https://t.co/eCfjgsTeqB - OpenVLA is a 7B parameter open-source vision-language-action model (VLA) trained on 970k robot episodes from the Open X-Embodiment dataset. - It sets a new state-of-the-art for generalist robot… https://t.co/79S0R0eJx4
The OpenVLA project is finally out! Robotics has also been revolutionized by foundation models, but until now, the field did not have open access to any high-quality ones to build on top of. I believe this project will open the door for academic and industry advances in robotics. https://t.co/RM68Ck8Svg
Really excited to share OpenVLA! - state-of-the-art robotic foundation model - outperforms RT-2-X in our evals, despite being nearly 10x smaller - code + data + weights open-source Webpage: https://t.co/Y0XU6kX3hl https://t.co/wqQbgG5z8I
