🏷️:Latent Action Pretraining from Videos 🔗:https://t.co/izL7pAUOiw https://t.co/kjLi3Kb5jy
[RO] Latent Action Pretraining from Videos S Ye, J Jang, B Jeon, S Joo... [Microsoft Research & NVIDIA] (2024) https://t.co/qP2Et5w9D2 https://t.co/rM396ivSWK
Check out our new work on learning latent action spaces as an effective unsupervised pretraining mechanism for Vision-Language-Action models! https://t.co/bUXkkQNmn3
Recent work spans several directions. Optima targets both effectiveness and efficiency for LLM-based multi-agent systems by improving communication and inference scaling laws. LAPA (Latent Action Pretraining from Videos), developed by Microsoft Research and NVIDIA, learns latent actions from internet-scale videos that carry no robot action labels; the resulting Vision-Language-Action model outperforms state-of-the-art models trained with ground-truth robotic action labels on real-world manipulation tasks, and its pretraining is roughly 30x more efficient than conventional VLA pretraining. TapeAgents is a new framework designed to streamline agent development, with a focus on agent distillation for task-oriented dialogue.
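To make the latent-action idea concrete, here is a minimal PyTorch sketch of the kind of VQ-style quantization LAPA's first stage is built on: a model looks at two consecutive video frames, compresses "what changed" into one of a few discrete codes, and is trained to reconstruct the next frame from the current frame plus that code. This is an illustrative simplification, not the authors' code; all module names, layer sizes, and the loss weighting are assumptions.

```python
# Sketch of latent-action quantization from unlabeled video (hypothetical
# simplification of LAPA stage 1, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionVQ(nn.Module):
    def __init__(self, num_codes=8, dim=32):
        super().__init__()
        # Action encoder: summarize the change between frame_t and frame_{t+1}.
        self.action_enc = nn.Sequential(
            nn.Conv2d(6, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        # Small discrete codebook: each code is one "latent action".
        self.codebook = nn.Embedding(num_codes, dim)
        # Frame encoder/decoder: predict frame_{t+1} from frame_t + action.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
        )
        self.inject = nn.Linear(dim, 64)              # mix action into features
        self.frame_dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, 2, 1),               # 32 -> 64
        )

    def quantize(self, z):
        # Nearest codebook entry with a straight-through gradient estimator.
        dists = torch.cdist(z, self.codebook.weight)   # (B, num_codes)
        idx = dists.argmin(dim=1)                      # discrete latent action
        q = self.codebook(idx)
        q_st = z + (q - z).detach()                    # straight-through
        return q_st, q, idx

    def forward(self, frame_t, frame_t1):
        z = self.action_enc(torch.cat([frame_t, frame_t1], dim=1))
        q_st, q, idx = self.quantize(z)
        h = self.frame_enc(frame_t)
        h = h + self.inject(q_st)[:, :, None, None]    # broadcast action code
        pred = self.frame_dec(h)
        recon = F.mse_loss(pred, frame_t1)
        # VQ losses: pull codebook toward encoder outputs, and commit encoder
        # outputs to their assigned codes.
        vq = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
        return recon + vq, idx

model = LatentActionVQ()
f_t = torch.rand(4, 3, 64, 64)    # dummy consecutive video frames
f_t1 = torch.rand(4, 3, 64, 64)
loss, latent_actions = model(f_t, f_t1)
loss.backward()
print(latent_actions)             # e.g. tensor([3, 0, 3, 5])
```

Once trained, the discrete codes act as pseudo action labels: a VLM can be pretrained to predict them from image-plus-instruction inputs with no teleoperation data at all, and a small labeled robot dataset is then enough to map latent actions onto real robot actions, which is where the efficiency gain over conventional VLA pretraining comes from.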