ViPRA: Video Prediction for Robot Actions
Positive | Artificial Intelligence
ViPRA, a new framework for robot learning, addresses the challenge of training robots from actionless videos, which are abundant but lack the action labels that supervised methods require. A video-language model is trained to predict future visual observations together with latent actions, which serve as intermediate representations of scene dynamics. These latent actions can then be grounded into real robot commands with only 100 to 200 teleoperated demonstrations, sharply reducing the need for costly action annotations. The framework enables smooth, high-frequency continuous control at up to 22 Hz and outperforms strong baselines, with a 16% improvement on the SIMPLER benchmark and a 13% gain on real-world manipulation tasks. ViPRA also generalizes across different robot embodiments, marking a notable step forward in robotic control and learning.
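The two-stage idea described above can be sketched in miniature. The snippet below is purely illustrative and is not the paper's implementation: all dimensions, the use of least squares in place of a learned action head, and the synthetic data are assumptions. It shows the shape of the pipeline, where a pretrained video model emits latent actions and a small decoder, fit on a modest number of teleoperated demonstrations, maps each latent action to a chunk of continuous control commands.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
LATENT_DIM = 8   # size of one latent action vector
ACTION_DIM = 7   # e.g., 6-DoF end-effector delta + gripper
CHUNK = 5        # continuous actions decoded per latent action

# Stage 1 (assumed done elsewhere): a video-language model pretrained on
# actionless video emits latent actions. Here we fake its outputs for a
# handful of teleoperated demos.
n_demos = 150  # within the 100-200 demos the summary mentions
latents = rng.normal(size=(n_demos, LATENT_DIM))

# Synthetic ground-truth action chunks from teleoperation (noisy linear map).
true_map = rng.normal(size=(LATENT_DIM, ACTION_DIM * CHUNK))
actions = latents @ true_map + 0.01 * rng.normal(size=(n_demos, ACTION_DIM * CHUNK))

# Stage 2: fit a small decoder from latent actions to action chunks.
# Least squares stands in for the learned action head.
decoder, *_ = np.linalg.lstsq(latents, actions, rcond=None)

# Decode a new latent action into a chunk of continuous commands,
# which a controller could then replay at high frequency.
chunk = (latents[:1] @ decoder).reshape(CHUNK, ACTION_DIM)
print(chunk.shape)  # → (5, 7)
```

Decoding several low-rate latent actions into short chunks of commands is one way such a system can reach high control frequencies from a slower video model.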
— via World Pulse Now AI Editorial System