X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale

arXiv cs.CV | Friday, December 5, 2025 at 5:00:00 AM
  • X-Humanoid is a generative video editing approach for embodied AI that transforms human videos into humanoid robot videos at scale. It adapts the Wan 2.2 model to the human-to-humanoid translation task, producing data suited to training intelligent robots. A scalable data creation pipeline generates over 17 hours of paired human-humanoid videos, supplying the diverse training data the task requires.
  • Robotized human videos give humanoid robots a larger pool of data for policy training. Existing methods mainly overlay robot arms on egocentric videos; X-Humanoid instead aims to handle complex full-body motion and scene occlusions, broadening the environments in which humanoid robots can be applied.
  • X-Humanoid arrives alongside ongoing work on video generative models and interactive world simulation, part of a trend toward AI systems capable of real-time interaction and control. The use of technologies such as Unreal Engine and interactive world models reflects the growing overlap between AI and robotics, with the aim of improving user experience and operational efficiency in virtual environments.
— via World Pulse Now AI Editorial System

Continue Reading
RELIC: Interactive Video World Model with Long-Horizon Memory
RELIC is an interactive world model that combines real-time long-horizon streaming, consistent spatial memory, and precise user control, capabilities that existing models have struggled to achieve simultaneously.