PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis

arXiv (cs.CV), Monday, October 27, 2025 at 4:00:00 AM
PhysWorld is a framework for building world models of deformable objects, with applications in robotics, virtual reality, and augmented reality. It starts from limited real-world video data and synthesizes realistic, physics-based models of how the objects behave, addressing a central difficulty in creating accurate world models: real footage of deformable objects is scarce. The result should make it easier for developers and researchers to build interactive simulations and other applications that depend on accurate object dynamics.
— via World Pulse Now AI Editorial System

Recommended Readings
STONE: Pioneering the One-to-N Backdoor Threat in 3D Point Cloud
Positive | Artificial Intelligence
Backdoor attacks pose a significant risk to deep learning, particularly in safety-critical 3D applications such as autonomous driving and robotics. Current methods focus primarily on static one-to-one attacks, leaving the more versatile one-to-N backdoor threat largely unaddressed. STONE (Spherical Trigger One-to-N Backdoor Enabling) targets this gap with a configurable spherical trigger that can manipulate multiple output labels while maintaining high accuracy on clean data.
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Positive | Artificial Intelligence
The integration of Large Language Models (LLMs) with 3D vision is revolutionizing robotic perception and autonomy. This approach enhances robotic sensing technologies, allowing machines to understand and interact with complex environments using natural language and spatial awareness. The review discusses the foundational principles of LLMs and 3D data, examines critical 3D sensing technologies, and highlights advancements in scene understanding, text-to-3D generation, and embodied agents, while addressing the challenges faced in this evolving field.
Wave-Former: Through-Occlusion 3D Reconstruction via Wireless Shape Completion
Positive | Artificial Intelligence
Wave-Former is a new method for high-accuracy 3D shape reconstruction of completely occluded everyday objects. Utilizing millimeter-wave (mmWave) wireless signals, it can penetrate common obstructions and reflect off hidden items. Unlike previous methods that faced limitations in coverage and noise, Wave-Former employs a physics-aware shape completion model to infer full 3D geometry. Its innovative three-stage pipeline connects raw wireless signals with advancements in vision-based shape completion, enhancing applications in robotics, augmented reality, and logistics.
Understanding World or Predicting Future? A Comprehensive Survey of World Models
Neutral | Artificial Intelligence
The article discusses the growing interest in world models, particularly in the context of advancements in multimodal large language models like GPT-4 and video generation models such as Sora. It provides a comprehensive review of the literature on world models, which serve to either understand the current state of the world or predict future dynamics. The review categorizes world models based on their functions: constructing internal representations and predicting future states, with applications in generative games, autonomous driving, robotics, and social simulacra.
Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature Interactions
Positive | Artificial Intelligence
Higher-order Neural Additive Models (HONAMs) extend Neural Additive Models (NAMs), which are known for their predictive performance and interpretability. HONAMs address a limitation of NAMs by capturing feature interactions of arbitrary order, which improves predictive accuracy while preserving the interpretability that high-stakes applications require. The source code for HONAM is publicly available on GitHub.
Bridging Hidden States in Vision-Language Models
Positive | Artificial Intelligence
Vision-Language Models (VLMs) are emerging models that integrate visual content with natural language. Current methods typically fuse data either early in the encoding process or late through pooled embeddings. This paper introduces a lightweight fusion module utilizing cross-only, bidirectional attention layers to align hidden states from both modalities, enhancing understanding while keeping encoders non-causal. The proposed method aims to improve the performance of VLMs by leveraging the inherent structure of visual and textual data.
SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?
Neutral | Artificial Intelligence
SurfaceBench is a new benchmark for symbolic surface discovery in machine learning. It addresses the challenge of equation discovery from data, which is crucial for understanding complex physical and geometric phenomena. SurfaceBench includes 183 tasks across 15 categories of symbolic complexity, featuring various equation representation forms and synthetic three-dimensional data. It aims to improve on existing benchmarks, which often focus on scalar functions and rely on inadequate metrics.
Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning
Positive | Artificial Intelligence
This paper introduces Bias-REstrained Prefix Representation FineTuning (BREP ReFT), a method that aims to improve the mathematical reasoning of models by addressing the limitations of existing Representation Finetuning (ReFT) methods, which struggle with mathematical tasks. In extensive experiments, BREP ReFT outperforms both standard ReFT and weight-based Parameter-Efficient Finetuning (PEFT) methods.