Wave-Former: Through-Occlusion 3D Reconstruction via Wireless Shape Completion

arXiv cs.CV, Wednesday, November 19, 2025 at 5:00:00 AM


Recommended Readings
RISE: Single Static Radar-based Indoor Scene Understanding
The paper introduces RISE, a benchmark and system for indoor scene understanding using a single static radar. Optical sensors struggle with occlusions and raise privacy concerns, while mmWave radar preserves privacy but suffers from low spatial resolution. RISE exploits multipath reflections, which are usually treated as noise, to extract geometric information. Its Bi-Angular Multipath Enhancement technique models both the angle of arrival and the angle of departure to recover ghost reflections, improving the detection of structures the radar cannot see directly and the accuracy of layout reconstruction.
GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
The Geometry-guided Multi-View Diffusion Model (GeoMVD) is proposed to improve multi-view image generation, addressing the difficulty of maintaining cross-view consistency while producing high-resolution outputs. The model extracts geometric information, including depth maps and normal maps, to generate images that are structurally consistent and rich in detail. Potential applications include 3D reconstruction and augmented reality.
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
This review examines how integrating Large Language Models (LLMs) with 3D vision is advancing robotic perception and autonomy, enabling machines to understand and interact with complex environments through natural language and spatial awareness. It covers the foundational principles of LLMs and 3D data, surveys key 3D sensing technologies, and highlights progress in scene understanding, text-to-3D generation, and embodied agents, while discussing the open challenges in the field.
STONE: Pioneering the One-to-N Backdoor Threat in 3D Point Cloud
Backdoor attacks pose a significant risk to deep learning, particularly in safety-critical 3D applications such as autonomous driving and robotics. Existing methods focus mainly on static one-to-one attacks, leaving the more versatile one-to-N backdoor threat largely unexplored. STONE (Spherical Trigger One-to-N Backdoor Enabling) addresses this gap with a configurable spherical trigger that can map inputs to multiple target labels while maintaining high accuracy on clean data.
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
The article presents Uni-Hand, a universal hand motion forecasting framework for egocentric views. It addresses limitations of existing hand trajectory prediction methods, such as insufficient prediction targets and entangled hand and head motion. Using multi-modal inputs and vision-language fusion, the framework forecasts hand waypoints in both 2D and 3D space with the aim of improving prediction accuracy, targeting applications in augmented reality and human-robot interaction.
Understanding World or Predicting Future? A Comprehensive Survey of World Models
Interest in world models has grown alongside advances in multimodal large language models such as GPT-4 and video generation models such as Sora. This survey reviews the literature on world models, which serve either to understand the current state of the world or to predict its future dynamics. It categorizes world models by function, namely constructing internal representations and predicting future states, and covers applications in generative games, autonomous driving, robotics, and social simulacra.
TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types
TEyeD is presented as the world's largest unified public dataset of eye images, with over 20 million images recorded using seven different head-mounted eye trackers, including devices integrated into virtual and augmented reality systems. The recordings cover a variety of activities, such as car rides and sports, and include detailed annotations such as 2D and 3D landmarks, semantic segmentations, and gaze vectors. The dataset is intended to support research in computer vision, eye tracking, and gaze estimation.
SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?
The article introduces SurfaceBench, a benchmark for symbolic surface discovery, that is, recovering the equations of 3D surfaces from data, a task central to understanding complex physical and geometric phenomena. SurfaceBench comprises 183 tasks across 15 categories of symbolic complexity, spanning several equation representation forms and synthetic three-dimensional data. It aims to improve on existing benchmarks, which often focus on scalar functions and rely on inadequate evaluation metrics.