arXiv:2509.11171v2 Announce Type: replace 
Abstract: Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, assessing voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture physical regularities for realistic geometric details. On the other hand, neural reconstruction methods like NeRF and 3DGS demonstrate superior physical awareness, but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations for joint exploitation of semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels as anchors to guide efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promote semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE. The code is available at https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025.

The SPHERE framework, introduced in a recent arXiv publication, targets camera-based 3D Semantic Scene Completion (SSC) by integrating voxel and Gaussian representations. It addresses two separate limitations of existing work: voxel-based and plane-based SSC methods struggle to capture realistic geometric details, while neural reconstruction methods such as NeRF and 3DGS incur high computational cost and converge slowly on large-scale driving scenes, which hurts their semantic accuracy. According to the authors, extensive experiments on the SemanticKITTI and SSCBench-KITTI-360 benchmarks validate SPHERE's effectiveness.
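To make the "focal voxels as anchors" idea from the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (their code is in the linked repository). The function name, the focal score (occupancy probability times semantic confidence), the `top_k` selection, and the `voxel_size` default are all assumptions chosen for illustration.

```python
import numpy as np

def init_gaussians_from_focal_voxels(occupancy, semantic_logits,
                                     voxel_size=0.2, top_k=4):
    """Hypothetical sketch: pick high-scoring voxels and seed Gaussians there.

    occupancy:       (X, Y, Z) array of occupancy probabilities
    semantic_logits: (X, Y, Z, C) array of per-class logits
    """
    # Numerically stable softmax over the class axis.
    shifted = semantic_logits - semantic_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Assumed focal score: occupied voxels with confident semantics rank highest.
    score = occupancy * probs.max(axis=-1)

    # Indices of the top_k voxels, highest score first.
    flat_idx = np.argsort(score.ravel())[::-1][:top_k]
    ijk = np.stack(np.unravel_index(flat_idx, score.shape), axis=-1)

    # One Gaussian per focal voxel: mean at the voxel center,
    # isotropic scale of half a voxel, label from the argmax class.
    means = (ijk + 0.5) * voxel_size
    scales = np.full((len(ijk), 3), voxel_size / 2.0)
    labels = probs.reshape(-1, probs.shape[-1])[flat_idx].argmax(axis=-1)
    return means, scales, labels
```

Seeding Gaussians only at selected voxels, rather than densely across the grid, is one plausible reading of how anchoring could keep initialization efficient. The paper's actual selection criterion may differ.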

SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion
