SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors

arXiv — cs.CV · Monday, November 24, 2025 at 5:00:00 AM
  • SING3R-SLAM is a newly proposed Gaussian-based dense RGB SLAM framework that integrates locally consistent 3D reconstructions into a global Gaussian representation, addressing drift and redundant point maps. The approach enables efficient and versatile 3D mapping for a range of applications while improving scene geometry and camera pose accuracy.
  • The development of SING3R-SLAM is significant as it promises to improve the efficiency of SLAM systems, which are crucial for applications in robotics, augmented reality, and virtual reality. By refining local and global mapping processes, it aims to enhance the overall performance of 3D reconstruction tasks.
  • This advancement aligns with ongoing efforts in the field of computer vision to enhance multi-view image generation and maintain cross-view consistency. The integration of geometric information extraction techniques, as seen in related models, highlights a growing trend towards improving the accuracy and quality of 3D reconstructions, which is essential for the future of immersive technologies.
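The core idea summarized above, folding locally consistent submaps into one global map while avoiding redundant point growth, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual method: the function name, the rigid-pose registration, and the voxel-grid deduplication are all assumptions standing in for SING3R-SLAM's Gaussian fusion.

```python
import numpy as np

def merge_submap(global_pts, submap_pts, pose, voxel=0.05):
    """Transform a local submap into the world frame and merge it into the
    global point set, keeping one point per voxel so overlapping submaps
    do not inflate the map. (Illustrative sketch, not the paper's method.)"""
    R, t = pose                            # 3x3 rotation, 3-vector translation
    world = submap_pts @ R.T + t           # local -> world frame
    merged = np.vstack([global_pts, world])
    # Voxel-grid deduplication: quantize coordinates, keep first hit per cell.
    keys = np.floor(merged / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return merged[np.sort(idx)]
```

Merging the same submap twice under the identity pose leaves the global map unchanged, which is the redundancy-control behavior the summary alludes to; a real Gaussian SLAM system would additionally fuse per-point attributes (covariance, opacity, color) rather than simply discarding duplicates.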
— via World Pulse Now AI Editorial System


Continue Reading
SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation
Positive · Artificial Intelligence
SegSplat has been introduced as a novel framework that combines rapid, feed-forward 3D reconstruction with open-vocabulary semantic understanding. It constructs a compact semantic memory bank from multi-view 2D features and predicts discrete semantic indices alongside geometric attributes for each 3D Gaussian in a single pass, enhancing the efficiency of scene semantic integration.
CUPID: Generative 3D Reconstruction via Joint Object and Pose Modeling
Positive · Artificial Intelligence
Cupid has been introduced as a generative 3D reconstruction framework that models the distribution of canonical objects and camera poses. This two-stage flow-based model generates a coarse 3D structure and estimates camera poses, followed by a refinement stage that integrates pixel-aligned image features, achieving superior performance compared to existing methods.