arXiv:2511.15706v2 Announce Type: replace 
Abstract: Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold-standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly for many hard real-world scenarios, and high-precision models are often slow, limiting their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated diverse training distribution, enables our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline, and at the same time, significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model along with multiple other insights to make the model more robust and unbiased. In our extensive set of experiments we show that the resulting novel matcher sets a new state-of-the-art, being significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2

The paper 'RoMa v2: Harder Better Faster Denser Feature Matching' advances dense feature matching, which estimates all correspondences between two images of a 3D scene. The authors introduce a new matching architecture and loss that, combined with a curated, diverse training distribution, lets the model handle many complex matching tasks. A decoupled two-stage matching-then-refinement pipeline speeds up training, a custom CUDA kernel reduces refinement memory usage, and the DINOv3 foundation model is used to make the matcher more robust.
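As a rough illustration of what the matching stage of a dense matcher computes, the sketch below assigns every descriptor in image A its most similar descriptor in image B by cosine similarity. This is a plain nearest-neighbour lookup, not the RoMa v2 architecture, and the names `normalize` and `coarse_match` are hypothetical. A refinement stage would then adjust these coarse matches to sub-cell precision.

```python
import math

def normalize(v):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def coarse_match(feats_a, feats_b):
    """For every descriptor in A, return (best index in B, cosine similarity).

    feats_a, feats_b: lists of equal-length descriptors, one per grid cell.
    Hypothetical helper for illustration only.
    """
    a = [normalize(f) for f in feats_a]
    b = [normalize(f) for f in feats_b]
    matches = []
    for fa in a:
        # Cosine similarity reduces to a dot product after normalization.
        sims = [sum(x * y for x, y in zip(fa, fb)) for fb in b]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches

# Toy 2-D descriptors standing in for per-cell image features.
feats_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
feats_b = [[0.0, 1.0], [3.0, 3.0], [5.0, 0.0]]
matches = coarse_match(feats_a, feats_b)
```

The lookup compares every cell in A with every cell in B, so its cost grows with the product of the two grid sizes. That is one reason to run such a step at coarse resolution.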

RoMa v2: Harder Better Faster Denser Feature Matching

arXiv:2601.08078v1 Announce Type: new 
Abstract: Deep learning-based automatic medical image segmentation plays a critical role in clinical diagnosis and treatment planning but remains challenging in few-shot scenarios due to the scarcity of annotated training data. Recently, self-supervised foundation models such as DINOv3, which were trained on large natural image datasets, have shown strong potential for dense feature extraction that can help with the few-shot learning challenge. Yet, their direct application to medical images is hindered by domain differences. In this work, we propose DINO-AugSeg, a novel framework that leverages DINOv3 features to address the few-shot medical image segmentation challenge. Specifically, we introduce WT-Aug, a wavelet-based feature-level augmentation module that enriches the diversity of DINOv3-extracted features by perturbing frequency components, and CG-Fuse, a contextual information-guided fusion module that exploits cross-attention to integrate semantic-rich low-resolution features with spatially detailed high-resolution features. Extensive experiments on six public benchmarks spanning five imaging modalities, including MRI, CT, ultrasound, endoscopy, and dermoscopy, demonstrate that DINO-AugSeg consistently outperforms existing methods under limited-sample conditions. The results highlight the effectiveness of incorporating wavelet-domain augmentation and contextual fusion for robust feature representation, suggesting DINO-AugSeg as a promising direction for advancing few-shot medical image segmentation. Code and data will be made available on https://github.com/apple1986/DINO-AugSeg.

A novel framework named DINO-AugSeg has been proposed to enhance few-shot medical image segmentation by leveraging DINOv3-based self-supervised features. This approach addresses the challenge of limited annotated training data in clinical settings, utilizing wavelet-based feature-level augmentation and contextual information-guided fusion to improve segmentation accuracy across various imaging modalities such as MRI and CT.
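To make the idea of perturbing frequency components at the feature level concrete, here is a minimal sketch using a one-level Haar transform on a 1-D feature vector. It is not the WT-Aug module: the wavelet choice, the random scaling of detail coefficients, and the names `haar_forward`, `haar_inverse`, and `wavelet_augment` are all assumptions made for illustration.

```python
import math
import random

def haar_forward(x):
    """One-level Haar transform of an even-length list: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def wavelet_augment(feature, strength=0.1, rng=None):
    """Randomly rescale the high-frequency (detail) coefficients.

    Assumption: only detail coefficients are perturbed, so the
    low-frequency content of the feature is left untouched.
    """
    if rng is None:
        rng = random.Random()
    approx, detail = haar_forward(feature)
    detail = [d * (1.0 + rng.uniform(-strength, strength)) for d in detail]
    return haar_inverse(approx, detail)

feature = [1.0, 3.0, 2.0, 6.0]
augmented = wavelet_augment(feature, strength=0.2, rng=random.Random(0))
```

With `strength=0.0` the function returns the input unchanged, which is a convenient sanity check that the forward and inverse transforms agree.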

Exploiting DINOv3-Based Self-Supervised Features for Robust Few-Shot Medical Image Segmentation

arXiv:2601.00705v2 Announce Type: replace 
Abstract: We introduce RGS-SLAM, a robust Gaussian-splatting SLAM framework that replaces the residual-driven densification stage of GS-SLAM with a training-free correspondence-to-Gaussian initialization. Instead of progressively adding Gaussians as residuals reveal missing geometry, RGS-SLAM performs a one-shot triangulation of dense multi-view correspondences derived from DINOv3 descriptors refined through a confidence-aware inlier classifier, generating a well-distributed and structure-aware Gaussian seed prior to optimization. This initialization stabilizes early mapping and accelerates convergence by roughly 20%, yielding higher rendering fidelity in texture-rich and cluttered scenes while remaining fully compatible with existing GS-SLAM pipelines. Evaluated on the TUM RGB-D and Replica datasets, RGS-SLAM achieves competitive or superior localization and reconstruction accuracy compared with state-of-the-art Gaussian and point-based SLAM systems, sustaining real-time mapping performance at up to 925 FPS. Project page: https://breeze1124.github.io/rgs-slam-project-page/

RGS-SLAM is a Gaussian-splatting SLAM (simultaneous localization and mapping) framework that replaces the residual-driven densification stage of GS-SLAM with a training-free, one-shot dense initialization. It triangulates dense multi-view correspondences derived from DINOv3 descriptors, refined by a confidence-aware inlier classifier, to produce a well-distributed Gaussian seed before optimization. According to the abstract, this stabilizes early mapping and accelerates convergence by roughly 20%.
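The triangulation step can be pictured with a two-view midpoint method: each matched pixel defines a viewing ray, and the 3D seed point is placed halfway between the closest points of the two rays. The abstract does not say which triangulation scheme RGS-SLAM uses, so the sketch below is an assumption. It also assumes the camera centers and ray directions are already expressed in world coordinates, and `triangulate_midpoint` is a hypothetical name.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between rays c1 + t1*d1 and c2 + t2*d2.

    c1, c2: camera centers; d1, d2: ray directions through the matched pixels.
    Returns None when the rays are (nearly) parallel and no stable point exists.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + t1 * di for ci, di in zip(c1, d1)]
    p2 = [ci + t2 * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

# Two cameras one unit apart, both observing the same scene point.
target = [0.5, 0.2, 2.0]
cam1, cam2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
ray1 = [t - c for t, c in zip(target, cam1)]
ray2 = [t - c for t, c in zip(target, cam2)]
point = triangulate_midpoint(cam1, ray1, cam2, ray2)
```

In a pipeline like the one described, a confidence score per correspondence could be used to drop unreliable matches before this step, so that only inlier rays produce seed points.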

