SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors

arXiv — cs.CV · Tuesday, November 25, 2025, 5:00 AM
  • SatSAM2 is a zero-shot satellite video tracker that combines the promptable SAM2 foundation model with Kalman filter motion priors to improve tracking in satellite imagery, particularly under challenging conditions such as occlusion.
  • This is significant because existing satellite video tracking methods often require extensive scenario-specific training and remain prone to errors; SatSAM2 addresses these limitations, improving the reliability and efficiency of satellite monitoring applications.
  • SatSAM2 also reflects a broader trend of adapting foundation models such as SAM2 to specific domains, from remote sensing to surgical video analysis, showcasing their versatility across diverse fields.
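The Kalman prior mentioned above can be illustrated with a minimal constant-velocity filter over an object's image-plane center. The state layout, noise settings, and the idea of coasting on the prediction during occlusion are illustrative assumptions, not SatSAM2's actual implementation.

```python
import numpy as np

# Minimal constant-velocity Kalman filter sketch of the kind of motion prior
# the summary describes. All dimensions and noise values are illustrative
# assumptions; this is not SatSAM2's code.
class ConstantVelocityKF:
    def __init__(self, x0, y0, dt=1.0):
        # State: [x, y, vx, vy] -- object center and velocity in pixels.
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                  # state covariance
        self.F = np.eye(4)                         # transition matrix
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                  # observe position only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                  # process noise
        self.R = np.eye(2) * 1.0                   # measurement noise

    def predict(self):
        # Propagate the motion prior; during occlusion this alone supplies
        # a plausible location at which to prompt the segmenter.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        # Fuse an observed center (e.g. from the tracker's mask) back in.
        z = np.array([zx, zy])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF(100.0, 50.0)
for t in range(1, 5):                  # object drifting right ~2 px/frame
    kf.predict()
    kf.update(100.0 + 2.0 * t, 50.0)
pred = kf.predict()                    # coasting prediction, no measurement
```

After a few observed frames the filter's velocity estimate lets it extrapolate the center past the last measurement, which is what makes such a prior useful when the target is briefly occluded.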
— via World Pulse Now AI Editorial System


Continue Reading
SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation
Positive · Artificial Intelligence
The introduction of SAM3-Adapter marks a significant advancement in the adaptation of the Segment Anything 3 model, specifically targeting challenges in camouflage object segmentation, shadow detection, and medical image segmentation. This new framework aims to enhance the model's performance in these complex scenarios, addressing limitations faced by previous iterations of the technology.
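The mechanism behind adapter-style methods like this can be sketched generically: a small bottleneck module added residually to a frozen backbone layer, so only the adapter's parameters are trained. The dimensions and zero-initialization below are common conventions assumed for illustration, not SAM3-Adapter's actual design.

```python
import numpy as np

# Generic bottleneck-adapter sketch: down-project, nonlinearity, up-project,
# with a residual connection. Zero-initializing the up-projection makes the
# adapter an exact identity before training, so the frozen backbone's
# behavior is preserved at the start. Illustrative assumption, not the
# SAM3-Adapter implementation.
rng = np.random.default_rng(0)

def make_adapter(d_model, d_bottleneck):
    W_down = rng.normal(0.0, 0.02, size=(d_model, d_bottleneck))
    W_up = np.zeros((d_bottleneck, d_model))   # zero init => identity map
    return W_down, W_up

def adapter_forward(x, W_down, W_up):
    h = np.maximum(x @ W_down, 0.0)            # ReLU bottleneck
    return x + h @ W_up                        # residual connection

x = rng.normal(size=(4, 16))                   # 4 tokens, d_model = 16
W_down, W_up = make_adapter(16, 4)
y = adapter_forward(x, W_down, W_up)
```

Because only the two small projection matrices are trainable, this kind of adaptation is far cheaper than fine-tuning the full segmentation backbone.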
SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG
Positive · Artificial Intelligence
A new framework named SVG360 has been introduced, enabling the generation of multi-view Scalable Vector Graphics (SVGs) with geometric and color consistency from a single SVG input. This process involves lifting the rasterized input to a 3D representation, establishing part-level correspondences across views, and optimizing vector paths during conversion.