SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM

arXiv — cs.CV · Wednesday, November 26, 2025
  • A new framework called SAM-MI has been introduced to enhance open-vocabulary semantic segmentation (OVSS) by injecting masks from the Segment Anything Model (SAM) into OVSS models. The framework targets two known obstacles: SAM's tendency to over-segment and the difficulty of pairing its fixed, class-agnostic masks with text labels. To address them, it employs a Text-guided Sparse Point Prompter for faster mask generation and Shallow Mask Aggregation to reduce over-segmentation.
  • The development of SAM-MI is significant as it improves the efficiency and accuracy of semantic segmentation tasks, which are crucial for various applications in computer vision, including object recognition and image analysis. By addressing the limitations of previous methods, SAM-MI positions itself as a valuable tool for researchers and practitioners in the field.
  • This advancement reflects a broader trend in artificial intelligence where models are increasingly being refined to enhance their capabilities in specific tasks. The integration of SAM with other frameworks, such as those focusing on few-shot segmentation and medical image analysis, highlights the ongoing efforts to improve model performance and adaptability across diverse applications.
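The article describes Shallow Mask Aggregation only at a high level. As an illustrative sketch (not the authors' implementation), one way to reduce over-segmentation is to greedily merge mask fragments whose mutual overlap is high; all function names and thresholds below are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two pixel-index sets."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def aggregate_masks(masks, iou_thresh=0.5):
    """Greedily fuse masks whose IoU exceeds iou_thresh.

    `masks` is a list of sets of (row, col) pixel coordinates,
    e.g. fragments produced by an over-segmenting model.
    """
    merged = []
    for m in masks:
        for i, existing in enumerate(merged):
            if iou(m, existing) >= iou_thresh:
                merged[i] = existing | m  # fuse fragments into one region
                break
        else:
            merged.append(set(m))
    return merged

# Two overlapping fragments of one object plus one distinct region.
frags = [
    {(0, 0), (0, 1), (1, 0)},
    {(0, 1), (1, 0), (1, 1)},
    {(5, 5), (5, 6)},
]
result = aggregate_masks(frags, iou_thresh=0.4)
print(len(result))  # 2 regions after aggregation
```

A real implementation would operate on binary mask arrays and might also weigh semantic similarity, but the greedy overlap merge conveys the core idea.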
— via World Pulse Now AI Editorial System


Continue Reading
SAM Guided Semantic and Motion Changed Region Mining for Remote Sensing Change Captioning
Positive · Artificial Intelligence
The recent study introduces a novel approach to remote sensing change captioning by utilizing the Segment Anything Model (SAM) to enhance the extraction of region-level representations and improve the description of changes between two remote sensing images. This method addresses limitations in existing techniques, such as weak region awareness and limited temporal alignment, by integrating semantic and motion-level change regions into the captioning framework.
ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images
Positive · Artificial Intelligence
A new framework named ReSAM has been proposed to enhance the Segment Anything Model (SAM) for remote sensing images, addressing the challenges posed by domain shifts and sparse annotations. This self-prompting, point-supervised method employs a Refine-Requery-Reinforce loop to progressively improve segmentation quality without the need for full-mask supervision. The approach has been evaluated on benchmark datasets including WHU, HRSID, and NWPU VHR-10.
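The Refine-Requery-Reinforce loop is described above only in outline. The toy sketch below illustrates just the refine/requery part under stated assumptions: a stand-in predictor (`toy_predict`, hypothetical) replaces SAM, and the reinforcement step is omitted entirely. None of these names come from the paper.

```python
# A toy stand-in for a promptable segmenter: given point prompts,
# it returns all target pixels within Manhattan distance `r` of any prompt.
TARGET = {(x, y) for x in range(10) for y in range(10) if x + y < 12}

def toy_predict(points, r=3):
    return {p for p in TARGET
            if any(abs(p[0] - q[0]) + abs(p[1] - q[1]) <= r for q in points)}

def refine_requery(seed, rounds=4):
    """Self-prompting loop in the spirit of Refine-Requery:
    predict a mask, place a new point prompt on a pixel the mask
    missed, and query again — no full-mask supervision needed."""
    prompts = [seed]
    mask = toy_predict(prompts)
    for _ in range(rounds):
        frontier = sorted(TARGET - mask)   # pixels the mask missed
        if not frontier:
            break
        prompts.append(frontier[0])        # requery with a new point
        mask |= toy_predict(prompts)       # refine the mask
    return mask

initial = toy_predict([(0, 0)])
final = refine_requery((0, 0))
print(len(initial), len(final))  # the mask grows across rounds
```

In the actual method the "missed" pixels would come from model confidence rather than ground truth, and a reinforcement signal would score each requery; the loop structure is what this sketch shows.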
Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation
Positive · Artificial Intelligence
A new approach named ELE-SAM has been developed to adapt the Segment Anything Model (SAM) for Power Transmission Corridor Hazard Segmentation (PTCHS). This adaptation focuses on improving the segmentation of transmission equipment and surrounding hazards, which is crucial for ensuring the safety of electric power transmission. The method incorporates a Context-Aware Prompt Adapter and a High-Fidelity Mask Decoder to enhance performance in complex backgrounds.
Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos
Positive · Artificial Intelligence
Image Diffusion Models have demonstrated emergent temporal propagation capabilities in videos, showcasing their potential to enhance video generation and editing processes. This development highlights the growing sophistication of AI technologies in visual media.
RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models
Positive · Artificial Intelligence
RADSeg has been introduced as a novel approach to open-vocabulary semantic segmentation (OVSS), leveraging the agglomerative vision foundation model RADIO to enhance performance across multiple metrics, including mean Intersection over Union (mIoU) and computational efficiency. This method addresses the limitations of existing models that either depend on limited training data or require extensive computational resources.
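Mean Intersection over Union (mIoU), the headline metric here, is standard and worth making concrete. A minimal per-class computation over flat label lists (not RADSeg's evaluation code):

```python
def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union, averaged over classes that
    appear in either the prediction or the ground truth.

    `pred` and `gt` are flat sequences of per-pixel class ids.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:                      # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

gt   = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 2]
score = mean_iou(pred, gt, num_classes=3)
print(round(score, 3))  # 0.722
```

Class 0 scores 1.0, class 1 scores 0.5 (one of two pixels recovered), class 2 scores 2/3 (one false positive), giving a mean of about 0.722.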
Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting
Positive · Artificial Intelligence
A new framework named SPROUT has been introduced for nuclear instance segmentation, eliminating the need for training and annotations. This method utilizes histology-informed priors to create slide-specific reference prototypes, which help in aligning features and improving segmentation accuracy in computational pathology.
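Prototype-guided prompting can be made concrete with a small sketch: pixels whose features are sufficiently similar to a reference prototype are selected as prompt locations, with no training involved. The feature vectors, prototype, and threshold below are all hypothetical, not SPROUT's actual values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prototype_prompts(features, prototype, thresh=0.9):
    """Return locations whose feature vector matches the reference
    prototype closely enough to serve as point prompts."""
    return [loc for loc, f in features.items()
            if cosine(f, prototype) >= thresh]

# Hypothetical per-pixel feature vectors and a slide-level prototype.
feats = {
    (0, 0): [1.0, 0.1],   # nucleus-like
    (0, 1): [0.9, 0.2],   # nucleus-like
    (3, 3): [0.1, 1.0],   # background-like
}
proto = [1.0, 0.15]
prompts = prototype_prompts(feats, proto)
print(prompts)  # the two nucleus-like locations
```

In SPROUT the prototypes are built per slide from histology-informed priors and the features come from a pretrained backbone; the matching step itself is this kind of similarity test.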
Granular Computing-driven SAM: From Coarse-to-Fine Guidance for Prompt-Free Segmentation
Positive · Artificial Intelligence
A new framework called Granular Computing-driven SAM (Grc-SAM) has been introduced to enhance prompt-free image segmentation, addressing limitations in the existing Segment Anything Model (SAM). Grc-SAM employs a coarse-to-fine approach, improving foreground localization and enabling high-resolution segmentation through adaptive feature extraction and fine patch partitioning.
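The coarse-to-fine idea can be sketched generically: flag coarse blocks that look like foreground, then apply a finer decision only inside those blocks. This is a minimal illustration of the two-stage pattern, not Grc-SAM's architecture; the block size and thresholds are invented.

```python
def coarse_to_fine(image, block=2, coarse_thresh=0.5, fine_thresh=0.6):
    """Two-stage segmentation: flag coarse blocks whose mean intensity
    suggests foreground, then threshold pixels only inside those blocks."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            cells = [image[i][j]
                     for i in range(bi, min(bi + block, h))
                     for j in range(bj, min(bj + block, w))]
            if sum(cells) / len(cells) >= coarse_thresh:   # coarse localization
                for i in range(bi, min(bi + block, h)):
                    for j in range(bj, min(bj + block, w)):
                        mask[i][j] = int(image[i][j] >= fine_thresh)  # fine pass
    return mask

img = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.3],
]
seg = coarse_to_fine(img)
print(seg)  # only the bright top-left block survives both stages
```

The payoff of the pattern is that the expensive fine-grained pass runs only where the cheap coarse pass found candidates, which is also why Grc-SAM can afford high-resolution refinement.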
CellFMCount: A Fluorescence Microscopy Dataset, Benchmark, and Methods for Cell Counting
Positive · Artificial Intelligence
A new dataset named CellFMCount has been introduced, consisting of 3,023 images from immunocytochemistry experiments, which includes over 430,000 manually annotated cell locations. This dataset aims to address the challenges of accurate cell counting in biomedical research, particularly in cancer diagnosis and immunology, where traditional manual counting methods are labor-intensive and prone to errors.