UINO-FSS: Unifying Representation Learning and Few-shot Segmentation via Hierarchical Distillation and Mamba-HyperCorrelation

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM
  • UINO-FSS unifies representation learning and few-shot segmentation within a single framework.
  • The development of UINO-FSS rests on the two mechanisms named in the title: hierarchical distillation and Mamba-based hyper-correlation (see the sketch after this list).
  • This innovation reflects a broader trend in AI towards more adaptable and efficient models, echoed by related work on segmentation granularity and semantic diversity, and signals a growing emphasis on model capability in complex environments.
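
The bullets above carry no technical detail, so here is a minimal background sketch of the hypercorrelation idea named in the title, assuming an HSNet-style formulation: a 4D cosine-correlation volume between query features and masked support features. UINO-FSS's hierarchical distillation and its Mamba-based handling of this volume are not reproduced; all function names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def hypercorrelation(query_feat, support_feat, support_mask):
    """4D correlation volume between query and masked support features.

    query_feat, support_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0, 1}.
    Returns a (B, H, W, H, W) volume of non-negative cosine correlations.
    """
    support_feat = support_feat * support_mask                 # keep foreground only
    q = F.normalize(query_feat.flatten(2), dim=1)              # (B, C, H*W)
    s = F.normalize(support_feat.flatten(2), dim=1)            # (B, C, H*W)
    corr = torch.einsum("bcq,bcs->bqs", q, s).clamp(min=0)     # (B, H*W, H*W)
    B, _, H, W = query_feat.shape
    return corr.view(B, H, W, H, W)
```

Presumably the Mamba component then consumes this volume as a long sequence, in place of the 4D convolutions used by earlier hypercorrelation networks.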
— via World Pulse Now AI Editorial System


Recommended Readings
Deep Learning for Accurate Vision-based Catch Composition in Tropical Tuna Purse Seiners
Positive · Artificial Intelligence
Purse seiners are essential in tuna fishing, accounting for about 69% of the global catch of tropical tuna. To enhance monitoring, Regional Fisheries Management Organizations have mandated the use of electronic monitoring (EM) alongside traditional observers. However, the identification of tuna species remains challenging for AI systems, which require balanced training data. This study highlights the difficulties experts face in distinguishing between bigeye tuna and yellowfin tuna using EM-captured images.
Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation
Positive · Artificial Intelligence
The paper presents an Unbiased Semantic Decoding (USD) strategy integrated with the Segment Anything Model (SAM) for few-shot segmentation tasks. This approach aims to enhance the model's generalization ability by extracting target information from both support and query sets simultaneously, addressing the limitations of previous methods that relied heavily on explicit prompts. The study highlights the potential of USD in improving segmentation accuracy across unknown classes.
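
The summary does not spell out how USD extracts target information, so for orientation, the sketch below shows the classic prototype baseline such methods build on: masked average pooling of support features into a class prototype, scored against query features by cosine similarity. This is background, not USD itself; names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_average_prototype(support_feat, support_mask):
    # support_feat: (C, H, W) backbone features; support_mask: (H, W) binary mask
    mask = support_mask.unsqueeze(0).float()                        # (1, H, W)
    proto = (support_feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1)
    return proto                                                    # (C,) prototype

def similarity_prior(query_feat, proto):
    # Cosine similarity between every query location and the support prototype
    q = F.normalize(query_feat, dim=0)                              # (C, H, W)
    p = F.normalize(proto, dim=0)                                   # (C,)
    return torch.einsum("chw,c->hw", q, p)                          # (H, W) prior map
```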
Multi-Text Guided Few-Shot Semantic Segmentation
Positive · Artificial Intelligence
Recent advancements in few-shot semantic segmentation using CLIP-based methods have highlighted limitations in capturing the semantic diversity of complex categories. The proposed Multi-Text Guided Few-Shot Semantic Segmentation Network (MTGNet) addresses these issues by employing a dual-branch framework that integrates multiple textual prompts, enhancing segmentation performance through refined textual priors and improved cross-modal optimization of visual priors.
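
To make the multi-text idea concrete, a minimal sketch follows that fuses several CLIP text embeddings of one class into a single textual prior by normalized averaging. MTGNet's actual prompt set and fusion module are more elaborate; the prompt templates and the mean fusion here are assumptions.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Several phrasings of one class; purely illustrative placeholders.
prompts = [
    "a photo of a bicycle",
    "a close-up of a bicycle wheel and frame",
    "a bicycle parked on a street",
]

with torch.no_grad():
    tokens = clip.tokenize(prompts).to(device)
    text_feats = model.encode_text(tokens).float()              # (3, 512)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    class_prior = text_feats.mean(dim=0)                        # fused textual prior
    class_prior = class_prior / class_prior.norm()              # ready to correlate
```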
D-PerceptCT: Deep Perceptual Enhancement for Low-Dose CT Images
Positive · Artificial Intelligence
D-PerceptCT is a new architecture designed to enhance the quality of Low Dose Computed Tomography (LDCT) images, which are commonly used in medical imaging but often suffer from poor quality due to reduced radiation doses. Traditional enhancement methods can lead to excessive smoothing and loss of important details. D-PerceptCT aims to improve image quality by preserving perceptually relevant features, inspired by the Human Visual System. It includes a Visual Dual-path Extractor (ViDex) to integrate semantic information for better image clarity.
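
The summary names ViDex only at a high level; the sketch below is one guess at what a "dual-path" extractor could look like: a detail path at full resolution fused with a pooled semantic path. D-PerceptCT's real architecture is not reproduced; every layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Illustrative dual-path feature extractor (not the actual ViDex)."""
    def __init__(self, ch=64):
        super().__init__()
        self.detail = nn.Conv2d(ch, ch, kernel_size=3, padding=1)   # local detail
        self.semantic = nn.Sequential(                              # pooled context
            nn.AvgPool2d(4),
            nn.Conv2d(ch, ch, kernel_size=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, x):  # x: (B, ch, H, W) with H, W divisible by 4
        return self.fuse(torch.cat([self.detail(x), self.semantic(x)], dim=1))
```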
Weight Variance Amplifier Improves Accuracy in High-Sparsity One-Shot Pruning
Positive · Artificial Intelligence
Deep neural networks excel at visual recognition, but their large parameter counts hinder practical deployment. One-shot pruning has emerged as a way to reduce model size without retraining, yet aggressive pruning often causes significant accuracy drops. Existing sharpness-aware optimizers such as SAM (Sharpness-Aware Minimization, not the segmentation model) and CrAM mitigate this but require extra computation. The proposed Variance Amplifying Regularizer (VAR) instead increases parameter variance during training, improving pruning robustness while maintaining accuracy.
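
The summary says only that VAR "increases parameter variance during training"; a plausible minimal form is to subtract a scaled weight-variance term from the task loss, so gradient descent pushes variance up. The paper's exact regularizer may differ; var_weight and the per-tensor averaging below are assumptions.

```python
import torch
import torch.nn as nn

def variance_regularizer(model: nn.Module) -> torch.Tensor:
    """Mean variance across weight tensors (biases and 1-D params skipped)."""
    variances = [p.var() for p in model.parameters() if p.dim() > 1]
    return torch.stack(variances).mean()

def train_step(model, batch, criterion, optimizer, var_weight=1e-4):
    inputs, targets = batch
    task_loss = criterion(model(inputs), targets)
    # Subtracting the term rewards higher weight variance; per the summary,
    # this improves robustness to aggressive one-shot pruning.
    loss = task_loss - var_weight * variance_regularizer(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```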
LENS: Learning to Segment Anything with Unified Reinforced Reasoning
Positive · Artificial Intelligence
LENS is a reinforcement-learning framework for text-prompted image segmentation, a capability crucial for human-computer interaction and robotics. Unlike traditional supervised fine-tuning, LENS performs explicit chain-of-thought reasoning at test time, improving generalization to unseen prompts. Built on a 3-billion-parameter vision-language model, it achieves an average cIoU of 81.2% on benchmark datasets, surpassing existing fine-tuning methods.
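
For reference, cIoU here is the cumulative IoU commonly reported on referring-segmentation benchmarks: total intersection over total union accumulated across the whole dataset, rather than a per-image average. A short implementation of that standard metric:

```python
import numpy as np

def cumulative_iou(pred_masks, gt_masks):
    """cIoU: sum of intersections over sum of unions across all samples."""
    inter, union = 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter += np.logical_and(pred, gt).sum()
        union += np.logical_or(pred, gt).sum()
    return inter / max(union, 1)   # guard against an all-empty union
```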
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
Positive · Artificial Intelligence
The Segment Anything Model (SAM) has gained popularity as a vision foundation model, but it struggles with controlling segmentation granularity, often requiring manual refinement by users. To overcome this challenge, UnSAMv2 has been introduced, allowing segmentation at any granularity without human annotations. This model builds on the divide-and-conquer strategy of its predecessor, UnSAM, by identifying numerous mask-granularity pairs and implementing a new granularity control embedding for precise segmentation scale management. The model demonstrates effectiveness with only 6,000 unlabeled …
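
The "granularity control embedding" is described only in words above; the sketch below is one guess at the mechanism: map a scalar granularity to a learned token that joins SAM's prompt embeddings. UnSAMv2's real design is not reproduced; the MLP shape and the [0, 1] convention are assumptions.

```python
import torch
import torch.nn as nn

class GranularityEmbedding(nn.Module):
    """Illustrative: a scalar granularity in [0, 1] becomes a prompt token."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, granularity: torch.Tensor) -> torch.Tensor:
        # granularity: (B,) scalars; 0 = coarsest (whole objects), 1 = finest (parts)
        return self.mlp(granularity.unsqueeze(-1))              # (B, dim) token

# The token would be appended to the mask decoder's sparse prompt embeddings so a
# user can trade whole-object masks for part-level masks at inference time.
```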