UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The introduction of UltraSam marks a significant advancement in automated ultrasound image analysis, a field often hindered by anatomical complexity and a scarcity of annotated data. By compiling the US-43d dataset, which includes over 280,000 images and segmentation masks for more than 50 anatomical structures, researchers have created a robust foundation for training the UltraSam model. This model, an adaptation of the Segment Anything Model (SAM), demonstrates vastly improved prompt-based segmentation performance over existing SAM-style models on three diverse public datasets. Furthermore, an UltraSam-initialized Vision Transformer has outperformed models initialized with ImageNet, SAM, and MedSAM weights in various downstream segmentation and classification tasks. This progress not only showcases UltraSam's foundational capabilities but also highlights its potential for fine-tuning in various medical imaging applications, ultimately enhancing the accuracy and efficiency of ultrasound diagnostics.
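For readers who want a feel for how such a model is used, the sketch below runs prompt-based segmentation through the standard segment-anything interface. The checkpoint name, image path, and point coordinates are placeholders, and loading UltraSam weights through this interface is an assumption rather than a documented workflow.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM-style ViT backbone; "ultrasam_vit_b.pth" is a hypothetical checkpoint name.
sam = sam_model_registry["vit_b"](checkpoint="ultrasam_vit_b.pth")
predictor = SamPredictor(sam)

# Read an ultrasound frame (placeholder path) and register it with the predictor.
image = cv2.cvtColor(cv2.imread("ultrasound_frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single positive point prompt placed on the structure of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 180]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[scores.argmax()]  # keep the highest-scoring candidate mask
```

The same prompt-based call pattern would apply to box prompts, which is how SAM-style models are typically evaluated across datasets.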
— via World Pulse Now AI Editorial System


Recommended Readings
ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
Positive · Artificial Intelligence
The article introduces ERMoE, a new Mixture-of-Experts (MoE) architecture designed to enhance model capacity by addressing challenges in routing and expert specialization. ERMoE reparameterizes experts in an orthonormal eigenbasis and utilizes an 'Eigenbasis Score' for routing, which stabilizes expert utilization and improves interpretability. This approach aims to overcome issues of misalignment and load imbalances that have hindered previous MoE architectures.
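Since the summary gives only the high-level idea, the following is a speculative PyTorch sketch of experts parameterized in an orthonormal basis with an "eigenbasis score" used for routing. The QR-based parameterization and the energy-based score are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EigenExpert(nn.Module):
    # Expert weight expressed in an orthonormal eigenbasis: W = U diag(s) U^T,
    # where U comes from a QR decomposition of a free parameter.
    def __init__(self, dim):
        super().__init__()
        self.raw = nn.Parameter(torch.randn(dim, dim))
        self.scales = nn.Parameter(torch.ones(dim))

    def basis(self):
        q, _ = torch.linalg.qr(self.raw)   # orthonormal columns
        return q

    def forward(self, x):
        u = self.basis()
        return ((x @ u) * self.scales) @ u.T

class ERMoELayer(nn.Module):
    # Routes each token by an "eigenbasis score": how much of the token's
    # energy falls in each expert's scaled eigen-directions.
    def __init__(self, dim, n_experts, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(EigenExpert(dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        scores = torch.stack(
            [((x @ e.basis()) * e.scales).pow(2).sum(-1) for e in self.experts],
            dim=-1)
        weights = F.softmax(scores, dim=-1)
        topw, topi = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for j, e in enumerate(self.experts):
                sel = topi[:, k] == j
                if sel.any():
                    out[sel] += topw[sel, k, None] * e(x[sel])
        return out
```

The point of routing on the eigenbasis rather than on a separate learned gate is that the score is tied directly to what each expert can represent, which is the intuition behind the claimed stability and interpretability.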
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Positive · Artificial Intelligence
The article introduces Autoregressive Representation Alignment (ARRA), a novel training framework designed to enhance text-to-image generation in autoregressive large language models (LLMs) without altering their architecture. ARRA achieves this by aligning the hidden states of LLMs with visual representations from external models through a global visual alignment loss and a hybrid token. Experimental results demonstrate that ARRA significantly reduces the Fréchet Inception Distance (FID) for models like LlamaGen, indicating improved coherence in generated images.
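A rough sketch of the alignment idea follows, assuming the hybrid token's hidden state is matched to a frozen external visual embedding with a cosine-style loss; the exact loss form and projection used by ARRA are not specified in the summary and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalVisualAlignment(nn.Module):
    # Aligns the hidden state of a dedicated "hybrid" token with a frozen
    # embedding from an external vision encoder.
    def __init__(self, llm_dim, vis_dim):
        super().__init__()
        self.proj = nn.Linear(llm_dim, vis_dim)

    def forward(self, hybrid_hidden, vis_embed):
        # hybrid_hidden: (batch, llm_dim); vis_embed: (batch, vis_dim), kept frozen
        pred = F.normalize(self.proj(hybrid_hidden), dim=-1)
        target = F.normalize(vis_embed.detach(), dim=-1)
        return 1.0 - (pred * target).sum(-1).mean()   # cosine-distance alignment loss

# Training objective (sketch): next-token cross-entropy plus the alignment term,
#   loss = ce_loss + lambda_align * align(hidden_states[:, hybrid_idx], vis_feats)
```

Because the alignment term touches only hidden states, the LLM architecture itself stays unchanged, which matches the framework's stated goal.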
Enhanced Structured Lasso Pruning with Class-wise Information
Positive · Artificial Intelligence
The paper titled 'Enhanced Structured Lasso Pruning with Class-wise Information' discusses advancements in neural network pruning methods. Traditional pruning techniques often overlook class-wise information and, in doing so, discard statistical information that could guide pruning. This study introduces two new pruning schemes, sparse graph-structured lasso pruning with Information Bottleneck (sGLP-IB) and sparse tree-guided lasso pruning with Information Bottleneck (sTLP-IB), aimed at preserving that statistical information while reducing model complexity.
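As a baseline for what the lasso part involves, the snippet below shows a plain structured (group) lasso penalty over convolutional filters; the class-wise Information Bottleneck grouping that distinguishes sGLP-IB and sTLP-IB is not reproduced here.

```python
import torch

def group_lasso_penalty(conv_weight):
    # conv_weight: (out_channels, in_channels, kH, kW)
    # Structured (group) lasso: an L2 norm per output filter, summed across filters,
    # which pushes whole filters toward zero so they can be pruned away.
    return conv_weight.flatten(1).norm(dim=1).sum()

# Training loop sketch:
#   loss = task_loss + lambda_sparse * sum(
#       group_lasso_penalty(m.weight)
#       for m in model.modules() if isinstance(m, torch.nn.Conv2d))
```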
Unifying Segment Anything in Microscopy with Vision-Language Knowledge
Positive · Artificial Intelligence
The paper titled 'Unifying Segment Anything in Microscopy with Vision-Language Knowledge' discusses the importance of accurate segmentation in biomedical images. It highlights the limitations of existing models in handling unseen domain data due to a lack of vision-language knowledge. The authors propose a new framework, uLLSAM, which utilizes Multimodal Large Language Models (MLLMs) to enhance segmentation performance. This approach aims to improve generalization capabilities across cross-domain datasets, achieving notable performance improvements.
Heterogeneous Complementary Distillation
Neutral · Artificial Intelligence
Heterogeneous Complementary Distillation (HCD) is a proposed framework aimed at improving knowledge distillation (KD) between different neural network architectures, specifically from Vision Transformer (ViT) to ResNet18. Traditional KD methods struggle with the disparities in spatial feature representations, leading to inefficiencies. HCD seeks to address these challenges by integrating complementary features from both teacher and student models to enhance the alignment of representations in shared logits.
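For orientation, this is the standard temperature-scaled logit distillation that heterogeneous KD methods typically build on; HCD's integration of complementary teacher and student features into shared logits is a further step whose details are not given in the summary.

```python
import torch.nn.functional as F

def kd_logit_loss(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL divergence between teacher and student logits,
    # the usual starting point for ViT -> ResNet18 distillation.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```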
UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations
Positive · Artificial Intelligence
Unified Heterogeneous Knowledge Distillation (UHKD) is a proposed framework that enhances knowledge distillation (KD) by utilizing intermediate features in the frequency domain. This approach addresses the limitations of traditional KD methods, which are primarily designed for homogeneous models and struggle in heterogeneous environments. UHKD aims to improve model compression while maintaining accuracy, making it a significant advancement in the field of artificial intelligence.
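A minimal sketch of distilling intermediate features in the frequency domain is shown below, assuming student and teacher features have already been projected to the same channel dimension; the specific transform and matching loss used by UHKD are assumptions here.

```python
import torch
import torch.nn.functional as F

def frequency_feature_kd(student_feat, teacher_feat):
    # Compare intermediate feature maps in the frequency domain: take a 2D FFT
    # over the spatial dimensions and match magnitude spectra, which sidesteps
    # the spatial-layout mismatch between heterogeneous architectures.
    if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
        student_feat = F.interpolate(student_feat, size=teacher_feat.shape[-2:])
    s_mag = torch.fft.rfft2(student_feat, norm="ortho").abs()
    t_mag = torch.fft.rfft2(teacher_feat, norm="ortho").abs()
    return F.mse_loss(s_mag, t_mag)
```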
RiverScope: High-Resolution River Masking Dataset
Positive · Artificial Intelligence
RiverScope is a newly developed high-resolution dataset aimed at improving the monitoring of rivers and surface water dynamics, which are crucial for understanding Earth's climate system. The dataset includes 1,145 high-resolution images covering 2,577 square kilometers, with expert-labeled river and surface water masks. This initiative addresses the challenges of monitoring narrow or sediment-rich rivers that are often inadequately represented in low-resolution satellite data.