PET2Rep: Towards Vision-Language Model-Driven Automated Radiology Report Generation for Positron Emission Tomography

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The introduction of PET2Rep marks a pivotal advancement in the automation of radiology report generation for positron emission tomography (PET), a vital imaging technique in oncology and neurology. Traditional report writing is labor-intensive and time-consuming, which can delay clinical decision-making. Recent developments in vision-language models (VLMs) have shown promise in medical applications, yet their use in PET imaging has been limited. PET2Rep addresses this gap with a large-scale benchmark dataset that uniquely captures whole-body image-report pairs containing metabolic information. The benchmark not only supports evaluation of VLMs in generating accurate and informative reports but also introduces new clinical efficacy metrics to assess the quality of radiotracer uptake descriptions in key organs. By bridging existing gaps in PET imaging resources, PET2Rep is set to enhance the efficiency and effectiveness of radiology practice.
— via World Pulse Now AI Editorial System

Continue Reading
Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification
Positive · Artificial Intelligence
A new framework for cascading multi-agent anomaly detection in surveillance systems has been introduced, utilizing vision-language models and embedding-based classification to enhance real-time performance and semantic interpretability. This approach integrates various methodologies, including reconstruction-gated filtering and object-level assessments, to address the complexities of detecting anomalies in dynamic visual environments.
VMMU: A Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark
Neutral · Artificial Intelligence
The introduction of VMMU, a Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark, aims to assess the capabilities of vision-language models (VLMs) in interpreting and reasoning over visual and textual information in Vietnamese. This benchmark includes 2.5k multimodal questions across seven diverse tasks, emphasizing genuine multimodal integration rather than text-only cues.
