Multi Head Attention Enhanced Inception v3 for Cardiomegaly Detection

arXiv — cs.CV • Wednesday, November 26, 2025 at 5:00:00 AM

A new approach combines multi-head attention with the Inception v3 model to detect cardiomegaly automatically from X-ray images. The method pairs a deep learning backbone with attention mechanisms and relies on a data collection phase and preprocessing techniques that improve image quality, with the aim of making the diagnosis of cardiovascular disease more accurate and efficient.
This matters because healthcare has a growing need for precise, automated diagnostic tools, particularly for the early detection of cardiomegaly (an enlarged heart), which is associated with serious cardiovascular conditions. By improving diagnostic accuracy, the approach aims to benefit patient outcomes.
The work reflects a broader trend in medical imaging, where deep learning and attention mechanisms are increasingly used to improve image analysis across modalities. Similar methods are being explored in other areas, such as orthopedic procedures and landmark detection, indicating a shift toward more automated and precise diagnosis.
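
To make the idea of attention over convolutional features concrete, here is a minimal sketch of multi-head self-attention applied to a small feature map. It is an illustration only, not the paper's implementation: the summary does not say where attention sits relative to Inception v3, so the 2x2 feature map, the channel count, the number of heads, and the random weights are all assumptions, and the helper names (`matmul`, `random_matrix`, `multi_head_self_attention`) are hypothetical. It uses only the Python standard library so it can be run as-is.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention split across num_heads heads.

    tokens: n vectors of length d_model (here, flattened feature-map positions).
    Returns (output, weights): output is n x d_model, and weights[h] is the
    n x n attention matrix of head h.
    """
    d_model = len(tokens[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)

    head_outputs, weights = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Similarity of every position with every other, scaled by sqrt(d_head).
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head)
                   for kj in k_h] for qi in q_h]
        attn = [softmax(row) for row in scores]
        weights.append(attn)
        head_outputs.append(matmul(attn, v_h))

    # Concatenate the heads position by position, then mix them with w_o.
    concat = [[x for head in head_outputs for x in head[i]]
              for i in range(len(tokens))]
    return matmul(concat, w_o), weights


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
GRID, D_MODEL, NUM_HEADS = 2, 8, 4  # assumed sizes, chosen only for the demo
# Stand-in for a CNN feature map of shape GRID x GRID x D_MODEL.
feature_map = [[[rng.random() for _ in range(D_MODEL)] for _ in range(GRID)]
               for _ in range(GRID)]
tokens = [cell for row in feature_map for cell in row]  # flatten to 4 tokens
w_q, w_k, w_v, w_o = (random_matrix(D_MODEL, D_MODEL, rng) for _ in range(4))

attended, attn_weights = multi_head_self_attention(
    tokens, w_q, w_k, w_v, w_o, NUM_HEADS)
print(len(attended), len(attended[0]))  # prints: 4 8
```

Each head attends over all spatial positions independently, so different heads can weight different image regions. In a real model, the attended features would typically be pooled and passed to a classification layer.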

— via World Pulse Now AI Editorial System

Continue Reading

arXiv — cs.LG • a day ago

Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation

A recent study highlights the importance of automated crack detection in preserving cultural heritage artifacts through the use of semantic segmentation techniques. The research focuses on evaluating various U-Net architectures for pixel-level crack identification on statues and monuments, utilizing the OmniCrack30k dataset for quantitative assessments and real-world evaluations.

via arXiv — cs.LG

arXiv — cs.CV • a day ago

Coupled Physics-Gated Adaptation: Spatially Decoding Volumetric Photochemical Conversion in Complex 3D-Printed Objects

A new framework called Coupled Physics-Gated Adaptation (C-PGA) has been introduced to predict photochemical conversion in complex 3D-printed objects, utilizing a large dataset of optically printed specimens. This innovative approach addresses the limitations of conventional vision models in understanding the coupled interactions of optical and material physics that influence chemical states.

via arXiv — cs.CV

arXiv — cs.CV • a day ago

Vision-Language Models for Automated 3D PET/CT Report Generation

A new framework named PETRG-3D has been proposed for automated 3D PET/CT report generation, addressing the growing need for efficient reporting in oncology due to a shortage of trained specialists. This model utilizes a dual-branch architecture to separately encode PET and CT volumes while incorporating style-adaptive prompts to standardize reporting across different hospitals.

via arXiv — cs.CV

arXiv — cs.LG • a day ago

HVAdam: A Full-Dimension Adaptive Optimizer

HVAdam, a novel full-dimension adaptive optimizer, has been introduced to address the performance gap between adaptive optimizers like Adam and non-adaptive methods such as SGD, particularly in training large-scale models. The new optimizer features continuously tunable adaptivity and a mechanism called incremental delay update (IDU) to enhance convergence across diverse optimization landscapes.

via arXiv — cs.LG

arXiv — cs.CL • a day ago

AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs

AraFinNews has been introduced as the largest publicly available Arabic financial news dataset, featuring 212,500 article-headline pairs from 2015 to 2025, aimed at enhancing Arabic financial text summarization using large language models (LLMs). The dataset serves as a benchmark for evaluating language understanding and generation in financial contexts, particularly through transformer-based models like mT5, AraT5, and FinAraT5.

via arXiv — cs.CL

arXiv — cs.LG • a day ago

FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models

FedPromo introduces a federated learning framework that allows for the efficient adaptation of large-scale foundation models to new domains by optimizing lightweight proxy models on client devices, significantly reducing computational demands while preserving data privacy.

via arXiv — cs.LG

arXiv — cs.CL • 2 days ago

CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework

The introduction of CommonVoice-SpeechRE marks a significant advancement in Speech Relation Extraction (SpeechRE) by providing a large-scale dataset of nearly 20,000 real human speech samples, addressing the limitations of existing synthetic datasets. This new benchmark aims to enhance the extraction of relation triplets directly from speech, which has been a challenge due to the lack of diversity in previous datasets.

via arXiv — cs.CL

arXiv — cs.CV • 2 days ago

DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection

DualGazeNet has been introduced as a biologically inspired dual-gaze query network aimed at enhancing salient object detection (SOD) while minimizing architectural complexity. This framework seeks to overcome challenges faced by existing SOD methods, which often suffer from feature redundancy and performance bottlenecks due to their intricate designs. By simplifying the architecture, DualGazeNet aims to achieve state-of-the-art accuracy and computational efficiency.

via arXiv — cs.CV