Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

arXiv — cs.CL · Tuesday, November 4, 2025 at 5:00:00 AM
A recent study highlights the importance of transparency in large language models (LLMs), particularly in healthcare. It reveals that many LLMs fail to provide explanations that accurately reflect the reasoning behind their predictions, which can erode clinician trust and potentially lead to unsafe decisions. By examining how inference and training choices impact explanation faithfulness, this research aims to improve the reliability of AI in critical settings, ensuring that healthcare professionals can make informed decisions based on trustworthy AI outputs.
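As a concrete illustration of the kind of question the study asks, a common way to probe explanation faithfulness is a perturbation test: if an explanation truly reflects the model's reasoning, removing the evidence it cites should change the prediction more often than removing random evidence. The following is a minimal, self-contained sketch of that idea; the classifier, features, and cited evidence are all hypothetical stand-ins, not the paper's actual setup.

```python
# Toy perturbation-based faithfulness probe (hypothetical setup; the paper's
# actual metrics, models, and data are not specified in this summary).
# Idea: if an explanation faithfully cites the evidence behind a prediction,
# removing the cited evidence should flip the prediction more often than
# removing randomly chosen evidence.
import random

def predict(features: set[str]) -> str:
    # Stand-in for an LLM classifier: a hypothetical one-feature rule.
    return "high-risk" if "elevated troponin" in features else "low-risk"

def flip_rates(features: list[str], cited: list[str], trials: int = 1000):
    base = predict(set(features))
    cited_flip = predict(set(features) - set(cited)) != base
    random_flip = sum(
        predict(set(features) - set(random.sample(features, len(cited)))) != base
        for _ in range(trials)
    ) / trials
    return cited_flip, random_flip

features = ["chest pain", "elevated troponin", "age 54", "non-smoker"]
# A faithful explanation cites the feature that actually drives the output.
print(flip_rates(features, cited=["elevated troponin"]))  # (True, ~0.25)
```

An unfaithful explanation, by contrast, would cite features whose removal flips the prediction no more often than chance.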
— via World Pulse Now AI Editorial System


Recommended Readings
Why January Ventures is funding underrepresented AI founders
Positive · Artificial Intelligence
January Ventures is focusing its funding on underrepresented AI founders with deep expertise in traditional industries such as healthcare, manufacturing, and supply chain. The firm aims to close a funding gap in the AI startup ecosystem, particularly in San Francisco, where many promising companies are overlooked. By writing pre-seed checks, January Ventures seeks to empower these founders to innovate in and transform their sectors.
Fair-GNE : Generalized Nash Equilibrium-Seeking Fairness in Multiagent Healthcare Automation
Positive · Artificial Intelligence
The article discusses Fair-GNE, a framework designed to ensure fair workload allocation among multiple agents in healthcare settings. It addresses a limitation of existing multi-agent reinforcement learning (MARL) approaches, which do not guarantee self-enforceable fairness at runtime. By casting allocation as a generalized Nash equilibrium (GNE) problem, Fair-GNE lets each agent optimize its own decisions while ensuring that no single agent can unilaterally improve its utility without violating the shared constraints, thereby promoting equitable resource sharing among healthcare workers.
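To make the equilibrium concept concrete, here is a hypothetical two-agent toy: each agent minimizes its own workload subject to shared coupling constraints (total demand must be covered, and workloads must stay within a fairness gap). The best-response iteration below illustrates GNE-seeking under coupled constraints; it is not Fair-GNE's actual algorithm, and all numbers and constraint forms are assumptions.

```python
# Hypothetical two-agent toy illustrating GNE-seeking under a shared
# fairness constraint (not the paper's actual formulation or algorithm).
# Each agent minimizes its own workload x_i subject to coupled constraints:
#   coverage:  x_1 + x_2 >= D      (demand must be met)
#   fairness:  |x_1 - x_2| <= gap  (workloads stay comparable)
def best_response(x_other: float, D: float, gap: float) -> float:
    lo = max(D - x_other, x_other - gap, 0.0)  # smallest feasible workload
    hi = min(x_other + gap, 1.0)               # fairness upper bound
    # If temporarily infeasible (lo > hi), clip to the fairness bound;
    # the iteration recovers on subsequent rounds.
    return min(lo, hi)

x1, x2 = 0.0, 0.0
for _ in range(50):  # Gauss-Seidel best-response iteration
    x1 = best_response(x2, D=1.0, gap=0.2)
    x2 = best_response(x1, D=1.0, gap=0.2)
print(x1, x2)  # converges to (0.6, 0.4): demand met, gap respected,
               # and neither agent can unilaterally reduce its load
```

The fixed point is a generalized Nash equilibrium in this toy: each agent already takes the least effort compatible with the shared constraints, given the other's choice.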
Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare
Neutral · Artificial Intelligence
The article discusses fairness in multi-agent reinforcement learning (MARL) within healthcare, emphasizing the need for equitable task allocation that considers both workload balance and agent expertise. It introduces FairSkillMARL, a framework that aligns skill and task distribution to prevent burnout among healthcare workers. Additionally, MARLHospital is presented as a customizable environment for modeling team dynamics and the impact of scheduling on fairness, addressing gaps in existing simulators.
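One way such skill alignment can enter a MARL objective is as a reward-shaping penalty on the mismatch between each agent's share of assigned work and its share of relevant skill. The sketch below is a hypothetical illustration of that idea, not FairSkillMARL's published objective; the workload and skill numbers are invented.

```python
# Hypothetical reward-shaping sketch for skill-aligned fairness (illustrative
# only; FairSkillMARL's actual objective is defined in the paper).
# Idea: penalize allocations where an agent's share of work deviates from its
# share of relevant skill, so neither burnout nor under-use is rewarded.
def skill_aligned_penalty(workloads: list[float], skills: list[float],
                          weight: float = 1.0) -> float:
    total_w, total_s = sum(workloads), sum(skills)
    mismatch = sum(
        abs(w / total_w - s / total_s) for w, s in zip(workloads, skills)
    )
    return -weight * mismatch  # added to the shared task reward

# Three hypothetical clinicians: workload shares vs. skill shares
print(skill_aligned_penalty(workloads=[5, 3, 2], skills=[4, 4, 2]))  # -0.2
```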
Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics
Positive · Artificial Intelligence
The Virtual Human Generative Model (VHGM) is a generative model designed to approximate the joint probability of over 2000 healthcare-related human attributes. The core algorithm, VHGM-MAE, is a masked autoencoder specifically developed to manage high-dimensional, sparse healthcare data. It addresses challenges such as data heterogeneity, probability distribution modeling, systematic missingness, and the small-$n$-large-$p$ problem (few samples relative to the number of attributes) by employing a likelihood-based approach and a transformer-based architecture to capture complex dependencies.
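The key mechanism here, masked modeling over sparse tabular data, can be sketched compactly: observed cells are randomly masked and reconstructed, while truly missing cells are excluded from the loss. The PyTorch toy below assumes a small MLP encoder for brevity; VHGM-MAE itself is transformer-based, and all shapes and masking rates here are illustrative assumptions.

```python
# Minimal masked-modeling sketch for sparse tabular data (hypothetical
# architecture; VHGM-MAE's actual design is transformer-based and larger).
# Observed-but-masked entries are reconstructed; truly missing entries are
# excluded from the loss, which is how masked modeling tolerates sparsity.
import torch
import torch.nn as nn

class TabularMAE(nn.Module):
    def __init__(self, n_attrs: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_attrs * 2, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_attrs)

    def forward(self, x, observed, mask):
        # Zero out masked and missing values; concatenate the visibility
        # mask so the model can distinguish "zero" from "unobserved".
        visible = observed & ~mask
        x_in = torch.cat([x * visible, visible.float()], dim=-1)
        return self.decoder(self.encoder(x_in))

n, d = 32, 10
x = torch.randn(n, d)
observed = torch.rand(n, d) > 0.3           # ~30% missing at random
mask = (torch.rand(n, d) < 0.5) & observed  # mask half of the observed cells
model = TabularMAE(d)
recon = model(x, observed, mask)
loss = ((recon - x)[mask] ** 2).mean()      # loss only on masked, observed cells
loss.backward()
```

Training only on masked-but-observed cells is what lets the model learn the joint distribution without ever being penalized on values that were never measured.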
DataSage: Multi-agent Collaboration for Insight Discovery with External Knowledge Retrieval, Multi-role Debating, and Multi-path Reasoning
Positive · Artificial Intelligence
DataSage is a novel multi-agent framework designed to enhance insight discovery in data analytics. It addresses limitations of existing data insight agents by incorporating external knowledge retrieval, a multi-role debating mechanism, and multi-path reasoning. These features aim to improve the depth of analysis and the accuracy of insights generated, thereby assisting organizations in making informed decisions in a data-driven environment.
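A minimal way to picture the multi-role debating mechanism is a round-robin loop in which role-conditioned prompts critique one another before a final synthesis pass. The sketch below is schematic: `llm` is a placeholder for whatever chat-completion call you have available, and the roles, prompts, and round structure are assumptions, not DataSage's actual design.

```python
# Hypothetical sketch of a multi-role debate loop in the spirit of DataSage
# (the framework's real prompts, roles, and APIs are not shown in the summary).
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

ROLES = ["data analyst", "domain expert", "skeptical reviewer"]

def debate_insight(question: str, rounds: int = 2) -> str:
    transcript: list[str] = []
    for _ in range(rounds):
        for role in ROLES:
            context = "\n".join(transcript[-len(ROLES):])  # last round only
            transcript.append(
                f"{role}: " + llm(
                    f"You are a {role}. Question: {question}\n"
                    f"Prior arguments:\n{context}\n"
                    "State your insight and critique the prior arguments."
                )
            )
    # A final pass aggregates the debate into a single answer.
    return llm("Synthesize one insight from this debate:\n" + "\n".join(transcript))
```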
Automatic Fact-checking in English and Telugu
Neutral · Artificial Intelligence
The research paper explores the challenge of false information and the effectiveness of large language models (LLMs) in verifying factual claims in English and Telugu. It presents a bilingual dataset and evaluates various approaches for classifying the veracity of claims. The study aims to enhance the efficiency of fact-checking processes, which are often labor-intensive and time-consuming.
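At its core, claim verification with an LLM can be framed as constrained classification over a fixed label set, with retrieved evidence supplied in the prompt. The sketch below shows that framing; the labels, prompt wording, and `llm` placeholder are hypothetical and not taken from the paper's dataset or evaluation protocol.

```python
# Illustrative veracity-classification loop (hypothetical labels and prompt;
# the paper's bilingual dataset and models are not reproduced here).
LABELS = ["true", "false", "not enough evidence"]

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an English- or Telugu-capable model")

def classify_claim(claim: str, evidence: list[str]) -> str:
    prompt = (
        f"Classify the claim strictly as one of {LABELS}.\n"
        f"Claim: {claim}\n"
        "Evidence:\n" + "\n".join(evidence)
    )
    answer = llm(prompt).strip().lower()
    # Fall back to the abstaining label if the model strays off the label set.
    return answer if answer in LABELS else "not enough evidence"
```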
FlakyGuard: Automatically Fixing Flaky Tests at Industry Scale
Positive · Artificial Intelligence
Flaky tests, which pass or fail unpredictably, hinder developer productivity and delay software releases. FlakyGuard is introduced as a solution that leverages large language models (LLMs) to repair these tests automatically. Unlike previous approaches such as FlakyDoctor, FlakyGuard addresses the context problem by structuring code as a graph and selectively exploring the relevant parts of it. Evaluation on real-world tests shows a repair success rate of 47.6%, with 51.8% of fixes accepted by developers, a significant improvement over existing approaches.
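The context problem can be illustrated with a toy version of graph-based context selection: starting from the flaky test, expand along code-reference edges and keep the nearest definitions until a size budget is reached, so the repair prompt sees only relevant code. FlakyGuard's real graph construction and ranking are more sophisticated; everything below is an assumed simplification with invented names.

```python
# Hypothetical sketch of graph-based context selection (FlakyGuard's actual
# graph construction and ranking are not described in this summary).
from collections import deque

def select_context(graph: dict[str, list[str]], test: str, budget: int = 5):
    """graph maps a code element to the elements it references."""
    seen, order = {test}, []
    queue = deque([test])
    while queue and len(order) < budget:
        node = queue.popleft()
        order.append(node)  # nearest-first: BFS order approximates relevance
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order  # snippets for these nodes go into the LLM repair prompt

toy_graph = {
    "test_checkout": ["Cart.total", "Clock.now"],
    "Cart.total": ["TaxTable.rate"],
    "Clock.now": [],
}
print(select_context(toy_graph, "test_checkout"))
```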
GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
Positive · Artificial Intelligence
The article presents GMAT, a framework that enhances Multiple Instance Learning (MIL) for whole slide image (WSI) classification. By integrating vision-language models (VLMs), GMAT generates clinical descriptions that are more expressive and medically specific than those produced by general-purpose large language models (LLMs), whose outputs often lack domain grounding and detailed medical specificity; the grounded descriptions align better with visual features.
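To see how text-encoder descriptions plug into vision-language MIL, consider scoring each patch embedding against class-wise description embeddings and pooling the strongest matches into a slide-level score. The PyTorch sketch below uses assumed shapes and simple top-k pooling; GMAT's multi-agent description generation and its exact aggregation are not reproduced here.

```python
# Minimal sketch of description-driven MIL scoring (hypothetical shapes and
# pooling; GMAT's grounded description pipeline is not shown in the summary).
# Each WSI is a bag of patch embeddings; class scores come from similarity
# to text embeddings of clinical descriptions, pooled over the top patches.
import torch

def mil_scores(patch_emb: torch.Tensor, text_emb: torch.Tensor, k: int = 8):
    # patch_emb: (n_patches, d) features from a VLM's vision encoder
    # text_emb:  (n_classes, d) features of class-wise clinical descriptions
    patch_emb = patch_emb / patch_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    sim = patch_emb @ text_emb.T               # (n_patches, n_classes)
    topk = sim.topk(k, dim=0).values           # most class-relevant patches
    return topk.mean(dim=0)                    # (n_classes,) slide-level scores

scores = mil_scores(torch.randn(1000, 512), torch.randn(2, 512))
print(scores.softmax(dim=-1))  # slide-level class probabilities
```

More expressive, medically grounded descriptions shift the text embeddings, which in turn changes which patches score highly; this is where better description generation pays off.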