Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis

arXiv — cs.CV · Tuesday, November 4, 2025 at 5:00:00 AM
A new interactive agent has been developed to enhance the verifiability of AI explanations in critical fields like medicine. By strategically seeking external visual evidence to support its diagnostic reasoning, the agent builds trust through an auditable sequence of actions. Its evidence-seeking policy is optimized for efficiency with reinforcement learning, a step toward AI models whose explanations are reliable and understandable enough for high-stakes decision-making.
— via World Pulse Now AI Editorial System


Recommended Readings
DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control
Positive · Artificial Intelligence
The paper titled 'DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control' discusses the introduction of a disturbance-augmented Markov decision process (DAMDP) to enhance reinforcement learning in robotic control. It addresses the challenges of sim2real transfer, where models trained in simulation often fail to perform effectively in real-world scenarios due to discrepancies in system dynamics. The proposed approach aims to improve the robustness and stabilization of control responses in robotic systems.
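The core idea of a disturbance-augmented MDP is that the agent's state is extended with an estimate of the external disturbance, so the policy can condition on it. A minimal toy sketch of that augmentation (names and the 1-D dynamics are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class AugmentedState:
    obs: float              # ordinary observation of the system
    disturbance_est: float  # running disturbance estimate appended to the state

def step(state: AugmentedState, action: float, true_disturbance: float) -> AugmentedState:
    # Toy 1-D dynamics: the next observation is perturbed by an unmodeled disturbance.
    next_obs = state.obs + action + true_disturbance
    # The agent maintains an exponential moving average of the disturbance,
    # and this estimate is part of the state the policy sees.
    new_est = 0.9 * state.disturbance_est + 0.1 * true_disturbance
    return AugmentedState(next_obs, new_est)
```

A policy trained on such augmented states can, in principle, learn responses that compensate for the disturbance explicitly, which is what makes the sim2real gap narrower when real dynamics deviate from the simulator.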
Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm
Positive · Artificial Intelligence
The paper titled 'Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm' addresses vulnerabilities in sequential recommenders, particularly to adversarial attacks. It highlights the Profile Pollution Attack (PPA), which subtly contaminates user interactions to induce mispredictions. The authors propose a new method called CREAT, which combines bi-level optimization with reinforcement learning to enhance the stealthiness and effectiveness of such attacks, overcoming limitations of previous methods.
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
Positive · Artificial Intelligence
The article presents Thinker, a hierarchical thinking model designed to enhance the reasoning capabilities of large language models (LLMs) through multi-turn interactions. Unlike previous methods that relied on end-to-end reinforcement learning without supervision, Thinker allows for a more structured reasoning process by breaking down complex problems into manageable sub-problems. Each sub-problem is represented in both natural language and logical functions, improving the coherence and rigor of the reasoning process.
LDC: Learning to Generate Research Idea with Dynamic Control
Positive · Artificial Intelligence
Recent advancements in large language models (LLMs) highlight their potential in automating scientific research ideation. Current methods often produce ideas that do not meet expert standards of novelty, feasibility, and effectiveness. To address these issues, a new framework is proposed that combines Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL) to enhance the quality of generated research ideas through a two-stage approach.
DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts
Positive · Artificial Intelligence
DomainCQA is a proposed framework aimed at enhancing Chart Question Answering (CQA) by focusing on both visual comprehension and knowledge-intensive reasoning. Current benchmarks primarily assess superficial parsing of chart data, neglecting deeper scientific reasoning. The framework has been applied to astronomy, resulting in AstroChart, which includes 1,690 QA pairs across 482 charts. This benchmark reveals significant weaknesses in fine-grained perception, numerical reasoning, and domain knowledge integration among 21 Multimodal Large Language Models (MLLMs).
Bridging Hidden States in Vision-Language Models
Positive · Artificial Intelligence
Vision-Language Models (VLMs) are emerging models that integrate visual content with natural language. Current methods typically fuse data either early in the encoding process or late through pooled embeddings. This paper introduces a lightweight fusion module utilizing cross-only, bidirectional attention layers to align hidden states from both modalities, enhancing understanding while keeping encoders non-causal. The proposed method aims to improve the performance of VLMs by leveraging the inherent structure of visual and textual data.
Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning
Positive · Artificial Intelligence
The paper titled 'Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning' introduces a new method called Bias-REstrained Prefix Representation FineTuning (BREP ReFT). The approach aims to enhance the mathematical reasoning capabilities of models by addressing the limitations of existing representation finetuning (ReFT) methods, which struggle with mathematical tasks. Extensive experiments show that BREP ReFT outperforms both standard ReFT and weight-based parameter-efficient finetuning (PEFT) methods.
Transformers know more than they can tell -- Learning the Collatz sequence
Neutral · Artificial Intelligence
The study investigates the ability of transformer models to predict many steps at once of the Collatz sequence, viewed as an arithmetic function that maps each odd integer to its odd successor. Accuracy varies significantly with the base used to encode the integers, reaching up to 99.7% for bases 24 and 32 but dropping to 37% and 25% for bases 11 and 3. Despite these variations, all models exhibit a common learning pattern: they succeed or fail together on inputs that share the same residual modulo 2^p.
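The odd-to-odd Collatz map the models are trained to predict can be sketched in a few lines (function names are mine, not the paper's): apply 3n + 1 to an odd integer, then halve until the result is odd again.

```python
def odd_successor(n: int) -> int:
    """Map an odd integer to its odd successor in the Collatz sequence."""
    assert n % 2 == 1, "input must be odd"
    n = 3 * n + 1
    while n % 2 == 0:  # strip the factors of two introduced by 3n + 1
        n //= 2
    return n

def k_steps(n: int, k: int) -> int:
    """Iterate the odd-successor map k times, e.g. 7 -> 11 -> 17 -> 13 -> 5 -> 1."""
    for _ in range(k):
        n = odd_successor(n)
    return n
```

The residual structure mentioned above plausibly reflects that, for related accelerated Collatz maps, the pattern of halvings over the first few steps is determined by the input's residue modulo a power of two, so inputs sharing a residue modulo 2^p behave alike early in the iteration.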