KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA

arXiv — cs.CL • Wednesday, November 19, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

KnowCoder
The development of KnowCoder
This advancement reflects a broader trend in AI research, where there is a growing emphasis on developing models that not only answer questions but also understand and reason through complex data, addressing challenges in truthfulness and reliability in LLM outputs.

— via World Pulse Now AI Editorial System

Recommended Readings

arXiv — cs.CL • 6 hours ago

ProRAC: A Neuro-symbolic Method for Reasoning about Actions with LLM-based Progression

Positive • Artificial Intelligence

ProRAC (Progression-based Reasoning about Actions and Change) is a neuro-symbolic framework that utilizes large language models (LLMs) to address reasoning about actions and changes (RAC) problems. The framework extracts essential elements from RAC problems, executes actions progressively to determine the final state, and evaluates queries against this state. Evaluations on various RAC benchmarks indicate that ProRAC demonstrates strong performance across diverse tasks and domains.
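
The progression step described above can be illustrated with a small symbolic stand-in. This is only a sketch under stated assumptions: in ProRAC the extraction and progression are carried out by an LLM, whereas here the state is a plain set of facts and the `move` action, the `obj@place` fact format, and the `progress` helper are all hypothetical names invented for illustration.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical representation: a state is a set of "object@place" facts.
State = FrozenSet[str]
Action = Callable[[State], State]


def move(obj: str, src: str, dst: str) -> Action:
    """Build an action that moves obj from src to dst if it is at src."""
    def apply(state: State) -> State:
        if f"{obj}@{src}" not in state:
            return state  # precondition not met: state is unchanged
        return (state - {f"{obj}@{src}"}) | {f"{obj}@{dst}"}
    return apply


def progress(initial: State, actions: Iterable[Action]) -> State:
    """Apply actions one at a time, carrying the state forward."""
    state = initial
    for action in actions:
        state = action(state)
    return state


# Execute the actions progressively, then evaluate a query on the final state.
final_state = progress(
    frozenset({"box@table"}),
    [move("box", "table", "shelf"), move("box", "shelf", "floor")],
)
query_holds = "box@floor" in final_state
```

The query is checked only against the final state, which mirrors the extract, progress, then evaluate order given in the summary.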

arXiv — cs.CL • 6 hours ago

ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation

Positive • Artificial Intelligence

The paper presents ReFactX, a scalable method designed to enhance the reliability of Large Language Models (LLMs) by enabling them to access external knowledge without relying on additional models or services. This approach utilizes constrained generation with a prefix-tree index, allowing for efficient retrieval of factual information from a Knowledge Graph. The method aims to address persistent issues of knowledge gaps and hallucinations in LLM outputs.
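
To make the idea of constrained generation over a prefix-tree index concrete, here is a minimal sketch. It is not ReFactX's implementation: the `PrefixTree` class, the word-level "tokens", the toy facts, and the `toy_score` function standing in for a language model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


class PrefixTree:
    """Hypothetical trie over token sequences, one path per stored fact."""

    def __init__(self) -> None:
        self.children: Dict[str, "PrefixTree"] = {}

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, PrefixTree())

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that can legally follow the given prefix."""
        node = self
        for tok in prefix:
            child = node.children.get(tok)
            if child is None:
                return []  # prefix is not in the index
            node = child
        return sorted(node.children)


def constrained_greedy_decode(
    tree: PrefixTree,
    score: Callable[[List[str], str], float],
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding where only trie-permitted tokens are considered."""
    out: List[str] = []
    for _ in range(max_len):
        allowed = tree.allowed_next(out)
        if not allowed:
            break  # reached the end of a stored fact
        out.append(max(allowed, key=lambda tok: score(out, tok)))
    return out


# Toy knowledge graph facts, already split into tokens (assumed format).
tree = PrefixTree()
tree.insert(["Paris", "capital_of", "France"])
tree.insert(["Paris", "located_in", "Europe"])
tree.insert(["Berlin", "capital_of", "Germany"])

# Stand-in for model scores; "Germany" scores highest but is not reachable
# after "Paris capital_of", so the constraint filters it out.
preferences = {"Paris": 2.0, "capital_of": 1.5, "Germany": 9.0}


def toy_score(prefix: List[str], tok: str) -> float:
    return preferences.get(tok, 0.0)


fact = constrained_greedy_decode(tree, toy_score)
```

Because every emitted token must extend a path in the index, the decoder can only produce facts that were stored, regardless of what the scorer would otherwise prefer.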

arXiv — cs.CL • 6 hours ago

Mathematical Analysis of Hallucination Dynamics in Large Language Models: Uncertainty Quantification, Advanced Decoding, and Principled Mitigation

Neutral • Artificial Intelligence

Large Language Models (LLMs) can produce outputs that sound plausible but are factually incorrect, a phenomenon known as hallucination. This study introduces a mathematical framework to analyze, quantify, and mitigate hallucinations. It uses probabilistic modeling and Bayesian uncertainty estimation to develop refined metrics and mitigation strategies, including contrastive decoding and retrieval-augmented grounding, aimed at making LLMs more reliable.

arXiv — cs.CL • 6 hours ago

MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents

Positive • Artificial Intelligence

MedBench v4 introduces a comprehensive benchmarking framework for evaluating Chinese medical language models, multimodal models, and intelligent agents. This cloud-based infrastructure features over 700,000 expert-curated tasks across various medical specialties. The evaluation process includes multi-stage refinement and clinician reviews, with results indicating that while base LLMs score an average of 54.1/100, safety and ethics ratings remain low at 18.4/100.

arXiv — cs.CV • 6 hours ago

Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models

Positive • Artificial Intelligence

Large language models (LLMs) have shown impressive capabilities across various tasks, but their extensive size complicates real-world applications. Traditional pruning methods, like Wanda, require significant manual effort and expert knowledge, leading to high costs. This study introduces AutoPrune, a self-pruning method that allows LLMs to autonomously design optimal pruning algorithms, addressing the challenges of expert dependency and performance degradation due to uniform sparsity.

arXiv — cs.LG • 6 hours ago

Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs

Positive • Artificial Intelligence

The paper introduces TASA (Teaching According to Students' Aptitude), a personalized mathematics tutoring framework that utilizes Large Language Models (LLMs) to adapt instruction based on students' evolving knowledge and cognitive retention. TASA integrates a structured student persona and event memory to enhance learning by addressing individual proficiency levels and forgetting patterns, aiming to improve the effectiveness of mathematics education.

arXiv — cs.CL • 6 hours ago

ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions

Neutral • Artificial Intelligence

ConInstruct is a benchmark designed to evaluate Large Language Models (LLMs) on their ability to detect and resolve conflicts in user instructions. While many existing assessments focus on adherence to instructions, ConInstruct addresses the often-overlooked scenarios where conflicting constraints arise. Initial evaluations show that proprietary LLMs generally perform well in conflict detection, with DeepSeek-R1 and Claude-4.5-Sonnet achieving the highest F1-scores.

arXiv — cs.LG • 6 hours ago

Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization

Positive • Artificial Intelligence

The paper introduces Group Turn Policy Optimization (GTPO), a novel reinforcement learning algorithm aimed at enhancing the training of Large Language Models (LLMs) for multi-turn Tool-Integrated Reasoning (TIR). GTPO addresses limitations of existing methods like Group Relative Policy Optimization (GRPO) by implementing turn-level reward assignments, return-based advantage estimation, and self-supervised reward shaping, which collectively improve learning signals for complex interactions.
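
As a rough illustration of turn-level rewards combined with return-based advantage estimation, the sketch below computes a discounted return for each turn and then normalizes those returns across a group of sampled trajectories. The summary does not give GTPO's formulas, so the discount factor `gamma`, the pooled mean and standard deviation normalization, and the function names are assumptions, not the paper's definitions.

```python
from statistics import mean, pstdev
from typing import List


def turn_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Discounted return-to-go for each turn of one trajectory."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def group_turn_advantages(
    group_rewards: List[List[float]],
    gamma: float = 0.9,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Normalize per-turn returns against all turns in the group.

    Assumed normalization: subtract the pooled mean and divide by the
    pooled standard deviation, in the spirit of group-relative baselines.
    """
    group_returns = [turn_returns(r, gamma) for r in group_rewards]
    pooled = [g for traj in group_returns for g in traj]
    mu, sigma = mean(pooled), pstdev(pooled)
    return [[(g - mu) / (sigma + eps) for g in traj] for traj in group_returns]


# Two sampled trajectories for the same prompt: one ends with a reward, one does not.
advantages = group_turn_advantages([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]], gamma=0.5)
```

Under these assumptions, every turn of the successful trajectory receives a larger advantage than the matching turn of the failed one, and later turns (closer to the reward) receive more credit than earlier ones, whereas a single trajectory-level reward would assign the same value to all turns.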
