Investigating Hallucination in Conversations for Low Resource Languages

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • The study explores hallucinations in Large Language Models (LLMs) specifically in Hindi, Farsi, and Mandarin, highlighting the varying rates of factual inaccuracies across these languages.
  • Addressing hallucinations is vital for improving the reliability of LLMs, especially as they are increasingly utilized in diverse applications, including customer support and healthcare.
  • The findings contribute to ongoing discussions about the challenges of ensuring factual accuracy in AI.
— via World Pulse Now AI Editorial System
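
The varying rates noted above reduce, at evaluation time, to a simple per-language tally over annotated responses. A minimal sketch, assuming (language, hallucinated?) pairs; the paper's actual annotation protocol is not shown here:

```python
from collections import defaultdict

def hallucination_rates(annotations: list[tuple[str, bool]]) -> dict[str, float]:
    """annotations: (language, is_hallucinated) pairs for model responses."""
    totals, bad = defaultdict(int), defaultdict(int)
    for lang, is_hallucinated in annotations:
        totals[lang] += 1
        bad[lang] += is_hallucinated  # bool counts as 0/1
    return {lang: bad[lang] / totals[lang] for lang in totals}

# Illustrative data only, not the study's results.
data = [("Hindi", True), ("Hindi", False), ("Farsi", True),
        ("Farsi", True), ("Mandarin", False), ("Mandarin", False)]
print(hallucination_rates(data))  # {'Hindi': 0.5, 'Farsi': 1.0, 'Mandarin': 0.0}
```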


Recommended Readings
ProRAC: A Neuro-symbolic Method for Reasoning about Actions with LLM-based Progression
Positive · Artificial Intelligence
ProRAC (Progression-based Reasoning about Actions and Change) is a neuro-symbolic framework that utilizes large language models (LLMs) to address reasoning about actions and change (RAC) problems. The framework extracts the essential elements of a RAC problem, executes the actions progressively to determine the final state, and evaluates the query against that state. Evaluations on various RAC benchmarks indicate that ProRAC achieves strong performance across diverse tasks and domains.
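
The progression idea itself is simple to sketch: keep a symbolic state, apply each action's effects in order, then evaluate the query against the final state. A minimal illustration with placeholder names, not ProRAC's actual pipeline:

```python
# Progression: the state is a set of ground facts; each action deletes
# some facts and adds others. Queries are checked against the end state.

def progress(state: set[str], action: dict) -> set[str]:
    """Apply one action: remove its delete-list, add its add-list."""
    return (state - set(action.get("del", []))) | set(action.get("add", []))

def answer(state: set[str], query: str) -> bool:
    """Evaluate a ground query against the final state."""
    return query in state

# Toy blocks-world example.
state = {"on(a,table)", "clear(a)"}
actions = [
    {"add": ["holding(a)"], "del": ["on(a,table)", "clear(a)"]},
    {"add": ["on(a,b)", "clear(a)"], "del": ["holding(a)"]},
]
for act in actions:
    state = progress(state, act)

print(answer(state, "on(a,b)"))  # True
```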
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Positive · Artificial Intelligence
The paper presents Group Relative Policy Optimization for Representation Model (GRPO-RM), a reinforcement learning method that adapts GRPO, a post-training technique for large language models (LLMs), to fine-tune representation models. It establishes a predefined output set to replace token-sequence sampling, enabling the generation of the output group that GRPO's optimization requires. A specialized reward function tailored to representation models is also introduced, and extensive experiments validate the method's effectiveness on various real-world datasets.
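
At the core of GRPO is a group-relative advantage: rewards for a group of outputs are normalized against the group's own statistics, so no separate value model is needed. A minimal sketch of that step, with hypothetical reward values rather than GRPO-RM's specialized reward function:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Each output's advantage is its reward normalized by the group's
    # own mean and standard deviation.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for one input's output group (GRPO-RM would
# score a predefined output set instead of sampled token sequences).
rewards = np.array([0.9, 0.2, 0.7, 0.4])
print(group_relative_advantages(rewards))
```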
Mathematical Analysis of Hallucination Dynamics in Large Language Models: Uncertainty Quantification, Advanced Decoding, and Principled Mitigation
Neutral · Artificial Intelligence
Large Language Models (LLMs) can produce outputs that sound plausible but are factually incorrect, a phenomenon known as hallucination. This study introduces a mathematical framework to analyze, quantify, and mitigate these hallucinations. It employs probabilistic modeling and Bayesian uncertainty estimation to develop refined metrics and mitigation strategies, including contrastive decoding and retrieval-augmented grounding, aimed at improving the reliability of LLMs.
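
Of the strategies listed, contrastive decoding is easy to sketch: score each candidate token by the gap between a stronger and a weaker model's log-probabilities, restricted to tokens the stronger model finds plausible. A minimal illustration; the distributions and the alpha threshold are placeholders, not the paper's setup:

```python
import numpy as np

def contrastive_scores(log_p_expert: np.ndarray,
                       log_p_amateur: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    # Plausibility head: keep tokens whose expert probability is within
    # a factor alpha of the expert's best token; mask out the rest.
    cutoff = np.log(alpha) + log_p_expert.max()
    scores = log_p_expert - log_p_amateur
    return np.where(log_p_expert >= cutoff, scores, -np.inf)

# Hypothetical next-token distributions over a 4-token vocabulary.
log_p_expert = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
log_p_amateur = np.log(np.array([0.4, 0.4, 0.1, 0.1]))
print(int(np.argmax(contrastive_scores(log_p_expert, log_p_amateur))))  # 2
```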
MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents
Positive · Artificial Intelligence
MedBench v4 introduces a comprehensive benchmarking framework for evaluating Chinese medical language models, multimodal models, and intelligent agents. This cloud-based infrastructure features over 700,000 expert-curated tasks across various medical specialties. The evaluation process includes multi-stage refinement and clinician reviews, with results indicating that while base LLMs score an average of 54.1/100, safety and ethics ratings remain low at 18.4/100.
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Neutral · Artificial Intelligence
Recent advancements in Large Reasoning Models (LRMs) have shown impressive performance in specialized reasoning tasks. However, a systematic evaluation reveals that acquiring deliberative reasoning capabilities significantly reduces foundational capabilities, leading to declines in helpfulness and harmlessness, along with increased inference costs. Adaptive reasoning methods can alleviate these drawbacks, highlighting the need for more versatile LRMs.
Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models
Positive · Artificial Intelligence
Large language models (LLMs) have shown impressive capabilities across various tasks, but their extensive size complicates real-world applications. Traditional pruning methods, like Wanda, require significant manual effort and expert knowledge, leading to high costs. This study introduces AutoPrune, a self-pruning method that allows LLMs to autonomously design optimal pruning algorithms, addressing the challenges of expert dependency and performance degradation due to uniform sparsity.
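
For context, the Wanda criterion that such methods hand-design scores each weight by its magnitude times the norm of its input activations, then drops the lowest-scoring weights per output row. A minimal sketch of that baseline, not AutoPrune's learned algorithm:

```python
import numpy as np

def wanda_prune(W: np.ndarray, X: np.ndarray, sparsity: float) -> np.ndarray:
    """W: (out, in) weight matrix; X: (samples, in) calibration activations."""
    score = np.abs(W) * np.linalg.norm(X, axis=0)  # |W_ij| * ||X_j||_2
    k = int(W.shape[1] * sparsity)                 # weights to drop per row
    idx = np.argsort(score, axis=1)[:, :k]         # lowest-scoring weights
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, idx, 0.0, axis=1)
    return W_pruned

W = np.random.randn(4, 8)
X = np.random.randn(16, 8)
print((wanda_prune(W, X, 0.5) == 0).mean())  # 0.5 sparsity
```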
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Neutral · Artificial Intelligence
ConInstruct is a benchmark designed to evaluate Large Language Models (LLMs) on their ability to detect and resolve conflicts in user instructions. While many existing assessments focus on adherence to instructions, ConInstruct addresses the often-overlooked scenarios where conflicting constraints arise. Initial evaluations show that proprietary LLMs generally perform well in conflict detection, with DeepSeek-R1 and Claude-4.5-Sonnet achieving the highest F1-scores.
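
The reported F1-scores treat conflict detection as binary classification over the positive (conflict) class. A minimal sketch of the metric, with illustrative labels rather than ConInstruct's actual annotations:

```python
def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """F1 on the positive class: True = instruction contains a conflict."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [True, True, False, True, False]
pred = [True, False, False, True, True]
print(round(f1_score(gold, pred), 3))  # 0.667
```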
Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs
Positive · Artificial Intelligence
The paper introduces TASA (Teaching According to Students' Aptitude), a personalized mathematics tutoring framework that utilizes Large Language Models (LLMs) to adapt instruction based on students' evolving knowledge and cognitive retention. TASA integrates a structured student persona and event memory to enhance learning by addressing individual proficiency levels and forgetting patterns, aiming to improve the effectiveness of mathematics education.
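
The forgetting-aware component can be pictured with an Ebbinghaus-style exponential decay, where successful reviews increase memory stability; the functional form and parameters below are assumptions for illustration, not TASA's published model:

```python
import math

def retention(days_since_review: float, stability: float) -> float:
    """Estimated recall probability after a delay (exponential decay)."""
    return math.exp(-days_since_review / stability)

def review(stability: float, boost: float = 1.6) -> float:
    """A successful review multiplies memory stability (hypothetical rule)."""
    return stability * boost

s = 2.0  # initial stability in days (hypothetical)
print(round(retention(3.0, s), 2))  # recall prob after 3 days, ~0.22
s = review(s)
print(round(retention(3.0, s), 2))  # higher after a review, ~0.39
```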