ProRAC: A Neuro-symbolic Method for Reasoning about Actions with LLM-based Progression

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM


Recommended Readings
Investigating Hallucination in Conversations for Low Resource Languages
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown exceptional ability in text generation but often produce factually incorrect statements, known as 'hallucinations'. This study investigates hallucinations in conversational data across three low-resource languages: Hindi, Farsi, and Mandarin. The analysis of various LLMs, including GPT-3.5 and GPT-4o, reveals that while Mandarin has few hallucinated responses, Hindi and Farsi exhibit significantly higher rates of inaccuracies.
LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
Positive · Artificial Intelligence
LiveCLKTBench is an automated generation pipeline designed to evaluate cross-lingual knowledge transfer in large language models (LLMs). It isolates and measures knowledge transfer by identifying time-sensitive knowledge entities, filtering them based on temporal occurrence, and generating factual questions translated into multiple languages. The evaluation of several LLMs across five languages reveals that cross-lingual transfer is influenced by linguistic distance and is often asymmetric.
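A minimal sketch of the filter-and-generate pipeline the summary describes. The entity fields, the `translate` and `make_question` callables, and the cutoff logic are assumptions for illustration, not LiveCLKTBench's actual interface.

```python
# Hypothetical sketch of a LiveCLKTBench-style generation pipeline, based only
# on the summary above; names and helpers are illustrative, not the paper's API.
from dataclasses import dataclass
from datetime import date

@dataclass
class KnowledgeEntity:
    name: str
    event_date: date  # when the fact became true

def is_time_sensitive(entity: KnowledgeEntity, cutoff: date) -> bool:
    # Keep only entities whose facts emerged after the model's training cutoff,
    # so a correct answer cannot come from pretraining in any language.
    return entity.event_date > cutoff

def build_questions(entities, cutoff, languages, make_question, translate):
    """Filter entities by temporal occurrence, generate one factual question
    per entity, and translate it into every target language."""
    questions = {}
    for e in entities:
        if not is_time_sensitive(e, cutoff):
            continue
        base = make_question(e)  # e.g. an English factual question
        questions[e.name] = {lang: translate(base, lang) for lang in languages}
    return questions
```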
Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models
Positive · Artificial Intelligence
Large language models (LLMs) have shown impressive capabilities across various tasks, but their extensive size complicates real-world applications. Traditional pruning methods, like Wanda, require significant manual effort and expert knowledge, leading to high costs. This study introduces AutoPrune, a self-pruning method that allows LLMs to autonomously design optimal pruning algorithms, addressing the challenges of expert dependency and performance degradation due to uniform sparsity.
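For context, a minimal sketch of the Wanda-style score (weight magnitude times input-activation norm, pruned per output row) that such hand-designed methods use. It illustrates the kind of expert-designed metric AutoPrune aims to discover automatically; it is not AutoPrune itself.

```python
# Wanda-style magnitude-times-activation pruning, sketched per output row.
import torch

def wanda_style_prune(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float):
    """weight: (out_features, in_features); act_norm: (in_features,) per-input
    L2 norm of calibration activations; sparsity: fraction of weights to zero."""
    score = weight.abs() * act_norm          # broadcasts act_norm over output rows
    k = int(weight.shape[1] * sparsity)      # weights to drop in each row
    idx = torch.topk(score, k, dim=1, largest=False).indices
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, False)             # zero the k lowest-scoring weights per row
    return weight * mask
```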
HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning
Positive · Artificial Intelligence
HSKBenchmark introduces a novel benchmark for modeling and assessing Chinese second language acquisition (SLA) with large language models (LLMs). It addresses the challenges of traditional language acquisition experiments, which are often impractical and ethically complex. The benchmark covers HSK levels 3 to 6, featuring authentic textbooks and a comprehensive evaluation system, thereby making SLA modeling with LLMs more interpretable and scalable.
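A minimal sketch of curriculum tuning as the summary suggests it: fine-tune stage by stage on materials ordered from HSK 3 up to HSK 6. The `corpus_by_level` mapping and `train_stage` helper are hypothetical, not the benchmark's actual interface.

```python
# Hypothetical curriculum-tuning loop: easiest proficiency level first.
def curriculum_tune(model, corpus_by_level, train_stage, levels=(3, 4, 5, 6)):
    for level in levels:
        texts = corpus_by_level[level]    # authentic textbook materials for this level
        model = train_stage(model, texts) # one fine-tuning stage per HSK level
    return model
```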
HalluClean: A Unified Framework to Combat Hallucinations in LLMs
Positive · Artificial Intelligence
HalluClean is a new framework designed to detect and correct hallucinations in large language models (LLMs). This task-agnostic approach enhances the reliability of LLM-generated text by decomposing the process into planning, execution, and revision stages. HalluClean utilizes minimal task-routing prompts for zero-shot generalization across various domains, significantly improving factual consistency in outputs.
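A minimal sketch of that three-stage decomposition, assuming a generic `llm` callable; the prompts are illustrative stand-ins, not HalluClean's actual task-routing prompts.

```python
# Plan -> execute -> revise, sketched with a generic llm(prompt) -> str callable.
def halluclean_style_revise(llm, task: str, draft: str) -> str:
    # 1) Planning: decide which claims in the draft need verification.
    plan = llm(f"Task: {task}\nDraft: {draft}\n"
               "List the factual claims that should be checked.")
    # 2) Execution: check each planned claim.
    verdicts = llm(f"Claims to check:\n{plan}\n"
                   "For each claim, state whether it is supported and why.")
    # 3) Revision: rewrite the draft so it is consistent with the verdicts.
    return llm(f"Draft: {draft}\nVerdicts:\n{verdicts}\n"
               "Rewrite the draft, correcting any unsupported claims.")
```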
Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
Positive · Artificial Intelligence
Instruction tuning is a crucial method for aligning large language models (LLMs) with human intentions and safety requirements. This survey outlines the entire process, including data collection methods, fine-tuning strategies, and evaluation protocols. It categorizes data construction into expert annotation, distillation from larger models, and self-improvement mechanisms, each with unique trade-offs. The study also addresses challenges in evaluating model performance across multilingual and multimodal contexts.
MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents
Positive · Artificial Intelligence
MedBench v4 introduces a comprehensive benchmarking framework for evaluating Chinese medical language models, multimodal models, and intelligent agents. This cloud-based infrastructure features over 700,000 expert-curated tasks across various medical specialties. The evaluation process includes multi-stage refinement and clinician reviews, with results indicating that while base LLMs score an average of 54.1/100, safety and ethics ratings remain low at 18.4/100.
Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities
Positive · Artificial Intelligence
The study presents a benchmark for detecting long-term periodic workflows in human activities, addressing a gap in existing research. It includes 580 multimodal activity sequences and supports tasks such as unsupervised workflow detection and procedural anomaly detection. The proposed lightweight model aims to enhance understanding of complex human behaviors over extended periods.
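The paper's model is learned; as a stand-in that shows the underlying task, here is a classical autocorrelation estimate of a dominant period in an activity sequence. Treating categorical activity labels as a numeric signal is a simplification for illustration, not the paper's method.

```python
# Classical period estimation via autocorrelation, as a baseline illustration.
import numpy as np

def dominant_period(activity_ids: np.ndarray, min_lag: int = 2) -> int:
    """activity_ids: 1-D integer sequence of activity labels over time.
    Returns the lag with the strongest self-similarity."""
    x = activity_ids - activity_ids.mean()               # center the signal
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]    # autocorrelation, lags >= 0
    ac = ac / ac[0]                                      # normalize by lag-0 energy
    # Search lags up to half the sequence length for the strongest peak.
    return int(np.argmax(ac[min_lag:len(x) // 2]) + min_lag)
```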