Do Large Language Models (LLMs) Understand Chronology?

arXiv — cs.LG • Wednesday, November 19, 2025 at 5:00:00 AM

Neutral • Artificial Intelligence

A recent study assessed the chronological understanding of large language models (LLMs) like GPT
The implications of these findings are significant for industries relying on LLMs for data analysis and decision
The challenges faced by LLMs in understanding chronology reflect broader issues in AI, including the need for improved training methodologies and frameworks to enhance their reasoning capabilities, especially in complex tasks involving temporal data.

— via World Pulse Now AI Editorial System

Recommended Readings

Automatic Fact-checking in English and Telugu

arXiv — cs.CL • 10 hours ago

Neutral • Artificial Intelligence

The research paper explores the challenge of false information and the effectiveness of large language models (LLMs) in verifying factual claims in English and Telugu. It presents a bilingual dataset and evaluates various approaches for classifying the veracity of claims. The study aims to enhance the efficiency of fact-checking processes, which are often labor-intensive and time-consuming.

via arXiv — cs.CL

10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training

arXiv — cs.LG • 10 hours ago

Positive • Artificial Intelligence

10Cache is a new tensor caching and migration system designed to enhance the training of large language models (LLMs) in cloud environments. It addresses the challenges of memory bottlenecks associated with GPUs by optimizing memory usage across GPU, CPU, and NVMe tiers. By profiling tensor execution order and constructing prefetch policies, 10Cache improves memory efficiency and reduces training time and costs, making large-scale LLM training more feasible.

via arXiv — cs.LG

Mitigating Label Length Bias in Large Language Models

arXiv — cs.CL • 10 hours ago

Positive • Artificial Intelligence

Large language models (LLMs) exhibit label length bias, where labels of varying lengths are treated inconsistently despite normalization efforts. This paper introduces normalized contextual calibration (NCC), a method that normalizes predictions at the full-label level, effectively addressing this bias. NCC demonstrates statistically significant improvements across multiple datasets and models, achieving up to 10% gains in F1 scores. Additionally, it extends bias mitigation to tasks like multiple-choice question answering, showing reduced sensitivity to few-shot example selection.

via arXiv — cs.CL

MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents

arXiv — cs.CL • 10 hours ago

Positive • Artificial Intelligence

MedBench v4 is a new benchmarking infrastructure designed to evaluate Chinese medical language models, multimodal models, and intelligent agents. It features over 700,000 expert-curated tasks across various specialties, with evaluations conducted by clinicians from more than 500 institutions. The study assessed 15 advanced models, revealing that base LLMs scored an average of 54.1/100, while safety and ethics ratings were notably low at 18.4/100. Multimodal models performed even worse, indicating a need for improved evaluation frameworks in medical AI.

via arXiv — cs.CL

SERL: Self-Examining Reinforcement Learning on Open-Domain

arXiv — cs.LG • 10 hours ago

Positive • Artificial Intelligence

Self-Examining Reinforcement Learning (SERL) is a proposed framework that addresses challenges in applying Reinforcement Learning (RL) to open-domain tasks. Traditional methods face issues with subjectivity and reliance on external rewards. SERL innovatively positions large language models (LLMs) as both Actor and Judge, utilizing internal reward mechanisms. It employs Copeland-style pairwise comparisons to enhance the Actor's capabilities and introduces a self-consistency reward to improve the Judge's reliability, aiming to advance RL applications in open domains.
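
As a rough illustration of what Copeland-style pairwise comparison means, the sketch below scores each candidate response by its pairwise wins minus its pairwise losses. It is not SERL's implementation: the function names are hypothetical, and `toy_judge` is a placeholder that simply prefers longer text, standing in for an LLM acting as Judge.

```python
from itertools import combinations


def copeland_scores(candidates, prefer):
    """Return one Copeland score (pairwise wins minus losses) per candidate.

    `prefer(a, b)` is a hypothetical judge callback: it returns a positive
    number if `a` is preferred, a negative number if `b` is, and 0 for a tie.
    """
    scores = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        outcome = prefer(candidates[i], candidates[j])
        if outcome > 0:
            scores[i] += 1
            scores[j] -= 1
        elif outcome < 0:
            scores[j] += 1
            scores[i] -= 1
    return scores


def toy_judge(a, b):
    # Placeholder preference (assumption): the longer response wins.
    return (len(a) > len(b)) - (len(a) < len(b))


responses = ["ok", "a fuller answer", "medium one"]
scores = copeland_scores(responses, toy_judge)
print(scores)
```

In an RL setting, scores like these could plausibly serve as a relative reward over a group of sampled responses; how SERL actually turns them into a training signal is described in the paper, not here.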

via arXiv — cs.LG

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

arXiv — cs.CL • 10 hours ago

Positive • Artificial Intelligence

Recent advancements in vision-language models (VLMs) have utilized large language models (LLMs) to achieve performance comparable to proprietary systems like GPT-4V. However, deploying these models on resource-constrained devices poses challenges due to high computational requirements. To address this, a new framework called Generation after Recalibration (GenRecal) has been introduced, which distills knowledge from large VLMs into smaller, more efficient models by aligning feature representations across diverse architectures.

via arXiv — cs.CL

Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy

arXiv — cs.CV • 10 hours ago

Positive • Artificial Intelligence

The integration of Large Language Models (LLMs) with 3D vision is revolutionizing robotic perception and autonomy. This approach enhances robotic sensing technologies, allowing machines to understand and interact with complex environments using natural language and spatial awareness. The review discusses the foundational principles of LLMs and 3D data, examines critical 3D sensing technologies, and highlights advancements in scene understanding, text-to-3D generation, and embodied agents, while addressing the challenges faced in this evolving field.

via arXiv — cs.CV

Contextual Learning for Anomaly Detection in Tabular Data

arXiv — cs.LG • 10 hours ago

Positive • Artificial Intelligence

Anomaly detection is essential in fields like cybersecurity and finance, particularly with large-scale tabular data. Traditional unsupervised methods struggle due to their reliance on a single global distribution, which does not account for the diverse contexts present in real-world data. This paper introduces a contextual learning framework that models normal behavior variations across different contexts, focusing on conditional data distributions instead of a global joint distribution, enhancing anomaly detection effectiveness.
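
To make the contrast between a global distribution and conditional, per-context distributions concrete, here is a minimal sketch that scores each value by its z-score within its own context. This is a simple statistical stand-in, not the paper's framework; the column names, the sample rows, and the `contextual_zscores` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_zscores(rows):
    """Score each (context, value) row by its absolute z-score within its context."""
    by_context = defaultdict(list)
    for context, value in rows:
        by_context[context].append(value)

    # Per-context mean and population standard deviation.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_context.items()}

    scores = []
    for context, value in rows:
        mu, sigma = stats[context]
        scores.append(0.0 if sigma == 0 else abs(value - mu) / sigma)
    return scores


# Hypothetical transactions: an amount of 900 is routine for "wholesale"
# accounts but unusual for "retail" ones.
rows = [
    ("retail", 10), ("retail", 12), ("retail", 11), ("retail", 900),
    ("wholesale", 1000), ("wholesale", 1100), ("wholesale", 900),
]
scores = contextual_zscores(rows)
most_anomalous = rows[scores.index(max(scores))]
print(most_anomalous)
```

A single global distribution over all seven amounts would treat 900 as unremarkable, since it sits near the wholesale values; conditioning on the context is what makes the retail row stand out.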

via arXiv — cs.LG