The Empowerment of Science of Science by Large Language Models: New Tools and Methods

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • Large language models (LLMs) are at the forefront of advancements in natural language understanding and generation, with significant implications for the Science of Science (SciSci). This review highlights their core technologies and potential applications in scientific evaluation.
  • The development of LLMs is crucial as they enhance capabilities in various domains, potentially transforming how scientific research is conducted and evaluated. Their integration into the SciSci framework could lead to more efficient knowledge generation.
  • The ongoing evolution of LLMs raises important questions about their reliability and biases, particularly around structured outputs and hallucination mitigation, underscoring the need for robust frameworks to ensure accuracy and diversity in AI-assisted scientific evaluation.
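To make the point about structured outputs concrete, here is a minimal, hypothetical sketch of schema-constrained validation of an LLM-generated paper assessment. The field names, the canned model response, and the validation rule are illustrative assumptions, not part of the reviewed work.

```python
import json

# Hypothetical schema for a structured review of a paper abstract;
# field names are illustrative, not taken from the reviewed paper.
REVIEW_FIELDS = {"novelty": int, "rigor": int, "summary": str}

def validate_review(raw: str) -> dict:
    """Parse an LLM response and keep only fields that match the schema.

    Dropping unexpected or mistyped fields is a crude but common first
    guard against hallucinated structure in model output.
    """
    data = json.loads(raw)
    return {k: v for k, v in data.items()
            if k in REVIEW_FIELDS and isinstance(v, REVIEW_FIELDS[k])}

# A canned response standing in for a real LLM call; the extra
# "citations" field is silently discarded by the validator.
raw_response = '{"novelty": 4, "rigor": 3, "summary": "Solid empirical study.", "citations": "fabricated"}'
print(validate_review(raw_response))
# {'novelty': 4, 'rigor': 3, 'summary': 'Solid empirical study.'}
```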
— via World Pulse Now AI Editorial System


Recommended Readings
COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation
Positive · Artificial Intelligence
The COMPASS (Context-Modulated PID Attention Steering System) framework is introduced to mitigate hallucinations in large language models (LLMs). It incorporates a feedback loop into the decoding process, using the Context Reliance Score (CRS) to measure how attention heads draw on contextual evidence. The system aims to ensure factual consistency in generated outputs without retraining or multiple decoding passes.
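A rough, hedged sketch of what PID-style attention steering could look like is given below. The Context Reliance Score is simplified to the fraction of attention mass placed on context tokens, and the way the control signal is added to attention logits is an assumption for illustration, not COMPASS's actual mechanism.

```python
import numpy as np

# Toy PID controller and attention steering loop, loosely inspired by the
# COMPASS idea summarized above; gains, the CRS definition, and the steering
# rule are simplified assumptions.

class PIDController:
    def __init__(self, kp=0.5, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        # Standard discrete PID update on the reliance error.
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def context_reliance_score(attn: np.ndarray, context_mask: np.ndarray) -> float:
    """Fraction of attention mass on context tokens (a stand-in for CRS)."""
    return float(attn[context_mask].sum() / attn.sum())

def steer_attention(attn_logits, context_mask, controller, target_crs=0.6):
    """Boost context-token logits when measured reliance falls below target."""
    attn = np.exp(attn_logits - attn_logits.max())
    attn /= attn.sum()
    error = target_crs - context_reliance_score(attn, context_mask)
    gain = controller.step(error)
    steered = attn_logits + gain * context_mask.astype(float)
    steered = np.exp(steered - steered.max())
    return steered / steered.sum()

# Toy example: 4 context tokens followed by 4 generated tokens.
logits = np.array([0.1, 0.2, 0.1, 0.1, 1.0, 1.2, 0.9, 1.1])
mask = np.array([True] * 4 + [False] * 4)
print(steer_attention(logits, mask, PIDController()))
```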
Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design
Positive · Artificial Intelligence
Large language models (LLMs) often generate confident yet incorrect outputs, necessitating reliable uncertainty quantification. This study introduces information-lift certificates that compare model probabilities to a skeleton baseline, utilizing sub-gamma PAC-Bayes bounds effective under heavy-tailed distributions. The method achieved 77.0% coverage at 2% risk across eight datasets, significantly outperforming entropy-based methods in blocking critical errors, making it practical for real-world applications.
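The core statistic can be illustrated with a toy sketch: compare the model's probability for a candidate answer to a weaker "skeleton" baseline and abstain when the log-ratio (the lift) is small. The threshold below is an arbitrary placeholder; the paper's sub-gamma PAC-Bayes calibration of that threshold is not reproduced here.

```python
import math

# Illustrative information-lift check; the skeleton model and the accept
# threshold are hypothetical placeholders.

def information_lift(p_model: float, p_skeleton: float) -> float:
    """Log-ratio of model probability to skeleton-baseline probability."""
    return math.log(p_model) - math.log(p_skeleton)

def certify(p_model: float, p_skeleton: float, threshold: float = 1.0):
    """Accept the answer only when the lift clears the (assumed) threshold."""
    lift = information_lift(p_model, p_skeleton)
    return ("accept" if lift >= threshold else "abstain", lift)

# The model is far more confident than the skeleton -> accept.
print(certify(p_model=0.85, p_skeleton=0.10))   # ('accept', ~2.14)
# Barely above the skeleton -> abstain and flag for review.
print(certify(p_model=0.30, p_skeleton=0.25))   # ('abstain', ~0.18)
```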
Mitigating Label Length Bias in Large Language Models
Positive · Artificial Intelligence
Large language models (LLMs) exhibit label length bias, where labels of varying lengths are treated inconsistently despite normalization efforts. This paper introduces normalized contextual calibration (NCC), a method that normalizes predictions at the full-label level, effectively addressing this bias. NCC demonstrates statistically significant improvements across multiple datasets and models, achieving up to 10% gains in F1 scores. Additionally, it extends bias mitigation to tasks like multiple-choice question answering, showing reduced sensitivity to few-shot example selection.
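As a hedged illustration of full-label calibration in the spirit of NCC, the sketch below rescales each candidate label's probability by its probability under a content-free prompt, which offsets a length-dependent prior. The numbers are toy values; a real pipeline would obtain them by summing token log-probabilities from an LLM over the full label string.

```python
# Toy full-label calibration; probabilities are illustrative, not measured.

def calibrated_scores(p_label_given_prompt: dict, p_label_content_free: dict) -> dict:
    """Divide each full-label probability by its content-free counterpart,
    then renormalize over the label set."""
    scores = {lbl: p_label_given_prompt[lbl] / p_label_content_free[lbl]
              for lbl in p_label_given_prompt}
    total = sum(scores.values())
    return {lbl: s / total for lbl, s in scores.items()}

# Longer labels tend to receive lower raw probability simply because they
# contain more tokens; dividing by the content-free probability of the same
# label compensates for that length effect.
raw = {"positive": 0.012, "somewhat negative": 0.0009}
content_free = {"positive": 0.020, "somewhat negative": 0.0010}
print(calibrated_scores(raw, content_free))
# {'positive': ~0.40, 'somewhat negative': ~0.60}
```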
You should still learn to code, says top Google AI exec - here's why
Neutral · Artificial Intelligence
Andrew Ng, co-founder of Google Brain and a prominent figure in AI, emphasized the importance of learning to code during an interview at AI Dev 25 in New York. He discussed the future of software developers, the significance of responsible AI, and expressed skepticism about the hype surrounding Artificial General Intelligence (AGI). His comments reflect a broader conversation about the evolving role of technology and the skills developers will need in the AI landscape.