The Empowerment of Science of Science by Large Language Models: New Tools and Methods

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • Large language models (LLMs) are at the forefront of advancements in natural language understanding and generation, with significant implications for the Science of Science (SciSci). This review highlights their core technologies and potential applications in scientific evaluation.
  • The development of LLMs is crucial as they enhance capabilities in various domains, potentially transforming how scientific research is conducted and evaluated. Their integration into the SciSci framework could lead to more efficient knowledge generation.
  • The ongoing evolution of LLMs raises important questions about their reliability and biases, particularly around structured outputs and hallucination mitigation, underscoring the need for robust frameworks to ensure accuracy and diversity in AI-assisted scientific evaluation.
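To make the point about structured outputs concrete, here is a minimal, hypothetical sketch of schema-constrained validation of an LLM-generated paper assessment. The field names, the canned model response, and the validation rule are illustrative assumptions, not part of the reviewed work.

```python
import json

# Hypothetical schema for a structured review of a paper abstract;
# field names are illustrative, not taken from the reviewed paper.
REVIEW_FIELDS = {"novelty": int, "rigor": int, "summary": str}

def validate_review(raw: str) -> dict:
    """Parse an LLM response and keep only fields that match the schema.

    Dropping unexpected or mistyped fields is a crude but common first
    guard against hallucinated structure in model output.
    """
    data = json.loads(raw)
    return {k: v for k, v in data.items()
            if k in REVIEW_FIELDS and isinstance(v, REVIEW_FIELDS[k])}

# A canned response standing in for a real LLM call; the extra
# "citations" field is silently discarded by the validator.
raw_response = '{"novelty": 4, "rigor": 3, "summary": "Solid empirical study.", "citations": "fabricated"}'
print(validate_review(raw_response))
# {'novelty': 4, 'rigor': 3, 'summary': 'Solid empirical study.'}
```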
— via World Pulse Now AI Editorial System


Recommended Readings
COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation
Positive · Artificial Intelligence
The COMPASS (Context-Modulated PID Attention Steering System) framework is introduced to mitigate hallucinations in large language models (LLMs). It incorporates a feedback loop into the decoding process, using the Context Reliance Score (CRS) to measure how attention heads draw on contextual evidence. The system aims to ensure factual consistency in generated outputs without retraining or multiple decoding passes.
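A rough, hedged sketch of what PID-style attention steering could look like is given below. The Context Reliance Score is simplified to the fraction of attention mass placed on context tokens, and the way the control signal is added to attention logits is an assumption for illustration, not COMPASS's actual mechanism.

```python
import numpy as np

# Toy PID controller and attention steering loop, loosely inspired by the
# COMPASS idea summarized above; gains, the CRS definition, and the steering
# rule are simplified assumptions.

class PIDController:
    def __init__(self, kp=0.5, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        # Standard discrete PID update on the reliance error.
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def context_reliance_score(attn: np.ndarray, context_mask: np.ndarray) -> float:
    """Fraction of attention mass on context tokens (a stand-in for CRS)."""
    return float(attn[context_mask].sum() / attn.sum())

def steer_attention(attn_logits, context_mask, controller, target_crs=0.6):
    """Boost context-token logits when measured reliance falls below target."""
    attn = np.exp(attn_logits - attn_logits.max())
    attn /= attn.sum()
    error = target_crs - context_reliance_score(attn, context_mask)
    gain = controller.step(error)
    steered = attn_logits + gain * context_mask.astype(float)
    steered = np.exp(steered - steered.max())
    return steered / steered.sum()

# Toy example: 4 context tokens followed by 4 generated tokens.
logits = np.array([0.1, 0.2, 0.1, 0.1, 1.0, 1.2, 0.9, 1.1])
mask = np.array([True] * 4 + [False] * 4)
print(steer_attention(logits, mask, PIDController()))
```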
Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design
Positive · Artificial Intelligence
Large language models (LLMs) often generate confident yet incorrect outputs, necessitating reliable uncertainty quantification. This study introduces information-lift certificates that compare model probabilities to a skeleton baseline, utilizing sub-gamma PAC-Bayes bounds effective under heavy-tailed distributions. The method achieved 77.0% coverage at 2% risk across eight datasets, significantly outperforming entropy-based methods in blocking critical errors, making it practical for real-world applications.
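The core statistic can be illustrated with a toy sketch: compare the model's probability for a candidate answer to a weaker "skeleton" baseline and abstain when the log-ratio (the lift) is small. The threshold below is an arbitrary placeholder; the paper's sub-gamma PAC-Bayes calibration of that threshold is not reproduced here.

```python
import math

# Illustrative information-lift check; the skeleton model and the accept
# threshold are hypothetical placeholders.

def information_lift(p_model: float, p_skeleton: float) -> float:
    """Log-ratio of model probability to skeleton-baseline probability."""
    return math.log(p_model) - math.log(p_skeleton)

def certify(p_model: float, p_skeleton: float, threshold: float = 1.0):
    """Accept the answer only when the lift clears the (assumed) threshold."""
    lift = information_lift(p_model, p_skeleton)
    return ("accept" if lift >= threshold else "abstain", lift)

# The model is far more confident than the skeleton -> accept.
print(certify(p_model=0.85, p_skeleton=0.10))   # ('accept', ~2.14)
# Barely above the skeleton -> abstain and flag for review.
print(certify(p_model=0.30, p_skeleton=0.25))   # ('abstain', ~0.18)
```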
Mitigating Label Length Bias in Large Language Models
Positive · Artificial Intelligence
Large language models (LLMs) exhibit label length bias, where labels of varying lengths are treated inconsistently despite normalization efforts. This paper introduces normalized contextual calibration (NCC), a method that normalizes predictions at the full-label level, effectively addressing this bias. NCC demonstrates statistically significant improvements across multiple datasets and models, achieving up to 10% gains in F1 scores. Additionally, it extends bias mitigation to tasks like multiple-choice question answering, showing reduced sensitivity to few-shot example selection.
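As a hedged illustration of full-label calibration in the spirit of NCC, the sketch below rescales each candidate label's probability by its probability under a content-free prompt, which offsets a length-dependent prior. The numbers are toy values; a real pipeline would obtain them by summing token log-probabilities from an LLM over the full label string.

```python
# Toy full-label calibration; probabilities are illustrative, not measured.

def calibrated_scores(p_label_given_prompt: dict, p_label_content_free: dict) -> dict:
    """Divide each full-label probability by its content-free counterpart,
    then renormalize over the label set."""
    scores = {lbl: p_label_given_prompt[lbl] / p_label_content_free[lbl]
              for lbl in p_label_given_prompt}
    total = sum(scores.values())
    return {lbl: s / total for lbl, s in scores.items()}

# Longer labels tend to receive lower raw probability simply because they
# contain more tokens; dividing by the content-free probability of the same
# label compensates for that length effect.
raw = {"positive": 0.012, "somewhat negative": 0.0009}
content_free = {"positive": 0.020, "somewhat negative": 0.0010}
print(calibrated_scores(raw, content_free))
# {'positive': ~0.40, 'somewhat negative': ~0.60}
```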
You should still learn to code, says top Google AI exec - here's why
Neutral · Artificial Intelligence
Andrew Ng, co-founder of Google Brain and a prominent figure in AI, emphasized the importance of learning to code during an interview at AI Dev 25 in New York. He discussed the future of software developers, the significance of responsible AI, and expressed skepticism about the hype surrounding Artificial General Intelligence (AGI). His comments reflect a broader conversation about the evolving role of technology and the skills developers will need in the AI landscape.