On the Entropy Calibration of Language Models

arXiv — cs.LG · Tuesday, November 18, 2025 at 5:00:00 AM
  • The recent study on entropy calibration in language models reveals a significant misalignment between a model's entropy and its log loss on human text.
  • Addressing entropy calibration is vital for improving the reliability and output quality of language models, which are increasingly deployed in applications ranging from content generation to broader natural language processing tasks.
  • The ongoing discourse around language model calibration intersects with broader themes of uncertainty quantification and model reliability, as seen in recent advancements aimed at improving the performance and alignment of these models with human expectations.
— via World Pulse Now AI Editorial System
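The calibration gap at issue can be illustrated with a toy next-token distribution: a model is entropy-calibrated when its predictive entropy matches the log loss it incurs on real data. A minimal sketch (the distributions below are made up purely for illustration):

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p_true, p_model):
    """Expected log loss of p_model on data drawn from p_true."""
    return -sum(t * math.log(m) for t, m in zip(p_true, p_model) if t > 0)

# Hypothetical next-token distributions over a 3-word vocabulary.
p_true  = [0.7, 0.2, 0.1]     # the "human" distribution
p_model = [0.9, 0.05, 0.05]   # a confident, imperfectly fit model

model_entropy   = entropy(p_model)               # loss the model expects
log_loss        = cross_entropy(p_true, p_model) # loss it actually pays
calibration_gap = log_loss - model_entropy

# A perfectly entropy-calibrated model has gap ~ 0; here the model's low
# entropy understates the log loss it actually incurs on the data.
print(f"entropy={model_entropy:.3f} log_loss={log_loss:.3f} gap={calibration_gap:.3f}")
```

The gap is the quantity the study measures at scale: when a model's entropy sits well below its log loss on human text, its generations are more confident than its predictive accuracy warrants.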


Recommended Readings
Linear time small coresets for k-means clustering of segments with applications
Positive · Artificial Intelligence
This study addresses the k-means clustering problem for a set of segments in Euclidean space, focusing on finding k centers that minimize the total distance from each point along a segment to a center. The research introduces the first coreset construction that effectively handles arbitrary input segments, allowing for efficient computation in various contexts. The findings have implications for applications such as real-time video tracking and clustering in high-dimensional spaces.
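The cost the summary describes — the total distance from every point along a segment to its nearest center — has a simple closed form for squared distances, which a coreset construction can exploit. A minimal sketch (the function names are ours; this is the objective, not the coreset algorithm itself):

```python
import numpy as np

def segment_cost(a, b, c):
    """Mean squared distance from points uniformly distributed along the
    segment [a, b] to center c: the integral of ||a + t(b - a) - c||^2
    over t in [0, 1], which evaluates in closed form to
    ||u||^2 + u.v + ||v||^2 / 3  with  u = a - c,  v = b - a."""
    u, v = a - c, b - a
    return float(u @ u + u @ v + (v @ v) / 3.0)

def kmeans_segments_cost(segments, centers):
    """Total k-means objective: each segment pays its cheapest center."""
    return sum(min(segment_cost(a, b, c) for c in centers)
               for a, b in segments)
```

The closed form is what makes the objective computable in linear time per segment; the paper's contribution is compressing arbitrary segment inputs into a small coreset on which this cost is approximately preserved.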
FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition
Positive · Artificial Intelligence
The paper titled 'FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition' presents a novel approach to time series forecasting. It addresses the challenges posed by spectral entanglement in non-stationary time series, which complicates the analysis of trends and periodicities. The proposed Frequency Decomposition Network (FreDN) includes a learnable Frequency Disentangler module that separates these components in the frequency domain. Additionally, a ReIm Block is introduced to simplify complex-valued learning, enhancing computational efficiency.
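FreDN's Frequency Disentangler is not specified in detail in the summary; the underlying idea, though, can be sketched with a spectral mask applied in the rFFT domain (in FreDN the mask is learnable — here it is fixed, and the series and cutoff are illustrative):

```python
import numpy as np

def frequency_split(x, mask):
    """Split series x into two additive components via a per-bin spectral
    mask in [0, 1] applied in the rFFT domain (learnable in FreDN)."""
    X = np.fft.rfft(x)
    low  = np.fft.irfft(mask * X, n=len(x))          # slow / trend part
    high = np.fft.irfft((1.0 - mask) * X, n=len(x))  # residual periodicities
    return low, high

T = 256
t = np.arange(T)
x = 0.01 * t + np.sin(2 * np.pi * t / 16)   # slow trend + fast oscillation
mask = (np.fft.rfftfreq(T) < 0.02).astype(float)  # keep only slow bins
low, high = frequency_split(x, mask)
```

Because the mask pair sums to one in every bin, the two components reconstruct the input exactly; making the mask learnable lets the network decide, per frequency, which component each bin belongs to.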
Nearest Neighbor Projection Removal Adversarial Training
Positive · Artificial Intelligence
Deep neural networks have shown remarkable capabilities in image classification but are susceptible to adversarial examples. Traditional adversarial training improves robustness but often overlooks inter-class feature overlap, which contributes to vulnerability. This study introduces a new adversarial training framework that reduces inter-class proximity by projecting out dependencies from both adversarial and clean samples in the feature space. The method enhances feature separability and theoretically lowers the Lipschitz constant of neural networks, improving generalization.
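The core operation — removing a feature's component along the direction of its nearest neighbor from a different class — can be sketched in a few lines (a simplified reading of the idea; the paper's exact formulation may differ, and the function name is ours):

```python
import numpy as np

def remove_projection(f, nn_other):
    """Remove from feature f its component along the feature direction of
    its nearest neighbor belonging to a different class, reducing the
    inter-class overlap that adversarial perturbations exploit."""
    n = np.linalg.norm(nn_other)
    if n < 1e-12:           # degenerate neighbor: nothing to project out
        return f
    u = nn_other / n
    return f - (f @ u) * u  # result is orthogonal to nn_other
```

Applying this to both clean and adversarial features during training pushes class representations apart, which is what the claimed improvement in feature separability (and the lower Lipschitz constant) rests on.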
Codebook-Centric Deep Hashing: End-to-End Joint Learning of Semantic Hash Centers and Neural Hash Function
Positive · Artificial Intelligence
The article presents a novel approach to deep hashing called Center-Reassigned Hashing (CRH), which enhances traditional methods by dynamically reassigning hash centers from a preset codebook. This end-to-end framework optimizes the hash function while avoiding the inefficiencies of local similarity optimization and the complexities of two-stage methods. By adapting hash centers to data distribution without explicit optimization phases, CRH aims to improve performance and streamline the learning process in semantic hashing.
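Reassigning hash centers from a preset codebook can be sketched as a similarity-driven matching between each class's current hash statistics and the available codewords (a simplified greedy sketch of the idea — CRH's actual end-to-end mechanism is not detailed in the summary):

```python
import numpy as np

def reassign_centers(class_means, codebook):
    """Greedily reassign each class to a distinct codeword from a preset
    binary (+/-1) codebook, by inner-product similarity between the
    class's mean hash output and each codeword."""
    sim = class_means @ codebook.T              # (classes, codewords)
    assignment = -np.ones(len(class_means), dtype=int)
    taken = set()
    # serve classes with the strongest best-match first
    for cls in np.argsort(-sim.max(axis=1)):
        for j in np.argsort(-sim[cls]):         # best available codeword
            if j not in taken:
                assignment[cls] = j
                taken.add(j)
                break
    return assignment
```

Because the centers follow the data distribution rather than being fixed up front, this avoids the separate center-optimization stage that two-stage semantic-hashing pipelines require.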
On the Limitations of Language Targeted Pruning: Investigating the Calibration Language Impact in Multilingual LLM Pruning
Neutral · Artificial Intelligence
Recent advancements in large language model (LLM) pruning have demonstrated state-of-the-art compression results without the need for post-training or retraining, while still maintaining high predictive performance. However, prior research predominantly focused on English text for calibration, overlooking the multilingual capabilities of modern LLMs. This paper presents a comprehensive empirical study analyzing the effects of different calibration languages on pruning multilingual models, revealing significant insights into performance and internal representation changes.
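Why the calibration language matters becomes concrete in activation-aware pruning criteria: one widely used post-training score, Wanda-style |W| times the input-activation norm, is computed on calibration text, so changing the language changes the activations and hence the pruning mask. A sketch of that criterion (not the paper's exact pipeline):

```python
import numpy as np

def prune_layer(W, calib_X, sparsity=0.5):
    """Wanda-style one-shot pruning: score_ij = |W_ij| * ||X_j||_2, where
    calib_X holds calibration activations (rows = tokens, cols = input
    features). A different calibration language yields different X and
    therefore a different mask over the same weights."""
    act_norm = np.linalg.norm(calib_X, axis=0)     # per input feature
    score = np.abs(W) * act_norm[None, :]
    k = int(W.size * sparsity)                     # weights to drop
    thresh = np.partition(score.ravel(), k - 1)[k - 1]
    mask = score > thresh
    return W * mask, mask
```

The paper's study asks how much this language sensitivity actually costs multilingual models — both in downstream performance and in internal representations.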
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
Neutral · Artificial Intelligence
This paper presents a theoretical analysis of data scaling laws in linear regression, particularly focusing on the effects of training on limited datasets over multiple epochs. It investigates how much larger a dataset must be to achieve the same performance as training on a smaller dataset for multiple epochs. The study introduces the concept of the effective reuse rate, which quantifies the necessary dataset growth for one-pass training to match the test loss of multi-epoch training.
QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs
Positive · Artificial Intelligence
The paper titled 'QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs' introduces a new framework called QA-Noun, aimed at capturing noun-centered semantic relations. The framework uses nine question templates to address both explicit and implicit roles of nouns, producing interpretable question-answer pairs that complement the verb-centered QA-SRL scheme. The authors provide a dataset of over 2,000 annotated noun mentions and a trained model that integrates with QA-SRL, achieving broad coverage of noun arguments and revealing additional contextual relations.
VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
Positive · Artificial Intelligence
VIR-Bench is a new benchmark designed to evaluate the geospatial and temporal understanding of multimodal large language models (MLLMs) through the reconstruction of travel video itineraries. It consists of 200 travel videos, addressing a gap in current benchmarks that primarily focus on indoor or short-range outdoor activities. The study highlights the challenges faced by state-of-the-art MLLMs in handling extended geospatial-temporal trajectories, which are crucial for real-world applications like AI planning and navigation.