Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators
Positive · Artificial Intelligence
- A new approach to training and testing predictive algorithms has been introduced, centered on repeatedly splitting the data into separate training and testing subsamples rather than relying on a single split. By averaging over many splits, the method uses the data more efficiently and addresses the limitations of traditional one-shot sample splitting. The work also establishes a new central limit theorem that applies to a broad class of such split-sample estimators, ensuring valid statistical inference without restrictions on model complexity.
- This advancement matters because it improves the reliability and reproducibility of predictive models, which are increasingly used across research and policy-making. By leveraging more of the data for training while maintaining rigorous testing protocols, the approach aims to sharpen model predictions and the confidence statements attached to them, ultimately benefiting decision-making in applications such as poverty targeting and the analysis of randomized experiments.
- The introduction of this method aligns with ongoing efforts in the AI community to optimize model performance and efficiency. It reflects a broader trend towards integrating advanced statistical techniques with machine learning practices, as seen in recent studies exploring model merging, uncertainty estimation, and active learning. These developments underscore the importance of innovative methodologies in addressing the challenges posed by data scarcity and the need for more effective inference strategies in machine learning.
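The repeated-splitting idea described above can be illustrated with a minimal sketch. This is not the paper's estimator or its CLT construction; it is a hypothetical toy in which a trivial model (the training-sample mean) is fit on each random training half, scored on the corresponding test half, and the per-split scores are averaged, with a naive normal-approximation interval computed across splits:

```python
import math
import random
import statistics

def split_sample_estimates(data, n_splits=20, train_frac=0.5, seed=0):
    """For each of n_splits random splits, 'train' a trivial model
    (the training-half mean) and 'test' it via MSE on the held-out
    half.  Returns the list of per-split test MSEs."""
    rng = random.Random(seed)
    n_train = int(len(data) * train_frac)
    estimates = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        mu = statistics.fmean(train)                          # training step
        mse = statistics.fmean((y - mu) ** 2 for y in test)   # testing step
        estimates.append(mse)
    return estimates

# Synthetic data: noisy observations around 1.0 (illustrative only).
data_rng = random.Random(42)
data = [1.0 + data_rng.gauss(0, 0.5) for _ in range(200)]

est = split_sample_estimates(data)
point = statistics.fmean(est)
# Naive interval across splits; the paper's central limit theorem is what
# would justify a properly studentized version of this kind of aggregate.
half = 1.96 * statistics.stdev(est) / math.sqrt(len(est))
print(f"MSE estimate: {point:.3f} ± {half:.3f}")
```

The point of the sketch is only the structure: each split yields an out-of-sample estimate, and aggregating across splits uses all observations for both training and testing, which is the limitation of single-split evaluation the summary refers to.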
— via World Pulse Now AI Editorial System
