Adam Optimization with Adaptive Batch Selection

arXiv — stat.ML · Tuesday, December 9, 2025 at 5:00 AM
  • Adam with Combinatorial Bandit Sampling (AdamCB) extends the widely used Adam optimizer with combinatorial bandit techniques that adaptively select which samples to train on. By prioritizing informative samples instead of treating all data equally, the method achieves improved convergence rates and stronger theoretical guarantees than previous sampling-based approaches (a minimal sketch of the idea follows this summary).
  • AdamCB's development is significant because it not only improves neural network training performance but also offers a more robust framework for using feedback from multiple samples simultaneously. In practice this could mean faster training times and better model accuracy.
  • The emergence of AdamCB reflects a broader trend in optimization algorithms, where researchers are increasingly focusing on adaptive methods that leverage advanced sampling techniques. This shift is indicative of ongoing efforts to bridge the performance gap between adaptive and non-adaptive optimizers, as seen in other recent innovations like HVAdam and AdamNX, which also aim to enhance training efficiency and stability.
— via World Pulse Now AI Editorial System
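
The core idea can be sketched in a few lines: maintain bandit weights over the training samples, draw a mini-batch according to those weights, feed an importance-weighted gradient to a standard Adam update, and use the observed per-sample losses as bandit feedback. The EXP3-style weighting, the toy least-squares objective, and the loss-based feedback rule below are illustrative assumptions, not the exact AdamCB algorithm.

```python
# Sketch: Adam with bandit-weighted sample selection on a toy least-squares
# problem. The weighting and feedback rules are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 256, 10, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)                      # model parameters
m, v = np.zeros(d), np.zeros(d)      # Adam moment estimates
lr, b1, b2, eps = 1e-2, 0.9, 0.999, 1e-8
weights = np.ones(n)                 # bandit weights over samples
eta = 0.05                           # bandit learning rate

for t in range(1, 1001):
    p = weights / weights.sum()      # sampling distribution over samples
    idx = rng.choice(n, size=batch, replace=False, p=p)

    # Importance-weighted stochastic gradient from the selected samples.
    resid = X[idx] @ w - y[idx]
    per_sample_grad = X[idx] * resid[:, None]
    iw = 1.0 / (n * p[idx])
    grad = (iw[:, None] * per_sample_grad).sum(axis=0)

    # Standard Adam update with bias correction.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

    # Semi-bandit feedback: upweight samples that were informative (high loss).
    loss = 0.5 * resid**2
    weights[idx] *= np.exp(eta * loss / (loss.max() + 1e-12))
    weights /= weights.max()         # keep weights numerically bounded

print("distance to true weights:", np.linalg.norm(w - w_true))
```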

Continue Reading
Correction of Decoupled Weight Decay
Neutral · Artificial Intelligence
A recent study challenges the conventional approach to decoupled weight decay, questioning the long-held assumption that the decay term should be proportional to the learning rate. Based on steady-state orthogonality arguments, the research suggests that proportionality to the square of the learning rate may be more appropriate. Empirically, however, removing the perpendicular component of the updates has minimal impact on training dynamics.
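
As a rough illustration of what is at stake, the sketch below implements an AdamW-style step in which the decoupled decay term scales either with the learning rate (the standard convention) or with its square (the scaling suggested by the study). The decay constant `wd` and the exact form of the update are assumptions of the sketch.

```python
# Sketch: decoupled weight decay with two scaling conventions.
import numpy as np

def adamw_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
               wd=1e-2, decay_scaling="lr"):
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    update = (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)

    # Decoupled weight decay, applied outside the adaptive update.
    if decay_scaling == "lr":        # standard AdamW convention
        w = w - lr * wd * w
    elif decay_scaling == "lr2":     # square-of-learning-rate scaling from the study
        w = w - lr**2 * wd * w
    return w - lr * update, (m, v, t)

# Usage on a toy quadratic objective ||w||^2.
w, state = np.ones(5), (np.zeros(5), np.zeros(5), 0)
for _ in range(100):
    w, state = adamw_step(w, 2 * w, state, decay_scaling="lr2")
```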
Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics
Positive · Artificial Intelligence
The paper introduces Arc Gradient Descent (ArcGD), a new optimizer that reformulates traditional gradient descent methods to incorporate phase-aware and user-controlled step dynamics. The evaluation of ArcGD shows it outperforming the Adam optimizer on a non-convex benchmark and a real-world ML dataset, particularly in challenging scenarios like the Rosenbrock function and CIFAR-10 image classification.
Stochastic Approximation with Block Coordinate Optimal Stepsizes
Neutral · Artificial Intelligence
The recent study on stochastic approximation with block-coordinate optimal stepsizes introduces adaptive stepsize rules designed to minimize the expected distance from an unknown target point. These rules utilize online estimates of the second moment of the search direction, leading to a new method that competes effectively with the widely used Adam algorithm while requiring less memory and fewer hyper-parameters.
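
The flavor of such a rule can be sketched as follows, assuming one exponential moving average of the squared search direction per parameter block rather than per coordinate (which is where the memory savings over Adam would come from); the paper's actual stepsize formula may differ.

```python
# Sketch: block-coordinate stepsizes driven by an online second-moment
# estimate, with one scalar kept per block instead of per coordinate.
import numpy as np

def block_step(params, grads, v_blocks, t, lr=1e-2, beta=0.999, eps=1e-8):
    """params, grads: lists of arrays (one per block); v_blocks: list of scalars."""
    new_params, new_v = [], []
    for w, g, v in zip(params, grads, v_blocks):
        v = beta * v + (1 - beta) * float(np.mean(g**2))   # online second moment
        step = lr / (np.sqrt(v / (1 - beta**t)) + eps)     # shared block stepsize
        new_params.append(w - step * g)
        new_v.append(v)
    return new_params, new_v
```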
FOAM: Blocked State Folding for Memory-Efficient LLM Training
Positive · Artificial Intelligence
The introduction of the Folded Optimizer with Approximate Moment (FOAM) presents a new approach to training large language models (LLMs) by compressing optimizer states through block-wise gradient means and a residual correction mechanism. This method aims to alleviate memory bottlenecks associated with traditional optimizers like Adam, which are often memory-intensive during training.
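
One way to read "block-wise gradient means with a residual correction" is sketched below: Adam's moments are stored only at block granularity, and the within-block detail of the current gradient is re-injected at update time. The block size, the folding of both moments, and the residual term are assumptions of this sketch, not FOAM's actual update.

```python
# Sketch: Adam-like step with moments folded to block granularity
# (O(d / block) optimizer memory) plus a within-block residual correction.
import numpy as np

def folded_step(w, grad, m_blk, v_blk, t, block=64,
                lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    g = grad.reshape(-1, block)                  # assumes grad.size % block == 0
    g_mean = g.mean(axis=1)                      # block-wise gradient means

    # Folded (block-level) first and second moments.
    m_blk = b1 * m_blk + (1 - b1) * g_mean
    v_blk = b2 * v_blk + (1 - b2) * g_mean**2
    m_hat = m_blk / (1 - b1**t)
    v_hat = v_blk / (1 - b2**t)

    # Residual correction: re-inject the current gradient's within-block detail.
    residual = g - g_mean[:, None]
    direction = (m_hat[:, None] + (1 - b1) * residual) / (np.sqrt(v_hat)[:, None] + eps)
    return w - lr * direction.reshape(w.shape), m_blk, v_blk
```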