Non-Euclidean SGD for Structured Optimization: Unified Analysis and Improved Rates

arXiv — cs.LG | Monday, November 17, 2025 at 5:00:00 AM

Recommended Readings
Convergence Bound and Critical Batch Size of Muon Optimizer
Positive | Artificial Intelligence
This paper presents a theoretical analysis of the Muon optimizer, which has shown strong empirical performance and has been proposed as a successor to AdamW. It provides convergence proofs for Muon in four practical settings, covering its behavior with and without Nesterov momentum and with and without weight decay. The analysis shows that including weight decay yields tighter theoretical bounds, and it identifies the critical batch size that minimizes training cost. The results are validated through experiments in image classification and language modeling.