arXiv:2510.20905v1 Announce Type: new 
Abstract: Stochastic gradient descent (SGD) and its variants enable modern artificial intelligence. However, theoretical understanding lags far behind their empirical success. It is widely believed that SGD has a curious ability to avoid sharp local minima in the loss landscape, which are associated with poor generalization. To unravel this mystery and further enhance this capability of SGD, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGD's global dynamics. In this paper, we develop technical machinery based on the recent large deviations and metastability analysis in Wang and Rhee (2023) and obtain a sharp characterization of the global dynamics of heavy-tailed SGDs. In particular, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noise during the training phase, SGD can almost completely avoid sharp minima and achieve better generalization performance on the test data. Simulation and deep learning experiments confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds local minima with a flatter geometry and achieves better generalization performance.
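
The "inject, then truncate" recipe in the abstract can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration, not the paper's exact algorithm: the symmetric Pareto noise, the choice to clip the whole update step, and the names `clip`, `heavy_tailed_noise`, and `clipped_heavy_tailed_sgd` are all hypothetical.

```python
import random


def clip(value, threshold):
    """Truncate value to the interval [-threshold, threshold]."""
    return max(-threshold, min(threshold, value))


def heavy_tailed_noise(rng, tail_index):
    """Symmetric Pareto-type noise (assumed noise model).

    The magnitude satisfies P(|Z| > z) = z ** (-tail_index) for z >= 1,
    and the sign is chosen uniformly at random.
    """
    magnitude = rng.paretovariate(tail_index)
    return magnitude if rng.random() < 0.5 else -magnitude


def clipped_heavy_tailed_sgd(grad, x0, step_size, threshold,
                             tail_index, num_steps, seed=0):
    """Run SGD with injected heavy-tailed noise and a truncated step.

    Each iteration adds heavy-tailed noise to the gradient, scales it by
    the step size, and clips the resulting step to at most `threshold`
    in absolute value before applying it.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(num_steps):
        noisy_gradient = grad(x) + heavy_tailed_noise(rng, tail_index)
        x = x - clip(step_size * noisy_gradient, threshold)
        path.append(x)
    return path


if __name__ == "__main__":
    # Toy loss f(x) = x**2 / 2, whose gradient is x (illustrative only).
    path = clipped_heavy_tailed_sgd(lambda x: x, x0=2.0, step_size=0.05,
                                    threshold=0.1, tail_index=1.5,
                                    num_steps=1000)
    print("final iterate:", path[-1])
```

Because every step is clipped, no single iteration can move the iterate by more than `threshold`, however large the injected noise is. Leaving a basin therefore takes several large noise realizations rather than one.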

A new study characterizes the global dynamics of heavy-tailed stochastic gradient descent (SGD) in nonconvex loss landscapes, explaining its ability to avoid sharp local minima, which are associated with poor generalization. Building on the large deviations and metastability analysis of Wang and Rhee (2023), the authors show that injecting heavy-tailed noise during training and then truncating it lets SGD almost completely avoid sharp minima. Simulations and deep learning experiments support the prediction that heavy-tailed SGD with gradient clipping finds flatter minima and generalizes better.

Global Dynamics of Heavy-Tailed SGDs in Nonconvex Loss Landscape: Characterization and Control

arXiv:2601.08039v1 Announce Type: new 
Abstract: In this paper, we study Riemannian zeroth-order optimization in settings where the underlying Riemannian metric $g$ is geodesically incomplete, and the goal is to approximate stationary points with respect to this incomplete metric. To address this challenge, we construct structure-preserving metrics that are geodesically complete while ensuring that every stationary point under the new metric remains stationary under the original one. Building on this foundation, we revisit the classical symmetric two-point zeroth-order estimator and analyze its mean-squared error from a purely intrinsic perspective, depending only on the manifold's geometry rather than any ambient embedding. Leveraging this intrinsic analysis, we establish convergence guarantees for stochastic gradient descent with this intrinsic estimator. Under additional suitable conditions, an $\epsilon$-stationary point under the constructed metric $g'$ also corresponds to an $\epsilon$-stationary point under the original metric $g$, thereby matching the best-known complexity in the geodesically complete setting. Empirical studies on synthetic problems confirm our theoretical findings, and experiments on a practical mesh optimization task demonstrate that our framework maintains stable convergence even in the absence of geodesic completeness.
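
The symmetric two-point zeroth-order estimator mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: it uses flat Euclidean space as a stand-in manifold (so the exponential map is plain vector addition), samples Gaussian tangent directions, and does not implement the structure-preserving metric $g'$. The names `euclidean_exp`, `sample_direction`, `two_point_estimate`, and `zeroth_order_sgd_step` are hypothetical.

```python
import random


def euclidean_exp(x, v):
    """Exponential map of flat space: move from x along tangent vector v.

    Stand-in for a manifold's exponential map (an assumption of this sketch).
    """
    return [xi + vi for xi, vi in zip(x, v)]


def sample_direction(rng, dim):
    """Draw a standard Gaussian tangent direction (assumed sampling scheme)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def two_point_estimate(f, x, u, mu, exp_map=euclidean_exp):
    """Symmetric two-point gradient estimate along direction u.

    Evaluates f at Exp_x(mu * u) and Exp_x(-mu * u), forms the central
    difference, and scales u by the resulting directional slope.
    """
    forward = exp_map(x, [mu * ui for ui in u])
    backward = exp_map(x, [-mu * ui for ui in u])
    slope = (f(forward) - f(backward)) / (2.0 * mu)
    return [slope * ui for ui in u]


def zeroth_order_sgd_step(f, x, step_size, mu, rng, exp_map=euclidean_exp):
    """One SGD step using only function values of f (no gradients)."""
    u = sample_direction(rng, len(x))
    g = two_point_estimate(f, x, u, mu, exp_map)
    return exp_map(x, [-step_size * gi for gi in g])


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda p: sum(c * c for c in p)  # toy objective, illustrative only
    x = [1.0, 2.0]
    for _ in range(500):
        x = zeroth_order_sgd_step(f, x, step_size=0.01, mu=1e-3, rng=rng)
    print("final point:", x, "objective:", f(x))
```

Passing the exponential map as a parameter keeps the estimator written in terms of the manifold's own geometry. Replacing `euclidean_exp` with another manifold's exponential map would leave the rest of the sketch unchanged.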

A recent study advances Riemannian zeroth-order optimization for settings where the Riemannian metric is geodesically incomplete. The authors construct structure-preserving metrics that are geodesically complete and guarantee that every stationary point under the new metric remains stationary under the original one. They then analyze the mean-squared error of the classical symmetric two-point zeroth-order estimator intrinsically, without reference to an ambient embedding, and use that analysis to establish convergence guarantees for stochastic gradient descent.

Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
