MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

arXiv — cs.LG · Wednesday, November 12, 2025, 5:00 AM
On November 12, 2025, researchers introduced MAC, an optimization method that improves neural-network training by efficiently approximating the Fisher information matrix (FIM). The algorithm targets the computational cost of second-order methods such as KFAC, which exploit curvature information to speed convergence but are often resource-intensive. MAC is reported to be the first algorithm to apply Kronecker factorization to the FIM of attention layers in transformers, integrating attention scores into its preconditioning. The study demonstrates that MAC converges to global minima under specific conditions and outperforms KFAC and other state-of-the-art methods in accuracy, training time, and memory usage. This advance promises to streamline training and improve the performance of neural networks.
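To make the idea concrete, here is a minimal sketch of the KFAC-style Kronecker-factored preconditioning that MAC builds on, for a single linear layer. The FIM block for the layer is approximated as a Kronecker product A ⊗ G of an input-activation covariance A and an output-gradient covariance G, so the preconditioned gradient is G⁻¹ · dW · A⁻¹. All names, dimensions, and the damping value below are illustrative assumptions, not the paper's implementation; MAC's specific mean-activation and attention-score approximations are not reproduced here.

```python
import numpy as np

# Hypothetical linear layer with synthetic data (illustrative only).
rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 8, 4
a = rng.standard_normal((batch, d_in))   # layer input activations
g = rng.standard_normal((batch, d_out))  # gradients w.r.t. layer outputs
W_grad = g.T @ a / batch                 # raw weight gradient, shape (d_out, d_in)

# Kronecker factors of the FIM block: A = E[a a^T], G = E[g g^T].
A = a.T @ a / batch
G = g.T @ g / batch

# Damped inverses; the damping term keeps the factors well-conditioned.
lam = 1e-2
A_inv = np.linalg.inv(A + lam * np.eye(d_in))
G_inv = np.linalg.inv(G + lam * np.eye(d_out))

# KFAC-style preconditioned update: (G^-1) dW (A^-1),
# equivalent to applying (A ⊗ G)^-1 to the vectorized gradient.
precond_grad = G_inv @ W_grad @ A_inv
print(precond_grad.shape)  # → (4, 8)
```

Inverting the two small factors (d_in × d_in and d_out × d_out) is far cheaper than inverting the full (d_in·d_out) × (d_in·d_out) FIM block, which is the efficiency argument behind Kronecker-factored methods; MAC pushes this further by replacing per-sample statistics with mean-activation approximations.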
— via World Pulse Now AI Editorial System
