Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Field Theory Perspective

arXiv — cs.LG · Thursday, December 4, 2025 at 5:00:00 AM
  • The Modified Rectified Power Unit (MRePU) activation function addresses critical failure modes of deep Rectified Power Unit (RePU) networks, chiefly training instability caused by vanishing or exploding activation values (a brief sketch of this instability follows below). The new function retains RePU's advantages of differentiability and universal approximation while ensuring stable training conditions, as demonstrated through extensive theoretical analysis and experiments.
  • The development of MRePU is significant as it enhances the performance of neural networks, particularly in applications requiring stable training dynamics. By overcoming the limitations of RePU, MRePU offers a promising alternative for researchers and practitioners in the field of artificial intelligence, potentially leading to more robust and efficient neural network architectures.
  • This advancement reflects a broader trend in the AI community towards improving activation functions to enhance neural network performance. Innovations like SmartMixed and VeLU also aim to optimize activation functions, addressing challenges such as gradient sparsity and dead neurons. The ongoing exploration of physics-informed neural networks further emphasizes the importance of integrating domain knowledge into neural architectures, showcasing a collective effort to refine machine learning methodologies.
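The summary does not give MRePU's exact formula, but the failure mode it targets is easy to reproduce from the standard RePU definition, RePU_p(x) = max(0, x)^p: for p ≥ 2, repeated application drives pre-activations below 1 toward zero and those above 1 toward infinity. A minimal sketch (the depth-5 stack is illustrative, not the paper's experiment):

```python
import numpy as np

def repu(x, p=2):
    """Rectified Power Unit: max(0, x)**p."""
    return np.maximum(0.0, x) ** p

# Repeatedly applying RePU illustrates the instability the paper targets:
# values below 1 vanish, values above 1 explode, 1 is a knife-edge fixed point.
x = np.array([0.5, 1.0, 1.5])
for layer in range(5):
    x = repu(x, p=2)
print(x)  # approx [2.3e-10, 1.0, 4.3e+05] after only 5 layers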
— via World Pulse Now AI Editorial System


Continue Reading
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
Positive · Artificial Intelligence
ATHENA, an innovative framework for managing the computational research lifecycle in Scientific Computing and Scientific Machine Learning, has been introduced. It is built around the HENA loop, a knowledge-driven process that frames action selection as a Contextual Bandit problem, enabling the system to autonomously choose actions based on prior trials and expert blueprints.
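The summary only says that HENA's action selection operates as a Contextual Bandit; the epsilon-greedy policy, the action names, and the reward bookkeeping in this sketch are illustrative assumptions, not ATHENA's actual mechanism.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Hypothetical bandit-style selector over candidate research actions."""
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.value = defaultdict(float)   # running mean reward per (context, action)
        self.count = defaultdict(int)

    def select(self, context):
        if random.random() < self.epsilon:            # explore
            return random.choice(self.actions)
        return max(self.actions,                       # exploit best-known action
                   key=lambda a: self.value[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.count[key] += 1
        # incremental running-mean update of the estimated reward
        self.value[key] += (reward - self.value[key]) / self.count[key]

bandit = EpsilonGreedyBandit(["refine_mesh", "swap_solver", "tune_params"])
action = bandit.select(context="stalled_convergence")
bandit.update("stalled_convergence", action, reward=0.7)
```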
CoGraM: Context-sensitive granular optimization method with rollback for robust model fusion
Positive · Artificial Intelligence
CoGraM (Contextual Granular Merging) is a newly introduced optimization method designed to enhance the merging of neural networks without retraining, addressing issues of accuracy and stability that are prevalent in existing methods like Fisher merging. This multi-stage, context-sensitive approach utilizes rollback mechanisms to prevent harmful updates, thereby improving the robustness of the merged network.
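The summary describes multi-stage, context-sensitive merging with rollback but not the exact stages; the sketch below assumes per-layer granularity, plain parameter averaging, and a validation-score rollback rule, all stand-ins for CoGraM's actual procedure.

```python
import copy

def merge_with_rollback(model_a, model_b, evaluate):
    """model_* : dict mapping layer name -> parameter list.
    evaluate  : callable returning a validation score for a candidate model."""
    merged = copy.deepcopy(model_a)
    best_score = evaluate(merged)
    for name in merged:
        candidate = copy.deepcopy(merged)
        # granular update: average this layer's parameters across the two models
        candidate[name] = [(a + b) / 2 for a, b in zip(model_a[name], model_b[name])]
        score = evaluate(candidate)
        if score >= best_score:          # keep the update only if it helps
            merged, best_score = candidate, score
        # else: roll back, i.e. leave `merged` unchanged for this layer
    return merged

toy_a = {"layer": [1.0, 2.0]}
toy_b = {"layer": [3.0, 0.0]}
merged = merge_with_rollback(toy_a, toy_b, evaluate=lambda m: -abs(m["layer"][0] - 2))
```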
Learning to Solve Constrained Bilevel Control Co-Design Problems
Neutral · Artificial Intelligence
A new framework for Learning to Optimize (L2O) has been proposed to address the challenges of solving constrained bilevel control co-design problems, which are often complex and time-sensitive. This framework utilizes modern differentiation techniques to enhance the efficiency of finding solutions to these optimization problems.
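The summary does not say which differentiation technique the framework uses; one common option for bilevel problems is unrolling the inner optimization and backpropagating through it. A toy PyTorch sketch with placeholder quadratic objectives, not the paper's formulation:

```python
import torch

design = torch.tensor([1.0], requires_grad=True)   # outer co-design variable

def inner_loss(u, design):
    # inner (control) objective, parameterized by the outer design
    return ((u - design) ** 2 + 0.1 * u ** 2).sum()

u = torch.zeros(1, requires_grad=True)
for _ in range(50):                                # unrolled inner gradient descent
    g = torch.autograd.grad(inner_loss(u, design), u, create_graph=True)[0]
    u = u - 0.1 * g                                # keeps the graph for the outer grad

outer_loss = (u ** 2).sum() + 0.01 * (design ** 2).sum()
outer_loss.backward()
print(design.grad)   # gradient of the outer objective through the inner solve
```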
Comparison of neural network training strategies for the simulation of dynamical systems
Positive · Artificial Intelligence
A recent study has compared two neural network training strategies—parallel and series-parallel training—specifically for simulating nonlinear dynamical systems. The empirical analysis involved five neural network architectures and practical examples, including a pneumatic valve test bench and an industrial robot benchmark. The findings indicate that while series-parallel training is prevalent, parallel training offers superior long-term prediction accuracy.
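The two strategies differ in what is fed back into the one-step model: series-parallel feeds back measured outputs, while parallel feeds back the model's own predictions, so prediction errors accumulate over the horizon. A minimal sketch with a placeholder model `f` standing in for the trained network:

```python
import numpy as np

def f(y_prev, u):
    return 0.9 * y_prev + 0.1 * u           # placeholder one-step dynamics model

u = np.random.randn(100)                     # input sequence
y_meas = np.zeros(101)                       # measured outputs (placeholder data)

# Series-parallel (one-step ahead): feed back *measured* outputs.
y_sp = [f(y_meas[k], u[k]) for k in range(100)]

# Parallel (free-run): feed back the model's *own* predictions -- the regime
# where the study finds superior long-term accuracy matters.
y_par, y = [], y_meas[0]
for k in range(100):
    y = f(y, u[k])
    y_par.append(y)
```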
Modeling and Inverse Identification of Interfacial Heat Conduction in Finite Layer and Semi-Infinite Substrate Systems via a Physics-Guided Neural Framework
Positive · Artificial Intelligence
A new framework named HeatTransFormer has been introduced to model interfacial heat conduction in semiconductor devices, addressing the challenges posed by steep temperature gradients at the interface between a finite chip layer and a semi-infinite substrate. This physics-guided Transformer architecture aims to enhance the transient thermal response without the excessive discretization required by conventional numerical solvers.
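The paper's Transformer architecture and interface conditions are not detailed in the summary; as a generic illustration of a physics-guided objective, the sketch below penalizes the finite-difference residual of the 1-D heat equation dT/dt = alpha * d2T/dx2 on a predicted temperature field.

```python
import numpy as np

def heat_residual_loss(T, dx, dt, alpha):
    """T : array of shape (nt, nx), predicted temperature field over time/space."""
    dT_dt   = (T[1:, 1:-1] - T[:-1, 1:-1]) / dt                       # forward diff in t
    d2T_dx2 = (T[:-1, 2:] - 2 * T[:-1, 1:-1] + T[:-1, :-2]) / dx**2   # central diff in x
    residual = dT_dt - alpha * d2T_dx2
    return np.mean(residual ** 2)            # penalize violation of the PDE

T_pred = np.random.rand(50, 64)              # stand-in for the network's output
loss = heat_residual_loss(T_pred, dx=0.01, dt=1e-4, alpha=1.2e-5)
```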
Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
Positive · Artificial Intelligence
A new study proposes a mixed precision accumulation strategy for neural network inference, using a componentwise forward error analysis to control how rounding errors propagate through linear layers. The analysis suggests that the accumulation precision of each output component should be inversely proportional to the condition numbers of the weights and activation functions involved, potentially enhancing computational efficiency without sacrificing accuracy.
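The stated rule, per-component precision tied inversely to a condition number, can be sketched directly; the threshold value and the two-level fp16/fp32 split below are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

def choose_precision(W, x, threshold=1e2):
    y = W @ x
    # componentwise condition number of the dot products: (|W||x|)_i / |y_i|;
    # large values signal heavy cancellation in that output component
    cond = (np.abs(W) @ np.abs(x)) / np.maximum(np.abs(y), 1e-30)
    # well-conditioned components tolerate low-precision accumulation;
    # ill-conditioned ones get high precision
    return np.where(cond < threshold, "fp16", "fp32")

W = np.random.randn(4, 8)
x = np.random.randn(8)
print(choose_precision(W, x))
```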
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Neutral · Artificial Intelligence
Sparse Autoencoders (SAEs) have been analyzed to determine their effectiveness in uncovering meaningful concepts within neural network representations. A unified framework has been introduced, framing SAEs as solutions to a bilevel optimization problem, which highlights the inherent biases in concept detection based on the structural assumptions of different SAE architectures.
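For reference, a minimal SAE objective in the common form the framework analyzes: a ReLU encoder into an overcomplete code with an L1 sparsity penalty. The specific penalty and dimensions are assumptions, not the paper's particular architectures.

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, lam=1e-3):
    z = np.maximum(0.0, x @ W_enc + b_enc)    # sparse code (overcomplete)
    x_hat = z @ W_dec + b_dec                  # reconstruction
    recon = np.mean((x - x_hat) ** 2)
    sparsity = lam * np.mean(np.abs(z))        # encourages few active "concepts"
    return recon + sparsity

d, m = 32, 128                                 # m > d: overcomplete dictionary
rng = np.random.default_rng(0)
x = rng.normal(size=(16, d))
loss = sae_loss(x, rng.normal(size=(d, m)) * 0.1, np.zeros(m),
                rng.normal(size=(m, d)) * 0.1, np.zeros(d))
```

The paper's point, on this reading, is that the choice of encoder nonlinearity and penalty already encodes a geometric assumption about what a "concept" is, which the bilevel framing makes explicit.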
Verifying Closed-Loop Contractivity of Learning-Based Controllers via Partitioning
Positive · Artificial Intelligence
A recent study has introduced a method for verifying closed-loop contractivity in nonlinear control systems using neural networks for both controllers and contraction metrics. This approach employs interval analysis and a domain partitioning strategy to ensure that the dominant eigenvalue of a symmetric Metzler matrix remains nonpositive, which is essential for confirming contractivity. The method was validated on an inverted pendulum system, showcasing its effectiveness in training neural network controllers.
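The summary outlines the verification recipe: bound the relevant symmetric Metzler matrix over each cell of the domain, check its dominant eigenvalue, and partition further when the check is inconclusive. The sketch below follows that loop on a 1-D domain; `entry_bounds` is a hypothetical stand-in for the paper's interval analysis of the closed loop.

```python
import numpy as np

def entry_bounds(lo, hi):
    # Entrywise upper bound of the symmetric Metzler matrix over [lo, hi].
    # Placeholder: bounds loosen with cell width, as interval bounds would.
    width = hi - lo
    return np.array([[-2.0 + width, 0.5],
                     [0.5, -1.5 + width]])

def verify(lo, hi, depth=0, max_depth=10):
    M = entry_bounds(lo, hi)
    if np.max(np.linalg.eigvalsh(M)) <= 0:    # dominant eigenvalue nonpositive:
        return True                            # contractive on this cell
    if depth == max_depth:
        return False                           # give up: inconclusive
    mid = (lo + hi) / 2                        # bisect the cell and recurse
    return verify(lo, mid, depth + 1) and verify(mid, hi, depth + 1)

print(verify(-1.0, 1.0))   # True once cells are small enough to certify
```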