Algorithm-Relative Trajectory Valuation in Policy Gradient Control
Neutral · Artificial Intelligence
The study on algorithm-relative trajectory valuation in policy-gradient control, published on arXiv, investigates how the value of a trajectory depends on the learning algorithm that consumes it. Under REINFORCE, the paper identifies a negative correlation (roughly -0.38) between a trajectory's Persistence of Excitation (PE) and its marginal value. It attributes this to a variance-mediated mechanism: at fixed state energy, higher PE yields lower gradient variance, while higher variance near saddle points raises the probability of escape, so low-PE, high-variance trajectories make larger marginal contributions. When stabilization such as state whitening or Fisher preconditioning neutralizes this variance channel, the correlation flips to roughly +0.29. Experiments validate the proposed mechanism and show that decision-aligned scores can complement Shapley values for pruning, while Shapley values remain effective at identifying toxic subsets. This work underscores…
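To make the variance channel concrete, here is a minimal NumPy sketch under strong assumptions: a linear-Gaussian policy a_t ~ N(theta^T s_t, 1), PE measured as the smallest eigenvalue of the empirical state Gram matrix, and synthetic returns. The helpers `pe_score`, `reinforce_grad`, and `whiten` are hypothetical names for illustration, not the paper's code, and the toy probe is not a reproduction of its experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def pe_score(states):
    """PE proxy: smallest eigenvalue of the empirical state Gram matrix
    (1/T) * sum_t s_t s_t^T; larger means richer excitation."""
    gram = states.T @ states / len(states)
    return float(np.linalg.eigvalsh(gram)[0])

def reinforce_grad(states, actions, returns, theta):
    """Per-trajectory REINFORCE gradient for a_t ~ N(theta^T s_t, 1):
    sum_t G_t * s_t * (a_t - theta^T s_t)."""
    residuals = actions - states @ theta
    return (states * (residuals * returns)[:, None]).sum(axis=0)

def whiten(states, eps=1e-6):
    """State whitening: rescale states so their Gram matrix is ~identity,
    the kind of stabilization the summary says neutralizes the channel."""
    evals, evecs = np.linalg.eigh(states.T @ states / len(states))
    return states @ (evecs / np.sqrt(evals + eps)) @ evecs.T

# Toy probe: vary state anisotropy at fixed total energy and inspect
# how PE relates to the per-trajectory gradient magnitude.
dim, T, theta = 4, 50, np.zeros(4)
pes, gnorms = [], []
for _ in range(500):
    scale = rng.uniform(0.1, 1.0, size=dim)
    scale *= np.sqrt(dim / (scale ** 2).sum())      # fix state energy
    states = rng.normal(size=(T, dim)) * scale      # anisotropy varies PE
    actions = states @ theta + rng.normal(size=T)   # on-policy action noise
    returns = rng.normal(size=T)                    # placeholder returns
    pes.append(pe_score(states))
    gnorms.append(np.linalg.norm(reinforce_grad(states, actions, returns, theta)))
print("corr(PE, |grad|):", np.corrcoef(pes, gnorms)[0, 1])
```

Replacing `states` with `whiten(states)` equalizes the Gram spectra across trajectories, which is one way to see why stabilization can flip the sign of the PE-value correlation.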
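The Shapley side of the claim can be sketched the same way. Below is a brute-force Shapley valuation over a handful of trajectories with a toy utility in which two individually harmless trajectories are harmful together; the `utility` function and the toxic pair are invented for illustration, and realistic data sizes would require a Monte Carlo estimator rather than subset enumeration.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(n, utility):
    """Exact Shapley values by subset enumeration (O(2^n); toy scale only)."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (utility(S + (i,)) - utility(S))
    return phi

# Hypothetical utility: per-trajectory quality, plus an interaction in
# which trajectories 4 and 5 degrade training only when used together.
quality = np.array([1.0, 0.8, 0.6, 0.9, 0.2, 0.3])
def utility(S):
    u = quality[list(S)].sum()
    if 4 in S and 5 in S:
        u -= 2.0    # the "toxic subset" effect
    return u

print(np.round(shapley_values(6, utility), 3))
```

In this toy, the pair's Shapley values come out negative even though each trajectory looks harmless in isolation, matching the summary's claim that Shapley effectively identifies toxic subsets.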
— via World Pulse Now AI Editorial System
