Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
arXiv:2507.08977v2 Announce Type: replace-cross
Abstract: Scientific modeling faces a tradeoff: mechanistic models provide scientific grounding but struggle with real-world complexity, while machine learning models achieve strong predictive performance but require large labeled datasets and are not interpretable. We introduce Simulation-Grounded Neural Networks (SGNNs), which use mechanistic simulations as training data for neural networks. SGNNs are pretrained on synthetic corpora spanning diverse model structures, parameter regimes, stochasticity, and observational artifacts. Simulation-grounded learning has previously been applied in several domains (e.g., surrogate models in physics, forecasting in epidemiology). We provide a unified framework for simulation-grounded learning and evaluate SGNNs across scientific disciplines and modeling tasks. SGNNs were successful across domains. For prediction tasks, they nearly tripled COVID-19 forecasting skill relative to CDC baselines, reduced chemical yield prediction error by one-third, and maintained accuracy in ecological forecasting where task-specific models failed. For inference tasks, SGNNs accurately classified the source of information spread in simulated social networks and enabled supervised learning for unobservable targets such as COVID-19 transmissibility, which they estimated more accurately than traditional methods even in early outbreaks. Finally, SGNNs enable back-to-simulation attribution, a form of mechanistic interpretability. Back-to-simulation attribution matches real-world observations to the training simulations the model considers most similar, identifying which mechanistic processes the model believes best explain the observed data. By unifying these approaches in a single framework, we establish when and how mechanistic simulations can serve as effective training data for robust, interpretable scientific inference.
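The abstract names two mechanics without showing them: building a training corpus by sampling mechanistic simulations across parameter regimes, stochasticity, and observational artifacts, and back-to-simulation attribution, which matches an observed series to the most similar training simulations. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a stochastic SIR epidemic model as the mechanism, hypothetical parameter ranges, and two illustrative observation artifacts (binomial underreporting and a weekend reporting delay). It also replaces the trained network's learned representation with log-scaled raw counts, so the nearest-neighbour step only stands in for attribution in an SGNN's embedding space. All names (`simulate_sir`, `build_corpus`, `back_to_simulation`, `embed`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """One synthetic training example: known mechanism parameters plus the observed series."""
    beta: float         # transmission rate per day (hypothetical prior range)
    gamma: float        # recovery rate per day (hypothetical prior range)
    report_rate: float  # probability that a new infection is reported (underreporting)
    observed: list = field(default_factory=list)  # reported daily case counts

    @property
    def r0(self) -> float:
        # Basic reproduction number of the SIR model
        return self.beta / self.gamma


def binomial(rng: random.Random, n: int, p: float) -> int:
    # Binomial draw as n Bernoulli trials; adequate for the small populations used here
    if p <= 0.0:
        return 0
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(rng, beta, gamma, report_rate, n_pop=300, i0=5, days=40):
    """Discrete-time chain-binomial SIR with two assumed observation artifacts."""
    s, i = n_pop - i0, i0
    observed, backlog = [], 0
    for t in range(days):
        p_inf = 1.0 - math.exp(-beta * i / n_pop)  # per-susceptible infection probability
        p_rec = 1.0 - math.exp(-gamma)             # per-infective recovery probability
        new_inf = binomial(rng, s, p_inf)
        new_rec = binomial(rng, i, p_rec)
        s -= new_inf
        i += new_inf - new_rec
        # Artifact 1: only a fraction of new infections are reported; add any delayed cases
        cases = binomial(rng, new_inf, report_rate) + backlog
        if t % 7 in (5, 6):  # Artifact 2: half of weekend reports are delayed to the next day
            backlog = cases // 2
            cases -= backlog
        else:
            backlog = 0
        observed.append(cases)
    return observed


def build_corpus(n_sims=150, seed=0):
    """Sample mechanism parameters from broad (hypothetical) priors and simulate each."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n_sims):
        beta = rng.uniform(0.1, 0.8)
        gamma = rng.uniform(0.1, 0.3)
        report_rate = rng.uniform(0.2, 0.9)
        obs = simulate_sir(rng, beta, gamma, report_rate)
        corpus.append(Simulation(beta, gamma, report_rate, obs))
    return corpus


def embed(series):
    # Stand-in for a learned representation: log-scaled counts
    return [math.log1p(x) for x in series]


def distance(a, b):
    # Euclidean distance between two embeddings of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def back_to_simulation(query, corpus, k=5):
    """Return the k training simulations closest to the query series, with their distances."""
    q = embed(query)
    scored = [(distance(q, embed(sim.observed)), sim) for sim in corpus]
    scored.sort(key=lambda pair: pair[0])  # sort on distance only
    return scored[:k]


if __name__ == "__main__":
    corpus = build_corpus()
    # Stand-in for real-world data: one extra simulation drawn outside the corpus
    query = simulate_sir(random.Random(42), beta=0.5, gamma=0.2, report_rate=0.6)
    for dist, sim in back_to_simulation(query, corpus, k=3):
        print(f"d={dist:.2f}  R0={sim.r0:.2f}  report_rate={sim.report_rate:.2f}")
```

Each retrieved simulation carries its known transmission, recovery, and reporting parameters, so the neighbours suggest which mechanisms could plausibly have produced the query series. In an actual SGNN the distances would be computed in the pretrained network's representation rather than on raw counts.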
— via World Pulse Now AI Editorial System