arXiv:2511.08419v1 Announce Type: cross 
Abstract: Safety analysis for stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution of the state variables. This unpredictable evolution makes it difficult for many control methods to meet the predefined constraints. To address this, we present a new algorithm that computes safe policies to determine the safety level across a finite state set. The algorithm reduces the safety objective to the standard average-reward Markov Decision Process (MDP) objective. This reduction enables us to use standard techniques, such as linear programs, to compute and analyze safe policies. We validate the proposed method numerically on the Double Integrator and Inverted Pendulum systems. Results indicate that the average-reward MDP solution is more comprehensive, converges faster, and offers higher quality than the minimum discounted-reward solution.
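The abstract names linear programs as one standard tool for average-reward MDPs. To keep the illustration dependency-free, the sketch below instead uses relative value iteration, a different standard average-reward solver, on a hypothetical three-state MDP. It assumes a reward of 1 in safe states and 0 in the unsafe state, so the optimal gain equals the largest achievable long-run fraction of time spent in the safe set. This reward design, the states, the actions, and the probabilities are our assumptions for illustration, not the paper's reduction.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (next_state, probability)
MDP = Dict[int, Dict[str, List[Tuple[int, float]]]]


def q_value(mdp: MDP, reward: Dict[int, float], h: Dict[int, float], s: int, a: str) -> float:
    """One-step reward plus the expected relative value of the successor state."""
    return reward[s] + sum(p * h[t] for t, p in mdp[s][a])


def relative_value_iteration(mdp: MDP, reward: Dict[int, float], ref: int = 0,
                             tol: float = 1e-10, max_iter: int = 100_000):
    """Average-reward RVI; assumes a unichain, aperiodic MDP so the iteration converges."""
    h = {s: 0.0 for s in mdp}
    gain = 0.0
    for _ in range(max_iter):
        v = {s: max(q_value(mdp, reward, h, s, a) for a in mdp[s]) for s in mdp}
        gain = v[ref]                          # gain estimate: backed-up value at the reference state
        new_h = {s: v[s] - gain for s in mdp}  # renormalize so that h[ref] == 0
        done = max(abs(new_h[s] - h[s]) for s in mdp) < tol
        h = new_h
        if done:
            break
    # Greedy policy with respect to the converged relative values.
    policy = {s: max(mdp[s], key=lambda a, s=s: q_value(mdp, reward, h, s, a)) for s in mdp}
    return gain, h, policy


# Hypothetical toy system: 0 = safe interior, 1 = safe boundary, 2 = unsafe (drifts back to 0).
toy: MDP = {
    0: {"hold": [(0, 0.9), (1, 0.1)], "push": [(0, 0.5), (1, 0.5)]},
    1: {"correct": [(0, 0.7), (1, 0.2), (2, 0.1)], "coast": [(1, 0.5), (2, 0.5)]},
    2: {"reset": [(0, 0.5), (2, 0.5)]},
}
safe_reward = {0: 1.0, 1: 1.0, 2: 0.0}

gain, h, policy = relative_value_iteration(toy, safe_reward)
print(round(gain, 4), policy)
```

For this toy system, the optimal policy is "hold" at the interior and "correct" at the boundary. Its stationary distribution puts mass 1/46 on the unsafe state, so the gain is 45/46 (about 0.978).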

A new algorithm for ensuring safety in stochastic control systems computes policies that remain safe with high confidence despite random noise. It reduces the safety objective to an average-reward Markov Decision Process (MDP) objective, so that established techniques such as linear programming can be used to compute and analyze policies. Validation on the Double Integrator and Inverted Pendulum systems shows that this approach converges faster and yields higher-quality solutions than a minimum discounted-reward formulation, highlighting its value for safety in uncertain environments.

Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs

arXiv:1906.07172v5 Announce Type: replace-cross 
Abstract: Equivariant neural networks are a class of neural networks designed to preserve symmetries inherent in the data. In this paper, we introduce a general method for modifying a neural network to enforce equivariance, a process we refer to as equivarification. We further show that group convolutional neural networks (G-CNNs) arise as a special case of our framework.
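To make the idea concrete, here is a minimal sketch for the cyclic group Z_n acting on length-n lists by rotation. It uses one standard lifting, assumed here for illustration and not claimed to match the paper's exact construction: an arbitrary map F becomes x -> [F(g^0 x), ..., F(g^(n-1) x)], and rotating the input then rotates the output. The function `base_network` is a hypothetical, deliberately non-equivariant stand-in for a network.

```python
from typing import Callable, List

Vector = List[float]


def rotate(x: Vector, k: int) -> Vector:
    """Action of the group element g^k of Z_n: (g^k x)[i] = x[(i + k) % n]."""
    n = len(x)
    return [x[(i + k) % n] for i in range(n)]


def equivarify(f: Callable[[Vector], float], n: int) -> Callable[[Vector], Vector]:
    """Lift f: X -> Y to x -> [f(g^k x) for k in Z_n], which is Z_n-equivariant."""
    def lifted(x: Vector) -> Vector:
        return [f(rotate(x, k)) for k in range(n)]
    return lifted


def base_network(x: Vector) -> float:
    """Hypothetical non-equivariant 'network': ReLU of a position-dependent weighted sum."""
    weights = [1.0, 2.0, 3.0, 4.0]
    return max(0.0, sum(w * xi for w, xi in zip(weights, x)) - 1.0)


lifted = equivarify(base_network, n=4)
x = [0.5, -1.0, 2.0, 0.0]
for j in range(4):
    # Equivariance: acting on the input by g^j acts on the output by g^j.
    assert lifted(rotate(x, j)) == rotate(lifted(x), j)
print(lifted(x))
```

Here `base_network` alone is not even invariant to rotation, yet the lifted map satisfies the equivariance check exactly. The price is an output that grows by a factor of |G|, which is why practical constructions share weights across group elements.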

Equivariant neural networks and equivarification

arXiv:2311.13745v4 Announce Type: replace-cross 
Abstract: Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the sample complexity of training such a model: how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work showed bounds polynomial in the dimension, the desired Total Variation error, and the Wasserstein error. We show an exponential improvement in the dependence on Wasserstein error and depth, along with improved dependencies on other relevant parameters.

Improved Sample Complexity Bounds for Diffusion Model Training

arXiv:2410.22995v2 Announce Type: replace-cross 
Abstract: A hallmark of advanced artificial intelligence is the capacity to progress from passive visual perception to the strategic modification of visual information to facilitate complex reasoning. This advanced capability, however, remains critically underdeveloped in current Large Multi-modal Models (LMMs). The deficiency is often masked by evaluation metrics that prioritize final-answer accuracy, creating an illusion of competence where genuine reasoning is absent. Using geometric problem-solving as a precise probe, we study this issue through tasks that require constructing visual aids. To this end, we introduce VisAidMath, a challenging benchmark, together with our novel Three-Layered Funnel Evaluation Framework. This framework moves beyond simple accuracy (ACCU) to scrutinize the generation of valid visual aids (PVA) and the soundness of subsequent reasoning steps (SPRS). Our extensive experiments on state-of-the-art models, including Doubao-Seed-1.6 and o4, reveal a profound "Reasoning Illusion": high surface-level accuracy conceals a catastrophic failure in the models' ability to produce valid visual aids or to reason from them. Our findings expose a fundamental schism between visual perception and logical deduction in modern LMMs. We host a public evaluation platform on CodaBench. Homepage: https://nlp2ct.github.io/VisAidMathHomepage/ Evaluation: https://www.codabench.org/competitions/7634/

VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning

arXiv:2508.01249v3 Announce Type: replace-cross 
Abstract: Large Language Model (LLM) agents offer a powerful new paradigm for solving diverse problems by combining natural-language reasoning with the execution of external tools. However, their dynamic and opaque behavior introduces critical security risks, particularly in the presence of prompt injection attacks. In this work, we propose treating agent runtime traces as structured programs with analyzable semantics. Building on this insight, we present AgentArmor, a program analysis framework that converts agent traces into graph-based intermediate representations of program dependencies (e.g., CFG, DFG, and PDG) and enforces security policies via a type system. AgentArmor consists of three key components: (1) a graph constructor that reconstructs the agent's runtime traces as graph-based intermediate representations describing control and data flow; (2) a property registry that attaches security-relevant metadata to the tools and data the agent interacts with; and (3) a type system that performs static inference and checking over the intermediate representation. By representing agent behavior as structured programs, AgentArmor enables program analysis of sensitive data flows, trust boundaries, and policy violations. We evaluate AgentArmor on the AgentDojo benchmark; the results show that AgentArmor can reduce the attack success rate (ASR) to 3% with a utility drop of only 1%.
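As a simplified illustration of inference and checking over a trace's data-flow graph, the sketch below propagates a two-level trust label (trusted below untrusted) along data-dependency edges to a fixpoint. It then flags high-privilege sink calls that receive untrusted data. This is a generic taint-analysis sketch, not AgentArmor's actual type system. The node names, the registry contents, and the policy are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

TRUSTED, UNTRUSTED = 0, 1  # two-point lattice; join = max


def infer_trust(nodes: Iterable[str], edges: Iterable[Tuple[str, str]],
                registry: Dict[str, int]) -> Dict[str, int]:
    """Propagate trust labels along data-flow edges until a fixpoint is reached."""
    succs: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
    label = {n: registry.get(n, TRUSTED) for n in nodes}
    work = deque(label)
    while work:
        n = work.popleft()
        for m in succs[n]:
            joined = max(label[m], label[n])
            if joined != label[m]:  # label rose in the lattice; revisit its successors
                label[m] = joined
                work.append(m)
    return label


def policy_violations(label: Dict[str, int], sinks: Iterable[str]) -> List[str]:
    """Hypothetical policy: a high-privilege sink must never depend on untrusted data."""
    return [s for s in sinks if label[s] == UNTRUSTED]


# Hypothetical trace: the agent reads an email whose attacker-controlled body
# flows into the arguments of a money transfer.
nodes = ["user_prompt", "read_inbox", "email_body", "send_money"]
edges = [("user_prompt", "read_inbox"), ("read_inbox", "email_body"),
         ("email_body", "send_money"), ("user_prompt", "send_money")]
registry = {"email_body": UNTRUSTED}  # metadata a property registry might attach

labels = infer_trust(nodes, edges, registry)
print(policy_violations(labels, sinks=["send_money"]))
```

Dropping the `email_body -> send_money` edge leaves the transfer depending only on the trusted user prompt, and the check then reports no violation. A fixpoint worklist is used rather than a single topological pass so that the same code also handles traces whose graphs contain cycles, such as agent loops.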

AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

arXiv:2511.14691v1 Announce Type: cross 
Abstract: Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transformers. Transformers now underpin large language models (LLMs) such as GPT, but at the cost of massive training and inference energy and a correspondingly large carbon footprint. While brain attention emerges from neural circuits, Transformer attention relies on dot-product similarity to weight elements in the input sequence. Neuromorphic computing, especially spiking neural networks (SNNs), offers a brain-inspired path to energy-efficient intelligence. Despite recent work on attention-based spiking Transformers, the core attention layer remains non-neuromorphic. Current spiking attention mechanisms (i) rely on dot-product or element-wise similarity suited to floating-point operations, not event-driven spikes; (ii) keep attention matrices that suffer from the von Neumann bottleneck, limiting in-memory computing; and (iii) still diverge from brain-like computation. To address these issues, we propose the Spiking STDP Transformer (S²TDPT), a neuromorphic Transformer that implements self-attention through spike-timing-dependent plasticity (STDP), embedding query-key correlations in synaptic weights. STDP, a core mechanism of memory and learning in the brain that is widely studied in neuromorphic devices, naturally enables in-memory computing and supports non-von Neumann hardware. On CIFAR-10 and CIFAR-100, our model achieves 94.35% and 78.08% accuracy, respectively, with only four timesteps, and consumes 0.49 mJ on CIFAR-100, an 88.47% energy reduction compared with a standard ANN Transformer. Grad-CAM shows that the model attends to semantically relevant regions, enhancing interpretability. Overall, S²TDPT illustrates how biologically inspired attention can yield energy-efficient, hardware-friendly, and explainable neuromorphic models.
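STDP is commonly modeled with a pair-based exponential rule. A presynaptic spike shortly before a postsynaptic spike strengthens the synapse, and the reverse order weakens it. The sketch below applies that textbook rule to score how strongly each key spike train precedes a query spike train, which yields an attention-like score matrix. Treating keys as presynaptic and queries as postsynaptic, and the constants `a_plus`, `a_minus`, `tau_plus`, and `tau_minus`, are our assumptions. This is not S²TDPT's exact formulation.

```python
import math
from typing import List, Sequence

SpikeTimes = Sequence[int]  # timesteps at which a neuron fired


def stdp_delta(t_pre: int, t_post: int, a_plus: float = 0.01, a_minus: float = 0.012,
               tau_plus: float = 2.0, tau_minus: float = 2.0) -> float:
    """Pair-based STDP weight change for one pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:   # post fires before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0   # simultaneous spikes: no change in this simple variant


def stdp_scores(queries: List[SpikeTimes], keys: List[SpikeTimes]) -> List[List[float]]:
    """scores[i][j]: summed STDP update with key j as presynaptic and query i as postsynaptic."""
    return [[sum(stdp_delta(tk, tq) for tk in k for tq in q) for k in keys] for q in queries]


queries = [[2, 3]]    # one query neuron firing at timesteps 2 and 3
keys = [[0, 1], [3]]  # key 0 fires before the query; key 1 fires at or after it
scores = stdp_scores(queries, keys)
print(scores)  # key 0 receives a positive score, key 1 a negative one
```

In a hardware setting, these sums would accumulate physically in synaptic conductances rather than being computed as an explicit matrix. That is the in-memory property the abstract points to.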

Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer

arXiv:2511.08955v2 Announce Type: replace-cross 
Abstract: Simulating microstructure evolution (MicroEvo) is vital for materials design but demands high numerical accuracy, efficiency, and physical fidelity. Although deep learning (DL) offers a promising alternative to traditional solvers, the field lacks standardized benchmarks. Existing studies have three shortcomings: they do not compare specialized MicroEvo DL models with state-of-the-art spatio-temporal architectures, they overemphasize numerical accuracy at the expense of physical fidelity, and they fail to analyze how errors propagate over time. To address these gaps, we introduce MicroEvoEval, the first comprehensive benchmark for image-based microstructure evolution prediction. We evaluate 14 models, encompassing both domain-specific and general-purpose architectures, across four representative MicroEvo tasks whose datasets are structured for both short- and long-term assessment. Our multi-faceted evaluation framework goes beyond numerical accuracy and computational cost, incorporating a curated set of structure-preserving metrics to assess physical fidelity. Our extensive evaluations yield several key insights. Notably, we find that modern architectures (e.g., VMamba) not only achieve superior long-term stability and physical fidelity but also operate with an order of magnitude greater computational efficiency. The results highlight the necessity of holistic evaluation and identify these modern architectures as a highly promising direction for developing efficient and reliable surrogate models in data-driven materials science.
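Pixel-level accuracy and physical fidelity can disagree. To illustrate this, the sketch below scores an autoregressive rollout frame by frame with two metrics: pixel-wise mean squared error and the absolute error in phase volume fraction (the share of cells in one phase), a simple structural descriptor. Volume fraction is our illustrative choice and is not claimed to be one of MicroEvoEval's curated metrics. The binary 2x2 frames are hypothetical.

```python
from typing import Dict, List, Sequence

Grid = List[List[int]]  # binary phase map: 1 = phase A, 0 = phase B


def volume_fraction(grid: Grid) -> float:
    cells = [c for row in grid for c in row]
    return sum(cells) / len(cells)


def pixel_mse(a: Grid, b: Grid) -> float:
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def rollout_report(pred: Sequence[Grid], ref: Sequence[Grid]) -> List[Dict[str, float]]:
    """Per-step metrics, so error growth over a long rollout stays visible."""
    return [
        {"step": t, "mse": pixel_mse(p, r),
         "vf_error": abs(volume_fraction(p) - volume_fraction(r))}
        for t, (p, r) in enumerate(zip(pred, ref))
    ]


ref = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
pred = [[[0, 1], [0, 0]], [[1, 1], [1, 0]]]
for row in rollout_report(pred, ref):
    print(row)
```

At step 0, the prediction misplaces the phase (MSE 0.5) yet preserves the volume fraction exactly. At step 1, it grows the phase, which both metrics register. Reporting per step, rather than averaging over the rollout, keeps long-horizon error accumulation visible.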

MicroEvoEval: A Systematic Evaluation Framework for Image-Based Microstructure Evolution Prediction

arXiv:2511.11402v1 Announce Type: new 
Abstract: Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multiphase waypoint navigation variants, and finally tackling a complex multiphase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.
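One of GTrXL's changes relative to a standard Transformer-XL block is replacing residual connections with gating layers. The GRU-type gate from the original GTrXL paper (Parisotto et al., 2019) computes r = σ(W_r y + U_r x), z = σ(W_z y + U_z x - b_g), ĥ = tanh(W_g y + U_g (r ⊙ x)), and outputs (1 - z) ⊙ x + z ⊙ ĥ. Here x is the residual stream and y is the sublayer output. A positive bias b_g pushes z toward 0, so the block starts close to an identity map. The pure-Python sketch below implements that single gate. The dimensions, weights, and parameter names are illustrative, and how the spacecraft framework wires it into PPO is not specified here.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_gate(x: Vector, y: Vector, p: Dict[str, Matrix], b_g: float = 2.0) -> Vector:
    """GRU-type gate: x = residual stream input, y = sublayer (e.g. attention) output."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], y), matvec(p["U_r"], x))]
    # Subtracting b_g > 0 biases the update gate z toward 0, i.e. toward the identity map.
    z = [sigmoid(a + b - b_g) for a, b in zip(matvec(p["W_z"], y), matvec(p["U_z"], x))]
    rx = [ri * xi for ri, xi in zip(r, x)]
    h = [math.tanh(a + b) for a, b in zip(matvec(p["W_g"], y), matvec(p["U_g"], rx))]
    return [(1.0 - zi) * xi + zi * hi for zi, xi, hi in zip(z, x, h)]


d = 3
zeros = [[0.0] * d for _ in range(d)]
params = {name: zeros for name in ("W_r", "U_r", "W_z", "U_z", "W_g", "U_g")}
x, y = [1.0, -2.0, 0.5], [3.0, 3.0, 3.0]
print(gru_gate(x, y, params, b_g=10.0))  # close to x: the gate starts near identity
```

With all weights at zero, the output depends only on b_g. A large bias passes x through almost unchanged, while b_g = 0 gives z = 0.5 and halves x. This makes the identity-biasing effect of the bias easy to see.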

The article discusses a novel transformer-based reinforcement learning (RL) framework for optimizing spacecraft trajectories across multiple mission phases, such as launch and orbit insertion. The approach addresses a limitation of existing methods, which typically require a separate policy for each phase and thus complicate operations. By using a Gated Transformer-XL architecture to maintain coherent memory across phases, a single policy achieves near-optimal performance on single-phase benchmarks and learns coherent control across the dynamically distinct regimes of a multi-phase rocket ascent problem.

