arXiv:2502.05672v2 Announce Type: replace 
Abstract: This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. These algorithms performed competitively across various benchmarks, from games to robotic tasks, but their theoretical understanding is limited to specific environmental conditions. This work initiates a theoretical foundation for algorithms that build on the broad paradigm of approaching reinforcement learning through supervised learning or sequence modeling. At the core of this investigation lies the analysis of conditions on the underlying environment, under which the algorithms can identify optimal solutions. We also assess whether emerging solutions remain stable in situations where the environment is subject to tiny levels of noise. Specifically, we study the continuity and asymptotic convergence of command-conditioned policies, values and the goal-reaching objective depending on the transition kernel of the underlying Markov Decision Process. We demonstrate that near-optimal behavior is achieved if the transition kernel is located in a sufficiently small neighborhood of a deterministic kernel. The mentioned quantities are continuous (with respect to a specific topology) at deterministic kernels, both asymptotically and after a finite number of learning cycles. The developed methods allow us to present the first explicit estimates on the convergence and stability of policies and values in terms of the underlying transition kernels. On the theoretical side we introduce a number of new concepts to reinforcement learning, like working in segment spaces, studying continuity in quotient topologies and the application of the fixed-point theory of dynamical systems. The theoretical study is accompanied by a detailed investigation of example environments and numerical experiments.
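The stability question raised in the abstract can be made concrete on a toy problem. The following is a minimal sketch, not the paper's construction: the four-state chain, the uniform-mixing perturbation in `perturb`, the fixed "always right" policy, and every function name are assumptions chosen for illustration. It shows the probability of reaching a goal state varying continuously as the transition kernel moves away from a deterministic one.

```python
import numpy as np

def deterministic_chain_kernel(n_states):
    """Kernel P[s, a, s'] for a chain: action 0 moves left, action 1 moves right."""
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, n_states - 1)] = 1.0
    return P

def perturb(P, eps):
    """Mix the kernel with uniform noise (an assumed perturbation model)."""
    n_states = P.shape[-1]
    return (1.0 - eps) * P + eps / n_states

def goal_probability(P, policy, start, goal, horizon):
    """Probability of being in `goal` after `horizon` steps under a fixed policy."""
    n_states = P.shape[0]
    # Row s of the induced Markov chain is P[s, policy[s], :].
    P_pi = P[np.arange(n_states), policy, :]
    dist = np.zeros(n_states)
    dist[start] = 1.0
    for _ in range(horizon):
        dist = dist @ P_pi
    return dist[goal]

P_det = deterministic_chain_kernel(4)
always_right = np.ones(4, dtype=int)  # hypothetical fixed policy
probs = {
    eps: goal_probability(perturb(P_det, eps), always_right, start=0, goal=3, horizon=3)
    for eps in (0.0, 0.01, 0.1)
}
for eps, p in probs.items():
    print(f"eps={eps}: goal probability {p:.4f}")
```

With `eps = 0` the goal is reached with probability one, and small values of `eps` lower that probability only slightly, which is the kind of behavior near deterministic kernels that the abstract describes.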

A recent study analyzes the convergence and stability of three algorithms: Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers. These algorithms have shown competitive performance across various benchmarks, but their theoretical understanding has so far been limited to specific environmental conditions. The study shows that near-optimal behavior is achieved when the transition kernel lies in a sufficiently small neighborhood of a deterministic kernel, and it gives explicit estimates of the convergence and stability of policies and values for algorithms that approach reinforcement learning through supervised learning.

On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers

The SanDisk ExtremeFit USB-C flash drive is barely three grams, but offers 1TB of external storage and impressive speeds.

I refused to believe this coin-sized gadget was a storage drive, until I tried it for myself

The Tabwee T60 Pro is a strong value pick for Android tablet users. Here's how.

Yes, there exist $200 Android tablets that are actually worth the money - this one proves it

We tested the best tablets from brands like Apple, Samsung, and OnePlus. These are our top picks, plus tablet deals you can find for Black Friday.

The best tablets of 2025: Lab-tested recommendations

GPT-5.1-Codex-Max is ready to take on your next massive coding job. Here's what's new.

OpenAI's Codex Max solves one of my biggest AI coding annoyances - and it's a lot faster

Georgia Tech researchers are using AI to quickly train exoskeleton devices, making it much more practical to develop, improve, and ultimately deploy wearable robots for people with impaired mobility.

Real-world helper exoskeletons come closer to reality with AI training

Lenovo's 16-inch Legion Pro 7i is a high-performance beast with a 240Hz OLED display. It's currently over 30% off ahead of Black Friday.

My favorite gaming laptop of 2025 is $1,150 off in this pop-up early Black Friday deal

arXiv:2511.13911v1 Announce Type: cross 
Abstract: Despite recent progress in predicting biomarker trajectories from real clinical data, uncertainty in the predictions poses high-stakes risks (e.g., misdiagnosis) that limit their clinical deployment. To enable safe and reliable use of such predictions in healthcare, we introduce a conformal method for uncertainty-calibrated prediction of biomarker trajectories resulting from randomly-timed clinical visits of patients. Our approach extends conformal prediction to the setting of randomly-timed trajectories via a novel nonconformity score that produces prediction bands guaranteed to cover the unknown biomarker trajectories with a user-prescribed probability. We apply our method across a wide range of standard and state-of-the-art predictors for two well-established brain biomarkers of Alzheimer's disease, using neuroimaging data from real clinical studies. We observe that our conformal prediction bands consistently achieve the desired coverage, while also being tighter than baseline prediction bands. To further account for population heterogeneity, we develop group-conditional conformal bands and test their coverage guarantees across various demographic and clinically relevant subpopulations. Moreover, we demonstrate the clinical utility of our conformal bands in identifying subjects at high risk of progression to Alzheimer's disease. Specifically, we introduce an uncertainty-calibrated risk score that enables the identification of 17.5% more high-risk subjects compared to standard risk scores, highlighting the value of uncertainty calibration in real-world clinical decision making. Our code is available at github.com/vatass/ConformalBiomarkerTrajectories.
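The coverage guarantee mentioned in the abstract can be illustrated with standard split conformal prediction on scalar outputs. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not spell out its nonconformity score for randomly-timed trajectories, so the absolute-residual score, the function name `split_conformal_band`, and the synthetic data below are all hypothetical.

```python
import numpy as np

def split_conformal_band(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal band using absolute residuals as the (assumed) score."""
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = scores.size
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    test_pred = np.asarray(test_pred)
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
# Synthetic setup: a trivial predictor that always outputs 0, truth is Gaussian noise.
cal_true = rng.normal(size=500)
test_true = rng.normal(size=2000)
lower, upper = split_conformal_band(np.zeros(500), cal_true, np.zeros(2000), alpha=0.1)
coverage = np.mean((test_true >= lower) & (test_true <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

The band half-width `q` is a single calibrated quantile here; the paper's group-conditional bands would, by its own description, calibrate separately per subpopulation.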

A new method for predicting biomarker trajectories in Alzheimer's disease has been developed, addressing the uncertainty in clinical predictions. This conformal prediction technique produces prediction bands from randomly-timed clinical visits that cover the actual biomarker trajectories with a user-specified probability. Tested on neuroimaging data from real clinical studies, the bands consistently achieved the desired coverage while being tighter than baseline bands, and an uncertainty-calibrated risk score built on them identified 17.5% more high-risk subjects than standard risk scores.

Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands

arXiv:2511.14441v1 Announce Type: cross 
Abstract: To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two graphs $X \to Y$ and $Y \to X$. Location-scale noise models (LSNMs), in which the effect $Y$ is modeled based on the cause $X$ as $Y = f(X) + g(X)N$, form a flexible class of models that is general and identifiable in most cases. Estimating these models for arbitrary noise terms $N$, however, is challenging. Therefore, practical estimators are typically restricted to symmetric distributions, such as the normal distribution. As we showcase in this paper, when $N$ is a skewed random variable, which is likely in real-world domains, the reliability of these approaches decreases. To approach this limitation, we propose SkewD, a likelihood-based algorithm for bivariate causal discovery under LSNMs with skewed noise distributions. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under symmetric and skewed noise. For parameter estimation, we employ a combination of a heuristic search and an expectation conditional maximization algorithm. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets. Throughout our experiments, SkewD exhibits a strong performance and, in comparison to prior work, remains robust under high skewness.
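The model class in the abstract, $Y = f(X) + g(X)N$ with skewed noise $N$, can be simulated directly. The following is a minimal sketch of data generation only, not of SkewD itself: the choices `f = tanh`, `g = 0.5 + 0.5 x^2`, the shape parameter `alpha = 5`, and all function names are assumptions for illustration. Skew-normal draws use the standard stochastic representation $\delta |Z_0| + \sqrt{1-\delta^2}\, Z_1$ with $\delta = \alpha / \sqrt{1+\alpha^2}$.

```python
import numpy as np

def sample_skew_normal(alpha, size, rng):
    """Draw standard skew-normal samples with shape parameter alpha."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    z0 = rng.normal(size=size)
    z1 = rng.normal(size=size)
    # Stochastic representation: delta * |Z0| + sqrt(1 - delta^2) * Z1.
    return delta * np.abs(z0) + np.sqrt(1.0 - delta**2) * z1

def sample_lsnm(n, alpha, rng):
    """Generate (x, y, noise) from Y = f(X) + g(X) * N with skew-normal N."""
    x = rng.normal(size=n)
    noise = sample_skew_normal(alpha, n, rng)
    f = np.tanh(x)            # hypothetical location function
    g = 0.5 + 0.5 * x**2      # hypothetical strictly positive scale function
    return x, f + g * noise, noise

def sample_skewness(v):
    """Moment-based sample skewness."""
    c = v - v.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5

rng = np.random.default_rng(0)
x, y, noise = sample_lsnm(20000, alpha=5.0, rng=rng)
noise_skew = sample_skewness(noise)
symmetric_skew = sample_skewness(sample_skew_normal(0.0, 20000, rng))
print(f"skewness at alpha=5: {noise_skew:.2f}, at alpha=0: {symmetric_skew:.2f}")
```

Setting `alpha = 0` recovers the symmetric normal case that, according to the abstract, practical estimators are typically restricted to.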

The paper discusses the need to distinguish between cause and effect in bivariate models for causal discovery, specifically within location-scale noise models (LSNMs). It highlights the challenge of estimating these models when noise terms are skewed, which is common in real-world scenarios. The authors introduce SkewD, a likelihood-based algorithm designed to improve bivariate causal discovery under these conditions, addressing the limitations of traditional estimators that typically rely on symmetric distributions.

Skewness-Robust Causal Discovery in Location-Scale Noise Models

arXiv:2412.09498v3 Announce Type: replace-cross 
Abstract: Gradient descent is one of the most widely used iterative algorithms in modern statistical learning. However, its precise algorithmic dynamics in high-dimensional settings remain only partially understood, which has limited its broader potential for statistical inference applications.
  This paper provides a precise, non-asymptotic joint distributional characterization of gradient descent iterates and their debiased statistics in a broad class of empirical risk minimization problems, in the so-called mean-field regime where the sample size is proportional to the signal dimension. Our non-asymptotic state evolution theory holds for both general non-convex loss functions and non-Gaussian data, and reveals the central role of two Onsager correction matrices that precisely characterize the non-trivial dependence among all gradient descent iterates in the mean-field regime.
  Leveraging the joint state evolution characterization, we show that the gradient descent iterate retrieves approximate normality after a debiasing correction via a linear combination of all past iterates, where the debiasing coefficients can be estimated by the proposed gradient descent inference algorithm. This leads to a new algorithmic statistical inference framework based on debiased gradient descent, which (i) applies to a broad class of models with both convex and non-convex losses, (ii) remains valid at each iteration without requiring algorithmic convergence, and (iii) exhibits a certain robustness to possible model misspecification. As a by-product, our framework also provides algorithmic estimates of the generalization error at each iteration. As canonical examples, we demonstrate our theory and inference methods in the single-index regression model and a generalized logistic regression model, where the natural loss functions may exhibit arbitrarily non-convex landscapes.
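The setting in the abstract, gradient descent on an empirical risk with sample size proportional to dimension, can be sketched as follows. This is an illustration only: it uses a squared loss on a linear model, and the names, step size, and problem sizes are assumptions. It does not implement the Onsager corrections or the debiasing step, which the abstract describes but does not specify. The sketch keeps every iterate because the inference procedure described above combines all past iterates.

```python
import numpy as np

def gradient_descent_iterates(X, y, step, n_iter):
    """Run gradient descent on the least-squares empirical risk, returning every iterate."""
    n, d = X.shape
    theta = np.zeros(d)
    iterates = [theta.copy()]
    for _ in range(n_iter):
        # Gradient of (1 / 2n) * ||X theta - y||^2.
        grad = X.T @ (X @ theta - y) / n
        theta = theta - step * grad
        iterates.append(theta.copy())
    return np.stack(iterates)

rng = np.random.default_rng(0)
n, d = 400, 200  # proportional regime with n / d = 2 (assumed ratio)
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d) / np.sqrt(d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

iterates = gradient_descent_iterates(X, y, step=0.1, n_iter=50)
risks = [np.mean((X @ t - y) ** 2) / 2 for t in iterates]
print(f"empirical risk: start {risks[0]:.3f}, after 50 steps {risks[-1]:.4f}")
```

Stopping after a fixed number of steps, rather than at convergence, mirrors the abstract's point that the inference framework remains valid at each iteration.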

Gradient descent is a widely used iterative algorithm in statistical learning, yet its dynamics in high-dimensional settings are not fully understood. This paper presents a non-asymptotic characterization of gradient descent iterates and their debiased statistics in empirical risk minimization problems, particularly in the mean-field regime. The findings highlight the significance of Onsager correction matrices in understanding the dependencies among gradient descent iterates, applicable to both non-convex loss functions and non-Gaussian data.
