MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization
Artificial Intelligence
Recent research highlights the potential of diffusion language models (DLMs) as a strong alternative to traditional autoregressive large language models (LLMs). While DLMs have shown promise, their reasoning capabilities degrade noticeably when the number of denoising steps is reduced. The study attributes this to the independent generation of masked tokens, which ignores the correlations between tokens within a sequence. By addressing this limitation through multi-reward optimization, the approach could significantly enhance the reasoning abilities of DLMs and make them more competitive in natural language processing.
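The blurb does not detail the method, but the limitation it names is easy to illustrate. A minimal sketch, assuming a toy masked-diffusion setup with NumPy: each masked position is sampled independently from its own per-position distribution, so correlations between tokens are ignored, and a hypothetical multi-reward score is just a weighted combination of per-aspect rewards. All function names and the weighted-sum aggregation are illustrative assumptions, not the paper's actual MRO objective.

```python
import numpy as np

def denoise_step_independent(logits, mask, rng):
    """Toy masked-diffusion denoising step.

    Each masked position is sampled independently from its own
    softmax distribution, ignoring inter-token correlations --
    the limitation the article describes. Unmasked positions are
    left untouched (marked -1 here for clarity).
    """
    # Numerically stable softmax over the vocabulary axis.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Independent categorical sample per position.
    tokens = np.array([rng.choice(len(p), p=p) for p in probs])
    return np.where(mask, tokens, -1)

def multi_reward(rewards, weights):
    """Hypothetical multi-reward aggregation: a weighted sum of
    per-aspect rewards (e.g. correctness, coherence)."""
    return float(np.dot(rewards, weights))

rng = np.random.default_rng(0)
logits = np.array([[10.0, 0.0, 0.0],   # position 0: masked
                   [0.0, 10.0, 0.0]])  # position 1: already decoded
mask = np.array([True, False])
sampled = denoise_step_independent(logits, mask, rng)
score = multi_reward(np.array([1.0, 0.5]), np.array([0.5, 0.5]))
```

The point of the sketch is the per-position `rng.choice` loop: no sampled token can condition on any other token drawn in the same step, which is exactly the independence assumption multi-reward optimization is said to counteract.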
— via World Pulse Now AI Editorial System
