Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale

arXiv — cs.CVThursday, November 27, 2025 at 5:00:00 AM
  • Recent advancements in Large Language Models (LLMs) have led to the development of a multi-reward Group Relative Policy Optimization (GRPO) framework aimed at enhancing the stability and prosody of single-codebook text-to-speech (TTS) systems. This framework integrates various rule-based rewards to optimize token generation policies, addressing issues such as unstable prosody and speaker drift that have plagued existing models.
  • The introduction of this GRPO framework is significant as it promises to improve the naturalness and intelligibility of TTS outputs, which are critical for applications in voice synthesis and conversational AI. By directly optimizing the token generation process, this approach could lead to more human-like speech synthesis, enhancing user experience across various platforms.
  • This development reflects ongoing efforts in the AI community to refine LLMs and TTS technologies, particularly in addressing disparities in model performance and ensuring fairness in output quality. The integration of reinforcement learning techniques and the focus on prosody align with broader trends in AI research, emphasizing the need for models that not only perform well but also exhibit stable and natural characteristics in their outputs.
— via World Pulse Now AI Editorial System

Was this article worth reading? Share it

Recommended apps based on your readingExplore all apps
Continue Readings
Attention Projection Mixing and Exogenous Anchors
NeutralArtificial Intelligence
A new study introduces ExoFormer, a transformer model that utilizes exogenous anchor projections to enhance attention mechanisms, addressing the challenge of balancing stability and computational efficiency in deep learning architectures. This model demonstrates improved performance metrics, including a notable increase in downstream accuracy and data efficiency compared to traditional internal-anchor transformers.
SwiftMem: Fast Agentic Memory via Query-aware Indexing
PositiveArtificial Intelligence
SwiftMem has been introduced as a query-aware agentic memory system designed to enhance the efficiency of large language model (LLM) agents by enabling sub-linear retrieval through specialized indexing techniques. This system addresses the limitations of existing memory frameworks that rely on exhaustive retrieval methods, which can lead to significant latency issues as memory storage expands.
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
NeutralArtificial Intelligence
A new framework for user-oriented multi-turn dialogue generation has been developed, leveraging large reasoning models (LRMs) to create dynamic, domain-specific tools for task completion. This approach addresses the limitations of existing datasets that rely on static toolsets, enhancing the interaction quality in human-agent collaborations.
Detecting Mental Manipulation in Speech via Synthetic Multi-Speaker Dialogue
NeutralArtificial Intelligence
A new study has introduced the SPEECHMENTALMANIP benchmark, marking the first exploration of mental manipulation detection in spoken dialogues, utilizing synthetic multi-speaker audio to enhance a text-based dataset. This research highlights the challenges of identifying manipulative speech tactics, revealing that models trained on audio exhibit lower recall compared to text.
Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation
NeutralArtificial Intelligence
The recent development in financial compliance checking involves the introduction of Compliance-to-Code, which leverages Regulatory Technology and Large Language Models to automate the conversion of complex regulatory text into executable compliance logic. This innovation aims to address the challenges posed by intricate financial regulations, particularly in the context of Chinese-language regulations, where existing models have shown suboptimal performance due to various limitations.
RULERS: Locked Rubrics and Evidence-Anchored Scoring for Robust LLM Evaluation
PositiveArtificial Intelligence
The recent introduction of RULERS (Rubric Unification, Locking, and Evidence-anchored Robust Scoring) addresses challenges in evaluating large language models (LLMs) by transforming natural language rubrics into executable specifications, thereby enhancing the reliability of assessments.
QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models
NeutralArtificial Intelligence
The introduction of QuantEval marks a significant advancement in evaluating Large Language Models (LLMs) in financial quantitative tasks, focusing on knowledge-based question answering, mathematical reasoning, and strategy coding. This benchmark incorporates a backtesting framework that assesses the performance of model-generated strategies using financial metrics, providing a more realistic evaluation of LLM capabilities.
PrivGemo: Privacy-Preserving Dual-Tower Graph Retrieval for Empowering LLM Reasoning with Memory Augmentation
PositiveArtificial Intelligence
PrivGemo has been introduced as a privacy-preserving framework designed for knowledge graph (KG)-grounded reasoning, addressing the risks associated with using private KGs in large language models (LLMs). This dual-tower architecture maintains local knowledge while allowing remote reasoning through an anonymized interface, effectively mitigating semantic and structural exposure.

Ready to build your own newsroom?

Subscribe to unlock a personalised feed, podcasts, newsletters, and notifications tailored to the topics you actually care about