Group-Aware Reinforcement Learning for Output Diversity in Large Language Models

arXiv — cs.LG · Tuesday, November 18, 2025 at 5:00:00 AM
  • Researchers have developed GAPO, a group-aware reinforcement learning method that increases the output diversity of large language models (LLMs).
  • The introduction of GAPO is significant because it improves the diversity of LLM responses while maintaining accuracy on established benchmarks. This advance could make LLMs more effective across a range of real-world applications; a rough sketch of the group-aware idea follows below.
— via World Pulse Now AI Editorial System
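
As a rough illustration of the group-aware idea, here is a minimal Python sketch that assumes GAPO follows the GRPO recipe of normalizing rewards within a sampled group and adds a per-sample diversity term. The Jaccard-based diversity measure, the coefficient, and all function names are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch (assumptions, not the paper's exact method): mix a
# per-sample task reward with each response's contribution to group
# diversity, then form GRPO-style group-normalized advantages.
import numpy as np

def per_sample_diversity(texts):
    """Toy diversity signal: each response's mean Jaccard dissimilarity
    to the other responses sampled for the same prompt."""
    sets = [set(t.split()) for t in texts]
    scores = []
    for i in range(len(sets)):
        sims = []
        for j in range(len(sets)):
            if i == j:
                continue
            union = len(sets[i] | sets[j]) or 1
            sims.append(len(sets[i] & sets[j]) / union)
        scores.append(1.0 - sum(sims) / len(sims))
    return np.array(scores)

def group_aware_advantages(task_rewards, texts, diversity_coef=0.5):
    """GRPO-style advantages over a reward that also credits diversity."""
    r = np.asarray(task_rewards, dtype=float)
    r = r + diversity_coef * per_sample_diversity(texts)
    return (r - r.mean()) / (r.std() + 1e-8)  # normalize within the group

# Usage: four sampled responses to one prompt; the duplicated answers
# earn a lower diversity score, so unique correct answers are favored.
responses = ["the cat sat", "the cat sat", "a dog ran off", "birds fly south"]
print(group_aware_advantages([1.0, 1.0, 1.0, 0.0], responses))
```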


Recommended Readings
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Positive · Artificial Intelligence
The paper presents Group Relative Policy Optimization for Representation Model (GRPO-RM), which adapts GRPO, a reinforcement learning method originally used to fine-tune large language models (LLMs), to representation models. It establishes a predefined output set in place of token-sequence sampling, yielding the output group that GRPO's optimization requires. A specialized reward function tailored to representation models is also introduced, and extensive experiments across various real-world datasets validate the method's effectiveness.
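To make the predefined-output-set idea concrete, here is a minimal sketch in which the output set is a small label set: the model's scores over that set form the group, and advantages are normalized group-relatively. The softmax policy, the REINFORCE-style surrogate, and all names are assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch (illustrative assumptions): instead of sampling token
# sequences, score every element of a predefined output set (here, a
# label set), treat those scores as the group, and compute GRPO-style
# group-relative advantages against a task-specific reward.
import numpy as np

def grpo_rm_advantages(logits, reward_fn):
    """logits: model scores over the predefined output set.
    reward_fn: maps an output index to a scalar reward."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # policy over the output set
    rewards = np.array([reward_fn(k) for k in range(len(logits))])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # One plausible policy-gradient surrogate: advantage-weighted log-probs.
    loss = -(adv * np.log(probs + 1e-12)).sum()
    return adv, loss

# Usage: a 3-way label set; reward 1 for the correct label, 0 otherwise.
adv, loss = grpo_rm_advantages(np.array([2.0, 0.5, -1.0]),
                               lambda k: float(k == 0))
print(adv, loss)
```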
Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization
Positive · Artificial Intelligence
The paper introduces Group Turn Policy Optimization (GTPO), a novel reinforcement learning algorithm aimed at enhancing the training of Large Language Models (LLMs) for multi-turn Tool-Integrated Reasoning (TIR). GTPO addresses limitations of existing methods like Group Relative Policy Optimization (GRPO) by implementing turn-level reward assignments, return-based advantage estimation, and self-supervised reward shaping, which collectively improve learning signals for complex interactions.
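A minimal sketch of what turn-level reward assignment with return-based advantage estimation could look like; the discount factor, the group-wide normalization, and the function names are illustrative assumptions rather than GTPO's published algorithm.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm):
# assign rewards per turn, compute discounted returns-to-go for each
# turn, and normalize the returns across a group of rollouts to obtain
# turn-level advantages.
import numpy as np

def returns_to_go(turn_rewards, gamma=0.95):
    """Discounted return from each turn to the end of the rollout."""
    g, out = 0.0, []
    for r in reversed(turn_rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

def turn_level_advantages(group_turn_rewards, gamma=0.95):
    """group_turn_rewards: list of per-rollout lists of turn rewards."""
    returns = [returns_to_go(tr, gamma) for tr in group_turn_rewards]
    flat = np.concatenate([np.asarray(r) for r in returns])
    mu, sd = flat.mean(), flat.std() + 1e-8   # group-wide normalization
    return [[(g - mu) / sd for g in r] for r in returns]

# Usage: two 3-turn rollouts; reward arrives only on a successful turn,
# but earlier turns still receive credit through the discounted return.
print(turn_level_advantages([[0.0, 0.0, 1.0], [0.0, 0.5, 0.0]]))
```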
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Positive · Artificial Intelligence
The paper discusses the development of Foundational Automatic Reasoning Evaluators (FARE), which are generative evaluators designed to enhance evaluation processes in reasoning-centric domains. By fine-tuning these evaluators with a dataset of 2.5 million samples across five evaluation tasks, the study aims to improve scalability and performance during training and testing. The FARE models, with 8B and 20B parameters, challenge existing evaluators and set new benchmarks for open-source evaluation.
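As a loose illustration of multi-task evaluator training, the sketch below pools several evaluation task types into one supervised stream, rendering each example as an (instruction, judgment) text pair. The task names, the formatter, and the uniform mixing policy are hypothetical and not drawn from the paper.

```python
# Minimal sketch (hypothetical): interleave heterogeneous evaluation
# tasks into one training stream for a generative evaluator.
import random

TASKS = ["pairwise", "pointwise", "reference-based", "rubric", "verification"]

def render_example(task, payload):
    """Hypothetical formatter: one prompt/target text pair per task type."""
    return {"prompt": f"[{task}] Evaluate:\n{payload['input']}",
            "target": payload["judgment"]}

def mixed_stream(datasets, seed=0):
    """Sample tasks uniformly at random (one simple mixing policy; the
    paper's actual mixture weights are not assumed here)."""
    rng = random.Random(seed)
    pools = [(t, list(d)) for t, d in datasets.items() if d]
    while pools:
        t, d = rng.choice(pools)
        yield render_example(t, d.pop())
        pools = [(t, d) for t, d in pools if d]

# Usage with toy one-example pools per task.
data = {t: [{"input": "answer A vs. answer B", "judgment": "A"}] for t in TASKS}
for ex in mixed_stream(data):
    print(ex["prompt"].splitlines()[0], "->", ex["target"])
```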
Meta’s DreamGym framework trains AI agents in a simulated world to cut reinforcement learning costs
Positive · Artificial Intelligence
Researchers at Meta, the University of Chicago, and UC Berkeley have developed DreamGym, a new framework that reduces the costs and complexities of training AI agents using reinforcement learning (RL). This framework simulates an RL environment, allowing agents to learn progressively by adjusting task difficulty. Experiments indicate that DreamGym enhances RL training efficiency, achieving results comparable to established algorithms while significantly lowering data collection costs.
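One way to picture the progressive-difficulty idea is a simple success-rate curriculum: raise the task level when the agent succeeds too often, lower it when it struggles. The thresholds, window size, and class names below are illustrative assumptions, not Meta's implementation.

```python
# Minimal sketch (assumption: the framework adapts task difficulty to
# keep the agent's success rate inside a target band).
class DifficultyCurriculum:
    def __init__(self, level=1, target_low=0.4, target_high=0.8, window=20):
        self.level, self.low, self.high = level, target_low, target_high
        self.window, self.results = window, []

    def record(self, success: bool):
        """Log one episode outcome; adjust difficulty every `window` episodes."""
        self.results.append(success)
        if len(self.results) < self.window:
            return
        rate = sum(self.results) / len(self.results)
        if rate > self.high:              # too easy: raise difficulty
            self.level += 1
        elif rate < self.low:             # too hard: lower difficulty
            self.level = max(1, self.level - 1)
        self.results.clear()

# Usage: feed episode outcomes; read curriculum.level when sampling tasks.
cur = DifficultyCurriculum()
for outcome in [True] * 20:
    cur.record(outcome)
print(cur.level)  # -> 2 after a window of consistent successes
```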
Reasoning: From Reflection to Solution
Positive · Artificial Intelligence
The paper titled 'Reasoning: From Reflection to Solution' explores the concept of reasoning, a topic of philosophical inquiry for centuries. It asks whether modern large language models, which show superhuman performance on benchmarks like GSM8K and HumanEval, have truly learned to reason or merely pattern-match. The author proposes a definition of reasoning as iterative operator application in state spaces, leading to fixed points, a definition with significant implications for understanding the limitations of current systems and for building genuine reasoning systems.
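The proposed definition is easy to illustrate: apply an operator to a state repeatedly until applying it again changes nothing. The toy operator and names below are illustrative, not taken from the paper.

```python
# Minimal illustration of the definition: reasoning as repeated
# application of an operator T to a state until T(s) == s (a fixed point).
def reason(state, operator, max_steps=100):
    for _ in range(max_steps):
        nxt = operator(state)
        if nxt == state:      # fixed point reached: nothing left to derive
            return state
        state = nxt
    return state

# Usage: a toy operator that closes a set of facts under one rule
# (if "a" and "b" are known, derive "c"); the closure is the fixed point.
def close_facts(facts: frozenset) -> frozenset:
    return facts | ({"c"} if {"a", "b"} <= facts else set())

print(reason(frozenset({"a", "b"}), close_facts))  # {'a', 'b', 'c'}
```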