Flow-GRPO: Training Flow Matching Models via Online RL

arXiv — cs.CV · Tuesday, October 28, 2025 at 4:00:00 AM
The introduction of Flow-GRPO marks a significant advance in flow matching models by integrating online policy-gradient reinforcement learning. Its central technique is an ODE-to-SDE conversion, which replaces the model's deterministic sampler with a stochastic one and thereby supplies the sampling variability that RL exploration requires. This development opens new avenues for improving model accuracy and efficiency across a range of generative applications.
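The ODE-to-SDE idea can be illustrated with a toy one-dimensional sampler. This is a sketch only, not Flow-GRPO's actual conversion: the velocity field, noise scale, and drift handling below are hypothetical stand-ins, chosen to show why the deterministic ODE path gives RL nothing to explore while the SDE path does.

```python
import math
import random

def velocity(x, t):
    # Toy stand-in for the flow model's learned velocity field v_theta.
    # (Hypothetical: drives x toward the target value 1.0.)
    return 1.0 - x

def ode_step(x, t, dt):
    # Deterministic Euler step of the probability-flow ODE: dx = v(x, t) dt.
    return x + velocity(x, t) * dt

def sde_step(x, t, dt, sigma):
    # Euler-Maruyama step of an SDE that injects Gaussian noise for
    # exploration. In the real conversion the drift is corrected (via a
    # score term) so marginals match the ODE's; here that correction is
    # folded into the toy velocity for simplicity.
    noise = random.gauss(0.0, 1.0)
    return x + velocity(x, t) * dt + sigma * math.sqrt(dt) * noise

random.seed(0)
x_ode = x_sde = 0.0
dt, sigma = 0.1, 0.2
for i in range(10):
    t = i * dt
    x_ode = ode_step(x_ode, t, dt)
    x_sde = sde_step(x_sde, t, dt, sigma)
# x_ode is identical every run; x_sde varies with the random seed,
# providing the stochasticity that policy-gradient RL needs.
```

Each run of the ODE sampler retraces the same trajectory, so a policy gradient has no action distribution to differentiate through; the SDE sampler turns each denoising step into a stochastic action.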
— via World Pulse Now AI Editorial System


Continue Reading
HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning
Positive · Artificial Intelligence
HiCoGen introduces a Hierarchical Compositional Generative framework that enhances text-to-image generation in diffusion models by utilizing a Chain of Synthesis paradigm. This method decomposes complex prompts into semantic units, synthesizing them iteratively to improve compositional accuracy and visual context in generated images.
OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
Positive · Artificial Intelligence
OmniRefiner has been introduced as a detail-aware refinement framework aimed at improving reference-guided image generation. This framework addresses the limitations of current diffusion models, which often fail to retain fine-grained visual details during image refinement due to inherent VAE-based latent compression issues. By employing a two-stage correction process, OmniRefiner enhances pixel-level consistency and structural fidelity in generated images.
Learning Massively Multitask World Models for Continuous Control
Positive · Artificial Intelligence
A new benchmark has been introduced to advance research in reinforcement learning (RL) for continuous control, featuring 200 diverse tasks with language instructions and demonstrations. The study presents Newt, a language-conditioned multitask world model that is pretrained on demonstrations and optimized through online interaction across all tasks.
Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
Positive · Artificial Intelligence
A new study has introduced differential smoothing as a method to mitigate diversity collapse in large language models (LLMs) during reinforcement learning (RL) fine-tuning. The work formally characterizes the selection and reinforcement bias that reduces output variety, and proposes a smoothing scheme that improves both correctness and diversity in model outputs.
Quantum-Enhanced Reinforcement Learning for Accelerating Newton-Raphson Convergence with Ising Machines: A Case Study for Power Flow Analysis
Positive · Artificial Intelligence
A recent study has introduced a quantum-enhanced reinforcement learning (RL) approach to optimize the initialization of the Newton-Raphson method, which is critical for solving power flow equations. This method aims to improve convergence rates, particularly in scenarios with high renewable energy penetration where traditional methods struggle.
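Why initialization matters here can be seen from the plain Newton-Raphson iteration the study seeks to warm-start. The sketch below uses a toy scalar equation, not an actual power-flow mismatch system; the function, starting points, and tolerances are illustrative assumptions.

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Plain Newton-Raphson iteration; returns (root, iterations used)."""
    x = x0
    for k in range(1, max_iter + 1):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        x = x - fx / df(x)  # Newton update: x_{k+1} = x_k - f(x_k)/f'(x_k)
    return x, max_iter

# Toy nonlinearity standing in for a power-flow mismatch equation.
f = lambda x: x**3 - 2*x - 5
df = lambda x: 3*x**2 - 2

root_good, iters_good = newton_raphson(f, df, x0=2.0)   # near the root
root_poor, iters_poor = newton_raphson(f, df, x0=10.0)  # far from it
# Both starts reach the same root (~2.0946), but the nearby start
# converges in fewer iterations; an RL agent that proposes good
# initial points shortens exactly this loop.
```

In realistic grids with high renewable penetration the flat-start heuristic can land far from the solution, so a learned initializer that cuts the iteration count (or rescues non-convergent cases) directly reduces solve time.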
Optimization and Regularization Under Arbitrary Objectives
Neutral · Artificial Intelligence
A recent study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, particularly through a two-block MCMC framework that alternates between Metropolis-Hastings and Gibbs sampling. The research highlights that the performance of these methods is significantly influenced by the sharpness of the likelihood form used, introducing a sharpness parameter to explore its effects on regularization and in-sample performance.
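The role of a sharpness parameter can be illustrated with a minimal random-walk Metropolis-Hastings sampler. This is a generic toy, not the paper's two-block framework: the target, the exponent `beta` (playing the sharpness role), and the step size are all illustrative assumptions.

```python
import math
import random

def log_likelihood(x):
    # Toy standard-normal log-likelihood.
    return -0.5 * x * x

def metropolis_hastings(beta, n_steps=5000, step=1.0, seed=0):
    """Random-walk MH targeting exp(beta * log_likelihood(x)).

    beta acts as the sharpness parameter: raising the likelihood to a
    higher power concentrates the chain near the mode."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal)/target(x)).
        log_alpha = beta * (log_likelihood(proposal) - log_likelihood(x))
        if math.log(rng.random()) < log_alpha:
            x = proposal
        samples.append(x)
    return samples

flat = metropolis_hastings(beta=0.5)   # flatter target, wider spread
sharp = metropolis_hastings(beta=8.0)  # sharper target, tighter spread

spread = lambda s: sum(v * v for v in s) / len(s)
# spread(sharp) is much smaller than spread(flat): sharpening the
# likelihood acts like a regularizer that pins samples near the mode.
```

The trade-off the study probes is visible even here: a sharp likelihood concentrates in-sample fit, while a flat one lets the chain explore more of the space.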
Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization
Positive · Artificial Intelligence
A new approach to combinatorial optimization has emerged with the introduction of Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent designed to enhance the efficiency of branch-and-bound (B&B) solvers in Mixed-Integer Linear Programming (MILP). This method aims to learn optimal branching strategies tailored to specific MILP distributions, moving beyond traditional static heuristics.
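To see what a branching strategy is, consider a minimal branch-and-bound solver for a 0/1 knapsack (a simple MILP). This is a generic sketch, not PlanB&B: the fixed variable order below is the kind of static heuristic that a learned branching policy would replace, and all item data are made up.

```python
def branch_and_bound(values, weights, capacity):
    """Minimal 0/1-knapsack branch-and-bound (maximization)."""
    best = 0
    n = len(values)

    def bound(i, value, room):
        # LP-relaxation-style optimistic bound: pack remaining items
        # greedily, allowing one fractional item.
        for j in range(i, n):
            if weights[j] <= room:
                room -= weights[j]
                value += values[j]
            else:
                return value + values[j] * room / weights[j]
        return value

    def search(i, value, room):
        nonlocal best
        if i == n:
            best = max(best, value)
            return
        if bound(i, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        # Branch on variable i in a fixed order -- the static heuristic
        # a learned policy would replace with an instance-aware choice.
        if weights[i] <= room:
            search(i + 1, value + values[i], room - weights[i])  # take i
        search(i + 1, value, room)                               # skip i

    search(0, 0, capacity)
    return best

# Items pre-sorted by value density (a common static heuristic).
print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # prints 220
```

The branching choice decides how quickly the bound prunes subtrees; PlanB&B's premise is that an order learned for a given MILP distribution prunes better than any fixed rule.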
How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints
Positive · Artificial Intelligence
A recent study introduces a novel approach to latent safety filters that enhance Hamilton-Jacobi reachability, enabling safe visuomotor control under complex constraints. The research highlights the limitations of current methods that rely on discrete policy switching, which may compromise performance in high-dimensional environments.