Systematic Reward Gap Optimization for Mitigating VLM Hallucinations

arXiv — cs.CL · Tuesday, November 25, 2025 at 5:00:00 AM
  • A novel framework called Topic-level Preference Rewriting (TPR) has been introduced to systematically optimize reward gaps in Vision Language Models (VLMs), addressing the challenge of hallucinations in these models. TPR aims to enhance the precision of reward gap configuration by selectively replacing semantic topics in VLM responses with resampled candidates, thereby improving the quality of generated outputs.
  • This development is significant because it sharpens the reward gap that Direct Preference Optimization (DPO) relies on in VLMs, potentially leading to more reliable and accurate model outputs. By refining how the reward gap is configured, TPR could mitigate the hallucinations that have plagued VLMs, making them more robust across applications; a minimal sketch of the DPO reward gap follows this summary.
  • The introduction of TPR aligns with ongoing efforts to improve VLM capabilities, particularly in areas such as spatial reasoning and object interaction. As VLMs continue to evolve, addressing the dual challenges of hallucinations and spatial intelligence becomes crucial, highlighting a broader trend in AI research focused on enhancing model performance through innovative optimization techniques.
— via World Pulse Now AI Editorial System
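
The reward gap TPR targets is the one implied by the standard DPO objective, in which each response's implicit reward is the temperature-scaled log-likelihood ratio between the policy and a frozen reference model. The sketch below illustrates only that generic gap, not TPR's topic rewriting itself; the summed log-probabilities and the beta temperature are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss over summed per-response log-probabilities.

    Each response's implicit reward is beta * (log pi_theta - log pi_ref),
    so the "reward gap" is the difference of these rewards between the
    preferred and dispreferred response in a pair.
    """
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    reward_gap = reward_chosen - reward_rejected
    # Minimizing the negative log-sigmoid of the gap pushes the policy to widen it.
    loss = -F.logsigmoid(reward_gap).mean()
    return loss, reward_gap

# Toy usage with made-up summed log-probabilities for a batch of two pairs.
policy_c = torch.tensor([-42.0, -37.5])
policy_r = torch.tensor([-45.0, -36.0])
ref_c = torch.tensor([-43.0, -38.0])
ref_r = torch.tensor([-44.0, -37.0])
loss, gap = dpo_loss(policy_c, policy_r, ref_c, ref_r)
print(loss.item(), gap.tolist())
```

Constructing preference pairs whose preferred and dispreferred responses differ only in selected topics would concentrate this gap on the rewritten content rather than on incidental wording, which is the kind of precise configuration the summary above describes.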

Continue Reading
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Positive · Artificial Intelligence
A new framework named BideDPO has been proposed to enhance conditional image generation by addressing conflicts between text prompts and conditioning images. This method utilizes a bidirectionally decoupled approach to optimize the alignment of text and conditions, aiming to reduce gradient entanglement that hampers performance in existing models.
TRANSPORTER: Transferring Visual Semantics from VLM Manifolds
Positive · Artificial Intelligence
The paper introduces TRANSPORTER, a model-independent approach designed to enhance video generation by transferring visual semantics from Vision Language Models (VLMs). This method addresses the challenge of understanding how VLMs derive their predictions, particularly in complex scenes with various objects and actions. TRANSPORTER generates videos that reflect changes in captions across diverse attributes and contexts.
Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation
Positive · Artificial Intelligence
A recent study highlights the limitations of Direct Preference Optimization (DPO) in diffusion models, particularly the issue of likelihood displacement, where the probabilities of preferred samples decrease during training. This phenomenon can lead to suboptimal performance in video generation tasks, which are increasingly relevant in AI applications.
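
Likelihood displacement is possible because the DPO loss constrains only the margin between preferred and dispreferred samples, not the preferred sample's absolute likelihood. A minimal, hypothetical way to observe it during training is to log the chosen-sample log-probability alongside the margin; the policy and reference callables below are illustrative stand-ins, not an interface from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_step_with_displacement_logging(batch, policy, reference, beta=0.1):
    """One illustrative DPO step that also reports the chosen log-likelihood.

    `policy(prompt, response)` and `reference(prompt, response)` are assumed
    to return summed log-probabilities. Across training steps, the margin can
    keep increasing while `chosen_logp` decreases; that divergence is the
    likelihood-displacement signature the study describes.
    """
    chosen_logp = policy(batch["prompt"], batch["chosen"])
    rejected_logp = policy(batch["prompt"], batch["rejected"])
    with torch.no_grad():
        ref_chosen = reference(batch["prompt"], batch["chosen"])
        ref_rejected = reference(batch["prompt"], batch["rejected"])

    margin = beta * ((chosen_logp - ref_chosen) - (rejected_logp - ref_rejected))
    loss = -F.logsigmoid(margin).mean()

    stats = {"margin": margin.mean().item(),
             "chosen_logp": chosen_logp.mean().item()}
    return loss, stats
```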
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
Neutral · Artificial Intelligence
Recent research indicates that Vision Language Models (VLMs) often exhibit biases learned during training, particularly when tasked with specific queries about visual properties, such as counting objects in images. A new synthetic benchmark dataset and evaluation framework have been developed to assess how counting performance varies with different image and prompt characteristics.
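
One concrete piece of such an evaluation framework is turning free-form VLM replies into scored counts against the known ground truth of synthetic images. The helper below is a hypothetical sketch of that scoring step only; the parsing rule (first integer in the reply) is an assumption, not the benchmark's actual protocol.

```python
import re

def counting_accuracy(predictions, ground_truth_counts):
    """Score free-form answers to prompts like "How many cubes are there?".

    Each prediction is a raw model string; the first integer found in the
    reply is treated as the model's count (an assumed parsing convention).
    """
    correct = 0
    for reply, truth in zip(predictions, ground_truth_counts):
        match = re.search(r"\d+", reply)
        if match and int(match.group()) == truth:
            correct += 1
    return correct / max(len(ground_truth_counts), 1)

# "five" is not parsed as a digit, so only the first answer scores as correct,
# illustrating why answer normalization is itself part of the evaluation design.
print(counting_accuracy(["There are 4 cubes.", "I count five spheres."], [4, 5]))
```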
Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
Positive · Artificial Intelligence
A new framework called Multi-Value Alignment (MVA) has been proposed to address the challenges of aligning large language models (LLMs) with multiple human values, particularly when these values conflict. This framework aims to improve the stability and efficiency of multi-value optimization, overcoming limitations seen in existing methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO).
The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality
Neutral · Artificial Intelligence
A recent study evaluated the alignment of large language models (LLMs) in infertility care, assessing four strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Group Relative Policy Optimization (GRPO), and In-Context Learning (ICL). The findings revealed that GRPO achieved the highest algorithmic accuracy, while clinicians preferred SFT for its clearer reasoning and therapeutic feasibility.
BOP-ASK: Object-Interaction Reasoning for Vision-Language Models
Positive · Artificial Intelligence
A new dataset named BOP-ASK has been introduced to enhance object-interaction reasoning in Vision Language Models (VLMs). This dataset addresses the limitations of existing benchmarks that focus on high-level spatial relationships while neglecting fine-grained spatial understanding necessary for real-world applications. BOP-ASK includes over 150,000 images and 33 million questions, derived from detailed 6D object poses and annotations.
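
How fine-grained spatial questions can be derived from 6D poses is easiest to see with a toy reduction of two camera-frame poses to a left/right relation. The function below is a deliberately simplified, hypothetical sketch of that idea, not the BOP-ASK generation pipeline; the 4x4 pose convention, the axis choice, and the object names are assumptions.

```python
import numpy as np

def spatial_relation_qa(name_a, pose_a, name_b, pose_b):
    """Build a yes/no left-of question from two 4x4 camera-frame poses.

    Only the x-component of each translation is compared (camera x assumed
    to point right), so a smaller x means further to the left in the image.
    """
    x_a, x_b = pose_a[0, 3], pose_b[0, 3]
    question = f"Is the {name_a} to the left of the {name_b}?"
    answer = "yes" if x_a < x_b else "no"
    return question, answer

pose_mug = np.eye(4); pose_mug[0, 3] = -0.12  # mug 12 cm left of the optical axis
pose_box = np.eye(4); pose_box[0, 3] = 0.05   # box 5 cm to the right
print(spatial_relation_qa("mug", pose_mug, "box", pose_box))
```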