Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

arXiv — cs.CV · Thursday, December 4, 2025 at 5:00:00 AM
  • A new approach called margin-aware preference optimization (MaPO) has been introduced to address reference mismatch when aligning text-to-image diffusion models. The method adapts the model effectively without relying on a reference model, a dependency that limits existing preference alignment techniques such as Direct Preference Optimization (DPO).
  • The significance of MaPO lies in directly optimizing the likelihood margin between preferred and dispreferred outputs, which improves performance on tasks such as learning new artistic styles and personalizing outputs for specific objects (a minimal sketch of such a margin objective follows this summary).
  • This development reflects a broader trend in AI research, where methods are evolving to overcome limitations of traditional models, such as likelihood displacement and overfitting, thereby enhancing the robustness and adaptability of AI systems across diverse applications.
— via World Pulse Now AI Editorial System
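To make the margin idea concrete, below is a minimal PyTorch sketch of a reference-free, margin-based preference objective for a diffusion model. The function name, the beta/gamma weights, and the use of per-sample denoising error as a stand-in for negative log-likelihood are illustrative assumptions, not MaPO's published formulation.

```python
import torch
import torch.nn.functional as F

def margin_aware_loss(noise_pred_w, noise_pred_l, noise, beta=0.1, gamma=1.0):
    """Hypothetical sketch of a reference-free, margin-based preference loss
    for a diffusion model. noise_pred_w / noise_pred_l are the model's noise
    predictions for the preferred (winner) and dispreferred (loser) images at
    the same timestep; noise is the shared ground-truth noise. The weights
    beta and gamma are illustrative, not the paper's published values."""
    # Per-sample denoising error acts as a proxy for negative log-likelihood.
    err_w = F.mse_loss(noise_pred_w, noise, reduction="none").mean(dim=(1, 2, 3))
    err_l = F.mse_loss(noise_pred_l, noise, reduction="none").mean(dim=(1, 2, 3))

    # Margin term: push the preferred sample's error below the dispreferred
    # one's. No frozen reference model appears anywhere in this objective.
    margin = -F.logsigmoid(beta * (err_l - err_w))

    # Anchor on the preferred sample so its absolute likelihood does not
    # degrade while the margin grows.
    anchor = err_w

    return (margin + gamma * anchor).mean()
```

The anchor term is one simple way to hedge against likelihood displacement, the failure mode the summary above attributes to margin-only objectives, and because no reference model enters the loss, reference mismatch cannot arise.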

Continue Reading
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
Positive · Artificial Intelligence
The introduction of QA-LIGN represents a significant advancement in aligning large language models (LLMs): it decomposes a scalar reward into interpretable evaluations against principles such as helpfulness and honesty. This structured approach lets models learn through a draft, critique, and revise pipeline, improving safety and performance metrics, including a reduction in attack success rates by up to 68.7% while maintaining a low false refusal rate. The decomposition idea is sketched below.
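Here is a minimal, hypothetical Python sketch of per-principle scoring wired into a draft, critique, revise loop. The judge and reviser callables, the 0.8 threshold, and the round limit are placeholders for illustration, not QA-LIGN's actual constitution-derived prompts or training setup.

```python
from typing import Callable, Dict

# Stand-in types: in QA-LIGN the judgments come from constitution-derived QA
# prompts answered by a model; here they are plain callables so only the
# decomposition itself is illustrated.
Judge = Callable[[str, str], float]                     # (prompt, response) -> score in [0, 1]
Reviser = Callable[[str, str, Dict[str, float]], str]   # revises using per-principle critique

def decomposed_reward(judges: Dict[str, Judge], prompt: str, response: str) -> Dict[str, float]:
    """Replace one opaque scalar reward with per-principle scores
    (e.g. helpfulness, honesty), so failures are attributable."""
    return {name: judge(prompt, response) for name, judge in judges.items()}

def draft_critique_revise(generate: Callable[[str], str], revise: Reviser,
                          judges: Dict[str, Judge], prompt: str,
                          threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Minimal draft -> critique -> revise loop driven by the decomposed scores."""
    response = generate(prompt)                                  # draft
    for _ in range(max_rounds):
        scores = decomposed_reward(judges, prompt, response)     # critique
        if min(scores.values()) >= threshold:
            break
        response = revise(prompt, response, scores)              # revise
    return response
```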
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Positive · Artificial Intelligence
A recent study has introduced Proximalized Preference Optimization, a refined approach to direct alignment methods for large language models (LLMs). The method addresses likelihood underdetermination: because the standard DPO loss constrains only the gap between preferred and dispreferred responses, the absolute likelihoods of both can be suppressed, leading to unexpected model behaviors. The reformulated loss accommodates a broader range of feedback types and reveals the underlying causes of these limitations; the snippet below illustrates the underdetermination itself.
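The underdetermination is easy to see in the standard DPO objective, which depends only on the log-likelihood margin, so a uniform drop in both responses' likelihoods is invisible to the loss. A minimal PyTorch illustration follows (the numeric values are made up, and the proximalized reformulation itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: it depends only on the *difference* of log-likelihoods,
    so the absolute likelihood of each response is left underdetermined."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# A uniform drop in both responses' log-likelihoods leaves the loss untouched.
logp_w, logp_l = torch.tensor([-10.0]), torch.tensor([-12.0])
ref_w, ref_l = torch.tensor([-11.0]), torch.tensor([-11.5])
shift = -5.0  # both responses become far less likely under the policy
assert torch.allclose(
    dpo_loss(logp_w, logp_l, ref_w, ref_l),
    dpo_loss(logp_w + shift, logp_l + shift, ref_w, ref_l),
)
```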
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Positive · Artificial Intelligence
The introduction of Diffusion-SDPO, a safeguarded update rule for Direct Preference Optimization (DPO) in text-to-image diffusion models, addresses the challenge of aligning generated images with human preferences. The safeguard counters a failure mode of margin-only optimization in which reconstruction errors can grow during training, so that the quality of preferred images remains stable throughout optimization (a hedged sketch of one such safeguard follows).
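As an illustration only, the sketch below adds an explicit winner-quality penalty to a Diffusion-DPO style objective. The error variables, the ReLU-based safeguard, and the lam weight are assumptions chosen to show the general idea of a safeguarded update, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def safeguarded_diffusion_dpo_loss(err_w, err_l, ref_err_w, ref_err_l,
                                   beta=0.1, lam=1.0):
    """Sketch of a 'safeguarded' Diffusion-DPO style objective. err_* are
    per-sample denoising MSEs for the preferred (w) and dispreferred (l)
    images under the current model; ref_err_* are the same errors under a
    frozen reference. The explicit winner anchor (lam * safeguard) is an
    illustrative safeguard, not the published update rule."""
    # Diffusion-DPO style preference term: grow the error margin
    # (winner error down relative to reference, loser error up).
    margin = (ref_err_w - err_w) - (ref_err_l - err_l)
    preference = -F.logsigmoid(beta * margin)

    # Safeguard: penalize any increase in the winner's reconstruction error,
    # so the margin cannot be won purely by degrading both images.
    safeguard = torch.relu(err_w - ref_err_w)

    return (preference + lam * safeguard).mean()
```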
Aligning Compound AI Systems via System-level DPO
Positive · Artificial Intelligence
A recent study introduces SysDPO, a framework designed to align compound AI systems, which consist of multiple interacting components like large language models (LLMs) and foundation models. This approach addresses the challenges of aligning these systems with human preferences, particularly due to non-differentiable interactions and the complexity of translating system-level preferences to component-level preferences.