Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

arXiv — cs.CV · Thursday, December 4, 2025 at 5:00:00 AM
  • A new approach called margin-aware preference optimization (MaPO) has been introduced to address reference mismatch when aligning text-to-image diffusion models. The method adapts the model effectively without relying on a reference model, a dependency that limits existing preference alignment techniques such as Direct Preference Optimization (DPO).
  • The significance of MaPO lies in directly optimizing the likelihood margin between preferred and dispreferred outputs, which improves performance on tasks such as learning new artistic styles and personalizing outputs for specific objects (a minimal sketch of such a margin objective follows this summary).
  • This development reflects a broader trend in AI research, where methods are evolving to overcome limitations of traditional models, such as likelihood displacement and overfitting, thereby enhancing the robustness and adaptability of AI systems across diverse applications.
— via World Pulse Now AI Editorial System
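To make the margin idea concrete, below is a minimal PyTorch sketch of a reference-free, margin-based preference objective for a diffusion model. The function name, the beta/gamma weights, and the use of per-sample denoising error as a stand-in for negative log-likelihood are illustrative assumptions, not MaPO's published formulation.

```python
import torch
import torch.nn.functional as F

def margin_aware_loss(noise_pred_w, noise_pred_l, noise, beta=0.1, gamma=1.0):
    """Hypothetical sketch of a reference-free, margin-based preference loss
    for a diffusion model. noise_pred_w / noise_pred_l are the model's noise
    predictions for the preferred (winner) and dispreferred (loser) images at
    the same timestep; noise is the shared ground-truth noise. The weights
    beta and gamma are illustrative, not the paper's published values."""
    # Per-sample denoising error acts as a proxy for negative log-likelihood.
    err_w = F.mse_loss(noise_pred_w, noise, reduction="none").mean(dim=(1, 2, 3))
    err_l = F.mse_loss(noise_pred_l, noise, reduction="none").mean(dim=(1, 2, 3))

    # Margin term: push the preferred sample's error below the dispreferred
    # one's. No frozen reference model appears anywhere in this objective.
    margin = -F.logsigmoid(beta * (err_l - err_w))

    # Anchor on the preferred sample so its absolute likelihood does not
    # degrade while the margin grows.
    anchor = err_w

    return (margin + gamma * anchor).mean()
```

The anchor term is one simple way to hedge against likelihood displacement, the failure mode the summary above attributes to margin-only objectives, and because no reference model enters the loss, reference mismatch cannot arise.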

Continue Reading
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
Positive · Artificial Intelligence
The introduction of QA-LIGN represents a significant advancement in aligning large language models (LLMs): it decomposes a scalar reward into interpretable evaluations against principles such as helpfulness and honesty. This structured approach lets models learn through a draft, critique, and revise pipeline, improving safety and performance metrics, including a reduction in attack success rates by up to 68.7% while maintaining a low false refusal rate. The decomposition idea is sketched below.
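Here is a minimal, hypothetical Python sketch of per-principle scoring wired into a draft, critique, revise loop. The judge and reviser callables, the 0.8 threshold, and the round limit are placeholders for illustration, not QA-LIGN's actual constitution-derived prompts or training setup.

```python
from typing import Callable, Dict

# Stand-in types: in QA-LIGN the judgments come from constitution-derived QA
# prompts answered by a model; here they are plain callables so only the
# decomposition itself is illustrated.
Judge = Callable[[str, str], float]                     # (prompt, response) -> score in [0, 1]
Reviser = Callable[[str, str, Dict[str, float]], str]   # revises using per-principle critique

def decomposed_reward(judges: Dict[str, Judge], prompt: str, response: str) -> Dict[str, float]:
    """Replace one opaque scalar reward with per-principle scores
    (e.g. helpfulness, honesty), so failures are attributable."""
    return {name: judge(prompt, response) for name, judge in judges.items()}

def draft_critique_revise(generate: Callable[[str], str], revise: Reviser,
                          judges: Dict[str, Judge], prompt: str,
                          threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Minimal draft -> critique -> revise loop driven by the decomposed scores."""
    response = generate(prompt)                                  # draft
    for _ in range(max_rounds):
        scores = decomposed_reward(judges, prompt, response)     # critique
        if min(scores.values()) >= threshold:
            break
        response = revise(prompt, response, scores)              # revise
    return response
```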
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Positive · Artificial Intelligence
A recent study has introduced Proximalized Preference Optimization, a refined approach to direct alignment methods for large language models (LLMs). The method addresses likelihood underdetermination: because the standard DPO loss constrains only the gap between preferred and dispreferred responses, the absolute likelihoods of both can be suppressed, leading to unexpected model behaviors. The reformulated loss accommodates a broader range of feedback types and reveals the underlying causes of these limitations; the snippet below illustrates the underdetermination itself.
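The underdetermination is easy to see in the standard DPO objective, which depends only on the log-likelihood margin, so a uniform drop in both responses' likelihoods is invisible to the loss. A minimal PyTorch illustration follows (the numeric values are made up, and the proximalized reformulation itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: it depends only on the *difference* of log-likelihoods,
    so the absolute likelihood of each response is left underdetermined."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# A uniform drop in both responses' log-likelihoods leaves the loss untouched.
logp_w, logp_l = torch.tensor([-10.0]), torch.tensor([-12.0])
ref_w, ref_l = torch.tensor([-11.0]), torch.tensor([-11.5])
shift = -5.0  # both responses become far less likely under the policy
assert torch.allclose(
    dpo_loss(logp_w, logp_l, ref_w, ref_l),
    dpo_loss(logp_w + shift, logp_l + shift, ref_w, ref_l),
)
```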
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Positive · Artificial Intelligence
The introduction of Diffusion-SDPO, a safeguarded update rule for Direct Preference Optimization (DPO) in text-to-image diffusion models, addresses the challenge of aligning generated images with human preferences. The safeguard counters a failure mode of margin-only optimization in which reconstruction errors can grow during training, so that the quality of preferred images remains stable throughout optimization (a hedged sketch of one such safeguard follows).
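As an illustration only, the sketch below adds an explicit winner-quality penalty to a Diffusion-DPO style objective. The error variables, the ReLU-based safeguard, and the lam weight are assumptions chosen to show the general idea of a safeguarded update, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def safeguarded_diffusion_dpo_loss(err_w, err_l, ref_err_w, ref_err_l,
                                   beta=0.1, lam=1.0):
    """Sketch of a 'safeguarded' Diffusion-DPO style objective. err_* are
    per-sample denoising MSEs for the preferred (w) and dispreferred (l)
    images under the current model; ref_err_* are the same errors under a
    frozen reference. The explicit winner anchor (lam * safeguard) is an
    illustrative safeguard, not the published update rule."""
    # Diffusion-DPO style preference term: grow the error margin
    # (winner error down relative to reference, loser error up).
    margin = (ref_err_w - err_w) - (ref_err_l - err_l)
    preference = -F.logsigmoid(beta * margin)

    # Safeguard: penalize any increase in the winner's reconstruction error,
    # so the margin cannot be won purely by degrading both images.
    safeguard = torch.relu(err_w - ref_err_w)

    return (preference + lam * safeguard).mean()
```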
Aligning Compound AI Systems via System-level DPO
Positive · Artificial Intelligence
A recent study introduces SysDPO, a framework designed to align compound AI systems, which consist of multiple interacting components like large language models (LLMs) and foundation models. This approach addresses the challenges of aligning these systems with human preferences, particularly due to non-differentiable interactions and the complexity of translating system-level preferences to component-level preferences.