CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences

arXiv (cs.CL), Wednesday, November 12, 2025 at 5:00:00 AM
Confidence-Aware Preference Optimization (CAPO) is a new method for preference optimization on multilingual preference data. Established methods such as Direct Preference Optimization (DPO) are effective in English but often fail to generalize to other languages, which leads to suboptimal performance. CAPO addresses this limitation by scaling the loss dynamically according to the confidence in each preference pair, which makes training more robust to noisy comparisons. Empirical results show that CAPO outperforms existing baselines by at least 16% in reward accuracy and improves alignment by widening the gap between preferred and dispreferred responses across various languages. The result matters for large language models (LLMs) that must serve diverse linguistic contexts, not only English.
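To make the idea of confidence-based loss scaling concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the summary above does not give CAPO's exact loss, so the weighting function `confidence_weight`, its `temperature` parameter, and the assumption that each pair comes with two scalar quality scores (for example from a reward model) are all hypothetical. Only `dpo_loss` follows the standard DPO objective.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, from sequence log-probabilities."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def confidence_weight(score_chosen, score_rejected, temperature=1.0):
    """Hypothetical confidence in (0, 1): sigmoid of the scaled score gap.

    Assumption for illustration only; CAPO's actual weighting may differ.
    """
    gap = (score_chosen - score_rejected) / temperature
    return math.exp(log_sigmoid(gap))


def confidence_scaled_loss(logps, scores, beta=0.1, temperature=1.0):
    """DPO loss multiplied by the pair's confidence weight."""
    return confidence_weight(*scores, temperature) * dpo_loss(*logps, beta)


# Same log-probabilities, different confidence in the preference label.
logps = (-12.0, -15.0, -13.0, -14.0)  # policy chosen/rejected, ref chosen/rejected
confident = confidence_scaled_loss(logps, scores=(4.0, 0.0))  # clear preference
noisy = confidence_scaled_loss(logps, scores=(2.1, 2.0))      # near tie
print(f"confident pair: {confident:.4f}, noisy pair: {noisy:.4f}")
```

Under these assumptions, a near-tie pair contributes a smaller loss than a clearly separated pair with identical log-probabilities, so ambiguous or noisy comparisons pull on the model less.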
— via World Pulse Now AI Editorial System
