MENLO: From Preferences to Proficiency -- Evaluating and Modeling Native-like Quality Across 47 Languages

arXiv — cs.LG · Wednesday, November 12, 2025 at 5:00:00 AM
MENLO is a new framework for evaluating native-like quality in responses generated by large language models (LLMs) across 47 languages. Built around a dataset of 6,423 human-annotated prompt-response pairs, it assesses four quality dimensions with high inter-annotator agreement. The findings indicate that LLM judges benefit from pairwise evaluations and structured rubrics but still fall short of human annotators. The research further suggests that fine-tuning LLMs with reinforcement learning, reward shaping, and multi-task learning yields significant gains in multilingual proficiency, though discrepancies with human judgment persist, indicating that further refinement is needed. The release of the MENLO dataset and evaluation framework is expected to support ongoing research in scalable multilingual evaluation and preference alignment.
— via World Pulse Now AI Editorial System
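
To make the evaluation setup concrete, the sketch below shows what pairwise judging with a structured rubric could look like in practice. This is illustrative only, not MENLO's released code: the four dimension names in `RUBRIC` are hypothetical placeholders (the summary does not name the paper's actual dimensions), and `call_llm_judge` is a stand-in for whatever judge-model API an evaluator would use.

```python
"""Minimal sketch of pairwise LLM-judge evaluation with a structured rubric.

Assumptions: dimension names and the judge call are placeholders, not
MENLO's released prompts or code.
"""

from dataclasses import dataclass

# Hypothetical quality dimensions; MENLO defines four, but their exact
# names are not given in the summary above.
RUBRIC = {
    "fluency": "Is the response grammatical and natural for a native speaker?",
    "register": "Does the tone match what a native speaker would expect?",
    "terminology": "Are word choices idiomatic rather than translated literally?",
    "cultural_fit": "Are references and conventions appropriate to the locale?",
}

@dataclass
class PairwiseItem:
    """One prompt with two candidate responses to compare."""
    prompt: str
    response_a: str
    response_b: str
    language: str

def build_judge_prompt(item: PairwiseItem) -> str:
    """Assemble a pairwise comparison prompt with one rubric question per dimension."""
    criteria = "\n".join(f"- {name}: {question}" for name, question in RUBRIC.items())
    return (
        f"You are evaluating native-like quality in {item.language}.\n"
        f"User prompt:\n{item.prompt}\n\n"
        f"Response A:\n{item.response_a}\n\n"
        f"Response B:\n{item.response_b}\n\n"
        "For each criterion below, state which response is better (A, B, or tie):\n"
        f"{criteria}\n"
        "Finish with an overall verdict: A, B, or tie."
    )

def call_llm_judge(prompt: str) -> str:
    """Placeholder for a real judge-model call; wire this to an actual API."""
    raise NotImplementedError("Connect to the judge model of your choice.")

if __name__ == "__main__":
    item = PairwiseItem(
        prompt="Explica la fotosíntesis a un niño de ocho años.",
        response_a="La fotosíntesis es cómo las plantas hacen su comida con la luz del sol.",
        response_b="Photosynthesis is the process by which plants convert light to energy.",
        language="Spanish",
    )
    print(build_judge_prompt(item))
```

Structuring the prompt one criterion at a time mirrors the summary's point that LLM judges do better with explicit rubrics than with a single holistic preference question.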
