Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

arXiv — cs.CL · Wednesday, November 12, 2025
This study of the moral responses of large language models (LLMs) under persona role-play sheds light on how these models navigate moral judgments in social contexts. Employing the Moral Foundations Questionnaire (MFQ), the researchers establish a benchmark that quantifies moral susceptibility and robustness. The results indicate that the Claude models are the most robust, significantly outperforming Gemini and GPT-4, while larger variants within a family tend to be more susceptible to moral shifts. Model family accounts for most of the observed variance, suggesting it plays the central role in determining moral robustness. Furthermore, a positive correlation between robustness and susceptibility at the family level underscores the intricate dynamics of LLMs, which are increasingly relevant as these technologies integrate into societal frameworks.
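The article does not reproduce the paper's exact scoring, but one natural way to quantify susceptibility from MFQ data is the mean absolute shift of a model's foundation scores across personas relative to its no-persona baseline. A minimal sketch under that assumption (all names and numbers below are illustrative, not the paper's):

```python
from statistics import mean

def susceptibility(baseline, persona_scores):
    """Mean absolute shift of MFQ foundation scores across personas,
    relative to the no-persona baseline (illustrative metric, not
    necessarily the paper's exact definition)."""
    shifts = [abs(p[f] - baseline[f])
              for p in persona_scores
              for f in baseline]
    return mean(shifts)

# Hypothetical MFQ foundation scores (1-5 scale).
baseline = {"care": 4.1, "fairness": 3.9, "loyalty": 2.8}
personas = [
    {"care": 4.0, "fairness": 3.7, "loyalty": 3.2},  # e.g. a "soldier" persona
    {"care": 4.3, "fairness": 4.0, "loyalty": 2.9},  # e.g. a "nurse" persona
]
print(round(susceptibility(baseline, personas), 3))  # → 0.183
```

Robustness could then be reported as the inverse (or negative) of this shift, and the family-level comparison in the article would aggregate such scores across each family's models.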
— via World Pulse Now AI Editorial System


Recommended Readings
Disney star debuts AI avatars of the dead
NeutralArtificial Intelligence
A Disney star has introduced AI avatars representing deceased individuals, marking a notable development at the intersection of entertainment and artificial intelligence. The debut showcases the potential of AI to create lifelike representations of people who have died, raising ethical questions about the future of digital personas. The announcement, made on November 17, 2025, is expected to draw attention from fans and industry observers alike.
LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models
PositiveArtificial Intelligence
The paper 'LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models' introduces a method for fine-tuning large language models (LLMs) in the financial sector. Layer-wise Adaptive Ensemble Tuning (LAET) selectively fine-tunes the most effective layers while freezing less critical ones, significantly reducing computational demands. The approach aims to improve task-specific performance on financial NLP tasks while addressing the accessibility barriers many organizations face.
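As a rough illustration of the layer-selection idea, the sketch below ranks layers by a hypothetical per-layer importance score and marks only the top-k as trainable; the paper's actual selection criterion is not specified in this summary, so the scoring here is an assumption:

```python
def select_layers(importance, k):
    """Mark the k highest-scoring layers as trainable and freeze the rest.
    `importance` maps layer name -> a (hypothetical) effectiveness score,
    such as probe accuracy on the target financial task."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    trainable = set(ranked[:k])
    return {name: name in trainable for name in importance}

# Illustrative scores for a 4-layer model.
scores = {"layer_0": 0.41, "layer_1": 0.55, "layer_2": 0.72, "layer_3": 0.63}
plan = select_layers(scores, k=2)
print(plan)  # layer_2 and layer_3 trainable, the rest frozen
```

In a real fine-tuning setup, the resulting plan would translate into toggling each layer's parameter gradients (e.g. `requires_grad` in PyTorch) before training.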
PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models
PositiveArtificial Intelligence
PustakAI is a framework designed to create interactive textbooks aligned with the NCERT curriculum for grades 6 to 8 in India. Utilizing Large Language Models (LLMs), it aims to enhance personalized learning experiences, particularly in areas with limited educational resources. The initiative addresses challenges in adapting LLMs to specific curricular content, ensuring accuracy and pedagogical relevance.
From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems
NeutralArtificial Intelligence
The article investigates the impact of task framing on the conviction of large language models (LLMs) in dialogue systems. It explores how LLMs assess tasks requiring social judgment, contrasting their performance on factual queries with conversational judgment tasks. The study reveals that reframing a task can significantly alter an LLM's judgment, particularly under conversational pressure, highlighting the complexities of LLM decision-making in social contexts.
Expert-Guided Prompting and Retrieval-Augmented Generation for Emergency Medical Service Question Answering
PositiveArtificial Intelligence
Large language models (LLMs) have shown potential in medical question answering but often lack the domain-specific expertise required in emergency medical services (EMS). The study introduces EMSQA, a dataset with 24.3K questions across 10 clinical areas and 4 certification levels, along with knowledge bases containing 40K documents and 2M tokens. It also presents Expert-CoT and ExpertRAG, strategies that enhance performance by integrating clinical context, resulting in improved accuracy and exam pass rates for EMS certification.
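The summary does not describe the retriever or prompt template, but the general ExpertRAG recipe — retrieve relevant clinical documents, then condition the prompt on certification level and context — can be sketched as follows (the token-overlap retriever and the prompt template are assumptions, not the paper's implementation):

```python
def retrieve(query, docs, k=1):
    """Rank documents by simple token overlap with the query — a stand-in
    for whatever retriever ExpertRAG actually uses."""
    q = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, cert_level, docs):
    """Prepend certification level and retrieved clinical context, in the
    spirit of Expert-CoT / ExpertRAG (hypothetical template)."""
    context = "\n".join(retrieve(query, docs))
    return (f"You are an EMS provider at certification level {cert_level}.\n"
            f"Context:\n{context}\n"
            f"Question: {query}\nThink step by step.")

docs = ["Administer epinephrine for anaphylaxis without delay.",
        "Splint suspected fractures before transport."]
prompt = build_prompt("When should epinephrine be given for anaphylaxis?",
                      "Paramedic", docs)
print(prompt)
```

The assembled prompt would then be sent to the LLM; scaling this up to the 40K-document knowledge base would replace the toy overlap scorer with a proper dense or sparse retriever.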
Can LLMs Detect Their Own Hallucinations?
PositiveArtificial Intelligence
Large language models (LLMs) are capable of generating fluent responses but can sometimes produce inaccurate information, referred to as hallucinations. A recent study investigates whether these models can recognize their own inaccuracies. The research formulates hallucination detection as a classification task and introduces a framework utilizing Chain-of-Thought (CoT) to extract knowledge from LLM parameters. Experimental results show that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations, suggesting that LLMs can identify inaccuracies if they possess sufficient knowledge.
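Framed as a classification task, the headline number is essentially a detection rate: the fraction of genuine hallucinations the model flags when asked to judge its own outputs. A toy computation of that metric (the labels and self-judgments below are made up for illustration):

```python
def detection_rate(labels, predictions):
    """Fraction of actual hallucinations the model flags about itself.
    labels[i] is True when response i is a hallucination; predictions[i]
    is the model's own CoT self-judgment. GPT-3.5 Turbo reached 58.2%
    on a measure of this kind in the study."""
    flagged = sum(1 for lab, pred in zip(labels, predictions) if lab and pred)
    total = sum(labels)
    return flagged / total if total else 0.0

labels = [True, True, False, True, False]   # ground-truth hallucinations
preds  = [True, False, False, True, True]   # model's CoT self-judgments
print(detection_rate(labels, preds))  # 2 of 3 hallucinations caught
```

This is just recall on the hallucination class; a fuller evaluation would also report precision, since the last self-judgment above is a false alarm.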
Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages: A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish
NeutralArtificial Intelligence
A recent study evaluates the performance of seven advanced large language models (LLMs) on low-resource and morphologically rich languages, specifically Cantonese, Japanese, and Turkish. The research highlights the models' effectiveness in tasks such as open-domain question answering, document summarization, translation, and culturally grounded dialogue. Despite impressive results in high-resource languages, the study indicates that the effectiveness of LLMs in these less-studied languages remains underexplored.
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
PositiveArtificial Intelligence
The article presents Thinker, a hierarchical thinking model designed to enhance the reasoning capabilities of large language models (LLMs) through multi-turn interactions. Unlike previous methods that relied on end-to-end reinforcement learning without supervision, Thinker allows for a more structured reasoning process by breaking down complex problems into manageable sub-problems. Each sub-problem is represented in both natural language and logical functions, improving the coherence and rigor of the reasoning process.
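The idea of pairing each sub-problem with both a natural-language statement and a logical function can be sketched as a small recursive structure; the rendering below is hypothetical and much simpler than Thinker's actual multi-turn training setup:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SubProblem:
    """A node pairing a natural-language statement with a logical function
    (an illustrative take on Thinker's dual representation)."""
    text: str
    logic: Callable
    children: List["SubProblem"] = field(default_factory=list)

def solve(node, *inputs):
    # Depth-first: solve sub-problems, then combine their answers at the parent.
    child_answers = [solve(child, *inputs) for child in node.children]
    return node.logic(*inputs, *child_answers)

# Toy decomposition: divisibility by 6 splits into two simpler checks.
by_two = SubProblem("Is n divisible by 2?", lambda n: n % 2 == 0)
by_three = SubProblem("Is n divisible by 3?", lambda n: n % 3 == 0)
root = SubProblem("Is n divisible by 6?",
                  lambda n, a, b: a and b,
                  [by_two, by_three])
print(solve(root, 18))  # → True
```

In the article's setting, each node's answer would come from an LLM call rather than a lambda, with the logical form keeping the multi-turn reasoning coherent and checkable.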