BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models

arXiv — cs.CL•Tuesday, November 25, 2025 at 5:00:00 AM

Neutral•Artificial Intelligence

A recent study, 'BiasJailbreak', investigates ethical biases and jailbreak vulnerabilities in large language models (LLMs), focusing in particular on GPT-4o. It shows how these biases can be exploited to generate harmful content and reports a significant disparity in jailbreak success rates depending on the demographic context of the keywords used in prompts.
The result matters because it points to concrete safety risks in LLMs and to the need for better safety alignment and ethical consideration in AI development. The findings call for prompt action to mitigate the risks posed by biased outputs.
The issues raised reflect broader concerns in the AI community about the reliability and ethical implications of LLMs. As these models are deployed in more applications, frameworks for evaluating their performance and addressing inherent biases become more important, and they feed ongoing debates about trustworthiness and accountability in AI.
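
The disparity described above can be made concrete as a per-group jailbreak success rate. The following is a minimal sketch, not the study's evaluation code: the `Trial` record, the group labels, and the toy outcomes are all hypothetical placeholders, and in practice the `success` flag would come from a separate judgment of whether the model complied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    group: str      # hypothetical label for the demographic keyword in the prompt
    success: bool   # True if the attempt was judged a successful jailbreak


def success_rates(trials):
    """Return {group: fraction of trials judged successful}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, total]
    for t in trials:
        counts[t.group][0] += int(t.success)
        counts[t.group][1] += 1
    return {g: s / n for g, (s, n) in counts.items()}


def max_disparity(rates):
    """Largest gap in success rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up outcomes used only to exercise the functions.
trials = [
    Trial("group_a", True), Trial("group_a", False),
    Trial("group_a", False), Trial("group_a", False),
    Trial("group_b", True), Trial("group_b", True),
    Trial("group_b", True), Trial("group_b", False),
]
rates = success_rates(trials)
gap = max_disparity(rates)
```

A single max-minus-min gap is only one possible summary; with more than two groups, reporting the full table of rates is usually more informative.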

— via World Pulse Now AI Editorial System

Continue Reading

arXiv — cs.CL•14 hours ago

TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation

Positive•Artificial Intelligence

A new approach called TRIM has been introduced to address the high inference costs associated with Large Language Models (LLMs). This method optimizes language generation by allowing LLMs to omit semantically irrelevant words during inference, followed by reconstruction of the output using a smaller, cost-effective model. Experimental results indicate an average token saving of 19.4% for GPT-4o with minimal impact on evaluation metrics.
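
As a rough illustration of the idea, the sketch below drops words from a draft and reports the fraction of tokens saved. It is not the TRIM method itself: the fixed `FILLER` word list stands in for whatever the model judges semantically irrelevant, whitespace splitting stands in for a real tokenizer, and the reconstruction step performed by a smaller model is omitted entirely.

```python
# Hypothetical stand-in for "semantically irrelevant" words.
FILLER = {"the", "a", "an", "of", "is", "to", "that"}


def distill(text):
    """Drop filler words, keeping the remaining words in order."""
    return " ".join(w for w in text.split() if w.lower() not in FILLER)


def token_saving(full, short):
    """Fraction of whitespace tokens saved by emitting `short` instead of `full`."""
    n_full = len(full.split())
    if n_full == 0:
        return 0.0
    return 1.0 - len(short.split()) / n_full


full = "The cat is on the mat"
short = distill(full)
saving = token_saving(full, short)
```

The saving here depends entirely on the toy word list and sentence, so it says nothing about the 19.4% figure reported for GPT-4o.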

arXiv — cs.CL•14 hours ago

Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models

Neutral•Artificial Intelligence

A recent study evaluated the safety of four leading multimodal large language models (MLLMs) under adversarial conditions, revealing significant differences in their vulnerability to harmful prompts. The models tested included GPT-4o, Claude Sonnet 3.5, Pixtral 12B, and Qwen VL Plus, with Pixtral 12B showing a harmful response rate of approximately 62%, while Claude Sonnet 3.5 demonstrated the highest resistance at around 10%.

arXiv — cs.CL•14 hours ago

OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas

Positive•Artificial Intelligence

OmniStruct has been introduced as a comprehensive benchmark to evaluate the capabilities of Large Language Models (LLMs) in generating structured outputs across various tasks, including information extraction and table generation. This initiative aims to address the uncertainty regarding LLMs' performance in text-to-structure tasks, which are essential for diverse applications.

arXiv — cs.CL•14 hours ago

Toward Trustworthy Difficulty Assessments: Large Language Models as Judges in Programming and Synthetic Tasks

Negative•Artificial Intelligence

Large Language Models (LLMs) such as GPT-4o have been evaluated on how well they assess the difficulty of programming tasks, in a comparison against a LightGBM ensemble model. The study found that LightGBM achieved 86% accuracy in classifying LeetCode problems, while GPT-4o reached only 37.75%, indicating significant limitations in LLMs for structured assessments.

arXiv — cs.CL•14 hours ago

Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration

Positive•Artificial Intelligence

A new framework called BeMyEyes has been proposed to enhance the capabilities of Large Language Models (LLMs) by integrating them with Vision-Language Models (VLMs) through a multi-agent collaboration approach. This modular system aims to improve multimodal reasoning by allowing efficient VLMs to act as perceivers while powerful LLMs serve as reasoners, facilitating better interaction and understanding of complex data.

arXiv — cs.CL•14 hours ago

Bridging Symbolic Control and Neural Reasoning in LLM Agents: The Structured Cognitive Loop

Positive•Artificial Intelligence

A new architecture called Structured Cognitive Loop (SCL) has been introduced to address fundamental issues in large language model agents, such as entangled reasoning and memory volatility. SCL separates cognition into five distinct phases: Retrieval, Cognition, Control, Action, and Memory, while employing Soft Symbolic Control to enhance explainability and controllability. Empirical tests show SCL achieves zero policy violations and maintains decision traceability.
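
The five phases named above can be pictured as a simple loop. The sketch below is a loose illustration only and not the SCL implementation: every function, the `ALLOWED_ACTIONS` policy, and the trace format are hypothetical, and the model call is replaced by a trivial rule.

```python
ALLOWED_ACTIONS = {"answer", "lookup"}  # hypothetical policy


def retrieve(memory, query):
    """Retrieval: fetch earlier records that mention the query."""
    return [m for m in memory if query in m["query"]]


def cognize(query, context):
    """Cognition: propose an action (stub standing in for a model call)."""
    return "answer" if context else "delete_files"


def control(proposed):
    """Control: veto any action outside the policy, falling back to a lookup."""
    return proposed if proposed in ALLOWED_ACTIONS else "lookup"


def act(action, query):
    """Action: carry out the approved action (stubbed as a string)."""
    return f"{action}:{query}"


def step(memory, query):
    context = retrieve(memory, query)
    proposed = cognize(query, context)
    approved = control(proposed)
    result = act(approved, query)
    # Memory: record the full trace so each decision can be audited later.
    memory.append({"query": query, "proposed": proposed,
                   "approved": approved, "result": result})
    return result


memory = []
first = step(memory, "weather")   # no context yet, so the proposal is vetoed
second = step(memory, "weather")  # context now exists from the first step
```

Keeping the control check as a separate function between proposal and execution is what lets the trace show both what was proposed and what was actually allowed to run.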

arXiv — cs.CL•14 hours ago

Lessons from Studying Two-Hop Latent Reasoning

Neutral•Artificial Intelligence

Recent research has focused on the latent reasoning capabilities of large language models (LLMs), specifically through a study on two-hop question answering. The investigation revealed that LLMs, including Llama 3 and GPT-4o, struggle with this basic reasoning task without employing chain-of-thought (CoT) techniques, which are essential for complex agentic tasks.

arXiv — cs.CL•2 days ago

Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study

Positive•Artificial Intelligence

A recent study evaluated the performance of various large language models (LLMs) in restoring diacritics in Romanian texts, highlighting the importance of automatic diacritic restoration for effective text processing in languages rich in diacritical marks. Models tested included OpenAI's GPT-3.5, GPT-4, and Google's Gemini 1.0 Pro, among others, with GPT-4o achieving notable accuracy in diacritic restoration.
